This course has been thoroughly revised and updated since Spring 2025.
Data analysis has shifted from manual downloads and Excel to code-first
workflows. Analysts pull large datasets via APIs and scrapers, work with
NumPy and Pandas at scale, and integrate ML models and LLMs into end-to-end
pipelines. This course teaches AI-driven financial analytics in Python:
implementing statistical and ML models, calling LLM APIs for text
understanding and code generation, building simple retrieval-augmented
generation (RAG) workflows over 10-Ks and earnings calls, and building AI
agents.
MBA ACCT-GB-3328 specializations
Accounting
Business Analytics
Financial Systems and Analytics
Undergrad ACCT-GB-6028 concentrations
Accounting
New: Computing and Data Science
Takeaways
Use Python and AI agents to access and structure financial data
Use Python APIs to access market and fundamental data (e.g., equities,
factors, fundamentals)
Use scrapers and eXtensible Business Reporting Language (XBRL) parsers
to pull data from SEC filings
Use LLMs as structured parsers to extract tables,
footnotes, and key accounting policies from noisy HTML/PDF text.
Represent financial data in "analysis-ready" Pandas DataFrames and save
in multiple formats (CSV, Excel, Parquet, JSON).
Use Python and AI agents for financial statement analysis
Simulate portfolio returns and plot the efficient frontier.
Build and test CAPM-style and multi-factor models in Python.
Use AI agents that:
design experiments (which portfolios to simulate),
call Python functions to compute risk/return metrics, and
interpret results in plain English while highlighting limitations.
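A minimal sketch of the portfolio simulation mentioned above, assuming three assets with made-up expected returns, volatilities, and correlations. Random long-only weights are drawn from a Dirichlet distribution (one convenient choice, not the only one); a scatter plot of `port_vol` against `port_ret` would show the efficient frontier approximately as the upper-left edge of the cloud.

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed annual expected returns, volatilities, and correlations (illustrative only).
mu = np.array([0.06, 0.09, 0.12])
vols = np.array([0.10, 0.15, 0.22])
corr = np.array([[1.0, 0.3, 0.2],
                 [0.3, 1.0, 0.5],
                 [0.2, 0.5, 1.0]])
cov = np.outer(vols, vols) * corr

# 5,000 random long-only portfolios; each row of weights sums to one.
w = rng.dirichlet(np.ones(len(mu)), size=5_000)

port_ret = w @ mu                                          # expected return per portfolio
port_vol = np.sqrt(np.einsum("ij,jk,ik->i", w, cov, w))   # sqrt(w' * cov * w) per row

rf = 0.03                                                  # assumed risk-free rate
best = int(np.argmax((port_ret - rf) / port_vol))          # highest-Sharpe portfolio
print(w[best].round(3), port_ret[best].round(4), port_vol[best].round(4))
```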
Required Prerequisites
All ACCT courses have the core courses in Financial Accounting as a
prerequisite.
Master's students
COR1-GB 1306: Financial Accounting and Reporting
COR1-GB 2206: Accounting (Tech & Luxury)
ACCT-GB-2103: Financial Statement Analysis (MS in Accounting)
Undergraduate students
ACCT-UB.0001: Principles of Financial Accounting
Recommended Background
The course assumes you have taken a statistics course in your graduate or
undergraduate program. It will also be extremely helpful if you have taken
a half-semester Python course.
Materials
I write and distribute my own materials, so no textbook is required and
you need not purchase anything.
Step-by-step Jupyter notebooks for each topic.
Example scripts for calling finance APIs and LLM APIs.
Template code for RAG pipelines and simple agent frameworks.
LLMs
The course will use VS Code as an IDE with GitHub Copilot as an AI
assistant. Please sign up for Copilot before the class begins. You also
have free access to NYU Gemini Pro if you log in to Gemini using your NYU
(not Stern) account; we will also use it on a standalone basis. Last I
checked, the Gemini academic version does not work with VS Code directly.
I personally also subscribe to the paid versions of ChatGPT and
Anthropic's Claude.
Attendance and penalty for missing classes
Requiring attendance is necessary for several reasons. First, you may
assume, incorrectly, that you can catch up on a missed class by watching a
recording (if available). Videos do not engage your brain as much as a
live class. Second, fewer than 20% of you watch the recording (if
available). Those who do not are then lost in class, which sends me the
wrong signals as an instructor. Third, your absence hurts class
discussions. Fourth, you miss out on feedback if you do not work through
the questions I pose in class. Fifth, I lose feedback because fewer
questions are asked.
The policy below will be in effect only after the add/drop
period.
Without mandatory attendance, attendance is often below 50%. Therefore,
though I dislike doing this, I penalize absences. If you anticipate being
absent for good reasons, please email me well in advance. If I approve,
enter "Excused" on the attendance sheet described below to avoid the
penalty. If you miss a class due to an emergency and cannot tell me in
advance, do not panic. Take care of the emergency first, and then email
me. I will permit you to change the "Absent" to "Excused." But if you
miss a class without a valid reason, there is a penalty, as stated below.
For sections meeting in 150-190 minute sessions, you will lose one
grade (A to A-, A- to B+, B+ to B, B to B-, and so on) for EVERY missed
session unless you were explicitly excused via email. Thus, if you miss
two class sessions, you will lose two grades, and so on.
For sections meeting in 75-80 minute sessions, you will lose one grade
(A to A-, A- to B+, B+ to B, B to B-, and so on) for EVERY TWO missed
sessions unless you were explicitly excused via email. Thus, if you miss
four class sessions, you will lose two grades, and so on.
Please sit in the same seat in every class and display your name tag. For
Zoom classes, you must keep your video on AT ALL TIMES. You must also have
a good working headset or mic, as it is extremely rude to be inaudible and
force me to ask you to repeat yourself. After entering the class, please
mark yourself present within the first 20 minutes on the OneDrive sheet
(link posted on Brightspace).
You will be marked absent if you are more than 20 minutes late unless it
is because of factors beyond your control (traffic, subway, or
interviews running late). You will also be marked absent if you leave
the class early unless you have my permission or get it afterward. You
will get an F in the course if you are caught cheating on the attendance
sheet.
Exams and Grading
There are no in-class quizzes, midterms, or final exams.
Please read about the penalty for missing classes above.
Assignments: 50%
Final project: 50%
System Requirements
You need access to the following systems before the start of the first
class:
Albert
NYU Brightspace
If you are a non-Stern student, Stern automatically creates a
Stern account for you when you register for a Stern course. All
class emails are sent to your Stern email, not your NYU email. Please
forward your Stern email to your NYU email.
Only registered students can attend. I cannot override this NYU rule.
Organization of financial data: Row versus column orientation
The typical format of financial data: Accounts or financial statement
items are row headings, while column headings are dates
Optimal organization of Pandas data frames: Why we transpose financial
statement data so that accounts or financial statement items are in
columns and dates are in rows.
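A minimal sketch of the transpose, using a small made-up income statement in the usual "report" layout (accounts as rows, years as columns). After `.T`, each account is a column, so one line of column arithmetic computes a ratio for every year at once.

```python
import pandas as pd

# Made-up income statement in report layout: accounts are rows, years are columns.
report = pd.DataFrame(
    {"2022": [100.0, 60.0, 25.0], "2023": [110.0, 64.0, 27.0], "2024": [121.0, 70.0, 29.0]},
    index=["Sales", "COGS", "SG&A"],
)

# Transpose: dates become rows and accounts become columns.
panel = report.T
panel.index = pd.to_datetime(panel.index, format="%Y")  # "2022" -> 2022-01-01

# Column arithmetic now applies to every year in a single vectorized step.
panel["Gross margin"] = (panel["Sales"] - panel["COGS"]) / panel["Sales"]
print(panel)
```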
How LLMs “see” text (tokens, context windows) and what that
implies for how we chunk and store financial text.
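Because a model can only read a limited context window, long filings are usually split into overlapping chunks before they are stored or sent to an LLM. The sketch below assumes word counts as a rough stand-in for tokens; a real pipeline would count tokens with the model's own tokenizer (for OpenAI models, for example, the `tiktoken` library). The `max_words` and `overlap` values are illustrative, not recommendations.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of words.

    Words only approximate tokens; the overlap keeps sentences that straddle
    a boundary visible in both neighboring chunks.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Usage with a synthetic 500-word "document".
doc = " ".join(f"w{i}" for i in range(500))
pieces = chunk_words(doc, max_words=200, overlap=40)
print(len(pieces), pieces[1].split()[0])
```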
Python skills
NumPy versus Pandas
Named rows and columns
Inhomogeneous data and missing data
Input and output
Merging and grouping
Speed and memory
Pandas essentials
Series versus data frames
Rows, columns, size, and size in memory
Head, tail, and random sampling
Understand data types and simple operators
Numbers, strings, and dates
Broadcast operators
Simple vectorized operations
Manipulate rows and columns in Pandas
Select rows and columns via slices: brackets, loc, and iloc
Add and delete rows and columns
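The Pandas essentials above, in one short sketch on made-up quarterly data (dates in rows, accounts in columns): size and memory, head/tail/sample, `loc` versus `iloc`, broadcast operations, and adding or deleting rows and columns.

```python
import numpy as np
import pandas as pd

# Made-up quarterly data: dates in rows, accounts in columns.
df = pd.DataFrame(
    {"Sales": [100.0, 105.0, 98.0, 120.0], "EBIT": [10.0, 12.0, 8.0, 15.0]},
    index=["2024Q1", "2024Q2", "2024Q3", "2024Q4"],
)

# Shape (rows, columns), number of cells, and bytes in memory.
print(df.shape, df.size, df.memory_usage(deep=True).sum())

# First rows, last row, and a reproducible random sample.
print(df.head(2), df.tail(1), df.sample(2, random_state=0), sep="\n\n")

# Label-based (loc) versus position-based (iloc) selection.
q2_sales = df.loc["2024Q2", "Sales"]
last_two = df.iloc[-2:, :]                          # last two rows, all current columns
first_half = df.loc["2024Q1":"2024Q2", ["Sales"]]   # label slices include both endpoints

# Broadcasting: column-by-column and column-with-scalar operations, no loops.
df["EBIT margin"] = df["EBIT"] / df["Sales"]
df["Sales minus mean"] = df["Sales"] - df["Sales"].mean()

# Add a row whose values are not yet known, then delete a column.
df.loc["2025Q1"] = np.nan
df = df.drop(columns=["Sales minus mean"])
print(df)
```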
AI skills
Designing DataFrames that are convenient input/output formats for ML
models and LLM calls (e.g., using dict / JSON columns that can be sent
to an API).
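One common pattern, sketched here with made-up metrics and placeholder tickers: turn each DataFrame row into a JSON object (the shape most HTTP and LLM APIs accept), then rebuild a DataFrame from a JSON response. The `{"firms": [...]}` payload structure is an assumption for illustration, not any particular API's schema. Using `to_json` rather than `json.dumps` on raw values avoids errors from NumPy number types.

```python
import json
import pandas as pd

# Made-up metrics; tickers are placeholders.
metrics = pd.DataFrame(
    {"ticker": ["AAA", "BBB"], "fiscal_year": [2024, 2024],
     "roic": [0.12, 0.07], "net_debt_to_ebitda": [1.5, 3.2]}
)

# One JSON object per row; to_json handles NumPy dtypes safely.
records = json.loads(metrics.to_json(orient="records"))
body = json.dumps({"firms": records})   # hypothetical request payload

# The reverse trip: a JSON response back into an analysis-ready DataFrame.
restored = pd.DataFrame(json.loads(body)["firms"])
print(body)
print(restored)
```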
Topic 2: Access external financial data and save it in files (APIs + AI
extraction)
Analytical concepts
Understanding XBRL
Structured data and XBRL: taxonomy, current reporting landscape, limits
of XBRL.
Understanding how to access financial data using Python APIs. Handling
authentication, rate limits, and pagination.
External parsers for XBRL; transforming their output into clean tables.
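A minimal sketch of pagination plus client-side rate limiting. `fetch_page` is a hypothetical stand-in for a real HTTP call (which would also carry credentials, typically an API key in a header or query parameter, depending on the provider); it is assumed to return `{"data": [...], "next_page": int or None}`. A fake three-page API lets the loop run offline.

```python
import time
from typing import Callable


def fetch_all(fetch_page: Callable[[int], dict], min_interval: float = 0.1) -> list:
    """Collect every record from a paginated API, pausing between calls."""
    results, page, last_call = [], 1, 0.0
    while page is not None:
        wait = min_interval - (time.monotonic() - last_call)
        if wait > 0:
            time.sleep(wait)  # throttle so we stay under the provider's rate limit
        last_call = time.monotonic()
        response = fetch_page(page)
        results.extend(response["data"])
        page = response["next_page"]  # None signals the last page
    return results


# Hypothetical fake API with three pages, for illustration only.
FAKE_PAGES = {1: ([1, 2], 2), 2: ([3, 4], 3), 3: ([5], None)}


def fake_fetch(page: int) -> dict:
    data, nxt = FAKE_PAGES[page]
    return {"data": data, "next_page": nxt}


all_rows = fetch_all(fake_fetch, min_interval=0.01)
print(all_rows)  # [1, 2, 3, 4, 5]
```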
File formats
Reading, writing, and formatting Excel files
CSV, JSON, Parquet; reading/writing efficiently.
Handling dates
Parsing dates in Python, NumPy, and Pandas
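A small sketch of file input/output and date parsing on made-up prices. It reads a CSV from an in-memory string, parses dates with an explicit format, and round-trips through JSON with ISO-8601 dates. Parquet (`to_parquet`/`read_parquet`) and Excel (`to_excel`/`read_excel`) follow the same pattern but need optional engines such as `pyarrow` or `openpyxl`, so they are left out here.

```python
import io
import pandas as pd

# Made-up month-end prices, as they might arrive in a CSV download.
csv_text = """date,ticker,close
2024-01-31,AAA,10.5
2024-02-29,AAA,11.0
2024-03-28,AAA,10.8
"""

# Read dates as text, then parse with an explicit format: no guessing,
# and a loud error if the file layout ever changes.
prices = pd.read_csv(io.StringIO(csv_text), dtype={"date": str})
prices["date"] = pd.to_datetime(prices["date"], format="%Y-%m-%d")

# Datetime columns enable calendar-aware operations.
prices["quarter"] = prices["date"].dt.to_period("Q").astype(str)  # e.g. "2024Q1"

# Round-trip through JSON with ISO-8601 date strings.
json_text = prices.to_json(orient="records", date_format="iso")
restored = pd.read_json(io.StringIO(json_text), orient="records", convert_dates=["date"])
print(restored.dtypes)
```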
Data structures
Dictionaries and JSON
AI skills
Using LLMs to extract and normalize data that XBRL does not capture
cleanly (e.g., non-GAAP metrics, segment disclosures).
Representing extraction results as JSON and turning them into
DataFrames.
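A sketch of the last step: validating an LLM's JSON output and flattening it into a DataFrame. The response text, company, and field names below are hypothetical (our own schema, not a standard), and no actual LLM is called. `pd.json_normalize` turns the nested segment list into rows and carries the parent fields along.

```python
import json
import pandas as pd

# Hypothetical LLM response to a prompt asking for segment data as JSON.
llm_output = """
{"company": "Example Corp", "fiscal_year": 2024,
 "segments": [
   {"name": "Cloud", "revenue": 420.0, "operating_income": 95.0},
   {"name": "Devices", "revenue": 310.0, "operating_income": null}
 ]}
"""

parsed = json.loads(llm_output)  # raises json.JSONDecodeError if the model drifted from JSON

# One row per segment, with company and year repeated on each row.
segments = pd.json_normalize(parsed, record_path="segments", meta=["company", "fiscal_year"])

# Sanity-check the schema before the numbers enter any analysis.
required = {"name", "revenue", "operating_income"}
missing = required - set(segments.columns)
if missing:
    raise ValueError(f"LLM output is missing fields: {missing}")

segments["op_margin"] = segments["operating_income"] / segments["revenue"]  # null -> NaN
print(segments)
```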
Topic 3: ROIC and free cash flow drivers: Size, growth, margins, and NOA
turnover
Analytical concepts
Sales growth
Sequential growth
Year-over-year growth
Compounded annual growth rate
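The three growth measures on made-up quarterly sales. Sequential and year-over-year growth both come from lead-lag ratios with `shift` (equivalent to `pct_change` when there are no gaps); CAGR uses (end / start) ** (1 / years) - 1.

```python
import pandas as pd

# Made-up quarterly sales, oldest first, 2022Q1 through 2024Q4.
sales = pd.Series(
    [100, 104, 102, 110, 108, 113, 111, 121, 118, 124, 122, 133],
    index=pd.period_range("2022Q1", periods=12, freq="Q"),
    dtype=float,
)

sequential = sales / sales.shift(1) - 1   # this quarter vs. the previous quarter
yoy = sales / sales.shift(4) - 1          # this quarter vs. the same quarter last year

# Compounded annual growth rate between the first and last quarter.
years = (len(sales) - 1) / 4              # 11 quarters apart = 2.75 years
cagr = (sales.iloc[-1] / sales.iloc[0]) ** (1 / years) - 1
print(pd.DataFrame({"sales": sales, "sequential": sequential, "yoy": yoy}).round(3))
print(f"CAGR: {cagr:.2%}")
```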
ROIC drivers
Operating margin after tax and its components: Various expense ratios
Net operating assets intensity and balance sheet subtotals such as
current and non-current operating assets and liabilities, operating
working capital, fixed capital, total capital, and invested capital
Computing ROIC as net operating profit after tax divided by invested
capital
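A sketch of the ROIC decomposition on made-up annual figures, assuming a flat 25% operating tax rate and invested capital equal to operating working capital plus net PP&E (a simplification; year-end capital is used, though average or beginning capital are common alternatives). ROIC equals after-tax operating margin times NOA turnover, and NOA intensity is simply 1 / turnover.

```python
import pandas as pd

# Made-up annual figures: dates in rows, items in columns.
fs = pd.DataFrame(
    {
        "Sales": [1000.0, 1100.0],
        "EBIT": [150.0, 176.0],
        "Operating working capital": [120.0, 130.0],
        "Net PP&E": [480.0, 520.0],
    },
    index=[2023, 2024],
)
tax_rate = 0.25  # assumed flat operating tax rate

fs["NOPAT"] = fs["EBIT"] * (1 - tax_rate)
fs["Invested capital"] = fs["Operating working capital"] + fs["Net PP&E"]

# ROIC and its two drivers.
fs["Margin after tax"] = fs["NOPAT"] / fs["Sales"]
fs["NOA turnover"] = fs["Sales"] / fs["Invested capital"]
fs["ROIC"] = fs["NOPAT"] / fs["Invested capital"]
print(fs.round(4))
```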
Unlevered free cash flows
Computing unlevered free cash flows
Understanding how ROIC and growth affect unlevered free cash flows
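A sketch linking ROIC, growth, and unlevered free cash flow, under assumed constant 10% growth, a 12% after-tax margin, and NOA equal to 50% of sales. Unlevered FCF is NOPAT minus the increase in net operating assets. With ROIC measured on beginning-of-year NOA, the reinvestment rate is g / ROIC, so FCF = NOPAT × (1 − g / ROIC): faster growth consumes more cash, while higher ROIC consumes less.

```python
import pandas as pd

# Assumed drivers (illustrative only).
g, margin, noa_intensity = 0.10, 0.12, 0.50
sales = pd.Series([1000 * (1 + g) ** t for t in range(4)], index=[2021, 2022, 2023, 2024])

nopat = margin * sales
noa = noa_intensity * sales

# Unlevered FCF = NOPAT minus net investment (the increase in NOA).
ufcf = nopat - noa.diff()

# ROIC on beginning NOA; with constant growth, FCF = NOPAT * (1 - g / ROIC).
roic_begin = nopat / noa.shift(1)
check = nopat * (1 - g / roic_begin)
print(pd.DataFrame({"NOPAT": nopat, "UFCF": ufcf, "ROIC": roic_begin, "check": check}).round(2))
```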
Python skills
Loops versus vectorized and broadcast operations
Simple row and column operations
Python loops versus vectorized and broadcast operations in Pandas
Why you should avoid writing loops in Pandas
How to avoid loops using lead-lag differences
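A small comparison of a Python loop against the vectorized lead-lag version of the same growth calculation, on randomly generated data. Both give identical results; the timings printed will vary by machine, but the vectorized line is typically orders of magnitude faster.

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
sales = pd.Series(rng.uniform(90, 110, size=20_000))  # random illustrative series

# Loop: one Python-level operation per row.
start = time.perf_counter()
growth_loop = [np.nan]
for i in range(1, len(sales)):
    growth_loop.append(sales.iloc[i] / sales.iloc[i - 1] - 1)
loop_secs = time.perf_counter() - start

# Vectorized lead-lag: shift the whole column by one row and divide.
start = time.perf_counter()
growth_vec = sales / sales.shift(1) - 1
vec_secs = time.perf_counter() - start

assert np.allclose(growth_loop, growth_vec, equal_nan=True)
print(f"loop: {loop_secs:.3f}s  vectorized: {vec_secs:.4f}s")
```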
AI skills
Preparing feature matrices (X) and targets (y) derived from ROIC and FCF
drivers for later supervised learning (e.g., predicting future ROIC or
FCF growth).
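A sketch of building a feature matrix and a next-year target from a made-up firm-year panel. The key detail is shifting the target within each firm (`groupby(...).shift(-1)`) so one firm's future never leaks into another firm's row; each firm's last year, whose target is unknown, is dropped.

```python
import pandas as pd

# Made-up firm-year panel of ROIC drivers.
panel = pd.DataFrame(
    {
        "firm": ["A"] * 4 + ["B"] * 4,
        "year": [2021, 2022, 2023, 2024] * 2,
        "margin": [0.10, 0.11, 0.12, 0.12, 0.05, 0.06, 0.05, 0.07],
        "turnover": [1.8, 1.9, 1.9, 2.0, 1.2, 1.1, 1.3, 1.3],
    }
)
panel["roic"] = panel["margin"] * panel["turnover"]

# Target: next year's ROIC for the same firm.
panel = panel.sort_values(["firm", "year"])
panel["roic_next"] = panel.groupby("firm")["roic"].shift(-1)

# Drop rows without a known target, then split features (X) and target (y).
train = panel.dropna(subset=["roic_next"])
X = train[["margin", "turnover", "roic"]]
y = train["roic_next"]
print(X.shape, y.round(4).tolist())
```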
Topic 4: Plotting ROIC, FCF drivers, and AI-assisted visualization
Analytical concepts
Cognitive factors
What are the design principles for displaying quantitative information?
We will use the guidelines in Edward Tufte's book “The Visual Display
of Quantitative Information.”
Peer company analysis
Comparing and plotting sales, sales growth, expense ratios, and net
operating asset ratios for a selected company and its peers
Python skills
Types of charts
Lines, bars, scatter charts, histograms, area charts
Dual axis charts
Pandas plotting
Concise Pandas plotting commands
State-machine approach
Full power of Matplotlib plotting
Object-oriented approach for Matplotlib plots
Customizing charts
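A sketch of a dual-axis chart built with Matplotlib's object-oriented approach, on made-up sales data: bars for sales on the left axis, a line for growth on a second y-axis created with `twinx`. Removing the top spines is one small step toward Tufte's "less non-data ink." The `Agg` backend line only matters outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up annual data.
df = pd.DataFrame(
    {"Sales": [820, 900, 1010, 1080], "Sales growth": [0.05, 0.098, 0.122, 0.069]},
    index=[2021, 2022, 2023, 2024],
)

fig, ax_left = plt.subplots(figsize=(7, 4))   # keep explicit handles to fig and axes
ax_left.bar(df.index, df["Sales"], color="lightgray")
ax_left.set_ylabel("Sales ($m)")
ax_left.set_xticks(df.index)

ax_right = ax_left.twinx()                    # second y-axis sharing the same x-axis
ax_right.plot(df.index, df["Sales growth"] * 100, marker="o", color="black")
ax_right.set_ylabel("Sales growth (%)")

for ax in (ax_left, ax_right):
    ax.spines["top"].set_visible(False)       # drop non-data ink
ax_left.set_title("Sales and sales growth (illustrative data)")
fig.tight_layout()
```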
AI skills
Using LLMs to:
auto-draft chart titles and captions based on the underlying data,
and
flag potential anomalies or data quality issues in plots.
Topic 5: Discount rates, time value of money, loans, and bonds
Analytical concepts
Time value functions
Compute present value and future value
Infer internal rate of return
Compute installment payments
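Plain-Python versions of the time value functions, as a sketch. They are written by hand so the formulas stay visible; `numpy_financial` offers `pv`, `fv`, `pmt`, and `irr`, but with a cash-flow sign convention (outflows negative) that these functions deliberately skip. IRR is found here by simple bisection, assuming NPV changes sign exactly once on the search bracket; `scipy.optimize` provides more robust root finders.

```python
def fv(present: float, rate: float, n: int) -> float:
    """Future value of a lump sum after n periods."""
    return present * (1 + rate) ** n


def pv(future: float, rate: float, n: int) -> float:
    """Present value of a lump sum received after n periods."""
    return future / (1 + rate) ** n


def payment(principal: float, rate: float, n: int) -> float:
    """Level end-of-period installment that repays `principal` over n periods."""
    if rate == 0:
        return principal / n
    return principal * rate / (1 - (1 + rate) ** -n)


def npv(rate: float, cash_flows: list[float]) -> float:
    """NPV with cash_flows[0] at time 0 and cash_flows[t] at the end of period t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: list[float], lo: float = -0.99, hi: float = 1.0, tol: float = 1e-10) -> float:
    """IRR by bisection; assumes NPV changes sign exactly once on [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign on the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid             # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2


print(round(fv(100, 0.05, 2), 2), round(payment(1000, 0.01, 12), 2), round(irr([-100, 60, 60]), 4))
```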
Simple financial instruments
Bonds
Loan amortization tables
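A sketch of a level-payment amortization table and a plain-vanilla bond price, assuming end-of-period payments, a positive rate, and pricing on a coupon date. Instead of a row-by-row loop, the table uses the closed-form balance before payment t, P(1+r)^(t−1) − PMT((1+r)^(t−1) − 1)/r, so every row is computed at once. The falling "Interest" column is what an LLM-written narrative would explain to a client.

```python
import numpy as np
import pandas as pd


def amortization_table(principal: float, annual_rate: float, years: int,
                       per_year: int = 12) -> pd.DataFrame:
    """Level-payment loan schedule with end-of-period payments (requires rate > 0)."""
    r = annual_rate / per_year
    if r <= 0:
        raise ValueError("this sketch assumes a positive interest rate")
    n = years * per_year
    pmt = principal * r / (1 - (1 + r) ** -n)
    t = np.arange(1, n + 1)
    growth = (1 + r) ** (t - 1)
    # Balance before payment t: principal grown at r, minus payments already made.
    begin = principal * growth - pmt * (growth - 1) / r
    table = pd.DataFrame(
        {"Beginning balance": begin, "Payment": pmt, "Interest": begin * r},
        index=pd.Index(t, name="Period"),
    )
    table["Principal"] = table["Payment"] - table["Interest"]
    table["Ending balance"] = table["Beginning balance"] - table["Principal"]
    return table


def bond_price(face: float, coupon_rate: float, ytm: float, years: int,
               per_year: int = 2) -> float:
    """Price on a coupon date: PV of the coupons plus PV of the face value."""
    r = ytm / per_year
    n = years * per_year
    t = np.arange(1, n + 1)
    cash = np.full(n, face * coupon_rate / per_year)
    cash[-1] += face  # face value repaid with the final coupon
    return float(np.sum(cash / (1 + r) ** t))


schedule = amortization_table(200_000, 0.06, 30)
print(schedule.head(3).round(2))
print(round(bond_price(1000, 0.05, 0.04, 10), 2))  # coupon above yield, so a premium
```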
Python skills
NumPy
Limitations of numpy_financial
Bonds
Loan amortization tables
Date manipulation in Python
Bonds
Loan amortization tables
XLSXWriter
Bonds
Loan amortization tables
scipy.optimize
Bonds
Loan amortization tables
AI skills
Using an LLM to generate:
narrative explanations of an amortization schedule (e.g.,
“explain to a client why interest expense falls over
time”), and
comparisons of fixed-rate vs floating-rate loans based on your
generated tables.
Topic 6: How business risk raises the discount rate
Analytical concepts
Operating leverage and business risk
Business cycles and sales variability
Operating leverage and earnings variability
Opex versus capex commodities
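A sketch of how operating leverage amplifies earnings variability, using two hypothetical firms with the same sales and EBIT but different cost structures. The degree of operating leverage (contribution margin / EBIT) predicts the percentage change in EBIT for a given percentage change in sales: the high-fixed-cost firm's EBIT moves twice as much.

```python
import pandas as pd

# Two hypothetical firms: same base sales and EBIT, different cost structures.
firms = pd.DataFrame(
    {"fixed_costs": [100.0, 400.0], "variable_cost_ratio": [0.70, 0.40]},
    index=["Low leverage", "High leverage"],
)
base_sales = 1000.0
sales_scenarios = pd.Series([900.0, 1000.0, 1100.0], index=["-10%", "base", "+10%"])

# EBIT = sales * contribution margin ratio - fixed costs, for each firm and scenario.
contribution = 1 - firms["variable_cost_ratio"]
ebit = pd.DataFrame(
    {label: s * contribution - firms["fixed_costs"] for label, s in sales_scenarios.items()}
)

# Degree of operating leverage at the base case.
dol = base_sales * contribution / ebit["base"]
pct_change_ebit = ebit["+10%"] / ebit["base"] - 1   # equals DOL x 10%
print(ebit, dol, pct_change_ebit, sep="\n\n")
```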
Identifying time series patterns
Seasonality
Cyclicality
Identifying discrete events
Restructurings
Acquisitions and dispositions
Python skills
Matplotlib
Visualizing trends and outliers
Statsmodels
Using statistical functions in Statsmodels
AI/ML skills
Introductory machine learning with scikit-learn
Comparing classical regressions vs. ML models and interpreting
coefficients vs. feature importances.
Topic 7: Business risk drivers: Cyclicality and seasonality
Analytical concepts
Statistical techniques
A simple and brief introduction to time series analysis of financial
statement data
Python skills
Introduction to statistical packages for time series analysis
Time-series packages:
Pandas resampling, rolling averages.
statsmodels for decomposition, ARIMA/SARIMA.
Quick use of Prophet or similar tools for forecasting.
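A sketch of the Pandas side of time-series work on made-up monthly sales with a linear trend and a December spike. Quarterly totals are computed by grouping on quarterly periods (`sales.resample("QE").sum()` does the same in pandas 2.2+, where the older alias is `"Q"`); a 12-month rolling mean smooths out seasonality; and average deviations from that trend by calendar month give a crude seasonal index. Statsmodels' `seasonal_decompose` is the more principled tool for the same job.

```python
import numpy as np
import pandas as pd

# Made-up monthly sales, 2021-2024: linear trend plus a December spike.
months = pd.date_range("2021-01-01", periods=48, freq="MS")
trend = np.linspace(100, 140, 48)
seasonal = np.where(months.month == 12, 30.0, 0.0)
sales = pd.Series(trend + seasonal, index=months)

# Monthly -> quarterly totals.
quarterly = sales.groupby(sales.index.to_period("Q")).sum()

# 12-month trailing average smooths out the seasonal pattern.
rolling_12 = sales.rolling(window=12).mean()

# Average deviation from the rolling trend by calendar month: a crude seasonal index.
seasonal_index = (sales - rolling_12).groupby(sales.index.month).mean()
print(quarterly.tail(4), seasonal_index.round(1), sep="\n\n")
```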
AI/ML skills
Overview of:
PyCaret for automated time-series model comparison,
TensorFlow/Keras for simple deep learning models (conceptual;
possibly in a notebook demo).
Using an LLM to:
interpret model diagnostics and forecast plots,
help specify candidate models (“which lags and seasonalities
should I try?”).
Topic 8: Liquidity, leverage, and ROE
Analytical concepts
Liquidity
Financial assets to sales
Leverage
Debt/EBITA, Debt/EBIT
Debt/Equity
Return on equity
Net income/Equity
Sharpe ratio
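A sketch of the liquidity, leverage, and return measures above, on made-up figures for one firm. ROE uses year-end equity (average equity is a common alternative). The Sharpe ratio is mean excess return divided by its standard deviation; annualizing monthly figures with √12 assumes returns are roughly independent from month to month.

```python
import numpy as np
import pandas as pd

# Made-up annual figures for one firm.
fs = pd.DataFrame(
    {
        "Sales": [1000.0, 1080.0, 1150.0],
        "Financial assets": [150.0, 160.0, 190.0],
        "EBIT": [140.0, 150.0, 165.0],
        "EBITA": [150.0, 161.0, 177.0],
        "Debt": [400.0, 380.0, 360.0],
        "Equity": [500.0, 560.0, 620.0],
        "Net income": [80.0, 90.0, 100.0],
    },
    index=[2022, 2023, 2024],
)

ratios = pd.DataFrame({
    "Financial assets / Sales": fs["Financial assets"] / fs["Sales"],  # liquidity
    "Debt / EBITA": fs["Debt"] / fs["EBITA"],
    "Debt / EBIT": fs["Debt"] / fs["EBIT"],
    "Debt / Equity": fs["Debt"] / fs["Equity"],
    "ROE": fs["Net income"] / fs["Equity"],
})

# Sharpe ratio from made-up monthly returns and an assumed monthly risk-free rate.
monthly_returns = np.array([0.012, -0.004, 0.020, 0.007, -0.010, 0.015])
monthly_rf = 0.003
excess = monthly_returns - monthly_rf
sharpe_annual = excess.mean() / excess.std(ddof=1) * np.sqrt(12)
print(ratios.round(3), f"Sharpe (annualized): {sharpe_annual:.2f}", sep="\n\n")
```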
Python skills
Advanced plotting with Matplotlib and Plotly
Visualizing how leverage makes ROE more volatile than ROIC
Making interactive plots with Plotly
AI skills
Building a small interactive “agent” in a Jupyter notebook
that:
lets the user specify a firm and timeframe,
fetches the data via Python,
computes liquidity, leverage, and ROE metrics, and
uses an LLM to produce a short risk commentary.
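A minimal, offline sketch of the agent's control flow: the model chooses a tool and its arguments as JSON, Python validates the choice against a whitelist, and the numbers are computed in Python rather than by the model. Everything here is hypothetical: `fake_llm` stands in for a real LLM API call, and `compute_metrics` returns made-up numbers instead of fetching data.

```python
import json


def compute_metrics(ticker: str, start_year: int, end_year: int) -> dict:
    """Hypothetical tool; a real version would fetch data and compute the ratios."""
    return {"ticker": ticker, "period": f"{start_year}-{end_year}",
            "debt_to_equity": 0.65, "roe": 0.15, "fin_assets_to_sales": 0.16}


TOOLS = {"compute_metrics": compute_metrics}  # whitelist of callable tools


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call that returns a JSON tool request."""
    return json.dumps({"tool": "compute_metrics",
                       "args": {"ticker": "AAA", "start_year": 2020, "end_year": 2024}})


def run_agent(user_request: str) -> dict:
    # 1) Ask the model which tool to call, expecting strict JSON back.
    decision = json.loads(fake_llm(f"Choose a tool and arguments for: {user_request}"))
    # 2) Dispatch only to whitelisted Python functions.
    tool = TOOLS.get(decision["tool"])
    if tool is None:
        raise ValueError(f"Unknown tool requested: {decision['tool']}")
    # 3) Run the computation in Python, not in the model.
    result = tool(**decision["args"])
    # 4) A second LLM call would turn `result` into a short risk commentary.
    return result


print(run_agent("Assess leverage for AAA over 2020-2024"))
```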
Topic 9: Three-statement model of growth and ROIC
Analytical concepts
Three-statement financial model
Income statement inputs: Size, growth, and margins
Balance sheet operating inputs: Net operating asset intensity
Operating working capital intensity
Fixed capital intensity
Balance sheet financial inputs: Liquidity and leverage
Business risk, unlevered and levered cost of capital
Monte Carlo simulations
Monte Carlo simulations to plot outcomes for a large number of scenarios
Demonstrating the advantage of Python over Excel
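A sketch of a vectorized Monte Carlo simulation of a simple operating model, with illustrative (uncalibrated) driver distributions: normally distributed sales growth and after-tax margin, and NOA intensity held fixed. Each row is one scenario, so 10,000 five-year paths come from a few array operations rather than thousands of spreadsheet rows or data tables.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_sims, years = 10_000, 5

# Assumed driver distributions (illustrative, not calibrated to any company).
growth = rng.normal(loc=0.06, scale=0.04, size=(n_sims, years))  # annual sales growth
margin = rng.normal(loc=0.12, scale=0.02, size=(n_sims, years))  # NOPAT / sales
noa_intensity = 0.5                                               # NOA / sales, held fixed

sales0 = 1000.0
sales = sales0 * np.cumprod(1 + growth, axis=1)                   # compound growth along each row
prev_sales = np.hstack([np.full((n_sims, 1), sales0), sales[:, :-1]])

nopat = margin * sales
ufcf = nopat - noa_intensity * (sales - prev_sales)               # NOPAT minus increase in NOA

# Distribution of year-5 unlevered free cash flow across scenarios.
p5, p50, p95 = np.percentile(ufcf[:, -1], [5, 50, 95])
print(f"Year-5 UFCF: 5th pct {p5:.1f}, median {p50:.1f}, 95th pct {p95:.1f}")
```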
Python skills
Comparing NumPy arrays, Pandas data series, and Pandas data frames
Implementing financial statement models using NumPy arrays, Pandas data
series, and Pandas data frames
Challenges of developing iterative models in Python
Ease of developing iterative models in Excel
Difficulty in developing iterative models in Python
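Excel resolves circular references through its iterative-calculation setting; in Python, the loop must be written explicitly. A sketch of the classic interest circularity, under stated simplifying assumptions: interest is charged on average debt, there are no dividends or equity issues, and debt absorbs any funding gap. Ending debt depends on net income, which depends on interest, which depends on ending debt. Fixed-point iteration converges quickly here because each pass changes interest by only rate × (1 − tax rate) / 2 of the previous change.

```python
def solve_circular_interest(begin_debt: float, nopat: float, delta_noa: float,
                            rate: float, tax_rate: float,
                            tol: float = 1e-10, max_iter: int = 100):
    """Fixed-point iteration for interest on average debt (illustrative model)."""
    interest = 0.0
    for iteration in range(1, max_iter + 1):
        # Debt covers net investment not funded by net income (NOPAT less after-tax interest).
        end_debt = begin_debt + delta_noa - (nopat - (1 - tax_rate) * interest)
        new_interest = rate * (begin_debt + end_debt) / 2   # interest on average debt
        if abs(new_interest - interest) < tol:
            end_debt = begin_debt + delta_noa - (nopat - (1 - tax_rate) * new_interest)
            return new_interest, end_debt, iteration
        interest = new_interest
    raise RuntimeError("iteration did not converge")


interest, end_debt, iters = solve_circular_interest(
    begin_debt=500.0, nopat=120.0, delta_noa=150.0, rate=0.06, tax_rate=0.25
)
print(round(interest, 4), round(end_debt, 4), iters)
```

For this simple case a closed form exists, I = r(2D₀ + ΔNOA − NOPAT) / (2 − r(1 − t)), which is a useful check on the iteration; richer three-statement models usually lack one.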