econometrics-python
Academic-style econometric analysis using pyfixest for fixed effects regression and marginaleffects for causal inference interpretation. Use for panel data, difference-in-differences, IV regression, event studies, average marginal effects, treatment effect estimation, and publication-ready tables. Supports clustered standard errors, heterogeneous treatment effects, and model interpretation with predictions and contrasts.
Econometrics with pyfixest and marginaleffects
Academic-grade econometric analysis combining fixed effects estimation with causal inference interpretation.
Important Note
marginaleffects Python package: As of January 2025, the official Python port is under active development. Check https://github.com/vincentarelbundock/pymarginaleffects for current status. If not yet available via pip:
- Use pyfixest's built-in interpretation tools (.predict(), .tidy(), etc.)
- Or install from GitHub: pip install git+https://github.com/vincentarelbundock/pymarginaleffects.git
- Or use the custom utilities in scripts/marginaleffects_utils.py
How to Use This Skill
This skill guides you through econometric analysis in six phases:
- Phase A: Research Design - Problem formulation, identification strategy, specification planning
- Phase B: Exploratory Analysis - Data patterns, diagnostics, balance tests, parallel trends
- Phase C: Model Estimation - Baseline models, main specification, robustness suite
- Phase D: Effect Interpretation - Treatment effects, marginal effects, counterfactuals
- Phase E: Robustness & Validation - Sensitivity analysis, placebo tests, specification curves
- Phase F: Publication Outputs - LaTeX tables, figures, narrative reports
Usage Patterns:
- Complete Workflow: Follow phases A→F sequentially for new projects
- Quick Analysis: Jump to Phase C if design is clear
- Refinement: Return to earlier phases based on diagnostic findings
- Output Only: Use Phase F for final formatting of completed analysis
Prerequisites
Required packages:
Python version: 3.8+
Bundled Resources
References (references/)
Load these when implementing econometric analyses or needing detailed guidance:
- marginaleffects-guide.md - Average marginal effects and predictions with pyfixest
- staggered-did.md - Staggered difference-in-differences implementation
- rdd.md - Regression discontinuity design patterns
- troubleshooting.md - Common issues and solutions (iplot() fixes, SE issues, data prep)
Assets (assets/)
Production-ready analysis templates and examples:
- minimum_wage_analysis.py - Complete DiD analysis example with real data
Scripts (scripts/)
Executable utilities for common econometric tasks:
- did_pipeline.py - Complete DiD workflow with diagnostics
- marginaleffects_utils.py - Helper functions for marginal effects computation
- event_study_utils.py - Event study plotting and pre-trend testing (fixes iplot() save issues)
PHASE A: RESEARCH DESIGN
When to use Phase A:
- Starting a new econometric analysis
- User describes research question without clear specification
- Need to map data structure to appropriate econometric method
Phase A Workflow
- Problem Classification
  - Identify research question type (causal vs predictive vs descriptive)
  - Detect data structure (cross-section, panel, time-series)
  - Determine appropriate econometric method
- Identification Strategy
  - Define treatment/exposure variable
  - Specify control variables
  - Identify potential confounders
  - Assess endogeneity concerns
- Specification Planning
  - Choose fixed effects structure
  - Determine clustering level
  - Plan robustness checks
Method Decision Tree
Is treatment randomly assigned?
├─ Yes → OLS with controls
└─ No → Check panel structure
   ├─ Cross-section →
   │  ├─ Discontinuity? → RDD
   │  ├─ Instrument? → IV
   │  └─ Selection model
   └─ Panel data →
      ├─ Treatment varies by time? → DiD
      ├─ Staggered adoption? → Callaway-Sant'Anna
      └─ Time-invariant treatment? → Not identified with entity FE (the FE absorb it); rely on cross-sectional variation instead
Design Specification Template
Output: Create research_design.md with:
Critical Design Decisions
Fixed Effects:
- Entity FE: Controls for time-invariant unit characteristics
- Time FE: Controls for common time trends
- Entity×Time FE: Rarely used (absorbs too much variation)
Clustering:
- Cluster at the level where treatment varies
- DiD with state-level treatment → cluster by state
- Few clusters (< 30) → consider wild bootstrap
When to read detailed references:
- Staggered DiD → read references/staggered-did.md
- RDD → read references/rdd.md
- Marginal effects → read references/marginaleffects-guide.md
PHASE B: EXPLORATORY ANALYSIS
When to use Phase B:
- After Phase A (design complete)
- Before estimation (Phase C)
- To verify assumptions (parallel trends, balance, common support)
Phase B Workflow
- Descriptive Statistics
  - Summary by treatment group
  - Balance tests
  - Sample construction documentation
- Treatment Analysis
  - Treatment distribution over time/units
  - Staggered adoption patterns
  - Treatment intensity
- Outcome Analysis
  - Trends by treatment group
  - Visual parallel trends assessment
  - Outcome distribution diagnostics
- Diagnostic Tests
  - Covariate balance
  - Common support
  - Missing data patterns
EDA Code Template
Output: Create eda_analysis.py
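As a hedged illustration of what eda_analysis.py might start with, the sketch below tabulates mean outcomes by period and treatment group, which is the raw material for a visual parallel-trends check. The column names (unit, year, treated, y), the simulated panel, and the helper name group_trends are assumptions made for illustration; they are not taken from the bundled templates.

```python
import numpy as np
import pandas as pd


def group_trends(df, outcome, time, group):
    """Mean outcome per period for each group, plus the treated-minus-control gap."""
    means = df.groupby([time, group])[outcome].mean().unstack(group)
    means.columns = ["control", "treated"]  # assumes the group column is coded 0/1
    means["gap"] = means["treated"] - means["control"]
    return means


# Simulated panel (assumption): 200 units, 2015-2020, treatment starts in 2018
rng = np.random.default_rng(0)
units = np.arange(200)
years = np.arange(2015, 2021)
df = pd.DataFrame([(u, t) for u in units for t in years], columns=["unit", "year"])
df["treated"] = (df["unit"] < 100).astype(int)
df["post"] = (df["year"] >= 2018).astype(int)
df["y"] = (
    1.0 * df["treated"]                  # level difference between groups
    + 0.5 * (df["year"] - 2015)          # common trend
    + 2.0 * df["treated"] * df["post"]   # treatment effect
    + rng.normal(0, 1, len(df))
)

trends = group_trends(df, "y", "year", "treated")
pre_gap = trends.loc[trends.index < 2018, "gap"]
print(trends.round(2))
# Under parallel trends the pre-period gap should be roughly constant
print("Pre-period gap range:", round(pre_gap.max() - pre_gap.min(), 2))
```

Plotting the control and treated columns against the index gives the usual trends figure. A roughly flat gap before treatment is consistent with parallel trends but does not prove it.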
Critical EDA Checks
Must verify before estimation:
- Parallel trends look reasonable (visual inspection)
- No major covariate imbalances (|std diff| < 0.25)
- Sufficient pre-treatment periods (≥ 3 for trends)
- No perfect prediction of treatment
- Reasonable sample sizes in both groups
Red flags:
- Pre-trend test p-value < 0.10
- Large imbalances in key covariates
- Treatment assignment predicted by lagged outcome
- Very few treated units (< 10)
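The covariate balance check above can be sketched with plain pandas. This is a minimal example assuming a 0/1 treatment column and numeric covariates. The helper name standardized_differences and the simulated data are hypothetical, and the 0.25 cutoff is the one quoted in the checklist.

```python
import numpy as np
import pandas as pd


def standardized_differences(df, treat_col, covariates):
    """Standardized mean difference per covariate: (mean_t - mean_c) / pooled SD."""
    treated = df[df[treat_col] == 1]
    control = df[df[treat_col] == 0]
    rows = {}
    for cov in covariates:
        pooled_sd = np.sqrt((treated[cov].var() + control[cov].var()) / 2)
        rows[cov] = (treated[cov].mean() - control[cov].mean()) / pooled_sd
    out = pd.Series(rows, name="std_diff").to_frame()
    out["imbalanced"] = out["std_diff"].abs() >= 0.25  # threshold from the checklist
    return out


# Simulated cross-section (assumption): income is shifted for treated units
rng = np.random.default_rng(1)
n = 2000
treated = rng.integers(0, 2, n)
df = pd.DataFrame({
    "treated": treated,
    "age": rng.normal(40, 10, n),                   # balanced by construction
    "income": rng.normal(50, 10, n) + 5 * treated,  # imbalanced by construction
})

balance = standardized_differences(df, "treated", ["age", "income"])
print(balance.round(3))
```

Covariates flagged as imbalanced are natural candidates for controls or for a matched or reweighted sample.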
PHASE C: MODEL ESTIMATION
When to use Phase C:
- After Phase B (EDA complete, diagnostics passed)
- To estimate treatment effects and test specifications
Phase C Workflow
- Baseline Specifications (Always run)
  - Model 1: No fixed effects
  - Model 2: Entity FE only
  - Model 3: Entity + Time FE
  - Model 4: Full specification + controls
- Primary Specification
  - Main model with optimal FE structure
  - Appropriate standard errors
  - Treatment effects
- Robustness Checks (Automated suite)
  - Alternative clustering
  - Different time windows
  - Subgroup analysis
  - Functional form alternatives
- Specification Tests
  - Pre-trend tests (DiD)
  - Overidentification (IV)
  - Weak instruments (IV)
  - Placebo tests
Quick Start: Basic DiD
Event Study Pattern
IMPORTANT: pyfixest's iplot() may not save figures correctly (saved files can come out blank). Use the custom utility in scripts/event_study_utils.py instead.
Why use custom plotting? pyfixest's iplot() creates interactive plots but doesn't return a saveable figure object. The plot_event_study() function extracts coefficients and creates publication-quality matplotlib figures that save correctly.
IV Regression Pattern
Estimation Code Template
Output: Create estimation.py
Critical Best Practices
Data Preparation:
Standard Errors:
Model Specification:
PHASE D: EFFECT INTERPRETATION
When to use Phase D:
- After Phase C (models estimated)
- To compute treatment effects and marginal effects
- To create interpretation-focused visualizations
Phase D Workflow
- Average Treatment Effects
  - ATE using marginaleffects
  - ATT (average treatment effect on the treated)
  - Heterogeneous effects by subgroups
- Marginal Effects (for continuous treatments)
  - Average marginal effects (AME)
  - Marginal effects at means (MEM)
  - Effects at specific values
- Predictions
  - Counterfactual predictions
  - Predicted outcomes by treatment × covariates
- Visualization
  - Treatment effect plots
  - Heterogeneous effect plots
  - Prediction plots
Quick Start: Average Treatment Effect
Heterogeneous Effects Pattern
Interpretation Code Template
Output: Create interpretation.py
Interpretation Best Practices
Always report:
- Point estimate with standard error
- 95% confidence interval
- Percentage change from baseline
- Effect size interpretation (Cohen's d)
Visualize:
- Use plot_predictions(), not just coefficients
- Show confidence intervals
- Compare treated vs control predictions
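The reporting items listed above reduce to a few lines of arithmetic. The helper below is a hypothetical sketch. The function name, the example numbers, and the choice to standardize by a user-supplied outcome standard deviation (for instance the control-group SD) are assumptions, and the interval uses the normal approximation.

```python
def summarize_effect(estimate, se, baseline_mean, outcome_sd, z=1.96):
    """Bundle the quantities usually reported alongside a treatment effect."""
    return {
        "estimate": estimate,
        "se": se,
        "ci_low": estimate - z * se,                  # 95% CI, normal approximation
        "ci_high": estimate + z * se,
        "pct_change": 100 * estimate / baseline_mean,  # relative to the baseline mean
        "cohens_d": estimate / outcome_sd,             # effect in SD units
    }


# Hypothetical numbers: effect of 2.0 on an outcome with baseline mean 20 and SD 8
summary = summarize_effect(estimate=2.0, se=0.5, baseline_mean=20.0, outcome_sd=8.0)
for key, value in summary.items():
    print(f"{key:>10}: {value:.3f}")
```

In this example the effect is a 10% increase over baseline and a quarter of a standard deviation, which is often easier for readers to judge than the raw coefficient.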
PHASE E: ROBUSTNESS & VALIDATION
When to use Phase E:
- After Phase D (main results interpreted)
- To test sensitivity of findings
- To address referee concerns
Phase E Workflow
- Specification Sensitivity
  - Add/remove controls systematically
  - Different functional forms
  - Alternative outcome measures
- Sample Sensitivity
  - Trimming outliers
  - Different time windows
  - Balanced vs unbalanced panel
- Standard Error Sensitivity
  - Different clustering schemes
  - Wild bootstrap (few clusters)
  - Spatial standard errors
- Placebo Tests
  - Pre-treatment pseudo-effects
  - Randomization inference
  - Falsification tests
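Randomization inference, listed among the placebo tests, can be illustrated with a plain NumPy permutation test on a difference in means. This is a minimal sketch for unit-level assignment in a cross-section. The function name and the simulated data are assumptions, and with clustered or panel treatment the labels would need to be permuted at the level where treatment was assigned.

```python
import numpy as np


def randomization_pvalue(y, d, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in means between d=1 and d=0."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    observed = y[d == 1].mean() - y[d == 0].mean()
    draws = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(d)  # reshuffle treatment labels, keeping group sizes fixed
        draws[i] = y[perm == 1].mean() - y[perm == 0].mean()
    # Share of permuted statistics at least as extreme as the observed one
    p_value = (1 + np.sum(np.abs(draws) >= abs(observed))) / (n_perm + 1)
    return observed, p_value


# Simulated data (assumption): true effect of 1.0 on y, no effect on the placebo outcome
rng = np.random.default_rng(2)
d = np.repeat([0, 1], 100)
y = 1.0 * d + rng.normal(0, 1, d.size)
placebo = rng.normal(0, 1, d.size)

effect, p_real = randomization_pvalue(y, d)
_, p_placebo = randomization_pvalue(placebo, d)
print(f"Observed difference: {effect:.3f}, p = {p_real:.4f}")
print(f"Placebo outcome p = {p_placebo:.4f}")
```

The p-value is exact under the sharp null of no effect for any unit, which makes it a useful complement to cluster-robust inference when there are few treated units.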
Multiple Hypothesis Testing Pattern
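When the same treatment is tested on several outcomes, the raw p-values can be adjusted before declaring significance. The sketch below implements the Bonferroni and Holm corrections in plain Python. The outcome names and p-values are hypothetical, and in practice the p-values would be pulled from the fitted models.

```python
def bonferroni_adjust(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm_adjust(pvalues):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for the treatment coefficient across three outcomes
raw = {"wages": 0.010, "employment": 0.040, "hours": 0.030}
names = list(raw)
bonf = dict(zip(names, bonferroni_adjust(list(raw.values()))))
holm = dict(zip(names, holm_adjust(list(raw.values()))))

for name in names:
    print(f"{name:<12} raw={raw[name]:.3f}  bonferroni={bonf[name]:.3f}  holm={holm[name]:.3f}")
```

Holm is never less powerful than Bonferroni while still controlling the family-wise error rate, so it is usually the better default of the two.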
Robustness Code Template
Output: Create robustness.py
Robustness Decision Rules
When estimates are NOT robust:
- Coefficient changes sign across specifications → Major concern
- Significance disappears with controls → Likely confounding
- Very sensitive to outliers → Check data quality
- Placebo test is significant → Parallel trends violated
PHASE F: PUBLICATION OUTPUTS
When to use Phase F:
- Final stage after all analysis complete
- To generate publication-ready tables and figures
Phase F Workflow
- LaTeX Tables
  - Table 1: Descriptive statistics
  - Table 2: Main regression results
  - Table 3: Robustness checks
- Figures
  - Event study plots (high-res, 300 DPI)
  - Treatment effect plots
  - Parallel trends
Publication Output Template
Output: Create publication-ready materials
QUICK REFERENCE: PHASE SELECTOR
When to Use Each Phase
| User says... | Use Phase... |
|---|---|
| "I have data on X and Y" | Phase A (Design) |
| "Help me set up DiD analysis" | Phase A (Design) |
| "What model should I use?" | Phase A (Design) |
| "Check if parallel trends hold" | Phase B (EDA) |
| "Show me balance table" | Phase B (EDA) |
| "Are treated/control similar?" | Phase B (EDA) |
| "Estimate the treatment effect" | Phase C (Estimation) |
| "Run the regressions" | Phase C (Estimation) |
| "Test different specifications" | Phase C (Estimation) |
| "What's the ATE?" | Phase D (Interpretation) |
| "Show me marginal effects" | Phase D (Interpretation) |
| "Heterogeneous effects by age?" | Phase D (Interpretation) |
| "Test robustness" | Phase E (Robustness) |
| "Sensitivity analysis" | Phase E (Robustness) |
| "Run placebo tests" | Phase E (Robustness) |
| "Make publication tables" | Phase F (Publication) |
| "Create LaTeX output" | Phase F (Publication) |
Typical Phase Sequences
Complete Analysis (First Time):
A → B → C → D → E → F
Quick Analysis (Experienced User):
C → D → F
Iterative Refinement:
A → B → C → [issues found] → A → B → C → D → E → F
Common Pitfalls
- Forgetting to cluster SEs in panel data
- Wrong time FE for event studies
- Interpreting GLM coefficients directly
Troubleshooting
Issue: "Variable not found in data"
Cause: Formula uses column name not in DataFrame
Fix: Check df.columns and use exact names
Issue: Pre-trends in event study
Cause: Wrong specification or confounders
Fix:
- Check time FE granularity matches treatment
- Add time-varying controls
- Consider alternative specifications
Issue: Very large standard errors
Cause: Small number of clusters
Fix:
- Check cluster variable has enough variation
- Consider wild bootstrap for few clusters
- Report issue transparently
Issue: marginaleffects not working
Cause: Model type not supported
Fix: Check compatibility at marginaleffects.com/vignettes/supported.html
Issue: iplot() creates blank saved figures
Cause: iplot() doesn't return saveable figure object
Fix: Use the plotting helpers in scripts/event_study_utils.py.
Function Reference
pyfixest Core
| Function | Purpose | Example |
|---|---|---|
| feols() | OLS with FE | feols("Y~X\|firm+year", df) |
| fepois() | Poisson with FE | fepois("count~X\|firm", df) |
| etable() | Export table | etable([m1,m2], type='tex') |
| .vcov() | Adjust SEs | model.vcov("HC3") |
| .iplot() | Event study plot | model.iplot() |
| .tidy() | Extract results | model.tidy() |
marginaleffects Core
| Function | Purpose | Example |
|---|---|---|
| avg_slopes() | Average ME | avg_slopes(m, variables="X") |
| avg_comparisons() | Average TE | avg_comparisons(m, variables="D") |
| avg_predictions() | Avg predictions | avg_predictions(m, by="group") |
| predictions() | Unit predictions | predictions(m, newdata=grid) |
| plot_predictions() | Viz predictions | plot_predictions(m, condition="X") |
| plot_slopes() | Viz ME | plot_slopes(m, variables="X") |
| datagrid() | Create grid | datagrid(X=[1,2,3], model=m) |
Resources
- pyfixest docs: https://py-fixest.github.io/pyfixest/
- marginaleffects docs: https://marginaleffects.com/
- Scripts: Pre-built workflows in scripts/
- Examples: Complete analyses in assets/
Installation Check
Run this to verify setup:
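One possible check, using only the standard library, is sketched below. The package list is an assumption based on the tools this skill refers to, and the helper name check_packages is hypothetical. It reports the installed version of each distribution or flags it as missing, without importing anything heavy.

```python
from importlib import metadata


def check_packages(names):
    """Map each distribution name to its installed version, or None if missing."""
    status = {}
    for name in names:
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = None
    return status


# Assumed package list; adjust to the project's actual requirements
required = ["pyfixest", "marginaleffects", "pandas", "numpy", "matplotlib"]
status = check_packages(required)

for name, version in status.items():
    print(f"{name:<16} {version if version else 'NOT INSTALLED'}")

missing = [name for name, version in status.items() if version is None]
if missing:
    print("Missing packages:", ", ".join(missing))
```

If marginaleffects is reported as missing, the fallbacks described in the Important Note apply.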
Getting Help
- Check function docstrings: help(pf.feols)
- Review examples: assets/minimum_wage_analysis.py
- Consult references: references/marginaleffects-guide.md
- Search docs: https://marginaleffects.com/vignettes/