academic-writer
End-to-end academic paper development for empirical social science research. Guides from data exploration through publication-ready manuscript using Python (pyfixest, samplics, matplotlib/seaborn) and Quarto PDF. Automates discovery, planning, analysis, visualization, and paper compilation with one approval checkpoint.
Academic Writer - Empirical Social Science Papers
Complete pipeline for developing publication-ready academic papers from raw data and literature to formatted manuscript.
When to Use This Skill
Trigger patterns:
- "Write an academic paper from this data"
- "Create a research paper analyzing [dataset]"
- "Help me write up my empirical analysis"
- "Generate a paper from my survey/panel data"
- "Turn this analysis into a publication"
Best for:
- Empirical social science research (economics, political science, sociology, public health)
- Papers based on survey data, panel data, or observational studies
- Quantitative analysis with tables and figures
- Standard academic structure (5-6 sections)
Not for:
- Purely theoretical papers
- Qualitative research
- Meta-analyses (though findings from existing meta-analyses can be incorporated into a paper)
- Grant proposals or non-research documents
Core Philosophy
Structured Pipeline Approach:
- Discovery - Understand data and context
- Planning - Create comprehensive analysis plan
- Approval - Get user sign-off on plan (ONE checkpoint)
- Execution - Run analysis, create visualizations, compile paper
- Delivery - Publication-ready Quarto PDF
Key Principles:
- Leverage specialized skills (econometrics-python, samplics-survey-analysis)
- Python-only workflow (pyfixest, samplics, matplotlib/seaborn)
- Publication-quality figures and tables
- Reproducible, well-documented code
- Quarto PDF as primary output
Prerequisites
Required Python packages:
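As a rough sketch, the check below reports which packages are missing from the current environment. The list is an assumption assembled from the tools named elsewhere in this document (pandas, pyfixest, samplics, statsmodels, matplotlib, seaborn), not an authoritative requirements file, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed from the tools named in this document; adjust to the real requirements.
ASSUMED_PACKAGES = [
    "pandas",
    "pyfixest",
    "samplics",
    "statsmodels",
    "matplotlib",
    "seaborn",
]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(ASSUMED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages are importable.")
```

Running the check before Phase 1 surfaces a missing dependency early, rather than partway through the analysis.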
Required folder structure:
project/
├── data/                  # User provides
│   ├── dataset.csv        # Main data file
│   └── metadata.json      # Optional: data documentation
└── literature/            # Optional: user provides
    ├── paper1.pdf
    └── paper2-summary.md
Quarto: Must be installed system-wide
Bundled Resources
References (references/)
Detailed guides for each phase:
- discovery-protocols.md - How to explore data, detect types, extract metadata
- planning-templates.md - Templates for paper outlines and analysis plans
- analysis-workflows.md - Common analysis patterns (DiD, survey analysis, panel regression)
- paper-structure.md - Social science paper conventions and section guidelines
- visualization-guide.md - Publication-quality matplotlib/seaborn patterns
- table-formatting.md - LaTeX table creation for Quarto PDF
Scripts (scripts/)
Utilities for automation:
- data_discovery.py - Automated data scanning and type detection
- plan_generator.py - Generate structured analysis plans
- table_formatter.py - Create publication-ready LaTeX tables
Assets (assets/)
Templates and examples:
- paper-template.qmd - Complete Quarto paper template
- _quarto.yml - Quarto configuration for academic PDF
- outline-example.md - Sample paper outline
- plan-example.md - Complete example of analysis plan
Complete Workflow
Phase 1: DISCOVERY (Automated)
What happens:
- Scan data folder for CSV/DTA/Parquet files
- Load main dataset and inspect:
- Variable names and types
- Sample size and time coverage
- Detect data type (survey, panel, cross-section)
- Check for literature folder
- Ask user for research context
How to execute:
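One way to sketch the first discovery steps, scanning the data folder and checking for literature, using only the standard library. The names `find_data_files` and `has_literature` are illustrative assumptions; the bundled data_discovery.py may work differently.

```python
from pathlib import Path

# Extensions for the CSV/DTA/Parquet files mentioned above.
DATA_EXTENSIONS = {".csv", ".dta", ".parquet"}


def find_data_files(data_dir):
    """Return supported data files under data_dir, sorted by path."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in DATA_EXTENSIONS
    )


def has_literature(project_dir):
    """Report whether the optional literature/ folder exists and is non-empty."""
    lit = Path(project_dir) / "literature"
    return lit.is_dir() and any(lit.iterdir())
```

For example, `find_data_files("project/data")` would pick up `dataset.csv` and skip `metadata.json`, which is documentation rather than data.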
Outputs:
- discovery-report.md - What was found
- Console questions for user about research goals
Phase 2: PLANNING (Automated → User Approval)
What happens:
- Generate paper outline (5-6 sections)
- Create analysis strategy:
- Which skill to use (econometrics vs. survey)
- Specific models and specifications
- Required tables and figures
- Design visualization plan
- Output comprehensive plan as markdown
- WAIT for user approval
How to execute:
Critical: Show plan to user and get approval:
- "Here is the analysis plan. Type 'approved' to proceed, or suggest revisions."
- Do NOT proceed until user confirms
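The checkpoint can be sketched as a strict check on the user's reply. This is a minimal illustration that assumes the literal keyword 'approved' from the prompt above; `is_approved` and `next_action` are hypothetical helpers, not part of the bundled scripts.

```python
def is_approved(reply):
    """Return True only when the reply is exactly 'approved' (case-insensitive)."""
    return reply.strip().strip("'\"").lower() == "approved"


def next_action(reply):
    """Decide whether to execute the plan or go back and revise it."""
    return "execute" if is_approved(reply) else "revise"
```

Treating anything other than an explicit approval as a revision request means an ambiguous reply such as "approved, but change table 2" never triggers execution.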
Outputs:
- development-plan.md - Complete analysis plan with outline
Phase 3: EXECUTION (Automated)
What happens: After user approval, execute the plan:
3.1 Data Preparation
- Load and clean data
- Create treatment/outcome variables
- Apply sample restrictions
- Save processed data
3.2 Descriptive Analysis
- Determine data type (survey vs. panel)
- If survey data: Use samplics-survey-analysis skill
- If panel/experimental: Use standard descriptive stats
- Create summary tables
- Generate trend visualizations
3.3 Main Analysis
- If causal inference needed: Use econometrics-python skill
- If survey analysis: Use samplics-survey-analysis skill
- If other: Use statsmodels as needed
- Compute main estimates with appropriate standard errors (clustered, heteroskedasticity-robust, or survey-design-based)
- Calculate marginal effects
- Run robustness checks
3.4 Visualization
- Create publication-quality figures using matplotlib/seaborn
- Follow visualization-guide.md patterns
- Export as high-DPI PNG (300 DPI minimum)
- Save both figures and code
3.5 Tables
- Format results as LaTeX tables
- Use table_formatter.py utilities
- Follow discipline conventions (standard errors in parentheses, stars for significance)
3.6 Paper Compilation
- Use Quarto template from assets/
- Integrate all tables and figures
- Write narrative sections
- Compile to PDF
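The compile step could be driven from Python roughly as follows. This sketch assumes the `quarto render <file> --to pdf` command line and a `quarto` executable on PATH; `compile_paper` is a hypothetical wrapper, not a bundled utility.

```python
import shutil
import subprocess
from pathlib import Path


def quarto_render_command(qmd_path):
    """Build the command line for rendering a .qmd file to PDF."""
    return ["quarto", "render", str(qmd_path), "--to", "pdf"]


def compile_paper(qmd_path):
    """Render the paper, failing early if Quarto or the source file is missing."""
    if shutil.which("quarto") is None:
        raise RuntimeError("Quarto was not found on PATH; install it system-wide first.")
    if not Path(qmd_path).is_file():
        raise FileNotFoundError(qmd_path)
    # check=True raises CalledProcessError if the render fails.
    subprocess.run(quarto_render_command(qmd_path), check=True)
```

Checking for the executable first gives a clearer error than a failed subprocess call when Quarto is not installed.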
How to execute: Read the plan and execute each step:
- Read relevant skill files
- Run analysis scripts
- Generate visualizations
- Create tables
- Compile paper
Phase 4: DELIVERY
Outputs:
project/
├── analysis/
│   ├── 01-data-prep.py
│   ├── 02-descriptive.py
│   ├── 03-main-analysis.py
│   └── 04-robustness.py
├── outputs/
│   ├── tables/
│   │   ├── table1-summary.tex
│   │   ├── table2-main.tex
│   │   └── table3-robustness.tex
│   ├── figures/
│   │   ├── figure1-trend.png
│   │   ├── figure2-event-study.png
│   │   └── figure3-heterogeneity.png
│   └── results/
│       └── model-results.csv
├── paper/
│   ├── paper.qmd
│   ├── paper.pdf          # Final output
│   ├── references.bib
│   └── _quarto.yml
├── development-plan.md
└── discovery-report.md
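A small sketch for creating the output folders in this layout before any scripts write to them. The folder names follow the tree above; `create_project_layout` is a hypothetical helper.

```python
from pathlib import Path

# Folder names follow the delivery layout listed above.
OUTPUT_DIRS = [
    "analysis",
    "outputs/tables",
    "outputs/figures",
    "outputs/results",
    "paper",
]


def create_project_layout(project_dir):
    """Create the output folders, leaving any existing ones untouched."""
    root = Path(project_dir)
    created = []
    for rel in OUTPUT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Because `exist_ok=True` is set, the function can be called again on a partially built project without raising or overwriting anything.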
Skill Integration
This skill orchestrates other skills:
When to use econometrics-python
- Panel data with fixed effects
- Difference-in-differences designs
- Event studies
- Instrumental variables
- Any causal inference analysis
How to invoke:
When to use samplics-survey-analysis
- Survey data with sampling weights
- Complex survey designs (stratified, clustered)
- Weighted descriptive statistics
- Subgroup analysis
- Trend analysis in surveys
How to invoke:
When to use statsmodels
- Simple OLS regression
- Logistic regression
- Time series analysis not covered by pyfixest
- Any standard statistical model
Key Patterns
Pattern 1: Complete Paper Pipeline
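A minimal sketch of how the phases fit together, with the single approval checkpoint sitting between planning and execution. The phase callables and the `approve` callback are hypothetical placeholders for the real work of each phase.

```python
def run_pipeline(discover, plan, approve, execute, deliver):
    """Run the phases in order; stop before execution unless the plan is approved."""
    report = discover()
    analysis_plan = plan(report)
    if not approve(analysis_plan):
        # The one checkpoint: nothing is executed without sign-off.
        return {"status": "awaiting_approval", "plan": analysis_plan}
    results = execute(analysis_plan)
    return {"status": "delivered", "paper": deliver(results)}


if __name__ == "__main__":
    outcome = run_pipeline(
        discover=lambda: "discovery-report.md",
        plan=lambda report: "development-plan.md",
        approve=lambda analysis_plan: True,
        execute=lambda analysis_plan: ["table1-summary.tex", "figure1-trend.png"],
        deliver=lambda results: "paper/paper.pdf",
    )
    print(outcome)
```

Passing the phases in as callables keeps the control flow visible in one place and makes the checkpoint easy to test in isolation.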
Pattern 2: Data Type Detection
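A hedged sketch of a column-name heuristic: weight variables suggest survey data, an ID plus a time variable suggests panel data, and anything else is treated as a cross-section. The hint lists below are hypothetical and will need adapting to real datasets; the bundled data_discovery.py may use different rules.

```python
# Hypothetical name hints; real datasets will need their own lists.
WEIGHT_HINTS = ("weight", "wgt", "pweight")
ID_HINTS = ("id", "pid", "hhid")
TIME_HINTS = ("year", "wave", "time", "date", "period")


def _matches(column, hints):
    """True if the lower-cased column equals a hint or ends with '_<hint>'."""
    name = str(column).lower()
    return any(name == h or name.endswith("_" + h) for h in hints)


def detect_data_type(columns):
    """Classify a dataset as 'survey', 'panel', or 'cross-section' from column names."""
    columns = list(columns)
    has_weight = any(_matches(c, WEIGHT_HINTS) for c in columns)
    has_id = any(_matches(c, ID_HINTS) for c in columns)
    has_time = any(_matches(c, TIME_HINTS) for c in columns)
    if has_weight:
        return "survey"
    if has_id and has_time:
        return "panel"
    return "cross-section"
```

With a pandas DataFrame `df`, this would be called as `detect_data_type(df.columns)`. Column names alone cannot separate a true panel from repeated cross-sections, so it is worth confirming that IDs actually repeat across periods.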
Pattern 3: Publication Figure
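A sketch of a trend figure with a confidence band that applies this document's figure standards (300 DPI, serif fonts, 7" × 5", light gray grid at alpha 0.3). The helper names are hypothetical, and matplotlib is imported inside `plot_trend` so the settings can be inspected without it installed.

```python
def publication_rc(square=False):
    """Return matplotlib rcParams matching the figure standards in this document."""
    return {
        "figure.figsize": (7, 7) if square else (7, 5),  # inches
        "savefig.dpi": 300,
        "font.family": "serif",
        "font.serif": ["Times New Roman", "DejaVu Serif"],
        "axes.grid": True,
        "grid.color": "lightgray",
        "grid.alpha": 0.3,
    }


def plot_trend(x, y, lower, upper, xlabel, ylabel, title, path):
    """Draw a line with a shaded confidence band and save it to path."""
    import matplotlib.pyplot as plt  # imported lazily; requires matplotlib

    # rc_context applies the settings only inside this block.
    with plt.rc_context(publication_rc()):
        fig, ax = plt.subplots()
        ax.plot(x, y, color="black", linewidth=1.5)
        ax.fill_between(x, lower, upper, color="gray", alpha=0.3)
        ax.set_xlabel(xlabel)
        ax.set_ylabel(ylabel)
        ax.set_title(title)
        fig.savefig(path, dpi=300)
        plt.close(fig)
```

Using `rc_context` rather than editing the global `rcParams` keeps the style local to each figure, so other plots in the same session are unaffected.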
Pattern 4: LaTeX Table
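A minimal sketch of the row formatting used in regression tables: coefficients to three decimals with significance stars, and the standard error in parentheses on the row below. The function names are hypothetical (the bundled table_formatter.py may differ), and `label` is assumed to be LaTeX-safe already.

```python
def significance_stars(p_value):
    """Map a p-value to stars using the thresholds 0.01, 0.05 and 0.1."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.1:
        return "*"
    return ""


def coefficient_rows(label, coef, se, p_value):
    """Return two LaTeX rows: the starred coefficient, then the SE in parentheses."""
    stars = significance_stars(p_value)
    return [
        f"{label} & {coef:.3f}{stars} \\\\",
        f" & ({se:.3f}) \\\\",
    ]
```

Joining the rows from several calls with newlines gives the body of a single-column `tabular`; multi-model tables would need one extra `&`-separated cell per model.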
Critical Guidelines
✅ DO:
- Always read relevant skill files before using them
- Use Python only - no R or other languages
- Create high-quality figures (300 dpi minimum, clear labels, publication fonts)
- Format tables properly (LaTeX for Quarto, standard errors in parentheses)
- Document everything (comments in code, clear variable names)
- Wait for approval after showing the plan
- Use specialized skills (econometrics-python, samplics-survey-analysis) when appropriate
- Save all outputs in organized folders (tables/, figures/, results/)
❌ DON'T:
- Don't proceed without user approval of the plan
- Don't use R or other languages - Python only
- Don't create low-quality figures - always use publication standards
- Don't ignore survey weights - use samplics whenever weights are present
- Don't reinvent the wheel - use existing skills for specialized tasks
- Don't create messy LaTeX - use proper formatting functions
- Don't skip documentation - future readers need to understand
Visualization Standards
Figure Requirements:
- DPI: 300 minimum
- Format: PNG for papers, SVG for web
- Fonts: Serif (Times New Roman or similar)
- Size: 7" wide × 5" tall (standard), 7" × 7" (square)
- Colors: Professional palette (avoid bright colors)
- Labels: Clear axis labels with units
- Legend: Only if multiple series
- Grid: Light gray, alpha=0.3
- Title: Descriptive but concise
Common Figure Types:
- Time series with confidence intervals
- Coefficient plots (forest plots)
- Bar charts with error bars
- Event study plots
- Scatter plots with regression lines
See references/visualization-guide.md for detailed examples.
Table Standards
Table Requirements:
- Format: LaTeX for Quarto PDF
- Standard errors: In parentheses below coefficients
- Significance: Stars (*** p<0.01, ** p<0.05, * p<0.1)
- Decimal places: 3 for coefficients, 2 for summary stats
- Alignment: Left for row labels, center for columns
- Caption: Above table
- Notes: Below table
Common Table Types:
- Summary statistics (Table 1)
- Main regression results (Table 2)
- Robustness checks (Table 3)
- Heterogeneity analysis (Table 4)
See references/table-formatting.md for detailed examples.
Troubleshooting
Issue: "I don't know what type of data this is"
Solution: Run the data_discovery.py script and check for:
- Weight variables → survey data
- ID + time variables → panel data
- Neither → cross-sectional
Issue: "Which skill should I use?"
Solution:
- Has weights/survey design → samplics-survey-analysis
- Panel data + causal inference → econometrics-python
- Simple regression → statsmodels
- Descriptive only → pandas + matplotlib
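The decision rules above can be written as a small function. The order in which the rules are checked (survey design first) is an assumption, and `choose_tool` is a hypothetical helper.

```python
def choose_tool(has_survey_design, is_panel, causal, descriptive_only=False):
    """Pick the analysis route following the decision rules listed above."""
    if has_survey_design:
        return "samplics-survey-analysis"
    if is_panel and causal:
        return "econometrics-python"
    if descriptive_only:
        return "pandas + matplotlib"
    return "statsmodels"
```

For instance, `choose_tool(has_survey_design=False, is_panel=True, causal=True)` selects the econometrics-python route.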
Issue: "Figures look unprofessional"
Solution:
- Check DPI (should be 300)
- Use serif fonts
- Follow visualization-guide.md patterns
- Use professional color palette
- Add clear labels and titles
Issue: "LaTeX table won't compile"
Solution:
- Check for special characters (_, &, %)
- Escape properly in LaTeX
- Use table_formatter.py utilities
- Test in minimal Quarto document first
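As an illustration of the escaping step, the sketch below replaces LaTeX's special text-mode characters in a single pass. `escape_latex` is a hypothetical helper and is not necessarily how table_formatter.py handles this.

```python
# Characters LaTeX treats specially in text mode, with their escaped forms.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape special characters one at a time so nothing is escaped twice."""
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in text)
```

Variable names such as `log_income` are the usual culprit: applying the function to row and column labels before building the table avoids most compile errors of this kind. It should only be applied to plain text, not to strings that already contain LaTeX markup.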
Issue: "Standard errors seem wrong"
Solution:
- For survey data: Use samplics with proper design
- For panel data: Cluster at appropriate level
- Check if heteroskedasticity-robust SE needed
- Verify sample size is sufficient
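For the panel-data case, the choice can be sketched as a helper that builds the `vcov` argument for pyfixest. This assumes pyfixest's documented options ("iid", "hetero", and a `{"CRV1": column}` dictionary for clustering); `choose_vcov` and the column names in the usage comment are hypothetical.

```python
def choose_vcov(cluster_col=None, robust=True):
    """Return a value suitable for the vcov argument of pyfixest's feols."""
    if cluster_col is not None:
        # Cluster-robust standard errors, clustered on cluster_col.
        return {"CRV1": cluster_col}
    # Otherwise heteroskedasticity-robust, or plain iid if robust is False.
    return "hetero" if robust else "iid"


# Example use (requires pyfixest and a DataFrame `df`; column names are hypothetical):
#   import pyfixest as pf
#   fit = pf.feols("outcome ~ treated | unit + year", data=df,
#                  vcov=choose_vcov(cluster_col="unit"))
#   fit.summary()
```

Clustering at the level at which treatment is assigned is a common default, but the right level depends on the design and should be stated in the analysis plan.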
Resources
- Quarto documentation: https://quarto.org/docs/authoring/tables.html
- pyfixest docs: https://py-fixest.github.io/pyfixest/
- samplics docs: https://samplics-org.github.io/samplics/
- matplotlib gallery: https://matplotlib.org/stable/gallery/
- seaborn examples: https://seaborn.pydata.org/examples/
Example: Complete Session
Version
Version: 1.0.0
Updated: November 2025
Compatible With: Python 3.8+, Quarto 1.3+