academic-writer

End-to-end academic paper development for empirical social science research. Guides from data exploration through publication-ready manuscript using Python (pyfixest, samplics, matplotlib/seaborn) and Quarto PDF. Automates discovery, planning, analysis, visualization, and paper compilation with one approval checkpoint.

Academic Writer - Empirical Social Science Papers

Complete pipeline for developing publication-ready academic papers from raw data and literature to formatted manuscript.

When to Use This Skill

Trigger patterns:

  • "Write an academic paper from this data"
  • "Create a research paper analyzing [dataset]"
  • "Help me write up my empirical analysis"
  • "Generate a paper from my survey/panel data"
  • "Turn this analysis into a publication"

Best for:

  • Empirical social science research (economics, political science, sociology, public health)
  • Papers based on survey data, panel data, or observational studies
  • Quantitative analysis with tables and figures
  • Standard academic structure (5-6 sections)

Not for:

  • Purely theoretical papers
  • Qualitative research
  • Meta-analyses (though meta-analytic results can be incorporated into a paper)
  • Grant proposals or non-research documents

Core Philosophy

Structured Pipeline Approach:

  1. Discovery - Understand data and context
  2. Planning - Create comprehensive analysis plan
  3. Approval - Get user sign-off on plan (ONE checkpoint)
  4. Execution - Run analysis, create visualizations, compile paper
  5. Delivery - Publication-ready Quarto PDF

Key Principles:

  • Leverage specialized skills (econometrics-python, samplics-survey-analysis)
  • Python-only workflow (pyfixest, samplics, matplotlib/seaborn)
  • Publication-quality figures and tables
  • Reproducible, well-documented code
  • Quarto PDF as primary output

Prerequisites

Required Python packages:

```bash
pip install pyfixest samplics pandas numpy matplotlib seaborn scipy statsmodels --break-system-packages
pip install quarto-cli  # For paper compilation
```

Required folder structure:

```
project/
├── data/                  # User provides
│   ├── dataset.csv        # Main data file
│   └── metadata.json      # Optional: data documentation
└── literature/            # Optional: user provides
    ├── paper1.pdf
    └── paper2-summary.md
```

Quarto: Must be installed system-wide

```bash
# Check if installed
quarto --version
```

Bundled Resources

References (references/)

Detailed guides for each phase:

  • discovery-protocols.md - How to explore data, detect types, extract metadata
  • planning-templates.md - Templates for paper outlines and analysis plans
  • analysis-workflows.md - Common analysis patterns (DiD, survey analysis, panel regression)
  • paper-structure.md - Social science paper conventions and section guidelines
  • visualization-guide.md - Publication-quality matplotlib/seaborn patterns
  • table-formatting.md - LaTeX table creation for Quarto PDF

Scripts (scripts/)

Utilities for automation:

  • data_discovery.py - Automated data scanning and type detection
  • plan_generator.py - Generate structured analysis plans
  • table_formatter.py - Create publication-ready LaTeX tables

Assets (assets/)

Templates and examples:

  • paper-template.qmd - Complete Quarto paper template
  • _quarto.yml - Quarto configuration for academic PDF
  • outline-example.md - Sample paper outline
  • plan-example.md - Complete example of analysis plan

Complete Workflow

Phase 1: DISCOVERY (Automated)

What happens:

  1. Scan data folder for CSV/DTA/Parquet files
  2. Load main dataset and inspect:
    • Variable names and types
    • Sample size and time coverage
    • Detect data type (survey, panel, cross-section)
  3. Check for literature folder
  4. Ask user for research context

How to execute:

```bash
# Read the discovery protocol
file_read /mnt/skills/user/academic-writer/references/discovery-protocols.md

# Run discovery script
python scripts/data_discovery.py --data-folder ./data
```

Outputs:

  • discovery-report.md - What was found
  • Console questions for user about research goals
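
The scanning step can be sketched roughly as follows. This is not the bundled data_discovery.py, whose internals are not shown here; the function name `scan_project` and the layout of the returned dictionary are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# File types named in the discovery step
DATA_SUFFIXES = {".csv", ".dta", ".parquet"}


def scan_project(project_dir):
    """Hypothetical helper: list candidate data files and literature files."""
    root = Path(project_dir)
    data_dir = root / "data"
    lit_dir = root / "literature"

    data_files = []
    if data_dir.is_dir():
        data_files = sorted(
            p.name for p in data_dir.iterdir()
            if p.is_file() and p.suffix.lower() in DATA_SUFFIXES
        )

    literature = []
    if lit_dir.is_dir():  # the literature folder is optional
        literature = sorted(p.name for p in lit_dir.iterdir() if p.is_file())

    return {
        "data_files": data_files,
        "has_metadata": (data_dir / "metadata.json").is_file(),
        "literature": literature,
    }
```

Calling `scan_project(".")` from the project root returns a small summary that could feed the discovery report.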

Phase 2: PLANNING (Automated → User Approval)

What happens:

  1. Generate paper outline (5-6 sections)
  2. Create analysis strategy:
    • Which skill to use (econometrics vs. survey)
    • Specific models and specifications
    • Required tables and figures
  3. Design visualization plan
  4. Output comprehensive plan as markdown
  5. WAIT for user approval

How to execute:

```bash
# Read planning templates
file_read /mnt/skills/user/academic-writer/references/planning-templates.md

# Generate plan
python scripts/plan_generator.py \
    --discovery discovery-report.md \
    --output development-plan.md
```

Critical: Show plan to user and get approval:

  • "Here is the analysis plan. Type 'approved' to proceed, or suggest revisions."
  • Do NOT proceed until user confirms
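
One way to make the checkpoint explicit is a small gate in front of the execution phase. The sketch below is illustrative only: `is_approved` and `run_pipeline` are hypothetical names, and matching strictly on the word "approved" is an assumption (a looser confirmation could reasonably count as approval too).

```python
def is_approved(reply):
    """Hypothetical gate: True only for an explicit 'approved' reply."""
    return reply.strip().lower() == "approved"


def run_pipeline(reply, execute):
    """Call execute() only after explicit approval; otherwise return a prompt."""
    if not is_approved(reply):
        return "Plan not approved yet. Revise the plan and ask again."
    return execute()
```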

Outputs:

  • development-plan.md - Complete analysis plan with outline

Phase 3: EXECUTION (Automated)

What happens: After user approval, execute the plan:

3.1 Data Preparation

  • Load and clean data
  • Create treatment/outcome variables
  • Apply sample restrictions
  • Save processed data

3.2 Descriptive Analysis

  • Determine data type (survey vs. panel)
  • If survey data: Use samplics-survey-analysis skill
  • If panel/experimental: Use standard descriptive stats
  • Create summary tables
  • Generate trend visualizations

3.3 Main Analysis

  • If causal inference needed: Use econometrics-python skill
  • If survey analysis: Use samplics-survey-analysis skill
  • If other: Use statsmodels as needed
  • Compute main estimates with proper SE
  • Calculate marginal effects
  • Run robustness checks

3.4 Visualization

  • Create publication-quality figures using matplotlib/seaborn
  • Follow visualization-guide.md patterns
  • Export as high-DPI PNG (300dpi minimum)
  • Save both figures and code

3.5 Tables

  • Format results as LaTeX tables
  • Use table_formatter.py utilities
  • Follow discipline conventions (standard errors in parentheses, stars for significance)

3.6 Paper Compilation

  • Use Quarto template from assets/
  • Integrate all tables and figures
  • Write narrative sections
  • Compile to PDF

How to execute: Read the plan and execute each step:

  1. Read relevant skill files
  2. Run analysis scripts
  3. Generate visualizations
  4. Create tables
  5. Compile paper

Phase 4: DELIVERY

Outputs:

```
project/
├── analysis/
│   ├── 01-data-prep.py
│   ├── 02-descriptive.py
│   ├── 03-main-analysis.py
│   └── 04-robustness.py
├── outputs/
│   ├── tables/
│   │   ├── table1-summary.tex
│   │   ├── table2-main.tex
│   │   └── table3-robustness.tex
│   ├── figures/
│   │   ├── figure1-trend.png
│   │   ├── figure2-event-study.png
│   │   └── figure3-heterogeneity.png
│   └── results/
│       └── model-results.csv
├── paper/
│   ├── paper.qmd
│   ├── paper.pdf          # Final output
│   ├── references.bib
│   └── _quarto.yml
├── development-plan.md
└── discovery-report.md
```
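
Creating the output folders before the execution phase avoids failed writes later. A minimal sketch, assuming the folder names from the delivery layout; `create_output_skeleton` is a hypothetical helper, not one of the bundled scripts.

```python
from pathlib import Path

# Output folders taken from the delivery layout
OUTPUT_DIRS = [
    "analysis",
    "outputs/tables",
    "outputs/figures",
    "outputs/results",
    "paper",
]


def create_output_skeleton(project_dir):
    """Hypothetical helper: create the delivery folders if they are missing."""
    root = Path(project_dir)
    created = []
    for rel in OUTPUT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created
```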

Skill Integration

This skill orchestrates other skills:

When to use econometrics-python

  • Panel data with fixed effects
  • Difference-in-differences designs
  • Event studies
  • Instrumental variables
  • Any causal inference analysis

How to invoke:

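```python
# Read the skill documentation first:
#   file_read /mnt/skills/user/econometrics-python/SKILL.md

# Follow pyfixest patterns
import pyfixest as pf

# df is the prepared panel DataFrame; cluster_var names the clustering column
model = pf.feols(
    "outcome ~ treatment | unit_id + time",  # unit and time fixed effects
    data=df,
    vcov={"CRV1": "cluster_var"},            # cluster-robust standard errors
)
```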

When to use samplics-survey-analysis

  • Survey data with sampling weights
  • Complex survey designs (stratified, clustered)
  • Weighted descriptive statistics
  • Subgroup analysis
  • Trend analysis in surveys

How to invoke:

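```python
# Read the skill documentation first:
#   file_read /mnt/skills/user/samplics-survey-analysis/SKILL.md

# Follow samplics patterns
from samplics.estimation import TaylorEstimator

# Depending on the samplics version, the parameter may need to be passed
# as an enum rather than a string; check the installed version's docs.
estimator = TaylorEstimator("mean")

# df is the survey DataFrame with weight, stratum, and cluster (PSU) columns
result = estimator.estimate(
    y=df['outcome'].values,
    samp_weight=df['weight'].values,
    stratum=df['stratum'].values,
    psu=df['cluster'].values
)
```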

When to use statsmodels

  • Simple OLS regression
  • Logistic regression
  • Time series analysis not covered by pyfixest
  • Any standard statistical model
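
For the simple OLS case, a minimal statsmodels sketch might look like the following. The data are synthetic and exist only to make the example self-contained; the variable names `x` and `y` and the choice of HC1 robust standard errors are assumptions, not part of the skill.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only: y = 1 + 2*x + noise
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(size=200)

# OLS with heteroskedasticity-robust (HC1) standard errors
model = smf.ols("y ~ x", data=df).fit(cov_type="HC1")

print(model.params)  # point estimates
print(model.bse)     # robust standard errors
```

Swapping `smf.ols` for `smf.logit` gives the logistic-regression case with the same formula interface.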

Key Patterns

Pattern 1: Complete Paper Pipeline

```bash
# Phase 1: Discovery
python scripts/data_discovery.py --data-folder ./data
# Review discovery-report.md

# Phase 2: Planning
python scripts/plan_generator.py --discovery discovery-report.md
# Review development-plan.md
# GET USER APPROVAL - STOP HERE

# Phase 3: Execution (only after approval)
# 3.1 Data prep
python analysis/01-data-prep.py

# 3.2 Descriptive
# Read appropriate skill
file_read /mnt/skills/user/samplics-survey-analysis/SKILL.md
# OR
file_read /mnt/skills/user/econometrics-python/SKILL.md
# Run analysis
python analysis/02-descriptive.py

# 3.3 Main analysis
python analysis/03-main-analysis.py

# 3.4 Tables and figures
python analysis/04-create-tables.py
python analysis/05-create-figures.py

# 3.5 Compile paper
cd paper && quarto render paper.qmd
```

Pattern 2: Data Type Detection

```python
import pandas as pd

df = pd.read_csv('data/dataset.csv')

# Check for survey data indicators (substring matches on column names are a
# heuristic; confirm the result with the user)
has_weights = any('weight' in col.lower() for col in df.columns)
has_strata = any('strat' in col.lower() for col in df.columns)
has_cluster = any('cluster' in col.lower() or 'psu' in col.lower() for col in df.columns)

if has_weights:
    print("→ Survey data detected. Use samplics-survey-analysis skill")
    print(f"  Strata column found: {has_strata}; cluster/PSU column found: {has_cluster}")
else:
    # Check for panel data ('id' also matches names such as 'paid' or 'valid')
    id_candidates = [col for col in df.columns if 'id' in col.lower()]
    time_candidates = [
        col for col in df.columns
        if any(t in col.lower() for t in ['year', 'quarter', 'month', 'time', 'date'])
    ]

    if id_candidates and time_candidates:
        print("→ Panel data detected. Use econometrics-python skill")
    else:
        print("→ Cross-sectional data. Use statsmodels for regression")
```

Pattern 3: Publication Figure

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Set publication style
sns.set_style("whitegrid")
plt.rcParams.update({
    'font.size': 10,
    'axes.labelsize': 11,
    'axes.titlesize': 12,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'legend.fontsize': 10,
    'figure.titlesize': 12,
    'font.family': 'serif',
    'font.serif': ['Times New Roman', 'DejaVu Serif'],
})

# Create figure
fig, ax = plt.subplots(figsize=(7, 5))

# Plot data: `data` is a DataFrame with columns
# 'year', 'estimate', 'ci_lower', and 'ci_upper'
ax.plot(data['year'], data['estimate'],
        marker='o', linewidth=2, color='#2E86AB', label='Estimate')
ax.fill_between(data['year'], data['ci_lower'], data['ci_upper'],
                alpha=0.3, color='#2E86AB')

# Formatting
ax.set_xlabel('Year', fontsize=11)
ax.set_ylabel('Outcome Variable', fontsize=11)
ax.set_title('Trend Over Time', fontsize=12, pad=10)
ax.legend(frameon=True, loc='best')
ax.grid(True, alpha=0.3, linestyle='--')

# Save high-quality
plt.tight_layout()
plt.savefig('outputs/figures/figure1-trend.png', dpi=300, bbox_inches='tight')
plt.close()
```

Pattern 4: LaTeX Table

```python
def create_latex_table(results_df, caption, label):
    """Create a publication-ready LaTeX table.

    Expects columns 'Estimate', 'SE', and 'p_value'; the first column holds
    the row labels. The output needs the threeparttable package in the
    LaTeX preamble (it provides the tablenotes environment).
    """
    df = results_df.copy()  # avoid mutating the caller's DataFrame

    # Add significance stars based on the numeric p-values
    stars = df['p_value'].apply(
        lambda p: '***' if p < 0.01 else '**' if p < 0.05 else '*' if p < 0.1 else ''
    )

    # Format numbers
    df['Estimate'] = df['Estimate'].apply(lambda x: f"{x:.3f}") + stars
    df['SE'] = df['SE'].apply(lambda x: f"({x:.3f})")

    # p_value is only used for the stars; its unescaped underscore would
    # also break LaTeX compilation if it appeared in the header
    df = df.drop(columns='p_value')

    # Create table
    latex = r"""\begin{table}[htbp]
\centering
\begin{threeparttable}
\caption{""" + caption + r"""}
\label{""" + label + r"""}
\begin{tabular}{l""" + "c" * (len(df.columns) - 1) + r"""}
\hline\hline
"""

    # Header
    latex += " & ".join(df.columns) + r" \\" + "\n\\hline\n"

    # Data rows
    for _, row in df.iterrows():
        latex += " & ".join(str(v) for v in row.values) + r" \\" + "\n"

    # Footer
    latex += r"""\hline\hline
\end{tabular}
\begin{tablenotes}
\small
\item Notes: Standard errors in parentheses.
\item *** p$<$0.01, ** p$<$0.05, * p$<$0.1
\end{tablenotes}
\end{threeparttable}
\end{table}
"""
    return latex
```

Critical Guidelines

✅ DO:

  1. Always read relevant skill files before using them
  2. Use Python only - no R or other languages
  3. Create high-quality figures (300 dpi minimum, clear labels, publication fonts)
  4. Format tables properly (LaTeX for Quarto, standard errors in parentheses)
  5. Document everything (comments in code, clear variable names)
  6. Wait for approval after showing the plan
  7. Use specialized skills (econometrics-python, samplics-survey-analysis) when appropriate
  8. Save all outputs in organized folders (tables/, figures/, results/)

❌ DON'T:

  1. Don't proceed without user approval of the plan
  2. Don't use R or other languages - Python only
  3. Don't create low-quality figures - always use publication standards
  4. Don't ignore survey weights - use samplics when weights present
  5. Don't reinvent the wheel - use existing skills for specialized tasks
  6. Don't create messy LaTeX - use proper formatting functions
  7. Don't skip documentation - future readers need to understand

Visualization Standards

Figure Requirements:

  • DPI: 300 minimum
  • Format: PNG for papers, SVG for web
  • Fonts: Serif (Times New Roman or similar)
  • Size: 7" wide × 5" tall (standard), 7" × 7" (square)
  • Colors: Professional palette (avoid bright colors)
  • Labels: Clear axis labels with units
  • Legend: Only if multiple series
  • Grid: Light gray, alpha=0.3
  • Title: Descriptive but concise

Common Figure Types:

  1. Time series with confidence intervals
  2. Coefficient plots (forest plots)
  3. Bar charts with error bars
  4. Event study plots
  5. Scatter plots with regression lines

See references/visualization-guide.md for detailed examples.

Table Standards

Table Requirements:

  • Format: LaTeX for Quarto PDF
  • Standard errors: In parentheses below coefficients
  • Significance: Stars (*** p<0.01, ** p<0.05, * p<0.1)
  • Decimal places: 3 for coefficients, 2 for summary stats
  • Alignment: Left for row labels, center for columns
  • Caption: Above table
  • Notes: Below table
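
The star thresholds and the "standard error below the coefficient" layout can be sketched as a pair of small helpers. These are hypothetical illustrations (`significance_stars` and `coefficient_rows` are not functions from table_formatter.py) and assume a two-column table with a label column and one estimate column.

```python
def significance_stars(p_value):
    """Stars using the thresholds listed in the table requirements."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.1:
        return "*"
    return ""


def coefficient_rows(name, estimate, std_error, p_value):
    """Return two LaTeX rows: the coefficient, then its SE in parentheses."""
    coef_row = f"{name} & {estimate:.3f}{significance_stars(p_value)} \\\\"
    se_row = f" & ({std_error:.3f}) \\\\"
    return [coef_row, se_row]
```

For example, `coefficient_rows("Treatment", 0.12345, 0.04, 0.003)` yields a `Treatment & 0.123***` row followed by a `(0.040)` row.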

Common Table Types:

  1. Summary statistics (Table 1)
  2. Main regression results (Table 2)
  3. Robustness checks (Table 3)
  4. Heterogeneity analysis (Table 4)

See references/table-formatting.md for detailed examples.

Troubleshooting

Issue: "I don't know what type of data this is"

Solution: Run data_discovery.py script, check for:

  • Weight variables → survey data
  • ID + time variables → panel data
  • Neither → cross-sectional

Issue: "Which skill should I use?"

Solution:

  • Has weights/survey design → samplics-survey-analysis
  • Panel data + causal inference → econometrics-python
  • Simple regression → statsmodels
  • Descriptive only → pandas + matplotlib
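
The same decision list can be written as a small function. This is a sketch only: `choose_skill` and its boolean arguments are hypothetical, and the order in which the checks are applied (survey design first) is an assumption.

```python
def choose_skill(has_survey_design, is_panel, needs_causal_inference, needs_regression):
    """Hypothetical helper mirroring the decision list above."""
    if has_survey_design:
        return "samplics-survey-analysis"
    if is_panel and needs_causal_inference:
        return "econometrics-python"
    if needs_regression:
        return "statsmodels"
    return "pandas + matplotlib"  # descriptive only
```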

Issue: "Figures look unprofessional"

Solution:

  1. Check DPI (should be 300)
  2. Use serif fonts
  3. Follow visualization-guide.md patterns
  4. Use professional color palette
  5. Add clear labels and titles
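
To check the DPI of a saved PNG without extra dependencies, the resolution can be read from the file's pHYs chunk. The helper below is a hypothetical sketch (`png_dpi` is not part of the skill); it does not validate chunk checksums and returns None when the file carries no resolution in pixels per metre.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(png_bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if absent."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        # Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if chunk_type == b"pHYs":
            x_ppu, y_ppu, unit = struct.unpack(">IIB", data)
            if unit != 1:  # 1 = pixels per metre; otherwise the unit is unknown
                return None
            return (round(x_ppu * 0.0254), round(y_ppu * 0.0254))
        if chunk_type in (b"IDAT", b"IEND"):
            break  # pHYs, if present, comes before the image data
        pos += 12 + length
    return None
```

Usage might look like `png_dpi(open("outputs/figures/figure1-trend.png", "rb").read())`, which should report roughly 300 for a figure saved with `dpi=300`.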

Issue: "LaTeX table won't compile"

Solution:

  1. Check for special characters (_, &, %)
  2. Escape properly in LaTeX
  3. Use table_formatter.py utilities
  4. Test in minimal Quarto document first
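
Escaping can be done with a small character map applied before values are written into a table. This is a minimal sketch, not the table_formatter.py implementation; `escape_latex` is a hypothetical name, and it should only be applied to plain text cells, not to cells that already contain LaTeX markup.

```python
# Characters that LaTeX treats specially in text mode
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape special characters one at a time, so nothing is escaped twice."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(text))
```

For instance, a column header such as `p_value & 50%` becomes `p\_value \& 50\%`.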

Issue: "Standard errors seem wrong"

Solution:

  1. For survey data: Use samplics with proper design
  2. For panel data: Cluster at appropriate level
  3. Check if heteroskedasticity-robust SE needed
  4. Verify sample size is sufficient

Resources

Example: Complete Session

```bash
# User puts data in ./data/ and optionally literature in ./literature/

# 1. Discovery
file_read /mnt/skills/user/academic-writer/references/discovery-protocols.md
python scripts/data_discovery.py --data-folder ./data
# Review: discovery-report.md

# 2. Planning
file_read /mnt/skills/user/academic-writer/references/planning-templates.md
python scripts/plan_generator.py --discovery discovery-report.md
# Review: development-plan.md
# User: "Looks good, proceed"

# 3. Execution
# Read relevant skills
file_read /mnt/skills/user/econometrics-python/SKILL.md
file_read /mnt/skills/user/academic-writer/references/visualization-guide.md

# Run analysis
python analysis/01-data-prep.py
python analysis/02-descriptive.py
python analysis/03-main-analysis.py
python analysis/04-create-outputs.py

# 4. Compile paper
cd paper
quarto render paper.qmd

# 5. Deliver
# User gets paper.pdf with all tables and figures integrated
```

Version

Version: 1.0.0
Updated: November 2025
Compatible With: Python 3.8+, Quarto 1.3+
