academic-writer

End-to-end academic paper development for empirical social science research. Guides the work from data exploration to a publication-ready manuscript using Python (pyfixest, samplics, matplotlib/seaborn) and Quarto PDF. Automates discovery, planning, analysis, visualization, and paper compilation, with a single user-approval checkpoint after planning.

Complete pipeline for developing publication-ready academic papers from raw data and literature to formatted manuscript.

When to Use This Skill

Trigger patterns:

"Write an academic paper from this data"

"Create a research paper analyzing [dataset]"

"Help me write up my empirical analysis"

"Generate a paper from my survey/panel data"

"Turn this analysis into a publication"

Best for:

Empirical social science research (economics, political science, sociology, public health)

Papers based on survey data, panel data, or observational studies

Quantitative analysis with tables and figures

Standard academic structure (5-6 sections)

Not for:

Purely theoretical papers

Qualitative research

Meta-analyses (though results from existing meta-analyses can be incorporated)

Grant proposals or non-research documents

Core Philosophy

Structured Pipeline Approach:

Discovery - Understand data and context
Planning - Create comprehensive analysis plan
Approval - Get user sign-off on plan (ONE checkpoint)
Execution - Run analysis, create visualizations, compile paper
Delivery - Publication-ready Quarto PDF

Key Principles:

Leverage specialized skills (econometrics-python, samplics-survey-analysis)

Python-only workflow (pyfixest, samplics, matplotlib/seaborn)

Publication-quality figures and tables

Reproducible, well-documented code

Quarto PDF as primary output

Prerequisites

Required Python packages:

pip install pyfixest samplics pandas numpy matplotlib seaborn scipy statsmodels --break-system-packages
pip install quarto-cli  # For paper compilation

Required folder structure:

project/
├── data/                    # User provides
│   ├── dataset.csv         # Main data file
│   └── metadata.json       # Optional: data documentation
└── literature/             # Optional: user provides
    ├── paper1.pdf
    └── paper2-summary.md

Quarto: Must be installed system-wide

# Check if installed
quarto --version
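
The same check can be done from Python before attempting to compile. This is a minimal sketch using only the standard library; the helper name `quarto_version` is hypothetical and not part of this skill's scripts.

```python
import shutil
import subprocess


def quarto_version():
    """Return the installed Quarto version string, or None if not found."""
    exe = shutil.which("quarto")  # look for the executable on PATH
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = quarto_version()
    print(version if version else "Quarto not found on PATH")
```

Returning `None` instead of raising lets the caller decide whether to stop or to ask the user to install Quarto first.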

Bundled Resources

References (references/)

Detailed guides for each phase:

discovery-protocols.md - How to explore data, detect types, extract metadata

planning-templates.md - Templates for paper outlines and analysis plans

analysis-workflows.md - Common analysis patterns (DiD, survey analysis, panel regression)

paper-structure.md - Social science paper conventions and section guidelines

visualization-guide.md - Publication-quality matplotlib/seaborn patterns

table-formatting.md - LaTeX table creation for Quarto PDF

Scripts (scripts/)

Utilities for automation:

data_discovery.py - Automated data scanning and type detection

plan_generator.py - Generate structured analysis plans

table_formatter.py - Create publication-ready LaTeX tables

Assets (assets/)

Templates and examples:

paper-template.qmd - Complete Quarto paper template

_quarto.yml - Quarto configuration for academic PDF

outline-example.md - Sample paper outline

plan-example.md - Complete example of analysis plan

Complete Workflow

Phase 1: DISCOVERY (Automated)

What happens:

Scan data folder for CSV/DTA/Parquet files
Load main dataset and inspect:
Check for literature folder
Ask user for research context
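
The scan step above can be sketched as follows. This is an illustration only, not the bundled data_discovery.py; the helper name `find_data_files` and the choice to list the largest file first (as the likely main dataset) are assumptions.

```python
from pathlib import Path

# File types named in the discovery step: CSV, Stata (.dta), and Parquet
DATA_EXTENSIONS = {".csv", ".dta", ".parquet"}


def find_data_files(data_folder):
    """Return data files under data_folder, largest first.

    Hypothetical helper: treating the largest file as the likely main
    dataset is an assumption of this sketch.
    """
    folder = Path(data_folder)
    if not folder.is_dir():
        return []
    files = [
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in DATA_EXTENSIONS
    ]
    return sorted(files, key=lambda p: p.stat().st_size, reverse=True)


if __name__ == "__main__":
    for path in find_data_files("./data"):
        print(path, path.stat().st_size, "bytes")
```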

How to execute:

# Read the discovery protocol
file_read /mnt/skills/user/academic-writer/references/discovery-protocols.md

# Run discovery script
python scripts/data_discovery.py --data-folder ./data

Outputs:

discovery-report.md - What was found

Console questions for user about research goals

Phase 2: PLANNING (Automated → User Approval)

What happens:

Generate paper outline (5-6 sections)
Create analysis strategy:
Design visualization plan
Output comprehensive plan as markdown
WAIT for user approval

How to execute:

# Read planning templates
file_read /mnt/skills/user/academic-writer/references/planning-templates.md

# Generate plan
python scripts/plan_generator.py \
  --discovery discovery-report.md \
  --output development-plan.md

Critical: Show plan to user and get approval:

"Here is the analysis plan. Type 'approved' to proceed, or suggest revisions."

Do NOT proceed until user confirms
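
A minimal sketch of the approval gate, assuming a console session. The helper names are hypothetical, and the strict match on the word "approved" is an assumption of this sketch; an agent may reasonably accept other unambiguous confirmations.

```python
def is_approved(reply):
    """Return True only for an explicit 'approved' reply.

    Anything else, including an empty reply, is treated as a request
    for revisions so the pipeline stays paused.
    """
    return reply.strip().lower() == "approved"


def request_approval(plan_text, ask=input):
    """Show the plan once and return (approved, raw_reply)."""
    print(plan_text)
    reply = ask("Type 'approved' to proceed, or suggest revisions: ")
    return is_approved(reply), reply
```

Passing `ask` as a parameter keeps the gate testable without a live console, for example `request_approval(plan, ask=lambda prompt: "approved")`.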

Outputs:

development-plan.md - Complete analysis plan with outline

Phase 3: EXECUTION (Automated)

What happens: After user approval, execute the plan:

3.1 Data Preparation

Load and clean data

Create treatment/outcome variables

Apply sample restrictions

Save processed data
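
The preparation steps above might look like this for a simple two-period design. It is a sketch on a toy DataFrame; every column name, the policy year, and the balanced-panel restriction are assumptions for illustration.

```python
import pandas as pd

# Toy panel standing in for data/dataset.csv; all column names are hypothetical
raw = pd.DataFrame({
    "unit_id": [1, 1, 2, 2, 3, 3],
    "year": [2019, 2021, 2019, 2021, 2019, 2021],
    "group": ["treated", "treated", "control", "control", "treated", "treated"],
    "income": [100.0, 120.0, 90.0, None, 110.0, 130.0],
})

POLICY_YEAR = 2020  # assumed treatment date

# Clean: drop rows with a missing outcome
df = raw.dropna(subset=["income"]).copy()

# Create treatment variables
df["treated"] = (df["group"] == "treated").astype(int)
df["post"] = (df["year"] >= POLICY_YEAR).astype(int)
df["treat_post"] = df["treated"] * df["post"]

# Sample restriction: keep units observed in both periods
n_periods = df.groupby("unit_id")["year"].transform("nunique")
df = df[n_periods == 2].reset_index(drop=True)

print(df)
# In a project, the processed data would then be written with df.to_csv(...)
```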

3.2 Descriptive Analysis

Determine data type (survey vs. panel)

If survey data: Use samplics-survey-analysis skill

If panel/experimental: Use standard descriptive stats

Create summary tables

Generate trend visualizations

3.3 Main Analysis

If causal inference needed: Use econometrics-python skill

If survey analysis: Use samplics-survey-analysis skill

If other: Use statsmodels as needed

Compute main estimates with proper SE

Calculate marginal effects

Run robustness checks

3.4 Visualization

Create publication-quality figures using matplotlib/seaborn

Follow visualization-guide.md patterns

Export as high-DPI PNG (300dpi minimum)

Save both figures and code

3.5 Tables

Format results as LaTeX tables

Use table_formatter.py utilities

Follow discipline conventions (standard errors in parentheses, stars for significance)

3.6 Paper Compilation

Use Quarto template from assets/

Integrate all tables and figures

Write narrative sections

Compile to PDF

How to execute: Read the plan and execute each step:

Read relevant skill files
Run analysis scripts
Generate visualizations
Create tables
Compile paper

Phase 4: DELIVERY

Outputs:

project/
├── analysis/
│   ├── 01-data-prep.py
│   ├── 02-descriptive.py
│   ├── 03-main-analysis.py
│   └── 04-robustness.py
├── outputs/
│   ├── tables/
│   │   ├── table1-summary.tex
│   │   ├── table2-main.tex
│   │   └── table3-robustness.tex
│   ├── figures/
│   │   ├── figure1-trend.png
│   │   ├── figure2-event-study.png
│   │   └── figure3-heterogeneity.png
│   └── results/
│       └── model-results.csv
├── paper/
│   ├── paper.qmd
│   ├── paper.pdf              # Final output
│   ├── references.bib
│   └── _quarto.yml
├── development-plan.md
└── discovery-report.md
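
The folders in this layout can be created up front so that analysis scripts never fail on a missing directory. A minimal sketch; the helper name `create_project_dirs` is hypothetical.

```python
from pathlib import Path

# Folders taken from the delivery layout above
OUTPUT_DIRS = [
    "analysis",
    "outputs/tables",
    "outputs/figures",
    "outputs/results",
    "paper",
]


def create_project_dirs(project_root):
    """Create the delivery folders under project_root if they are missing."""
    root = Path(project_root)
    created = []
    for rel in OUTPUT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```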

Skill Integration

This skill orchestrates other skills:

When to use econometrics-python

Panel data with fixed effects

Difference-in-differences designs

Event studies

Instrumental variables

Any causal inference analysis

How to invoke:

# Read the skill documentation
file_read /mnt/skills/user/econometrics-python/SKILL.md

# Follow pyfixest patterns
import pyfixest as pf
model = pf.feols("outcome ~ treatment | unit_id + time", 
                 data=df, vcov={"CRV1": "cluster_var"})

When to use samplics-survey-analysis

Survey data with sampling weights

Complex survey designs (stratified, clustered)

Weighted descriptive statistics

Subgroup analysis

Trend analysis in surveys

How to invoke:

# Read the skill documentation
file_read /mnt/skills/user/samplics-survey-analysis/SKILL.md

# Follow samplics patterns
from samplics.estimation import TaylorEstimator
estimator = TaylorEstimator("mean")
result = estimator.estimate(
    y=df['outcome'].values,
    samp_weight=df['weight'].values,
    stratum=df['stratum'].values,
    psu=df['cluster'].values
)

When to use statsmodels

Simple OLS regression

Logistic regression

Time series analysis not covered by pyfixest

Any standard statistical model
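
How to invoke (a sketch on simulated data; the variable names and the data-generating process are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated cross-section; all names are hypothetical
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "education": rng.normal(12, 2, n),
    "region": rng.integers(0, 20, n),
})
df["income"] = 1.0 + 0.5 * df["education"] + rng.normal(0, 1, n)
df["employed"] = (df["income"] + rng.normal(0, 1, n) > 7).astype(int)

# OLS with heteroskedasticity-robust (HC1) standard errors
ols = smf.ols("income ~ education", data=df).fit(cov_type="HC1")

# Same model with standard errors clustered by region
ols_cl = smf.ols("income ~ education", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["region"]}
)

# Logistic regression for a binary outcome
logit = smf.logit("employed ~ education", data=df).fit(disp=False)

print(ols.params["education"], ols.bse["education"])
print(ols_cl.bse["education"])
print(logit.params["education"])
```

The point estimates of the two OLS fits are identical; only the standard errors change with `cov_type`.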

Key Patterns

Pattern 1: Complete Paper Pipeline

# Phase 1: Discovery
python scripts/data_discovery.py --data-folder ./data
# Review discovery-report.md

# Phase 2: Planning
python scripts/plan_generator.py --discovery discovery-report.md
# Review development-plan.md
# GET USER APPROVAL - STOP HERE

# Phase 3: Execution (only after approval)
# 3.1 Data prep
python analysis/01-data-prep.py

# 3.2 Descriptive
# Read appropriate skill
file_read /mnt/skills/user/samplics-survey-analysis/SKILL.md
# OR
file_read /mnt/skills/user/econometrics-python/SKILL.md
# Run analysis
python analysis/02-descriptive.py

# 3.3 Main analysis
python analysis/03-main-analysis.py

# 3.4 Tables and figures
python analysis/04-create-tables.py
python analysis/05-create-figures.py

# 3.5 Compile paper
cd paper && quarto render paper.qmd

Pattern 2: Data Type Detection

import pandas as pd

df = pd.read_csv('data/dataset.csv')

# Check for survey data indicators
has_weights = any('weight' in col.lower() for col in df.columns)
has_strata = any('strat' in col.lower() for col in df.columns)
has_cluster = any('cluster' in col.lower() or 'psu' in col.lower() 
                  for col in df.columns)

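if has_weights:
    print("→ Survey data detected. Use samplics-survey-analysis skill")
    # Report which design variables are available for variance estimation
    print(f"   strata found: {has_strata}, clusters/PSUs found: {has_cluster}")
else:
    # Check for panel data
    # Substring heuristic: 'id' also matches names such as 'width',
    # so confirm the candidates with the user
    id_candidates = [col for col in df.columns if 'id' in col.lower()]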
    time_candidates = [col for col in df.columns if any(t in col.lower() 
                       for t in ['year', 'quarter', 'month', 'time', 'date'])]
    
    if id_candidates and time_candidates:
        print("→ Panel data detected. Use econometrics-python skill")
    else:
        print("→ Cross-sectional data. Use statsmodels for regression")

Pattern 3: Publication Figure

import matplotlib.pyplot as plt
import seaborn as sns

# Set publication style
sns.set_style("whitegrid")
plt.rcParams.update({
    'font.size': 10,
    'axes.labelsize': 11,
    'axes.titlesize': 12,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'legend.fontsize': 10,
    'figure.titlesize': 12,
    'font.family': 'serif',
    'font.serif': ['Times New Roman', 'DejaVu Serif'],
})

# Create figure
fig, ax = plt.subplots(figsize=(7, 5))

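# Plot data (`data` is assumed to be a DataFrame with the columns
# 'year', 'estimate', 'ci_lower', and 'ci_upper')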
ax.plot(data['year'], data['estimate'], marker='o', linewidth=2, 
        color='#2E86AB', label='Estimate')
ax.fill_between(data['year'], data['ci_lower'], data['ci_upper'],
                alpha=0.3, color='#2E86AB')

# Formatting
ax.set_xlabel('Year', fontsize=11)
ax.set_ylabel('Outcome Variable', fontsize=11)
ax.set_title('Trend Over Time', fontsize=12, pad=10)
ax.legend(frameon=True, loc='best')
ax.grid(True, alpha=0.3, linestyle='--')

# Save high-quality
plt.tight_layout()
plt.savefig('outputs/figures/figure1-trend.png', dpi=300, bbox_inches='tight')
plt.close()

Pattern 4: LaTeX Table

def create_latex_table(results_df, caption, label):
    """Create publication-ready LaTeX table.

    Expects 'Estimate', 'SE', and 'p_value' columns. The notes block needs
    the threeparttable LaTeX package loaded in the document preamble.
    """

    # Work on a copy so the caller's DataFrame is left unchanged
    table = results_df.copy()

    # Significance stars from the raw p-values
    stars = table['p_value'].apply(
        lambda p: '***' if p < 0.01 else '**' if p < 0.05 else '*' if p < 0.1 else ''
    )

    # Format numbers, then drop p_value (its underscore would also break LaTeX)
    table['Estimate'] = table['Estimate'].apply(lambda x: f"{x:.3f}") + stars
    table['SE'] = table['SE'].apply(lambda x: f"({x:.3f})")
    table = table.drop(columns='p_value')

    # Create table
    latex = r"""\begin{table}[htbp]
\centering
\begin{threeparttable}
\caption{""" + caption + r"""}
\label{""" + label + r"""}
\begin{tabular}{l""" + "c" * (len(table.columns) - 1) + r"""}
\hline\hline
"""

    # Header
    latex += " & ".join(table.columns) + r" \\" + "\n\\hline\n"

    # Data rows
    for _, row in table.iterrows():
        latex += " & ".join(str(v) for v in row.values) + r" \\" + "\n"

    # Footer
    latex += r"""\hline\hline
\end{tabular}
\begin{tablenotes}
\small
\item Notes: Standard errors in parentheses.
\item *** p$<$0.01, ** p$<$0.05, * p$<$0.1
\end{tablenotes}
\end{threeparttable}
\end{table}
"""

    return latex

Critical Guidelines

✅ DO:

Always read relevant skill files before using them
Use Python only - no R or other languages
Create high-quality figures (300 dpi minimum, clear labels, publication fonts)
Format tables properly (LaTeX for Quarto, standard errors in parentheses)
Document everything (comments in code, clear variable names)
Wait for approval after showing the plan
Use specialized skills (econometrics-python, samplics-survey-analysis) when appropriate
Save all outputs in organized folders (tables/, figures/, results/)

❌ DON'T:

Don't proceed without user approval of the plan
Don't use R or other languages - Python only
Don't create low-quality figures - always use publication standards
Don't ignore survey weights - use samplics when weights present
Don't reinvent the wheel - use existing skills for specialized tasks
Don't create messy LaTeX - use proper formatting functions
Don't skip documentation - future readers need to understand

Visualization Standards

Figure Requirements:

DPI: 300 minimum

Format: PNG for papers, SVG for web

Fonts: Serif (Times New Roman or similar)

Size: 7" wide × 5" tall (standard), 7" × 7" (square)

Colors: Professional palette (avoid bright colors)

Labels: Clear axis labels with units

Legend: Only if multiple series

Grid: Light gray, alpha=0.3

Title: Descriptive but concise

Common Figure Types:

Time series with confidence intervals
Coefficient plots (forest plots)
Bar charts with error bars
Event study plots
Scatter plots with regression lines

See references/visualization-guide.md for detailed examples.
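
As one illustration of the figure types listed above, here is a minimal coefficient-plot sketch. The estimates are made up, the 95% intervals assume a normal approximation, and the file is written to the working directory rather than outputs/figures/ so the snippet runs anywhere.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Made-up estimates for illustration only
labels = ["Model 1", "Model 2", "Model 3"]
estimates = [0.12, 0.08, 0.15]
std_errors = [0.03, 0.04, 0.05]

# 95% confidence half-widths under a normal approximation
half_widths = [1.96 * se for se in std_errors]
positions = list(range(len(labels)))

fig, ax = plt.subplots(figsize=(7, 5))
ax.errorbar(estimates, positions, xerr=half_widths, fmt="o",
            color="#2E86AB", capsize=3)
ax.axvline(0, color="gray", linestyle="--", linewidth=1)  # zero-effect reference
ax.set_yticks(positions)
ax.set_yticklabels(labels)
ax.set_xlabel("Coefficient estimate (95% CI)")
ax.grid(True, alpha=0.3)

fig.tight_layout()
fig.savefig("coefficient-plot.png", dpi=300, bbox_inches="tight")
plt.close(fig)
```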

Table Standards

Table Requirements:

Format: LaTeX for Quarto PDF

Standard errors: In parentheses below coefficients

Significance: Stars (*** p<0.01, ** p<0.05, * p<0.1)

Decimal places: 3 for coefficients, 2 for summary stats

Alignment: Left for row labels, center for columns

Caption: Above table

Notes: Below table
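
The star and decimal conventions above can be captured in two small helpers. This is a sketch with hypothetical function names, not the bundled table_formatter.py.

```python
def significance_stars(p_value):
    """Map a p-value to the star convention listed above."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.1:
        return "*"
    return ""


def format_coefficient(estimate, std_error, p_value, decimals=3):
    """Return (coefficient cell, standard-error cell) as strings."""
    coef = f"{estimate:.{decimals}f}{significance_stars(p_value)}"
    se = f"({std_error:.{decimals}f})"
    return coef, se


print(format_coefficient(0.1234, 0.0456, 0.03))
# → ('0.123**', '(0.046)')
```

Returning the two cells separately lets the caller place the standard error on the row below the coefficient.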

Common Table Types:

Summary statistics (Table 1)
Main regression results (Table 2)
Robustness checks (Table 3)
Heterogeneity analysis (Table 4)

See references/table-formatting.md for detailed examples.

Troubleshooting

Issue: "I don't know what type of data this is"

Solution: Run the data_discovery.py script and check for:

Weight variables → survey data

ID + time variables → panel data

Neither → cross-sectional

Issue: "Which skill should I use?"

Solution:

Has weights/survey design → samplics-survey-analysis

Panel data + causal inference → econometrics-python

Simple regression → statsmodels

Descriptive only → pandas + matplotlib
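
The same decision rule as a small function. The function name and the top-to-bottom precedence (survey design first) are assumptions of this sketch.

```python
def choose_tool(has_survey_design, is_panel, needs_causal_inference, needs_regression):
    """Pick a skill or library following the rules above, checked in order."""
    if has_survey_design:
        return "samplics-survey-analysis"
    if is_panel and needs_causal_inference:
        return "econometrics-python"
    if needs_regression:
        return "statsmodels"
    return "pandas + matplotlib"


print(choose_tool(has_survey_design=False, is_panel=True,
                  needs_causal_inference=True, needs_regression=True))
# → econometrics-python
```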

Issue: "Figures look unprofessional"

Solution:

Check DPI (should be at least 300)
Use serif fonts
Follow visualization-guide.md patterns
Use professional color palette
Add clear labels and titles

Issue: "LaTeX table won't compile"

Solution:

Check for special characters (_, &, %)
Escape properly in LaTeX
Use table_formatter.py utilities
Test in minimal Quarto document first
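
A minimal escaping sketch for plain-text cells such as variable names. The helper name `escape_latex` is hypothetical. It should not be applied to cells that already contain intended LaTeX (for example `p$<$0.01`), because it would escape that markup too.

```python
# LaTeX special characters and their escaped forms
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters in plain text, one character at a time."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in str(text))


print(escape_latex("log_income & 5% share"))
# → log\_income \& 5\% share
```

Mapping character by character avoids double escaping, which chained `str.replace` calls can cause once a backslash has been introduced.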

Issue: "Standard errors seem wrong"

Solution:

For survey data: Use samplics with proper design
For panel data: Cluster at appropriate level
Check if heteroskedasticity-robust SE needed
Verify sample size is sufficient

Resources

Quarto documentation: https://quarto.org/docs/authoring/tables.html

pyfixest docs: https://py-fixest.github.io/pyfixest/

samplics docs: https://samplics-org.github.io/samplics/

matplotlib gallery: https://matplotlib.org/stable/gallery/

seaborn examples: https://seaborn.pydata.org/examples/

Example: Complete Session

# User puts data in ./data/ and optionally literature in ./literature/

# 1. Discovery
file_read /mnt/skills/user/academic-writer/references/discovery-protocols.md
python scripts/data_discovery.py --data-folder ./data
# Review: discovery-report.md

# 2. Planning
file_read /mnt/skills/user/academic-writer/references/planning-templates.md
python scripts/plan_generator.py --discovery discovery-report.md
# Review: development-plan.md
# User: "Looks good, proceed"

# 3. Execution
# Read relevant skills
file_read /mnt/skills/user/econometrics-python/SKILL.md
file_read /mnt/skills/user/academic-writer/references/visualization-guide.md

# Run analysis
python analysis/01-data-prep.py
python analysis/02-descriptive.py
python analysis/03-main-analysis.py
python analysis/04-create-outputs.py

# 4. Compile paper
cd paper
quarto render paper.qmd

# 5. Deliver
# User gets paper.pdf with all tables and figures integrated

Version

Version: 1.0.0
Updated: November 2025
Compatible With: Python 3.8+, Quarto 1.3+
