Neuroscience · Machine Learning · Research · Data Science

Disentangling Math vs Story Reasoning in the Human Brain

Researcher, Neuromatch Academy (Computational Neuroscience)

Used fMRI and machine learning to identify the neural basis of mathematical vs narrative reasoning, classifying task type with 97% accuracy

TL;DR

  • Context: Computational neuroscience research at Neuromatch Academy using Human Connectome Project fMRI data (339 subjects)
  • Problem: Unclear how the brain differentiates mathematical from story reasoning despite the two tasks' shared linguistic structure
  • Intervention: Combined task-based fMRI analysis with logistic regression and statistical validation against random baselines
  • Impact: Achieved 97% classification accuracy using just 2 brain parcels; revealed representational structure invisible to activation-only analysis

Intro

As part of Neuromatch Academy's Computational Neuroscience program (2021), this project investigated a foundational cognitive question: How does the brain differentiate between mathematical reasoning and narrative comprehension, despite both relying on shared semantic and syntactic processing? The work combined task-based fMRI analysis, statistical modeling, and machine learning interpretability using large-scale human neuroimaging data from the Human Connectome Project.


Problem

  • Math and language tasks activate overlapping brain regions, making differentiation non-obvious
  • Simple activation magnitude comparisons miss the representational structure driving discrimination
  • High-dimensional fMRI data (360 brain parcels × 339 subjects) required principled analysis
  • Needed statistical validation against random baselines to ensure findings weren't measurement artifacts

Intervention

  • Selected 40 task-relevant brain parcels based on math vs cue and story vs cue activation contrasts
  • Trained logistic GLM classifier to predict task type from parcel-wise activation patterns
  • Validated performance against randomly sampled parcels using Wilcoxon signed-rank tests
  • Interpreted model weights to identify discriminative regions beyond raw activation strength
  • Connected findings to executive control and default mode network literature

Impact

  • Achieved 97% classification accuracy using activity from just 2 brain parcels
  • Demonstrated task discrimination was highly significant (p < 10⁻¹²) compared to random parcel baselines
  • Revealed highest-activation parcels weren't always most discriminative (representational structure matters)
  • Identified executive control and default mode network regions as critical for task differentiation

Why This Matters

High-dimensional biological data analysis requires separating signal from confound through rigorous statistical modeling. This work demonstrates the importance of model-driven interpretability—looking at how systems make decisions, not just what they decide—applicable to any domain with noisy, high-dimensional data where understanding mechanism matters as much as prediction.


Technical Deep Dive (Optional)

This section expands on the dataset, methodology, and findings for readers who want technical depth.

Dataset & Experimental Setup

Data Source

  • Human Connectome Project (HCP 2020)
  • N = 339 subjects
  • Task-based functional MRI (fMRI)

Task Paradigm: the HCP Language task, with three conditions:

  1. Math reasoning: Mathematical problems requiring calculation
  2. Story comprehension: Narrative text requiring semantic understanding
  3. Cue: Control condition (baseline)

Spatial Resolution

  • 360 cortical parcels (brain regions)
  • Each parcel represents aggregated neural activity
  • Parcel-wise activation vectors per subject

Measurement: beta values representing task-evoked activity in each parcel


Methodology

Phase 1: Parcel Selection

Approach

  • Identified 20 most active parcels for:
    • Math vs cue contrast
    • Story vs cue contrast
  • Union of both sets → 40 task-relevant parcels

Rationale

  • Reduces dimensionality while preserving task-relevant information
  • Focuses analysis on regions demonstrably engaged by cognitive tasks
  • Prevents overfitting from 360-dimensional space
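The selection step above can be sketched as follows. The array names, shapes, and synthetic beta values are illustrative assumptions, not the project's actual variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_parcels = 339, 360

# Hypothetical beta values: one (subjects x parcels) array per condition.
math_betas = rng.normal(size=(n_subjects, n_parcels))
story_betas = rng.normal(size=(n_subjects, n_parcels))
cue_betas = rng.normal(size=(n_subjects, n_parcels))

def top_contrast_parcels(task_betas, cue_betas, k=20):
    """Indices of the k parcels with the largest mean task-vs-cue contrast."""
    contrast = (task_betas - cue_betas).mean(axis=0)  # average over subjects
    return set(np.argsort(contrast)[-k:])             # top-k parcel indices

# Union of the math-vs-cue and story-vs-cue top-20 sets (at most 40 parcels).
selected = sorted(top_contrast_parcels(math_betas, cue_betas)
                  | top_contrast_parcels(story_betas, cue_betas))
```

Taking the union rather than the intersection keeps parcels engaged by either task, so the classifier can exploit regions preferentially recruited by only one of them.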

Phase 2: Classification Modeling

Model: logistic regression (a GLM with a logit link)

Input

  • Matrix: 40 parcels × 339 subjects
  • Each cell: parcel activation during task

Output

  • Binary classification: Math vs Story

Training

  • Cross-validation across subjects
  • Held-out test sets for evaluation
  • Evaluated accuracy, precision, recall

Interpretation

  • Model weights indicate discriminative parcels
  • High weight = parcel strongly differentiates tasks
  • Weight magnitude ≠ activation magnitude (key insight)
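A minimal sketch of this classification step with scikit-learn. The synthetic data (a mean shift on two features standing in for real task structure) and all variable names are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_selected = 339, 40

# Hypothetical activations: one row per subject-condition pair.
# A mean shift on the first two parcels stands in for real task structure.
math_X = rng.normal(size=(n_subjects, n_selected))
math_X[:, :2] += 1.5
story_X = rng.normal(size=(n_subjects, n_selected))

X = np.vstack([math_X, story_X])
y = np.concatenate([np.ones(n_subjects), np.zeros(n_subjects)])  # 1 = math

# Cross-validated accuracy on held-out rows.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

# Fitted weights rank parcels by discriminative power, not raw activation.
clf.fit(X, y)
ranked_parcels = np.argsort(np.abs(clf.coef_[0]))[::-1]
```

Inspecting `clf.coef_` is what separates this analysis from a black-box classifier: the weight vector, not the accuracy score, identifies which parcels carry the task distinction.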

Phase 3: Statistical Validation

Null hypothesis: task-relevant parcels perform no better than randomly selected parcels.

Validation Procedure

  1. Randomly sample 40 parcels (multiple iterations)
  2. Train classifier on random parcel sets
  3. Compare performance: task-relevant vs random
  4. Statistical test: Wilcoxon signed-rank test

Result

  • Task-relevant parcels significantly outperformed random parcels
  • p < 10⁻¹² (extremely significant)
  • Validates that findings are not artifacts
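The validation loop might look like the sketch below. The sample sizes are reduced for speed, and the synthetic betas (only parcels 0-39 carry task information) are assumptions:

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_per_task, n_parcels = 100, 360   # reduced from 339 subjects for speed

# Hypothetical betas in which only parcels 0-39 carry task information.
X = rng.normal(size=(2 * n_per_task, n_parcels))
y = np.concatenate([np.ones(n_per_task), np.zeros(n_per_task)])
X[y == 1, :40] += 1.0

def cv_accuracy(parcel_idx):
    """Mean cross-validated accuracy using only the given parcels."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, parcel_idx], y, cv=5).mean()

task_acc = cv_accuracy(np.arange(40))                    # task-relevant set
random_accs = np.array([cv_accuracy(rng.choice(n_parcels, 40, replace=False))
                        for _ in range(20)])             # random baselines

# One-sided signed-rank test: do random sets score below the task set?
stat, p_value = wilcoxon(random_accs - task_acc, alternative="less")
```

Because accuracies are bounded and their distribution under random sampling is unknown, a non-parametric signed-rank test is a safer choice here than a t-test.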

Key Findings

1. Near-Perfect Discrimination with Minimal Features

  • 97% accuracy using just 2 brain parcels
  • Demonstrates task differentiation relies on small subset of regions
  • Implies highly efficient neural coding

2. Activation ≠ Discrimination

Critical Insight

  • Parcels with highest activation were NOT always most discriminative
  • Model weights revealed regions with:
    • Executive control functions
    • Default Mode Network involvement
    • Prefrontal integration roles

Implication: task differentiation depends on representational structure, not raw activity magnitude.
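This dissociation is easy to demonstrate on toy data. The setup below, in which parcel 0 is highly active in both tasks while parcel 1 differs between them, is an illustrative assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_rows, n_parcels = 200, 6

y = np.concatenate([np.ones(n_rows // 2), np.zeros(n_rows // 2)])
X = rng.normal(size=(n_rows, n_parcels))
X[:, 0] += 5.0         # parcel 0: strongly active in BOTH tasks
X[y == 1, 1] += 1.5    # parcel 1: the only parcel separating the tasks

clf = LogisticRegression(max_iter=1000).fit(X, y)

most_active = int(np.argmax(X.mean(axis=0)))                # parcel 0
most_discriminative = int(np.argmax(np.abs(clf.coef_[0])))  # parcel 1
```

The most active parcel contributes almost nothing to the decision, while the classifier's largest weight lands on the parcel whose activity differs between conditions, mirroring the activation-vs-discrimination gap observed in the real data.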


3. Neurobiological Interpretation

Math vs Story are Neurally Distinct

  • Despite shared linguistic processing
  • Differentiation occurs at representational level
  • Small subset of regions sufficient for classification

Key Brain Networks Involved

  • Executive Control Network: Task switching, cognitive control
  • Default Mode Network: Self-referential vs abstract reasoning
  • Prefrontal Cortex: Higher-order integration

Scientific Insights

Model-Driven Neuroscience

  • Interpretability (model weights) revealed structure invisible to simple correlation
  • ML valuable beyond prediction: understanding why decisions are made
  • Demonstrates value of multivariate pattern analysis over univariate activation

Task Representation in Brain

  • Brain differentiates tasks through distributed patterns
  • Not localized "math region" vs "story region"
  • Networks dynamically reconfigure based on task demands

Limitations & Future Directions

Limitations

  • fMRI temporal resolution: Slow hemodynamic response (~2s lag)
  • Beta-value estimation: Averages across task blocks
  • Sensory modality: Visual presentation for both tasks
  • Language specificity: English-only stimuli

Future Work

  • Cross-linguistic validation: Test in multiple languages
  • Temporal dynamics: Use higher temporal resolution (MEG/EEG)
  • Causal interventions: TMS or lesion studies
  • Individual differences: Relate to behavioral performance

Technical Tools & Methods

Data Analysis

  • Python: NumPy, SciPy, Pandas
  • Machine Learning: Scikit-learn (logistic regression, cross-validation)
  • Statistical Testing: Wilcoxon signed-rank test
  • Visualization: Matplotlib, Seaborn

Computational Pipeline

  1. Data loading and preprocessing (HCP format)
  2. Parcel selection via contrast analysis
  3. Model training with cross-validation
  4. Statistical validation against null distribution
  5. Interpretation and visualization

Presentation & Communication

NMA Project Presentation

  • Delivered to Neuromatch Academy cohort
  • Explained methods, results, interpretations
  • Received feedback from computational neuroscience experts

Skills Demonstrated

  • Translating complex data into clear narrative
  • Statistical rigor in biological data analysis
  • Bridging neuroscience and machine learning

Industry Relevance

Transferable Skills

  • Working with large-scale, high-dimensional data
  • Statistical rigor and null hypothesis testing
  • ML interpretability beyond black-box prediction
  • Signal extraction from noisy biological measurements
  • Translating complex datasets into defensible conclusions

Applicable Domains

  • Healthcare ML diagnostics: Medical imaging, biomarker discovery
  • Biometric analysis systems: Physiological signal processing
  • Complex signal processing: Any high-dimensional, noisy data
  • Scientific ML products: Research-driven product teams

Why This Matters for Startups

  • Many ML problems require understanding why predictions work
  • Interpretability builds trust and enables iteration
  • Statistical validation prevents false discoveries
  • Biological data complexity mirrors many real-world messy datasets

Completed as part of Neuromatch Academy 2021 Computational Neuroscience program using Human Connectome Project fMRI data.
