Neuroscience · Machine Learning · Research · Data Science

Disentangling Math vs Story Reasoning in the Human Brain

Researcher, Neuromatch Academy (Computational Neuroscience)

Used fMRI and machine learning to identify the neural basis of mathematical vs narrative reasoning, classifying task type with 97% accuracy

TL;DR

  • Context: Computational neuroscience research at Neuromatch Academy using Human Connectome Project fMRI data (339 subjects)
  • Problem: Unclear how the brain differentiates mathematical from story reasoning despite the two tasks' shared linguistic structure
  • Intervention: Combined task-based fMRI analysis with logistic regression and statistical validation against random baselines
  • Impact: Achieved 97% classification accuracy using just 2 brain parcels; revealed representational structure invisible to activation-only analysis

Intro

As part of Neuromatch Academy's Computational Neuroscience program (2021), this project investigated a foundational cognitive question: How does the brain differentiate between mathematical reasoning and narrative comprehension, despite both relying on shared semantic and syntactic processing? The work combined task-based fMRI analysis, statistical modeling, and machine learning interpretability using large-scale human neuroimaging data from the Human Connectome Project.


Problem

  • Math and language tasks activate overlapping brain regions, making differentiation non-obvious
  • Simple activation magnitude comparisons miss the representational structure driving discrimination
  • High-dimensional fMRI data (360 brain parcels × 339 subjects) required principled analysis
  • Needed statistical validation against random baselines to ensure findings weren't measurement artifacts

Intervention

  • Selected 40 task-relevant brain parcels based on math vs cue and story vs cue activation contrasts
  • Trained logistic GLM classifier to predict task type from parcel-wise activation patterns
  • Validated performance against randomly sampled parcels using Wilcoxon signed-rank tests
  • Interpreted model weights to identify discriminative regions beyond raw activation strength
  • Connected findings to executive control and default mode network literature

Impact

  • Achieved 97% classification accuracy using activity from just 2 brain parcels
  • Demonstrated task discrimination was highly significant (p < 10⁻¹²) compared to random parcel baselines
  • Revealed highest-activation parcels weren't always most discriminative (representational structure matters)
  • Identified executive control and default mode network regions as critical for task differentiation

Why This Matters

High-dimensional biological data analysis requires separating signal from confound through rigorous statistical modeling. This work demonstrates the importance of model-driven interpretability—looking at how systems make decisions, not just what they decide—applicable to any domain with noisy, high-dimensional data where understanding mechanism matters as much as prediction.


Technical Deep Dive (Optional)

This section expands on the dataset, methodology, and findings for readers who want technical depth.

Dataset & Experimental Setup

Data Source

  • Human Connectome Project (HCP 2020)
  • N = 339 subjects
  • Task-based functional MRI (fMRI)

Task Paradigm: the HCP Language task, with three conditions:

  1. Math reasoning: Mathematical problems requiring calculation
  2. Story comprehension: Narrative text requiring semantic understanding
  3. Cue: Control condition (baseline)

Spatial Resolution

  • 360 cortical parcels (brain regions)
  • Each parcel represents aggregated neural activity
  • Parcel-wise activation vectors per subject

Measurement: beta values representing task-evoked activity in each parcel


Methodology

Phase 1: Parcel Selection

Approach

  • Identified 20 most active parcels for:
    • Math vs cue contrast
    • Story vs cue contrast
  • Union of both sets → 40 task-relevant parcels

Rationale

  • Reduces dimensionality while preserving task-relevant information
  • Focuses analysis on regions demonstrably engaged by cognitive tasks
  • Prevents overfitting from 360-dimensional space
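The selection step above can be sketched as follows. The array names, shapes, and synthetic beta values are illustrative assumptions, not the project's actual variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_parcels = 339, 360

# Hypothetical beta values: one (subjects x parcels) array per condition.
math_betas = rng.normal(size=(n_subjects, n_parcels))
story_betas = rng.normal(size=(n_subjects, n_parcels))
cue_betas = rng.normal(size=(n_subjects, n_parcels))

def top_contrast_parcels(task_betas, cue_betas, k=20):
    """Indices of the k parcels with the largest mean task-vs-cue contrast."""
    contrast = (task_betas - cue_betas).mean(axis=0)  # average over subjects
    return set(np.argsort(contrast)[-k:])             # top-k parcel indices

# Union of the math-vs-cue and story-vs-cue top-20 sets (at most 40 parcels).
selected = sorted(top_contrast_parcels(math_betas, cue_betas)
                  | top_contrast_parcels(story_betas, cue_betas))
```

Taking the union rather than the intersection keeps parcels engaged by either task, so the classifier can exploit regions preferentially recruited by only one of them.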

Phase 2: Classification Modeling

Model: logistic regression (a GLM with a logit link)

Input

  • Matrix: 40 parcels × 339 subjects
  • Each cell: parcel activation during task

Output

  • Binary classification: Math vs Story

Training

  • Cross-validation across subjects
  • Held-out test sets for evaluation
  • Evaluated accuracy, precision, recall

Interpretation

  • Model weights indicate discriminative parcels
  • High weight = parcel strongly differentiates tasks
  • Weight magnitude ≠ activation magnitude (key insight)
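A minimal sketch of this classification step with scikit-learn. The synthetic data (a mean shift on two features standing in for real task structure) and all variable names are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_selected = 339, 40

# Hypothetical activations: one row per subject-condition pair.
# A mean shift on the first two parcels stands in for real task structure.
math_X = rng.normal(size=(n_subjects, n_selected))
math_X[:, :2] += 1.5
story_X = rng.normal(size=(n_subjects, n_selected))

X = np.vstack([math_X, story_X])
y = np.concatenate([np.ones(n_subjects), np.zeros(n_subjects)])  # 1 = math

# Cross-validated accuracy on held-out rows.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

# Fitted weights rank parcels by discriminative power, not raw activation.
clf.fit(X, y)
ranked_parcels = np.argsort(np.abs(clf.coef_[0]))[::-1]
```

Inspecting `clf.coef_` is what separates this analysis from a black-box classifier: the weight vector, not the accuracy score, identifies which parcels carry the task distinction.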

Phase 3: Statistical Validation

Null hypothesis: task-relevant parcels perform no better than randomly selected parcels.

Validation Procedure

  1. Randomly sample 40 parcels (multiple iterations)
  2. Train classifier on random parcel sets
  3. Compare performance: task-relevant vs random
  4. Statistical test: Wilcoxon signed-rank test

Result

  • Task-relevant parcels significantly outperformed random parcels
  • p < 10⁻¹² (extremely significant)
  • Validates that findings are not artifacts
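The validation loop might look like the sketch below. The sample sizes are reduced for speed, and the synthetic betas (only parcels 0-39 carry task information) are assumptions:

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_per_task, n_parcels = 100, 360   # reduced from 339 subjects for speed

# Hypothetical betas in which only parcels 0-39 carry task information.
X = rng.normal(size=(2 * n_per_task, n_parcels))
y = np.concatenate([np.ones(n_per_task), np.zeros(n_per_task)])
X[y == 1, :40] += 1.0

def cv_accuracy(parcel_idx):
    """Mean cross-validated accuracy using only the given parcels."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, parcel_idx], y, cv=5).mean()

task_acc = cv_accuracy(np.arange(40))                    # task-relevant set
random_accs = np.array([cv_accuracy(rng.choice(n_parcels, 40, replace=False))
                        for _ in range(20)])             # random baselines

# One-sided signed-rank test: do random sets score below the task set?
stat, p_value = wilcoxon(random_accs - task_acc, alternative="less")
```

Because accuracies are bounded and their distribution under random sampling is unknown, a non-parametric signed-rank test is a safer choice here than a t-test.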

Key Findings

1. Near-Perfect Discrimination with Minimal Features

  • 97% accuracy using just 2 brain parcels
  • Demonstrates task differentiation relies on small subset of regions
  • Implies highly efficient neural coding

2. Activation ≠ Discrimination

Critical Insight

  • Parcels with highest activation were NOT always most discriminative
  • Model weights revealed regions with:
    • Executive control functions
    • Default Mode Network involvement
    • Prefrontal integration roles

Implication: task differentiation depends on representational structure, not raw activity magnitude.
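This dissociation is easy to demonstrate on toy data. The setup below, in which parcel 0 is highly active in both tasks while parcel 1 differs between them, is an illustrative assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_rows, n_parcels = 200, 6

y = np.concatenate([np.ones(n_rows // 2), np.zeros(n_rows // 2)])
X = rng.normal(size=(n_rows, n_parcels))
X[:, 0] += 5.0         # parcel 0: strongly active in BOTH tasks
X[y == 1, 1] += 1.5    # parcel 1: the only parcel separating the tasks

clf = LogisticRegression(max_iter=1000).fit(X, y)

most_active = int(np.argmax(X.mean(axis=0)))                # parcel 0
most_discriminative = int(np.argmax(np.abs(clf.coef_[0])))  # parcel 1
```

The most active parcel contributes almost nothing to the decision, while the classifier's largest weight lands on the parcel whose activity differs between conditions, mirroring the activation-vs-discrimination gap observed in the real data.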


3. Neurobiological Interpretation

Math vs Story are Neurally Distinct

  • Despite shared linguistic processing
  • Differentiation occurs at representational level
  • Small subset of regions sufficient for classification

Key Brain Networks Involved

  • Executive Control Network: Task switching, cognitive control
  • Default Mode Network: Self-referential vs abstract reasoning
  • Prefrontal Cortex: Higher-order integration

Scientific Insights

Model-Driven Neuroscience

  • Interpretability (model weights) revealed structure invisible to simple correlation
  • ML valuable beyond prediction: understanding why decisions are made
  • Demonstrates value of multivariate pattern analysis over univariate activation

Task Representation in Brain

  • Brain differentiates tasks through distributed patterns
  • Not localized "math region" vs "story region"
  • Networks dynamically reconfigure based on task demands

Limitations & Future Directions

Limitations

  • fMRI temporal resolution: Slow hemodynamic response (~2s lag)
  • Beta-value estimation: Averages across task blocks
  • Sensory modality: Visual presentation for both tasks
  • Language specificity: English-only stimuli

Future Work

  • Cross-linguistic validation: Test in multiple languages
  • Temporal dynamics: Use higher temporal resolution (MEG/EEG)
  • Causal interventions: TMS or lesion studies
  • Individual differences: Relate to behavioral performance

Technical Tools & Methods

Data Analysis

  • Python: NumPy, SciPy, Pandas
  • Machine Learning: Scikit-learn (logistic regression, cross-validation)
  • Statistical Testing: Wilcoxon signed-rank test
  • Visualization: Matplotlib, Seaborn

Computational Pipeline

  1. Data loading and preprocessing (HCP format)
  2. Parcel selection via contrast analysis
  3. Model training with cross-validation
  4. Statistical validation against null distribution
  5. Interpretation and visualization

Presentation & Communication

NMA Project Presentation

  • Delivered to Neuromatch Academy cohort
  • Explained methods, results, interpretations
  • Received feedback from computational neuroscience experts

Skills Demonstrated

  • Translating complex data into clear narrative
  • Statistical rigor in biological data analysis
  • Bridging neuroscience and machine learning

Industry Relevance

Transferable Skills

  • Working with large-scale, high-dimensional data
  • Statistical rigor and null hypothesis testing
  • ML interpretability beyond black-box prediction
  • Signal extraction from noisy biological measurements
  • Translating complex datasets into defensible conclusions

Applicable Domains

  • Healthcare ML diagnostics: Medical imaging, biomarker discovery
  • Biometric analysis systems: Physiological signal processing
  • Complex signal processing: Any high-dimensional, noisy data
  • Scientific ML products: Research-driven product teams

Why This Matters for Startups

  • Many ML problems require understanding why predictions work
  • Interpretability builds trust and enables iteration
  • Statistical validation prevents false discoveries
  • Biological data complexity mirrors many real-world messy datasets

Completed as part of Neuromatch Academy 2021 Computational Neuroscience program using Human Connectome Project fMRI data.
