TL;DR
- Context: Semester-long research mentorship at IIT Kanpur studying multi-agent behavior emergence through RL and game theory
- Problem: Students needed guidance translating abstract behavioral economics concepts into executable computational models
- Intervention: Structured problem formulation around reward shaping, population dynamics, and Q-learning implementation
- Impact: Demonstrated emergence of Tit-for-Tat strategies and documented how cooperation declines without punishment mechanisms
Intro
This project was conducted as a semester-long research mentorship under the Brain and Cognitive Society at IIT Kanpur. The research question combined neuroeconomics, evolutionary game theory, and reinforcement learning: how do micro-level agent decisions lead to macro-level behavioral patterns in competitive environments? My role was to mentor a student team through problem formulation, modeling choices, and experimental design, with emphasis on translating abstract behavioral theory into executable simulations.
Problem
- Abstract behavioral concepts (cooperation, altruism, selfishness) needed computational operationalization
- Students risked building RL systems with degenerate reward functions producing meaningless results
- Challenge of designing rewards that reflect population-level evolution, not just individual gain
- Required experimental design producing interpretable results aligned with evolutionary game theory
Intervention
- Guided formalization of behavioral economics into multi-agent simulation with resource constraints
- Designed reward shaping based on population-level advantage rather than individual interaction outcomes
- Advised on Q-learning implementation using global state-strategy representation
- Structured experiments mapping individual incentives to emergent collective behavior
- Ensured theoretical consistency with evolutionary game theory and neuroeconomics literature
Impact
- Learning agents converged to hybrid strategies blending cooperation and retaliation based on population composition
- Demonstrated Tit-for-Tat-like behaviors emerged as robust equilibrium under repeated interactions
- Showed that, without punishment mechanisms, pure cooperators decline gradually, producing long-term instability
- Team produced publication-quality results with clear mapping from micro-decisions to macro-behavior
Why This Matters
Multi-agent systems are notoriously difficult to formalize correctly. The gap between theoretical models and working implementations often produces degenerate solutions or artifacts. Mentorship focusing on reward shaping, state representation, and experimental validity ensures research produces meaningful insights rather than implementation bugs masquerading as discoveries.
Technical Deep Dive (Optional)
This section expands on the multi-agent framework, RL formulation, and research findings for readers who want technical depth.
Research Questions
Primary Question: How do micro-level agent decisions lead to macro-level behavioral patterns in multi-agent competitive environments?
Sub-Questions
- How do cooperation, selfishness, and altruism emerge naturally from simple rules?
- How does the absence of punishment mechanisms affect long-term cooperation sustainability?
- Can RL agents learn stable strategies when interacting with fixed-strategy agents?
- What population dynamics emerge from different strategy mixtures?
Environment Design
Multi-Agent Simulation
- Agents repeatedly interact under constrained resources (food-sharing scenario)
- Each interaction outcome influenced by:
  - Agent's chosen strategy
  - Opponent's strategy
  - Interaction history
  - Current population composition
Resource Constraints
- Limited food creates competitive pressure
- Sharing vs hoarding decisions affect survival
- Population-level effects create evolutionary pressure
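The food-sharing interaction described above can be sketched as a simple payoff table. This is a minimal illustration, not the project's actual implementation; the payoff values and the `share`/`hoard` action names are assumptions chosen to reflect the competitive pressure described (sharing is exploitable, mutual hoarding wastes contested food).

```python
# Hypothetical payoffs for one food-sharing interaction.
# Values are illustrative assumptions, not the project's actual numbers.
SHARE, HOARD = "share", "hoard"

PAYOFFS = {
    (SHARE, SHARE): (3, 3),   # mutual sharing: both eat moderately
    (SHARE, HOARD): (0, 5),   # sharer is exploited by the hoarder
    (HOARD, SHARE): (5, 0),
    (HOARD, HOARD): (1, 1),   # mutual hoarding wastes contested food
}

def interact(action_a, action_b):
    """Return (food_a, food_b) for a single pairwise interaction."""
    return PAYOFFS[(action_a, action_b)]
```

With payoffs of this shape, defection dominates any single interaction while mutual cooperation maximizes joint food, which is exactly the tension that makes repeated play and population effects interesting.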
Strategy Space
Fixed Strategies
- Always Cooperate (AC): Unconditional sharing/cooperation
- Tit-for-Tat (TFT): History-based reciprocity (copy opponent's last move)
- Alternating Cooperate (ALT): Oscillation between cooperation and competition
- Always Defect (AD): Never cooperates; always takes advantage
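The four fixed strategies above are simple enough to express as one-line decision rules. The sketch below is illustrative; it assumes each strategy sees the opponent's previous action (`None` on the first round) and the current round index, and returns `"share"` or `"hoard"`.

```python
# Illustrative decision rules for the four fixed strategies.
# opp_last: opponent's previous action (None on the first round)
# t: current round index

def always_cooperate(opp_last, t):
    return "share"

def always_defect(opp_last, t):
    return "hoard"

def tit_for_tat(opp_last, t):
    # Cooperate first, then mirror the opponent's last move.
    return "share" if opp_last is None else opp_last

def alternating(opp_last, t):
    # Oscillate between cooperating and competing each round.
    return "share" if t % 2 == 0 else "hoard"
```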
Adaptive Strategy
- Learning Agent: Q-learning-based adaptive policy
  - Learns an optimal response strategy based on population dynamics
  - Can develop hybrid strategies not present among the fixed strategies
Reinforcement Learning Formulation
State Representation
- Global Q-table mapping:
  - State: Strategy type of the interacting agent
  - Action: Chosen response strategy
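A global Q-table keyed by opponent strategy type can be sketched as below. This is a minimal illustration under stated assumptions: the hyperparameters (alpha, gamma, epsilon) and the two-action space are placeholders, not values from the project.

```python
import random
from collections import defaultdict

# Sketch of a global state-strategy Q-table:
# state = opponent's strategy type, action = the learner's response.
# Hyperparameters below are illustrative assumptions.
ACTIONS = ["share", "hoard"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)  # keys: (opponent_strategy, action)

def choose_action(opp_strategy):
    """Epsilon-greedy selection against a known opponent strategy type."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(opp_strategy, a)])

def update(opp_strategy, action, reward, next_opp_strategy):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(next_opp_strategy, a)] for a in ACTIONS)
    Q[(opp_strategy, action)] += ALPHA * (
        reward + GAMMA * best_next - Q[(opp_strategy, action)]
    )
```

Keying the state on opponent strategy type (rather than raw interaction history) keeps the table tiny and makes the learned policy directly interpretable as "how to respond to each kind of agent."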
Critical Design Decision: Reward Function
Individual-Level Rewards (Rejected Approach)
- Reward based on immediate interaction outcome
- Problem: Produces degenerate strategies exploiting interaction mechanics
- Doesn't capture evolutionary pressure
Population-Level Rewards (Chosen Approach)
- Rewards based on relative population growth/decline of strategy
- Reflects evolutionary advantage over time
- Simulates natural selection at population level
Why This Mattered
- Aligns learning with evolutionary game theory predictions
- Produces meaningful strategies rather than implementation artifacts
- Enables comparison with theoretical literature
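The population-level reward can be sketched as the change in a strategy's population share between generations, rather than the payoff of any single interaction. The function below is illustrative; the example counts are assumptions, not project data.

```python
# Sketch of the population-level reward: the learner is rewarded by the
# change in its strategy's share of the population between generations,
# not by individual interaction payoffs.
def population_reward(counts_before, counts_after, strategy):
    """Reward = change in the strategy's population share."""
    share_before = counts_before[strategy] / sum(counts_before.values())
    share_after = counts_after[strategy] / sum(counts_after.values())
    return share_after - share_before
```

Because the reward tracks relative growth, a strategy that wins individual interactions but collapses its own niche (as Always Defect does once cooperators die out) is penalized, which is the evolutionary pressure the individual-level reward failed to capture.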
Key Findings
1. Cooperation Dynamics
Always Cooperate Agents
- Declined gradually in absence of punishment mechanisms
- Exploited by Always Defect agents
- Population share decreased over time
Always Defect Agents
- Dominated in short-term interactions
- Destabilized environment long-term
- Eventually suffered from lack of cooperators
Tit-for-Tat Strategies
- Emerged as stable equilibrium under repeated interactions
- Balanced cooperation with retaliation
- Most robust strategy across varying population compositions
2. Learning Agent Behavior
Convergence Pattern
- Converged to hybrid strategies
- Blended cooperation and retaliation dynamically
- Adapted policy as population dynamics shifted
Adaptive Response
- Learned to cooperate with cooperators
- Learned to retaliate against defectors
- Adjusted strategy mix based on population feedback
3. Non-Linear Parameter Effects
Sensitivity Analysis
- Small changes in agent count, interaction frequency, or initial population distribution produced large behavioral shifts
Implication: Multi-agent systems exhibit complex, non-linear dynamics requiring careful experimental design
Mentorship Approach
Problem Formalization
- Guided translation of behavioral concepts into mathematical models
- Ensured computable representations maintained theoretical validity
- Helped define clear state spaces and action spaces
Reward Shaping Guidance
- Critical decision: population-level vs individual-level rewards
- Explained common RL pitfalls (reward hacking, degenerate solutions)
- Validated reward functions against expected theoretical behavior
Experimental Design
- Structured parameter sweeps for systematic exploration
- Designed visualizations producing interpretable results
- Ensured metrics answered research questions directly
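A structured parameter sweep over the sensitive knobs identified in the sensitivity analysis can be organized as a simple grid. The sketch below is hypothetical: `run_simulation` is a placeholder for the team's actual entry point, and the parameter values are illustrative assumptions.

```python
import itertools

# Illustrative sweep over the three parameters the sensitivity
# analysis flagged as influential. Values are placeholders.
agent_counts = [20, 50, 100]
interaction_freqs = [1, 5, 10]
initial_mixes = [
    {"AC": 0.4, "AD": 0.4, "TFT": 0.2},
    {"AC": 0.25, "AD": 0.25, "TFT": 0.5},
]

def run_simulation(n_agents, freq, mix):
    # Placeholder for the actual simulation entry point; would return
    # final population shares for one configuration.
    return {"n_agents": n_agents, "freq": freq, "mix": mix}

results = [
    run_simulation(n, f, m)
    for n, f, m in itertools.product(agent_counts, interaction_freqs, initial_mixes)
]
```

Enumerating the full cross-product makes non-linear interactions between parameters visible, rather than varying one knob at a time and missing them.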
Theoretical Validation
- Connected simulation results to evolutionary game theory literature
- Validated findings against established behavioral economics research
- Ensured claims were defensible and properly scoped
Scientific Contributions
Demonstrated
- Clear mapping from individual incentives to emergent collective behavior
- Applicability of RL to evolutionary game theory questions
- How absence of punishment affects cooperation sustainability
Produced
- Publication-quality experimental results
- Interpretable visualizations of population dynamics
- Reproducible codebase for future research
Student Development Outcomes
Through this mentorship, the team:
- Gained experience in RL formulation for complex systems
- Learned rigorous experimental design for multi-agent research
- Developed skills translating abstract theory to implementation
- Understood importance of reward shaping in RL systems
Consulting Relevance
This mentorship demonstrates ability to:
- Translate theoretical research into working ML systems
- Design multi-agent simulations with meaningful metrics
- Guide teams through RL formulation pitfalls
- Mentor engineers at intersection of ML, economics, and complex systems
Directly applicable to startups building:
- Market simulations and dynamic pricing systems
- Agent-based economic models
- Multi-stakeholder optimization systems
- Adaptive decision engines with competing objectives
Completed as semester-long research mentorship project at IIT Kanpur's Brain and Cognitive Society, 2021.