Mentorship · Reinforcement Learning · Game Theory · Research

Multi-Agent Decision-Making via Reinforcement Learning & Game Theory

Research Mentor, Brain & Cognitive Society, IIT Kanpur

Mentored research on emergent cooperation and altruism using RL agents in game-theoretic environments

TL;DR

  • Context: Semester-long research mentorship at IIT Kanpur studying multi-agent behavior emergence through RL and game theory
  • Problem: Students needed guidance translating abstract behavioral economics concepts into executable computational models
  • Intervention: Structured problem formulation around reward shaping, population dynamics, and Q-learning implementation
  • Impact: Demonstrated emergence of Tit-for-Tat strategies and documented how cooperation declines without punishment mechanisms

Intro

This project was conducted as a semester-long research mentorship under the Brain and Cognitive Society at IIT Kanpur. The research question combined neuroeconomics, evolutionary game theory, and reinforcement learning: how do micro-level agent decisions lead to macro-level behavioral patterns in competitive environments? My role was to mentor a student team through problem formulation, modeling choices, and experimental design, with emphasis on translating abstract behavioral theory into executable simulations.


Problem

  • Abstract behavioral concepts (cooperation, altruism, selfishness) needed computational operationalization
  • Students risked building RL systems with degenerate reward functions producing meaningless results
  • Challenge of designing rewards that reflect population-level evolution, not just individual gain
  • Required experimental design producing interpretable results aligned with evolutionary game theory

Intervention

  • Guided formalization of behavioral economics into multi-agent simulation with resource constraints
  • Designed reward shaping based on population-level advantage rather than individual interaction outcomes
  • Advised on Q-learning implementation using global state-strategy representation
  • Structured experiments mapping individual incentives to emergent collective behavior
  • Ensured theoretical consistency with evolutionary game theory and neuroeconomics literature

Impact

  • Learning agents converged to hybrid strategies blending cooperation and retaliation based on population composition
  • Demonstrated Tit-for-Tat-like behaviors emerged as robust equilibrium under repeated interactions
  • Showed absence of punishment mechanisms leads to gradual decline of pure cooperators and long-term instability
  • Team produced publication-quality results with clear mapping from micro-decisions to macro-behavior

Why This Matters

Multi-agent systems are notoriously difficult to formalize correctly. The gap between theoretical models and working implementations often produces degenerate solutions or artifacts. Mentorship that focuses on reward shaping, state representation, and experimental validity helps ensure that research produces meaningful insights rather than implementation bugs masquerading as discoveries.


Technical Deep Dive (Optional)

This section expands on the multi-agent framework, RL formulation, and research findings for readers who want technical depth.


Research Questions

Primary Question: How do micro-level agent decisions lead to macro-level behavioral patterns in multi-agent competitive environments?

Sub-Questions

  • How do cooperation, selfishness, and altruism emerge naturally from simple rules?
  • How does the absence of punishment mechanisms affect long-term cooperation sustainability?
  • Can RL agents learn stable strategies when interacting with fixed-strategy agents?
  • What population dynamics emerge from different strategy mixtures?

Environment Design

Multi-Agent Simulation

  • Agents repeatedly interact under constrained resources (food-sharing scenario)
  • Each interaction outcome influenced by:
    • Agent's chosen strategy
    • Opponent's strategy
    • Interaction history
    • Current population composition

Resource Constraints

  • Limited food creates competitive pressure
  • Sharing vs hoarding decisions affect survival
  • Population-level effects create evolutionary pressure
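
The write-up doesn't give exact payoff values, but the sharing-vs-hoarding tension described above has the structure of a repeated prisoner's dilemma. A minimal sketch with illustrative payoffs (the numbers and the `share`/`hoard` action labels are assumptions, not the study's actual parameters):

```python
# Hypothetical payoff table for one food-sharing interaction.
# Prisoner's-dilemma structure: mutual sharing beats mutual hoarding,
# but hoarding against a sharer yields the best individual payoff.
PAYOFFS = {
    ("share", "share"): (3, 3),
    ("share", "hoard"): (0, 5),
    ("hoard", "share"): (5, 0),
    ("hoard", "hoard"): (1, 1),
}

def interact(action_a: str, action_b: str) -> tuple[int, int]:
    """Return (food for agent A, food for agent B) for one round."""
    return PAYOFFS[(action_a, action_b)]
```

With payoffs ordered this way, hoarding strictly dominates in a single round, so sustained sharing can only emerge from repetition and history — which is exactly the pressure the strategy space below is designed to probe.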

Strategy Space

Fixed Strategies

  • Always Cooperate (AC): Unconditional sharing/cooperation
  • Tit-for-Tat (TFT): History-based reciprocity (copy opponent's last move)
  • Alternating Cooperate (ALT): Oscillates between cooperation and defection on successive rounds
  • Always Defect (AD): Never cooperates; always takes advantage
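
The four fixed strategies can be expressed as simple policies over the opponent's move history. A sketch assuming the `share`/`hoard` action labels from the food-sharing framing (not the study's actual code):

```python
def always_cooperate(history):
    """AC: share regardless of the opponent's past moves."""
    return "share"

def always_defect(history):
    """AD: hoard regardless of the opponent's past moves."""
    return "hoard"

def tit_for_tat(history):
    """TFT: share on the first round, then copy the opponent's last move."""
    return history[-1] if history else "share"

def alternating(history):
    """ALT: oscillate between sharing and hoarding each round."""
    return "share" if len(history) % 2 == 0 else "hoard"
```

Here `history` is the list of the opponent's past moves, oldest first.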

Adaptive Strategy

  • Learning Agent: Q-learning-based adaptive policy
  • Learns optimal response strategy based on population dynamics
  • Can develop hybrid strategies not present in fixed strategies

Reinforcement Learning Formulation

State Representation

  • Global Q-table mapping:
    • State: Strategy type of the current opponent
    • Action: Chosen response strategy

Critical Design Decision: Reward Function

Individual-Level Rewards (Rejected Approach)

  • Reward based on immediate interaction outcome
  • Problem: Produces degenerate strategies exploiting interaction mechanics
  • Doesn't capture evolutionary pressure

Population-Level Rewards (Chosen Approach)

  • Rewards based on relative population growth/decline of strategy
  • Reflects evolutionary advantage over time
  • Simulates natural selection at population level
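
One way to make this concrete is to reward the learner with the change in its strategy's population share between generations, with shares evolving under a replicator-style update. This is a sketch of the general idea; the study's exact reward and population-update rules are not given in the write-up:

```python
def population_reward(shares_before, shares_after, strategy):
    """Reward = change in the strategy's population share, so
    learning tracks evolutionary advantage rather than the
    payoff of any single interaction."""
    return shares_after[strategy] - shares_before[strategy]

def next_shares(shares, fitness):
    """Discrete replicator update: strategies with above-average
    fitness grow in population share; below-average ones shrink."""
    avg = sum(shares[s] * fitness[s] for s in shares)
    return {s: shares[s] * fitness[s] / avg for s in shares}
```

For example, if TFT earns twice the average payoff of AD in a generation, TFT's share grows under `next_shares` and an agent playing TFT receives a positive reward — mirroring natural selection rather than immediate gain.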

Why This Mattered

  • Aligns learning with evolutionary game theory predictions
  • Produces meaningful strategies rather than implementation artifacts
  • Enables comparison with theoretical literature

Key Findings

1. Cooperation Dynamics

Always Cooperate Agents

  • Declined gradually in absence of punishment mechanisms
  • Exploited by Always Defect agents
  • Population share decreased over time

Always Defect Agents

  • Dominated in short-term interactions
  • Destabilized environment long-term
  • Eventually suffered from lack of cooperators

Tit-for-Tat Strategies

  • Emerged as stable equilibrium under repeated interactions
  • Balanced cooperation with retaliation
  • Most robust strategy across varying population compositions

2. Learning Agent Behavior

Convergence Pattern

  • Converged to hybrid strategies
  • Blended cooperation and retaliation dynamically
  • Adapted policy as population dynamics shifted

Adaptive Response

  • Learned to cooperate with cooperators
  • Learned to retaliate against defectors
  • Adjusted strategy mix based on population feedback

3. Non-Linear Parameter Effects

Sensitivity Analysis

  • Small changes in agent count, interaction frequency, or initial population distribution produced large behavioral shifts

Implication: Multi-agent systems exhibit complex, non-linear dynamics requiring careful experimental design


Mentorship Approach

Problem Formalization

  • Guided translation of behavioral concepts into mathematical models
  • Ensured computable representations maintained theoretical validity
  • Helped define clear state spaces and action spaces

Reward Shaping Guidance

  • Critical decision: population-level vs individual-level rewards
  • Explained common RL pitfalls (reward hacking, degenerate solutions)
  • Validated reward functions against expected theoretical behavior

Experimental Design

  • Structured parameter sweeps for systematic exploration
  • Designed visualizations producing interpretable results
  • Ensured metrics answered research questions directly
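
A systematic sweep over the sensitive parameters (agent count, interaction frequency, initial population mix) might be structured as below. The grid values and the `run_simulation` stub are hypothetical stand-ins, not the study's actual configuration:

```python
from itertools import product

# Hypothetical sweep grid; the actual ranges are not given in the write-up.
agent_counts = [20, 50, 100]
interaction_freqs = [1, 5, 10]
initial_mixes = [
    {"AC": 0.25, "TFT": 0.25, "ALT": 0.25, "AD": 0.25},
    {"AC": 0.40, "TFT": 0.10, "ALT": 0.10, "AD": 0.40},
]

def run_simulation(n_agents, freq, mix):
    """Stand-in for the actual simulation loop; would return a
    summary metric such as the final cooperation rate."""
    return 0.0  # placeholder

# One run per grid point, keyed by (agent count, frequency, mix index)
results = {
    (n, f, i): run_simulation(n, f, mix)
    for n, f, (i, mix) in product(agent_counts, interaction_freqs,
                                  enumerate(initial_mixes))
}
```

Keying each result by its full parameter tuple keeps runs reproducible and makes it easy to plot one metric against one parameter while holding the others fixed.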

Theoretical Validation

  • Connected simulation results to evolutionary game theory literature
  • Validated findings against established behavioral economics research
  • Ensured claims were defensible and properly scoped

Scientific Contributions

Demonstrated

  • Clear mapping from individual incentives to emergent collective behavior
  • Applicability of RL to evolutionary game theory questions
  • How absence of punishment affects cooperation sustainability

Produced

  • Publication-quality experimental results
  • Interpretable visualizations of population dynamics
  • Reproducible codebase for future research

Student Development Outcomes

Through this mentorship, the team:

  • Gained experience in RL formulation for complex systems
  • Learned rigorous experimental design for multi-agent research
  • Developed skills translating abstract theory to implementation
  • Understood importance of reward shaping in RL systems

Consulting Relevance

This mentorship demonstrates ability to:

  • Translate theoretical research into working ML systems
  • Design multi-agent simulations with meaningful metrics
  • Guide teams through RL formulation pitfalls
  • Mentor engineers at intersection of ML, economics, and complex systems

Directly applicable to startups building:

  • Market simulations and dynamic pricing systems
  • Agent-based economic models
  • Multi-stakeholder optimization systems
  • Adaptive decision engines with competing objectives

Completed as semester-long research mentorship project at IIT Kanpur's Brain and Cognitive Society, 2021.
