Chapter 3: Multiagent Search

In games, the opponent is part of the environment but with their own intelligence.

Stuart Russell & Peter Norvig

3.1 Introduction

Multiagent environments involve multiple entities making decisions that affect each other. This chapter focuses on adversarial search where agents have conflicting goals.

Single-Agent vs. Multi-Agent

Single-Agent:

  • Agent vs. nature

  • Complete control over outcomes

  • Solution is a sequence of actions

  • OR nodes only

Multi-Agent:

  • Agent vs. intelligent opponent(s)

  • Cannot control opponent’s actions

  • Solution is a strategy (contingent plan)

  • AND-OR nodes (agent chooses, opponent responds)

Game Theory Basics

Components:

  1. Players: Decision-makers (usually 2)

  2. States: Game configurations

  3. Actions: Legal moves for each player

  4. Terminal Test: Is the game over?

  5. Utility Function: Payoff at terminal states

Game Types:

  • Zero-Sum: One player’s gain is another’s loss

  • Perfect Information: All state info visible (chess)

  • Deterministic: No chance involved

  • Turn-Taking: Players alternate moves
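
The five components above map naturally onto a small abstract interface. The sketch below is one possible shape for such an interface; the `GameState` class in the implementation notebook may be organized differently, and the method names here are chosen for illustration:

```python
from abc import ABC, abstractmethod

class GameState(ABC):
    """Minimal interface for a two-player, turn-taking, zero-sum game."""

    @abstractmethod
    def actions(self):
        """Legal moves for the player to move."""

    @abstractmethod
    def result(self, action):
        """The successor state after applying `action`."""

    @abstractmethod
    def is_terminal(self):
        """True if the game is over (the terminal test)."""

    @abstractmethod
    def utility(self):
        """Payoff at a terminal state, from MAX's perspective."""

    @abstractmethod
    def to_move(self):
        """Which player moves next, e.g. 'MAX' or 'MIN'."""
```

Any concrete game (Tic-Tac-Toe, Connect Four, Nim) can subclass this interface, and the search algorithms in the rest of the chapter then work unchanged against it.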

3.2 Minimax Algorithm

Minimax is the fundamental algorithm for two-player zero-sum games with perfect information.

Core Idea

Assumptions:

  1. Both players play optimally

  2. MAX tries to maximize utility

  3. MIN tries to minimize utility

  4. Perfect information (both see full state)

Minimax Value:

$$
\text{MINIMAX}(s) = \begin{cases} \text{UTILITY}(s) & \text{if } s \text{ is terminal} \\ \max_{a} \text{MINIMAX}(\text{RESULT}(s, a)) & \text{if it is MAX's turn} \\ \min_{a} \text{MINIMAX}(\text{RESULT}(s, a)) & \text{if it is MIN's turn} \end{cases}
$$

Minimax Algorithm

Algorithm: MINIMAX(state, maximizing_player)

begin
    if state is terminal then
        return UTILITY(state)
    
    if maximizing_player then
        value ← -∞
        for each action in ACTIONS(state) do
            successor ← RESULT(state, action)
            value ← max(value, MINIMAX(successor, false))
        return value
    else
        value ← +∞
        for each action in ACTIONS(state) do
            successor ← RESULT(state, action)
            value ← min(value, MINIMAX(successor, true))
        return value
end
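
The pseudocode above translates almost line for line into Python. This is a minimal sketch, assuming a state object with `is_terminal()`, `utility()`, `actions()`, and `result(action)` methods (hypothetical names, matching the interface sketched in Section 3.1):

```python
import math

def minimax(state, maximizing_player):
    """Exact minimax value of `state`, assuming optimal play by both sides.

    Assumes `state` provides: is_terminal(), utility() (from MAX's view),
    actions(), and result(action).
    """
    if state.is_terminal():
        return state.utility()

    if maximizing_player:
        value = -math.inf
        for action in state.actions():
            value = max(value, minimax(state.result(action), False))
        return value
    else:
        value = math.inf
        for action in state.actions():
            value = min(value, minimax(state.result(action), True))
        return value
```

Because the recursion visits every reachable state down to the terminals, this is only practical for small games, exactly as the O(b^m) time bound suggests.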

Properties:

| Property | Value |
| --- | --- |
| Complete | Yes (finite games) |
| Optimal | Yes (vs. an optimal opponent) |
| Time | O(b^m) |
| Space | O(bm) |

3.3 Alpha-Beta Pruning

Alpha-Beta pruning optimizes Minimax by eliminating branches that cannot influence the final decision.

Key Idea

Parameters:

  • α (alpha): Best value MAX can guarantee (lower bound)

  • β (beta): Best value MIN can guarantee (upper bound)

Pruning Condition: When α ≥ β, stop searching that branch.

Alpha-Beta Algorithm

Algorithm: ALPHA-BETA(state, α, β, maximizing)

begin
    if state is terminal then
        return UTILITY(state)
    
    if maximizing then
        value ← -∞
        for each action in ACTIONS(state) do
            successor ← RESULT(state, action)
            value ← max(value, ALPHA-BETA(successor, α, β, false))
            α ← max(α, value)
            if β ≤ α then
                break  // Beta cutoff
        return value
    else
        value ← +∞
        for each action in ACTIONS(state) do
            successor ← RESULT(state, action)
            value ← min(value, ALPHA-BETA(successor, α, β, true))
            β ← min(β, value)
            if β ≤ α then
                break  // Alpha cutoff
        return value
end
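
In Python, the change from plain minimax is just the two bound parameters and the cutoff checks. A minimal sketch, assuming the same hypothetical state interface as before (`is_terminal()`, `utility()`, `actions()`, `result(action)`):

```python
import math

def alpha_beta(state, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax with alpha-beta pruning: returns the same value as plain
    minimax, but skips branches that cannot affect the final decision."""
    if state.is_terminal():
        return state.utility()

    if maximizing:
        value = -math.inf
        for action in state.actions():
            value = max(value, alpha_beta(state.result(action), alpha, beta, False))
            alpha = max(alpha, value)
            if beta <= alpha:
                break  # beta cutoff: MIN would never allow this line
        return value
    else:
        value = math.inf
        for action in state.actions():
            value = min(value, alpha_beta(state.result(action), alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break  # alpha cutoff: MAX already has something better
        return value
```

Note that pruning never changes the value returned at the root; it only avoids work, which is why move ordering (searching promising moves first) determines how close the algorithm gets to the best case.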

Effectiveness:

| Case | Complexity | Speedup |
| --- | --- | --- |
| Best | O(b^(m/2)) | Roughly doubles effective search depth |
| Average | O(b^(3m/4)) | Significant |
| Worst | O(b^m) | None |

3.4 Monte Carlo Tree Search

Monte Carlo Tree Search (MCTS) is a learning-based approach that estimates game values through repeated random simulations, rather than through exhaustive deductive search.

Four Phases

  1. Selection: Navigate tree using UCT formula

  2. Expansion: Add new node to tree

  3. Simulation: Play random game to terminal state

  4. Backpropagation: Update statistics in visited nodes

UCT Formula

Upper Confidence Bound applied to Trees:

$$
\text{UCT}(n) = \frac{w_n}{n_n} + c \sqrt{\frac{\ln N_n}{n_n}}
$$

Where:

  • $w_n$: wins recorded at node $n$

  • $n_n$: visits to node $n$

  • $N_n$: visits to $n$'s parent

  • $c$: exploration constant (typically $\sqrt{2} \approx 1.41$)

First term: Exploitation (prefer high win rate)

Second term: Exploration (prefer less-visited)
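
The formula is small enough to state directly in code. A sketch, with parameter names matching the symbols above (unvisited nodes get $+\infty$ so they are always tried before any node is revisited, a common convention not spelled out in the formula):

```python
import math

def uct(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCT score of a child node: exploitation term + exploration term."""
    if visits == 0:
        return math.inf  # force at least one visit to every child
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

During selection, the child with the highest UCT score is chosen; as a node's visit count grows, the exploration term shrinks and the win rate dominates.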

MCTS Algorithm

Algorithm: MCTS(root_state, num_simulations)

begin
    root ← new MCTSNode(root_state)
    
    for i = 1 to num_simulations do
        // 1. Selection
        node ← root
        while node is fully expanded and not terminal do
            node ← SELECT-CHILD(node) using UCT
        
        // 2. Expansion
        if node is not terminal then
            node ← EXPAND(node) with random untried action
        
        // 3. Simulation
        result ← SIMULATE(node) // Random playout
        
        // 4. Backpropagation
        while node is not null do
            UPDATE-STATS(node, result)
            node ← node.parent
    
    return action of most visited child of root
end
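
The four phases can be sketched end to end on a toy game: one-pile Nim, where players alternately take 1 or 2 stones and whoever takes the last stone wins. This is an illustrative sketch only, not the notebook implementation; the `Node` class and helper names are invented here:

```python
import math
import random

class Node:
    """MCTS tree node for one-pile Nim. to_move: +1 = MAX, -1 = MIN."""
    def __init__(self, stones, to_move, parent=None):
        self.stones, self.to_move = stones, to_move
        self.parent, self.children = parent, {}   # action -> child Node
        self.wins, self.visits = 0.0, 0

    def actions(self):
        return [1, 2] if self.stones >= 2 else [1] if self.stones == 1 else []

    def fully_expanded(self):
        return all(a in self.children for a in self.actions())

def uct_select(node, c=math.sqrt(2)):
    # A child's win count is stored from the perspective of the player
    # moving at `node`, so a plain argmax over UCT is correct here.
    return max(node.children.values(),
               key=lambda ch: ch.wins / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(stones, num_simulations=2000):
    root = Node(stones, +1)
    for _ in range(num_simulations):
        # 1. Selection: descend while non-terminal and fully expanded
        node = root
        while node.stones > 0 and node.fully_expanded():
            node = uct_select(node)
        # 2. Expansion: add one child for a random untried action
        if node.stones > 0:
            a = random.choice([a for a in node.actions() if a not in node.children])
            node.children[a] = Node(node.stones - a, -node.to_move, parent=node)
            node = node.children[a]
        # 3. Simulation: random playout from the new node to a terminal state
        if node.stones == 0:
            winner = -node.to_move          # previous player took the last stone
        else:
            stones_left, to_move = node.stones, node.to_move
            while stones_left > 0:
                stones_left -= random.choice([1, 2]) if stones_left >= 2 else 1
                winner, to_move = to_move, -to_move
        # 4. Backpropagation: credit wins to the player who moved INTO each node
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.to_move:
                node.wins += 1
            node = node.parent
    # Final move choice: the most-visited child of the root
    return max(root.children, key=lambda a: root.children[a].visits)
```

On this game, positions that are multiples of 3 are losses for the player to move, so from 4 stones MCTS should converge on taking 1 (leaving the opponent at 3). Note the common refinement at the end: the recommended move is the *most visited* child rather than the highest-value one, since visit counts are a more robust statistic.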

Summary

Algorithm Comparison

| Algorithm | Type | Time | Optimality | Best For |
| --- | --- | --- | --- | --- |
| Minimax | Deductive | O(b^m) | Yes | Small games |
| Alpha-Beta | Deductive | O(b^(m/2)) best case | Yes | Chess-like games |
| MCTS | Inductive | Anytime | Converges | Large branching factors |

Key Takeaways

  1. Minimax = Perfect Play: Assumes optimal opponents

  2. Alpha-Beta = Smart Pruning: Doubles search depth

  3. MCTS = Learning: No evaluation function needed

  4. Hybrid Approaches: Modern AI combines search + learning

3.5 Implementation

Complete Python implementations are provided in a separate notebook.

Navigate to: Chapter 3: Multiagent Search - Implementation

The implementation notebook includes:

  • GameState abstract class

  • Minimax algorithm

  • Alpha-Beta pruning

  • MCTS implementation

  • Tic-Tac-Toe example

  • Connect Four game

  • Performance comparisons

Further Reading

Textbooks

  • Aggarwal, C. C. (2021). Artificial Intelligence: A Textbook. [Chapter 3]

  • Russell, S., & Norvig, P. (2020). AI: A Modern Approach (4th ed.). [Chapter 5]

Seminal Papers

  • Shannon, C. E. (1950). Programming a computer for playing chess.

  • Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo Planning.

Modern Breakthroughs

  • Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature.

  • Silver, D., et al. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.