Introduction to Multiagent Environments¶
Multiagent environments contain several decision-makers whose actions influence one another. In single-agent search, the agent only needs to reason about the environment. In multiagent search, the environment includes other agents that may cooperate, compete, or act independently.
This changes the nature of planning: the agent is no longer looking for a fixed sequence of actions, but for a strategy that remains sensible under different responses from others.
Key Challenges:
Other agents may have different goals, information, and capabilities.
The effect of one action depends on how other agents respond.
Planning often requires contingency reasoning rather than a single straight-line plan.
Search can grow much faster because each turn introduces multiple possible reactions.
Why Multiagent Search Is Different¶
Consider Tic-Tac-Toe. In a single-agent puzzle, once you choose an action, the next state is determined by the environment rules. In Tic-Tac-Toe, after you place an X, the opponent chooses where to place O. Your move does not define one future; it defines a set of possible futures.
That is why multiagent search needs special tree structures:
At your turn, you choose among alternatives.
At an opponent’s turn, you must account for their possible responses.
In adversarial settings, a good move is one that still leads to a strong outcome even when the opponent plays well.
This is the central shift from plan finding to strategy selection.
Types of Multiagent Environments¶
Competitive agents: Agents pursue conflicting objectives, so one agent's gain may reduce the other's payoff. This is the setting for adversarial search. Examples: Chess, Go, Tic-Tac-Toe.
Independent agents: Agents act without explicit coordination, but their decisions still affect one another even if they do not communicate. Example: self-driving cars navigating traffic.
Cooperative agents: Agents share information or align their objectives to improve a joint outcome. Examples: collaborative robots in a warehouse, team-based rescue systems.
Keep these distinctions in mind throughout the chapter: the right search method depends on the relationship between the agents.
Uninformed Search: AND-OR Search Trees¶
AND-OR search trees model situations where outcomes depend both on our decisions and on the responses of other agents or the environment.
OR nodes represent states where our agent chooses one action from several alternatives.
AND nodes represent states where the next outcome depends on responses outside our control, so the plan must work for every relevant continuation.
In adversarial settings, OR nodes correspond to our move and AND nodes correspond to the opponent’s possible replies. In nondeterministic planning, AND nodes can also represent uncertain environmental outcomes.
The key idea is simple: at an OR node we need one good choice, while at an AND node we must be ready for all relevant outcomes.
AO-Search Algorithm¶
The AO-Search algorithm traverses an AND-OR tree to decide whether our agent has a strategy that guarantees reaching a goal state, no matter how the other side responds.
def ao_search(state, goal_test):
    # A goal state means this branch of the plan has succeeded
    if goal_test(state):
        return True
    if state.is_or_node():
        # OR node (our turn): we need to find just ONE successful action
        for action in state.get_actions():
            next_state = state.get_next(action)
            if ao_search(next_state, goal_test):
                return True  # Found a winning choice
        return False  # No winning action from this state
    else:
        # AND node (opponent's turn): we must handle ALL opponent responses
        for action in state.get_actions():
            next_state = state.get_next(action)
            if not ao_search(next_state, goal_test):
                return False  # Opponent has a reply that defeats us
        return True  # We win regardless of the opponent's move
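As a usage sketch, consider a hypothetical toy game (the RaceState class below is invented for illustration, not part of the algorithm): each side adds 1 or 2 to a running total, and our agent wins if it can force the total to reach exactly 3.

class RaceState:
    """Hypothetical toy game: alternate adding 1 or 2, reach exactly 3."""

    def __init__(self, value, our_turn=True):
        self.value = value
        self.our_turn = our_turn

    def is_or_node(self):
        return self.our_turn  # our turn = OR node, opponent's turn = AND node

    def get_actions(self):
        # Only moves that do not overshoot the target of 3
        return [a for a in (1, 2) if self.value + a <= 3]

    def get_next(self, action):
        return RaceState(self.value + action, not self.our_turn)

print(ao_search(RaceState(0), lambda s: s.value == 3))  # True: a win can be forced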
The AND-OR Tree for Multiagent Search¶
The figure below illustrates the main structural idea: our agent chooses at OR nodes, while the plan must remain valid across the outcomes represented at AND nodes.
[Figure: an AND-OR tree with OR nodes for our agent's choices and AND nodes for the outcomes the plan must cover]
Handling More than Two Agents¶
What happens in settings where there are more than two agents?
The AND-OR tree structure can be extended to accommodate multiple agents by alternating between AND and OR nodes for each agent’s turn. Each AND node would represent the choices of all other agents, while each OR node would represent the choices of the current agent. For example, in a three-agent game (A, B, C):
OR nodes: Represent the choices of agent A.
AND nodes: Represent the combined choices of agents B and C.
This structure allows us to systematically explore the possible outcomes based on the actions of all agents involved.
Combining the other agents' moves into a single joint action can blow up the number of possible actions at a state. The result is a tree with a very high branching factor, whose size grows rapidly with depth.
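A quick sketch of this blow-up, using hypothetical action lists for agents B and C: combining their moves into joint actions multiplies the branching factor at every AND node.

from itertools import product

# Hypothetical action sets for agents B and C at some state
actions_b = ["left", "right", "wait"]
actions_c = ["block", "pass"]

# Each AND node must consider every joint (B, C) action
joint_actions = list(product(actions_b, actions_c))
print(len(joint_actions))  # 6 = 3 * 2, and this grows multiplicatively with more agents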
Informed Search Trees with State-Specific Evaluation Functions¶
In large games such as chess, Go, or strategic board games, exploring the full search tree is usually impossible. The branching factor is too high and the game may last too long.
To make search practical, agents estimate the quality of non-terminal states using an evaluation function. Instead of asking, “Who wins from here if we search to the end?” the agent asks, “How promising does this state look right now?”
This idea enables depth-limited search: search several moves ahead, stop before the tree becomes too large, and score the frontier states using the evaluation function.
Core Terminology and Notation¶
Before moving to informed search, it is useful to fix the vocabulary used throughout the chapter:
State: a configuration of the game or environment.
Action: a legal move available to the current agent.
Terminal state: a state where the interaction ends and a final payoff is known.
Utility: a score where larger values are better for the maximizing agent.
Loss or cost: a score where smaller values are better for the acting agent.
Evaluation function: an estimate of how good a non-terminal state is when we cannot search to the end.
MAX / MIN: the two roles used in adversarial search, where MAX tries to increase value and MIN tries to reduce it.
Using these terms consistently prevents confusion when the chapter moves from AND-OR search to minimax and Monte Carlo Tree Search.
Designing State-Specific Evaluation Functions¶
Evaluation functions are built from features that matter in a domain. In chess, for example, a heuristic may combine material advantage, king safety, center control, and piece activity.
These functions are powerful but imperfect:
They summarize a rich position using a limited set of features.
They may miss tactical ideas that only appear deeper in the tree.
Their quality strongly influences the quality of the decisions produced by search.
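As an illustration of the chess example above, here is a minimal material-only evaluation. The piece values and board encoding are assumptions made for this sketch, not a production heuristic.

# Classical material values; a real engine would also weigh king safety,
# center control, piece activity, and more.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def evaluate(board):
    """Score a position from White's perspective.

    `board` is assumed to be a list of piece codes such as "P"
    (White pawn) or "p" (Black pawn); the encoding is illustrative.
    """
    score = 0
    for piece in board:
        if piece.upper() in PIECE_VALUES:
            value = PIECE_VALUES[piece.upper()]
            score += value if piece.isupper() else -value
    return score

# Example: White has a queen and a pawn, Black has a rook and two pawns
print(evaluate(["Q", "P", "r", "p", "p"]))  # 10 - 7 = 3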
In adversarial games, evaluation is often aligned with utility. If one player’s gain is the other’s loss, a single scoring function can often be interpreted from opposite perspectives. In more general multiagent settings, each agent may have its own objective, so the relationship between scores may be weaker or unknown.
Depth-Limited Search¶
Depth-limited search cuts off expansion after a chosen depth and then applies an evaluation function at the frontier.
This is a practical compromise:
Searching deeper usually improves decision quality.
Searching too deeply can be computationally infeasible.
Better evaluation functions can partly compensate for shallower search.
In practice, strong game-playing systems combine both ingredients: enough lookahead to capture tactical consequences, and enough evaluation quality to rank non-terminal positions sensibly.
Two-Agent Search with Evaluation Functions¶
The following simplified code uses loss values, where smaller numbers are better for the agent in question: both agents choose the continuation that minimizes their own future loss. Because the agents may have different objectives, each call returns the pair of losses (one per agent) for the subgame below it.
def search_agent_a(state, depth):
    """Agent A's turn: returns (loss_for_A, loss_for_B) of the subgame."""
    if depth == 0 or is_terminal(state):
        return loss_for_a(state), loss_for_b(state)
    best = (float("inf"), float("inf"))
    for next_state in successors(state):
        outcome = search_agent_b(next_state, depth - 1)
        if outcome[0] < best[0]:  # A minimizes its OWN loss (first slot)
            best = outcome
    return best

def search_agent_b(state, depth):
    """Agent B's turn: returns (loss_for_A, loss_for_B) of the subgame."""
    if depth == 0 or is_terminal(state):
        return loss_for_a(state), loss_for_b(state)
    best = (float("inf"), float("inf"))
    for next_state in successors(state):
        outcome = search_agent_a(next_state, depth - 1)
        if outcome[1] < best[1]:  # B minimizes its OWN loss (second slot)
            best = outcome
    return best

The important point is that each agent evaluates future states through its own objective function. In zero-sum games, this eventually leads to the cleaner MAX/MIN formulation used by minimax.
Informed Search: Minimax for Adversarial Search¶
Minimax is the standard algorithm for two-player, zero-sum, deterministic games with perfect information. It assumes both players act rationally and play optimally.
MAX tries to maximize the utility value.
MIN tries to minimize the utility value.
Terminal states provide the true payoff.
Non-terminal states inherit their value from the best move of the player whose turn it is.
This makes minimax a model of perfect play: it does not search for a lucky line, it searches for the best outcome that can still be guaranteed against a strong opponent.

Minimax: MaxPlayer and MinPlayer Algorithms¶
The algorithm is naturally expressed through two mutually recursive procedures.
MaxPlayer Algorithm
def max_player(state, depth):
    # Depth limit reached or game over: score the state directly
    if depth == 0 or is_terminal(state):
        return utility(state)
    best_value = float("-inf")
    for action in get_actions(state):
        next_state = get_next_state(state, action)
        # MIN replies next and will choose what is worst for MAX
        child_value = min_player(next_state, depth - 1)
        best_value = max(best_value, child_value)
    return best_value

MinPlayer Algorithm
def min_player(state, depth):
    # Depth limit reached or game over: score the state directly
    if depth == 0 or is_terminal(state):
        return utility(state)
    best_value = float("inf")
    for action in get_actions(state):
        next_state = get_next_state(state, action)
        # MAX replies next and will choose what is best for MAX
        child_value = max_player(next_state, depth - 1)
        best_value = min(best_value, child_value)
    return best_value

The alternation matters: MAX assumes MIN will answer with the move that is worst for MAX, and MIN assumes MAX will answer with the move that is best for MAX.
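These functions return values, not moves. To recover the move MAX should actually play, a small root-level wrapper (a sketch, reusing the same hypothetical helpers get_actions and get_next_state) compares the MIN-reply value of each root action:

def best_move_for_max(state, depth):
    # Choose the root action whose value under MIN's best reply is highest
    best_action, best_value = None, float("-inf")
    for action in get_actions(state):
        next_state = get_next_state(state, action)
        value = min_player(next_state, depth - 1)
        if value > best_value:
            best_action, best_value = action, value
    return best_action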
Minimax Example¶
Read the tree from the leaves upward:
The left MIN node chooses the smallest value among its children.
The middle MIN node does the same with its own children.
The right MIN node likewise takes its smallest child value.
The root MAX node then chooses the branch whose MIN value is largest.
So MAX selects the left branch, not because it has the largest leaf anywhere in the tree, but because it has the best value after accounting for MIN’s best possible reply.
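Because the figure's numbers are not reproduced above, here is a small self-contained check of the same propagation. The leaf values are assumptions chosen for illustration, not the figure's actual values:

# Depth-2 tree as nested lists: each inner list is a MIN node over leaves.
# Leaf values below are illustrative assumptions, not the figure's values.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

min_values = [min(branch) for branch in tree]  # each MIN picks its smallest leaf
root_value = max(min_values)                   # MAX picks the largest minimum
best_branch = min_values.index(root_value)

print(min_values)   # [3, 2, 2]
print(root_value)   # 3
print(best_branch)  # 0 -> MAX selects the left branch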
Alpha-Beta Pruning¶
Alpha-beta pruning improves minimax by avoiding branches that cannot affect the final choice. It returns the same answer as minimax, but often explores far fewer nodes.
α (alpha) is the best value MAX can guarantee so far along the current path.
β (beta) is the best value MIN can guarantee so far along the current path.
If at any point α ≥ β, the remaining descendants of that branch cannot change the decision and may be pruned.
A critical practical point is move ordering: alpha-beta is most effective when strong moves are examined early, because early good bounds create more opportunities to prune later branches.
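A minimal sketch of depth-limited alpha-beta, assuming the same hypothetical helpers (get_actions, get_next_state, is_terminal, utility) used in the minimax code above:

def alpha_beta(state, depth, alpha, beta, maximizing):
    if depth == 0 or is_terminal(state):
        return utility(state)
    if maximizing:
        value = float("-inf")
        for action in get_actions(state):
            child = get_next_state(state, action)
            value = max(value, alpha_beta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)  # best value MAX can guarantee so far
            if alpha >= beta:
                break                  # MIN will never allow this branch
        return value
    else:
        value = float("inf")
        for action in get_actions(state):
            child = get_next_state(state, action)
            value = min(value, alpha_beta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)    # best value MIN can guarantee so far
            if alpha >= beta:
                break                  # MAX will never allow this branch
        return value

# Initial call: alpha_beta(root, depth, float("-inf"), float("inf"), True)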
Alpha-Beta Pruning Quiz¶
Use the figure below to identify where pruning should occur and which nodes never need evaluation if the tree is explored from left to right.
[Figure: game tree for the alpha-beta pruning quiz, explored left to right]
A good way to solve this kind of problem is to track α and β after every visited child rather than trying to guess the answer visually.
Alpha-Beta Pruning Quiz Solution¶
[Figure: quiz solution marking the pruned branches]
The main lesson is not just where pruning occurs, but why it occurs: once a node is already worse than an alternative found earlier, further exploration cannot improve the final decision at the parent.
Monte Carlo Tree Search (MCTS)¶
Monte Carlo Tree Search is a simulation-based search method that estimates which moves are promising by repeated simulation rather than exhaustive evaluation. It is especially useful when the branching factor is very large or when a handcrafted evaluation function is difficult to design.
Why MCTS works well in practice:
It focuses computation on promising parts of the tree.
It is an anytime algorithm: more simulations usually improve the estimate.
It does not require searching every branch to the same depth.
The four phases of MCTS are:
Selection: Traverse the current tree using a rule such as UCT.
Expansion: Add a new child after reaching a frontier node.
Simulation: Run a rollout from that node to estimate outcome quality.
Backpropagation: Update visit counts and rewards along the path back to the root.
UCT Formula: The Upper Confidence Bound for Trees balances exploitation and exploration:

$$UCT(i) = \frac{w_i}{n_i} + c \sqrt{\frac{\ln N_i}{n_i}}$$

$w_i$: number of wins or accumulated reward for node $i$
$n_i$: number of visits to node $i$
$N_i$: number of visits to the parent of node $i$
$c$: exploration parameter
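As a concrete sketch of the formula, the scoring function below computes the UCT value of a child node; the arguments wins, visits, and parent_visits are assumptions about how the tree statistics might be stored.

import math

def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    # Unvisited children get an infinite score so they are tried first
    if visits == 0:
        return float("inf")
    exploitation = wins / visits                                   # w_i / n_i
    exploration = c * math.sqrt(math.log(parent_visits) / visits)  # c * sqrt(ln N_i / n_i)
    return exploitation + exploration

# Selection picks the child with the highest UCT score, e.g.:
# best_child = max(children, key=lambda ch: uct_score(ch.wins, ch.visits, node.visits))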
Compared with minimax, MCTS is often a better fit when evaluation is uncertain but simulations are cheap enough to provide useful statistical guidance.
Choosing the Right Search Method¶
| Situation | Recommended Method | Reason |
|---|---|---|
| Contingent planning with uncertain outcomes | AND-OR search / AO-search | The solution must account for multiple possible continuations. |
| Two-player adversarial game with manageable depth | Minimax | Gives the optimal decision under perfect play assumptions. |
| Same adversarial setting but deeper trees | Alpha-beta pruning | Preserves minimax correctness while reducing search effort. |
| Very large branching factor or weak heuristic design | MCTS | Uses simulation to focus computation on promising moves. |
This comparison matters in practice because no single search method dominates across all multiagent settings. The right algorithm depends on whether the environment is adversarial, uncertain, cooperative, or simply too large for exhaustive search.
Applications Beyond Board Games¶
Multiagent search is not limited to games. The same ideas appear in several AI application domains:
Autonomous driving: a vehicle must choose actions while anticipating how nearby drivers may react.
Cybersecurity: a defender plans under the assumption that an attacker will actively search for weaknesses.
Robotics: teams of robots may coordinate to complete a shared task, or adversarial robots may block one another.
Economics and auctions: agents place bids or adjust prices while reasoning about strategic competitors.
These applications remind us that multiagent search is fundamentally about decision-making in the presence of other intelligent actors.
Practice Prompts¶
Draw a small AND-OR tree for a robot trying to reach a goal while an opponent blocks paths.
Evaluate a depth-2 minimax tree by propagating leaf values from the bottom up.
Show where alpha-beta pruning occurs when children are explored from left to right.
Explain why MCTS may outperform minimax in games with huge branching factors.
These short exercises are useful because they force you to apply each idea rather than only recognize the terminology.
Summary¶
Multiagent search differs from single-agent search because good decisions must account for the responses of other agents.
AND-OR trees model contingent reasoning, where some branches represent our choices and others represent outcomes we must be prepared for.
Evaluation functions make large search problems tractable by estimating the quality of non-terminal states.
Minimax gives the correct move under perfect-play assumptions in two-player zero-sum games.
Alpha-beta pruning preserves the minimax answer while cutting unnecessary search.
MCTS is often preferable when the tree is too large for exhaustive reasoning and simulation is informative.
Real-world AI systems often combine search, heuristics, and learning rather than relying on only one technique.