In games, the opponent is part of the environment but with their own intelligence.
Stuart Russell & Peter Norvig
3.1 Introduction¶
Multiagent environments involve multiple entities making decisions that affect each other. This chapter focuses on adversarial search where agents have conflicting goals.
Single-Agent vs. Multi-Agent¶
Single-Agent:
Agent vs. nature
Complete control over outcomes
Solution is a sequence of actions
OR nodes only
Multi-Agent:
Agent vs. intelligent opponent(s)
Cannot control opponent’s actions
Solution is a strategy (contingent plan)
AND-OR nodes (agent chooses, opponent responds)
Game Theory Basics¶
Components:
Players: Decision-makers (usually 2)
States: Game configurations
Actions: Legal moves for each player
Terminal Test: Is the game over?
Utility Function: Payoff at terminal states
Game Types:
Zero-Sum: One player’s gain is another’s loss
Perfect Information: All state info visible (chess)
Deterministic: No chance involved
Turn-Taking: Players alternate moves
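To make these components concrete, here is a minimal sketch in Python using the simple game of Nim (players alternately remove 1-3 sticks; whoever takes the last stick wins). The class and method names are illustrative choices, not part of the chapter:

```python
# A minimal sketch of the game components using Nim: two players
# alternately remove 1-3 sticks; whoever takes the last stick wins.
# This is zero-sum, perfect-information, deterministic, and turn-taking.

class Nim:
    """State: number of sticks left plus whose turn it is (0 = MAX, 1 = MIN)."""

    def __init__(self, sticks=7, player=0):
        self.sticks = sticks
        self.player = player          # Players: two decision-makers

    def actions(self):                # Actions: legal moves in this state
        return [n for n in (1, 2, 3) if n <= self.sticks]

    def result(self, action):        # Successor state after a move
        return Nim(self.sticks - action, 1 - self.player)

    def is_terminal(self):           # Terminal test: is the game over?
        return self.sticks == 0

    def utility(self):               # Utility: payoff from MAX's perspective
        # In a terminal state, the player to move is the one who did NOT
        # take the last stick, so the previous player won.
        return 1 if self.player == 1 else -1
```

Because the game is zero-sum, a single utility number suffices: MAX's gain (+1) is exactly MIN's loss (-1).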
3.2 Minimax Algorithm¶
Minimax is the fundamental algorithm for two-player zero-sum games with perfect information.
Core Idea¶
Assumptions:
Both players play optimally
MAX tries to maximize utility
MIN tries to minimize utility
Perfect information (both see full state)
Minimax Value:

MINIMAX(s) =
    UTILITY(s)                                           if s is terminal
    max over a ∈ ACTIONS(s) of MINIMAX(RESULT(s, a))     if MAX is to move
    min over a ∈ ACTIONS(s) of MINIMAX(RESULT(s, a))     if MIN is to move
Minimax Algorithm¶
Algorithm: MINIMAX(state, maximizing_player)
begin
if state is terminal then
return UTILITY(state)
if maximizing_player then
value ← -∞
for each action in ACTIONS(state) do
successor ← RESULT(state, action)
value ← max(value, MINIMAX(successor, false))
return value
else
value ← +∞
for each action in ACTIONS(state) do
successor ← RESULT(state, action)
value ← min(value, MINIMAX(successor, true))
return value
end

Properties:
| Property | Value |
|---|---|
| Complete | Yes (finite games) |
| Optimal | Yes (vs optimal opponent) |
| Time | O(b^m) |
| Space | O(bm) |
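As a rough, self-contained sketch of the pseudocode above, the game tree can be represented directly as nested Python lists, where a leaf is a utility value for MAX and an inner node is a list of successor subtrees (this toy representation is an illustrative assumption, not the chapter's GameState interface):

```python
# Minimax over a toy game tree: a leaf is a number (utility for MAX),
# an inner node is a list of successor subtrees.

def minimax(node, maximizing):
    if not isinstance(node, list):        # terminal test: leaves are utilities
        return node
    if maximizing:
        return max(minimax(child, False) for child in node)
    else:
        return min(minimax(child, True) for child in node)

# Classic two-ply example: MAX to move at the root, three MIN nodes below.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = minimax(tree, True)               # 3
```

The MIN nodes back up values 3, 2, and 2, so MAX's best guarantee is 3: the first branch's worst case is better than any other branch's worst case.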
3.3 Alpha-Beta Pruning¶
Alpha-Beta pruning optimizes Minimax by eliminating branches that cannot influence the final decision.
Key Idea¶
Parameters:
α (alpha): Best value MAX can guarantee (lower bound)
β (beta): Best value MIN can guarantee (upper bound)
Pruning Condition: When α ≥ β, stop searching that branch.
Alpha-Beta Algorithm¶
Algorithm: ALPHA-BETA(state, α, β, maximizing)
begin
if state is terminal then
return UTILITY(state)
if maximizing then
value ← -∞
for each action in ACTIONS(state) do
successor ← RESULT(state, action)
value ← max(value, ALPHA-BETA(successor, α, β, false))
α ← max(α, value)
if β ≤ α then
break // Beta cutoff
return value
else
value ← +∞
for each action in ACTIONS(state) do
successor ← RESULT(state, action)
value ← min(value, ALPHA-BETA(successor, α, β, true))
β ← min(β, value)
if β ≤ α then
break // Alpha cutoff
return value
end

Effectiveness:
| Case | Complexity | Speedup |
|---|---|---|
| Best | O(b^(m/2)) | 2x depth |
| Average | O(b^(3m/4)) | Significant |
| Worst | O(b^m) | None |
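A sketch of the same nested-list toy-tree setup with alpha-beta added; the `visited` list is an illustrative extra that records evaluated leaves so the pruning is visible:

```python
import math

# Alpha-beta over toy game trees: a leaf is a number (utility for MAX),
# an inner node is a list of successor subtrees. `visited` collects the
# leaves actually evaluated, to make the pruning effect observable.

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    if visited is None:
        visited = []
    if not isinstance(node, list):        # terminal: leaf utility
        visited.append(node)
        return node, visited
    if maximizing:
        value = -math.inf
        for child in node:
            v, _ = alphabeta(child, alpha, beta, False, visited)
            value = max(value, v)
            alpha = max(alpha, value)
            if beta <= alpha:             # beta cutoff
                break
    else:
        value = math.inf
        for child in node:
            v, _ = alphabeta(child, alpha, beta, True, visited)
            value = min(value, v)
            beta = min(beta, value)
            if beta <= alpha:             # alpha cutoff
                break
    return value, visited

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value, visited = alphabeta(tree)          # value == 3
```

On this classic two-ply example, plain Minimax evaluates all 9 leaves, while alpha-beta evaluates only 7: after the second MIN node's first leaf (2), its value can be at most 2 < α = 3, so its remaining leaves are cut off.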
3.4 Monte Carlo Tree Search¶
MCTS is a learning-based approach that estimates game values through random simulations.
Four Phases¶
Selection: Navigate tree using UCT formula
Expansion: Add new node to tree
Simulation: Play random game to terminal state
Backpropagation: Update statistics in visited nodes
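As a rough sketch, the four phases combine into one short loop. This toy Python version plays Nim (remove 1-3 sticks, taking the last stick wins); the `Node` structure and game encoding are illustrative assumptions, and child selection uses the UCT rule defined in the next section:

```python
import math, random

# Toy MCTS on Nim: state = (sticks, player); player 0 moves first and
# whoever takes the last stick wins. All names here are illustrative.

def actions(state):
    sticks, _ = state
    return [n for n in (1, 2, 3) if n <= sticks]

def result(state, action):
    sticks, player = state
    return (sticks - action, 1 - player)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}                 # action -> Node
        self.wins, self.visits = 0.0, 0

def uct_child(node, c=1.41):
    return max(node.children.values(),
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(root_state, num_simulations=2000):
    root = Node(root_state)
    for _ in range(num_simulations):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal
        while node.children and len(node.children) == len(actions(node.state)):
            node = uct_child(node)
        # 2. Expansion: add one untried action
        untried = [a for a in actions(node.state) if a not in node.children]
        if untried:
            a = random.choice(untried)
            node.children[a] = Node(result(node.state, a), parent=node)
            node = node.children[a]
        # 3. Simulation: random playout to a terminal state
        state = node.state
        while actions(state):
            state = result(state, random.choice(actions(state)))
        winner = 1 - state[1]              # player who made the last move
        # 4. Backpropagation: a node's stats are kept from the perspective
        # of the player who moved INTO it
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.state[1]:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)
```

With 7 sticks the winning move is to take 3 (leaving a multiple of 4), and with enough simulations the most-visited root child converges to it.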
UCT Formula¶
Upper Confidence Bound applied to Trees:
UCT(n) = w(n)/N(n) + C · √( ln N(parent(n)) / N(n) )

Where:
w(n): wins recorded at node n
N(n): visits to node n
N(parent(n)): visits to n's parent
C: exploration constant (typically 1.41 ≈ √2)
First term: Exploitation (prefer high win rate)
Second term: Exploration (prefer less-visited nodes)
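The formula is a single line of code. A small sketch (the function name and sample numbers are illustrative):

```python
import math

def uct(wins, visits, parent_visits, c=1.41):
    """UCT score: exploitation term plus exploration bonus."""
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# A well-explored strong child vs. a barely-visited weak one:
strong = uct(wins=60, visits=100, parent_visits=200)  # high win rate, many visits
novel  = uct(wins=1,  visits=2,   parent_visits=200)  # low win rate, few visits
```

Here the rarely-visited child outscores the stronger one (`novel > strong`): its large exploration bonus outweighs its lower win rate, which is exactly the pressure the second term is designed to provide.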
MCTS Algorithm¶
Algorithm: MCTS(root_state, num_simulations)
begin
root ← new MCTSNode(root_state)
for i = 1 to num_simulations do
// 1. Selection
node ← root
while node is fully expanded and not terminal do
node ← SELECT-CHILD(node) using UCT
// 2. Expansion
if node is not terminal then
node ← EXPAND(node) with random untried action
// 3. Simulation
result ← SIMULATE(node) // Random playout
// 4. Backpropagation
while node is not null do
UPDATE-STATS(node, result)
node ← node.parent
return action of most visited child of root
end

Summary¶
Algorithm Comparison¶
| Algorithm | Type | Time | Optimality | Best For |
|---|---|---|---|---|
| Minimax | Deductive | O(b^m) | Yes | Small games |
| Alpha-Beta | Deductive | O(b^(m/2)) | Yes | Chess-like |
| MCTS | Inductive | Anytime | Converges | Large branching |
Key Takeaways¶
Minimax = Perfect Play: Assumes optimal opponents
Alpha-Beta = Smart Pruning: Doubles search depth
MCTS = Learning: No evaluation function needed
Hybrid Approaches: Modern AI combines search + learning
3.5 Implementation¶
Complete Python implementations are provided in a separate notebook.
Navigate to: Chapter 3: Multiagent Search - Implementation
The implementation notebook includes:
GameState abstract class
Minimax algorithm
Alpha-Beta pruning
MCTS implementation
Tic-Tac-Toe example
Connect Four game
Performance comparisons
Further Reading¶
Textbooks¶
Aggarwal, C. C. (2021). Artificial Intelligence: A Textbook. [Chapter 3]
Russell, S., & Norvig, P. (2020). AI: A Modern Approach (4th ed.). [Chapter 5]
Seminal Papers¶
Shannon, C. E. (1950). Programming a computer for playing chess.
Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo Planning.
Modern Breakthroughs¶
Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature.
Silver, D., et al. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.