Chapter 3: Multiagent Search - Implementation

Complete Python implementations of game-playing algorithms with interactive exploration.

Contents

  1. Framework

  2. Minimax

  3. Alpha-Beta Pruning

  4. MCTS

  5. Tic-Tac-Toe

  6. Connect Four

  7. Comparisons

  8. Interactive Playground

  9. Guided Mini-Labs

  10. Summary

1. Framework {#framework}

This section defines a common interface used by all game environments in the notebook. A shared interface is important because it lets us plug different games into the same search algorithms without rewriting minimax, alpha-beta, or MCTS each time.

What the framework standardizes

  • get_legal_actions(player): returns valid moves.

  • get_successor(action): returns the next game state after a move.

  • is_terminal(): checks whether the game is over.

  • utility(player): returns payoff from a player’s perspective.

  • current_player(): indicates whose turn it is.

After running the next two code cells, every algorithm in later sections can work with any class that follows this interface; a small toy example after the second cell shows the contract in action.

import numpy as np
import math
import random
import time
import inspect
from typing import List, Optional, Tuple, Any, Dict
from abc import ABC, abstractmethod
from copy import deepcopy

from IPython.display import display, clear_output, Markdown
import ipywidgets as widgets

print('Libraries imported (including ipywidgets for interactivity)')
Libraries imported (including ipywidgets for interactivity)
class GameState(ABC):
    @abstractmethod
    def get_legal_actions(self, player: int) -> List[Any]:
        pass
    
    @abstractmethod
    def get_successor(self, action: Any):
        pass
    
    @abstractmethod
    def is_terminal(self) -> bool:
        pass
    
    @abstractmethod
    def utility(self, player: int) -> float:
        pass
    
    @abstractmethod
    def current_player(self) -> int:
        pass
    
    def display(self):
        print(self)

print('GameState framework defined')
GameState framework defined
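
To make the contract concrete, here is a minimal sketch of a toy game that satisfies the interface. It is illustrative only (PileGame and its rules are not part of the chapter), but it is handy for smoke-testing the algorithms in later sections: two players alternately remove 1 or 2 items from a pile, and whoever takes the last item wins.

# Illustrative sketch: a toy pile game for smoke-testing the GameState interface.
# Players alternately take 1 or 2 items; taking the last item wins.
class PileGame(GameState):
    def __init__(self, remaining=4, player=1):
        self.remaining = remaining
        self.player = player

    def get_legal_actions(self, player):
        return [n for n in (1, 2) if n <= self.remaining]

    def get_successor(self, action):
        return PileGame(self.remaining - action, -self.player)

    def is_terminal(self):
        return self.remaining == 0

    def utility(self, player):
        # The turn has already flipped, so the last mover is -self.player.
        return 1.0 if -self.player == player else -1.0

    def current_player(self):
        return self.player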

2. Minimax {#minimax}

Minimax is the baseline adversarial search method for deterministic two-player zero-sum games. It assumes both players act optimally.

Why this implementation matters

  • It uses a fixed root-player perspective so returned values are consistent.

  • It supports optional evaluation functions for depth-limited search.

  • It tracks metrics (nodes, cutoff_evals) to make complexity visible.

When you run demonstrations later, compare minimax metrics against alpha-beta to understand pruning benefits.

def _call_eval(eval_fn, state, root_player):
    if eval_fn is None:
        return state.utility(root_player)
    try:
        return eval_fn(state, root_player)
    except TypeError:
        return eval_fn(state)


def minimax(state, depth=0, maximizing=True, eval_fn=None, max_depth=float('inf'),
            root_player=None, stats=None, return_metrics=False):
    """Minimax from a fixed root-player perspective with optional instrumentation."""
    if root_player is None:
        root_player = state.current_player()
    if stats is None:
        stats = {'nodes': 0, 'cutoff_evals': 0}

    stats['nodes'] += 1

    if state.is_terminal():
        value = state.utility(root_player)
        if depth == 0 and return_metrics:
            return value, None, stats
        return value, None

    if depth >= max_depth:
        stats['cutoff_evals'] += 1
        value = _call_eval(eval_fn, state, root_player)
        if depth == 0 and return_metrics:
            return value, None, stats
        return value, None

    player = state.current_player()
    actions = state.get_legal_actions(player)
    if not actions:
        value = state.utility(root_player)
        if depth == 0 and return_metrics:
            return value, None, stats
        return value, None

    if maximizing:
        best_value = float('-inf')
        best_action = None
        for action in actions:
            successor = state.get_successor(action)
            child_value, _ = minimax(
                successor, depth + 1, False, eval_fn, max_depth,
                root_player=root_player, stats=stats, return_metrics=False
            )
            if child_value > best_value:
                best_value = child_value
                best_action = action
    else:
        best_value = float('inf')
        best_action = None
        for action in actions:
            successor = state.get_successor(action)
            child_value, _ = minimax(
                successor, depth + 1, True, eval_fn, max_depth,
                root_player=root_player, stats=stats, return_metrics=False
            )
            if child_value < best_value:
                best_value = child_value
                best_action = action

    if depth == 0 and return_metrics:
        return best_value, best_action, stats
    return best_value, best_action


print('Minimax implemented with fixed-perspective scoring and metrics')
Minimax implemented with fixed-perspective scoring and metrics
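
As a quick sanity check, the illustrative PileGame sketch from Section 1 has a known solution: from a pile of 4 the first player forces a win by taking 1 (leaving a multiple of 3), and minimax should find exactly that.

# Sanity check on the illustrative PileGame: taking 1 from a pile of 4 is a forced win.
value, action, stats = minimax(PileGame(remaining=4), return_metrics=True)
print(value, action, stats['nodes'])  # expected value 1.0 and action 1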

3. Alpha-Beta Pruning {#alphabeta}

Alpha-beta pruning is an optimization of minimax that preserves the same final decision while exploring fewer nodes.

What to look for in this code

  • Same game-theoretic objective as minimax.

  • Bound tracking with alpha and beta.

  • Instrumentation for prunes so speedups are measurable.

In later test cells, focus on node count reduction and runtime improvement relative to minimax at the same depth.

def alpha_beta(state, depth=0, alpha=float('-inf'), beta=float('inf'),
               maximizing=True, eval_fn=None, max_depth=float('inf'),
               root_player=None, stats=None, return_metrics=False):
    """Alpha-beta from a fixed root-player perspective with instrumentation."""
    if root_player is None:
        root_player = state.current_player()
    if stats is None:
        stats = {'nodes': 0, 'cutoff_evals': 0, 'prunes': 0}

    stats['nodes'] += 1

    if state.is_terminal():
        value = state.utility(root_player)
        if depth == 0 and return_metrics:
            return value, None, stats
        return value, None

    if depth >= max_depth:
        stats['cutoff_evals'] += 1
        value = _call_eval(eval_fn, state, root_player)
        if depth == 0 and return_metrics:
            return value, None, stats
        return value, None

    player = state.current_player()
    actions = state.get_legal_actions(player)
    if not actions:
        value = state.utility(root_player)
        if depth == 0 and return_metrics:
            return value, None, stats
        return value, None

    if maximizing:
        value = float('-inf')
        best_action = None
        for action in actions:
            successor = state.get_successor(action)
            child_value, _ = alpha_beta(
                successor, depth + 1, alpha, beta, False, eval_fn, max_depth,
                root_player=root_player, stats=stats, return_metrics=False
            )
            if child_value > value:
                value = child_value
                best_action = action
            alpha = max(alpha, value)
            if beta <= alpha:
                stats['prunes'] += 1
                break
    else:
        value = float('inf')
        best_action = None
        for action in actions:
            successor = state.get_successor(action)
            child_value, _ = alpha_beta(
                successor, depth + 1, alpha, beta, True, eval_fn, max_depth,
                root_player=root_player, stats=stats, return_metrics=False
            )
            if child_value < value:
                value = child_value
                best_action = action
            beta = min(beta, value)
            if beta <= alpha:
                stats['prunes'] += 1
                break

    if depth == 0 and return_metrics:
        return value, best_action, stats
    return value, best_action


print('Alpha-Beta implemented with fixed-perspective scoring and metrics')
Alpha-Beta implemented with fixed-perspective scoring and metrics
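
A quick pruning check, again on the illustrative PileGame sketch from Section 1: with identical move ordering, both searches should agree on the value and the move while alpha-beta visits noticeably fewer nodes.

# Same decision, fewer nodes (illustrative PileGame; a pile of 13 is a first-player win).
g = PileGame(remaining=13)
v_mm, a_mm, s_mm = minimax(g, return_metrics=True)
v_ab, a_ab, s_ab = alpha_beta(g, return_metrics=True)
print(v_mm == v_ab, a_mm == a_ab, s_mm['nodes'], s_ab['nodes'])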

4. MCTS {#mcts}

Monte Carlo Tree Search (MCTS) is useful when full search is too expensive or heuristic design is difficult. Instead of exhaustive expansion, it estimates promising actions via repeated simulations.

Implementation highlights

  • UCT-based child selection balances exploration and exploitation (see the formula after this list).

  • Random rollouts estimate outcomes from frontier states.

  • Backpropagation accumulates statistics at visited nodes.

  • Metrics (simulations, expansions, tree_nodes) show search effort.

Unlike minimax and alpha-beta, MCTS can improve progressively as you increase the simulation budget.
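
For reference, the best_child method below implements the standard UCT score, with unvisited children treated as infinitely attractive so every action is tried at least once:

UCT(child) = wins / visits + c * sqrt(ln(parent_visits) / visits)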

class MCTSNode:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action
        self.children = []
        self.visits = 0
        self.wins = 0.0

    def is_fully_expanded(self):
        player = self.state.current_player()
        legal_actions = self.state.get_legal_actions(player)
        return len(self.children) == len(legal_actions)

    def best_child(self, c=1.41):
        parent_visits = max(1, self.visits)

        def uct(node):
            if node.visits == 0:
                return float('inf')
            exploit = node.wins / node.visits
            explore = c * math.sqrt(math.log(parent_visits) / node.visits)
            return exploit + explore

        return max(self.children, key=uct)

    def expand(self):
        tried = {child.action for child in self.children}
        player = self.state.current_player()
        legal = self.state.get_legal_actions(player)
        untried = [a for a in legal if a not in tried]
        action = random.choice(untried)
        next_state = self.state.get_successor(action)
        child = MCTSNode(next_state, self, action)
        self.children.append(child)
        return child


def mcts(root_state, num_sims=1000, c=1.41, verbose=True, return_stats=False):
    root = MCTSNode(root_state)
    root_player = root_state.current_player()
    stats = {'simulations': 0, 'expansions': 0, 'tree_nodes': 1}

    if root_state.is_terminal():
        if return_stats:
            return None, stats
        return None

    for _ in range(num_sims):
        node = root

        # Selection
        while node.is_fully_expanded() and not node.state.is_terminal() and node.children:
            node = node.best_child(c)

        # Expansion
        if not node.state.is_terminal() and not node.is_fully_expanded():
            node = node.expand()
            stats['expansions'] += 1
            stats['tree_nodes'] += 1

        # Simulation (rollout)
        sim_state = deepcopy(node.state)
        while not sim_state.is_terminal():
            player = sim_state.current_player()
            actions = sim_state.get_legal_actions(player)
            if not actions:
                break
            action = random.choice(actions)
            sim_state = sim_state.get_successor(action)

        result = sim_state.utility(root_player)
        stats['simulations'] += 1

        # Backpropagation: credit each node from the perspective of the
        # player who moved into it, so UCT at opponent nodes prefers
        # opponent-best replies rather than root-best ones.
        reward_root = (result + 1) / 2.0  # map utility in [-1, 1] to [0, 1]
        while node is not None:
            node.visits += 1
            if node.parent is not None:
                mover = node.parent.state.current_player()
                node.wins += reward_root if mover == root_player else 1.0 - reward_root
            node = node.parent

    best = max(root.children, key=lambda n: n.visits) if root.children else None
    best_action = best.action if best else None

    if verbose and best is not None:
        print(f'MCTS: {num_sims} sims, best action visits={best.visits}')

    if return_stats:
        return best_action, stats
    return best_action


print('MCTS implemented with UCT selection, alternating-perspective backprop, and metrics')
MCTS implemented with UCT selection, alternating-perspective backprop, and metrics
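
A quick behavioral check using the illustrative PileGame sketch from Section 1: from a pile of 7 the winning move is to take 1 (leaving a multiple of 3), and with a reasonable budget MCTS usually converges on it.

# Usually selects action 1 (the forced win) given enough simulations.
action, stats = mcts(PileGame(remaining=7), num_sims=2000, verbose=False, return_stats=True)
print(action, stats)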

5. Tic-Tac-Toe {#tictactoe}

Tic-Tac-Toe is a compact environment for validating algorithm correctness. Because the state space is small, it is ideal for comparing complete-search methods and simulation methods in a controlled setting.

Why it is useful pedagogically

  • Rules are simple and transparent.

  • Minimax can often search deeply enough for near-complete analysis.

  • Differences in runtime and node counts are easy to observe.

After defining the game class and evaluation function, the mini-lab demonstrates performance behavior for each algorithm.

class TicTacToe(GameState):
    def __init__(self, board=None, player=1):
        self.board = np.zeros((3,3), dtype=int) if board is None else board.copy()
        self.player = player
    
    def get_legal_actions(self, player):
        return [(i,j) for i in range(3) for j in range(3) if self.board[i,j]==0]
    
    def get_successor(self, action):
        new_state = TicTacToe(self.board, -self.player)
        new_state.board[action] = self.player
        return new_state
    
    def is_terminal(self):
        return self.get_winner() is not None or len(self.get_legal_actions(1))==0
    
    def get_winner(self):
        # Check rows, cols, diagonals
        for i in range(3):
            if abs(self.board[i,:].sum())==3: return self.board[i,0]
        for j in range(3):
            if abs(self.board[:,j].sum())==3: return self.board[0,j]
        if abs(np.trace(self.board))==3: return self.board[0,0]
        if abs(np.trace(np.fliplr(self.board)))==3: return self.board[0,2]
        return None
    
    def utility(self, player):
        winner = self.get_winner()
        if winner == player: return 1.0
        if winner == -player: return -1.0
        return 0.0
    
    def current_player(self):
        return self.player
    
    def display(self):
        symbols = {1:'X', -1:'O', 0:'.'}
        for i in range(3):
            print(' '.join(symbols[self.board[i,j]] for j in range(3)))
        print()


def ttt_eval(state, root_player=None):
    if root_player is None:
        root_player = state.current_player()
    board = state.board
    lines = []
    lines.extend([board[i, :] for i in range(3)])
    lines.extend([board[:, j] for j in range(3)])
    lines.append(np.array([board[i, i] for i in range(3)]))
    lines.append(np.array([board[i, 2 - i] for i in range(3)]))

    opp = -root_player
    score = 0.0
    for line in lines:
        root_count = np.sum(line == root_player)
        opp_count = np.sum(line == opp)
        empty = np.sum(line == 0)
        if root_count == 2 and empty == 1:
            score += 3
        if root_count == 1 and empty == 2:
            score += 1
        if opp_count == 2 and empty == 1:
            score -= 3
        if opp_count == 1 and empty == 2:
            score -= 1
    return score


print('Tic-Tac-Toe and ttt_eval implemented')
Tic-Tac-Toe and ttt_eval implemented
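
A worked example on a hypothetical position: X holds two of the top row with the third square open, and O sits in the center.

# Hypothetical position:  X X .  /  . O .  /  . . .
s = TicTacToe(np.array([[1, 1, 0], [0, -1, 0], [0, 0, 0]]), player=-1)
print(ttt_eval(s, root_player=1))
# expected 2.0 = +3 (X's open two-in-a-row) + 1 (X's open column)
#                - 1 - 1 (O's two open lines: middle row and anti-diagonal)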

Mini-Lab A: Depth vs Search Cost (Tic-Tac-Toe)

Hypothesis: deeper search improves decision quality but increases search cost. This lab measures that tradeoff directly.

How to read the output

  • Minimax nodes: total nodes visited without pruning.

  • Alpha-Beta nodes: total visited with pruning.

  • Alpha-Beta prunes: number of cut branches.

Discussion prompt

At equal depth, does alpha-beta preserve the same decision while reducing explored nodes? If yes, by how much as depth grows?

print('=== Tic-Tac-Toe Metrics Demo ===')
game = TicTacToe()
print('Initial:')
game.display()

print('Minimax (depth 9):')
start = time.time()
val, action, metrics = minimax(
    game, maximizing=True, eval_fn=ttt_eval, max_depth=9, return_metrics=True
)
print(f'Best move: {action}, value: {val:.2f}, time: {time.time()-start:.4f}s')
print(f"Nodes: {metrics['nodes']}, Cutoff evals: {metrics['cutoff_evals']}")

print('\nAlpha-Beta (depth 9):')
start = time.time()
val, action, metrics = alpha_beta(
    game, maximizing=True, eval_fn=ttt_eval, max_depth=9, return_metrics=True
)
print(f'Best move: {action}, value: {val:.2f}, time: {time.time()-start:.4f}s')
print(f"Nodes: {metrics['nodes']}, Cutoff evals: {metrics['cutoff_evals']}, Prunes: {metrics['prunes']}")

print('\nMCTS (1000 sims):')
start = time.time()
action, mstats = mcts(game, num_sims=1000, verbose=False, return_stats=True)
print(f'Best move: {action}, time: {time.time()-start:.4f}s')
print(f"Simulations: {mstats['simulations']}, Expansions: {mstats['expansions']}, Tree nodes: {mstats['tree_nodes']}")
=== Tic-Tac-Toe Metrics Demo ===
Initial:
. . .
. . .
. . .

Minimax (depth 9):
Best move: (0, 0), value: 0.00, time: 14.6891s
Nodes: 549946, Cutoff evals: 0

Alpha-Beta (depth 9):
Best move: (0, 0), value: 0.00, time: 0.6695s
Nodes: 18297, Cutoff evals: 0, Prunes: 8180

MCTS (1000 sims):
Best move: (1, 1), time: 0.3795s
Simulations: 1000, Expansions: 1000, Tree nodes: 1001
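
A note on scale: the 549,946 minimax nodes match the commonly cited size of the complete tic-tac-toe game tree, so depth 9 here amounts to exhaustive search. Alpha-beta reaches the same move and value with 18,297 nodes, roughly 549,946 / 18,297 ≈ 30x less work.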

6. Connect Four {#connect4}

Connect Four has a larger branching factor and deeper tactical dependencies than Tic-Tac-Toe, making it a more realistic stress test for search algorithms.

Why this section is important

  • Complete minimax search is usually infeasible from the initial state.

  • Depth-limited alpha-beta with a heuristic becomes necessary.

  • MCTS is often competitive because it allocates effort adaptively.

Use this section to see how algorithm behavior changes when the game becomes more complex.

class ConnectFour(GameState):
    def __init__(self, board=None, player=1):
        self.rows, self.cols = 6, 7
        self.board = np.zeros((self.rows,self.cols), dtype=int) if board is None else board.copy()
        self.player = player
    
    def get_legal_actions(self, player):
        return [c for c in range(self.cols) if self.board[0,c]==0]
    
    def get_successor(self, action):
        col = action
        new_state = ConnectFour(self.board, -self.player)
        for row in range(self.rows-1, -1, -1):
            if new_state.board[row,col] == 0:
                new_state.board[row,col] = self.player
                break
        return new_state
    
    def is_terminal(self):
        return self.check_win() or len(self.get_legal_actions(1))==0
    
    def check_win(self):
        # Check horizontal, vertical, diagonal for 4 in a row
        for r in range(self.rows):
            for c in range(self.cols-3):
                if abs(self.board[r,c:c+4].sum())==4 and self.board[r,c]!=0:
                    return True
        for r in range(self.rows-3):
            for c in range(self.cols):
                if abs(self.board[r:r+4,c].sum())==4 and self.board[r,c]!=0:
                    return True
        for r in range(self.rows-3):
            for c in range(self.cols-3):
                diag = [self.board[r+i,c+i] for i in range(4)]
                if abs(sum(diag))==4 and diag[0]!=0:
                    return True
        for r in range(3,self.rows):
            for c in range(self.cols-3):
                diag = [self.board[r-i,c+i] for i in range(4)]
                if abs(sum(diag))==4 and diag[0]!=0:
                    return True
        return False
    
    def utility(self, player):
        if self.check_win():
            # The turn already flipped in get_successor, so the mover who
            # completed four-in-a-row is -self.player.
            return 1.0 if self.player==-player else -1.0
        return 0.0
    
    def current_player(self):
        return self.player
    
    def display(self):
        symbols = {1:'X', -1:'O', 0:'.'}
        for i in range(self.rows):
            print(' '.join(symbols[self.board[i,j]] for j in range(self.cols)))
        print(' '.join(str(i) for i in range(self.cols)))
        print()

print('Connect Four implemented')
Connect Four implemented
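
A quick behavioral check of the drop mechanics and win detection, using a hypothetical sequence in which X stacks column 0 while O answers in column 6:

# X stacks four in column 0; O's replies in column 6 are arbitrary.
s = ConnectFour()
for col in [0, 6, 0, 6, 0, 6, 0]:
    s = s.get_successor(col)
print(s.check_win(), s.is_terminal(), s.utility(1))  # expected: True True 1.0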

Evaluation Function for Connect Four

This heuristic estimates state quality from the root player’s perspective.

Scoring intuition

  • Reward center control because central columns create more 4-in-a-row opportunities.

  • Reward 2-in-a-row and 3-in-a-row patterns with open extension spaces.

  • Penalize opponent threats to encourage blocking behavior.

The heuristic is intentionally simple so students can modify weights and observe strategy changes.

def connect4_eval(state, root_player=None):
    if root_player is None:
        root_player = state.current_player()
    opp = -root_player

    def evaluate_window(window):
        root_count = window.count(root_player)
        opp_count = window.count(opp)
        empty = window.count(0)
        if root_count == 4:
            return 100
        if root_count == 3 and empty == 1:
            return 8
        if root_count == 2 and empty == 2:
            return 2
        if opp_count == 3 and empty == 1:
            return -9
        if opp_count == 2 and empty == 2:
            return -2
        return 0

    score = 0
    center_col = state.cols // 2
    center_array = [int(state.board[r, center_col]) for r in range(state.rows)]
    score += center_array.count(root_player) * 3

    # Horizontal
    for r in range(state.rows):
        row = [int(x) for x in state.board[r, :]]
        for c in range(state.cols - 3):
            score += evaluate_window(row[c:c + 4])

    # Vertical
    for c in range(state.cols):
        col = [int(state.board[r, c]) for r in range(state.rows)]
        for r in range(state.rows - 3):
            score += evaluate_window(col[r:r + 4])

    # Diagonals
    for r in range(state.rows - 3):
        for c in range(state.cols - 3):
            diag1 = [int(state.board[r + i, c + i]) for i in range(4)]
            diag2 = [int(state.board[r + 3 - i, c + i]) for i in range(4)]
            score += evaluate_window(diag1)
            score += evaluate_window(diag2)

    return float(score)


print('Connect Four evaluation function defined')
Connect Four evaluation function defined
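
A minimal check of the center bonus: after X's single opening move in the center column, no window contains two same-color discs yet, so the only nonzero term is center control.

# One X disc in the center column: score = 1 * 3 = 3.0
state = ConnectFour().get_successor(3)
print(connect4_eval(state, root_player=1))  # expected: 3.0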

Mini-Lab B: Depth-Limited Search vs Simulation (Connect Four)

Hypothesis: as the branching factor grows, depth-limited alpha-beta and MCTS trade cost for quality in different ways.

How to read the output

  • Alpha-Beta nodes and cutoff evals: how quickly a depth-4 search exhausts its horizon and falls back on the heuristic.

  • MCTS simulations, expansions, and tree nodes: search effort measured in rollouts and tree growth rather than depth.

The repeated-runs stability experiment (same state, varying simulation budget, compare action frequency concentration) is implemented as mini_lab_mcts_stability in Section 9.

print('=== Connect Four Metrics Demo ===')
c4 = ConnectFour()
print('Initial:')
c4.display()

print('Alpha-Beta (depth 4):')
start = time.time()
val, action, metrics = alpha_beta(
    c4, maximizing=True, eval_fn=connect4_eval, max_depth=4, return_metrics=True
)
print(f'Best column: {action}, value: {val:.2f}, time: {time.time()-start:.4f}s')
print(f"Nodes: {metrics['nodes']}, Cutoff evals: {metrics['cutoff_evals']}, Prunes: {metrics['prunes']}")

print('\nMCTS (500 sims):')
start = time.time()
action, mstats = mcts(c4, num_sims=500, verbose=False, return_stats=True)
print(f'Best column: {action}, time: {time.time()-start:.4f}s')
print(f"Simulations: {mstats['simulations']}, Expansions: {mstats['expansions']}, Tree nodes: {mstats['tree_nodes']}")
=== Connect Four Metrics Demo ===
Initial:
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
0 1 2 3 4 5 6

Alpha-Beta (depth 4):
Best column: 3, value: 4.00, time: 0.2064s
Nodes: 737, Cutoff evals: 540, Prunes: 137

MCTS (500 sims):
Best column: 4, time: 2.0687s
Simulations: 500, Expansions: 500, Tree nodes: 501

7. Comparisons {#comparisons}

This section summarizes practical algorithm tradeoffs after the experiments.

What to compare

  • Decision quality under the same compute budget.

  • Runtime and node/simulation counts.

  • Sensitivity to heuristic quality and parameter settings.

Use these comparisons to justify algorithm choice by problem characteristics, not by preference.

print('=== Algorithm Comparison ===')
print()
print('Tic-Tac-Toe:')
print('  Minimax: Complete search, slow')
print('  Alpha-Beta: same decision, ~30x fewer nodes at depth 9 (see demo above)')
print('  MCTS: Fast, good for learning')
print()
print('Connect Four:')
print('  Alpha-Beta + eval: Limited depth (4-6 plies)')
print('  MCTS: Handles large branching factor')
print()
print('Key Insights:')
print('  - Alpha-Beta essential for deep search')
print('  - MCTS great when eval function hard to design')
print('  - Modern: Combine both (AlphaZero)')
=== Algorithm Comparison ===

Tic-Tac-Toe:
  Minimax: Complete search, slow
  Alpha-Beta: same decision, ~30x fewer nodes at depth 9 (see demo above)
  MCTS: Fast, good for learning

Connect Four:
  Alpha-Beta + eval: Limited depth (4-6 plies)
  MCTS: Handles large branching factor

Key Insights:
  - Alpha-Beta essential for deep search
  - MCTS great when eval function hard to design
  - Modern: Combine both (AlphaZero)

8. Interactive Playground {#interactive-playground}

Use this playground to run controlled experiments without editing code manually.

Controls

  • Game: switch between Tic-Tac-Toe and Connect Four.

  • Algorithm: choose Minimax, Alpha-Beta, or MCTS.

  • Depth / Sims / UCT c: tune search horizon and simulation behavior.

Suggested experiments

  • Compare Minimax vs Alpha-Beta at equal depth.

  • Increase MCTS simulations and observe action stability.

  • Test Connect Four with shallow vs deeper search to see heuristic effects.

Each run reports the chosen action, metrics, and resulting next state so decisions are easy to inspect.

def _state_to_text(state):
    symbols = {1: 'X', -1: 'O', 0: '.'}
    if isinstance(state, TicTacToe):
        rows = [' '.join(symbols[state.board[i, j]] for j in range(3)) for i in range(3)]
        return '\n'.join(rows)
    if isinstance(state, ConnectFour):
        rows = [' '.join(symbols[state.board[i, j]] for j in range(state.cols)) for i in range(state.rows)]
        footer = ' '.join(str(i) for i in range(state.cols))
        return '\n'.join(rows + [footer])
    return str(state)


def run_algorithm_once(state, algorithm, depth, sims, c):
    if algorithm == 'Minimax':
        value, action, metrics = minimax(
            state, maximizing=True, eval_fn=ttt_eval if isinstance(state, TicTacToe) else connect4_eval,
            max_depth=depth, return_metrics=True
        )
        return {'algorithm': algorithm, 'action': action, 'value': value, 'metrics': metrics}

    if algorithm == 'Alpha-Beta':
        value, action, metrics = alpha_beta(
            state, maximizing=True, eval_fn=ttt_eval if isinstance(state, TicTacToe) else connect4_eval,
            max_depth=depth, return_metrics=True
        )
        return {'algorithm': algorithm, 'action': action, 'value': value, 'metrics': metrics}

    action, metrics = mcts(state, num_sims=sims, c=c, verbose=False, return_stats=True)
    return {'algorithm': algorithm, 'action': action, 'value': None, 'metrics': metrics}


def interactive_playground():
    game_dd = widgets.Dropdown(options=['Tic-Tac-Toe', 'Connect Four'], value='Tic-Tac-Toe', description='Game:')
    algo_dd = widgets.Dropdown(options=['Minimax', 'Alpha-Beta', 'MCTS'], value='Alpha-Beta', description='Algorithm:')
    depth_slider = widgets.IntSlider(value=5, min=1, max=8, step=1, description='Depth:')
    sims_slider = widgets.IntSlider(value=1000, min=100, max=5000, step=100, description='Sims:')
    c_slider = widgets.FloatSlider(value=1.41, min=0.5, max=2.5, step=0.05, description='UCT c:')
    run_btn = widgets.Button(description='Run Experiment', button_style='success')
    out = widgets.Output()

    def on_run(_):
        with out:
            clear_output(wait=True)
            state = TicTacToe() if game_dd.value == 'Tic-Tac-Toe' else ConnectFour()
            print('Initial State:')
            print(_state_to_text(state))
            print()

            result = run_algorithm_once(
                state, algo_dd.value, depth_slider.value, sims_slider.value, c_slider.value
            )

            print(f"Algorithm: {result['algorithm']}")
            print(f"Suggested action: {result['action']}")
            if result['value'] is not None:
                print(f"Estimated value: {result['value']:.3f}")
            print(f"Metrics: {result['metrics']}")

            if result['action'] is not None:
                next_state = state.get_successor(result['action'])
                print('\nState after suggested action:')
                print(_state_to_text(next_state))

    run_btn.on_click(on_run)

    controls = widgets.VBox([
        widgets.HBox([game_dd, algo_dd]),
        widgets.HBox([depth_slider, sims_slider, c_slider]),
        run_btn
    ])
    display(controls, out)


interactive_playground()
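
If widgets are unavailable (for example, in a static render of this page), the same experiment can be driven programmatically through the helper defined above:

# Headless run of the playground helper (no widgets needed).
res = run_algorithm_once(ConnectFour(), 'Alpha-Beta', depth=4, sims=0, c=1.41)
print(res['action'], res['metrics'])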

Play Against the AI (Tic-Tac-Toe)

Play as X against an AI opponent to build intuition about search behavior.

How this helps learning

  • You can feel the difference between deterministic search (Alpha-Beta) and simulation-based play (MCTS).

  • Parameter changes immediately affect move quality and style.

  • Game outcomes provide fast feedback on algorithm settings.

Try multiple rounds with different settings and note when AI choices change.

def launch_tictactoe_vs_ai():
    algo_dd = widgets.Dropdown(options=['Alpha-Beta', 'MCTS'], value='Alpha-Beta', description='AI:')
    depth_slider = widgets.IntSlider(value=6, min=1, max=9, step=1, description='Depth:')
    sims_slider = widgets.IntSlider(value=1200, min=200, max=5000, step=100, description='Sims:')
    restart_btn = widgets.Button(description='Restart', button_style='warning')
    status = widgets.HTML(value='<b>Your turn: play X</b>')

    board_buttons = [widgets.Button(layout=widgets.Layout(width='48px', height='48px')) for _ in range(9)]
    board_grid = widgets.GridBox(
        board_buttons,
        layout=widgets.Layout(grid_template_columns='repeat(3, 52px)', gap='4px')
    )

    game = {'state': TicTacToe(player=1), 'over': False}

    def update_board():
        state = game['state']
        symbols = {1: 'X', -1: 'O', 0: ' '}
        for idx, btn in enumerate(board_buttons):
            r, c = divmod(idx, 3)
            val = int(state.board[r, c])
            btn.description = symbols[val]
            btn.disabled = (val != 0) or game['over']

    def finish_if_terminal():
        state = game['state']
        if not state.is_terminal():
            return False
        game['over'] = True
        winner = state.get_winner()
        if winner == 1:
            status.value = '<b>Game Over: You win</b>'
        elif winner == -1:
            status.value = '<b>Game Over: AI wins</b>'
        else:
            status.value = '<b>Game Over: Draw</b>'
        update_board()
        return True

    def ai_move():
        state = game['state']
        if algo_dd.value == 'Alpha-Beta':
            _, action, _ = alpha_beta(
                state, maximizing=True, eval_fn=ttt_eval, max_depth=depth_slider.value, return_metrics=True
            )
        else:
            action, _ = mcts(state, num_sims=sims_slider.value, c=1.41, verbose=False, return_stats=True)

        if action is not None:
            game['state'] = state.get_successor(action)

    def on_human_click(index):
        def _handler(_):
            if game['over']:
                return
            state = game['state']
            r, c = divmod(index, 3)
            if state.board[r, c] != 0:
                return

            # Human move (X)
            game['state'] = state.get_successor((r, c))
            update_board()
            if finish_if_terminal():
                return

            status.value = '<b>AI thinking...</b>'
            ai_move()
            update_board()
            finish_if_terminal()
            if not game['over']:
                status.value = '<b>Your turn: play X</b>'

        return _handler

    def on_restart(_):
        game['state'] = TicTacToe(player=1)
        game['over'] = False
        status.value = '<b>Your turn: play X</b>'
        update_board()

    for i, btn in enumerate(board_buttons):
        btn.on_click(on_human_click(i))
    restart_btn.on_click(on_restart)

    update_board()
    controls = widgets.HBox([algo_dd, depth_slider, sims_slider, restart_btn])
    display(widgets.VBox([controls, status, board_grid]))


launch_tictactoe_vs_ai()

9. Guided Mini-Labs {#guided-mini-labs}

These mini-labs connect theory to observable behavior through short, repeatable experiments.

Learning goals

  • Relate depth to computational cost.

  • Relate simulation budget to action stability.

  • Validate conceptual understanding with quick checks.

Run labs in order for the clearest progression: cost analysis, stability analysis, then concept verification.

def mini_lab_depth_cost():
    start_depth = widgets.IntSlider(value=2, min=1, max=6, description='Start depth:')
    end_depth = widgets.IntSlider(value=6, min=1, max=9, description='End depth:')
    run_btn = widgets.Button(description='Run Depth Lab', button_style='info')
    out = widgets.Output()

    def on_run(_):
        with out:
            clear_output(wait=True)
            d1, d2 = start_depth.value, end_depth.value
            if d2 < d1:
                d1, d2 = d2, d1

            state = TicTacToe()
            print('Depth | Minimax nodes | Alpha-Beta nodes | Alpha-Beta prunes')
            print('-' * 62)
            for d in range(d1, d2 + 1):
                _, _, mm = minimax(state, max_depth=d, eval_fn=ttt_eval, return_metrics=True)
                _, _, ab = alpha_beta(state, max_depth=d, eval_fn=ttt_eval, return_metrics=True)
                print(f"{d:5d} | {mm['nodes']:13d} | {ab['nodes']:16d} | {ab['prunes']:17d}")

            print('\nReflection: Does alpha-beta keep the same decision quality while reducing search?')

    run_btn.on_click(on_run)
    display(widgets.VBox([widgets.HBox([start_depth, end_depth]), run_btn, out]))


mini_lab_depth_cost()
def mini_lab_mcts_stability():
    sims = widgets.IntSlider(value=200, min=100, max=3000, step=100, description='Simulations:')
    repeats = widgets.IntSlider(value=5, min=3, max=20, description='Repeats:')
    run_btn = widgets.Button(description='Run MCTS Lab', button_style='info')
    out = widgets.Output()

    def on_run(_):
        with out:
            clear_output(wait=True)
            state = TicTacToe()
            actions = []
            print('Run | Suggested action')
            print('-' * 24)
            for i in range(repeats.value):
                action, stats = mcts(state, num_sims=sims.value, c=1.41, verbose=False, return_stats=True)
                actions.append(action)
                print(f"{i+1:3d} | {action}")

            counts = {}
            for a in actions:
                counts[a] = counts.get(a, 0) + 1
            print('\nAction frequency:', counts)
            print('Reflection: higher simulation budgets usually make the action distribution less noisy.')

    run_btn.on_click(on_run)
    display(widgets.VBox([widgets.HBox([sims, repeats]), run_btn, out]))


mini_lab_mcts_stability()
def mini_lab_quick_check():
    q1 = widgets.RadioButtons(
        options=['Yes', 'No'],
        description='Q1:',
        value='Yes'
    )
    q2 = widgets.RadioButtons(
        options=['Yes', 'No'],
        description='Q2:',
        value='Yes'
    )
    check_btn = widgets.Button(description='Check Answers', button_style='success')
    out = widgets.Output()

    prompt1 = widgets.HTML('Q1: With the same depth, move ordering, and tie-breaking, alpha-beta returns the same value and move as minimax.')
    prompt2 = widgets.HTML('Q2: Increasing MCTS simulations usually reduces randomness in selected moves.')

    def on_check(_):
        with out:
            clear_output(wait=True)
            score = 0
            if q1.value == 'Yes':
                score += 1
                print('Q1 correct.')
            else:
                print('Q1 incorrect: alpha-beta is a pruning optimization of minimax.')

            if q2.value == 'Yes':
                score += 1
                print('Q2 correct.')
            else:
                print('Q2 incorrect: larger simulation budgets usually stabilize estimates.')

            print(f'\nScore: {score}/2')

    check_btn.on_click(on_check)
    display(widgets.VBox([prompt1, q1, prompt2, q2, check_btn, out]))


mini_lab_quick_check()

10. Summary {#summary}

This notebook combines algorithm implementation, measurable diagnostics, and interactive learning activities.

What you implemented

  • Minimax with fixed root-player perspective and instrumentation

  • Alpha-Beta pruning with node and prune metrics

  • MCTS with simulation and tree-growth metrics

What you explored

  • Runtime and search-cost tradeoffs across algorithms

  • Heuristic-based depth-limited play in Connect Four

  • Interactive parameter tuning and human-vs-AI gameplay

What to take away

  • Alpha-Beta can match minimax decisions with far less search.

  • MCTS quality generally improves with larger simulation budgets.

  • Good evaluation functions are critical in large game spaces.

  • Algorithm choice should follow problem structure and compute budget.

For theory, see: Chapter 3: Multiagent Search