8.3 Experimental Design Principles#
Good experimental design is crucial for valid statistical inference. This section covers the fundamental principles that ensure reliable results.
The Three Core Principles of Experimental Design#
1. Randomization#
2. Replication#
3. Blocking (when needed)#
Randomization#
What is Randomization?#
Random assignment of experimental units to treatments.
Why Randomize?#
Controls for confounding: Balances unknown factors across groups
Justifies statistical inference: Allows use of probability theory
Reduces bias: Prevents systematic differences between groups
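The balancing effect of randomization can be checked by simulation. In the sketch below, a made-up lurking variable (subject age, hypothetical values) ends up with similar means in two randomly formed groups:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariate: ages of 40 subjects (a potential lurking variable)
ages = rng.integers(18, 65, size=40)

# Randomly split the subjects into two equal groups
idx = rng.permutation(40)
group_a, group_b = ages[idx[:20]], ages[idx[20:]]

# On average, random assignment balances the covariate across groups,
# even though we never looked at age when assigning
print(f"Mean age, group A: {group_a.mean():.1f}")
print(f"Mean age, group B: {group_b.mean():.1f}")
```

The same balancing happens simultaneously for every covariate, measured or not, which is exactly why randomization controls for confounding.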
Example: Without Randomization#
BAD: Assign morning students to Method A, afternoon to Method B
Problem: Time of day is confounded with teaching method! Any difference could be due to:
Teaching method
Student alertness
Teacher fatigue
etc.
Example: With Randomization#
```python
import numpy as np
import pandas as pd

np.random.seed(42)

# List of 40 students
students = [f'Student_{i:02d}' for i in range(1, 41)]

# Randomize assignment to 4 groups (10 each)
np.random.shuffle(students)
groups = {
    'Group_A': students[0:10],
    'Group_B': students[10:20],
    'Group_C': students[20:30],
    'Group_D': students[30:40]
}

print("Randomized Group Assignments")
print("=" * 60)
for group, members in groups.items():
    print(f"\n{group}:")
    print(', '.join(members))

# Create dataframe for analysis
treatments = []
for group in ['A', 'B', 'C', 'D']:
    treatments.extend([group] * 10)

df = pd.DataFrame({
    'Student': students,
    'Treatment': treatments
})

print("\n" + "=" * 60)
print("Treatment Assignment Summary:")
print(df['Treatment'].value_counts().sort_index())
```
Randomized Group Assignments
============================================================
Group_A:
Student_20, Student_17, Student_16, Student_27, Student_05, Student_13, Student_38, Student_28, Student_40, Student_07
Group_B:
Student_26, Student_10, Student_14, Student_32, Student_35, Student_09, Student_18, Student_25, Student_01, Student_34
Group_C:
Student_06, Student_12, Student_02, Student_30, Student_22, Student_03, Student_31, Student_37, Student_04, Student_36
Group_D:
Student_24, Student_33, Student_11, Student_23, Student_19, Student_21, Student_08, Student_15, Student_29, Student_39
============================================================
Treatment Assignment Summary:
Treatment
A 10
B 10
C 10
D 10
Name: count, dtype: int64
Replication#
What is Replication?#
Multiple independent observations for each treatment.
Why Replicate?#
Estimate variability: Cannot assess variation with n=1!
Increase precision: Standard error decreases in proportion to \(1/\sqrt{n}\)
Increase power: Better chance of detecting real effects
Check reproducibility: Ensure results aren’t flukes
How Much Replication?#
Power analysis determines required sample size:
```python
from scipy.stats import f as f_dist, ncf

def sample_size_anova(k, effect_size, alpha=0.05, power=0.80):
    """
    Estimate required sample size per group for one-way ANOVA.

    k: number of groups
    effect_size: Cohen's f (small=0.1, medium=0.25, large=0.4)
    alpha: significance level
    power: desired power
    """
    # This is a simplified approximation;
    # for precise calculation, use specialized software.
    for n in range(2, 1000):
        # Non-centrality parameter: lambda = f^2 * N = f^2 * k * n
        ncp = k * n * effect_size**2
        # Critical F-value
        df1 = k - 1
        df2 = k * (n - 1)
        f_crit = f_dist.ppf(1 - alpha, df1, df2)
        # Power from the non-central F distribution
        current_power = 1 - ncf.cdf(f_crit, df1, df2, ncp)
        if current_power >= power:
            return n
    return None

# Example: How many subjects per group?
k_groups = 4
effect_sizes = {'small': 0.1, 'medium': 0.25, 'large': 0.4}
cohens_f = "Cohen's f"

print("Required Sample Size per Group (Power = 0.80)")
print("=" * 60)
print(f"{'Effect Size':<15} {cohens_f:<12} {'n per group':<15} {'Total N':<10}")
print("-" * 60)
for name, f in effect_sizes.items():
    n = sample_size_anova(k_groups, f)
    total_n = k_groups * n
    print(f"{name:<15} {f:<12.2f} {n:<15d} {total_n:<10d}")

print("\nNote: These are approximations. Use specialized software for")
print("      precise power analysis (e.g., G*Power, R pwr package)")
```

Required Sample Size per Group (Power = 0.80)
============================================================
Effect Size     Cohen's f    n per group     Total N
------------------------------------------------------------
small           0.10         274             1096
medium          0.25         45              180
large           0.40         19              76
Note: These are approximations. Use specialized software for
      precise power analysis (e.g., G*Power, R pwr package)
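The note above points to specialized tools; in Python, statsmodels also provides a solver, FTestAnovaPower. A sketch (assuming statsmodels is installed; its numbers may differ slightly from the hand-rolled search above):

```python
from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()

# solve_power returns the TOTAL sample size N, not the per-group n
for name, f in [('small', 0.10), ('medium', 0.25), ('large', 0.40)]:
    n_total = analysis.solve_power(effect_size=f, alpha=0.05,
                                   power=0.80, k_groups=4)
    print(f"{name}: total N = {n_total:.0f} ({n_total / 4:.0f} per group)")
```

Note the difference in convention: `solve_power` works with the total N, so divide by the number of groups to compare with the per-group table above.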
Blocking#
What is Blocking?#
Grouping experimental units by a known source of variability, then randomizing within blocks.
When to Block?#
When you have a known, controllable source of variation:
Different batches of materials
Different time periods
Different locations
Matched subjects (twins, littermates)
Randomized Complete Block Design (RCBD)#
Block 1 (Field Location A): [Treat1] [Treat2] [Treat3] [Treat4]
Block 2 (Field Location B): [Treat4] [Treat1] [Treat3] [Treat2]
Block 3 (Field Location C): [Treat2] [Treat4] [Treat1] [Treat3]
Each treatment appears once in each block
Treatment order randomized within blocks
Removes variation due to location
Python Example: RCBD Analysis#
```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(42)

# Randomized Complete Block Design
# 4 treatments, 5 blocks (e.g., 5 fields)
# NOTE: these illustrative yields are exactly additive (treatment + block),
# so the residual error is essentially zero and the F-values come out huge.
data = {
    'Yield': [
        45, 52, 48, 50,   # Block 1
        40, 47, 43, 45,   # Block 2
        50, 57, 53, 55,   # Block 3
        38, 45, 41, 43,   # Block 4
        48, 55, 51, 53    # Block 5
    ],
    'Treatment': ['A', 'B', 'C', 'D'] * 5,
    'Block': [1]*4 + [2]*4 + [3]*4 + [4]*4 + [5]*4
}
df = pd.DataFrame(data)

print("Randomized Complete Block Design")
print("=" * 70)
print("\nData Summary:")
print(df.groupby(['Treatment', 'Block'])['Yield'].mean().unstack())

# Two-way ANOVA with blocking
try:
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    # Block is a factor, but we're not interested in its main effect;
    # we include it to account for block-to-block variation.
    model = ols('Yield ~ C(Treatment) + C(Block)', data=df).fit()
    anova_table = anova_lm(model, typ=2)

    print("\n" + "=" * 70)
    print("ANOVA Table (RCBD)")
    print("=" * 70)
    print(anova_table)

    p_treatment = anova_table.loc['C(Treatment)', 'PR(>F)']
    p_block = anova_table.loc['C(Block)', 'PR(>F)']

    print("\n" + "=" * 70)
    print("Interpretation")
    print("=" * 70)
    print(f"\nTreatment effect: p = {p_treatment:.4f}")
    if p_treatment < 0.05:
        print("  → Significant: Treatments differ")
    else:
        print("  → Not significant: No treatment effect detected")

    print(f"\nBlock effect: p = {p_block:.4f}")
    if p_block < 0.05:
        print("  → Blocking was effective (blocks differ significantly)")
        print("    Good decision to block!")
    else:
        print("  → Blocking was not necessary (blocks don't differ much)")

    # Compare: what if we hadn't blocked?
    # (Kept inside the try block because it needs p_treatment from above.)
    print("\n" + "=" * 70)
    print("Comparison: With vs. Without Blocking")
    print("=" * 70)

    # Without blocking (one-way ANOVA)
    treatment_groups = [df[df['Treatment'] == t]['Yield'].values
                        for t in ['A', 'B', 'C', 'D']]
    f_no_block, p_no_block = stats.f_oneway(*treatment_groups)

    print(f"\nWithout blocking: p = {p_no_block:.4f}")
    print(f"With blocking:    p = {p_treatment:.4f}")
    print(f"\nBlocking {'increased' if p_treatment < p_no_block else 'decreased'} power!")

except ImportError:
    print("Install statsmodels for RCBD analysis")
```
Randomized Complete Block Design
======================================================================
Data Summary:
Block 1 2 3 4 5
Treatment
A 45.0 40.0 50.0 38.0 48.0
B 52.0 47.0 57.0 45.0 55.0
C 48.0 43.0 53.0 41.0 51.0
D 50.0 45.0 55.0 43.0 53.0
======================================================================
ANOVA Table (RCBD)
======================================================================
sum_sq df F PR(>F)
C(Treatment) 1.337500e+02 3.0 3.839408e+28 3.749985e-168
C(Block) 4.192000e+02 4.0 9.025121e+28 9.442942e-171
Residual 1.393444e-26 12.0 NaN NaN
======================================================================
Interpretation
======================================================================
Treatment effect: p = 0.0000
→ Significant: Treatments differ
Block effect: p = 0.0000
→ Blocking was effective (blocks differ significantly)
Good decision to block!
======================================================================
Comparison: With vs. Without Blocking
======================================================================
Without blocking: p = 0.2068
With blocking: p = 0.0000
Blocking increased power!
Efficiency of Blocking#
Blocking increases power by removing block-to-block variation from the error term.
Relative Efficiency:
\[
RE = \frac{MSE_{\mathrm{CRD}}}{MSE_{\mathrm{RCBD}}}
\]
RE > 1 means blocking was beneficial.
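The relative efficiency can be computed directly from the two error terms. A minimal sketch on simulated data (the treatment effects, block spread, and noise level are all made up):

```python
import numpy as np

rng = np.random.default_rng(1)
t, b = 4, 5                            # treatments, blocks
treat_eff = np.array([0., 3., 1., 2.])
block_eff = rng.normal(0, 4, size=b)   # large block-to-block variation

# yield[i, j]: treatment i in block j, plus unit-level noise
y = treat_eff[:, None] + block_eff[None, :] + rng.normal(0, 1, size=(t, b))

# Standard two-way sums of squares for an RCBD
grand = y.mean()
ss_total = ((y - grand) ** 2).sum()
ss_treat = b * ((y.mean(axis=1) - grand) ** 2).sum()
ss_block = t * ((y.mean(axis=0) - grand) ** 2).sum()
ss_resid = ss_total - ss_treat - ss_block

mse_rcbd = ss_resid / ((t - 1) * (b - 1))       # error after removing blocks
mse_crd = (ss_block + ss_resid) / (t * b - t)   # blocks pooled into the error

rel_eff = mse_crd / mse_rcbd
print(f"Relative efficiency: {rel_eff:.2f}")    # > 1: blocking paid off
```

With large block effects, pooling the block variation into the error term inflates the CRD mean squared error, so the ratio comes out well above 1.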
Common Experimental Designs#
1. Completely Randomized Design (CRD)#
Structure: Random assignment to treatments, no blocking
When to use: Homogeneous experimental units
Analysis: One-way ANOVA
```python
import numpy as np
import pandas as pd

# Example: 30 subjects, 3 treatments
subjects = np.arange(30)
np.random.shuffle(subjects)
design_crd = pd.DataFrame({
    'Subject': subjects,
    'Treatment': ['A']*10 + ['B']*10 + ['C']*10
})
```
2. Randomized Complete Block Design (RCBD)#
Structure: Blocks contain all treatments, randomized within blocks
When to use: Known source of variation to control
Analysis: Two-way ANOVA (treatment + block)
3. Latin Square Design#
Structure: Control for two blocking factors simultaneously
Example: Different operators (rows), different machines (columns)
|      | Machine 1 | Machine 2 | Machine 3 | Machine 4 |
|------|-----------|-----------|-----------|-----------|
| Op 1 | A | B | C | D |
| Op 2 | B | C | D | A |
| Op 3 | C | D | A | B |
| Op 4 | D | A | B | C |
Each treatment appears once in each row AND column
Analysis: Three-way ANOVA (treatment + row block + column block)
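A Latin square like the one above can be built by cyclic shifts, then randomized by permuting whole rows and columns (which preserves the Latin property). A sketch:

```python
import numpy as np

treatments = np.array(['A', 'B', 'C', 'D'])
k = len(treatments)

# Cyclic construction: row r, column c gets treatment (r + c) mod k
square = treatments[(np.arange(k)[:, None] + np.arange(k)[None, :]) % k]

# Randomize by shuffling rows and columns; each treatment still
# appears exactly once in every row and every column
rng = np.random.default_rng(7)
square = square[rng.permutation(k)][:, rng.permutation(k)]

for row in square:
    print(' '.join(row))
```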
4. Factorial Design#
Structure: Multiple factors, all combinations tested
When to use: Study multiple factors and their interactions
Analysis: Multi-way ANOVA
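Laying out a full factorial is mostly bookkeeping: enumerate every combination of factor levels, replicate each cell, and randomize the run order. A sketch of a hypothetical 2×3 design (the factor names are made up):

```python
import itertools
import pandas as pd

dose = ['low', 'high']            # factor 1: 2 levels
timing = ['am', 'noon', 'pm']     # factor 2: 3 levels
reps = 4                          # replicates per cell

# All 2 x 3 = 6 treatment combinations, each replicated `reps` times
runs = [
    {'Dose': d, 'Timing': t, 'Rep': r}
    for d, t in itertools.product(dose, timing)
    for r in range(1, reps + 1)
]

# Randomize the run order so run sequence isn't confounded with treatment
design = pd.DataFrame(runs).sample(frac=1, random_state=42)

print(design.groupby(['Dose', 'Timing']).size())  # 4 runs in each cell
```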
Sample Size Considerations#
Factors Affecting Required Sample Size#
Effect size: Smaller effects need larger n
Variability: Higher σ needs larger n
Significance level: Smaller α needs larger n
Desired power: Higher power needs larger n
Number of groups: More groups need larger n
Rules of Thumb#
Pilot studies: n ≥ 5-10 per group (exploratory)
Main experiments: n ≥ 20-30 per group (standard)
Detect small effects: n ≥ 50+ per group
Always do power analysis before collecting data!
Practical Considerations#
Randomization Methods#
```python
import numpy as np
import pandas as pd

def randomize_assignment(subjects, treatments, method='complete'):
    """
    Randomize subject assignment to treatments.

    methods:
    - 'complete': Completely random (may give unequal groups)
    - 'balanced': Force equal group sizes (n must be divisible by k)
    """
    n = len(subjects)
    k = len(treatments)

    if method == 'complete':
        # Each subject's treatment drawn independently
        assignment = np.random.choice(treatments, size=n)
    elif method == 'balanced':
        # Build equal-sized groups, then shuffle the labels
        n_per_group = n // k
        assignment = []
        for treatment in treatments:
            assignment.extend([treatment] * n_per_group)
        np.random.shuffle(assignment)
    else:
        raise ValueError(f"Unknown method: {method}")

    return pd.DataFrame({
        'Subject': subjects,
        'Treatment': assignment
    })

# Example
subjects = [f'S{i:02d}' for i in range(1, 41)]
treatments = ['Control', 'TreatA', 'TreatB', 'TreatC']

assignment = randomize_assignment(subjects, treatments, method='balanced')
print(assignment.groupby('Treatment').size())
```
Treatment
Control 10
TreatA 10
TreatB 10
TreatC 10
dtype: int64
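Block (stratified) randomization, randomizing separately within each block as in the RCBD section, can be sketched as follows (the recruitment sites here are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
treatments = ['Control', 'TreatA']

# Hypothetical strata: 12 subjects recruited at three sites
subjects = pd.DataFrame({
    'Subject': [f'S{i:02d}' for i in range(1, 13)],
    'Site': ['North'] * 4 + ['Central'] * 4 + ['South'] * 4
})

# Randomize to balanced arms separately within each site
parts = []
for site, g in subjects.groupby('Site'):
    labels = np.repeat(treatments, len(g) // len(treatments))
    g = g.copy()
    g['Treatment'] = rng.permutation(labels)
    parts.append(g)
design = pd.concat(parts)

print(design.groupby(['Site', 'Treatment']).size())  # 2 per arm at each site
```

This guarantees each arm is represented equally at every site, so site differences cannot be confounded with treatment.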
Dealing with Missing Data#
Prevention:
Build in redundancy
Careful data collection protocols
Regular data checks
If it happens:
Document reasons for missingness
Use appropriate methods:
Complete case analysis (if MCAR - Missing Completely At Random)
Mixed models (handles unbalanced data well)
Multiple imputation (advanced)
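Complete-case analysis is straightforward in pandas; the key discipline is documenting how much is missing before dropping anything. A toy sketch:

```python
import numpy as np
import pandas as pd

# Toy dataset with two missing responses
df = pd.DataFrame({
    'Treatment': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Response': [5.1, np.nan, 6.2, 5.8, np.nan, 7.0]
})

# Document missingness before discarding anything
print(df['Response'].isna().sum(), "missing out of", len(df))

# Complete-case analysis: only valid if data are MCAR
complete = df.dropna(subset=['Response'])
print(complete.groupby('Treatment')['Response'].mean())
```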
Assumptions and Their Violations#
Independence#
Violated by:
Pseudoreplication (e.g., multiple measurements on same subject)
Spatial correlation
Temporal correlation
Solutions:
Proper randomization
Blocking
Mixed models (repeated measures)
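One simple guard against pseudoreplication is to aggregate to a single value per experimental unit before testing. A sketch with fabricated repeated measurements:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(5)

# 3 measurements per subject: measurements on the SAME subject are
# correlated, so they are not independent observations
rows = []
for subj in range(8):
    treat = 'A' if subj < 4 else 'B'
    base = rng.normal(10 if treat == 'A' else 12, 1)   # subject-level value
    for m in range(3):
        rows.append({'Subject': subj, 'Treatment': treat,
                     'Value': base + rng.normal(0, 0.2)})  # within-subject noise
df = pd.DataFrame(rows)

# WRONG: treating all 24 measurements as independent inflates n
# RIGHT: average within subject, then compare the 8 subject means
means = df.groupby(['Subject', 'Treatment'])['Value'].mean().reset_index()
a = means.loc[means['Treatment'] == 'A', 'Value']
b = means.loc[means['Treatment'] == 'B', 'Value']
t_stat, p = stats.ttest_ind(a, b)
print(f"t = {t_stat:.2f}, p = {p:.4f}  (n = 4 per group, honest)")
```

Mixed models go further by modeling both levels of variation explicitly, but averaging per unit is often a sound, conservative first step.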
Normality#
Check: Q-Q plots, Shapiro-Wilk test
If violated:
Transformations (log, sqrt, Box-Cox)
Non-parametric tests (Kruskal-Wallis)
Bootstrapping
Generalized linear models (GLM)
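A quick check-and-transform workflow might look like this (simulated right-skewed data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.lognormal(mean=0, sigma=1, size=60)   # strongly right-skewed sample

# Shapiro-Wilk: small p rejects normality
_, p_raw = stats.shapiro(x)
_, p_log = stats.shapiro(np.log(x))

print(f"Shapiro-Wilk on raw data: p = {p_raw:.4g}")
print(f"Shapiro-Wilk on log data: p = {p_log:.4g}")
# The log of a lognormal sample is normal, so the transform rescues normality
```

Pair the test with a Q-Q plot: with large n, Shapiro-Wilk flags even trivial departures, so the visual check matters.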
Homogeneity of Variance#
Check: Levene’s test, residual plots
If violated:
Transformations
Welch’s ANOVA
Weighted least squares
Generalized linear models
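A quick sketch with scipy alone: Levene's test on groups with deliberately unequal spread, followed by Welch's correction (shown here as Welch's t-test for two groups; Welch's ANOVA generalizes it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(10, 1, size=40)   # sd = 1
g2 = rng.normal(10, 5, size=40)   # sd = 5: variances clearly unequal

# Levene's test: small p means the equal-variance assumption fails
stat, p = stats.levene(g1, g2)
print(f"Levene's test: p = {p:.4g}")

if p < 0.05:
    # Welch's t-test does not pool the variances
    t_stat, p_welch = stats.ttest_ind(g1, g2, equal_var=False)
    print(f"Welch t-test: p = {p_welch:.4f}")
```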
Summary#
The Golden Rules#
Randomize whenever possible
Replicate adequately (power analysis!)
Block when there’s known variation
Balance your design (equal group sizes)
Check assumptions before and after analysis
Report effect sizes, not just p-values
Design Selection Guide#
| Situation | Design | Analysis |
|---|---|---|
| Homogeneous units, 1 factor | CRD | One-way ANOVA |
| Known blocking variable, 1 factor | RCBD | Two-way ANOVA |
| 2+ factors of interest | Factorial | Multi-way ANOVA |
| 2 blocking factors | Latin Square | Three-way ANOVA |
| Repeated measures | Within-subjects | Repeated measures ANOVA |
Common Mistakes to Avoid#
❌ Pseudo-replication (treating subsamples as independent)
❌ Forgetting to randomize
❌ Unbalanced designs (when avoidable)
❌ Ignoring interactions in factorial designs
❌ Not checking assumptions
❌ P-hacking and multiple testing
Before You Experiment#
✅ Define clear hypotheses
✅ Conduct power analysis
✅ Choose appropriate design
✅ Plan randomization procedure
✅ Determine data collection protocols
✅ Prepare analysis plan
Conclusion#
Good experimental design is the foundation of valid statistical inference. No amount of sophisticated analysis can rescue a poorly designed experiment!
Remember:
“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.” — R.A. Fisher
Next chapter: Inferring Probability Models from Data (Maximum Likelihood and Bayesian Inference)!