8.3 Experimental Design Principles#
Good experimental design is crucial for valid statistical inference. This section covers the fundamental principles that ensure reliable results.
The Three Core Principles of Experimental Design#
1. Randomization#
2. Replication#
3. Blocking (when needed)#
Randomization#
What is Randomization?#
Random assignment of experimental units to treatments.
Why Randomize?#
Controls for confounding: Balances unknown factors across groups
Justifies statistical inference: Allows use of probability theory
Reduces bias: Prevents systematic differences between groups
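The balancing effect of randomization can be checked by simulation. In the sketch below, a made-up lurking variable (subject age, hypothetical values) ends up with similar means in two randomly formed groups:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariate: ages of 40 subjects (a potential lurking variable)
ages = rng.integers(18, 65, size=40)

# Randomly split the subjects into two equal groups
idx = rng.permutation(40)
group_a, group_b = ages[idx[:20]], ages[idx[20:]]

# On average, random assignment balances the covariate across groups,
# even though we never looked at age when assigning
print(f"Mean age, group A: {group_a.mean():.1f}")
print(f"Mean age, group B: {group_b.mean():.1f}")
```

The same balancing happens simultaneously for every covariate, measured or not, which is exactly why randomization controls for confounding.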
Example: Without Randomization#
BAD: Assign morning students to Method A, afternoon to Method B
Problem: Time of day is confounded with teaching method! Any difference could be due to:
Teaching method
Student alertness
Teacher fatigue
etc.
Example: With Randomization#
```python
import numpy as np
import pandas as pd

np.random.seed(42)

# List of 40 students
students = [f'Student_{i:02d}' for i in range(1, 41)]

# Randomize assignment to 4 groups (10 each)
np.random.shuffle(students)
groups = {
    'Group_A': students[0:10],
    'Group_B': students[10:20],
    'Group_C': students[20:30],
    'Group_D': students[30:40]
}

print("Randomized Group Assignments")
print("=" * 60)
for group, members in groups.items():
    print(f"\n{group}:")
    print(', '.join(members))

# Create dataframe for analysis
treatments = []
for group in ['A', 'B', 'C', 'D']:
    treatments.extend([group] * 10)

df = pd.DataFrame({
    'Student': students,
    'Treatment': treatments
})

print("\n" + "=" * 60)
print("Treatment Assignment Summary:")
print(df['Treatment'].value_counts().sort_index())
```
Randomized Group Assignments
============================================================
Group_A:
Student_20, Student_17, Student_16, Student_27, Student_05, Student_13, Student_38, Student_28, Student_40, Student_07
Group_B:
Student_26, Student_10, Student_14, Student_32, Student_35, Student_09, Student_18, Student_25, Student_01, Student_34
Group_C:
Student_06, Student_12, Student_02, Student_30, Student_22, Student_03, Student_31, Student_37, Student_04, Student_36
Group_D:
Student_24, Student_33, Student_11, Student_23, Student_19, Student_21, Student_08, Student_15, Student_29, Student_39
============================================================
Treatment Assignment Summary:
Treatment
A 10
B 10
C 10
D 10
Name: count, dtype: int64
Replication#
What is Replication?#
Multiple independent observations for each treatment.
Why Replicate?#
Estimate variability: Cannot assess variation with n=1!
Increase precision: Standard error decreases in proportion to \(1/\sqrt{n}\)
Increase power: Better chance of detecting real effects
Check reproducibility: Ensure results aren’t flukes
How Much Replication?#
Power analysis determines required sample size:
```python
from scipy.stats import f as f_dist, ncf

def sample_size_anova(k, effect_size, alpha=0.05, power=0.80):
    """
    Estimate required sample size per group for one-way ANOVA.

    k: number of groups
    effect_size: Cohen's f (small=0.1, medium=0.25, large=0.4)
    alpha: significance level
    power: desired power
    """
    # This is a simplified approximation;
    # for precise calculation, use specialized software.
    for n in range(2, 1000):
        # Non-centrality parameter: lambda = f^2 * N = f^2 * k * n
        ncp = k * n * effect_size**2
        # Critical F-value
        df1 = k - 1
        df2 = k * (n - 1)
        f_crit = f_dist.ppf(1 - alpha, df1, df2)
        # Power from the non-central F distribution
        current_power = 1 - ncf.cdf(f_crit, df1, df2, ncp)
        if current_power >= power:
            return n
    return None

# Example: How many subjects per group?
k_groups = 4
effect_sizes = {'small': 0.1, 'medium': 0.25, 'large': 0.4}
cohens_f = "Cohen's f"

print("Required Sample Size per Group (Power = 0.80)")
print("=" * 60)
print(f"{'Effect Size':<15} {cohens_f:<12} {'n per group':<15} {'Total N':<10}")
print("-" * 60)
for name, f in effect_sizes.items():
    n = sample_size_anova(k_groups, f)
    total_n = k_groups * n
    print(f"{name:<15} {f:<12.2f} {n:<15d} {total_n:<10d}")

print("\nNote: These are approximations. Use specialized software for")
print("      precise power analysis (e.g., G*Power, R pwr package)")
```

Required Sample Size per Group (Power = 0.80)
============================================================
Effect Size     Cohen's f    n per group     Total N
------------------------------------------------------------
small           0.10         274             1096
medium          0.25         45              180
large           0.40         19              76
Note: These are approximations. Use specialized software for
      precise power analysis (e.g., G*Power, R pwr package)
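The note above points to specialized tools; in Python, statsmodels also provides a solver, FTestAnovaPower. A sketch (assuming statsmodels is installed; its numbers may differ slightly from the hand-rolled search above):

```python
from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()

# solve_power returns the TOTAL sample size N, not the per-group n
for name, f in [('small', 0.10), ('medium', 0.25), ('large', 0.40)]:
    n_total = analysis.solve_power(effect_size=f, alpha=0.05,
                                   power=0.80, k_groups=4)
    print(f"{name}: total N = {n_total:.0f} ({n_total / 4:.0f} per group)")
```

Note the difference in convention: `solve_power` works with the total N, so divide by the number of groups to compare with the per-group table above.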
Blocking#
What is Blocking?#
Grouping experimental units by a known source of variability, then randomizing within blocks.
When to Block?#
When you have a known, controllable source of variation:
Different batches of materials
Different time periods
Different locations
Matched subjects (twins, littermates)
Randomized Complete Block Design (RCBD)#
Block 1 (Field Location A): [Treat1] [Treat2] [Treat3] [Treat4]
Block 2 (Field Location B): [Treat4] [Treat1] [Treat3] [Treat2]
Block 3 (Field Location C): [Treat2] [Treat4] [Treat1] [Treat3]
Each treatment appears once in each block
Treatment order randomized within blocks
Removes variation due to location
Python Example: RCBD Analysis#
```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(42)

# Randomized Complete Block Design
# 4 treatments, 5 blocks (e.g., 5 fields)
# NOTE: these illustrative yields are exactly additive (treatment + block),
# so the residual error is essentially zero and the F-values come out huge.
data = {
    'Yield': [
        45, 52, 48, 50,   # Block 1
        40, 47, 43, 45,   # Block 2
        50, 57, 53, 55,   # Block 3
        38, 45, 41, 43,   # Block 4
        48, 55, 51, 53    # Block 5
    ],
    'Treatment': ['A', 'B', 'C', 'D'] * 5,
    'Block': [1]*4 + [2]*4 + [3]*4 + [4]*4 + [5]*4
}
df = pd.DataFrame(data)

print("Randomized Complete Block Design")
print("=" * 70)
print("\nData Summary:")
print(df.groupby(['Treatment', 'Block'])['Yield'].mean().unstack())

# Two-way ANOVA with blocking
try:
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    # Block is a factor, but we're not interested in its main effect;
    # we include it to account for block-to-block variation.
    model = ols('Yield ~ C(Treatment) + C(Block)', data=df).fit()
    anova_table = anova_lm(model, typ=2)

    print("\n" + "=" * 70)
    print("ANOVA Table (RCBD)")
    print("=" * 70)
    print(anova_table)

    p_treatment = anova_table.loc['C(Treatment)', 'PR(>F)']
    p_block = anova_table.loc['C(Block)', 'PR(>F)']

    print("\n" + "=" * 70)
    print("Interpretation")
    print("=" * 70)
    print(f"\nTreatment effect: p = {p_treatment:.4f}")
    if p_treatment < 0.05:
        print("  → Significant: Treatments differ")
    else:
        print("  → Not significant: No treatment effect detected")

    print(f"\nBlock effect: p = {p_block:.4f}")
    if p_block < 0.05:
        print("  → Blocking was effective (blocks differ significantly)")
        print("    Good decision to block!")
    else:
        print("  → Blocking was not necessary (blocks don't differ much)")

    # Compare: what if we hadn't blocked?
    # (Kept inside the try block because it needs p_treatment from above.)
    print("\n" + "=" * 70)
    print("Comparison: With vs. Without Blocking")
    print("=" * 70)

    # Without blocking (one-way ANOVA)
    treatment_groups = [df[df['Treatment'] == t]['Yield'].values
                        for t in ['A', 'B', 'C', 'D']]
    f_no_block, p_no_block = stats.f_oneway(*treatment_groups)

    print(f"\nWithout blocking: p = {p_no_block:.4f}")
    print(f"With blocking:    p = {p_treatment:.4f}")
    print(f"\nBlocking {'increased' if p_treatment < p_no_block else 'decreased'} power!")

except ImportError:
    print("Install statsmodels for RCBD analysis")
```
Randomized Complete Block Design
======================================================================
Data Summary:
Block 1 2 3 4 5
Treatment
A 45.0 40.0 50.0 38.0 48.0
B 52.0 47.0 57.0 45.0 55.0
C 48.0 43.0 53.0 41.0 51.0
D 50.0 45.0 55.0 43.0 53.0
======================================================================
ANOVA Table (RCBD)
======================================================================
sum_sq df F PR(>F)
C(Treatment) 1.337500e+02 3.0 3.839408e+28 3.749985e-168
C(Block) 4.192000e+02 4.0 9.025121e+28 9.442942e-171
Residual 1.393444e-26 12.0 NaN NaN
======================================================================
Interpretation
======================================================================
Treatment effect: p = 0.0000
→ Significant: Treatments differ
Block effect: p = 0.0000
→ Blocking was effective (blocks differ significantly)
Good decision to block!
======================================================================
Comparison: With vs. Without Blocking
======================================================================
Without blocking: p = 0.2068
With blocking: p = 0.0000
Blocking increased power!
Efficiency of Blocking#
Blocking increases power by removing block-to-block variation from the error term.
Relative Efficiency:
\[
RE = \frac{MSE_{\mathrm{CRD}}}{MSE_{\mathrm{RCBD}}}
\]
RE > 1 means blocking was beneficial.
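The relative efficiency can be computed directly from the two error terms. A minimal sketch on simulated data (the treatment effects, block spread, and noise level are all made up):

```python
import numpy as np

rng = np.random.default_rng(1)
t, b = 4, 5                            # treatments, blocks
treat_eff = np.array([0., 3., 1., 2.])
block_eff = rng.normal(0, 4, size=b)   # large block-to-block variation

# yield[i, j]: treatment i in block j, plus unit-level noise
y = treat_eff[:, None] + block_eff[None, :] + rng.normal(0, 1, size=(t, b))

# Standard two-way sums of squares for an RCBD
grand = y.mean()
ss_total = ((y - grand) ** 2).sum()
ss_treat = b * ((y.mean(axis=1) - grand) ** 2).sum()
ss_block = t * ((y.mean(axis=0) - grand) ** 2).sum()
ss_resid = ss_total - ss_treat - ss_block

mse_rcbd = ss_resid / ((t - 1) * (b - 1))       # error after removing blocks
mse_crd = (ss_block + ss_resid) / (t * b - t)   # blocks pooled into the error

rel_eff = mse_crd / mse_rcbd
print(f"Relative efficiency: {rel_eff:.2f}")    # > 1: blocking paid off
```

With large block effects, pooling the block variation into the error term inflates the CRD mean squared error, so the ratio comes out well above 1.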
Common Experimental Designs#
1. Completely Randomized Design (CRD)#
Structure: Random assignment to treatments, no blocking
When to use: Homogeneous experimental units
Analysis: One-way ANOVA
```python
import numpy as np
import pandas as pd

# Example: 30 subjects, 3 treatments
subjects = np.arange(30)
np.random.shuffle(subjects)
design_crd = pd.DataFrame({
    'Subject': subjects,
    'Treatment': ['A']*10 + ['B']*10 + ['C']*10
})
```
2. Randomized Complete Block Design (RCBD)#
Structure: Blocks contain all treatments, randomized within blocks
When to use: Known source of variation to control
Analysis: Two-way ANOVA (treatment + block)
3. Latin Square Design#
Structure: Control for two blocking factors simultaneously
Example: Different operators (rows), different machines (columns)
|      | Machine 1 | Machine 2 | Machine 3 | Machine 4 |
|------|-----------|-----------|-----------|-----------|
| Op 1 | A | B | C | D |
| Op 2 | B | C | D | A |
| Op 3 | C | D | A | B |
| Op 4 | D | A | B | C |
Each treatment appears once in each row AND column
Analysis: Three-way ANOVA (treatment + row block + column block)
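A Latin square like the one above can be built by cyclic shifts, then randomized by permuting whole rows and columns (which preserves the Latin property). A sketch:

```python
import numpy as np

treatments = np.array(['A', 'B', 'C', 'D'])
k = len(treatments)

# Cyclic construction: row r, column c gets treatment (r + c) mod k
square = treatments[(np.arange(k)[:, None] + np.arange(k)[None, :]) % k]

# Randomize by shuffling rows and columns; each treatment still
# appears exactly once in every row and every column
rng = np.random.default_rng(7)
square = square[rng.permutation(k)][:, rng.permutation(k)]

for row in square:
    print(' '.join(row))
```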
4. Factorial Design#
Structure: Multiple factors, all combinations tested
When to use: Study multiple factors and their interactions
Analysis: Multi-way ANOVA
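Laying out a full factorial is mostly bookkeeping: enumerate every combination of factor levels, replicate each cell, and randomize the run order. A sketch of a hypothetical 2×3 design (the factor names are made up):

```python
import itertools
import pandas as pd

dose = ['low', 'high']            # factor 1: 2 levels
timing = ['am', 'noon', 'pm']     # factor 2: 3 levels
reps = 4                          # replicates per cell

# All 2 x 3 = 6 treatment combinations, each replicated `reps` times
runs = [
    {'Dose': d, 'Timing': t, 'Rep': r}
    for d, t in itertools.product(dose, timing)
    for r in range(1, reps + 1)
]

# Randomize the run order so run sequence isn't confounded with treatment
design = pd.DataFrame(runs).sample(frac=1, random_state=42)

print(design.groupby(['Dose', 'Timing']).size())  # 4 runs in each cell
```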
Sample Size Considerations#
Factors Affecting Required Sample Size#
Effect size: Smaller effects need larger n
Variability: Higher σ needs larger n
Significance level: Smaller α needs larger n
Desired power: Higher power needs larger n
Number of groups: More groups need larger n
Rules of Thumb#
Pilot studies: n ≥ 5-10 per group (exploratory)
Main experiments: n ≥ 20-30 per group (standard)
Detect small effects: n ≥ 50+ per group
Always do power analysis before collecting data!
Practical Considerations#
Randomization Methods#
```python
import numpy as np
import pandas as pd

def randomize_assignment(subjects, treatments, method='complete'):
    """
    Randomize subject assignment to treatments.

    methods:
    - 'complete': Completely random (may give unequal groups)
    - 'balanced': Force equal group sizes (n must be divisible by k)
    """
    n = len(subjects)
    k = len(treatments)

    if method == 'complete':
        # Each subject's treatment drawn independently
        assignment = np.random.choice(treatments, size=n)
    elif method == 'balanced':
        # Build equal-sized groups, then shuffle the labels
        n_per_group = n // k
        assignment = []
        for treatment in treatments:
            assignment.extend([treatment] * n_per_group)
        np.random.shuffle(assignment)
    else:
        raise ValueError(f"Unknown method: {method}")

    return pd.DataFrame({
        'Subject': subjects,
        'Treatment': assignment
    })

# Example
subjects = [f'S{i:02d}' for i in range(1, 41)]
treatments = ['Control', 'TreatA', 'TreatB', 'TreatC']

assignment = randomize_assignment(subjects, treatments, method='balanced')
print(assignment.groupby('Treatment').size())
```
Treatment
Control 10
TreatA 10
TreatB 10
TreatC 10
dtype: int64
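Block (stratified) randomization, randomizing separately within each block as in the RCBD section, can be sketched as follows (the recruitment sites here are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
treatments = ['Control', 'TreatA']

# Hypothetical strata: 12 subjects recruited at three sites
subjects = pd.DataFrame({
    'Subject': [f'S{i:02d}' for i in range(1, 13)],
    'Site': ['North'] * 4 + ['Central'] * 4 + ['South'] * 4
})

# Randomize to balanced arms separately within each site
parts = []
for site, g in subjects.groupby('Site'):
    labels = np.repeat(treatments, len(g) // len(treatments))
    g = g.copy()
    g['Treatment'] = rng.permutation(labels)
    parts.append(g)
design = pd.concat(parts)

print(design.groupby(['Site', 'Treatment']).size())  # 2 per arm at each site
```

This guarantees each arm is represented equally at every site, so site differences cannot be confounded with treatment.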
Dealing with Missing Data#
Prevention:
Build in redundancy
Careful data collection protocols
Regular data checks
If it happens:
Document reasons for missingness
Use appropriate methods:
Complete case analysis (if MCAR - Missing Completely At Random)
Mixed models (handles unbalanced data well)
Multiple imputation (advanced)
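Complete-case analysis is straightforward in pandas; the key discipline is documenting how much is missing before dropping anything. A toy sketch:

```python
import numpy as np
import pandas as pd

# Toy dataset with two missing responses
df = pd.DataFrame({
    'Treatment': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Response': [5.1, np.nan, 6.2, 5.8, np.nan, 7.0]
})

# Document missingness before discarding anything
print(df['Response'].isna().sum(), "missing out of", len(df))

# Complete-case analysis: only valid if data are MCAR
complete = df.dropna(subset=['Response'])
print(complete.groupby('Treatment')['Response'].mean())
```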
Assumptions and Their Violations#
Independence#
Violated by:
Pseudoreplication (e.g., multiple measurements on same subject)
Spatial correlation
Temporal correlation
Solutions:
Proper randomization
Blocking
Mixed models (repeated measures)
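One simple guard against pseudoreplication is to aggregate to a single value per experimental unit before testing. A sketch with fabricated repeated measurements:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(5)

# 3 measurements per subject: measurements on the SAME subject are
# correlated, so they are not independent observations
rows = []
for subj in range(8):
    treat = 'A' if subj < 4 else 'B'
    base = rng.normal(10 if treat == 'A' else 12, 1)   # subject-level value
    for m in range(3):
        rows.append({'Subject': subj, 'Treatment': treat,
                     'Value': base + rng.normal(0, 0.2)})  # within-subject noise
df = pd.DataFrame(rows)

# WRONG: treating all 24 measurements as independent inflates n
# RIGHT: average within subject, then compare the 8 subject means
means = df.groupby(['Subject', 'Treatment'])['Value'].mean().reset_index()
a = means.loc[means['Treatment'] == 'A', 'Value']
b = means.loc[means['Treatment'] == 'B', 'Value']
t_stat, p = stats.ttest_ind(a, b)
print(f"t = {t_stat:.2f}, p = {p:.4f}  (n = 4 per group, honest)")
```

Mixed models go further by modeling both levels of variation explicitly, but averaging per unit is often a sound, conservative first step.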
Normality#
Check: Q-Q plots, Shapiro-Wilk test
If violated:
Transformations (log, sqrt, Box-Cox)
Non-parametric tests (Kruskal-Wallis)
Bootstrapping
Generalized linear models (GLM)
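A quick check-and-transform workflow might look like this (simulated right-skewed data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.lognormal(mean=0, sigma=1, size=60)   # strongly right-skewed sample

# Shapiro-Wilk: small p rejects normality
_, p_raw = stats.shapiro(x)
_, p_log = stats.shapiro(np.log(x))

print(f"Shapiro-Wilk on raw data: p = {p_raw:.4g}")
print(f"Shapiro-Wilk on log data: p = {p_log:.4g}")
# The log of a lognormal sample is normal, so the transform rescues normality
```

Pair the test with a Q-Q plot: with large n, Shapiro-Wilk flags even trivial departures, so the visual check matters.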
Homogeneity of Variance#
Check: Levene’s test, residual plots
If violated:
Transformations
Welch’s ANOVA
Weighted least squares
Generalized linear models
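A quick sketch with scipy alone: Levene's test on groups with deliberately unequal spread, followed by Welch's correction (shown here as Welch's t-test for two groups; Welch's ANOVA generalizes it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(10, 1, size=40)   # sd = 1
g2 = rng.normal(10, 5, size=40)   # sd = 5: variances clearly unequal

# Levene's test: small p means the equal-variance assumption fails
stat, p = stats.levene(g1, g2)
print(f"Levene's test: p = {p:.4g}")

if p < 0.05:
    # Welch's t-test does not pool the variances
    t_stat, p_welch = stats.ttest_ind(g1, g2, equal_var=False)
    print(f"Welch t-test: p = {p_welch:.4f}")
```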
Summary#
The Golden Rules#
Randomize whenever possible
Replicate adequately (power analysis!)
Block when there’s known variation
Balance your design (equal group sizes)
Check assumptions before and after analysis
Report effect sizes, not just p-values
Design Selection Guide#
| Situation | Design | Analysis |
|---|---|---|
| Homogeneous units, 1 factor | CRD | One-way ANOVA |
| Known blocking variable, 1 factor | RCBD | Two-way ANOVA |
| 2+ factors of interest | Factorial | Multi-way ANOVA |
| 2 blocking factors | Latin Square | Three-way ANOVA |
| Repeated measures | Within-subjects | Repeated measures ANOVA |
Common Mistakes to Avoid#
❌ Pseudo-replication (treating subsamples as independent)
❌ Forgetting to randomize
❌ Unbalanced designs (when avoidable)
❌ Ignoring interactions in factorial designs
❌ Not checking assumptions
❌ P-hacking and multiple testing
Before You Experiment#
✅ Define clear hypotheses
✅ Conduct power analysis
✅ Choose appropriate design
✅ Plan randomization procedure
✅ Determine data collection protocols
✅ Prepare analysis plan
Conclusion#
Good experimental design is the foundation of valid statistical inference. No amount of sophisticated analysis can rescue a poorly designed experiment!
Remember:
“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.” — R.A. Fisher
Next chapter: Inferring Probability Models from Data (Maximum Likelihood and Bayesian Inference)!