⚙️ Module 4: Assumptions & Advanced Topics

When Chi-Square Works (and When It Doesn't)

📚 Learning Objectives

By the end of this module, you will be able to:

Check and interpret chi-square assumptions
Understand the "expected frequencies ≥ 5" rule
Use Fisher's exact test for small samples
Know when to apply Yates' continuity correction
Conduct post-hoc comparisons for larger tables
Troubleshoot common problems and errors

✅ Chi-Square Assumptions

Chi-square tests make several assumptions. Violating these can lead to incorrect p-values!

Assumption 1: Categorical Variables

✓ Required: Variables must be categorical (nominal or ordinal)

✗ Wrong: Don't use chi-square on continuous data

Example error: "Age (years)" should be categorized into age groups first

Assumption 2: Independent Observations

✓ Required: Each observation can only be counted once

✗ Wrong: Same subject measured multiple times, paired data

Example error: Testing 50 people at Time 1 and Time 2 → 100 observations but only 50 independent subjects (use McNemar's test instead!)

Assumption 3: Expected Frequency Rule

✓ Required: Expected frequencies ≥ 5 in ALL cells

This is the most commonly violated assumption and the main focus of this module!

Assumption 4: Random Sampling

✓ Required: Data should come from random or representative sampling

Less about the statistical test and more about study design

🔢 The Expected Frequency ≥ 5 Rule

Critical Rule: ALL expected frequencies must be ≥ 5

⚠️ Why This Matters

The chi-square distribution only approximates the sampling distribution of the test statistic, and the approximation works well when expected frequencies are large enough. When expected frequencies are too small (< 5), the approximation breaks down and p-values become unreliable.

Result of violation: The actual Type I error rate can drift away from the nominal α, so false positives may become more likely!

How to Check:

# After running chi-square test
result <- chisq.test(data)

# Check expected frequencies
result$expected

# Look for any values < 5

Example: Checking Expected Frequencies

	Improved	No Change
Treatment A	6.4	13.6
Treatment B	3.2	6.8

Problem: Treatment B / Improved cell has expected frequency of 3.2 < 5

Solution needed: Use Fisher's exact test instead!

🚨 Common Misconception

WRONG: "All OBSERVED frequencies must be ≥ 5"

CORRECT: "All EXPECTED frequencies must be ≥ 5"

It's okay to have observed frequencies of 0, 1, 2, etc. The rule is about EXPECTED frequencies!
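A short sketch of why the rule targets expected counts. The table below is hypothetical (it is not course data): it contains an observed count of 0, yet every expected count, computed from the margins as row total × column total / n, is above 5.

```r
# Hypothetical 2x2 table with one observed count of 0 (illustration only)
obs <- matrix(c(0, 20,
                15, 10), nrow = 2, byrow = TRUE,
              dimnames = list(c("Group 1", "Group 2"),
                              c("Yes", "No")))

# Expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
round(expected, 2)

# The rule is checked on expected counts, not observed counts
any(obs < 5)        # TRUE: at least one observed count is below 5
any(expected < 5)   # FALSE: the expected-frequency rule is still met
```

The hand-computed matrix should match `chisq.test(obs)$expected`, which is the usual way to get it.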

🎯 Fisher's Exact Test: The Solution for Small Samples

When expected frequencies are < 5, use Fisher's exact test instead of chi-square.

Chi-Square Test

Uses approximation
Requires expected freq ≥ 5
Works with any table size
Faster computation

Fisher's Exact Test

Calculates exact probability
No frequency requirements
Best for 2×2 tables
Slower for large tables

Running Fisher's Exact Test in R:

# Same syntax as chi-square!
data <- matrix(c(8, 3, 12, 7), nrow = 2, byrow = TRUE)
rownames(data) <- c("Treatment A", "Treatment B")
colnames(data) <- c("Improved", "No Change")

# Run Fisher's exact test
fisher.test(data)

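# For tables larger than 2x2, fisher.test() still computes an exact p-value.
# If a very large table is too big for the exact calculation, request a
# Monte Carlo p-value instead. The argument only takes effect for tables
# larger than 2x2, so it changes nothing for the 2x2 `data` above:
fisher.test(data, simulate.p.value = TRUE)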

Example Output (key lines)

Fisher's Exact Test for Count Data

data:  data
p-value = 0.702
alternative hypothesis: true odds ratio is not equal to 1

The output also lists a 95 percent confidence interval and an estimate of the odds ratio.

Interpretation:

p-value = 0.702: No significant association between treatment and outcome
Odds ratio: The sample odds ratio is (8 × 7) / (3 × 12) = 1.56, so Treatment A has about 1.56× the odds of improvement compared to Treatment B (but not significant). The estimate printed by fisher.test() is a conditional maximum likelihood estimate, so it differs slightly from this cross-product
95% CI: An interval that includes 1.0 confirms non-significance
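The pieces of the Fisher output can also be pulled out individually. This is a minimal sketch using the same 2×2 counts as the example; the variable names `ft` and `sample_or` are introduced here for illustration only.

```r
# Same 2x2 table as in the example
data <- matrix(c(8, 3, 12, 7), nrow = 2, byrow = TRUE,
               dimnames = list(c("Treatment A", "Treatment B"),
                               c("Improved", "No Change")))

ft <- fisher.test(data)

ft$p.value    # two-sided exact p-value
ft$estimate   # conditional maximum likelihood estimate of the odds ratio
ft$conf.int   # 95% confidence interval for the odds ratio

# Sample (cross-product) odds ratio, computed by hand: (a * d) / (b * c)
sample_or <- (data[1, 1] * data[2, 2]) / (data[1, 2] * data[2, 1])
sample_or
```

Comparing `ft$estimate` with `sample_or` shows that the two odds ratio definitions are close but not identical.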

🌳 Decision Tree: Which Test Should I Use?

Is your DV categorical?

↓ YES

Are observations independent? (Each subject counted once)

↓ YES

One or two categorical variables?

↓ ONE variable

Goodness-of-Fit
Check expected freq ≥ 5
Use chisq.test()

↓ TWO variables

Test of Independence
→ See next step

For Test of Independence: Check expected frequencies

↓ All ≥ 5

Use Chi-Square
chisq.test()

↓ Any < 5

Use Fisher's Exact
fisher.test()

Is it a 2×2 table?

↓ YES

Yates' correction
Applied automatically
(more conservative)

↓ NO (larger)

Regular chi-square
No correction needed

📐 Yates' Continuity Correction

For 2×2 tables only, R applies Yates' continuity correction by default.

What is Yates' Correction?

Chi-square is a continuous distribution used to approximate a discrete distribution (your frequency counts). For 2×2 tables with df=1, this approximation can be poor.

Yates' correction: Adjusts the chi-square formula to be more conservative (reduces Type I error).

Effect: Slightly larger p-values (harder to reject H₀)

# With Yates' correction (default for 2×2)
chisq.test(data)  # Correction applied automatically

# Without Yates' correction
chisq.test(data, correct = FALSE)

# Compare the results
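To see what the correction does to the formula, the sketch below computes both statistics by hand and compares them with chisq.test(). The 2×2 counts are hypothetical. It assumes every |O − E| is at least 0.5, which holds here; R caps the adjustment at |O − E| when it is smaller.

```r
# Hypothetical 2x2 table (illustration only)
tab <- matrix(c(12, 8,
                5, 15), nrow = 2, byrow = TRUE)

expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic: sum of (O - E)^2 / E
chi_plain <- sum((tab - expected)^2 / expected)

# Yates' version: shrink each |O - E| by 0.5 before squaring
chi_yates <- sum((abs(tab - expected) - 0.5)^2 / expected)

# Both should match what chisq.test() reports
c(by_hand = chi_plain,
  chisq_test = unname(chisq.test(tab, correct = FALSE)$statistic))
c(by_hand = chi_yates,
  chisq_test = unname(chisq.test(tab)$statistic))
```

Because the corrected statistic is always the smaller of the two, its p-value is always the larger.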

Example: Impact of Yates' Correction

Test Version	χ²	p-value
With Yates' correction	3.42	0.064
Without correction	4.57	0.032

Impact: Without correction → significant; With correction → non-significant

Recommendation: Use the default (with correction) for 2×2 tables, especially with smaller samples (n < 100)

🔧 Strategies When Expected Frequencies Are Too Small

If you have expected frequencies < 5, you have several options:

1. Use Fisher's Exact Test

Best option: Valid regardless of how small the expected frequencies are

Limitation: Computationally intensive for large tables

fisher.test(data)

2. Combine Categories

When appropriate: Categories are theoretically similar

Example: Combine "Slightly Improved" and "Greatly Improved" into "Improved"

# Original: 3 outcome categories
data_3cat <- matrix(c(5, 3, 2, 8, 7, 5), nrow = 2, byrow = TRUE)
colnames(data_3cat) <- c("Worse", "Same", "Better")

# Combine "Worse" and "Same" into "Not Better" by adding the two columns
data_2cat <- cbind("Not Better" = data_3cat[, "Worse"] + data_3cat[, "Same"],
                   "Better" = data_3cat[, "Better"])

# Re-check the expected frequencies; combining does not guarantee they reach 5
chisq.test(data_2cat)$expected
chisq.test(data_2cat)

⚠️ Warning

Don't combine categories JUST to get significance! Only combine when theoretically justified.

3. Collect More Data

The most straightforward solution: increase sample size

When to use: Study is ongoing, pilot data suggests small cells

4. Remove Rare Categories

When appropriate: Category has very few observations and isn't central to research question

Example: If studying 3 common species and 1 rare species with only 2 observations, might exclude the rare species

Report clearly: "One species with n=2 was excluded from analysis due to small sample size"

🎯 Practice Problem 1: Troubleshooting Small Frequencies

Scenario: Testing if habitat choice differs between two bird species

	Forest	Grassland	Wetland
Species A	18	12	2
Species B	15	10	3

# Create data
habitat_data <- matrix(c(18, 12, 2, 15, 10, 3), nrow = 2, byrow = TRUE)
rownames(habitat_data) <- c("Species A", "Species B")
colnames(habitat_data) <- c("Forest", "Grassland", "Wetland")

# Try chi-square
result <- chisq.test(habitat_data)
result

# Check expected frequencies
result$expected  # Look for values < 5

Warning in chisq.test(habitat_data): Chi-squared approximation may be incorrect

Pearson's Chi-squared test

data:  habitat_data
X-squared = 0.38961, df = 2, p-value = 0.823

Expected frequencies:
          Forest Grassland Wetland
Species A  17.60     11.73    2.67
Species B  15.40     10.27    2.33

Problem: Wetland expected frequencies (2.67 and 2.33) are < 5! R even gives a warning!

Solutions:

Option 1: Fisher's Exact Test

fisher.test(habitat_data)
# An exact p-value is feasible for this small 2×3 table; no simulation needed

Option 2: Combine Categories

# Combine Grassland and Wetland into "Non-Forest"
combined_data <- matrix(c(18, 14, 15, 13), nrow = 2, byrow = TRUE)
rownames(combined_data) <- c("Species A", "Species B")
colnames(combined_data) <- c("Forest", "Non-Forest")

result2 <- chisq.test(combined_data)
result2$expected  # All ≥ 5 now!
result2

Recommendation: Use Fisher's exact test, OR combine categories if theoretically justified (e.g., if both grassland and wetland are "open habitats")

🔍 Post-Hoc Tests for Larger Tables

When chi-square is significant with tables larger than 2×2, you know variables are related, but WHERE is the association?

Strategy 1: Examine Standardized Residuals

We covered this in Module 3 - this is your first step!

# Cells with |residual| > 2 or 3 drive the effect
result$stdres
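Scanning a residual matrix by eye gets tedious for bigger tables. The sketch below lists the cells that pass the |residual| > 2 cutoff. It uses the three-treatment counts that appear in this module's 3×2 example; the names `tab`, `res` and `flagged` are illustrative.

```r
# Three treatments x success/failure (counts from this module's 3x2 example)
tab <- matrix(c(45, 15, 30, 30, 20, 40), nrow = 3, byrow = TRUE,
              dimnames = list(c("Treatment A", "Treatment B", "Treatment C"),
                              c("Success", "Failure")))

res <- chisq.test(tab)
round(res$stdres, 2)

# List the cells whose standardized residual exceeds 2 in absolute value
flagged <- which(abs(res$stdres) > 2, arr.ind = TRUE)
data.frame(row    = rownames(tab)[flagged[, "row"]],
           column = colnames(tab)[flagged[, "col"]],
           stdres = round(res$stdres[flagged], 2))
```

Rows that never appear in the flagged list (here, the middle treatment) sit close to what independence predicts.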

Strategy 2: Conduct Follow-Up Chi-Square Tests

Break your large table into smaller 2×2 comparisons

Example: 3×2 Table (Three Treatments × Success/Failure)

Overall test: χ²(2) = 21.18, p < .001 (significant)

Question: Which treatments differ from each other?

# Original 3×2 table
full_data <- matrix(c(45, 15, 30, 30, 20, 40), nrow = 3, byrow = TRUE)
rownames(full_data) <- c("Treatment A", "Treatment B", "Treatment C")
colnames(full_data) <- c("Success", "Failure")

# Compare Treatment A vs B
AB <- full_data[1:2, ]
chisq.test(AB)

# Compare Treatment A vs C  
AC <- full_data[c(1,3), ]
chisq.test(AC)

# Compare Treatment B vs C
BC <- full_data[2:3, ]
chisq.test(BC)

# IMPORTANT: With 3 comparisons, consider Bonferroni correction
# Adjusted alpha = 0.05 / 3 = 0.017

⚠️ Multiple Comparisons Problem

Each additional test increases chance of Type I error (false positive)

Bonferroni correction: Divide alpha by number of comparisons

3 comparisons: α = 0.05/3 = 0.017
6 comparisons: α = 0.05/6 = 0.008

Only call results significant if p < adjusted alpha
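Instead of lowering alpha, the same correction can be applied to the p-values with base R's p.adjust(). The sketch below is one possible way to do it, using the three-treatment counts from this module; `pairs`, `raw_p` and `adj_p` are names chosen for illustration.

```r
# Three treatments x success/failure (counts from this module's 3x2 example)
full_data <- matrix(c(45, 15, 30, 30, 20, 40), nrow = 3, byrow = TRUE,
                    dimnames = list(c("Treatment A", "Treatment B", "Treatment C"),
                                    c("Success", "Failure")))

# All pairs of rows: A-B, A-C, B-C
pairs <- combn(rownames(full_data), 2)

# Raw p-value from a 2x2 chi-square test for each pair
raw_p <- apply(pairs, 2, function(pair) chisq.test(full_data[pair, ])$p.value)
names(raw_p) <- apply(pairs, 2, paste, collapse = " vs ")

# Bonferroni: multiply each p-value by the number of tests (capped at 1),
# then compare against the usual alpha of 0.05
adj_p <- p.adjust(raw_p, method = "bonferroni")
round(cbind(raw_p, adj_p), 4)
```

Comparing an adjusted p-value with 0.05 gives the same decision as comparing the raw p-value with 0.05/3.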

Strategy 3: Focus on Planned Comparisons

Better than testing all possible pairs: decide BEFORE seeing data which comparisons matter

Example: Testing 4 treatments including a control

Planned comparisons:

Treatment A vs Control
Treatment B vs Control
Treatment C vs Control

Skip the treatment-to-treatment comparisons unless theoretically important

🐛 Common Errors & Troubleshooting

Error 1: "Chi-squared approximation may be incorrect"

Cause: Expected frequencies < 5

Solution: Use Fisher's exact test or combine categories

Error 2: "all entries of 'x' must be nonnegative and finite"

Cause: Negative values or missing data in your table

Solution: Check for data entry errors, remove NAs

# Remove missing values
clean_data <- na.omit(your_data)
# Check for negatives
summary(your_data)

Error 3: "arguments imply differing number of rows"

Cause: Building a data frame from vectors of unequal length (the message comes from data.frame())

Solution: Verify that every vector you combine has the same number of elements

Warning: "'simulate.p.value' was set but ignored"

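Cause: The simulate.p.value argument was supplied where it has no effect. fisher.test() only uses it for tables larger than 2×2

Solution: Drop the argument for 2×2 tables. Note that chisq.test() also accepts simulate.p.value = TRUE (a Monte Carlo p-value), so simulation is not limited to fisher.test()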

🚀 Advanced Topics

McNemar's Test for Paired Data

When observations are NOT independent (same subjects measured twice)

Example: Before/After Treatment

50 patients tested before and after treatment (Pass/Fail)

	After: Pass	After: Fail
Before: Pass	20	5
Before: Fail	18	7

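# McNemar's test for paired data
# byrow = TRUE so the matrix matches the table above (rows = before, columns = after)
paired_data <- matrix(c(20, 5, 18, 7), nrow = 2, byrow = TRUE,
                      dimnames = list(Before = c("Pass", "Fail"),
                                      After = c("Pass", "Fail")))
mcnemar.test(paired_data)

# DO NOT use regular chi-square for paired data!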
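McNemar's test only looks at the patients whose result changed. As a rough sketch, the statistic can be reproduced by hand from the two discordant cells of the before/after table; the variable names are illustrative, and the continuity correction shown is the mcnemar.test() default.

```r
# Before/after table from the example (rows = before, columns = after)
paired_data <- matrix(c(20, 5,
                        18, 7), nrow = 2, byrow = TRUE,
                      dimnames = list(Before = c("Pass", "Fail"),
                                      After  = c("Pass", "Fail")))

# Discordant pairs: patients whose result changed
pass_to_fail <- paired_data["Pass", "Fail"]   # 5
fail_to_pass <- paired_data["Fail", "Pass"]   # 18

# McNemar statistic with continuity correction (the mcnemar.test() default)
stat_by_hand <- (abs(pass_to_fail - fail_to_pass) - 1)^2 /
  (pass_to_fail + fail_to_pass)

stat_by_hand
unname(mcnemar.test(paired_data)$statistic)   # should agree with stat_by_hand
```

The 27 patients who passed both times or failed both times do not enter the statistic at all.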

Cochran-Mantel-Haenszel Test

Testing association while controlling for a third variable (stratified analysis)

# Example: Treatment × Outcome, controlling for Sex
# mantelhaen.test() is in the stats package, which R loads by default
mantelhaen.test(array_data)  # array_data: 3D array of counts (rows × cols × strata)
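A minimal sketch of how such a 3D array might be built. All counts and labels below are hypothetical and serve only to show the rows × columns × strata layout that mantelhaen.test() expects.

```r
# Hypothetical counts: Treatment x Outcome, one 2x2 table per sex
array_data <- array(c(20, 10, 12, 18,    # stratum 1: Female
                      15, 12,  9, 20),   # stratum 2: Male
                    dim = c(2, 2, 2),
                    dimnames = list(Treatment = c("Drug", "Placebo"),
                                    Outcome   = c("Improved", "Not improved"),
                                    Sex       = c("Female", "Male")))

array_data            # inspect each stratum's 2x2 table

cmh <- mantelhaen.test(array_data)
cmh$p.value           # association adjusted for sex
cmh$estimate          # common (Mantel-Haenszel) odds ratio across strata
```

Note that array() fills column by column, so the first two numbers of each stratum form the first column (Improved) of that stratum's table.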

Trend Tests (Cochran-Armitage)

Testing for linear trend when one variable is ordinal

Example: Does disease prevalence increase with age category?

Age categories: Young → Middle → Old (natural ordering)
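Base R offers prop.trend.test(), a chi-squared test for trend in proportions that follows the same idea as the Cochran-Armitage test. The sketch below uses hypothetical case counts for the three age groups; the numbers are made up for illustration.

```r
# Hypothetical data: disease cases out of people sampled, by age group
# Order matters: Young, Middle, Old
cases  <- c(8, 15, 24)
totals <- c(60, 60, 60)

# Chi-squared test for trend in proportions (1 df);
# the default scores 1, 2, 3 treat the age groups as equally spaced
trend <- prop.trend.test(x = cases, n = totals)
trend

# For comparison: the general test of equal proportions (2 df) ignores order
prop.test(x = cases, n = totals)
```

When the proportions really do rise or fall steadily, the 1-df trend test is usually more sensitive than the unordered 2-df test.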

🎯 Practice Problem 2: Complete Analysis Decision

For each scenario, decide which test to use and explain why.

Scenario A: 100 patients, 2 treatment groups, 3 outcome categories. All expected frequencies between 8-15.

Scenario B: 30 animals, 2 species, 2 habitat choices. Expected frequencies: 7.5, 7.5, 7.5, 7.5

Scenario C: 25 animals, 2 species, 4 habitat choices. Expected frequencies range from 2.1 to 5.8

Scenario A Answer:

Chi-square test of independence - All expected frequencies ≥ 5 (all between 8-15), sample size adequate (n=100), two categorical variables. This meets all assumptions for chi-square.

Scenario B Answer:

Chi-square test of independence - Although sample is modest (n=30), all expected frequencies are 7.5, which exceeds the ≥5 requirement. This is a 2×2 table, so Yates' correction will be applied automatically.

Scenario C Answer:

Fisher's exact test - Some expected frequencies < 5 (as low as 2.1), which violates the chi-square assumption. Fisher's exact test is appropriate, and fisher.test() can compute an exact p-value for a 2×4 table this small; simulate.p.value=TRUE is only needed when a table is too large for the exact calculation. Alternative: Consider combining habitat categories if theoretically justified to increase expected frequencies.

🎯 Complete Analysis Workflow

1. Understand Your Research Question

One or two categorical variables?
What are you testing? (equal distribution? association?)

2. Check Assumptions

✓ Categorical variables
✓ Independent observations
✓ Random/representative sample

3. Create Table & Run Preliminary Test

result <- chisq.test(data)
result$expected  # Check this first!

4. Verify Expected Frequencies

All ≥ 5? → Proceed with chi-square
Any < 5? → Use Fisher's exact test OR combine categories

5. Interpret Results

Look at χ², df, p-value
Calculate effect size (Cramér's V for test of independence)
Examine standardized residuals for patterns

6. Visualize

Bar plots for goodness-of-fit
Grouped bar plots or mosaic plots for independence tests

7. Report

Test name and purpose
Test statistic, df, p-value
Effect size
Pattern description with frequencies/percentages
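Steps 3 and 4 of the workflow can be wrapped in a small helper. This is only a sketch: `run_association_test()` and its `min_expected` argument are hypothetical names, not part of R, and the function simply applies the ≥ 5 rule described in this module.

```r
# Hypothetical helper: pick the test based on the expected frequencies
run_association_test <- function(tab, min_expected = 5) {
  # Expected counts under independence, computed from the margins
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

  if (all(expected >= min_expected)) {
    test <- chisq.test(tab)
  } else {
    test <- fisher.test(tab)
  }
  list(expected = expected, method = test$method, p.value = test$p.value)
}

# Habitat table from Practice Problem 1 (some expected counts < 5)
habitat_data <- matrix(c(18, 12, 2, 15, 10, 3), nrow = 2, byrow = TRUE)
run_association_test(habitat_data)$method

# Stress table from the final problem (all expected counts >= 5)
stress_data <- matrix(c(28, 18, 4, 12, 25, 33), nrow = 2, byrow = TRUE)
run_association_test(stress_data)$method
```

Computing the expected counts from the margins first avoids triggering the chi-square warning just to inspect them.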

🎯 Comprehensive Final Problem

Scenario: Testing if stress level affects immune response in 120 participants

	Strong Response	Moderate Response	Weak Response	Total
Low Stress	28	18	4	50
High Stress	12	25	33	70
Total	40	43	37	120

Complete these tasks:

Run chi-square test
Check expected frequencies
Calculate Cramér's V
Examine standardized residuals
Create visualization
Write complete results in APA style

# Your complete analysis here
stress_data <- matrix(c(28, 18, 4, 12, 25, 33), nrow = 2, byrow = TRUE)
rownames(stress_data) <- c("Low Stress", "High Stress")
colnames(stress_data) <- c("Strong", "Moderate", "Weak")

# Step 1: Run test
result <- chisq.test(stress_data)
result

# Step 2: Check expected frequencies
result$expected

# Step 3: Calculate Cramér's V
chi_sq <- unname(result$statistic)  # drop the "X-squared" name
n <- sum(stress_data)
k <- min(nrow(stress_data), ncol(stress_data))  # smaller table dimension
V <- sqrt(chi_sq / (n * (k - 1)))
V

# Step 4: Examine residuals
result$stdres

# Step 5: Visualize
barplot(stress_data, beside = TRUE,
        col = c("#c8e6c9", "#ffcdd2"),
        legend = rownames(stress_data),
        xlab = "Immune Response",
        ylab = "Frequency",
        main = "Immune Response by Stress Level")

# Alternative: proportions
prop_data <- prop.table(stress_data, margin = 1)
barplot(prop_data, beside = TRUE,
        col = c("#c8e6c9", "#ffcdd2"),
        legend = rownames(stress_data),
        xlab = "Immune Response",
        ylab = "Proportion",
        main = "Immune Response by Stress Level (Proportions)")

Pearson's Chi-squared test

data:  stress_data
X-squared = 27.706, df = 2, p-value = 9.634e-07

Expected frequencies:
            Strong Moderate  Weak
Low Stress   16.67    17.92 15.42
High Stress  23.33    25.08 21.58

All expected frequencies ≥ 5 ✓

Cramér's V = 0.480

Standardized Residuals:
            Strong Moderate  Weak
Low Stress    4.45     0.03 -4.58
High Stress  -4.45    -0.03  4.58

Complete APA Write-Up:

"A chi-square test of independence was conducted to examine the relationship between stress level and immune response in 120 participants. All expected cell frequencies exceeded 5, meeting the assumptions for chi-square analysis. The association between stress level and immune response was statistically significant, χ²(2) = 27.71, p < .001, V = .48, indicating a moderate to strong relationship."

"Examination of standardized residuals revealed the pattern driving this association. Participants with low stress showed stronger immune responses than expected (56% strong response vs. 33% expected; residual = 4.45) and fewer weak responses (8% vs. 31% expected; residual = -4.58). Conversely, participants with high stress showed weaker immune responses than expected (47% weak vs. 31% expected; residual = 4.58) and fewer strong responses (17% vs. 33% expected; residual = -4.45). Moderate responses did not differ from expectation in either group."

"These findings suggest that higher stress levels are associated with compromised immune function, with high-stress individuals showing substantially weaker immune responses compared to their low-stress counterparts. The moderate-to-strong effect size indicates this is a practically meaningful relationship."

Key Interpretation Points:

✓ Checked assumptions (all expected ≥ 5)
✓ Reported test statistic, df, p-value
✓ Included effect size interpretation
✓ Described pattern using residuals
✓ Provided percentages for clarity
✓ Discussed practical implications

🤔 Final Check Your Understanding

Question 1: You have a 2×3 table with n=40. One expected frequency is 4.2. What should you do?

A) Proceed with chi-square; 4.2 is close enough to 5

B) Use Fisher's exact test instead

C) Increase alpha to 0.10 to compensate

Answer: B. An expected frequency of 4.2 is less than 5, violating the chi-square assumption. Use Fisher's exact test (fisher.test() computes an exact p-value for a 2×3 table) or consider combining categories if theoretically justified.

Question 2: For a 2×2 table, should you turn off Yates' correction?

A) Yes, always turn it off for more power

B) No, keep it on (default) for more conservative test

C) Only turn it off if sample size > 1000

Answer: B. Keep Yates' correction on (the default) for 2×2 tables. It makes the test more conservative, which matters most with smaller samples. Only consider turning it off with very large samples, where the correction may be overly conservative.

Question 3: You test 50 patients before and after treatment (same patients). Which test?

A) Chi-square test of independence

B) McNemar's test for paired data

C) Fisher's exact test

Answer: B. This is PAIRED data (same subjects measured twice), which violates the independence assumption. Use McNemar's test, not chi-square. Regular chi-square assumes all observations are independent.

📝 Module 4 Summary

Key Takeaways:

Expected frequencies ≥ 5 is the critical assumption to check
Fisher's exact test is the solution when expected frequencies are too small
Yates' correction is applied automatically for 2×2 tables (makes test more conservative)
For larger tables with significant results, examine standardized residuals to find patterns
Post-hoc comparisons need Bonferroni correction for multiple testing
Paired data requires McNemar's test, not chi-square
Always check assumptions BEFORE interpreting results

🎉 Congratulations! You've completed all four Chi-Square modules!
You now have the skills to analyze categorical data correctly and confidently.

📋 Quick Reference Card

Situation	Test to Use	R Code
One variable, test distribution	Goodness-of-fit	`chisq.test(obs, p=...)`
Two variables, all expected ≥ 5	Test of independence	`chisq.test(data)`
Two variables, any expected < 5	Fisher's exact	`fisher.test(data)`
Paired/repeated measures	McNemar's test	`mcnemar.test(data)`
Check effect size	Cramér's V	`sqrt(χ²/(n*(k-1)))`
Find pattern	Standardized residuals	`result$stdres`
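The Cramér's V formula in the table can be wrapped in a reusable function. This is a sketch under stated assumptions: `cramers_v()` is a hypothetical helper, not a base R function; `k` is the smaller of the number of rows and columns; and the continuity correction is switched off so that 2×2 tables use the uncorrected statistic.

```r
# Hypothetical helper: Cramér's V for an r x c table of counts
cramers_v <- function(tab) {
  # correct = FALSE so that V is based on the uncorrected Pearson statistic
  chi_sq <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))   # smaller of the number of rows and columns
  sqrt(chi_sq / (n * (k - 1)))
}

# Stress x immune response table from the final problem
stress_data <- matrix(c(28, 18, 4, 12, 25, 33), nrow = 2, byrow = TRUE)
cramers_v(stress_data)
```

V ranges from 0 (no association) to 1 (perfect association), whatever the table size.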

Always remember:

Check assumptions FIRST (especially expected frequencies)
Report effect size, not just p-values
Visualize your data
Interpret patterns, don't just report statistics
Consider practical significance alongside statistical significance