📚 Instructor's Guide: Chi-Square Teaching Modules

Comprehensive Teaching Guide for Modules 1-4

Complete with Answer Keys, Teaching Tips, Common Errors, and Facilitation Notes


🎯 Course Overview & Pedagogical Approach

Course Philosophy

These modules teach chi-square as a practical research tool, not a mathematical exercise. Students learn to:

Learning Objectives Across All Modules

By the end of the complete module series, students should be able to:

  1. Distinguish categorical from continuous variables
  2. Choose between goodness-of-fit and test of independence
  3. Run chi-square tests in R using chisq.test()
  4. Calculate and interpret expected frequencies
  5. Check and respond to assumption violations
  6. Use Fisher's exact test when appropriate
  7. Calculate and interpret Cramér's V effect size
  8. Read and interpret standardized residuals
  9. Report results in APA style

Recommended Pacing

⏱️ TOTAL TIME: 4-6 hours of instruction + practice
💡 Teaching Tip

Module 3 is the heart of the course. Most research uses test of independence, not goodness-of-fit. Budget extra time for Module 3 and ensure students master contingency tables and effect sizes.

Prerequisites

Students should have:

Key Differences from Regression/ANOVA

📊 Module 1: Why Chi-Square Matters

⏱️ Estimated Time: 45-60 minutes

Learning Objectives

Key Concepts to Emphasize

💡 The Most Important Concept

"Chi-square is ONLY for categorical dependent variables."

Students constantly try to use chi-square on continuous data. Hammer this home early! Use examples:

  • ✓ Treatment response (improved/not improved) → Chi-square
  • ✗ Symptom severity score (0-100) → t-test or ANOVA, NOT chi-square

Interactive Activity: Variable Sorting

✅ Answer Key: Drag-and-Drop Activity

Categorical Variables:

  • Neuron type (pyramidal, interneuron)
  • Handedness (left, right)
  • Treatment outcome (improved, no change, worse)
  • Maze arm chosen (left, right, center)

Continuous Variables:

  • Firing rate (spikes/second)
  • Age (years)
  • Anxiety score (0-50 scale)
  • Time to find platform (seconds)
⚠️ Common Student Error

"Age is categorical because we can group it."

Correction: Age is inherently continuous (measured in years). You CAN categorize it (young/middle/old), but the raw variable is continuous. The key question is how it was measured. If it was recorded as a number on a scale where in-between values are meaningful (34.5 years makes sense even if you recorded 34), it's continuous.

Check Your Understanding Questions

✅ Question 1: Reaction Time Study

Q: A researcher measures reaction time (ms) for a coffee group vs. a no-coffee group. Use chi-square?

A: NO. Reaction time is continuous. Use a t-test.

Teaching point: Even though there are "two groups," the DV is continuous. Chi-square is about the DEPENDENT VARIABLE type, not the number of groups.

✅ Question 2: Goodness-of-Fit vs. Independence

Q: 100 patients categorized as improved/no change/worse. Test if distribution differs from equal proportions (33%, 33%, 33%).

A: Goodness-of-fit (one categorical variable)

Teaching point: Only ONE variable (outcome), testing against an expected distribution.

✅ Question 3: Test of Independence

Q: Males and females choosing among three habitats. Which test?

A: Test of independence (two categorical variables: sex AND habitat)

Teaching point: TWO variables, testing if they're related.

Discussion Prompts

🎲 Module 2: Goodness-of-Fit Test

⏱️ Estimated Time: 75-90 minutes

Learning Objectives

The Chi-Square Formula
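For reference, this is the statistic behind the hand calculation below. It is Pearson's chi-square, summed over every cell, with O and E the observed and expected counts. Here k is the number of categories for goodness-of-fit, and r and c are the numbers of rows and columns of a contingency table:

```latex
\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E},
\qquad
df_{\text{goodness-of-fit}} = k - 1,
\qquad
df_{\text{independence}} = (r - 1)(c - 1)
```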

💡 Teaching Strategy

Use the interactive calculator! Have students calculate χ² by hand for a simple example (3 categories) before using R. This builds intuition for what the statistic measures.

Key insight to convey: Larger differences between O and E → larger χ² → more evidence against H₀

✅ Hand Calculation Example

Data: Observed = (15, 18, 27); Expected = (20, 20, 20)

Calculation:

Cell 1: (15-20)²/20 = 25/20 = 1.25
Cell 2: (18-20)²/20 = 4/20 = 0.20
Cell 3: (27-20)²/20 = 49/20 = 2.45
χ² = 1.25 + 0.20 + 2.45 = 3.90
df = 3 - 1 = 2

Walk through: Show that the largest contribution (2.45) comes from the cell most different from expectation (27 vs 20).
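The hand calculation above can be checked in R in a few lines. This sketch recomputes each cell's contribution and then lets chisq.test() do the same work. With a single vector and no p argument, chisq.test() tests against equal proportions, which matches the expected 20 per cell here.

```r
# Observed counts from the hand-calculation example; expected = 20 per cell
observed <- c(15, 18, 27)
expected <- rep(20, 3)

# Cell-by-cell contributions (O - E)^2 / E, matching the worked steps
contrib <- (observed - expected)^2 / expected
contrib        # 1.25 0.20 2.45
sum(contrib)   # 3.9

# chisq.test() defaults to equal proportions for a single vector
res <- chisq.test(observed)
res$statistic  # X-squared = 3.9
res$parameter  # df = 2
res$p.value    # about 0.14
```

The p-value of about .14 is itself a useful discussion point. Even the largest contribution (2.45) is not enough, with three cells and n = 60, to reject equal proportions at α = .05.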

Practice Problems with Solutions

✅ Practice Problem 1: Birth Days

Data: Mon=28, Tue=32, Wed=29, Thu=31, Fri=35, Sat=18, Sun=17 (n=190)

Expected: 190/7 = 27.14 per day

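R Output: χ²(6) = 10.72, p = .098

Interpretation: At α = .05, births are NOT significantly different from an equal distribution across days. Weekend counts (Sat = 18, Sun = 17) do fall well below the 27.14 expected per day, a pattern consistent with scheduled procedures on weekdays. With n = 190, however, the evidence is not strong enough to rule out chance.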

Teaching point: Always interpret the pattern, not just the p-value! Look at which categories deviate.
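Rerunning Practice Problem 1 in R is a quick way to check the reported output and to see which days deviate. For a single vector, chisq.test() stores the expected counts in $expected and the standardized residuals in $stdres.

```r
# Births per weekday from Practice Problem 1 (n = 190)
births <- c(Mon = 28, Tue = 32, Wed = 29, Thu = 31,
            Fri = 35, Sat = 18, Sun = 17)

res <- chisq.test(births)    # H0: equal proportions (1/7 per day)
res                          # X-squared about 10.72, df = 6, p about 0.098

# Which days deviate from 190/7 = 27.14?
round(res$expected, 2)
round(res$stdres, 2)         # Sat and Sun are the only negative residuals (about -1.90, -2.10)
```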

✅ Practice Problem 2: Mendelian Genetics

Data: 152 purple, 48 white (n=200)

Expected (3:1 ratio): 150 purple (75%), 50 white (25%)

R Code:

flowers <- c(152, 48)
chisq.test(flowers, p = c(0.75, 0.25))

R Output: χ²(1) = 0.107, p = .744

Interpretation: Data fit 3:1 ratio very well. Supports Mendelian inheritance!

Teaching point: Non-significant results can be meaningful! Here it confirms theoretical prediction.

Common Mistakes

⚠️ Error 1: Using Raw Data Instead of Frequencies

Student tries:

responses <- c("A", "B", "A", "C")
chisq.test(responses)  # ERROR!

Correct approach:

freq <- table(responses)
chisq.test(freq)

Prevention: Emphasize that chi-square operates on COUNTS, not individual observations.
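A related pitfall is that table() only lists categories that actually occur. If a response option was never chosen, it silently disappears and the test runs on the wrong number of categories. Declaring the full set of levels with factor() keeps zero counts in the table. This sketch assumes a hypothetical fourth option "D" that nobody selected:

```r
responses <- c("A", "B", "A", "C")

table(responses)          # A = 2, B = 1, C = 1 ("D" is missing)

# Declare every possible category so unobserved ones appear as 0
freq <- table(factor(responses, levels = c("A", "B", "C", "D")))
freq                      # A = 2, B = 1, C = 1, D = 0

# Goodness-of-fit now uses all four categories
# (with only 4 responses, R will also warn about small expected counts)
chisq.test(freq)
```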

⚠️ Error 2: Expected Proportions Don't Sum to 1

Student specifies: p = c(0.3, 0.3, 0.5) (sums to 1.1!)

Correction: The proportions in p must sum to 1.0; R stops with an error otherwise.
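chisq.test() stops with an error when p does not sum to 1, and the right fix depends on what the numbers mean. The counts below are hypothetical. If the values are meant as proportions, correct them. If they are a theoretical ratio, rescale.p = TRUE divides them by their total. Blindly rescaling c(0.3, 0.3, 0.5) would test about 0.27/0.27/0.45, which is probably not the hypothesis the student meant.

```r
counts <- c(20, 35, 45)   # hypothetical observed counts, n = 100

# Proportions that sum to 1.1 are rejected outright
bad <- tryCatch(chisq.test(counts, p = c(0.3, 0.3, 0.5)),
                error = function(e) conditionMessage(e))
bad                       # error message about probabilities summing to 1

# Fix 1: state proportions that really sum to 1
fix1 <- chisq.test(counts, p = c(0.2, 0.3, 0.5))

# Fix 2: a theoretical RATIO (here 1 : 1.5 : 2.5) can be rescaled by R
fix2 <- chisq.test(counts, p = c(1, 1.5, 2.5), rescale.p = TRUE)

fix1$statistic
fix2$statistic            # identical: 1 : 1.5 : 2.5 is the same as 0.2/0.3/0.5
```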

Live Coding Demonstration

💡 Demonstration Script

Project your screen and walk through:

  1. Create observed frequency vector
  2. Run chisq.test()
  3. Interpret the output line-by-line
  4. Create a bar plot
  5. Narrate your thinking: "First I check if p < .05... yes, so I reject H₀... Now I look at which categories differ..."
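Below is one possible script for the live demonstration. It follows the five steps above and reuses the Mendelian data from Practice Problem 2; the category names and plot title are illustrative.

```r
# 1. Observed frequency vector (Practice Problem 2)
flowers <- c(Purple = 152, White = 48)

# 2. Run the test against the 3:1 prediction
res <- chisq.test(flowers, p = c(0.75, 0.25))

# 3. Walk through the output line by line
res             # X-squared, df, p-value
res$expected    # 150 and 50: what H0 predicts
res$stdres      # which category deviates, and in which direction

# 4. Bar plot of observed vs. expected counts
barplot(rbind(Observed = flowers, Expected = res$expected),
        beside = TRUE, legend.text = TRUE,
        ylab = "Count", main = "Flower color: observed vs. 3:1 expectation")
```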

🔗 Module 3: Test of Independence

⏱️ Estimated Time: 90-120 minutes

Learning Objectives

💡 Critical Concept: Independence

Use this analogy: "Independence means knowing one variable tells you NOTHING about the other. Like coin flip and dice roll - knowing you flipped heads doesn't help you predict the die."

Non-independence: "Knowing someone is Species A DOES tell you about habitat - they prefer forest. Variables are associated."

Expected Frequency Formula

✅ Teaching the Formula

Formula: Expected = (Row Total × Column Total) / Grand Total

Walk through an example on the board:

            Improved  Not Improved  Total
Male              45            15     60
Female            25            35     60
Total             70            50    120

Calculate Male/Improved expected:

E = (60 × 70) / 120 = 4200 / 120 = 35

Interpretation: "If sex and improvement were independent, we'd expect 35 of the 60 males to improve (the overall improvement rate of 70/120 applied to males)."
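The board calculation extends to every cell at once. outer() multiplies each row total by each column total, and dividing by the grand total gives the full expected table. Here is a sketch using the Sex × Improvement table above, checked against the $expected component that chisq.test() returns:

```r
# Sex × Improvement table from the board example
tab <- matrix(c(45, 15,
                25, 35), nrow = 2, byrow = TRUE,
              dimnames = list(Sex = c("Male", "Female"),
                              Outcome = c("Improved", "Not Improved")))

# Expected = (row total × column total) / grand total, for every cell
E_hand <- outer(rowSums(tab), colSums(tab)) / sum(tab)
E_hand   # Male: 35 and 25; Female: 35 and 25

# chisq.test() computes the same matrix
chisq.test(tab)$expected
```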

Standardized Residuals: The Most Underused Tool

💡 Teaching Strategy

Students often ignore residuals! Emphasize that a significant chi-square only tells you "something is different" - residuals tell you WHAT is different.

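Rule of thumb: a standardized residual beyond ±2 (|residual| > 2, close to the ±1.96 cutoff for α = .05) marks a cell that contributes notably to χ². In R these are stored in result$stdres.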
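R returns two kinds of residuals, and students often use the wrong one. $residuals holds Pearson residuals, (O − E)/√E, while $stdres holds the standardized (adjusted) residuals used with the |2| rule. Here is a sketch on the Sex × Improvement table from the expected-frequency example:

```r
tab <- matrix(c(45, 15,
                25, 35), nrow = 2, byrow = TRUE,
              dimnames = list(Sex = c("Male", "Female"),
                              Outcome = c("Improved", "Not Improved")))
res <- chisq.test(tab, correct = FALSE)

res$residuals         # Pearson residuals: smaller, NOT for the |2| rule
round(res$stdres, 2)  # standardized residuals: about +3.70 or -3.70 in every cell
# Male/Improved and Female/Not Improved are over-represented;
# the other two cells are under-represented.
```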

✅ Practice: Species × Habitat

Data: 3 species × 2 habitats (n=150)

R Output: χ²(2) = 8.26, p = .016

Standardized Residuals:

           Forest  Grassland
Species A   2.03    -2.03
Species B  -1.58     1.58
Species C  -0.41     0.41

Interpretation Guide:

  • Species A: Prefers forest (+2.03), avoids grassland (-2.03)
  • Species B: Slight grassland preference (+1.58), but < 2 threshold
  • Species C: No strong pattern (|residuals| < 1)

Conclusion: The significant chi-square is driven primarily by Species A's strong forest preference.

Cramér's V: Effect Size

⚠️ Common Student Error

"The chi-square is 25, so the effect must be large!"

Correction: Chi-square value depends on sample size! Always calculate effect size.

Example (both 2×2 tables, so k-1 = 1):

  • Study A: χ² = 25, n = 1000 → V = 0.16 (weak)
  • Study B: χ² = 25, n = 100 → V = 0.50 (strong)
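The sample-size point is easy to demonstrate live: multiply every count in a table by 10 so the proportions stay identical, then compare. In this sketch, cramers_v() is a hypothetical helper written for the demo, not a base R function. It uses the uncorrected χ² so that the scaling is exact.

```r
# Hypothetical helper: Cramér's V from a contingency table
cramers_v <- function(tab) {
  chi_sq <- unname(chisq.test(tab, correct = FALSE)$statistic)
  k <- min(nrow(tab), ncol(tab))
  sqrt(chi_sq / (sum(tab) * (k - 1)))
}

tab <- matrix(c(45, 15,
                25, 35), nrow = 2, byrow = TRUE)
big <- tab * 10   # same proportions, ten times the sample size

chisq.test(tab, correct = FALSE)$statistic  # about 13.7
chisq.test(big, correct = FALSE)$statistic  # about 137: ten times larger
cramers_v(tab)                              # about 0.34
cramers_v(big)                              # about 0.34: unchanged
```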
✅ Cramér's V Calculation

Formula: V = √[χ² / (n × (k-1))]

Where k = min(rows, cols)

Example: Treatment × Sex (2×2 table)

χ² = 15.63, n = 120, k = 2

V = √[15.63 / (120 × 1)] = √0.130 = 0.36 (moderate effect)

R code to provide students:

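# Example table (Sex × Improvement from the board example); replace with your own
data <- matrix(c(45, 15,
                 25, 35), nrow = 2, byrow = TRUE)
result <- chisq.test(data, correct = FALSE)  # V is usually based on the uncorrected χ²

# Manual calculation
chi_sq <- unname(result$statistic)
n <- sum(data)
k <- min(nrow(data), ncol(data))
V <- sqrt(chi_sq / (n * (k - 1)))
V  # about 0.34 for this table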

Practice Problems

✅ Comprehensive Problem: Diagnosis × Treatment

Data: 3 diagnoses × 2 treatments (n=200)

Results:

  • χ²(2) = 26.04, p < .001
  • V = 0.36 (moderate association)
  • Residuals show: Anxiety→Therapy (+2.82), Depression→Medication (+3.25)

Full APA Write-up:

"The relationship between diagnosis and treatment type was examined using a chi-square test of independence. Treatment assignment was significantly associated with diagnosis, χ²(2) = 26.04, p < .001, V = 0.36. Patients with anxiety were more likely to receive therapy (64%) than medication (36%), while patients with depression showed the opposite pattern (63% medication, 38% therapy). The moderate effect size indicates this is a meaningful clinical pattern."

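Students often make transcription errors when copying statistics into a write-up. A hypothetical helper such as apa_chisq() (not part of base R) can assemble the statistics string directly from the table. It assumes the uncorrected χ² is reported. It also follows the APA convention of dropping the leading zero for values that cannot exceed 1 (p and V). Adjust both choices to your course's conventions.

```r
# Hypothetical helper (not part of base R): APA-style statistics string
apa_chisq <- function(tab) {
  res <- chisq.test(tab, correct = FALSE)   # assumption: report uncorrected chi-square
  n   <- sum(tab)
  k   <- min(dim(tab))
  v   <- sqrt(unname(res$statistic) / (n * (k - 1)))   # Cramér's V
  # Drop the leading zero, e.g. "0.34" -> ".34"
  drop_zero <- function(x, digits) sub("^0", "", formatC(x, format = "f", digits = digits))
  p_txt <- if (res$p.value < .001) "p < .001" else paste("p =", drop_zero(res$p.value, 3))
  sprintf("\u03c7\u00b2(%d, N = %d) = %.2f, %s, V = %s",
          as.integer(res$parameter), as.integer(n),
          unname(res$statistic), p_txt, drop_zero(v, 2))
}

# Usage on the Sex × Improvement table
tab <- matrix(c(45, 15,
                25, 35), nrow = 2, byrow = TRUE)
apa_chisq(tab)   # "χ²(1, N = 120) = 13.71, p < .001, V = .34"
```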
⚙️ Module 4: Assumptions & Advanced Topics

⏱️ Estimated Time: 60-90 minutes

Learning Objectives

💡 The Most Important Rule

"Expected frequencies ≥ 5 in ALL cells"

This is the #1 assumption students violate. Emphasize:

  • It's about EXPECTED, not observed frequencies
  • Check with result$expected
  • R will warn you, but you need to know what to do

Decision Tree for Test Selection

✅ Teaching the Decision Process

Walk through this flowchart with students:

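  1. Is DV categorical? NO → Use t-test/ANOVA/regression
  2. Are observations independent? NO (e.g., the same subjects measured before/after) → Use McNemar's test for paired categorical data
  3. One or two variables? ONE → Goodness-of-fit; TWO → Test of independence
  4. Check expected frequencies: All ≥ 5 → Chi-square; Any < 5 → Fisher's exact test for a contingency table (for a one-variable goodness-of-fit test, use an exact binomial test or a simulated p-value instead)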

Practice: Give students scenarios and have them work through the decision tree.

Fisher's Exact Test

✅ When and How to Use Fisher's

When: ANY expected frequency < 5

Advantage: Exact p-value (no approximation)

Limitation: Computationally intensive for large tables

R Code:

# For 2×2 tables
fisher.test(data)

# For larger tables, use simulation
fisher.test(data, simulate.p.value = TRUE)

Teaching point: Show students the warning message R gives when expected < 5, then demonstrate switching to Fisher's test.
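Here is a sketch of that demonstration using a hypothetical 2 species × 3 habitats table (n = 30), similar to Scenario 1 below. withCallingHandlers() captures the warning so the class can discuss it instead of letting it scroll past.

```r
# Hypothetical 2 species × 3 habitats table (n = 30)
tab <- matrix(c(8, 5, 2,
                3, 4, 8), nrow = 2, byrow = TRUE,
              dimnames = list(Species = c("A", "B"),
                              Habitat = c("Forest", "Grassland", "Wetland")))

# Step 1: run chisq.test() and capture the warning
warned <- FALSE
chi <- withCallingHandlers(
  chisq.test(tab),
  warning = function(w) {
    message("R warned: ", conditionMessage(w))
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

# Step 2: confirm the cause by looking at the EXPECTED counts
round(chi$expected, 2)   # Grassland cells are 4.5, below 5
min(chi$expected)

# Step 3: switch to Fisher's exact test (a 2 × 3 table this small runs quickly)
fisher.test(tab)
```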

Common Scenarios and Solutions

✅ Scenario 1: Small Sample with Low Frequencies

Problem: 2 species × 3 habitats, n=30, some expected frequencies = 2.5

Solutions in order of preference:

  1. Fisher's exact test (best option; add simulate.p.value = TRUE only if the exact calculation is too slow)
  2. Combine categories (e.g., merge similar habitats) IF theoretically justified
  3. Collect more data if study is ongoing

What NOT to do: Don't arbitrarily combine categories just to get significance!

⚠️ Error: Confusing Observed and Expected

Student says: "I can't use chi-square because I have zeros in my table."

Correction: Observed frequencies of 0, 1, 2, etc. are FINE. The rule is about EXPECTED frequencies. Show them:

result$expected  # Check THIS, not the observed data

Yates' Continuity Correction

💡 Explaining Yates' Correction

Simple explanation: "For 2×2 tables, R automatically makes the test a bit more conservative to avoid false positives. This is usually good!"

When students ask about turning it off: "Keep it on unless you have a very large sample (n > 500) and theoretical reasons to use the uncorrected version."

Show the difference:

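# Hypothetical 2×2 table chosen to sit near the .05 boundary
data <- matrix(c(20, 10,
                 12, 18), nrow = 2, byrow = TRUE)

# With correction (default for 2×2 tables)
chisq.test(data)$p.value  # about 0.070

# Without correction
chisq.test(data, correct = FALSE)$p.value  # about 0.038

# See how it affects borderline results?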

Post-Hoc Comparisons

✅ Teaching Post-Hoc Strategy

Scenario: 3 treatments × 2 outcomes, overall χ² is significant

Question: "Which treatments differ?"

Approach:

  1. First: Look at standardized residuals (|residual| > 2)
  2. Then: If needed, conduct pairwise chi-squares:
    • Treatment A vs B
    • Treatment A vs C
    • Treatment B vs C
  3. Apply Bonferroni: α = 0.05 / 3 = 0.017

Better approach: Plan comparisons in advance (control vs. each treatment) to reduce multiple testing burden
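The pairwise step can be scripted so that all comparisons and their Bonferroni adjustments appear in one table. The counts are hypothetical. p.adjust() with method = "bonferroni" multiplies each p-value by the number of comparisons (capped at 1), which is equivalent to comparing the raw p-values with .05/3.

```r
# Hypothetical 3 treatments × 2 outcomes table
tab <- matrix(c(30, 10,
                18, 22,
                15, 25), nrow = 3, byrow = TRUE,
              dimnames = list(Treatment = c("A", "B", "C"),
                              Outcome = c("Improved", "Not Improved")))

chisq.test(tab)   # overall test first

# All pairwise 2 × 2 comparisons: A vs B, A vs C, B vs C
pairs <- combn(rownames(tab), 2, simplify = FALSE)
raw_p <- sapply(pairs, function(pr) chisq.test(tab[pr, ])$p.value)
names(raw_p) <- sapply(pairs, paste, collapse = " vs ")

# Bonferroni-adjusted p-values, compared with .05
adj_p <- p.adjust(raw_p, method = "bonferroni")
round(cbind(raw_p, adj_p), 4)
```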

Practice Problem with Full Solution

✅ Comprehensive Problem: Stress × Immune Response

Data: 2 stress levels × 3 response categories (n=120)

Expected frequencies: All between 15.42 and 25.08 (all ≥ 5 ✓)

Results:

  • χ²(2) = 28.41, p < .001
  • V = 0.49 (moderate-strong effect)
  • Residuals: Low stress shows more strong responses (+2.78), fewer weak (-2.93)

Teaching points from this problem:

  • Assumptions met (show the check)
  • Significant result + meaningful effect size
  • Clear pattern from residuals
  • Real-world implications (stress affects immune function)

📋 Assessment Strategies & Rubrics

Formative Assessment

Each module includes built-in "Check Your Understanding" questions. Use these to:

Summative Assessment: Chi-Square Analysis Project

💡 Recommended Final Assessment

Task: Students complete a full chi-square analysis on provided (or collected) data

Components (100 points total):

  • Research question and hypotheses (10 pts)
  • Appropriate test selection with justification (15 pts)
  • Assumption checking (15 pts)
  • Correct R code and output (20 pts)
  • Effect size calculation (10 pts)
  • Pattern identification (residuals) (15 pts)
  • APA-style write-up (15 pts)

Detailed Grading Rubric

Component | Exemplary (A) | Proficient (B) | Developing (C) | Needs Work (D/F)
Test Selection | Correct test chosen with clear justification of why | Correct test, minimal justification | Correct test, no justification OR wrong test with partial reasoning | Wrong test, no justification
Assumptions | All assumptions checked, expected frequencies examined, appropriate action taken if violated | Most assumptions checked, basic response to violations | Some assumptions checked but not all, or checked but ignored violations | Assumptions not checked or serious violations ignored
R Code | All code correct, well-commented, reproducible | Code works with minor errors, adequate comments | Code runs but has errors or poor organization | Code doesn't run or major errors
Interpretation | χ², p-value, effect size all correctly interpreted; pattern clearly described using residuals | Statistics correctly interpreted, pattern description adequate | Some correct interpretation but missing key elements | Fundamental misinterpretation of results
Effect Size | Cramér's V calculated and interpreted in context | V calculated, basic interpretation | V calculated but not interpreted | Effect size omitted
Write-Up | Complete APA format, all elements present, clear and concise | Most APA elements, generally clear | Some APA elements, somewhat unclear | Not in APA format or very unclear

Quick Checks for Understanding

✅ Minute Paper Prompts

Use these at the end of each module (2-3 minutes):

  • Module 1: "Give one example of a categorical variable and one example of a continuous variable from your research area."
  • Module 2: "What does it mean when χ² is large? What does it mean when it's close to 0?"
  • Module 3: "What does a standardized residual of +3.2 tell you about that cell?"
  • Module 4: "When would you use Fisher's exact test instead of chi-square?"

Common Assessment Pitfalls

⚠️ Don't Over-Penalize Technical Errors

Students often struggle with R syntax (missing commas, wrong brackets). If the logic is correct but syntax is off, give partial credit. The goal is statistical thinking, not coding perfection.

💡 Provide Example Write-Ups

Give students 2-3 example APA write-ups (with varying quality) and have them identify strengths/weaknesses. This helps them understand expectations.

🔧 Technical Troubleshooting

Common R Errors and Solutions

Error Message | Cause | Solution
"Chi-squared approximation may be incorrect" (a warning) | Expected frequencies < 5 | Use fisher.test() or combine categories
"all entries of 'x' must be nonnegative and finite" | Negative or missing (NA) values in table | Check data for entry errors
"probabilities must sum to 1." | p vector doesn't sum to 1.0 | Recalculate proportions: p/sum(p), or use rescale.p = TRUE for a ratio
"arguments imply differing number of rows" | Vectors passed to data.frame() have different lengths | Verify all columns have the same number of values
"object 'data' not found" | Typo in the object name, or the object was never created | Check spelling, use ls() to see objects

Module-Specific Issues

Module 1 (Interactive Elements)

⚠️ JavaScript Not Working

Symptoms: Drag-and-drop or quizzes not interactive

Solutions:

  • Test in Chrome or Firefox (most compatible)
  • Disable pop-up blockers
  • Hard refresh: Ctrl+Shift+R (Windows) or Cmd+Shift+R (Mac)
  • Have static screenshots as backup

Module 2 & 3 (Running Tests in R)

⚠️ Creating Matrices/Tables

Common student mistake:

# Possibly wrong: matrix() fills by COLUMN by default,
# so this gives rows (10, 30) and (20, 40)
data <- matrix(c(10, 20, 30, 40), nrow = 2)

Teaching fix: Always have students verify their table looks right before running the test:

# Create matrix
data <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)

# CHECK IT before proceeding!
data  

# Add row/column names to catch errors
rownames(data) <- c("Group1", "Group2")
colnames(data) <- c("Yes", "No")
data  # Now easier to spot if wrong!

Module 4 (Fisher's Exact Test)

⚠️ Fisher's Test Taking Forever

Cause: Large table without simulation

Solution: Add simulate.p.value = TRUE

fisher.test(data, simulate.p.value = TRUE)

Data Import Issues

💡 Prevent Import Problems

Provide clean datasets with:

  • No special characters
  • No spaces in variable names (use underscores)
  • Missing data as NA (not blank or "missing")
  • Save as CSV (most universal format)

Standard import code to give students:

# Read CSV
data <- read.csv("filename.csv", header = TRUE)

# Check structure
str(data)
head(data)

# Create contingency table
table_data <- table(data$variable1, data$variable2)

Computer Lab Setup

Before First Session

During Class

Alternative Approaches if Technology Fails

💡 Backup Plans
  1. Posit Cloud (formerly RStudio Cloud): Browser-based R (requires internet)
  2. Demonstrate only: Project screen, students follow along conceptually
  3. Paper-based: Provide pre-run R output, focus on interpretation
  4. Post-class practice: Move hands-on work to homework if lab fails

🎓 Teaching Tips & Philosophy

Creating a Supportive Learning Environment

💡 Normalize Confusion

Say things like:

  • "Chi-square seems simple, but choosing the right test takes practice. Everyone gets confused initially."
  • "I still look up the expected frequency formula every time!"
  • "Making mistakes is how you learn - let's figure out what went wrong together."

Common Pedagogical Challenges

Challenge 1: "It's Just Counting - Why Is This Hard?"

💡 Response Strategy

Why students struggle: Chi-square seems simpler than regression/ANOVA (no equations!), but requires different thinking (frequencies vs. means)

How to help:

  • Acknowledge it feels different from other tests
  • Emphasize the logic: "We're testing if counts differ from expectation"
  • Use lots of examples with real data they care about

Challenge 2: Categorical vs. Continuous Confusion

⚠️ Persistent Issue

Students constantly try to use chi-square on:

  • Likert scales (1-7 ratings)
  • Test scores (0-100)
  • Ordinal scales with many levels, analyzed as if the order didn't matter

Prevention strategy:

  • Create a decision flowchart poster for the classroom
  • Use the "decimal point test": Can the value have decimals? → Probably continuous
  • Give weekly warm-up questions: "Chi-square or not?"

Challenge 3: P-Value Obsession

💡 Shifting Focus to Effect Sizes

Students say: "It's significant! We're done!"

Redirect to:

  • "What does Cramér's V tell you about the STRENGTH?"
  • "Which cells drive this effect? Look at residuals!"
  • "Is this difference practically meaningful?"

Require: All reports must include effect size and pattern description

Making It Relevant to Their Research

💡 Field-Specific Examples

Connect to student interests:

  • Neuroscience: Neuron types × brain regions, treatment response × receptor subtype
  • Animal Behavior: Species × habitat preference, sex × mating strategy
  • Clinical: Diagnosis × treatment outcome, drug response × genotype
  • Sensation/Perception: Detection success × stimulus intensity category

Ask students on Day 1 what categorical variables they work with!

Active Learning Strategies

💡 Group Activities

Contingency Table Construction (15 min):

  1. Give students raw data (20 observations, 2 variables)
  2. In pairs, have them hand-construct the contingency table
  3. Calculate expected frequencies by hand
  4. Then check with R

Why this works: Builds understanding of what R is doing "under the hood"
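For the "check with R" step, here is a sketch with hypothetical raw data (20 observations of two variables). It shows what table() builds and confirms that the hand formula for expected counts matches chisq.test().

```r
# Hypothetical raw data: 20 observations, 2 categorical variables
sex     <- rep(c("M", "F"), each = 10)
outcome <- c("Y", "Y", "Y", "N", "Y", "Y", "N", "Y", "Y", "N",   # males
             "N", "Y", "N", "N", "Y", "N", "N", "Y", "N", "N")   # females

# What students build by hand: the contingency table, plus its margins
tab <- table(Sex = sex, Outcome = outcome)
addmargins(tab)   # adds the row and column totals

# Expected counts from (row total × column total) / grand total
E_hand <- outer(rowSums(tab), colSums(tab)) / sum(tab)
E_hand            # every cell is 5 here

# R's version for comparison
chisq.test(tab)$expected
```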

💡 "Assumption Detective" Activity

Setup: Provide 4-5 scenarios with data descriptions

Task: Students identify which assumptions are violated and what to do

Example scenarios:

  • Small sample with expected frequencies = 3
  • Same subjects measured before/after
  • Continuous DV misused
  • Perfect example meeting all assumptions

Managing Time Effectively

Activity | Time Investment | Worth It?
Hand calculation of χ² (one example) | 15 minutes | ✓ YES - builds intuition
Live coding every example | High (30+ min per example) | ⚠️ MAYBE - do 2-3 live, rest as handouts
Creating field-specific examples | 1-2 hours prep | ✓ YES - massively increases engagement
Detailed feedback on all code | High (10 min/student) | ⚠️ MAYBE - use group feedback for common errors
Multiple practice datasets | 2-3 hours | ✓ YES - critical for mastery

Measuring Success

✅ Success Indicators
  • Students correctly identify when chi-square is appropriate (vs. t-test/ANOVA)
  • Students check assumptions WITHOUT being prompted
  • Students interpret patterns, not just p-values
  • Students can troubleshoot their own R errors
  • Students ask "Is the effect size meaningful?" not just "Is it significant?"

🎉 Final Notes for Instructors

Chi-square is deceptively simple. Students think "it's just counting" but struggle with:

Your role is to:

Key Takeaway: Chi-square teaches students an essential skill - analyzing categorical outcomes. This appears constantly in real research (treatment response, diagnostic categories, behavioral choices). Make it relevant, make it practical, make it stick!

Good luck with your chi-square modules! 📊✨

Questions or feedback on this instructor's guide?

Adapt these materials to your specific teaching context and student needs!