Comprehensive Teaching Guide for Modules 1-4
Complete with Answer Keys, Teaching Tips, Common Errors, and Facilitation Notes
These modules teach chi-square as a practical research tool, not a mathematical exercise. Students learn to:
By the end of the complete module series, students should be able to:
Module 3 is the heart of the course. Most research uses the test of independence, not goodness-of-fit. Budget extra time for Module 3 and ensure students master contingency tables and effect sizes.
Students should have:
"Chi-square is ONLY for categorical dependent variables."
Students constantly try to use chi-square on continuous data. Hammer this home early! Use examples:
Categorical Variables:
Continuous Variables:
"Age is categorical because we can group it."
Correction: Age is inherently continuous (measured in years). You CAN categorize it (young/middle/old), but the raw variable is continuous. The key is: how was it measured? If it was measured as a number on a scale (even when recorded in whole years), it's continuous.
Q: Researcher measures reaction time (ms) for coffee vs. no coffee. Use chi-square?
A: NO. Reaction time is continuous. Use a t-test.
Teaching point: Even though there are "two groups," the DV is continuous. Chi-square is about the DEPENDENT VARIABLE type, not the number of groups.
Q: 100 patients categorized as improved/no change/worse. Test if the distribution differs from equal proportions (1/3 in each category).
A: Goodness-of-fit (one categorical variable)
Teaching point: Only ONE variable (outcome), testing against an expected distribution.
Q: Males and females choosing among three habitats. Which test?
A: Test of independence (two categorical variables: sex AND habitat)
Teaching point: TWO variables, testing if they're related.
Use the interactive calculator! Have students calculate χ² by hand for a simple example (3 categories) before using R. This builds intuition for what the statistic measures.
Key insight to convey: Larger differences between O and E → larger χ² → more evidence against H₀
Data: Observed = (15, 18, 27); Expected = (20, 20, 20)
Calculation:
Cell 1: (15-20)²/20 = 25/20 = 1.25
Cell 2: (18-20)²/20 = 4/20 = 0.20
Cell 3: (27-20)²/20 = 49/20 = 2.45
χ² = 1.25 + 0.20 + 2.45 = 3.90
df = 3 - 1 = 2
Walk through: Show that the largest contribution (2.45) comes from the cell most different from expectation (27 vs 20).
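The hand calculation above can be mirrored in R so students see that `chisq.test()` does the same arithmetic. This is a minimal sketch using the observed counts from the example; the variable names are illustrative only.

```r
# Observed counts from the worked example; equal expected counts (20 each)
observed <- c(15, 18, 27)
expected <- rep(sum(observed) / length(observed), length(observed))

# Per-cell contributions: (O - E)^2 / E
contrib <- (observed - expected)^2 / expected
contrib

# Sum the contributions to get chi-square, then look up the p-value
chi_sq  <- sum(contrib)
df      <- length(observed) - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Compare with R's built-in goodness-of-fit test (equal proportions by default)
r_result <- chisq.test(observed)
c(by_hand = chi_sq, from_R = unname(r_result$statistic))
```

Having students print `contrib` before summing makes it easy to point at the cell that drives the statistic.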
Data: Mon=28, Tue=32, Wed=29, Thu=31, Fri=35, Sat=18, Sun=17 (n=190)
Expected: 190/7 = 27.14 per day
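R Output: χ²(6) = 10.72, p = .098
Interpretation: With these counts the test falls short of significance at α = .05, so an equal distribution across days cannot be rejected. The pattern is still worth describing: weekend births (Sat/Sun) are notably lower than weekdays, likely due to scheduled procedures.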
Teaching point: Always interpret the pattern, not just the p-value! Look at which categories deviate.
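To look at which categories deviate, students can inspect the per-day residuals rather than stopping at the test statistic. The sketch below uses the birth counts listed above; the object names are illustrative.

```r
# Birth counts by day of week, as given in the example
births <- c(Mon = 28, Tue = 32, Wed = 29, Thu = 31,
            Fri = 35, Sat = 18, Sun = 17)

# Goodness-of-fit test against equal proportions (the default)
res <- chisq.test(births)
res

# Expected count per day under H0
res$expected

# Pearson residuals: (O - E) / sqrt(E); negative means fewer births than expected
round(res$residuals, 2)

# Days sorted from most below to most above expectation
sort(res$residuals)
```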
Data: 152 purple, 48 white (n=200)
Expected (3:1 ratio): 150 purple (75%), 50 white (25%)
R Code:
flowers <- c(152, 48)
chisq.test(flowers, p = c(0.75, 0.25))
R Output: χ²(1) = 0.107, p = .744
Interpretation: Data fit 3:1 ratio very well. Supports Mendelian inheritance!
Teaching point: Non-significant results can be meaningful! Here it confirms theoretical prediction.
Student tries:
responses <- c("A", "B", "A", "C")
chisq.test(responses) # ERROR!
Correct approach:
freq <- table(responses)
chisq.test(freq)
Prevention: Emphasize that chi-square operates on COUNTS, not individual observations.
Student specifies: p = c(0.3, 0.3, 0.5) (sums to 1.1!)
Correction: Proportions must sum to 1.0 exactly
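A short demonstration of this error and one way to repair it is sketched below. The observed counts are hypothetical, made up purely for illustration. Rescaling with `p / sum(p)` only fixes the arithmetic; students should still confirm that the rescaled proportions are the ones their hypothesis actually specifies.

```r
# Hypothetical observed counts (illustrative only)
observed <- c(32, 28, 50)

# The student's proportions: these sum to 1.1, not 1
p_bad <- c(0.3, 0.3, 0.5)
sum(p_bad)

# chisq.test() refuses proportions that do not sum to 1; capture the message
msg <- tryCatch(
  chisq.test(observed, p = p_bad),
  error = function(e) conditionMessage(e)
)
msg

# One repair: rescale so the proportions sum to 1
p_ok <- p_bad / sum(p_bad)
sum(p_ok)
res <- chisq.test(observed, p = p_ok)
res
```

`chisq.test()` also accepts `rescale.p = TRUE`, which performs the same rescaling internally.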
Project your screen and walk through:
Use this analogy: "Independence means knowing one variable tells you NOTHING about the other. Like coin flip and dice roll - knowing you flipped heads doesn't help you predict the die."
Non-independence: "Knowing someone is Species A DOES tell you about habitat - they prefer water. Variables are associated."
Formula: Expected = (Row Total × Column Total) / Grand Total
Walk through an example on the board:
| | Improved | Not Improved | Total |
|---|---|---|---|
| Male | 45 | 15 | 60 |
| Female | 25 | 35 | 60 |
| Total | 70 | 50 | 120 |
Calculate Male/Improved expected:
E = (60 × 70) / 120 = 4200 / 120 = 35
Interpretation: "If sex and improvement were independent, we'd expect 35 of the 60 males to improve (the overall improvement rate of 70/120 applied to males)."
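The expected-frequency formula can be applied to every cell at once. The sketch below rebuilds the sex by improvement table from the board example and compares a manual calculation with what `chisq.test()` stores; the object names are illustrative.

```r
# Observed counts from the board example (totals are computed, not entered)
tbl <- matrix(c(45, 15,
                25, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Male", "Female"),
                              c("Improved", "Not Improved")))

# Expected = (row total x column total) / grand total, for all cells at once
expected_manual <- outer(rowSums(tbl), colSums(tbl)) / sum(tbl)
expected_manual

# The same values are stored in the test result
expected_r <- chisq.test(tbl)$expected
all.equal(expected_manual, expected_r, check.attributes = FALSE)
```

`outer()` multiplies each row total by each column total, which is exactly the board formula repeated for all four cells.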
Students often ignore residuals! Emphasize that a significant chi-square only tells you "something is different" - residuals tell you WHAT is different.
Rule of thumb: |residual| > 2 means that cell contributes notably to χ²
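One way to apply this rule of thumb in R is sketched here, reusing the sex by improvement counts from the expected-frequency example (45/15 and 25/35). It uses the `stdres` component of the test result (standardized residuals); the threshold of 2 is the rule of thumb stated above, and the object names are illustrative.

```r
# Sex x improvement counts from the earlier board example
tbl <- matrix(c(45, 15,
                25, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Male", "Female"),
                              c("Improved", "Not Improved")))

res <- chisq.test(tbl)

# Standardized residuals: positive = more than expected, negative = fewer
round(res$stdres, 2)

# Flag cells whose standardized residual exceeds 2 in absolute value
flagged <- abs(res$stdres) > 2
flagged

# Row and column positions of the flagged cells
which(flagged, arr.ind = TRUE)
```

Note that `res$residuals` holds Pearson residuals, which are smaller in magnitude than `res$stdres`, so students should be told which one a cutoff refers to.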
Data: 3 species × 2 habitats (n=150)
R Output: χ²(2) = 8.26, p = .016
Standardized Residuals:
| | Forest | Grassland |
|---|---|---|
| Species A | 2.03 | -2.03 |
| Species B | -1.58 | 1.58 |
| Species C | -0.41 | 0.41 |
Interpretation Guide:
Conclusion: The significant chi-square is driven primarily by Species A's strong forest preference.
"The chi-square is 25, so the effect must be large!"
Correction: Chi-square value depends on sample size! Always calculate effect size.
Example:
Formula: V = √[χ² / (n × (k-1))]
Where k = min(rows, cols)
Example: Treatment × Sex (2×2 table)
χ² = 15.63, n = 120, k = 2
V = √[15.63 / (120 × 1)] = √0.130 = 0.36 (moderate effect)
R code to provide students:
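# Manual calculation
# Assumes `data` is the contingency table and result <- chisq.test(data)
chi_sq <- unname(result$statistic)  # drop the "X-squared" name
n <- sum(data)
k <- min(nrow(data), ncol(data))
V <- sqrt(chi_sq / (n * (k - 1)))
V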
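To show students why χ² alone is not an effect size, the sketch below multiplies every cell of a table by 10: χ² grows tenfold while Cramér's V does not change. The helper `cramers_v()` is a hypothetical convenience function written for this illustration, and it uses the uncorrected statistic (`correct = FALSE`) as an assumption so that the scaling is exact. The table reuses the 45/15 and 25/35 counts from the expected-frequency example.

```r
# Hypothetical helper: Cramér's V from a contingency table
# (uses the uncorrected chi-square; this is an assumption of the sketch)
cramers_v <- function(tbl) {
  chi_sq <- unname(chisq.test(tbl, correct = FALSE)$statistic)
  n <- sum(tbl)
  k <- min(nrow(tbl), ncol(tbl))
  sqrt(chi_sq / (n * (k - 1)))
}

# Same proportions, two sample sizes (n = 120 and n = 1200)
small <- matrix(c(45, 15,
                  25, 35), nrow = 2, byrow = TRUE)
large <- small * 10

chi_small <- unname(chisq.test(small, correct = FALSE)$statistic)
chi_large <- unname(chisq.test(large, correct = FALSE)$statistic)
v_small <- cramers_v(small)
v_large <- cramers_v(large)

# Chi-square scales with n; V stays the same
c(chi_small = chi_small, chi_large = chi_large)
c(v_small = v_small, v_large = v_large)

# Check of the worked example above: chi-square = 15.63, n = 120, k = 2
v_example <- sqrt(15.63 / (120 * (2 - 1)))
round(v_example, 2)
```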
Data: 3 diagnoses × 2 treatments (n=200)
Results:
Full APA Write-up:
"The relationship between diagnosis and treatment type was examined using a chi-square test of independence. Treatment assignment was significantly associated with diagnosis, χ²(2) = 26.04, p < .001, V = 0.36. Patients with anxiety were more likely to receive therapy (64%) than medication (36%), while patients with depression showed the opposite pattern (63% medication, 38% therapy). The moderate effect size indicates this is a meaningful clinical pattern."
"Expected frequencies ≥ 5 in ALL cells"
This is the #1 assumption students violate. Emphasize:
result$expected
Walk through this flowchart with students:
Practice: Give students scenarios and have them work through the decision tree.
When: ANY expected frequency < 5
Advantage: Exact p-value (no approximation)
Limitation: Computationally intensive for large tables
R Code:
# For 2×2 tables
fisher.test(data)
# For larger tables, use simulation
fisher.test(data, simulate.p.value = TRUE)
Teaching point: Show students the warning message R gives when expected < 5, then demonstrate switching to Fisher's test.
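A possible classroom demo of that switch is sketched below. The 2×2 counts are hypothetical and deliberately tiny so that the expected frequencies fall below 5; the object names are illustrative.

```r
# Hypothetical small-sample table (n = 9), chosen to trigger the warning
small <- matrix(c(3, 1,
                  1, 4),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Group1", "Group2"), c("Yes", "No")))

# Capture the warning text that chisq.test() raises
warn <- tryCatch({
  chisq.test(small)
  NA_character_
}, warning = function(w) conditionMessage(w))
warn

# Inspect the expected frequencies that caused it
expected <- suppressWarnings(chisq.test(small))$expected
expected

# Switch to Fisher's exact test when any expected frequency is below 5
if (any(expected < 5)) {
  res <- fisher.test(small)
} else {
  res <- chisq.test(small)
}
res$method
res$p.value
```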
Problem: 2 species × 3 habitats, n=30, some expected frequencies = 2.5
Solutions in order of preference:
What NOT to do: Don't arbitrarily combine categories just to get significance!
Student says: "I can't use chi-square because I have zeros in my table."
Correction: Observed frequencies of 0, 1, 2, etc. are FINE. The rule is about EXPECTED frequencies. Show them:
result$expected # Check THIS, not the observed data
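A concrete case can help here. The table below is hypothetical and contains an observed zero, yet every expected frequency is well above 5, so the chi-square approximation is not in question on that ground.

```r
# Hypothetical table with an observed count of 0 (illustrative only)
tbl <- matrix(c( 0, 30,
                20, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Group1", "Group2"), c("Yes", "No")))

result <- chisq.test(tbl)

# The observed table contains a zero...
any(tbl == 0)

# ...but the assumption concerns the EXPECTED counts
result$expected
all(result$expected >= 5)
```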
Simple explanation: "For 2×2 tables, R's chisq.test() automatically applies Yates' continuity correction, which makes the test a bit more conservative to avoid false positives. This is usually good!"
When students ask about turning it off: "Keep it on unless you have a very large sample (n > 500) and theoretical reasons to use the uncorrected version."
Show the difference:
# With correction (default)
chisq.test(data) # p = 0.064
# Without correction
chisq.test(data, correct = FALSE) # p = 0.032
# See how it affects borderline results?
Scenario: 3 treatments × 2 outcomes, overall χ² is significant
Question: "Which treatments differ?"
Approach:
Better approach: Plan comparisons in advance (control vs. each treatment) to reduce multiple testing burden
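One common way to follow up a significant overall test is to run pairwise 2×2 chi-square tests and adjust the p-values for multiple testing. The sketch below assumes a Bonferroni adjustment, which is one option among several and not necessarily the method these modules prescribe. The counts are hypothetical.

```r
# Hypothetical 3 treatments x 2 outcomes table (illustrative only)
outcomes <- matrix(c(30, 10,
                     22, 18,
                     12, 28),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("A", "B", "C"),
                                   c("Improved", "Not improved")))

# All pairs of treatments: A-B, A-C, B-C
pairs <- combn(rownames(outcomes), 2, simplify = FALSE)

# Chi-square test on each 2x2 sub-table
raw_p <- sapply(pairs, function(pr) {
  chisq.test(outcomes[pr, ], correct = FALSE)$p.value
})
names(raw_p) <- sapply(pairs, paste, collapse = " vs ")

# Bonferroni adjustment (assumed here; other methods exist in p.adjust)
adj_p <- p.adjust(raw_p, method = "bonferroni")

round(cbind(raw_p, adj_p), 4)
```

For planned comparisons, restrict `pairs` to the contrasts chosen in advance (for example, control vs. each treatment) so that fewer tests need adjusting.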
Data: 2 stress levels × 3 response categories (n=120)
Expected frequencies: All between 15.42 and 25.08 (all ≥ 5 ✓)
Results:
Teaching points from this problem:
Each module includes built-in "Check Your Understanding" questions. Use these to:
Task: Students complete a full chi-square analysis on provided (or collected) data
Components (100 points total):
| Component | Exemplary (A) | Proficient (B) | Developing (C) | Needs Work (D/F) |
|---|---|---|---|---|
| Test Selection | Correct test chosen with clear justification of why | Correct test, minimal justification | Correct test, no justification OR wrong test with partial reasoning | Wrong test, no justification |
| Assumptions | All assumptions checked, expected frequencies examined, appropriate action taken if violated | Most assumptions checked, basic response to violations | Some assumptions checked but not all, or checked but ignored violations | Assumptions not checked or serious violations ignored |
| R Code | All code correct, well-commented, reproducible | Code works with minor errors, adequate comments | Code runs but has errors or poor organization | Code doesn't run or major errors |
| Interpretation | χ², p-value, effect size all correctly interpreted; pattern clearly described using residuals | Statistics correctly interpreted, pattern description adequate | Some correct interpretation but missing key elements | Fundamental misinterpretation of results |
| Effect Size | Cramér's V calculated and interpreted in context | V calculated, basic interpretation | V calculated but not interpreted | Effect size omitted |
| Write-Up | Complete APA format, all elements present, clear and concise | Most APA elements, generally clear | Some APA elements, somewhat unclear | Not in APA format or very unclear |
Use these at the end of each module (2-3 minutes):
Students often struggle with R syntax (missing commas, wrong brackets). If the logic is correct but syntax is off, give partial credit. The goal is statistical thinking, not coding perfection.
Give students 2-3 example APA write-ups (with varying quality) and have them identify strengths/weaknesses. This helps them understand expectations.
| Error Message | Cause | Solution |
|---|---|---|
| "Chi-squared approximation may be incorrect" (a warning, not an error) | Expected frequencies < 5 | Use fisher.test() or combine categories |
| "all entries of 'x' must be nonnegative and finite" | Negative or missing values in table | Check data for entry errors |
| "probabilities must sum to 1." | p vector doesn't sum to 1.0 | Recalculate proportions: p/sum(p), or pass rescale.p = TRUE |
| "arguments imply differing number of rows" | Columns of different lengths passed to data.frame() | Verify all columns have the same number of values |
| "object 'data' not found" | Typo in data name, or the object was never created | Check spelling, use ls() to see objects |
Symptoms: Drag-and-drop or quizzes not interactive
Solutions:
Common student mistake:
# Wrong - data in wrong order (matrix() fills column by column by default)
data <- matrix(c(10, 20, 30, 40), nrow = 2)
Teaching fix: Always have students verify their table looks right before running test:
# Create matrix
data <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)
# CHECK IT before proceeding!
data
# Add row/column names to catch errors
rownames(data) <- c("Group1", "Group2")
colnames(data) <- c("Yes", "No")
data # Now easier to spot if wrong!
Cause: Large table without simulation
Solution: Add simulate.p.value = TRUE
fisher.test(data, simulate.p.value = TRUE)
Provide clean datasets with:
Standard import code to give students:
# Read CSV
data <- read.csv("filename.csv", header = TRUE)
# Check structure
str(data)
head(data)
# Create contingency table
table_data <- table(data$variable1, data$variable2)
Say things like:
Why students struggle: Chi-square seems simpler than regression/ANOVA (no equations!), but requires different thinking (frequencies vs. means)
How to help:
Students constantly try to use chi-square on:
Prevention strategy:
Students say: "It's significant! We're done!"
Redirect to:
Require: All reports must include effect size and pattern description
Connect to student interests:
Ask students on Day 1 what categorical variables they work with!
Contingency Table Construction (15 min):
Why this works: Builds understanding of what R is doing "under the hood"
Setup: Provide 4-5 scenarios with data descriptions
Task: Students identify which assumptions are violated and what to do
Example scenarios:
| Activity | Time Investment | Worth It? |
|---|---|---|
| Hand calculation of χ² (one example) | 15 minutes | ✓ YES - builds intuition |
| Live coding every example | High (30+ min per example) | ⚠️ MAYBE - do 2-3 live, rest as handouts |
| Creating field-specific examples | 1-2 hours prep | ✓ YES - massively increases engagement |
| Detailed feedback on all code | High (10 min/student) | ⚠️ MAYBE - use group feedback for common errors |
| Multiple practice datasets | 2-3 hours | ✓ YES - critical for mastery |
Chi-square is deceptively simple. Students think "it's just counting" but struggle with:
Your role is to:
Key Takeaway: Chi-square teaches students an essential skill - analyzing categorical outcomes. This appears constantly in real research (treatment response, diagnostic categories, behavioral choices). Make it relevant, make it practical, make it stick!
Good luck with your chi-square modules! 📊✨
Adapt these materials to your specific teaching context and student needs!