Teaching Overview
Learning Sequence
These four modules are designed to be taught over 2-3 class sessions:
- Session 1: Modules 1 & 2 (Why normality matters + Visual detection)
- Session 2: Module 3 (Statistical tests and the large sample paradox)
- Session 3: Module 4 (Transformations and solutions)
Total Time Commitment:
- Module 1: 25-30 minutes
- Module 2: 45-65 minutes (most time-intensive)
- Module 3: 30-40 minutes
- Module 4: 35-45 minutes
- Total: ~2.5-3 hours of class time
Key Pedagogical Principles
- Consequences first: Start with why normality matters (Type I errors) before teaching detection
- Visual before statistical: Train pattern recognition before introducing Shapiro-Wilk
- Embrace the paradox: Large sample sizes both help (robustness) and hurt (test sensitivity)
- Practical decision-making: Focus on "what should I do?" not just "is it normal?"
💡 Pro Tip: Students will want rules ("p < .05 means transform!"). Resist this. The goal is thoughtful judgment combining visual inspection, statistical tests, sample size, and robustness considerations.
Module 1: Why Normality Matters
Learning Objectives
- Understand consequences of violating normality (Type I error inflation)
- See practical impact on confidence interval coverage
- Recognize that robustness depends on sample size
- Connect to Central Limit Theorem
Answer Keys
Question 1: Type I Error Rate with Normal Data
Expected result: ~5% (should be close to nominal α = .05)
Good student answer: "The Type I error rate is approximately 5%, which matches our alpha level. This shows the test is working correctly when assumptions are met."
Question 2: Type I Error Rate with Skewed Data (n=20)
Expected result: 8-12% (inflated above 5%)
Good student answer: "With small sample size and skewed data, the Type I error rate increased to about 10%, which is double the intended rate. This means we're rejecting true null hypotheses too often."
Question 3: Type I Error Rate with Skewed Data (n=100)
Expected result: ~5-7% (much closer to nominal, showing robustness)
Good student answer: "With a larger sample size, even though the data is still skewed, the Type I error rate dropped back down to around 5-6%. This demonstrates that larger samples make the test more robust to violations of normality."
Question 4: Why does sample size matter?
Strong answer should mention:
- Central Limit Theorem (sampling distribution becomes normal)
- With small n, violation of normality affects the test directly
- With large n, the sampling distribution of the mean is approximately normal even if the data isn't
Example: "The Central Limit Theorem tells us that as sample size increases, the sampling distribution of the mean approaches normality regardless of the population distribution. With n=100, even our skewed data produces a nearly normal sampling distribution, so the t-test performs well. With n=20, we don't have enough data for the CLT to fully protect us."
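The simulation behind Questions 1-3 can be sketched in a few lines of R. This is a minimal version; the seed, iteration count, and the choice of Exp(1) as the skewed population are illustrative, not the modules' exact code:

```r
# Estimate the Type I error rate of a one-sample t-test when H0 is TRUE.
# An Exp(1) population has true mean 1, so testing mu = 1 means every
# rejection is a Type I error.
set.seed(42)
type1_rate <- function(n, iterations = 1000) {
  rejections <- replicate(iterations, {
    x <- rexp(n, rate = 1)            # right-skewed sample
    t.test(x, mu = 1)$p.value < 0.05
  })
  mean(rejections)
}
type1_rate(20)   # small n: tends to land above the nominal .05
type1_rate(100)  # large n: tends to fall back near .05
```

Running it a few times (or having each student run it with their own seed) also sets up the sampling-variability discussion in the facilitation notes.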
Common Student Misconceptions
⚠️ Misconception #1: "Normality violations always invalidate results"
Correction: Emphasize that robustness depends on sample size and severity of violation. Large samples are quite robust.
⚠️ Misconception #2: "If Shapiro-Wilk p > .05, my data is perfectly normal"
Correction: The test only tells us we don't have strong evidence against normality. Visual inspection is still essential.
💡 Discussion Prompt: "Why do you think statisticians care about Type I errors? What's the real-world consequence of rejecting H₀ when it's true?" This connects abstract concepts to research integrity.
Facilitation Notes
- Timing: Allow 5-7 minutes for students to run all simulations and record results
- Interactive element: Have students share their Type I error rates - they'll vary due to random sampling. Discuss why this variability exists.
- Technical issue: If simulations run slowly, reduce iterations from 1000 to 500
Module 2: Visual Detection of Non-Normality
Learning Objectives
- Recognize patterns in histograms and Q-Q plots
- Distinguish between normal, skewed, heavy-tailed, and bimodal distributions
- Build pattern recognition skills through practice
- Create a practical "field guide" for future reference
Answer Keys for Practice Datasets
| Dataset | Distribution Type | Histogram Clues | Q-Q Plot Clues | Verdict |
|---|---|---|---|---|
| Dataset 1 | Normal | Bell-shaped, symmetric | Points fall on diagonal line | ✓ Normal - Proceed with t-test |
| Dataset 2 | Right-skewed | Long tail to the right, peak on left | Upward curve at right end (heavy right tail) | ⚠️ Skewed - Consider transformation or large n |
| Dataset 3 | Bimodal | Two distinct peaks | S-shaped curve or irregular | ✗ Bimodal - Investigate groups, don't transform |
| Dataset 4 | Heavy-tailed | Looks nearly normal but with outliers | Points deviate at both ends (tails) | ⚠️ Heavy tails - Check for outliers, consider robust methods |
| Dataset 5 | Left-skewed | Long tail to the left, peak on right | Downward curve at left end | ⚠️ Skewed - Consider transformation |
| Dataset 6 | Normal (mild noise) | Roughly bell-shaped with minor irregularities | Points close to line with minor deviations | ✓ Approximately normal - Proceed |
Field Guide - Expected Entries
Strong Field Guide Characteristics:
- Sketches: Simple drawings showing characteristic shapes
- Key words: Descriptive terms (symmetric, tail, peak, clusters)
- Q-Q patterns: Notes on whether points curve up, down, or show S-shapes
- Action items: What to do next (transform, investigate, proceed)
Example Strong Entry (Right-skewed):
Right-Skewed Distribution
Histogram: Peak on left, long tail stretching right →
Most values cluster low, few extreme high values
Q-Q plot: Points curve UPWARD on right side
Action: Try log transformation or sqrt
Common in: Reaction times, income data, counts
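A practice dataset matching this entry can be generated live for the histogram/Q-Q comparison. This is simulated stand-in data, not the module's actual Dataset 2:

```r
set.seed(1)
x <- rexp(200, rate = 1)     # right-skewed: most values low, long right tail
op <- par(mfrow = c(1, 2))   # histogram and Q-Q plot side by side
hist(x, main = "Right-skewed sample", xlab = "value")
qqnorm(x)                    # points should curve upward at the right end
qqline(x)
par(op)
```

Swapping `rexp()` for `rnorm()`, `c(rnorm(100, 0), rnorm(100, 4))` (bimodal), or `rt(200, df = 3)` (heavy tails) produces the other field-guide shapes on demand.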
Common Student Struggles
⚠️ Issue #1: Confusing which way the skew goes
Tip: "The skew points in the direction of the TAIL, not the peak. Right-skewed = tail goes right."
Memory aid: "Think of the tail as an arrow pointing in the direction of the skew."
⚠️ Issue #2: Not recognizing bimodal distributions
Tip: Emphasize that bimodality suggests TWO GROUPS. Don't transform - investigate! "Two peaks = two populations mixed together."
⚠️ Issue #3: Over-interpreting minor irregularities
Tip: Real data is messy. Minor wiggles are normal. Look for CLEAR patterns, not perfection.
Facilitation Strategy
💡 Recommended Approach:
- Individual exploration (10 min): Have students look at all 6 datasets quietly first
- Partner discussion (15 min): Compare observations and build field guides together
- Whole class reveal (15 min): Go through each dataset, have students share what they saw
- Key moment: When revealing bimodal dataset, ask "What might cause two peaks in real research?" (e.g., male/female, treatment/control, two species)
⏱️ Timing Reality Check:
This module WILL take longer than you expect (45-65 minutes typically). Students need time to:
- Generate and examine each dataset carefully
- Discuss with partners
- Create thoughtful field guides
- Process the connection between histogram and Q-Q plot patterns
Don't rush this. Pattern recognition is the most valuable skill they'll learn.
Discussion Questions
- "Why do you think Q-Q plots are more sensitive than histograms for detecting problems?"
- "What would you do if your histogram and Q-Q plot gave conflicting information?"
- "Dataset 3 was bimodal. In real research, what might cause this?" (Great for critical thinking)
- "Which distribution type do you think is most common in neuroscience/psychology data? Why?"
Module 3: Statistical Tests for Normality
Learning Objectives
- Understand and interpret Shapiro-Wilk test
- Recognize the "large sample paradox"
- Know when to trust visual vs. statistical tests
- Apply Central Limit Theorem understanding
Answer Keys
Question 1: Recording Results
Expected pattern:
- Small sample (n=25): p-value likely > 0.05 (fails to detect deviation)
- Large sample (n=200): p-value likely < 0.05 (detects same deviation)
Key insight: Same distribution type, different conclusions!
Questions 2-3: Comparing Visuals
Question 2 (Histograms): Should look very similar (both slightly right-skewed)
Question 3 (Q-Q plots): Both should show similar patterns (slight deviation from line)
The paradox: They LOOK the same but get different p-values!
Question 4: Pattern as n Increases
Correct observation: p-value tends to decrease as sample size increases
Strong explanation: "Larger samples give the test more 'power' to detect even tiny deviations from perfect normality. With small samples, the test might miss problems. With large samples, it detects everything—even deviations too small to matter practically."
Advanced insight: Some students might note that this creates a dilemma: when you have enough data to reliably detect violations, you also have enough data that violations don't matter much!
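The paradox is easy to demonstrate live. A sketch, where the gamma parameters are an illustrative choice for "mild right skew":

```r
set.seed(7)
mild_skew <- function(n) rgamma(n, shape = 8, rate = 1)  # skewness ~ 0.7
shapiro.test(mild_skew(25))$p.value   # small n: often misses the skew
shapiro.test(mild_skew(200))$p.value  # large n: the same shape is often flagged
```

Because the draws are random, individual runs vary; having students compare their p-values reinforces that the test's verdict, not the distribution, changed with n.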
Question 5: The Big Question - What Should You Do?
Ideal answer components:
- With large n: Visual inspection matters more than p-value
- If both visual and statistical tests agree: clear decision
- If they conflict: Depends on sample size and severity
- Remember: t-tests are robust with large samples
- When in doubt: Report both findings
Example strong answer:
"When I have a large sample (n > 50) and Shapiro-Wilk says p < .05 but the Q-Q plot looks only mildly skewed, I would proceed with the t-test because: (1) the t-test is robust to moderate non-normality with large samples due to the Central Limit Theorem, and (2) the Shapiro-Wilk test is overly sensitive with large samples, detecting trivial deviations that don't affect the validity of the test. I would note the mild skewness in my write-up but wouldn't transform unless the visual inspection showed severe problems."
Question 6: Robustness Demonstration
Expected result: Even with "failed" Shapiro-Wilk (p < .05), 95% CIs should still achieve ~95% coverage with n=100
Good answer: "Even though the Shapiro-Wilk test rejected normality, the confidence intervals still had approximately 95% coverage. This demonstrates that with large samples, the t-test is robust - it works correctly even when the formal assumption test says normality is violated."
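Question 6's coverage check can be sketched as follows; Exp(1) as the skewed population and 1000 replicates are illustrative choices:

```r
set.seed(3)
true_mean <- 1   # the mean of an Exp(1) population
covered <- replicate(1000, {
  ci <- t.test(rexp(100, rate = 1))$conf.int  # t-based 95% CI, n = 100
  ci[1] <= true_mean && true_mean <= ci[2]
})
mean(covered)    # despite the skew, coverage comes out close to .95
```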
The Large Sample Paradox - Teaching It Well
💡 The Key Teaching Moment:
This is the most important conceptual hurdle in the entire module series. Students need to understand:
- Small samples: Test lacks power (might miss problems) BUT violations matter more
- Large samples: Test has high power (detects tiny problems) BUT violations matter less
- The paradox: The Shapiro-Wilk test is MOST likely to "fail" when you LEAST need to worry about it!
Effective analogy: "It's like a smoke detector that gets more sensitive the bigger your fire extinguisher is. When you have a small extinguisher (small n), it might not detect smoke (low power). When you have a huge extinguisher (large n), it goes off at the tiniest wisp of smoke (high power), but you don't need to worry because you can handle it."
Common Student Reactions
⚠️ Frustration #1: "So when DO I trust the p-value?!"
Response: "Always trust your EYES first. The p-value is just one piece of evidence. With n > 50, visual inspection matters more."
⚠️ Frustration #2: "This seems wishy-washy. I want a clear rule!"
Response: "Statistics isn't about following rules blindly - it's about informed judgment. That's why we're training your pattern recognition skills AND showing you the tests. Real data analysis requires both."
Decision Framework Table - Expected Understanding
| Scenario | Visual Check | Shapiro-Wilk | Sample Size | Recommendation |
|---|---|---|---|---|
| 1 | Looks normal | p > .05 | Any | ✓ Proceed with confidence |
| 2 | Clearly skewed | p < .05 | Small (n<30) | Transform or use non-parametric |
| 3 | Mildly skewed | p < .05 | Large (n>50) | Proceed (robust), note in write-up |
| 4 | Looks fine | p < .05 | Large (n>100) | Trust your eyes, ignore p-value |
| 5 | Severe problems | Any | Any | Transform or non-parametric |
| 6 | Bimodal | Any | Any | Don't transform - investigate groups! |
💡 Have students add this table to their notes! It's a practical reference they'll use in every future analysis.
Facilitation Notes
- Timing: 30-40 minutes total
- Pacing: Don't rush the paradox discussion - it's worth 10-15 minutes
- Assessment check: Ask "Why might p < .05 NOT mean you should transform?" to verify understanding
- Real-world connection: "In published papers, you'll often see 'data were slightly skewed but parametric tests were used due to large sample size and robustness' - now you know why!"
Module 4: Data Transformations
Learning Objectives
- Understand WHY transformations work (compress/expand scales)
- Match transformation to distribution shape
- Apply transformations in R
- Interpret results on transformed scales
- Know when NOT to transform
Answer Keys
Question 1: Why Transformations Work
Strong answer includes:
- Log transformation compresses large values more than small values
- This "pulls in" long right tails
- Makes multiplicative relationships additive
- Changes the scale while preserving order
Example: "A log transformation works on right-skewed data because it compresses large values proportionally more than small values. For instance, log10(100) - log10(10) = 1, but 100 - 10 = 90. This 'pulls in' the extreme right tail, making the distribution more symmetric."
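The compression shows up directly in a skewness statistic. A sketch, with illustrative lognormal parameters (so the log lands exactly on a normal distribution):

```r
set.seed(5)
x <- rlnorm(500, meanlog = 0, sdlog = 0.8)           # strongly right-skewed
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3  # moment-based skewness
skew(x)        # large and positive before the transformation
skew(log(x))   # near zero afterward: log(x) is exactly normal here
```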
Practice Dataset Results
Right-skewed dataset:
- Before transformation: Shapiro-Wilk p < .05, Q-Q shows upward curve
- After log transformation: Shapiro-Wilk p > .05, Q-Q points fall on line
- Conclusion: Log transformation successfully normalized the data
Left-skewed dataset:
- Before transformation: Shapiro-Wilk p < .05, Q-Q shows downward curve
- After reflection + sqrt transformation: Shapiro-Wilk p > .05, improved Q-Q
- Note: Some students may try log or sqrt without reflecting first - these compress the wrong tail and make the left skew WORSE. Squaring (moving up the ladder of powers) is a legitimate alternative to reflection, but reflection keeps the familiar right-skew tools usable
Heavy-tailed dataset:
- Challenge: May not fully normalize with standard transformations
- Best approach: Investigate outliers first, might need robust methods
- Teaching point: Not all problems are solvable with transformations!
Question 2: When NOT to Transform
Key scenarios students should identify:
- Bimodal distributions - transformation won't fix underlying two-group structure
- Large samples with mild skew - robustness makes transformation unnecessary
- Data with meaningful zeros - log(0) is undefined
- When interpretability matters more than normality - original scale may be more meaningful
Example answer:
"You should NOT transform if: (1) you have a bimodal distribution because this suggests two distinct groups that should be analyzed separately, not squashed together, (2) you have a large sample (n > 100) with only mild skewness because the Central Limit Theorem makes the t-test robust anyway, or (3) your data has meaningful zeros (like reaction times or counts) and you'd lose interpretability with a log transformation."
Question 3: Interpretation Challenge
Scenario: Original data in milliseconds, mean = 450ms. After log transformation, mean = 6.1.
What students need to understand:
- 6.1 is the mean of LOG(reaction time), not reaction time itself
- To get back to original scale: exp(6.1) ≈ 446ms
- However, this is now the GEOMETRIC mean, not arithmetic mean
- Differences on log scale = ratios on original scale
Strong interpretation:
"The mean on the log scale is 6.1, which corresponds to a geometric mean of approximately 446ms on the original scale (exp(6.1) ≈ 446). This is slightly lower than the arithmetic mean of 450ms because the arithmetic mean is pulled upward by the long right tail while the geometric mean is not. When we report results, we should either back-transform our estimates or clearly state we're working on the log scale."
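The back-transformation logic is easy to verify with simulated reaction times. The lognormal parameters below are hypothetical, picked so the arithmetic mean lands near 450 ms:

```r
set.seed(9)
rt <- rlnorm(1000, meanlog = 6.05, sdlog = 0.35)  # right-skewed RTs in ms
mean(rt)            # arithmetic mean, on the original (ms) scale
mean(log(rt))       # mean on the log scale (close to 6.05)
exp(mean(log(rt)))  # back-transformed: the GEOMETRIC mean, in ms
# For skewed data the geometric mean sits below the arithmetic mean:
exp(mean(log(rt))) < mean(rt)   # TRUE
```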
Transformation Selection Guide - What Students Should Learn
| Distribution Shape | First Try | If That Doesn't Work | R Code |
|---|---|---|---|
| Moderate right skew | Square root | Log | sqrt(x) or log(x) |
| Severe right skew | Log | Inverse | log(x) or 1/x |
| Left skew | Reflect then sqrt | Reflect then log | sqrt(max(x)-x) |
| Heavy tails (both ends) | Check for outliers first! | Winsorize or robust methods | Don't transform blindly |
| Bimodal | DON'T TRANSFORM | Investigate groups | Split dataset or add grouping variable |
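The left-skew recipe can be sketched on simulated data (using max(x) - x for the reflection; the generating distribution is illustrative):

```r
set.seed(2)
x <- 10 - rexp(300, rate = 1)   # left-skewed: long tail toward low values
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3
skew(x)                          # clearly negative before
x_t <- sqrt(max(x) - x)          # reflect (now right-skewed), then sqrt
skew(x_t)                        # much closer to zero after
```

Note that reflection reverses the order of scores, so any group differences on the transformed scale flip sign; students should be reminded to interpret accordingly.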
Common Student Mistakes
⚠️ Mistake #1: "I tried log but it made it worse!"
Likely cause: Data was left-skewed, not right-skewed. Need to reflect first.
Teaching moment: "Always look at the DIRECTION of skew before choosing transformation."
⚠️ Mistake #2: Transforming data with zeros or negative values
Problem: log(0) is undefined, log(negative) is complex
Solution: Add small constant: log(x + 1) or log(x + 0.5)
Caution: This changes interpretation! Mention in write-up.
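A quick illustration of the zero problem and the shifted log (the shift constant is a judgment call that belongs in the write-up):

```r
x <- c(0, 1, 3, 10, 50)
log(x)[1]        # -Inf: the zero breaks a plain log transform
log1p(x)         # log(x + 1); base R's numerically stable built-in
log(x + 0.5)     # an alternative shift; either way, report the constant used
```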
⚠️ Mistake #3: Forgetting which scale they're on
Symptoms: "The mean reaction time was 6.1 milliseconds" (impossible - that's on log scale!)
Prevention: Always label transformed variables clearly: log_rt not just rt
⚠️ Mistake #4: Over-transforming
Example: Trying 5 different transformations to get p > .05
Problem: This is p-hacking! Choose transformation based on distribution shape, not p-value
Teaching point: "The goal isn't to maximize p-value - it's to make the distribution more symmetric."
Teaching Tips
💡 Make It Visual:
The "before and after" visual comparison is incredibly powerful. Have students:
- Save screenshot of "before" histogram and Q-Q plot
- Apply transformation
- Compare side-by-side with "after" plots
- "Wow, it actually worked!" moments are great for learning
💡 Real-World Context:
Explain why certain data types are naturally skewed:
- Reaction times: Can't be negative, but can have very slow responses (right-skewed)
- Income: Can't be negative, no upper limit (right-skewed)
- Test scores near ceiling: Most students score high, few score low (left-skewed)
- Survival times: Many die quickly, some survive long (right-skewed)
"When you understand WHY data is skewed, you can predict what you'll need to do!"
Advanced Discussion Questions
- "If we transform our data, analyze it, and report results - are we being honest with our readers? How should we report this?"
- "Some researchers always analyze data on original scale even if skewed, arguing for interpretability. Others always transform to meet assumptions. Who's right?"
- "What would you do if your data required a transformation for normality, but your reader expects results in the original units (like milliseconds)?"
- "Can you think of any situations where the log scale is actually MORE meaningful than the original scale?" (Hint: fold-change, ratios, pH)
Facilitation Notes
- Timing: 35-45 minutes total
- Hands-on emphasis: Students learn best by DOING transformations, not just reading about them
- Iteration is normal: Normalize the process of trying → checking → trying again
- Connect to Module 3: "Remember the large sample paradox? That's one reason why transformation might not be necessary!"
Assessment & Grading Guidance
Formative Assessment Throughout Modules
Check for Understanding - Key Moments:
- After Module 1: "Can you explain why Type I errors increased with small samples?"
- After Module 2: "Show me your field guide - can you identify this new distribution?"
- After Module 3: "What would you do: n=150, Shapiro p=.03, Q-Q looks mildly skewed?"
- After Module 4: "Why shouldn't you transform bimodal data?"
Summative Assessment Options
Option 1: Take-Home Analysis Assignment
Prompt: Provide students with 3 datasets (small n with clear skew, large n with mild skew, bimodal). Ask them to:
- Create and interpret visual diagnostics
- Run and interpret Shapiro-Wilk tests
- Make and justify decisions about transformation
- If transforming, show before/after comparison
- Explain their reasoning using concepts from all 4 modules
Grading rubric elements:
- Visual diagnostics are correct and well-labeled (20%)
- Statistical test interpretation is accurate (20%)
- Decision-making shows nuanced understanding of sample size/severity (30%)
- If transformation applied, it's appropriate and verified (20%)
- Written explanation demonstrates conceptual understanding (10%)
Option 2: In-Class Practical Exam
Format: 50-minute exam, students analyze 2 datasets using RStudio
Dataset 1 (20 points): Small sample, clear violation
- Students must identify problem visually
- Run appropriate tests
- Choose and apply transformation
- Verify improvement
Dataset 2 (20 points): Large sample, mild violation
- Students must recognize robustness applies
- Justify NOT transforming
- Demonstrate understanding of large sample paradox
Short answer (10 points):
- "When does Shapiro-Wilk p < .05 NOT mean you should transform?"
- "Why are bimodal distributions different from skewed distributions?"
Option 3: Peer Teaching Exercise
Format: Partners create a 5-minute "mini-lesson" teaching ONE concept to the class
Topics to assign:
- Why normality matters (using Module 1 simulation)
- How to read Q-Q plots (with examples)
- The large sample paradox
- When to use log transformations
- Why not to transform bimodal data
Assessment criteria:
- Accuracy of content
- Clear explanations with examples
- Effective use of visuals
- Answers peer questions correctly
Benefit: Best way to cement understanding is teaching others!
Red Flags in Student Work
⚠️ Red Flag #1: "The Shapiro-Wilk test showed p = .03, so the data is not normal."
Issue: Treating p-value as binary truth rather than evidence
Look for: Nuanced discussion of sample size, visual inspection, severity
⚠️ Red Flag #2: Reporting means on transformed scale without back-transformation
Issue: "Mean log reaction time was 6.1 ms" (nonsensical units)
Look for: Either back-transformed results OR clear statement of scale
⚠️ Red Flag #3: Trying multiple transformations to get p > .05
Issue: P-hacking to "pass" normality test
Look for: Transformation choice justified by distribution shape, not p-value
⚠️ Red Flag #4: Transforming bimodal data
Issue: Fundamental misunderstanding - two groups shouldn't be squashed
Look for: Recognition that bimodality means investigate, don't transform
What Success Looks Like
A student who truly "gets it" will:
- ✓ Look at visual diagnostics FIRST before running statistical tests
- ✓ Consider sample size when interpreting Shapiro-Wilk results
- ✓ Know when violations matter (small n, severe skew) vs. don't matter (large n, mild skew)
- ✓ Choose transformations based on distribution shape, not trial-and-error
- ✓ Recognize that bimodality signals investigation, not transformation
- ✓ Report findings honestly (including limitations)
- ✓ Make thoughtful decisions rather than blindly following rules
Troubleshooting & FAQs
Technical Issues
Issue: "The interactive modules won't run on student computers"
Solutions:
- Ensure JavaScript is enabled in browser
- Try different browser (Chrome works best)
- Check that pop-up blockers aren't interfering
- Have backup: run demos on instructor computer, students follow along
Issue: "Simulations are running too slowly"
Solutions:
- Reduce number of iterations (1000 → 500)
- Reduce sample sizes if needed
- Run on faster computer/connection
- Pre-run and show saved results if necessary
Issue: "Students are getting different results from each other"
Response:
- This is EXPECTED due to random sampling!
- Turn it into teaching moment: discuss sampling variability
- Results should be similar but not identical
- If wildly different, check code for typos
Pedagogical Questions
Q: "Should I teach parametric vs. non-parametric tests alongside this?"
A: These modules focus on checking assumptions and transformations. If you want to add non-parametric alternatives (Mann-Whitney, Kruskal-Wallis, Wilcoxon), consider creating Module 5 as an extension. Students need to master normality checking first before learning when to abandon parametric tests entirely.
Q: "My students have limited R experience - will they struggle?"
A: The modules include all necessary R code. Students primarily need to:
- Copy-paste code into R console
- Change variable names as needed
- Interpret output (which we teach them how to do)
If very new to R, consider a 15-minute R basics review first.
Q: "What if students ask about other assumption tests (Levene's, etc.)?"
A: Great question! These modules focus on normality specifically. You can mention:
- Levene's test for homogeneity of variance (similar logic to Shapiro-Wilk)
- Same large sample paradox applies
- Visual inspection (residual plots) matters here too
Consider creating supplementary materials if your course requires extensive assumption testing.
Frequently Asked Student Questions
| Student Question | Suggested Response |
|---|---|
| "Why can't we just always use non-parametric tests?" | "Non-parametric tests have lower power - they're less likely to detect real effects when they exist. Parametric tests are more efficient when assumptions are reasonably met. Also, parametric tests can handle covariates and complex designs more easily." |
| "Do real researchers actually check all these assumptions?" | "Yes! Good researchers check assumptions. However, with experience, you learn when violations are likely and can sometimes predict what you'll find. But formal checking is important, especially when publishing." |
| "What if my advisor tells me different rules than what we learned here?" | "Statistical practice varies across fields and even researchers. These modules teach you the concepts and reasoning. In practice, discuss with your advisor and justify your choices. What matters is thoughtful decision-making, not rigid rules." |
| "Can I just use robust standard errors instead?" | "That's an advanced technique! Robust SEs help with heteroscedasticity and some violations, but they're not a cure-all. For now, master the basics of checking assumptions and transforming. Robust methods are a good next step." |
| "This seems like a lot of work for one assumption..." | "It does! But normality is one of the most commonly violated assumptions, and mishandling it can invalidate your results. The time invested now will save you from making serious errors later. Plus, this pattern recognition skill transfers to other assumptions too." |
Adaptation Suggestions
For Different Course Levels
For 100-level / Intro Courses:
- Simplify: Focus on Modules 1 and 2 (why it matters + visual detection)
- Skip: Complex transformation decisions, interpretation of transformed scales
- Emphasize: "Normal enough" vs. "perfectly normal" mindset
- Rule of thumb approach: Provide simpler decision rules
For 300-level / Advanced Courses:
- Add: Box-Cox transformation, formal power analysis of normality tests
- Expand: Multivariate normality, residual diagnostics in regression
- Include: Robust regression, bootstrap methods as alternatives
- Critical thinking: Have students critique published papers' normality handling
For Graduate Courses:
- Theory: Derive why CLT provides robustness
- Simulation studies: Students design their own to test robustness boundaries
- Literature review: Examine debates about transformation vs. robust methods
- Applied project: Real dissertation data, complete assumption checking
For Different Disciplines
Psychology/Social Sciences:
- Use examples: reaction times, Likert scales, test scores
- Emphasize: Survey data often skewed, sample sizes often large
- Common issue: Ceiling/floor effects
Neuroscience/Biology:
- Use examples: spike rates, calcium signals, morphological measurements
- Emphasize: Biological data often log-normal
- Common issue: Count data, zero-inflation
Animal Behavior:
- Use examples: time budgets, dominance scores, bout durations
- Emphasize: Behavioral data often bounded (0-100%), rarely negative
- Common issue: Small sample sizes (n=5-10 animals common)
For Different Time Constraints
If you only have ONE class session (75 min):
- Module 1 brief version (10 min): Why it matters
- Module 2 streamlined (30 min): Visual detection with 3 datasets instead of 6
- Module 3 core concept (20 min): Large sample paradox only
- Module 4 overview (15 min): Show one transformation example, provide handout for others
If you have TWO full sessions (150 min):
- Session 1: Modules 1 & 2 in full depth
- Session 2: Modules 3 & 4 in full depth
- This is the recommended pacing
If you have extended time (3+ sessions):
- Session 1: Module 1 + Discussion
- Session 2: Module 2 + Field guide creation
- Session 3: Module 3 + Paradox deep dive
- Session 4: Module 4 + Real data practice
- Session 5: Integration activity / assessment
Resources & References
For Further Reading (Instructor)
- Lumley, T., Diehr, P., Emerson, S., & Chen, L. (2002). The importance of the normality assumption in large public health data sets. Annual Review of Public Health, 23, 151-169.
- Ghasemi, A., & Zahediasl, S. (2012). Normality tests for statistical analysis: A guide for non-statisticians. International Journal of Endocrinology and Metabolism, 10(2), 486-489.
- Razali, N. M., & Wah, Y. B. (2011). Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics, 2(1), 21-33.
Student-Friendly Resources
- Q-Q Plot tutorial: https://data.library.virginia.edu/understanding-q-q-plots/
- Transformations guide: https://www.statisticshowto.com/probability-and-statistics/normal-distributions/box-cox-transformation/
- Interactive visualization: https://seeing-theory.brown.edu/probability-distributions/index.html
R Package Documentation
- stats::shapiro.test() - Built-in normality test
- car::qqPlot() - Enhanced Q-Q plots with confidence bands
- MASS::boxcox() - Optimal transformation selection
Final Thoughts for Instructors
The Goal Is Decision-Making, Not Rule-Following
The most important thing students should learn from these modules is how to think about assumptions, not just how to run tests. Good statistical practice requires:
- Understanding the consequences of violations
- Recognizing when robustness applies
- Making thoughtful, justified decisions
- Acknowledging limitations honestly
Students who leave saying:
- ❌ "I need Shapiro-Wilk p > .05 or I can't use t-tests" → They missed the point
- ✓ "I check visually first, consider sample size, and make informed decisions" → Success!
Common Instructor Concerns
"Won't this complexity confuse students who just want clear rules?"
Short-term: Maybe. Long-term: No. Students who learn nuanced thinking become better researchers. Those who learn rigid rules become frustrated when real data doesn't fit the rules (which it never does).
"This takes a lot of class time for one assumption..."
True. But normality is violated more often than any other assumption, and mishandling it invalidates many analyses. Time invested here prevents major errors later. Plus, the thinking skills transfer to other assumptions.
"What if students get different answers from what I expect?"
Good! If they can justify their decision using concepts from the modules, that's more valuable than getting the "right" answer. Statistical analysis isn't always black and white.
You've Got This!
These modules represent a complete, research-based approach to teaching normality testing. They've been designed to:
- ✓ Build conceptual understanding before procedural skills
- ✓ Address common misconceptions directly
- ✓ Provide hands-on practice with immediate feedback
- ✓ Prepare students for real-world data analysis
Remember: The goal isn't perfection - it's thoughtful, informed decision-making. Help your students become statistical thinkers, not just test-runners.
Good luck, and enjoy teaching!