Instructor's Guide: Teaching Statistical Normality

Complete answer keys, teaching notes, and facilitation guidance for Modules 1-4

Teaching Overview

Learning Sequence

These four modules are designed to be taught over 2-3 class sessions:

  1. Session 1: Modules 1 & 2 (Why normality matters + Visual detection)
  2. Session 2: Module 3 (Statistical tests and the large sample paradox)
  3. Session 3: Module 4 (Transformations and solutions)
Total Time Commitment:

Key Pedagogical Principles

💡 Pro Tip: Students will want rules ("p < .05 means transform!"). Resist this. The goal is thoughtful judgment combining visual inspection, statistical tests, sample size, and robustness considerations.

Module 1: Why Normality Matters

Learning Objectives

Answer Keys

Question 1: Type I Error Rate with Normal Data

Expected result: ~5% (should be close to nominal α = .05)

Good student answer: "The Type I error rate is approximately 5%, which matches our alpha level. This shows the test is working correctly when assumptions are met."

Question 2: Type I Error Rate with Skewed Data (n=20)

Expected result: 8-12% (inflated above 5%)

Good student answer: "With small sample size and skewed data, the Type I error rate increased to about 10%, which is double the intended rate. This means we're rejecting true null hypotheses too often."

Question 3: Type I Error Rate with Skewed Data (n=100)

Expected result: ~5-7% (much closer to nominal, showing robustness)

Good student answer: "With a larger sample size, even though the data is still skewed, the Type I error rate dropped back down to around 5-6%. This demonstrates that larger samples make the test more robust to violations of normality."
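The expected rates in Questions 1-3 can be reproduced with a short Monte Carlo simulation. The modules themselves use R; the sketch below is an illustrative Python equivalent, where the Exponential(1) population standing in for "skewed data" is my own choice and exact rates vary with the random seed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N_SIMS = 4000
ALPHA = 0.05

def type1_rate(draw_sample, true_mean, n):
    """Share of simulations in which a one-sample t-test rejects
    a TRUE null hypothesis (H0: mu = true_mean)."""
    rejections = 0
    for _ in range(N_SIMS):
        _, p = stats.ttest_1samp(draw_sample(n), popmean=true_mean)
        if p < ALPHA:
            rejections += 1
    return rejections / N_SIMS

# Question 1: normal population, n = 20 -> should sit near .05
normal_rate = type1_rate(lambda n: rng.normal(0.0, 1.0, n), 0.0, 20)

# Question 2: right-skewed Exponential(1) population (true mean = 1), n = 20
skew_n20 = type1_rate(lambda n: rng.exponential(1.0, n), 1.0, 20)

# Question 3: same skewed population, n = 100 -> CLT pulls it back toward .05
skew_n100 = type1_rate(lambda n: rng.exponential(1.0, n), 1.0, 100)

print(normal_rate, skew_n20, skew_n100)
```

Running this in front of the class makes Question 4's CLT discussion concrete: only the sample size changes between the last two calls.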

Question 4: Why does sample size matter?

Strong answer should mention:

Example: "The Central Limit Theorem tells us that as sample size increases, the sampling distribution of the mean approaches normality regardless of the population distribution. With n=100, even our skewed data produces a nearly normal sampling distribution, so the t-test performs well. With n=20, we don't have enough data for the CLT to fully protect us."

Common Student Misconceptions

⚠️ Misconception #1: "Normality violations always invalidate results"
Correction: Emphasize that robustness depends on sample size and severity of violation. Large samples are quite robust.
⚠️ Misconception #2: "If Shapiro-Wilk p > .05, my data is perfectly normal"
Correction: The test only tells us we don't have strong evidence against normality. Visual inspection is still essential.
💡 Discussion Prompt: "Why do you think statisticians care about Type I errors? What's the real-world consequence of rejecting H₀ when it's true?" This connects abstract concepts to research integrity.

Facilitation Notes

Module 2: Visual Detection of Non-Normality

Learning Objectives

Answer Keys for Practice Datasets

| Dataset | Distribution Type | Histogram Clues | Q-Q Plot Clues | Verdict |
|---|---|---|---|---|
| Dataset 1 | Normal | Bell-shaped, symmetric | Points fall on diagonal line | ✓ Normal - Proceed with t-test |
| Dataset 2 | Right-skewed | Long tail to the right, peak on left | Upward curve at right end (heavy right tail) | ⚠️ Skewed - Consider transformation or large n |
| Dataset 3 | Bimodal | Two distinct peaks | S-shaped curve or irregular | ✗ Bimodal - Investigate groups, don't transform |
| Dataset 4 | Heavy-tailed | Looks nearly normal but with outliers | Points deviate at both ends (tails) | ⚠️ Heavy tails - Check for outliers, consider robust methods |
| Dataset 5 | Left-skewed | Long tail to the left, peak on right | Downward curve at left end | ⚠️ Skewed - Consider transformation |
| Dataset 6 | Normal (mild noise) | Roughly bell-shaped with minor irregularities | Points close to line with minor deviations | ✓ Approximately normal - Proceed |
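If you need to regenerate practice data with these six shapes, something like the following NumPy sketch works. These are illustrative parameter choices of mine, not the module's actual datasets (which are in R):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

practice_sets = {
    # Dataset 1: normal - bell-shaped, symmetric
    "normal": rng.normal(50, 10, n),
    # Dataset 2: right-skewed - long right tail (e.g., reaction times)
    "right_skewed": rng.lognormal(3.0, 0.6, n),
    # Dataset 3: bimodal - two groups mixed together
    "bimodal": np.concatenate([rng.normal(40, 5, n // 2),
                               rng.normal(70, 5, n // 2)]),
    # Dataset 4: heavy-tailed - nearly normal center, occasional outliers
    "heavy_tailed": rng.standard_t(df=3, size=n) * 10 + 50,
    # Dataset 5: left-skewed - reflect a right-skewed draw
    "left_skewed": 100 - rng.lognormal(3.0, 0.6, n),
    # Dataset 6: approximately normal with minor irregularities
    "normal_noisy": rng.normal(50, 10, n) + rng.uniform(-2, 2, n),
}
```

A histogram plus a Q-Q plot (e.g., `scipy.stats.probplot`) of each entry reproduces the clues in the table above.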

Field Guide - Expected Entries

Strong Field Guide Characteristics:

Example Strong Entry (Right-skewed):

Right-Skewed Distribution
Histogram: Peak on left, long tail stretching right →
Most values cluster low, few extreme high values
Q-Q plot: Points curve UPWARD on right side
Action: Try log transformation or sqrt
Common in: Reaction times, income data, counts
            

Common Student Struggles

⚠️ Issue #1: Confusing which way the skew goes
Tip: "The skew points in the direction of the TAIL, not the peak. Right-skewed = tail goes right."
Memory aid: "Think of the tail as an arrow pointing in the direction of the skew."
⚠️ Issue #2: Not recognizing bimodal distributions
Tip: Emphasize that bimodality suggests TWO GROUPS. Don't transform - investigate! "Two peaks = two populations mixed together."
⚠️ Issue #3: Over-interpreting minor irregularities
Tip: Real data is messy. Minor wiggles are normal. Look for CLEAR patterns, not perfection.

Facilitation Strategy

💡 Recommended Approach:
  1. Individual exploration (10 min): Have students look at all 6 datasets quietly first
  2. Partner discussion (15 min): Compare observations and build field guides together
  3. Whole class reveal (15 min): Go through each dataset, have students share what they saw
  4. Key moment: When revealing bimodal dataset, ask "What might cause two peaks in real research?" (e.g., male/female, treatment/control, two species)
⏱️ Timing Reality Check:

This module WILL take longer than you expect (45-65 minutes typically). Students need time to:

Don't rush this. Pattern recognition is the most valuable skill they'll learn.

Discussion Questions

  1. "Why do you think Q-Q plots are more sensitive than histograms for detecting problems?"
  2. "What would you do if your histogram and Q-Q plot gave conflicting information?"
  3. "Dataset 3 was bimodal. In real research, what might cause this?" (Great for critical thinking)
  4. "Which distribution type do you think is most common in neuroscience/psychology data? Why?"

Module 3: Statistical Tests for Normality

Learning Objectives

Answer Keys

Question 1: Recording Results

Expected pattern:

Key insight: Same distribution type, different conclusions!

Questions 2-3: Comparing Visuals

Question 2 (Histograms): Should look very similar (both slightly right-skewed)

Question 3 (Q-Q plots): Both should show similar patterns (slight deviation from line)

The paradox: They LOOK the same but get different p-values!

Question 4: Pattern as n Increases

Correct observation: p-value tends to decrease as sample size increases

Strong explanation: "Larger samples give the test more 'power' to detect even tiny deviations from perfect normality. With small samples, the test might miss problems. With large samples, it detects everything—even deviations too small to matter practically."

Advanced insight: Some students might note that this creates a dilemma: when you have enough data to reliably detect violations, you also have enough data that violations don't matter much!
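The power pattern in Question 4 is easy to demonstrate live. The sketch below uses scipy's `shapiro` as a Python stand-in for R's `shapiro.test`, drawing from one fixed, mildly skewed population (an illustrative choice) at increasing sample sizes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def mildly_skewed(n):
    """Mostly normal with a gentle right skew (illustrative population)."""
    return rng.normal(0.0, 1.0, n) + 0.6 * rng.exponential(1.0, n)

# Median Shapiro-Wilk p-value over repeated samples at each n, so a
# single lucky or unlucky draw doesn't dominate the picture.
median_p = {}
for n in (20, 50, 200, 1000):
    pvals = [stats.shapiro(mildly_skewed(n)).pvalue for _ in range(200)]
    median_p[n] = float(np.median(pvals))

print(median_p)  # p-values shrink as n grows, for the SAME population
```

The population never changes; only n does. That is the paradox in one screen of output.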

Question 5: The Big Question - What Should You Do?

Ideal answer components:

Example strong answer:

"When I have a large sample (n > 50) and Shapiro-Wilk says p < .05 but the Q-Q plot looks only mildly skewed, I would proceed with the t-test because: (1) the t-test is robust to moderate non-normality with large samples due to the Central Limit Theorem, and (2) the Shapiro-Wilk test is overly sensitive with large samples, detecting trivial deviations that don't affect the validity of the test. I would note the mild skewness in my write-up but wouldn't transform unless the visual inspection showed severe problems."

Question 6: Robustness Demonstration

Expected result: Even with "failed" Shapiro-Wilk (p < .05), 95% CIs should still achieve ~95% coverage with n=100

Good answer: "Even though the Shapiro-Wilk test rejected normality, the confidence intervals still had approximately 95% coverage. This demonstrates that with large samples, the t-test is robust - it works correctly even when the formal assumption test says normality is violated."
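Question 6's coverage claim can also be checked by simulation. A hedged Python sketch (the Exponential(1) population is my illustrative stand-in for "data that fails Shapiro-Wilk"; the modules do this in R):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
N_SIMS = 2000
n = 100
TRUE_MEAN = 1.0  # Exponential(1): clearly right-skewed, mean exactly 1

covered = 0
for _ in range(N_SIMS):
    sample = rng.exponential(TRUE_MEAN, n)
    # Standard 95% t-interval for the mean
    half_width = stats.t.ppf(0.975, df=n - 1) * sample.std(ddof=1) / np.sqrt(n)
    if sample.mean() - half_width <= TRUE_MEAN <= sample.mean() + half_width:
        covered += 1

coverage = covered / N_SIMS
print(coverage)  # roughly .95, even though Shapiro-Wilk rejects normality here
```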

The Large Sample Paradox - Teaching It Well

💡 The Key Teaching Moment:

This is the most important conceptual hurdle in the entire module series. Students need to understand:

  1. Small samples: Test lacks power (might miss problems) BUT violations matter more
  2. Large samples: Test has high power (detects tiny problems) BUT violations matter less
  3. The paradox: The Shapiro-Wilk test is MOST likely to "fail" when you LEAST need to worry about it!

Effective analogy: "It's like a smoke detector that gets more sensitive the bigger your fire extinguisher is. When you have a small extinguisher (small n), it might not detect smoke (low power). When you have a huge extinguisher (large n), it goes off at the tiniest wisp of smoke (high power), but you don't need to worry because you can handle it."

Common Student Reactions

⚠️ Frustration #1: "So when DO I trust the p-value?!"
Response: "Always trust your EYES first. The p-value is just one piece of evidence. With n > 50, visual inspection matters more."
⚠️ Frustration #2: "This seems wishy-washy. I want a clear rule!"
Response: "Statistics isn't about following rules blindly - it's about informed judgment. That's why we're training your pattern recognition skills AND showing you the tests. Real data analysis requires both."

Decision Framework Table - Expected Understanding

| Scenario | Visual Check | Shapiro-Wilk | Sample Size | Recommendation |
|---|---|---|---|---|
| 1 | Looks normal | p > .05 | Any | ✓ Proceed with confidence |
| 2 | Clearly skewed | p < .05 | Small (n < 30) | Transform or use non-parametric |
| 3 | Mildly skewed | p < .05 | Large (n > 50) | Proceed (robust), note in write-up |
| 4 | Looks fine | p < .05 | Large (n > 100) | Trust your eyes, ignore p-value |
| 5 | Severe problems | Any | Any | Transform or non-parametric |
| 6 | Bimodal | Any | Any | Don't transform - investigate groups! |
💡 Have students add this table to their notes! It's a practical reference they'll use in every future analysis.

Facilitation Notes

Module 4: Data Transformations

Learning Objectives

Answer Keys

Question 1: Why Transformations Work

Strong answer includes:

Example: "A log transformation works on right-skewed data because it compresses large values proportionally more than small values. For instance, log(100) - log(10) = 1, but 100 - 10 = 90. This 'pulls in' the extreme right tail, making the distribution more symmetric."
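This compression effect can be quantified with a skewness statistic. A small Python illustration (the modules use R; the lognormal "reaction times" are simulated, so `log(rt)` is exactly normal by construction):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(11)

# Simulated reaction times in ms: lognormal, so strongly right-skewed
rt = rng.lognormal(mean=6.0, sigma=0.5, size=500)

skew_before = float(skew(rt))          # clearly positive
skew_after = float(skew(np.log(rt)))   # near zero: log undoes the skew

print(round(skew_before, 2), round(skew_after, 2))
```

The before/after pair is a good slide: one number captures what the before/after histograms show visually.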

Practice Dataset Results

Right-skewed dataset:

Left-skewed dataset:

Heavy-tailed dataset:

Question 2: When NOT to Transform

Key scenarios students should identify:

  1. Bimodal distributions - transformation won't fix underlying two-group structure
  2. Large samples with mild skew - robustness makes transformation unnecessary
  3. Data with meaningful zeros - log(0) is undefined
  4. When interpretability matters more than normality - original scale may be more meaningful

Example answer:

"You should NOT transform if: (1) you have a bimodal distribution because this suggests two distinct groups that should be analyzed separately, not squashed together, (2) you have a large sample (n > 100) with only mild skewness because the Central Limit Theorem makes the t-test robust anyway, or (3) your data has meaningful zeros (like reaction times or counts) and you'd lose interpretability with a log transformation."

Question 3: Interpretation Challenge

Scenario: Original data in milliseconds, mean = 450ms. After log transformation, mean = 6.1.

What students need to understand:

Strong interpretation:

"The mean on the log scale is 6.1, which back-transforms to a geometric mean of approximately 446 ms on the original scale (exp(6.1) ≈ 446). This is slightly lower than the arithmetic mean of 450 ms because the geometric mean is always at most the arithmetic mean, and right skew widens the gap. When we report results, we should either back-transform our estimates or clearly state we're working on the log scale."
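The back-transformation logic is worth showing numerically. A Python sketch with made-up reaction times (hypothetical values, chosen only to be right-skewed; the modules use R):

```python
import numpy as np

# Hypothetical right-skewed reaction times in ms (illustrative values)
rt = np.array([300, 320, 350, 380, 400, 420, 450, 500, 600, 900], dtype=float)

arith_mean = rt.mean()              # mean on the original scale
log_mean = np.log(rt).mean()        # mean on the log scale
geo_mean = float(np.exp(log_mean))  # back-transformed = geometric mean

print(round(float(arith_mean), 1), round(float(log_mean), 2), round(geo_mean, 1))
```

Students should notice the geometric mean lands below the arithmetic mean, just as in the scenario, because the single extreme value (900 ms) pulls the arithmetic mean up more than the log-scale mean.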

Transformation Selection Guide - What Students Should Learn

| Distribution Shape | First Try | If That Doesn't Work | R Code |
|---|---|---|---|
| Moderate right skew | Square root | Log | `sqrt(x)` or `log(x)` |
| Severe right skew | Log | Inverse | `log(x)` or `1/x` |
| Left skew | Reflect then sqrt | Reflect then log | `sqrt(max(x) - x)` or `log(max(x) + 1 - x)` |
| Heavy tails (both ends) | Check for outliers first! | Winsorize or robust methods | Don't transform blindly |
| Bimodal | DON'T TRANSFORM | Investigate groups | Split dataset or add grouping variable |

Common Student Mistakes

⚠️ Mistake #1: "I tried log but it made it worse!"
Likely cause: Data was left-skewed, not right-skewed. Need to reflect first.
Teaching moment: "Always look at the DIRECTION of skew before choosing transformation."
⚠️ Mistake #2: Transforming data with zeros or negative values
Problem: log(0) is undefined, log(negative) is complex
Solution: Add small constant: log(x + 1) or log(x + 0.5)
Caution: This changes interpretation! Mention in write-up.
⚠️ Mistake #3: Forgetting which scale they're on
Symptoms: "The mean reaction time was 6.1 milliseconds" (impossible - that's on log scale!)
Prevention: Always label transformed variables clearly: log_rt not just rt
⚠️ Mistake #4: Over-transforming
Example: Trying 5 different transformations to get p > .05
Problem: This is p-hacking! Choose transformation based on distribution shape, not p-value
Teaching point: "The goal isn't to maximize p-value - it's to make the distribution more symmetric."
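Mistake #2's fix takes ten seconds to demo numerically. A Python sketch (the count values are made up; R has a `log1p()` function as well):

```python
import numpy as np

counts = np.array([0, 1, 3, 7, 20, 55], dtype=float)

# np.log(counts) would emit -inf for the zero; shift before logging instead:
shifted = np.log(counts + 1)   # the classic log(x + 1) fix
precise = np.log1p(counts)     # numpy's dedicated version of the same idea

print(shifted[0], bool(np.allclose(shifted, precise)))
```

The zero maps to log(1) = 0, so every value stays finite, and students can see the +1 shift is exactly what `log1p` computes.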

Teaching Tips

💡 Make It Visual:

The "before and after" visual comparison is incredibly powerful. Have students:

  1. Save screenshot of "before" histogram and Q-Q plot
  2. Apply transformation
  3. Compare side-by-side with "after" plots
  4. "Wow, it actually worked!" moments are great for learning
💡 Real-World Context:

Explain why certain data types are naturally skewed:

"When you understand WHY data is skewed, you can predict what you'll need to do!"

Advanced Discussion Questions

  1. "If we transform our data, analyze it, and report results - are we being honest with our readers? How should we report this?"
  2. "Some researchers always analyze data on original scale even if skewed, arguing for interpretability. Others always transform to meet assumptions. Who's right?"
  3. "What would you do if your data required a transformation for normality, but your reader expects results in the original units (like milliseconds)?"
  4. "Can you think of any situations where the log scale is actually MORE meaningful than the original scale?" (Hint: fold-change, ratios, pH)

Facilitation Notes

Assessment & Grading Guidance

Formative Assessment Throughout Modules

Check for Understanding - Key Moments:

Summative Assessment Options

Option 1: Take-Home Analysis Assignment

Prompt: Provide students with 3 datasets (small n with clear skew, large n with mild skew, bimodal). Ask them to:

  1. Create and interpret visual diagnostics
  2. Run and interpret Shapiro-Wilk tests
  3. Make and justify decisions about transformation
  4. If transforming, show before/after comparison
  5. Explain their reasoning using concepts from all 4 modules

Grading rubric elements:

Option 2: In-Class Practical Exam

Format: 50-minute exam, students analyze 2 datasets using RStudio

Dataset 1 (20 points): Small sample, clear violation

Dataset 2 (20 points): Large sample, mild violation

Short answer (10 points):

Option 3: Peer Teaching Exercise

Format: Partners create a 5-minute "mini-lesson" teaching ONE concept to the class

Topics to assign:

Assessment criteria:

Benefit: Best way to cement understanding is teaching others!

Red Flags in Student Work

⚠️ Red Flag #1: "The Shapiro-Wilk test showed p = .03, so the data is not normal."
Issue: Treating p-value as binary truth rather than evidence
Look for: Nuanced discussion of sample size, visual inspection, severity
⚠️ Red Flag #2: Reporting means on transformed scale without back-transformation
Issue: "Mean log reaction time was 6.1 ms" (nonsensical units)
Look for: Either back-transformed results OR clear statement of scale
⚠️ Red Flag #3: Trying multiple transformations to get p > .05
Issue: P-hacking to "pass" normality test
Look for: Transformation choice justified by distribution shape, not p-value
⚠️ Red Flag #4: Transforming bimodal data
Issue: Fundamental misunderstanding - two groups shouldn't be squashed
Look for: Recognition that bimodality means investigate, don't transform

What Success Looks Like

A student who truly "gets it" will:

Troubleshooting & FAQs

Technical Issues

Issue: "The interactive modules won't run on student computers"
Solutions:
Issue: "Simulations are running too slowly"
Solutions:
Issue: "Students are getting different results from each other"
Response:

Pedagogical Questions

Q: "Should I teach parametric vs. non-parametric tests alongside this?"
A: These modules focus on checking assumptions and transformations. If you want to add non-parametric alternatives (Mann-Whitney, Kruskal-Wallis, Wilcoxon), consider creating Module 5 as an extension. Students need to master normality checking first before learning when to abandon parametric tests entirely.
Q: "My students have limited R experience - will they struggle?"
A: The modules include all necessary R code. Students primarily need to:
If very new to R, consider a 15-minute R basics review first.
Q: "What if students ask about other assumption tests (Levene's, etc.)?"
A: Great question! These modules focus on normality specifically. You can mention:
Consider creating supplementary materials if your course requires extensive assumption testing.

Frequently Asked Student Questions

Q: "Why can't we just always use non-parametric tests?"
A: "Non-parametric tests have lower power - they're less likely to detect real effects when they exist. Parametric tests are more efficient when assumptions are reasonably met. Also, parametric tests can handle covariates and complex designs more easily."
Q: "Do real researchers actually check all these assumptions?"
A: "Yes! Good researchers check assumptions. However, with experience, you learn when violations are likely and can sometimes predict what you'll find. But formal checking is important, especially when publishing."
Q: "What if my advisor tells me different rules than what we learned here?"
A: "Statistical practice varies across fields and even researchers. These modules teach you the concepts and reasoning. In practice, discuss with your advisor and justify your choices. What matters is thoughtful decision-making, not rigid rules."
Q: "Can I just use robust standard errors instead?"
A: "That's an advanced technique! Robust SEs help with heteroscedasticity and some violations, but they're not a cure-all. For now, master the basics of checking assumptions and transforming. Robust methods are a good next step."
Q: "This seems like a lot of work for one assumption..."
A: "It does! But normality is one of the most commonly violated assumptions, and mishandling it can invalidate your results. The time invested now will save you from making serious errors later. Plus, this pattern recognition skill transfers to other assumptions too."

Adaptation Suggestions

For Different Course Levels

For 100-level / Intro Courses:

For 300-level / Advanced Courses:

For Graduate Courses:

For Different Disciplines

Psychology/Social Sciences:

Neuroscience/Biology:

Animal Behavior:

For Different Time Constraints

If you only have ONE class session (75 min):

  1. Module 1 brief version (10 min): Why it matters
  2. Module 2 streamlined (30 min): Visual detection with 3 datasets instead of 6
  3. Module 3 core concept (20 min): Large sample paradox only
  4. Module 4 overview (15 min): Show one transformation example, provide handout for others

If you have TWO full sessions (150 min):

  1. Session 1: Modules 1 & 2 in full depth
  2. Session 2: Modules 3 & 4 in full depth
  3. This is the recommended pacing

If you have extended time (3+ sessions):

  1. Session 1: Module 1 + Discussion
  2. Session 2: Module 2 + Field guide creation
  3. Session 3: Module 3 + Paradox deep dive
  4. Session 4: Module 4 + Real data practice
  5. Session 5: Integration activity / assessment

Resources & References

For Further Reading (Instructor)

Student-Friendly Resources

R Package Documentation

Final Thoughts for Instructors

The Goal Is Decision-Making, Not Rule-Following

The most important thing students should learn from these modules is how to think about assumptions, not just how to run tests. Good statistical practice requires:

Students who leave saying:

Common Instructor Concerns

"Won't this complexity confuse students who just want clear rules?"

Short-term: Maybe. Long-term: No. Students who learn nuanced thinking become better researchers. Those who learn rigid rules become frustrated when real data doesn't fit the rules (which it never does).

"This takes a lot of class time for one assumption..."

True. But normality is violated more often than any other assumption, and mishandling it invalidates many analyses. Time invested here prevents major errors later. Plus, the thinking skills transfer to other assumptions.

"What if students get different answers from what I expect?"

Good! If they can justify their decision using concepts from the modules, that's more valuable than getting the "right" answer. Statistical analysis isn't always black and white.

You've Got This!

These modules represent a complete, research-based approach to teaching normality testing. They've been designed to:

Remember: The goal isn't perfection - it's thoughtful, informed decision-making. Help your students become statistical thinkers, not just test-runners.

Good luck, and enjoy teaching!