The Large Sample Paradox
By the end of this module, you will:
Here's a puzzling situation:
Two datasets look IDENTICAL in their histograms and Q-Q plots: both appear roughly normal.
But when you run a Shapiro-Wilk test:
What's going on? Let's investigate this paradox!
The Shapiro-Wilk test is a statistical test that answers the question:
"Do these data come from a normal distribution?"
How it works:
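In brief (standard background, not taken verbatim from this module): the test computes a statistic W that measures how closely the ordered sample values track the values expected under a normal distribution. W close to 1 is consistent with normality; a small p-value leads you to reject normality. A minimal call, assuming Python with scipy (the module's own tooling may differ):

```python
# Minimal Shapiro-Wilk example (illustrative seed and parameters).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(loc=0, scale=1, size=50)  # truly normal data

w, p = stats.shapiro(data)
print(f"W = {w:.3f}, p = {p:.3f}")
```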
Let's start with a small sample and see how the test behaves.
Now we'll generate data from the SAME distribution, just with more observations.
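The two steps above can be sketched as follows (assumed Python/scipy; the seed and distribution parameters are illustrative, not from the original module):

```python
# Compare Shapiro-Wilk results for a small and a large sample drawn
# from the SAME normal distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

small = rng.normal(loc=50, scale=10, size=25)   # n = 25
large = rng.normal(loc=50, scale=10, size=200)  # n = 200, same distribution

w_small, p_small = stats.shapiro(small)
w_large, p_large = stats.shapiro(large)

print(f"n=25:  W = {w_small:.3f}, p = {p_small:.3f}")
print(f"n=200: W = {w_large:.3f}, p = {p_large:.3f}")
```

Run this with your own seed, then record the W and p values below.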
Question 1: Record the results you observed:
Small Sample (n=25):
Shapiro-Wilk W = ______, p = ______
Large Sample (n=200):
Shapiro-Wilk W = ______, p = ______
Question 2: Compare the two histograms. Do they look different?
Question 3: Look at the Q-Q plots. Do both show points following the line?
The distributions should look very similar visually, but the p-values might be quite different. Why would a statistical test reject normality for data that looks normal?
Let's test multiple sample sizes to see the pattern clearly.
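One way to run this sweep (a sketch, assuming Python/scipy; the mildly skewed distribution below is an illustrative choice, not the module's dataset):

```python
# Shapiro-Wilk p-values across sample sizes for data with a small,
# fixed deviation from normality (skewness of roughly 0.27).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def slightly_skewed(n):
    # Mostly normal, with a mild right-skewed component added
    return rng.normal(0, 1, n) + 0.6 * rng.exponential(1, n)

for n in [25, 50, 100, 200, 500, 1000]:
    w, p = stats.shapiro(slightly_skewed(n))
    print(f"n={n:4d}: W = {w:.4f}, p = {p:.4f}")
```

Watch how the p-values tend to shrink as n grows, even though the underlying distribution never changes.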
Question 4: What pattern do you see as sample size (n) increases?
Question 5: THE BIG QUESTION: Your data with n=150 looks perfectly normal in the Q-Q plot, but Shapiro-Wilk gives p = 0.03. What should you do?
Select one:
Explain your reasoning (3-4 sentences):
Statistical tests detect statistically significant deviations from normality.
Visual inspection shows practically significant deviations.
With large samples:
With small samples:
Let's test whether these "statistically significant" violations actually affect our t-tests.
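A robustness check along these lines can be simulated (a sketch of the assumed approach, not the module's exact code): draw both groups from the same slightly skewed distribution so the null hypothesis is true, run many t-tests, and count how often p falls below .05. A robust test should reject close to the nominal 5% of the time.

```python
# Empirical Type I error rate of the two-sample t-test under mild
# non-normality, with n = 100 per group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_per_group, alpha = 2000, 100, 0.05

def skewed_sample(n):
    # Same slightly non-normal shape for both groups (H0 is true)
    return rng.normal(0, 1, n) + 0.6 * rng.exponential(1, n)

rejections = 0
for _ in range(n_sims):
    a, b = skewed_sample(n_per_group), skewed_sample(n_per_group)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

print(f"Empirical Type I error: {rejections / n_sims:.3f}")
```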
Question 6: Based on the robustness test, even with n=100 per group and slightly non-normal data, the t-test maintained approximately correct Type I error rates. Why might large samples "protect" us from violations?
Hint: Think about the Central Limit Theorem
Question 7: Create a decision rule: Given a Shapiro-Wilk result and a Q-Q plot, how do you decide whether to proceed with a parametric test? Consider sample size in your answer.
Scenario: You're analyzing survey data from 500 participants. You run diagnostics:
Question 8A: What is your interpretation? Is the deviation from normality a problem?
Question 8B: Would you proceed with a t-test or ANOVA? Why or why not?
Question 8C: How would you report this in a Results section?
Discovery 1: Statistical Significance ≠ Practical Significance
Large samples can produce "significant" Shapiro-Wilk results for trivial deviations that don't affect your analysis.
Discovery 2: Sample Size Matters
Discovery 3: Visual + Statistical Together
Always use BOTH visual inspection AND statistical tests. When they disagree, sample size helps you decide which to trust.
Discovery 4: Central Limit Theorem Protection
With large samples, the sampling distribution of means becomes normal even if the raw data aren't; this makes t-tests robust.
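This protection can be seen directly (an illustrative sketch, assuming Python/scipy): raw exponential data are strongly skewed, but means of samples of size 100 drawn from that same distribution are nearly symmetric, because the CLT predicts the skewness of the means shrinks by a factor of sqrt(n).

```python
# Raw exponential data vs. the sampling distribution of its means.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

raw = rng.exponential(scale=1, size=500)                    # strongly skewed
means = rng.exponential(scale=1, size=(2000, 100)).mean(axis=1)

skew_raw = stats.skew(raw)      # exponential: theoretical skewness = 2
skew_means = stats.skew(means)  # theory: 2 / sqrt(100) = 0.2

print(f"skewness of raw data:     {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```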
| Situation | Decision |
|---|---|
| Visual OK, Shapiro OK (p > .05) | → Proceed with parametric test |
| Visual OK, Shapiro fails (p < .05, n > 50) | → Still use parametric (robust with large n) |
| Visual shows problems, Shapiro OK (n < 30) | → Trust visual; test lacks power |
| Visual shows problems, Shapiro fails (any n) | → Transform or use non-parametric |
Submit as: module3_lastname1_lastname2.pdf

🎉 You've mastered the Shapiro-Wilk paradox! 🎉
Next up: Module 4 will teach you what to DO when data aren't normal (transformations!)