The Large Sample Paradox
By the end of this module, you will:
Here's a puzzling situation:
Two datasets look IDENTICAL in their histograms and Q-Q plots: both appear roughly normal.
But when you run a Shapiro-Wilk test:
What's going on? Let's investigate this paradox!
The Shapiro-Wilk test is a statistical test that answers the question:
"Do these data come from a normal distribution?"
How it works:
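In brief (standard background, not taken verbatim from this module): the test computes a statistic W that measures how closely the ordered sample values track the values expected under a normal distribution. W close to 1 is consistent with normality; a small p-value leads you to reject normality. A minimal call, assuming Python with scipy (the module's own tooling may differ):

```python
# Minimal Shapiro-Wilk example (illustrative seed and parameters).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(loc=0, scale=1, size=50)  # truly normal data

w, p = stats.shapiro(data)
print(f"W = {w:.3f}, p = {p:.3f}")
```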
Let's start with a small sample and see how the test behaves.
Now we'll generate data from the SAME distribution, just with more observations.
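The two steps above can be sketched as follows (assumed Python/scipy; the seed and distribution parameters are illustrative, not from the original module):

```python
# Compare Shapiro-Wilk results for a small and a large sample drawn
# from the SAME normal distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

small = rng.normal(loc=50, scale=10, size=25)   # n = 25
large = rng.normal(loc=50, scale=10, size=200)  # n = 200, same distribution

w_small, p_small = stats.shapiro(small)
w_large, p_large = stats.shapiro(large)

print(f"n=25:  W = {w_small:.3f}, p = {p_small:.3f}")
print(f"n=200: W = {w_large:.3f}, p = {p_large:.3f}")
```

Run this with your own seed, then record the W and p values below.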
Question 1: Record the results you observed:
Small Sample (n=25):
Shapiro-Wilk W = ______, p = ______
Large Sample (n=200):
Shapiro-Wilk W = ______, p = ______
Question 2: Compare the two histograms. Do they look different?
Question 3: Look at the Q-Q plots. Do both show points following the line?
The distributions should look very similar visually, but the p-values might be quite different. Why would a statistical test reject normality for data that looks normal?
Let's test multiple sample sizes to see the pattern clearly.
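One way to run this sweep (a sketch, assuming Python/scipy; the mildly skewed distribution below is an illustrative choice, not the module's dataset):

```python
# Shapiro-Wilk p-values across sample sizes for data with a small,
# fixed deviation from normality (skewness of roughly 0.27).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def slightly_skewed(n):
    # Mostly normal, with a mild right-skewed component added
    return rng.normal(0, 1, n) + 0.6 * rng.exponential(1, n)

for n in [25, 50, 100, 200, 500, 1000]:
    w, p = stats.shapiro(slightly_skewed(n))
    print(f"n={n:4d}: W = {w:.4f}, p = {p:.4f}")
```

Watch how the p-values tend to shrink as n grows, even though the underlying distribution never changes.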
Question 4: What pattern do you see as sample size (n) increases?
Question 5: THE BIG QUESTION: Your data with n=150 looks perfectly normal in the Q-Q plot, but Shapiro-Wilk gives p = 0.03. What should you do?
Select one:
Explain your reasoning (3-4 sentences):
Statistical tests detect statistically significant deviations from normality.
Visual inspection shows practically significant deviations.
With large samples:
With small samples:
Let's test whether these "statistically significant" violations actually affect our t-tests.
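A robustness check along these lines can be simulated (a sketch of the assumed approach, not the module's exact code): draw both groups from the same slightly skewed distribution so the null hypothesis is true, run many t-tests, and count how often p falls below .05. A robust test should reject close to the nominal 5% of the time.

```python
# Empirical Type I error rate of the two-sample t-test under mild
# non-normality, with n = 100 per group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_per_group, alpha = 2000, 100, 0.05

def skewed_sample(n):
    # Same slightly non-normal shape for both groups (H0 is true)
    return rng.normal(0, 1, n) + 0.6 * rng.exponential(1, n)

rejections = 0
for _ in range(n_sims):
    a, b = skewed_sample(n_per_group), skewed_sample(n_per_group)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

print(f"Empirical Type I error: {rejections / n_sims:.3f}")
```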
Question 6: Based on the robustness test, even with n=100 per group and slightly non-normal data, the t-test maintained approximately correct Type I error rates. Why might large samples "protect" us from violations?
Hint: Think about the Central Limit Theorem
Question 7: Create a decision rule: Given a Shapiro-Wilk result and a Q-Q plot, how do you decide whether to proceed with a parametric test? Consider sample size in your answer.
Scenario: You're analyzing survey data from 500 participants. You run diagnostics:
Question 8A: What is your interpretation? Is the deviation from normality a problem?
Question 8B: Would you proceed with a t-test or ANOVA? Why or why not?
Question 8C: How would you report this in a Results section?
Discovery 1: Statistical Significance ≠ Practical Significance
Large samples can produce "significant" Shapiro-Wilk results for trivial deviations that don't affect your analysis.
Discovery 2: Sample Size Matters
Discovery 3: Visual + Statistical Together
Always use BOTH visual inspection AND statistical tests. When they disagree, sample size helps you decide which to trust.
Discovery 4: Central Limit Theorem Protection
With large samples, the sampling distribution of means becomes normal even if the raw data aren't; this makes t-tests robust.
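This protection can be seen directly (an illustrative sketch, assuming Python/scipy): raw exponential data are strongly skewed, but means of samples of size 100 drawn from that same distribution are nearly symmetric, because the CLT predicts the skewness of the means shrinks by a factor of sqrt(n).

```python
# Raw exponential data vs. the sampling distribution of its means.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

raw = rng.exponential(scale=1, size=500)                    # strongly skewed
means = rng.exponential(scale=1, size=(2000, 100)).mean(axis=1)

skew_raw = stats.skew(raw)      # exponential: theoretical skewness = 2
skew_means = stats.skew(means)  # theory: 2 / sqrt(100) = 0.2

print(f"skewness of raw data:     {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```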
| Situation | Decision |
|---|---|
| Visual OK, Shapiro OK (p > .05) | → Proceed with parametric test |
| Visual OK, Shapiro fails (p < .05, n > 50) | → Still use parametric (robust with large n) |
| Visual shows problems, Shapiro OK (n < 30) | → Trust visual; test lacks power |
| Visual shows problems, Shapiro fails (any n) | → Transform or use non-parametric |
Submit as: module3_lastname1_lastname2.pdf

🎉 You've mastered the Shapiro-Wilk paradox! 🎉
Next up: Module 4 will teach you what to DO when data aren't normal (transformations!)