Checking Normality: Workflow & R Code

Practical Guide

This page provides step-by-step workflows for checking normality in R, organized by test type.

🎯 Decision Tree: What Should I Check?

graph TD
    A[Do I have a model?<br/>lm or aov object] --> B[YES]
    A --> C[NO]
    B --> D["Check RESIDUALS<br/>residuals(model)"]
    C --> E[What test am I running?]
    E --> F[One-sample t-test]
    E --> G[Independent t-test]
    E --> H[Paired t-test]
    E --> I[Correlation]
    F --> J[Check the variable itself]
    G --> K[Check each group separately]
    H --> L[Check difference scores]
    I --> M[Check both variables]

Quick Reference by Test Type

Test Type	What to Check	How
One-sample t-test	The variable	`shapiro.test(data$var)`
Independent t-test	Each group separately	`tapply(data$var, data$group, shapiro.test)`
Paired t-test	Difference scores	`shapiro.test(data$diff)`
ANOVA	Model residuals	`shapiro.test(residuals(model))`
Regression	Model residuals	`shapiro.test(residuals(model))`
Correlation	Both variables	`shapiro.test(data$x); shapiro.test(data$y)`
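Correlation is the one row in this table without its own section below. The block below sketches the table's rule: check both variables, and fall back to Spearman's rank correlation if either check fails. The simulated `x` and `y`, the seed and the Spearman fallback are illustrative assumptions. Strictly, Pearson inference assumes bivariate normality, so normal-looking variables are a necessary screen rather than proof.

```r
# Hypothetical simulated data standing in for data$x and data$y
set.seed(42)
x <- rnorm(50, mean = 10, sd = 2)
y <- 0.5 * x + rnorm(50)

# Check each variable separately, as in the quick-reference table
p_x <- shapiro.test(x)$p.value
p_y <- shapiro.test(y)$p.value

# Pearson if neither check flags a problem; otherwise Spearman (rank-based)
method <- if (p_x > 0.05 && p_y > 0.05) "pearson" else "spearman"
cor_result <- cor.test(x, y, method = method)
cor_result$method
cor_result$p.value
```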

📊 One-Sample t-test

What to Check

The variable itself (the test compares its mean to a known value)

R Code

# Quick normality check
shapiro.test(data$variable)

# Visual inspection
hist(data$variable, 
     main = "Histogram", 
     col = "lightblue",
     xlab = "Values")

qqnorm(data$variable)
qqline(data$variable, col = "red")

# If normal: proceed with t-test
test_value <- 0  # placeholder: replace with the known value you are testing against
t.test(data$variable, mu = test_value)

# If NOT normal: use Wilcoxon
wilcox.test(data$variable, mu = test_value)
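The check-then-choose logic above can be wrapped in one function. This is a minimal sketch. The function name `one_sample_test`, the .05 cut-off and the simulated scores are illustrative assumptions. In practice, read the p-value together with the plots rather than applying it mechanically.

```r
# Hypothetical helper: Shapiro-Wilk check, then one-sample t-test or Wilcoxon signed-rank
one_sample_test <- function(x, mu, alpha = 0.05) {
  sw <- shapiro.test(x)
  if (sw$p.value > alpha) {
    t.test(x, mu = mu)
  } else {
    wilcox.test(x, mu = mu)
  }
}

# Simulated example (hypothetical data): 40 scores tested against a reference value of 100
set.seed(1)
scores <- rnorm(40, mean = 103, sd = 15)
result <- one_sample_test(scores, mu = 100)
result$method   # which test was actually run
result$p.value
```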

📊 Independent Samples t-test

What to Check

Each group separately

R Code

# Visual by group
boxplot(outcome ~ group, data = data,
        main = "Distribution by Group",
        col = c("lightblue", "lightcoral"))

# Histogram by group (one panel per group level)
groups <- unique(data$group)
par(mfrow = c(1, length(groups)))
for (g in groups) {
  hist(data$outcome[data$group == g],
       main = paste("Group:", g), col = "lightblue", xlab = "Outcome")
}
par(mfrow = c(1, 1))

# Statistical test by group
tapply(data$outcome, data$group, shapiro.test)

# If all groups look normal: proceed with t-test (Welch's version by default)
t.test(outcome ~ group, data = data)

# If NOT normal: use Mann-Whitney U (reported by R as the Wilcoxon rank-sum test)
wilcox.test(outcome ~ group, data = data)
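The block below sketches one way to turn the per-group output of `tapply()` into a single decision. It uses simulated data in a data frame called `sim`, a hypothetical stand-in for `data`. The rule that every group must pass is an assumption consistent with "check each group separately". Note that `t.test()` runs Welch's test unless `var.equal = TRUE` is set.

```r
# Hypothetical simulated data standing in for data$outcome and data$group
set.seed(7)
sim <- data.frame(
  outcome = c(rnorm(30, mean = 50, sd = 10), rnorm(30, mean = 55, sd = 10)),
  group   = rep(c("Group1", "Group2"), each = 30)
)

# Shapiro-Wilk p-value for each group (tapply returns one value per group)
group_p <- tapply(sim$outcome, sim$group, function(v) shapiro.test(v)$p.value)
print(group_p)

# Use the parametric test only if every group passes the check
if (all(group_p > 0.05)) {
  result <- t.test(outcome ~ group, data = sim)       # Welch t-test by default
} else {
  result <- wilcox.test(outcome ~ group, data = sim)  # Mann-Whitney U / rank-sum
}
result$method
```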

📊 Paired t-test

What to Check

The difference scores (not the raw measurements!)

R Code

# Create difference scores
data$diff <- data$after - data$before

# Check normality of differences
shapiro.test(data$diff)

# Visual inspection of differences
hist(data$diff, 
     main = "Distribution of Differences",
     col = "lightgreen",
     xlab = "After - Before")

qqnorm(data$diff)
qqline(data$diff, col = "red")

# If normal: proceed with paired t-test
t.test(data$after, data$before, paired = TRUE)

# If NOT normal: use Wilcoxon Signed-Rank
wilcox.test(data$after, data$before, paired = TRUE)
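You can see directly why the differences are checked instead of the raw scores: a paired t-test is a one-sample t-test on the differences. The sketch below uses simulated data with hypothetical values and a deliberately skewed baseline. It shows that the raw measurements and the differences can behave differently, and that the two t statistics coincide.

```r
# Hypothetical simulated paired data: right-skewed baseline, roughly constant change
set.seed(3)
before <- rexp(30, rate = 1 / 20)               # skewed raw scores (mean 20)
after  <- before + rnorm(30, mean = 5, sd = 4)  # each subject shifts by about 5
diffs  <- after - before

# Raw scores and differences can look very different
shapiro.test(before)$p.value   # raw measurements: not what the test assumes normal
shapiro.test(diffs)$p.value    # differences: the check that matters

# Why: a paired t-test is a one-sample t-test on the differences
paired  <- t.test(after, before, paired = TRUE)
one_smp <- t.test(diffs, mu = 0)
all.equal(unname(paired$statistic), unname(one_smp$statistic))  # TRUE
```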

📊 ANOVA

What to Check

Model residuals (NOT the raw data!)

R Code

# First, run your ANOVA model
model <- aov(outcome ~ group, data = data)

# THEN check residuals
shapiro.test(residuals(model))

# Visual inspection of residuals
hist(residuals(model), 
     main = "Residuals Distribution",
     col = "lightyellow",
     xlab = "Residuals")

qqnorm(residuals(model))
qqline(residuals(model), col = "red")

# Standard diagnostic plots
plot(model, which = 2)  # Q-Q plot

# If normal: interpret ANOVA results
summary(model)

# If NOT normal: use Kruskal-Wallis
kruskal.test(outcome ~ group, data = data)
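Why check residuals rather than raw data? When group means differ, the pooled outcome is a mixture of several distributions. It can look non-normal even if every group is normal. The simulated three-group example below (hypothetical data and seed) illustrates this. For a one-way model, each residual is the observation minus its group mean.

```r
# Hypothetical simulated data: three normal groups with well-separated means
set.seed(11)
sim <- data.frame(
  group   = factor(rep(c("A", "B", "C"), each = 25)),
  outcome = c(rnorm(25, 10, 2), rnorm(25, 20, 2), rnorm(25, 30, 2))
)

model <- aov(outcome ~ group, data = sim)

# Pooled raw outcome: a mixture of three normals, so it can be flagged
shapiro.test(sim$outcome)$p.value
# Residuals: each observation minus its group mean
shapiro.test(residuals(model))$p.value
```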

📊 Regression

What to Check

Model residuals (NOT X or Y individually!)

R Code

# First, run your regression model
model <- lm(outcome ~ predictor, data = data)

# THEN check residuals
shapiro.test(residuals(model))

# Comprehensive diagnostic plots
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))

# Q-Q plot specifically
qqnorm(residuals(model))
qqline(residuals(model), col = "red")

# Histogram of residuals
hist(residuals(model), 
     main = "Residuals Distribution",
     col = "lightblue",
     xlab = "Residuals")

# If residuals normal: interpret results
summary(model)

# If NOT normal: consider transforming the outcome (e.g., log(outcome)) or using a robust method
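The sketch below shows one transformation option. It fits the same simulated data on the raw and log scales and checks the residuals of each. The data are hypothetical, generated with multiplicative errors so the outcome is positive and right-skewed. A log transformation suits only positive, right-skewed outcomes, and it changes how the coefficients are interpreted. Robust regression (for example `MASS::rlm()`) is another route not shown here.

```r
# Hypothetical simulated data with multiplicative (log-normal) errors
set.seed(5)
predictor <- runif(80, min = 1, max = 10)
outcome   <- exp(0.3 * predictor + rnorm(80, sd = 0.5))

raw_model <- lm(outcome ~ predictor)
log_model <- lm(log(outcome) ~ predictor)

# Compare residual normality before and after the log transformation
shapiro.test(residuals(raw_model))$p.value  # residuals on the original scale
shapiro.test(residuals(log_model))$p.value  # residuals on the log scale
```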

🔍 Interpreting Visual Methods

Histogram

Look for:

- Bell-shaped curve = Good
- Skewed (long tail) = Concern
- Multiple peaks = Red flag

Normal

Symmetric, single peak, bell-shaped

Non-Normal

Strongly skewed, flat, or multiple peaks
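Judging a bell shape by eye is easier with a reference curve. A common approach, sketched below with simulated values, is to draw the histogram on the density scale (`freq = FALSE`). You then overlay a normal curve with the sample's mean and standard deviation.

```r
# Hypothetical simulated sample standing in for data$variable
set.seed(2)
vals <- rnorm(100, mean = 50, sd = 10)
m <- mean(vals)
s <- sd(vals)

h <- hist(vals, freq = FALSE, col = "lightblue",
          main = "Histogram with Normal Curve", xlab = "Values")
# Inside curve(), x is the plotting grid, so mean and sd are computed beforehand
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)
```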

Q-Q Plot (Quantile-Quantile)

How to read:

- Points fall along the red line = Normal
- Points deviate systematically from the line = Non-normal

Common patterns:

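Bowed curve, both ends above the line (U-shape) → Right-skewed
Bowed curve, both ends below the line (inverted U) → Left-skewed
S-shape (low end below the line, high end above it) → Heavy tails
Inverted S (low end above the line, high end below it) → Light tails
Few outliers at ends → Usually okay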

Pro Tip

A few points off the line at the extremes are usually fine; look for large, systematic deviations.
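To learn these patterns, try generating samples whose shapes are known and looking at their Q-Q plots. The distributions below (exponential, t with 3 degrees of freedom, uniform) are standard textbook choices made here for illustration. The seed and sample size are arbitrary.

```r
# Hypothetical simulated samples with known shapes
set.seed(10)
n <- 200
samples <- list(
  "Right-skewed (exponential)"        = rexp(n),
  "Left-skewed (negated exponential)" = -rexp(n),
  "Heavy tails (t, df = 3)"           = rt(n, df = 3),
  "Light tails (uniform)"             = runif(n)
)

# One normal Q-Q plot per sample, each with its reference line
par(mfrow = c(2, 2))
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm)
  qqline(samples[[nm]], col = "red")
}
par(mfrow = c(1, 1))
```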

Boxplot

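Look for:

- Median near the middle of the box and whiskers of similar length = Good
- One whisker much longer than the other = Skew
- Many points beyond the whiskers = Concern (heavy tails or outliers)

In R, whiskers extend at most 1.5 × IQR beyond the box, so long tails show up as points plotted beyond the whiskers.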
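You can also read the quantities a boxplot encodes as numbers, because `boxplot()` returns the statistics it draws. The sketch compares simulated symmetric and right-skewed samples. Reading a ratio "well above 1" as a sign of right skew is a rough heuristic assumed here, not a formal test.

```r
# Hypothetical simulated samples: one symmetric, one right-skewed
set.seed(4)
samples <- list(Symmetric = rnorm(100), RightSkewed = rexp(100))

# boxplot() draws the plot and also returns the statistics it used (invisibly)
b <- boxplot(samples, col = c("lightblue", "lightcoral"),
             main = "Symmetric vs Right-Skewed")

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
upper_reach <- b$stats[5, ] - b$stats[3, ]   # median to end of upper whisker
lower_reach <- b$stats[3, ] - b$stats[1, ]   # median to end of lower whisker
setNames(round(upper_reach / lower_reach, 2), names(samples))

# Number of points drawn beyond the whiskers, per sample
table(factor(b$group, levels = seq_along(samples), labels = names(samples)))
```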

📈 Interpreting Shapiro-Wilk Test

The Test

shapiro.test(data$variable)
# or
shapiro.test(residuals(model))

Interpreting Results

Shapiro-Wilk normality test

data:  data$variable
W = 0.96, p-value = 0.234

Decision:

p-value	Interpretation	Action
p ≥ .05	No significant evidence against normality	✓ Proceed with parametric test
.01 ≤ p < .05	Significant deviation from normality	⚠ Consider alternatives
p < .01	Strong evidence of non-normality	⚠ Use non-parametric test
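In scripts, you can pull the W statistic and p-value out of the object returned by `shapiro.test()` using `$statistic` and `$p.value`. The helper `classify_sw()` below is hypothetical. It simply encodes the thresholds in the table above and is run on simulated data.

```r
# Hypothetical helper: map a Shapiro-Wilk p-value to the decision table's rows
classify_sw <- function(p) {
  if (p < 0.01) {
    "strong evidence against normality"
  } else if (p < 0.05) {
    "significant deviation from normality"
  } else {
    "no significant deviation detected"
  }
}

set.seed(8)
sw <- shapiro.test(rnorm(60))   # simulated data for illustration
sw$statistic                    # the W statistic (named "W")
sw$p.value
classify_sw(sw$p.value)
```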

Sample Size Matters

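Small samples (n < 30): the test has low power and may miss real violations
Large samples (n > 100): the test may flag trivial departures that do not matter in practice
R's shapiro.test() only accepts sample sizes between 3 and 5000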

Always use visual methods + statistical test together!
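You can check the sample-size effect by simulation. Draw many samples from a distribution that departs only mildly from normal, then count how often Shapiro-Wilk rejects at each n. The t distribution with 10 degrees of freedom, the 200 replications and the sample sizes are all assumptions. Rejection rates should rise with n, but the exact values depend on the seed. The largest n is 5000 because `shapiro.test()` refuses larger samples.

```r
# Proportion of simulated samples flagged (p < .05) at sample size n
# (t with df = 10 is an assumed example of a mild departure from normality)
reject_rate <- function(n, reps = 200) {
  mean(replicate(reps, shapiro.test(rt(n, df = 10))$p.value < 0.05))
}

set.seed(123)
sizes <- c(20, 100, 1000, 5000)
rates <- sapply(sizes, reject_rate)
data.frame(n = sizes, rejection_rate = rates)
```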

🎯 Complete Workflow Template

For Model-Based Tests (ANOVA, Regression)

# Step 1: Run your model (aov() for ANOVA, lm() for regression)
model <- aov(outcome ~ group, data = data)
# model <- lm(outcome ~ predictor, data = data)

# Step 2: Extract residuals
res <- residuals(model)

# Step 3: Visual check
par(mfrow = c(1, 2))
hist(res, main = "Residuals", col = "lightblue")
qqnorm(res); qqline(res, col = "red")
par(mfrow = c(1, 1))

# Step 4: Statistical test
shapiro.test(res)

# Step 5: Decision
# If p > .05 and visuals look good → Proceed
# If p < .05 or bad visuals → Consider alternatives
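Steps 2 to 4 can be bundled into a reusable function. The name `check_residual_normality` is hypothetical. The example uses R's built-in `mtcars` data only so the code is runnable, and the function works the same way for `aov` models.

```r
# Hypothetical helper bundling Steps 2-4 for any lm or aov model
check_residual_normality <- function(model) {
  res <- residuals(model)
  old_par <- par(mfrow = c(1, 2))
  on.exit(par(old_par))   # restore the plotting layout afterwards
  hist(res, main = "Residuals", col = "lightblue", xlab = "Residuals")
  qqnorm(res)
  qqline(res, col = "red")
  shapiro.test(res)
}

# Example with built-in data: fuel economy regressed on car weight
fit <- lm(mpg ~ wt, data = mtcars)
sw <- check_residual_normality(fit)
sw$p.value
```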

For Simple Tests (t-tests)

# Step 1: Identify what to check
# - One-sample: the variable
# - Independent: each group
# - Paired: difference scores

# Step 2: Visual check (swap data$variable for each group subset or data$diff as needed)
hist(data$variable, col = "lightblue")
qqnorm(data$variable); qqline(data$variable, col = "red")

# Step 3: Statistical test
shapiro.test(data$variable)

# Step 4: Decision
# Normal → Parametric test
# Not normal → Non-parametric alternative
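Step 1 can also be expressed in code. The hypothetical helper below returns the vector(s) to check for each design. For illustration only, the examples use two of R's built-in datasets: `sleep` for paired data (matched on `ID`) and `mtcars` for independent groups (split by transmission type `am`).

```r
# Hypothetical helper: return the vector(s) whose normality should be checked
values_to_check <- function(design, x, y = NULL, group = NULL) {
  switch(design,
    one_sample  = list(variable = x),
    independent = split(x, group),            # one vector per group
    paired      = list(differences = x - y),  # x and y must be matched by subject
    stop("unknown design: ", design)
  )
}

# Paired example: built-in sleep data, rows matched on subject ID
drug1 <- sleep[sleep$group == 1, ]
drug2 <- sleep[sleep$group == 2, ]
drug1 <- drug1[order(drug1$ID), ]
drug2 <- drug2[order(drug2$ID), ]
paired_checks <- values_to_check("paired", drug2$extra, drug1$extra)

# Independent example: built-in mtcars, mpg split by transmission (am)
indep_checks <- values_to_check("independent", mtcars$mpg, group = mtcars$am)

# Shapiro-Wilk p-value for every vector that needs checking
sapply(c(paired_checks, indep_checks), function(v) shapiro.test(v)$p.value)
```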

⚠️ Common Mistakes

Don't Do This

❌ Checking raw data for ANOVA/regression (check residuals!)
❌ Running analysis without checking assumptions
❌ Ignoring visual methods (only using Shapiro-Wilk)
❌ Panicking over minor violations with large samples
❌ Using parametric tests with clearly non-normal small samples

Do This

✓ Check residuals for model-based tests
✓ Use visual + statistical methods together
✓ Consider sample size in your decision
✓ Know your non-parametric alternatives
✓ Document your assumption checks
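One lightweight way to document assumption checks is to collect them in a data frame that can be printed or saved with the analysis. The helper `document_checks` is hypothetical, and the `mtcars` checks below are illustrative.

```r
# Hypothetical helper: one row per checked vector, ready to report or save
document_checks <- function(vectors) {
  rows <- lapply(names(vectors), function(nm) {
    sw <- shapiro.test(vectors[[nm]])
    data.frame(check   = nm,
               n       = length(vectors[[nm]]),
               W       = unname(sw$statistic),
               p_value = sw$p.value)
  })
  do.call(rbind, rows)
}

fit <- lm(mpg ~ wt, data = mtcars)
checks <- document_checks(list(
  regression_residuals = residuals(fit),
  mpg_manual           = mtcars$mpg[mtcars$am == 1],
  mpg_automatic        = mtcars$mpg[mtcars$am == 0]
))
print(checks)
```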

📚 Next Steps

Assumptions failing? → When Assumptions Fail

Want interactive practice? → Normality Modules

Need conceptual review? → Understanding Normality

Other assumptions? → Other Assumptions
