Module 4: Data Transformations

When Data Aren't Normal - Fix It!

☁️ Working Guidelines

⏱️ Estimated time: 60 minutes
👥 Work with your partner—experiment together!
💾 Your answers are saved automatically in your browser
📄 When finished, use Print (Ctrl+P/Cmd+P) and "Save as PDF" to submit

🎯 Learning Objectives

By the end of this module, you will:

Understand WHY transformations work to normalize data
Learn which transformation to use for different types of skewness
Apply transformations and evaluate their effectiveness
Recognize when NOT to transform (and use non-parametric tests instead)

🔧 The Problem

You've diagnosed non-normality. Now what?

In previous modules, you learned to detect non-normal data. But what do you actually DO about it?

Three options:

Transform the data (this module!)
Use non-parametric tests (Module 5)
Accept the violation if n is large (Module 3)

Today, we'll explore transformations—mathematical operations that can "fix" skewed data.

Part 1: Why Do Transformations Work?

💡 The Intuition

Imagine reaction times: 200ms, 400ms, 800ms, 1600ms

Original data:

Gaps between values: 200, 400, 800 (increasing!)
This creates right skew

After log transformation:

log(200) = 5.30, log(400) = 5.99, log(800) = 6.68, log(1600) = 7.38 (natural log)
Gaps: 0.69, 0.69, 0.69 (equal! each gap is log(2) ≈ 0.693, because each value is double the one before it)
Multiplicative relationships become additive

Result: The log transformation "pulls in" extreme values proportionally, creating symmetry!
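Here is a small Python sketch of the same arithmetic, in case you want to check it yourself. It uses the four reaction times from the example; the variable names are just illustrative.

```python
import math

# Reaction times from the example above (ms); each value is double the previous one.
times_ms = [200, 400, 800, 1600]

# On the original scale, the gaps grow as the values grow.
raw_gaps = [b - a for a, b in zip(times_ms, times_ms[1:])]

# Natural logs turn each "times 2" step into a "plus log(2)" step.
logs = [math.log(t) for t in times_ms]
log_gaps = [b - a for a, b in zip(logs, logs[1:])]

print(raw_gaps)                         # [200, 400, 800]
print([round(v, 2) for v in logs])      # [5.3, 5.99, 6.68, 7.38]
print([round(g, 3) for g in log_gaps])  # [0.693, 0.693, 0.693]
```

Any other log base (such as log10) would give equal gaps too; only the size of the gap changes.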

Part 2: Types of Transformations

📚 Transformation Toolkit

Type of Skew	Transformation	Formula	When to Use
Right Skew (long tail right)	Log	log(x)	Reaction times, income, counts
Right Skew (mild)	Square Root	√x	Count data, mild skew
Left Skew (long tail left)	Square	x²	Proportion data near 1
Left Skew	Reflect + Log	log(max - x + 1)	When square doesn't work

⚠️ Important: Log requires all values > 0, and square root requires all values ≥ 0. If you have zeros (for log) or negatives (for either), add a constant first, e.g. log(x + 1)!
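The toolkit can be sketched as four small Python functions. This is a minimal illustration, not a standard library: the function names and the `shift` parameter (the constant you add before transforming) are assumptions made for this example.

```python
import math

def log_transform(xs, shift=0.0):
    """Natural log of (x + shift). Every shifted value must be > 0."""
    if min(xs) + shift <= 0:
        raise ValueError("log needs x + shift > 0; choose a larger shift")
    return [math.log(x + shift) for x in xs]

def sqrt_transform(xs, shift=0.0):
    """Square root of (x + shift). Every shifted value must be >= 0."""
    if min(xs) + shift < 0:
        raise ValueError("sqrt needs x + shift >= 0; choose a larger shift")
    return [math.sqrt(x + shift) for x in xs]

def square_transform(xs):
    """x squared, for left-skewed data."""
    return [x ** 2 for x in xs]

def reflect_log_transform(xs):
    """log(max - x + 1): flip the long left tail to the right, then log.
    Note that this reverses the ordering (the largest x gets the smallest result)."""
    top = max(xs)
    return [math.log(top - x + 1) for x in xs]

counts = [0, 1, 4, 9]                  # illustrative count data containing a zero
print(sqrt_transform(counts))          # [0.0, 1.0, 2.0, 3.0]
print(log_transform(counts, shift=1))  # log(x + 1) handles the zero
```

Calling `log_transform(counts)` without a shift raises an error, which is the point of the warning above.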

Part 3: Hands-On Transformation Lab

Let's generate some skewed data and practice transforming it!
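One way to try this in plain Python is sketched below. It simulates right-skewed "reaction times" and compares skewness before and after two transformations. The lognormal parameters, the sample size, the seed, and the `skewness` helper are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def skewness(xs):
    """Moment-based sample skewness: near 0 for symmetric data, > 0 for right skew."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(42)  # fixed seed so both partners see the same numbers

# Simulated reaction times (ms). Lognormal data are right-skewed by construction.
reaction_ms = [rng.lognormvariate(6.0, 0.6) for _ in range(500)]

candidates = {
    "raw": reaction_ms,
    "sqrt": [math.sqrt(x) for x in reaction_ms],
    "log": [math.log(x) for x in reaction_ms],
}

for name, values in candidates.items():
    print(f"{name:>4}: skewness = {skewness(values):+.2f}")
```

Because these data are lognormal, the log version should land closest to zero skewness, with square root somewhere in between. Try changing 0.6 to a larger value to make the raw data more skewed.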

📝 Reflection Questions

Question 1: In your own words, explain WHY the log transformation works for right-skewed data. What does it do to large vs. small values?

Question 2: You applied transformations above. Which transformation worked best for the right-skewed data? How did you evaluate whether it worked?

Question 3: After transforming, your data are now normal. But your results are in "log(reaction time)" units. Why might this be hard to interpret? How could you address this?

Part 4: When NOT to Transform

🚫 Stop! Don't Transform These:

1. Bimodal Distributions

Two peaks = two groups. Transformation won't fix this—you need to analyze groups separately or investigate why there are two distributions.

2. Categorical/Ordinal Data

Likert scales (1-5 ratings) shouldn't be transformed. Use non-parametric tests or treat as ordinal.

3. When You're "P-Hacking"

Don't try 10 different transformations until you get p > .05 in Shapiro-Wilk! Choose based on data type, not desired outcome.

4. When Non-Parametric is Easier

If transformation makes interpretation too complex, just use Mann-Whitney or Kruskal-Wallis.

Part 5: Decision Flowchart

🗺️ When to Transform: Decision Guide

Step 1: Check your diagnostic plots

Right-skewed? → Try log or sqrt
Left-skewed? → Try square or reflect-then-log
Bimodal? → DON'T transform; investigate groups
Just outliers? → DON'T transform; investigate outliers

Step 2: Consider your data type

Reaction times / durations → Log usually works well
Counts (0, 1, 2, 3...) → Square root usually works
Proportions (0 to 1) → May need arcsine square root or logit
Income / prices → Log typically works
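The two proportion transformations mentioned in Step 2 are not in the toolkit table, so here is a minimal Python sketch of both. The function names and the example accuracy values are assumptions for illustration.

```python
import math

def logit(p):
    """log(p / (1 - p)). Defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit needs 0 < p < 1")
    return math.log(p / (1.0 - p))

def arcsine_sqrt(p):
    """asin(sqrt(p)). Defined for 0 <= p <= 1; the result is in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("arcsine square root needs 0 <= p <= 1")
    return math.asin(math.sqrt(p))

accuracy = [0.50, 0.90, 0.99]  # illustrative proportions bunched up near 1

print([round(logit(p), 2) for p in accuracy])  # [0.0, 2.2, 4.6]
print([round(arcsine_sqrt(p), 3) for p in accuracy])
```

Notice that logit stretches out values near 1 (0.90 and 0.99 end up far apart), but it cannot handle proportions of exactly 0 or 1.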

Step 3: Apply transformation and re-check

Make histogram and Q-Q plot of transformed data
Run Shapiro-Wilk on transformed data
If improved: proceed with analysis on transformed data
If not improved: try different transformation or use non-parametric
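The re-check in Step 3 can be approximated in plain Python with a Q-Q correlation: how closely the sorted data track the matching normal quantiles. This is a rough stand-in, not a replacement for Shapiro-Wilk, which needs a statistics package. The simulated data, seed, and helper names are assumptions for this sketch.

```python
import math
import random
import statistics

def pearson_r(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def qq_correlation(xs):
    """Correlation of the points on a normal Q-Q plot.
    Values closer to 1 mean the points lie closer to a straight line."""
    n = len(xs)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return pearson_r(sorted(xs), theoretical)

rng = random.Random(7)
# Simulated right-skewed durations in minutes (illustrative only).
durations = [rng.lognormvariate(4.2, 0.8) for _ in range(200)]

r_raw = qq_correlation(durations)
r_log = qq_correlation([math.log(x) for x in durations])

print(f"raw: r = {r_raw:.3f}")
print(f"log: r = {r_log:.3f}")
print("improved" if r_log > r_raw else "not improved: try another option")
```

Treat the number as a summary of the Q-Q plot, and still look at the plot itself before deciding.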

Step 4: Report clearly

"Data were log-transformed prior to analysis"
"Means are reported in original units after back-transformation"
Or: "Analysis performed on log-transformed values"

Part 6: Practice Scenario

Scenario: You're analyzing time spent studying (in minutes) for 40 students:

Histogram: Strongly right-skewed
Q-Q plot: Points curve upward at high end
Shapiro-Wilk: W = 0.89, p = 0.002
Mean = 85 min, Median = 60 min

Question 4A: Based on this information, what transformation would you try first? Why?

Question 4B: After applying log transformation, you get:

Histogram: Roughly bell-shaped
Q-Q plot: Points follow line closely
Shapiro-Wilk: W = 0.97, p = 0.35

Write how you would report this in a Methods section:

Question 4C: Your analysis yields a mean of 4.20 on the log scale (SE = 0.15). How would you report this in original minutes?

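Hint: exp(4.20) = 66.7 minutes. Note that a back-transformed log mean is the geometric mean, which is typically closer to the median (60 min) than to the arithmetic mean (85 min).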
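Back-transforming an interval works the same way as back-transforming the mean: build the interval on the log scale, then exponentiate both ends. The sketch below uses the numbers from Question 4C. The multiplier 1.96 is an assumption (a normal-approximation 95% interval); with 40 students a t-based multiplier would be slightly larger.

```python
import math

log_mean = 4.20  # mean on the log scale (from Question 4C)
log_se = 0.15    # standard error on the log scale
z = 1.96         # assumption: normal-approximation 95% interval

# Exponentiate the ends of the log-scale interval, not the SE itself.
center = math.exp(log_mean)  # geometric mean, in minutes
lower = math.exp(log_mean - z * log_se)
upper = math.exp(log_mean + z * log_se)

print(f"{center:.1f} min, 95% CI [{lower:.1f}, {upper:.1f}]")
# 66.7 min, 95% CI [49.7, 89.5]
```

The back-transformed interval is not symmetric around 66.7, which is expected: equal steps on the log scale are unequal steps in minutes.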

Part 7: Common Mistakes

⚠️ Don't Make These Errors:

Mistake 1: Transforming without checking if it helped

❌ "I log-transformed the data" (but didn't verify it worked)
✓ "Log transformation improved normality (W = 0.89 → 0.96, p = 0.002 → 0.22)"

Mistake 2: Trying transformations until p > .05

❌ P-hacking: Testing 5 transformations to find one that works
✓ Choose based on data characteristics, not desired p-value

Mistake 3: Forgetting about zeros/negatives

❌ log(0) is undefined (it heads toward negative infinity), and the log of a negative number is not a real number
✓ Add constant: log(x + 1) if you have zeros

Mistake 4: Not considering interpretability

❌ Running complex analysis on transformed data without explaining results
✓ Back-transform key results for interpretation

Mistake 5: Transforming categorical or ordinal data

❌ log(Likert scale 1-5)
✓ Use non-parametric tests for ordinal data

🎯 Key Takeaways

What You Should Remember:

✓ Transformations change the scale to make data more normal

Log, square root, and square transformations can fix many skewness problems.

✓ Choose based on data type and skew direction

Right skew → log or sqrt
Left skew → square or reflect-then-log
Bimodal → Don't transform!

✓ Always verify the transformation worked

Re-check histograms, Q-Q plots, and Shapiro-Wilk after transforming.

✓ Report clearly and consider interpretation

Tell readers what you did and back-transform results when appropriate.

✓ When in doubt, use non-parametric

If transformation is too complex or doesn't help, non-parametric tests are a great alternative!

📚 Looking Ahead

In Module 5, you'll learn:

When to skip transformation and use non-parametric tests
Mann-Whitney U test (alternative to t-test)
Kruskal-Wallis test (alternative to ANOVA)
Comparing parametric vs. non-parametric approaches

📋 Before You Submit

✅ Submission Checklist

Both partner names filled in
Ran log transformation demonstration
Experimented with transforming skewed data
Completed all reflection questions (Q1-Q4)
Viewed bimodal example

📤 How to Submit

Click "Save Progress"
Print: Ctrl+P (Windows) or Cmd+P (Mac)
Choose "Save as PDF"
Save as: module4_lastname1_lastname2.pdf
Upload to your course site

🎉 You've mastered data transformations! 🎉

Next: Module 5 will cover non-parametric alternatives when transformation isn't the answer!