Module 4: Data Transformations

When Data Aren't Normal - Fix It!

🎯 Learning Objectives

By the end of this module, you will:

  1. Understand WHY transformations work to normalize data
  2. Learn which transformation to use for different types of skewness
  3. Apply transformations and evaluate their effectiveness
  4. Recognize when NOT to transform (and use non-parametric tests instead)

🔧 The Problem

You've diagnosed non-normality. Now what?

In previous modules, you learned to detect non-normal data. But what do you actually DO about it?

Three options:

  1. Transform the data (this module!)
  2. Use non-parametric tests (Module 5)
  3. Accept the violation if n is large (Module 3)

Today, we'll explore transformations: mathematical operations that can "fix" skewed data.

Part 1: Why Do Transformations Work?

💡 The Intuition

Imagine reaction times: 200ms, 400ms, 800ms, 1600ms

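Original data: the gaps between neighbouring values are 200, 400, and 800 ms, so each gap is twice the previous one.

After log transformation (natural log): 5.30, 5.99, 6.68, 7.38, with equal gaps of about 0.69.

Result: The log transformation "pulls in" large values more than small ones. Each doubling becomes the same additive step, so a long right tail is compressed and the data become more symmetric!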
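The reaction-time example can be checked in a few lines. This is a minimal sketch using only Python's standard library; the variable names are illustrative, and the natural log is assumed (any other base would rescale the gaps but keep them equal).

```python
import math

reaction_times_ms = [200, 400, 800, 1600]

# Gaps between neighbouring values on the original scale: each gap doubles.
raw_gaps = [b - a for a, b in zip(reaction_times_ms, reaction_times_ms[1:])]

# Natural log of each value, then the gaps on the log scale.
logged = [math.log(x) for x in reaction_times_ms]
log_gaps = [b - a for a, b in zip(logged, logged[1:])]

print(raw_gaps)                         # [200, 400, 800]
print([round(v, 2) for v in logged])    # [5.3, 5.99, 6.68, 7.38]
print([round(g, 2) for g in log_gaps])  # [0.69, 0.69, 0.69]
```

Every log-scale gap equals log(2), because each value is exactly twice the one before it.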

Part 2: Types of Transformations

📚 Transformation Toolkit

| Type of Skew | Transformation | Formula | When to Use |
|---|---|---|---|
| Right skew (long tail right) | Log | log(x) | Reaction times, income, counts |
| Right skew (mild) | Square root | √x | Count data, mild skew |
| Left skew (long tail left) | Square | x² | Proportion data near 1 |
| Left skew | Reflect + log | log(max - x + 1) | When square doesn't work |

⚠️ Important: Log requires all values > 0, and square root requires all values ≥ 0. If you have zeros (for log) or negatives (for either), add a constant first so every value falls in the valid range!
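The toolkit above could be written as small helper functions. This is a sketch only, using Python's standard library; the function names and the `shift` parameter are hypothetical choices made for illustration, not part of any course code.

```python
import math

def log_transform(values, shift=0.0):
    """Natural log of (x + shift); every shifted value must be > 0."""
    shifted = [x + shift for x in values]
    if min(shifted) <= 0:
        raise ValueError("log needs values > 0; increase `shift`")
    return [math.log(x) for x in shifted]

def sqrt_transform(values, shift=0.0):
    """Square root of (x + shift); every shifted value must be >= 0."""
    shifted = [x + shift for x in values]
    if min(shifted) < 0:
        raise ValueError("sqrt needs values >= 0; increase `shift`")
    return [math.sqrt(x) for x in shifted]

def square_transform(values):
    """x squared: stretches the upper end, for left-skewed data."""
    return [x ** 2 for x in values]

def reflect_log_transform(values):
    """log(max - x + 1): flips the data so a left tail becomes a right tail."""
    top = max(values)
    return [math.log(top - x + 1) for x in values]

counts = [0, 1, 3, 8, 20]           # made-up counts containing a zero
print(log_transform(counts, shift=1))

scores = [55, 80, 90, 95, 98]       # made-up left-skewed scores
print(reflect_log_transform(scores))
```

Note that the reflected version reverses the ordering: the highest original score gets the lowest transformed value, which matters when interpreting the direction of an effect.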

Part 3: Hands-On Transformation Lab

Let's generate some skewed data and practice transforming it!
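One possible way to run this lab outside the interactive page is sketched below. It assumes simulated log-normal "reaction times" with made-up parameters, and it uses moment-based skewness as a simple numeric check. A Shapiro-Wilk test would need a statistics library, so it is left out of this standard-library sketch.

```python
import math
import random
import statistics

def skewness(values):
    """Moment-based skewness: near 0 if symmetric, > 0 for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical reaction times (ms): log-normal, so right-skewed by construction.
rt = [random.lognormvariate(6.0, 0.6) for _ in range(500)]

candidates = {
    "raw": rt,
    "sqrt": [math.sqrt(x) for x in rt],
    "log": [math.log(x) for x in rt],
}

# A skewness closer to 0 after transforming suggests the transformation helped.
for name, data in candidates.items():
    print(f"{name:>4}: skewness = {skewness(data):.2f}")
```

Because the simulated data are log-normal, the log version should land closest to zero, with the square root in between. With real data, pair a number like this with a histogram and a Q-Q plot rather than relying on it alone.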

📝 Reflection Questions

Question 1: In your own words, explain WHY the log transformation works for right-skewed data. What does it do to large vs. small values?

Question 2: You applied transformations above. Which transformation worked best for the right-skewed data? How did you evaluate whether it worked?

Question 3: After transforming, your data are now normal. But your results are in "log(reaction time)" units. Why might this be hard to interpret? How could you address this?

Part 4: When NOT to Transform

🚫 Stop! Don't Transform These:

1. Bimodal Distributions

Two peaks = two groups. Transformation won't fix this; you need to analyze the groups separately or investigate why there are two distributions.

2. Categorical/Ordinal Data

Likert scales (1-5 ratings) shouldn't be transformed. Use non-parametric tests or treat as ordinal.

3. When You're "P-Hacking"

Don't try 10 different transformations until you get p > .05 in Shapiro-Wilk! Choose based on data type, not desired outcome.

4. When Non-Parametric is Easier

If transformation makes interpretation too complex, just use Mann-Whitney or Kruskal-Wallis.

Part 5: Decision Flowchart

🗺️ When to Transform: Decision Guide

Step 1: Check your diagnostic plots

Step 2: Consider your data type

Step 3: Apply transformation and re-check

Step 4: Report clearly

Part 6: Practice Scenario

Scenario: You're analyzing time spent studying (in minutes) for 40 students:

Question 4A: Based on this information, what transformation would you try first? Why?

Question 4B: After applying log transformation, you get:

Write how you would report this in a Methods section:

Question 4C: Your analysis yields a mean of 4.20 on the log scale (SE = 0.15). How would you report this in original minutes?

Hint: exp(4.20) = 66.7 minutes
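The back-transformation in the hint can be sketched as follows, using the scenario's values. The 95% interval uses a normal-approximation multiplier of 1.96, which is an assumption made here (a t-based multiplier would be slightly larger for n = 40).

```python
import math

log_mean = 4.20  # mean on the natural-log scale (from the scenario)
log_se = 0.15    # standard error on the log scale (from the scenario)
z = 1.96         # assumption: normal-approximation 95% interval

# Exponentiating a log-scale mean gives the geometric mean in minutes.
geometric_mean = math.exp(log_mean)

# Build the interval on the log scale first, then back-transform its ends.
lower = math.exp(log_mean - z * log_se)
upper = math.exp(log_mean + z * log_se)

print(f"{geometric_mean:.1f} minutes, 95% CI [{lower:.1f}, {upper:.1f}]")
# 66.7 minutes, 95% CI [49.7, 89.5]
```

The back-transformed value is a geometric mean, not the arithmetic mean of the raw minutes, and the interval is asymmetric around it. Both points are worth stating when you report the result.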

Part 7: Common Mistakes

⚠️ Don't Make These Errors:

Mistake 1: Transforming without checking if it helped

❌ "I log-transformed the data" (but didn't verify it worked)
✓ "Log transformation improved normality (W = 0.89 → 0.96, p = 0.002 → 0.22)"

Mistake 2: Trying transformations until p > .05

❌ P-hacking: Testing 5 transformations to find one that works
✓ Choose based on data characteristics, not desired p-value

Mistake 3: Forgetting about zeros/negatives

❌ log(0) is undefined (software returns an error or -infinity), and logs of negative values are undefined too
✓ Add a constant: log(x + 1) if you have zeros

Mistake 4: Not considering interpretability

❌ Running complex analysis on transformed data without explaining results
✓ Back-transform key results for interpretation

Mistake 5: Transforming categorical data

❌ log(Likert scale 1-5)
✓ Use non-parametric tests for ordinal data

🎯 Key Takeaways

What You Should Remember:

✓ Transformations change the scale to make data more normal

Log, square root, and square transformations can fix many skewness problems.

✓ Choose based on data type and skew direction

✓ Always verify the transformation worked

Re-check histograms, Q-Q plots, and Shapiro-Wilk after transforming.

✓ Report clearly and consider interpretation

Tell readers what you did and back-transform results when appropriate.

✓ When in doubt, use non-parametric

If transformation is too complex or doesn't help, non-parametric tests are a great alternative!

📚 Looking Ahead

In Module 5, you'll learn:

📋 Before You Submit

✅ Submission Checklist

📤 How to Submit

  1. Click "Save Progress"
  2. Print: Ctrl+P (Windows) or Cmd+P (Mac)
  3. Choose "Save as PDF"
  4. Save as: module4_lastname1_lastname2.pdf
  5. Upload to your course site

🎉 You've mastered data transformations! 🎉

Next: Module 5 will cover non-parametric alternatives when transformation isn't the answer!