When Data Aren't Normal - Fix It!
By the end of this module, you will:
You've diagnosed non-normality. Now what?
In previous modules, you learned to detect non-normal data. But what do you actually DO about it?
Three options:
Today, we'll explore transformations: mathematical operations that can "fix" skewed data.
Imagine reaction times: 200ms, 400ms, 800ms, 1600ms
Original data:
After log transformation:
Result: The log transformation "pulls in" extreme values proportionally, creating symmetry!
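To see the "pulling in" numerically, here is a minimal sketch using the four reaction times above. The variable names are illustrative, and the natural log is an assumption; any log base gives the same even spacing.

```python
import math

# Reaction times from the example above, in milliseconds.
reaction_times_ms = [200, 400, 800, 1600]

# Natural log of each value.
logged = [math.log(x) for x in reaction_times_ms]

# Gaps between neighbouring values, before and after the transformation.
raw_gaps = [b - a for a, b in zip(reaction_times_ms, reaction_times_ms[1:])]
log_gaps = [b - a for a, b in zip(logged, logged[1:])]

print("raw gaps:", raw_gaps)                         # each gap is twice the previous one
print("log gaps:", [round(g, 3) for g in log_gaps])  # every gap equals log(2)
```

Each value is double the one before it, so on the log scale every step has the same size, log(2). Equal ratios become equal distances, which is why the long right tail shrinks.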
| Type of Skew | Transformation | Formula | When to Use |
|---|---|---|---|
| Right Skew (long tail right) | Log | log(x) | Reaction times, income, counts |
| Right Skew (mild) | Square Root | √x | Count data, mild skew |
| Left Skew (long tail left) | Square | x² | Proportion data near 1 |
| Left Skew | Reflect + Log | log(max - x + 1) | When square doesn't work |
⚠️ Important: Log requires all values > 0, and square root requires all values ≥ 0. If your data include zeros (for log) or negatives, add a constant first!
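The four formulas in the table can be written as small helper functions. This is only a sketch: the function names and the exam scores are made up for illustration and are not part of the module.

```python
import math

def log_transform(values):
    """Right skew: natural log; every value must be > 0."""
    return [math.log(x) for x in values]

def sqrt_transform(values):
    """Mild right skew: square root; every value must be >= 0."""
    return [math.sqrt(x) for x in values]

def square_transform(values):
    """Left skew: square each value."""
    return [x ** 2 for x in values]

def reflect_log_transform(values):
    """Left skew: log(max - x + 1). This reverses the order of the data."""
    top = max(values)
    return [math.log(top - x + 1) for x in values]

# Hypothetical left-skewed exam scores (most students near the top).
scores = [55, 80, 90, 95, 98]
reflected = reflect_log_transform(scores)
print([round(v, 2) for v in reflected])
```

Note that reflecting flips the direction of the scale: the highest original score ends up with the lowest transformed value, so the sign of any effect has to be interpreted accordingly.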
Let's generate some skewed data and practice transforming it!
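One way to set up this practice, sketched with Python's standard library only. The simulated data, the seed, and the `skewness` helper are assumptions made for illustration; your course materials may use a different tool or dataset.

```python
import math
import random
import statistics

def skewness(values):
    """Mean cubed z-score: about 0 for symmetric data, positive for right skew."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)

random.seed(42)  # fixed seed so the run is repeatable

# Simulated right-skewed "reaction times": log-normal, so log(x) is normal.
data = [random.lognormvariate(6.0, 0.6) for _ in range(500)]

candidates = {
    "original": data,
    "square root": [math.sqrt(x) for x in data],
    "log": [math.log(x) for x in data],
}

for name, values in candidates.items():
    print(f"{name:12s} skewness = {skewness(values):6.2f}")
```

A skewness near zero suggests the transformation helped, but it is only one number. Histograms and Q-Q plots remain the main check.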
Question 1: In your own words, explain WHY the log transformation works for right-skewed data. What does it do to large vs. small values?
Question 2: You applied transformations above. Which transformation worked best for the right-skewed data? How did you evaluate whether it worked?
Question 3: After transforming, your data are now normal. But your results are in "log(reaction time)" units. Why might this be hard to interpret? How could you address this?
1. Bimodal Distributions
Two peaks = two groups. Transformation won't fix this; you need to analyze the groups separately or investigate why there are two distributions.
2. Categorical/Ordinal Data
Likert scales (1-5 ratings) shouldn't be transformed. Use non-parametric tests or treat as ordinal.
3. When You're "P-Hacking"
Don't try 10 different transformations until you get p > .05 in Shapiro-Wilk! Choose based on data type, not desired outcome.
4. When Non-Parametric is Easier
If transformation makes interpretation too complex, just use Mann-Whitney or Kruskal-Wallis.
Step 1: Check your diagnostic plots
Step 2: Consider your data type
Step 3: Apply transformation and re-check
Step 4: Report clearly
Scenario: You're analyzing time spent studying (in minutes) for 40 students:
Question 4A: Based on this information, what transformation would you try first? Why?
Question 4B: After applying log transformation, you get:
Write how you would report this in a Methods section:
Question 4C: Your analysis yields a mean of 4.20 on the log scale (SE = 0.15). How would you report this in original minutes?
Hint: exp(4.20) = 66.7 minutes
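The back-transformation in the hint can be extended to a confidence interval. The sketch below assumes a normal-approximation 95% interval (z = 1.96); with 40 students a t critical value would be slightly larger. The key idea is to build the interval on the log scale first and then exponentiate both ends.

```python
import math

log_mean = 4.20   # mean on the log scale (from Question 4C)
log_se = 0.15     # standard error on the log scale
z = 1.96          # assumption: normal-approximation 95% interval

# Back-transform the point estimate; this is a geometric mean in minutes.
geometric_mean = math.exp(log_mean)

# Build the interval on the log scale, then back-transform both ends.
ci_low = math.exp(log_mean - z * log_se)
ci_high = math.exp(log_mean + z * log_se)

print(f"Geometric mean: {geometric_mean:.1f} minutes")  # 66.7, matching the hint
print(f"95% CI: {ci_low:.1f} to {ci_high:.1f} minutes")
```

The back-transformed value is a geometric mean, not the arithmetic mean of the original minutes, and the interval is not symmetric around it. Do not back-transform the SE on its own, since exp(0.15) is a ratio, not a number of minutes.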
Mistake 1: Transforming without checking if it helped
❌ "I log-transformed the data" (but didn't verify it worked)
✅ "Log transformation improved normality (W = 0.89 → 0.96, p = 0.002 → 0.22)"
Mistake 2: Trying transformations until p > .05
❌ P-hacking: Testing 5 transformations to find one that works
✅ Choose based on data characteristics, not desired p-value
Mistake 3: Forgetting about zeros/negatives
❌ log(0) = error!
✅ Add constant: log(x + 1) if you have zeros
Mistake 4: Not considering interpretability
❌ Running complex analysis on transformed data without explaining results
✅ Back-transform key results for interpretation
Mistake 5: Transforming categorical data
❌ log(Likert scale 1-5)
✅ Use non-parametric tests for ordinal data
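Mistake 3 is easy to reproduce. In this minimal sketch the count data are made up, and the behavior shown is that of Python's `math` module; other software may return negative infinity for log(0) instead of raising an error.

```python
import math

counts = [0, 1, 3, 10, 50]   # made-up count data containing a zero

# Python's math.log raises ValueError for 0.
try:
    math.log(counts[0])
except ValueError as err:
    print("log(0) failed:", err)

# log1p(x) computes log(1 + x), so zero maps to zero instead of failing.
shifted = [math.log1p(x) for x in counts]
print([round(v, 3) for v in shifted])
```

If you add a constant, say so in your write-up and remember to subtract it again when back-transforming: exp(result) - 1 in this case.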
✅ Transformations change the scale to make data more normal
Log, square root, and square transformations can fix many skewness problems.
✅ Choose based on data type and skew direction
✅ Always verify the transformation worked
Re-check histograms, Q-Q plots, and Shapiro-Wilk after transforming.
✅ Report clearly and consider interpretation
Tell readers what you did and back-transform results when appropriate.
✅ When in doubt, use non-parametric
If transformation is too complex or doesn't help, non-parametric tests are a great alternative!
In Module 5, you'll learn:
module4_lastname1_lastname2.pdf
You've mastered data transformations!
Next: Module 5 will cover non-parametric alternatives when transformation isn't the answer!