| Assumption | What It Means | How to Check |
|---|---|---|
| 1. Linearity | The relationship between X and Y is linear | Residual plot (should show no pattern) |
| 2. Independence | Observations are independent of each other | Mainly study design (standard residual plots rarely reveal it) |
| 3. Normality | Residuals are normally distributed | Q-Q plot, histogram of residuals |
| 4. Equal Variance | Variance of residuals is constant (homoscedasticity) | Residual plot (spread should be even) |
Residual = Observed Y - Predicted Y
A residual is the error in the prediction for one observation. A well-fitting regression leaves small residuals that scatter randomly, with no pattern.
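The definition above can be sketched in a few lines of Python. The data values and the helper name `fit_line` are made up for illustration; the helper fits an ordinary least-squares line with one predictor.

```python
# Illustrative data (made up for this sketch)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

intercept, slope = fit_line(x, y)

# Residual = Observed Y - Predicted Y, one value per observation
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for xi, yi, pi, ri in zip(x, y, predicted, residuals):
    print(f"x={xi:.0f}  observed={yi:.2f}  predicted={pi:.2f}  residual={ri:+.2f}")
```

When the model includes an intercept, least-squares residuals always sum to zero (up to rounding), so their overall size and pattern matter, not their average.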
What it means: The relationship between X and Y is actually linear (a straight line fits well).
How to check: Plot residuals vs. fitted values. You should see random scatter around zero with no curve or trend.
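As a rough illustration of what a pattern in the residuals looks like, the sketch below fits a straight line to data that are actually quadratic and prints the sign of each residual in order of X. The data and the helper `fit_line` are assumptions made for this example. In practice you would look at the residual plot itself.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

# Made-up data: Y is exactly X squared, so a straight line is the wrong model
x = [float(i) for i in range(11)]
y_curved = [xi ** 2 for xi in x]

intercept, slope = fit_line(x, y_curved)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y_curved)]

# One character per observation, ordered by X
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)  # prints "++-------++"
```

The residuals are positive at both ends and negative in the middle. That U-shape is the kind of systematic pattern that signals non-linearity. Random scatter would mix the signs with no obvious runs.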
What it means: Each observation is independent - knowing one observation doesn't tell you about another.
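If the data were collected over time, come from clusters or groups, or include repeated measurements on the same subjects, you may need special methods (time series analysis, multilevel models, repeated measures ANOVA).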
What it means: The residuals follow a normal distribution.
How to check: Q-Q plot or histogram of residuals.
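A Q-Q plot pairs each sorted residual with the value expected at the same rank under a normal distribution. The sketch below computes those pairs with Python's standard library. The residual values and the function name `qq_points` are hypothetical, and the plotting positions `(i - 0.5) / n` are one common convention among several.

```python
from statistics import NormalDist, mean, stdev

# Made-up residuals for illustration
residuals = [-1.8, -0.9, -0.4, -0.1, 0.0, 0.2, 0.5, 0.8, 1.7]

def qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs for a Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    center, spread = mean(ordered), stdev(ordered)
    standardized = [(r - center) / spread for r in ordered]
    # Expected standard-normal value at each rank (assumed plotting positions)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))

for theo, obs in qq_points(residuals):
    print(f"theoretical={theo:+.2f}  observed={obs:+.2f}")
```

If the residuals are roughly normal, the two columns track each other and the plotted points fall close to a straight line. Systematic bending at the ends suggests skew or heavy tails.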
What it means: The spread of residuals is the same across all values of X.
How to check: Residual plot - spread should be consistent, not getting wider or narrower.
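One rough numeric companion to the residual plot is to compare the spread of residuals at low X with the spread at high X. The sketch below does this with made-up residuals. The function `spread_ratio` is a hypothetical helper and an informal heuristic, not a formal test.

```python
from statistics import pstdev

def spread_ratio(x, residuals):
    """Residual spread in the upper half of X divided by spread in the lower half."""
    paired = sorted(zip(x, residuals))       # order observations by X
    half = len(paired) // 2
    lower = [r for _, r in paired[:half]]    # residuals at the smallest X values
    upper = [r for _, r in paired[-half:]]   # residuals at the largest X values
    return pstdev(upper) / pstdev(lower)

x = [1, 2, 3, 4, 5, 6, 7, 8]

# Made-up residuals: even spread vs. a "fan" that widens as X grows
even_resid = [0.5, -0.5, 0.4, -0.4, 0.5, -0.5, 0.4, -0.4]
fan_resid = [0.1, -0.1, 0.2, -0.2, 0.8, -0.8, 1.5, -1.5]

print(f"even spread ratio: {spread_ratio(x, even_resid):.2f}")
print(f"fan shape ratio:   {spread_ratio(x, fan_resid):.2f}")
```

A ratio near 1 is consistent with equal variance. A ratio far from 1 in either direction matches a residual plot that widens or narrows across X.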
Outlier: An observation that doesn't fit the pattern (large residual)
Influential point: An observation that strongly affects the regression line
| Type | Description | Impact |
|---|---|---|
| Outlier in Y | Far from regression line vertically | Large residual, but may not affect slope much |
| Outlier in X | Extreme X value (leverage point) | Can strongly pull the regression line |
| Influential point | Extreme X value that also doesn't fit the pattern | Changes slope and R² substantially |
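
To make the distinction in the table concrete, the sketch below computes leverage for each point and refits the line with and without one unusual observation. The data and helper names are made up for illustration. The leverage formula used, 1/n + (x - x̄)² / Sxx, is the standard one for regression with a single predictor.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def leverage(x):
    """Leverage of each point: 1/n + (x - mean)^2 / Sxx (single predictor)."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]

# Made-up data: the first five points follow y = 2x exactly;
# the last point has an extreme X and does not fit that pattern
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

_, slope_all = fit_line(x, y)
_, slope_without = fit_line(x[:-1], y[:-1])
h = leverage(x)

print(f"slope with the point:    {slope_all:.2f}")
print(f"slope without the point: {slope_without:.2f}")
print(f"leverage of the point:   {h[-1]:.2f}")
```

The last point has leverage close to 1 and pulls the slope far below 2. Comparing the fit with and without a suspect point is a simple way to see whether it is influential.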

| Violation | Solutions |
|---|---|
| Non-linearity | Transform X or Y (log, square root); add polynomial terms (X²); use non-linear regression |
| Non-independence | Use multilevel/mixed models; time series methods; repeated measures ANOVA |
| Non-normal residuals | Transform Y; use robust regression. Note: with large n, less critical |
| Unequal variance | Transform Y (often log); use weighted least squares; use robust standard errors |
| Outliers/Influential | Investigate: data error or real?; report results with and without; use robust regression; NEVER just delete without justification |
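
As one example of a fix from the table, the sketch below applies a log transform to Y when the relationship is exponential, then compares how well a straight line fits before and after. The data are made up and noise-free, and `r_squared` is a hypothetical helper for regression with a single predictor.

```python
import math

def r_squared(x, y):
    """R² of a simple least-squares line of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Made-up exponential growth: a straight line in X is the wrong shape for raw Y
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# log(Y) = log(2) + 0.5 * X is exactly linear in X
log_y = [math.log(yi) for yi in y]

r2_raw = r_squared(x, y)
r2_log = r_squared(x, log_y)

print(f"R² using raw Y: {r2_raw:.3f}")
print(f"R² using log Y: {r2_log:.3f}")
```

After any transformation, recheck the residual plots on the transformed scale, and remember that the slope now describes change in log(Y) rather than in Y.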
Make sure you can: