T-TESTS: Definitive Workflow
Last Updated: November 2024 | Use this version for all t-test analyses
Quick Reference
Purpose: Compare means between groups or against a known value
When to use: Continuous outcome variable, approximately normal data
Alternative if assumptions fail: Mann-Whitney U or Wilcoxon signed-rank (Section 6)
Decision Tree
How many groups do you have?
├─ ONE group (compare to known value) → One-Sample t-test
├─ TWO independent groups → Independent Samples t-test
└─ TWO measurements (same people) → Paired t-test
1. ONE-SAMPLE T-TEST
Question: Does my sample mean differ from a known population value?
Example: Do my students' IQ scores differ from the national average (100)?
R Code
# Check normality first
hist(data$score)
shapiro.test(data$score) # p > .05: no strong evidence against normality
# Run test
t.test(data$score, mu = 100) # Replace 100 with your comparison value
# Get descriptives
mean(data$score, na.rm = TRUE)
sd(data$score, na.rm = TRUE)
Interpreting Output
- t-value: How many standard errors your mean is from the comparison value
- p-value: If < .05, your mean significantly differs from the comparison value
- 95% CI: Range likely to contain the true mean; if it doesn't include your comparison value (e.g., 100), the result is significant
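The t-value, df, p-value, and CI above can be pulled directly out of the object t.test() returns (an "htest" list), which is handy for filling in the reporting template. A minimal sketch with simulated data (the iq vector is invented for illustration):

```r
# Extracting components from a one-sample t-test result
set.seed(42)
iq <- rnorm(50, mean = 105, sd = 12)  # hypothetical sample of 50 IQ scores

res <- t.test(iq, mu = 100)
res$statistic   # t-value (named "t")
res$parameter   # degrees of freedom (n - 1 = 49)
res$p.value     # p-value
res$conf.int    # 95% CI for the mean
res$estimate    # sample mean
```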
Reporting Template
"A one-sample t-test showed that the sample mean (M = [mean], SD = [sd]) was significantly [higher/lower] than [comparison value], t([df]) = [t-value], p = [p-value]."
Example: "A one-sample t-test showed that the sample mean (M = 105.3, SD = 12.4) was significantly higher than 100, t(49) = 3.02, p = .004."
2. INDEPENDENT SAMPLES T-TEST
Question: Do two separate groups have different means?
Example: Do students who drank coffee have faster reaction times than those who didn't?
R Code
# Prepare data
data$group <- as.factor(data$group) # Make sure group is a factor
# Check normality for EACH group
tapply(data$score, data$group, shapiro.test)
# Visualize
boxplot(score ~ group, data = data,
xlab = "Group", ylab = "Score",
col = c("lightblue", "lightcoral"))
# Run test (Welch's t-test is the default and safest)
t.test(score ~ group, data = data)
# Get descriptives by group
aggregate(score ~ group, data = data, FUN = mean)
aggregate(score ~ group, data = data, FUN = sd)
aggregate(score ~ group, data = data, FUN = length) # Sample sizes
# Effect size (report it regardless of significance)
library(effsize)
cohen.d(score ~ group, data = data)
Interpreting Output
- t-value: Size of the difference between groups (in standard error units)
- p-value: If < .05, groups differ significantly
- 95% CI: Range for the true difference between means; if it includes 0, not significant
- Cohen's d: Effect size (0.2 = small, 0.5 = medium, 0.8 = large)
Reporting Template
"An independent samples t-test showed that Group A (M = [mean1], SD = [sd1]) scored significantly [higher/lower] than Group B (M = [mean2], SD = [sd2]), t([df]) = [t-value], p = [p-value], d = [effect size]."
Example: "An independent samples t-test showed that the coffee group (M = 245.3, SD = 18.2 ms) had significantly faster reaction times than the control group (M = 280.5, SD = 22.1 ms), t(38) = 5.43, p < .001, d = 1.72 (large effect)."
3. PAIRED T-TEST
Question: Did scores change from Time 1 to Time 2 for the same people?
Example: Did students' anxiety scores decrease after the intervention?
R Code
# Check normality of DIFFERENCES (not the raw scores!)
differences <- data$post - data$pre
hist(differences)
shapiro.test(differences)
# Visualize
boxplot(data$pre, data$post,
names = c("Pre", "Post"),
ylab = "Score",
col = c("lightblue", "lightgreen"))
# Run test
t.test(data$pre, data$post, paired = TRUE)
# Get descriptives
mean(data$pre, na.rm = TRUE)
sd(data$pre, na.rm = TRUE)
mean(data$post, na.rm = TRUE)
sd(data$post, na.rm = TRUE)
# Effect size
library(effsize)
cohen.d(data$pre, data$post, paired = TRUE)
Interpreting Output
- t-value: How much scores changed (in standard error units)
- p-value: If < .05, there was a significant change
- 95% CI: Range for the true mean difference; if it doesn't include 0, significant
- Cohen's d: Effect size of the change
Reporting Template
"A paired samples t-test showed that scores significantly [increased/decreased] from pre-test (M = [mean1], SD = [sd1]) to post-test (M = [mean2], SD = [sd2]), t([df]) = [t-value], p = [p-value], d = [effect size]."
Example: "A paired samples t-test showed that anxiety significantly decreased from pre-intervention (M = 42.3, SD = 8.1) to post-intervention (M = 28.7, SD = 6.4), t(29) = 8.21, p < .001, d = 1.85 (large effect)."
Assumptions Check
Normality
- Check: Shapiro-Wilk test + histogram or Q-Q plot
- If violated: Use non-parametric alternative (see Section 6)
- Note: With n > 30 per group (or n > 30 pairs), t-tests are robust to moderate violations of normality
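The Q-Q plot mentioned above does not appear in the code blocks; a minimal sketch (score here is a simulated stand-in for your data$score):

```r
# Q-Q plot: points hugging the line suggest approximate normality
set.seed(3)
score <- rnorm(50)            # stand-in for data$score

qqnorm(score)
qqline(score)
```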
Independence
- Each observation should be independent
- For paired tests, pairs should be independent of other pairs
Homogeneity of Variance (Independent t-test only)
- Usually not a concern: R's t.test() defaults to Welch's test, which does not assume equal variances
- If you want to check anyway:
var.test(score ~ group, data = data)
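If you do want a formal check, base R also offers bartlett.test(); like var.test(), it is sensitive to non-normality, and car::leveneTest() (which needs the car package installed) is the more robust option. A sketch on simulated data (group labels invented for illustration):

```r
# Bartlett's test of equal variances (base R)
set.seed(5)
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 25)),
  score = rnorm(50)
)

bartlett.test(score ~ group, data = toy)  # large p = no evidence of unequal variances
```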
Common Mistakes & Solutions
❌ Mistake: Forgetting to make group variable a factor
✅ Solution: Always use data$group <- as.factor(data$group) first
❌ Mistake: Using independent t-test for repeated measures data
✅ Solution: Same people measured twice = paired t-test
❌ Mistake: For paired test, checking normality of raw scores instead of differences
✅ Solution: Create difference scores first, then check normality
❌ Mistake: Running t-test with 3+ groups
✅ Solution: Use ANOVA instead (Section 5.2)
❌ Mistake: Reporting "t = 2.45, p = .018" without context
✅ Solution: Always include means, SDs, and interpretation
Quick Troubleshooting
Error: "grouping factor must have exactly 2 levels"
→ Check: table(data$group) - you might have 3+ groups or typos in group names
Error: "not enough observations"
→ Check for NAs: sum(is.na(data$score)) - add na.rm = TRUE to calculations
p-value exactly = 1.000
→ Your groups have identical means; this is unusual, so check your data
Very small p-value (e.g., 2.2e-16)
→ Report as "p < .001" in your write-up
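The "p < .001" reporting convention above is easy to automate; format_p is a made-up helper name for illustration:

```r
# Format a p-value APA-style: no leading zero, tiny values as "p < .001"
format_p <- function(p) {
  if (p < .001) return("p < .001")
  paste0("p = ", sub("^0\\.", ".", sprintf("%.3f", p)))
}

format_p(2.2e-16)  # "p < .001"
format_p(0.018)    # "p = .018"
```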
Effect Size Reference
| Cohen's d | Interpretation |
|---|---|
| 0.0 - 0.2 | Negligible |
| 0.2 - 0.5 | Small |
| 0.5 - 0.8 | Medium |
| 0.8+ | Large |
Related Sections
- Assumptions failing? → See Section 4: Checking Normality
- Need non-parametric alternative? → See Section 6: Non-Parametric Tests
- 3+ groups? → See Section 5.2: ANOVA
- Continuous predictor? → See Section 5.3: Regression
See Section 10 (Archives) for:
- Excel-based t-test instructions
- Alternative workflow formats
- Historical versions