🎲 Module 1: Why Chi-Square Matters

Understanding Categorical Data Analysis

📚 Learning Objectives

By the end of this module, you will be able to:

Distinguish between categorical and continuous variables
Identify research questions appropriate for Chi-square analysis
Understand the difference between the goodness-of-fit test and the test of independence
Recognize when Chi-square is the right statistical test to use

🔍 Understanding Variable Types: The Foundation

Before we can understand Chi-square, we need to be crystal clear about what kind of data we're working with.

Categorical Variables (Also Called: Qualitative; Nominal if Unordered, Ordinal if Ordered)

Categorical variables represent groups, categories, or classifications. They answer "what kind?" or "which category?"

Examples:

Species (rat, mouse, human)
Sex (male, female)
Treatment group (control, experimental)
Diagnostic category (anxious, depressed, control)
Color preference (red, blue, green)
Survival (alive, dead)
Correct/Incorrect response

Continuous Variables (Also Called: Quantitative, Measurement)

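Continuous variables represent measured quantities. They answer "how much?" or "how many?" A count recorded for each subject (such as number of lever presses) is strictly discrete, but it is a numeric score and is analyzed like other quantitative data. This is different from counting how many subjects fall into each category, which is what Chi-square analyzes.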

Examples:

Reaction time (milliseconds)
Body weight (grams)
Test scores (0-100)
Heart rate (beats per minute)
Number of lever presses
Time to complete task (seconds)
Depression score (continuous scale)

💡 Key Insight: Why This Matters

Different variable types require different statistical tests!

Continuous DV: Use t-test, ANOVA, or regression
Categorical DV: Use Chi-square, Fisher's exact, or logistic regression

Chi-square is specifically designed for categorical data. If you try to use it on continuous data, you're using the wrong tool!
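To make the distinction concrete, here is a minimal Python sketch (Python is used only for illustration) of how each variable type is usually summarized. The data values are hypothetical and invented for this example. Categorical data reduce to counts per category, while continuous data reduce to summaries such as the mean.

```python
from collections import Counter
from statistics import mean

# Hypothetical data, invented purely for illustration.
maze_arm = ["left", "right", "left", "center", "left", "right"]  # categorical
reaction_time_ms = [312.0, 287.5, 344.0, 301.5]                  # continuous

# Categorical: count how many observations fall in each category.
arm_counts = Counter(maze_arm)

# Continuous: summarize with a mean (or other numeric summary).
mean_rt = mean(reaction_time_ms)

print(arm_counts)
print(mean_rt)
```

The counts in `arm_counts` are the kind of input Chi-square works with. The mean in `mean_rt` is the kind of summary a t-test or ANOVA compares.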

🎯 Interactive Activity: Categorize These Variables

Sort each variable into the correct category: categorical or continuous. This skill is essential for choosing the right statistical test!

Variable Bank:

Neuron type (pyramidal, interneuron)
Firing rate (spikes/second)
Handedness (left, right)
Age (years)
Treatment outcome (improved, no change, worse)
Anxiety score (0-50 scale)
Maze arm chosen (left, right, center)
Time to find platform (seconds)

🎲 What Does Chi-Square Test?

Chi-square tests answer questions about frequencies and proportions in categorical data.

The core question Chi-square asks:

"Are the observed frequencies different from what we'd expect?"
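For reference, this question is usually quantified with Pearson's Chi-square statistic. The notation is introduced here and is not taken from the module: O_i is the observed count in category i, E_i is the expected count, and k is the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

The statistic is zero when every observed count equals its expected count, and it grows as the observed counts move further from the expected ones.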

Two Types of Chi-Square Tests:

1️⃣ Goodness-of-Fit Test

One categorical variable

Tests if observed distribution matches an expected distribution

Example questions:

Do rats choose maze arms equally?
Are births evenly distributed across weekdays?
Does a die land fairly on all sides?

2️⃣ Test of Independence

Two categorical variables

Tests if two variables are related or independent

Example questions:

Is treatment response related to sex?
Is habitat preference associated with species?
Is diagnosis associated with treatment type?
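For a test of independence, the expected count in each cell comes from the row and column totals. The Python sketch below shows that arithmetic under stated assumptions: the sex-by-habitat counts are hypothetical, invented only to illustrate the calculation, and the variable names are our own.

```python
# Hypothetical counts, invented for illustration only (not real data).
# Rows: sex (male, female). Columns: habitat (forest, grassland, desert).
observed = [
    [20, 15, 15],
    [10, 25, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

# Expected count if the two variables were independent:
# (row total * column total) / grand total
expected = [[r * c / n_total for c in col_totals] for r in row_totals]

# Pearson's statistic summed over every cell of the table.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(expected)
print(round(chi_square, 3), df)
```

With these made-up counts, every row has expected counts of 15, 20, and 15, and the statistic comes out near 5.83 on 2 degrees of freedom.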

📊 Visual Demo: Expected vs. Observed Frequencies

Let's see how Chi-square works with a simple example:

Research Question: Do rats have a left/right preference?

We test 60 rats in a T-maze where they must choose left or right. If there's no preference (null hypothesis), we'd expect 30 to go left and 30 to go right.

Expected (No Preference)

Direction	Expected Count
Left	30
Right	30

Observed (Actual Data)

Direction	Observed Count
Left	42
Right	18

The Chi-square test calculates: How different is the observed pattern from what we expected?

Interpretation: The observed frequencies (42 left, 18 right) are quite different from expected (30, 30). The Chi-square test tells us whether a difference this large is statistically significant or plausibly just random variation.
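As a rough check on the T-maze example, the Python sketch below computes Pearson's Chi-square statistic from the observed and expected counts given above. The p-value line relies on a closed form that holds only for 1 degree of freedom, so treat it as a shortcut for this two-category case and not as a general method.

```python
import math

# Counts taken from the T-maze example above.
observed = {"left": 42, "right": 18}
expected = {"left": 30, "right": 30}

# Pearson's statistic: sum of (observed - expected)^2 / expected
chi_square = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)
df = len(observed) - 1  # number of categories minus one

# For df = 1 only, the upper-tail p-value has a closed form
# based on the complementary error function.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 2), df, round(p_value, 4))
```

Each direction contributes 144/30 = 4.8, so the statistic is 9.6 on 1 degree of freedom, with a p-value of roughly 0.002. A split of 42 to 18 would be unusual if the rats truly had no side preference.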

✅ When Should You Use Chi-Square?

Use Chi-square when ALL of these are true:

✓ Your dependent variable is categorical (not continuous)
✓ You're working with frequency counts (not means or scores)
✓ Observations are independent (each subject counted once)
✓ You have adequate sample size (we'll cover this in Module 4)

DO NOT use Chi-square when:

✗ Your DV is continuous (use t-test, ANOVA, regression instead)
✗ You want to compare means (use t-test or ANOVA)
✗ You have repeated measures on the same subjects (use McNemar's test for paired two-category outcomes)
✗ Expected frequencies are too small (use Fisher's exact test - Module 4)

🎯 Interactive Activity: Choose the Right Test

For each research question below, decide if Chi-square is appropriate or if you should use a different test.

🤔 Check Your Understanding

Question 1: A researcher measures reaction time (in milliseconds) for participants who drank coffee vs. no coffee. Should they use Chi-square?

A) Yes, because there are two groups (coffee vs. no coffee)

B) No, because reaction time is continuous, not categorical

C) Yes, because we're comparing groups

Answer: B. Even though we're comparing two groups, the dependent variable (reaction time) is continuous. This calls for a t-test, not Chi-square. Chi-square is ONLY for categorical dependent variables.

Question 2: A researcher categorizes 100 patients as "improved," "no change," or "worse" after treatment. They want to test if the distribution differs from equal proportions (one-third in each category). Which Chi-square test should they use?

A) Goodness-of-fit test (one categorical variable)

B) Test of independence (two categorical variables)

C) Neither - should use ANOVA

Answer: A. This is a goodness-of-fit test because we have ONE categorical variable (outcome: improved/no change/worse) and we're testing if the observed distribution matches an expected distribution (equal proportions).
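A small Python sketch of the expected counts for Question 2, assuming equal proportions across the three outcome categories. It only illustrates that expected counts come from the total sample size times the hypothesized proportion; the variable names are our own.

```python
n_patients = 100
categories = ["improved", "no change", "worse"]

# Equal proportions: each category gets the same share of the total.
expected = {c: n_patients / len(categories) for c in categories}

print(expected)
```

Each expected count is about 33.33. Expected counts do not need to be whole numbers, even though observed counts always are.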

Question 3: You want to know if males and females differ in their choice of three different habitats (forest, grassland, desert). Which test?

A) Goodness-of-fit test

B) Test of independence (Chi-square for two categorical variables)

C) Two-way ANOVA

Answer: B. This is a test of independence because you have TWO categorical variables (sex: male/female AND habitat: forest/grassland/desert) and you want to know if they're related.

📝 Module 1 Summary

Key Takeaways:

Chi-square is for categorical data - it tests frequencies, not means
Two types: Goodness-of-fit (1 variable) and Test of Independence (2 variables)
Core question: Are observed frequencies different from expected?
Always check: Is your DV categorical? If yes, Chi-square might be right!

Next up: Module 2 will teach you how to run Chi-square goodness-of-fit tests in R and interpret the results!