Session 12 — Chi-Square Test

Decision Making Statistics — S04

Author

M. Kachour

Published

June 24, 2026

This session presents the Chi-square test of independence, used to study whether two qualitative characteristics observed on the same sample are linked.

1 Introduction

2 Chi-square test formulation

2.1 Setup

Variable \(X\) has \(p\) modalities: \(A_1,A_2,\ldots,A_p\)
Variable \(Y\) has \(q\) modalities: \(B_1,B_2,\ldots,B_q\)
A sample of size \(n\) is organized into a contingency table of dimension \(p \times q\)

2.2 Hypotheses

Hypotheses

\[ H_0: X \text{ and } Y \text{ are independent} \qquad \text{vs.} \qquad H_1: X \text{ and } Y \text{ are dependent} \]

2.3 Notations for the contingency table

\(n_{i,j}\): number of individuals simultaneously in category \(A_i\) and category \(B_j\),
\(n_{i,.}=\sum_{j=1}^{q} n_{i,j}\): row total of row \(i\),
\(n_{.,j}=\sum_{i=1}^{p} n_{i,j}\): column total of column \(j\),
\(n=\sum_{i=1}^{p}\sum_{j=1}^{q} n_{i,j}\): grand total.
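The totals defined above can be sketched in a few lines of Python. This is only an illustration: the function name `marginal_totals` is hypothetical, and the counts are those of the coffee exercise in Section 5.1.

```python
def marginal_totals(table):
    """Return (row totals, column totals, grand total) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]        # n_{i,.}
    col_totals = [sum(col) for col in zip(*table)]  # n_{.,j}
    grand_total = sum(row_totals)                   # n
    return row_totals, col_totals, grand_total


# Observed counts n_{i,j} from the coffee exercise (rows: Single, Married, Other).
observed = [
    [30, 40, 50, 20],
    [50, 60, 80, 30],
    [20, 30, 40, 15],
]

rows, cols, n = marginal_totals(observed)
print(rows, cols, n)  # [140, 220, 105] [100, 130, 170, 65] 465
```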

3 Expected counts and test statistic

3.1 Expected counts under independence

Expected frequencies

If \(H_0\) is true, the theoretical count in cell \((i,j)\) is:

\[ E_{i,j}=\frac{n_{i,.}\times n_{.,j}}{n} \]
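A brief sketch of where this formula comes from, assuming the marginal probabilities are estimated by the observed proportions (the notation \(\hat{P}\) is introduced here for illustration only). Under \(H_0\), \(P(A_i \cap B_j)=P(A_i)\times P(B_j)\), so the count expected in cell \((i,j)\) out of \(n\) individuals is:

\[ E_{i,j}=n\times \hat{P}(A_i)\times \hat{P}(B_j)=n\times\frac{n_{i,.}}{n}\times\frac{n_{.,j}}{n}=\frac{n_{i,.}\times n_{.,j}}{n} \]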

Validity condition

The standard Chi-square test is considered valid when all expected counts satisfy:

\[ E_{i,j} \geq 5 \]

If this condition is not met, some categories may need to be grouped.
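The expected counts and the validity check could be computed along the following lines. This is a minimal sketch with hypothetical function names (`expected_counts`, `all_expected_at_least`), applied to the counts of Exercise 3 in Section 5.3.

```python
def expected_counts(table):
    """Expected counts under independence: E[i][j] = row_total_i * col_total_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def all_expected_at_least(expected, threshold=5):
    """Validity condition: every expected count must be >= threshold."""
    return all(e >= threshold for row in expected for e in row)


# Observed counts from Exercise 3 (rows: No degree, Bachelor, Master or above).
observed = [
    [25, 30, 10],
    [15, 45, 40],
    [5, 20, 55],
]

expected = expected_counts(observed)
smallest = min(e for row in expected for e in row)
print(round(smallest, 2), all_expected_at_least(expected))
```

Looking at the smallest expected count is enough: if it is at least 5, all the others are too.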

3.2 Test statistic

Computed Chi-square

\[ U_{obs}=\sum_{i=1}^{p}\sum_{j=1}^{q}\frac{(n_{i,j}-E_{i,j})^2}{E_{i,j}} \]

Under \(H_0\), the statistic follows approximately a Chi-square distribution with:

\[ \nu=(p-1)(q-1) \]

degrees of freedom.
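As an illustration, the statistic and its degrees of freedom can be computed directly from the observed table. The sketch below uses a hypothetical function name (`chi_square_statistic`) and the counts of the coffee exercise in Section 5.1.

```python
def chi_square_statistic(table):
    """Return (U_obs, degrees of freedom) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    u_obs = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected_count = row_totals[i] * col_totals[j] / n
            u_obs += (observed_count - expected_count) ** 2 / expected_count

    dof = (len(table) - 1) * (len(table[0]) - 1)  # (p - 1)(q - 1)
    return u_obs, dof


# Observed counts from the coffee exercise (rows: Single, Married, Other).
observed = [
    [30, 40, 50, 20],
    [50, 60, 80, 30],
    [20, 30, 40, 15],
]

u_obs, dof = chi_square_statistic(observed)
print(f"U_obs = {u_obs:.3f}, degrees of freedom = {dof}")
```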

3.3 Decision rule

Right-tailed Chi-square test

Reject \(H_0\) at risk \(\alpha\) if:

\[ U_{obs}>k \]

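where \(k=\chi^2_{\alpha}(\nu)\) is the critical value read from the Chi-square table, i.e. the value that a Chi-square variable with \(\nu\) degrees of freedom exceeds with probability \(\alpha\).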
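To make the meaning of the table value concrete, the sketch below checks that the critical values used in this session leave a probability of about \(\alpha\) in the right tail. It relies on a closed-form expression for the upper tail that holds only when \(\nu\) is even (which is the case in all three exercises). The function names are hypothetical, and in practice the table or a statistics library would be used instead.

```python
import math


def chi2_upper_tail(x, dof):
    """P(Chi-square(dof) > x), using a closed form valid for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


def reject_h0(u_obs, critical_value):
    """Right-tailed decision rule: reject H0 when U_obs exceeds the critical value."""
    return u_obs > critical_value


# Critical values quoted in this session, keyed by (alpha, degrees of freedom).
table_values = {(0.05, 2): 5.991, (0.01, 2): 9.210, (0.05, 4): 9.488, (0.05, 6): 12.592}

tail_probs = {key: chi2_upper_tail(k, key[1]) for key, k in table_values.items()}
for (alpha, dof), prob in tail_probs.items():
    print(f"alpha = {alpha}, dof = {dof}: right-tail probability = {prob:.4f}")
```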

Exam tip

In a Chi-square exercise, always proceed in this order:

identify the two variables and their numbers of categories,
compute row totals, column totals, and the grand total,
compute the expected counts,
check the validity condition,
compute \(U_{obs}\) and compare it with the critical value.
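The five steps above can be sketched as a single function. This is an illustrative outline, not a reference implementation: the name `chi_square_independence` and the returned dictionary keys are hypothetical, and the critical value is assumed to be read from the Chi-square table and passed in by the user. The example call uses the counts of Exercise 2 in Section 5.2 at the \(5\%\) level.

```python
def chi_square_independence(table, critical_value, min_expected=5):
    """Follow the five steps of the exam tip on a table given as a list of rows."""
    # Step 1: numbers of categories of the two variables.
    p, q = len(table), len(table[0])

    # Step 2: row totals, column totals, grand total.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Step 3: expected counts under independence.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Step 4: validity condition (all expected counts >= min_expected).
    valid = all(e >= min_expected for row in expected for e in row)

    # Step 5: computed statistic, then comparison with the critical value.
    u_obs = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return {
        "dof": (p - 1) * (q - 1),
        "valid": valid,
        "u_obs": u_obs,
        "reject_h0": u_obs > critical_value,
    }


# Exercise 2 counts (rows: Under 35, 35 and over), critical value for 5% and 2 degrees of freedom.
result = chi_square_independence([[30, 50, 70], [45, 60, 45]], critical_value=5.991)
print(result)
```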

4 Conclusion

\(H_0\) rejected: at the chosen risk level, the data confirm a link between the two variables.
\(H_0\) not rejected: at the chosen risk level, the data do not confirm a link. This does not prove that the two variables are independent.

5 Application exercise

5.1 Coffee consumption and marital status

Exercise

A marketing company selected a random sample of housewives (women under 50, assumed to be the main shopper in the household) to study the link between marital status and daily coffee consumption.

	Less than 1 cup/day	1–2 cups/day	2–3 cups/day	More than 3 cups/day
Single	30	40	50	20
Married	50	60	80	30
Other	20	30	40	15

Can we say, with a \(5\%\) risk, that there is a link between coffee consumption level and marital status?

Solution

Step 1 — Formulation

\(X\) = marital status, so \(p=3\)
\(Y\) = coffee consumption level, so \(q=4\)
\(H_0\): independence
\(H_1\): dependence

Step 2 — Marginal totals

Row totals:

\[ 140,\quad 220,\quad 105 \]

Column totals:

\[ 100,\quad 130,\quad 170,\quad 65 \]

Grand total:

\[ n=465 \]

Step 3 — Expected counts

Using

\[ E_{i,j}=\frac{n_{i,.}n_{.,j}}{n} \]

we obtain approximately:

	Less than 1	1–2	2–3	More than 3
Single	30.108	39.140	51.183	19.570
Married	47.312	61.505	80.430	30.753
Other	22.581	29.355	38.387	14.677

All expected counts are greater than \(5\), so the test is valid.

Step 4 — Computed statistic

\[ U_{obs}=\sum \frac{(n_{i,j}-E_{i,j})^2}{E_{i,j}} \approx 0.650 \]

Step 5 — Critical value

The number of degrees of freedom is:

\[ \nu=(3-1)(4-1)=6 \]

At the \(5\%\) level:

\[ k=\chi^2_{5\%}(6)=12.592 \]

Since:

\[ 0.650<12.592 \]

we do not reject \(H_0\).

Conclusion: with the counts given in the table above, the data do not confirm a link between coffee consumption level and marital status at the \(5\%\) risk level.

5.2 Exercise 2 — Purchase frequency by age group

Exercise

A retail chain surveys a random sample of \(300\) loyalty-card holders to study the link between age group and purchase frequency:

	Never	Occasionally	Regularly
Under 35	30	50	70
35 and over	45	60	45

Can we say, with a \(5\%\) risk, that there is a link between age group and purchase frequency? And at a \(1\%\) risk?

Solution

Step 1 — Formulation

\(X\) = age group, so \(p=2\)
\(Y\) = purchase frequency, so \(q=3\)
\(H_0\): independence
\(H_1\): dependence
\(\nu=(2-1)(3-1)=2\)

Step 2 — Marginal totals

Row totals: \(150\), \(150\). Column totals: \(75\), \(110\), \(115\). Grand total: \(n=300\).

Step 3 — Expected counts

Since both row totals equal \(150\), each expected count equals \(n_{i,.}\times n_{.,j}/300 = 150\times n_{.,j}/300 = n_{.,j}/2\):

	Never	Occasionally	Regularly
Under 35	37.5	55.0	57.5
35 and over	37.5	55.0	57.5

All expected counts are \(\geq 5\) ✓.

Step 4 — Computed statistic

\[ U_{obs}=\frac{(30-37.5)^2}{37.5}+\frac{(50-55)^2}{55}+\frac{(70-57.5)^2}{57.5}+\frac{(45-37.5)^2}{37.5}+\frac{(60-55)^2}{55}+\frac{(45-57.5)^2}{57.5} \]

\[ =1.500+0.455+2.717+1.500+0.455+2.717\approx 9.344 \]

Step 5 — Critical values

\[ \chi^2_{5\%}(2)=5.991 \qquad \text{and} \qquad \chi^2_{1\%}(2)=9.210 \]

Since \(9.344>5.991\) and \(9.344>9.210\), we reject \(H_0\) at both risk levels.

Conclusion: with both \(5\%\) and \(1\%\) risk, the data confirm a link between age group and purchase frequency. Customers under 35 tend to purchase more regularly, whereas older customers purchase less frequently.
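As a cross-check of the two decisions, the probability of exceeding \(U_{obs}\) under \(H_0\) (often called the p-value, a term not used elsewhere in this session) can be computed by hand here, because with \(\nu=2\) the right-tail probability of the Chi-square distribution is simply \(e^{-x/2}\). The sketch below assumes the rounded value \(U_{obs}=9.344\) from Step 4.

```python
import math

u_obs = 9.344  # computed statistic from Step 4

# For 2 degrees of freedom only: P(Chi-square(2) > x) = exp(-x / 2).
p_value = math.exp(-u_obs / 2)

print(f"p-value = {p_value:.4f}")
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```

The probability falls just under \(0.01\), which agrees with rejecting \(H_0\) at both risk levels, although only narrowly at \(1\%\).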

5.3 Exercise 3 — Education level and job satisfaction

Exercise

An HR consultancy surveys \(245\) randomly selected employees to study the link between education level and job satisfaction:

	Low satisfaction	Medium satisfaction	High satisfaction
No degree	25	30	10
Bachelor	15	45	40
Master or above	5	20	55

Can we say, with a \(5\%\) risk, that education level and job satisfaction are linked?

Solution

Step 1 — Formulation

\(X\) = education level (\(p=3\)), \(Y\) = satisfaction level (\(q=3\))
\(H_0\): independence, \(H_1\): dependence
\(\nu=(3-1)(3-1)=4\)

Step 2 — Marginal totals

Row totals: \(65\), \(100\), \(80\). Column totals: \(45\), \(95\), \(105\). Grand total: \(n=245\).

Step 3 — Expected counts

\[ E_{i,j}=\frac{n_{i,.}\times n_{.,j}}{245} \]

	Low	Medium	High
No degree	11.94	25.20	27.86
Bachelor	18.37	38.78	42.86
Master+	14.69	31.02	34.29

All expected counts are \(\geq 5\) ✓.

Step 4 — Computed statistic

\[ \begin{aligned} U_{obs} &= \frac{(25-11.94)^2}{11.94}+\frac{(30-25.20)^2}{25.20}+\frac{(10-27.86)^2}{27.86} \\ &\quad +\frac{(15-18.37)^2}{18.37}+\frac{(45-38.78)^2}{38.78}+\frac{(40-42.86)^2}{42.86} \\ &\quad +\frac{(5-14.69)^2}{14.69}+\frac{(20-31.02)^2}{31.02}+\frac{(55-34.29)^2}{34.29} \\ &\approx 14.29+0.91+11.45+0.62+1.00+0.19+6.40+3.91+12.51\approx 51.28 \end{aligned} \]

Step 5 — Critical value

\[ \chi^2_{5\%}(4)=9.488 \]

Since \(51.28>9.488\), we reject \(H_0\).

Conclusion: with a \(5\%\) risk, the data confirm a link between education level and job satisfaction, and the evidence is strong since \(U_{obs}\) far exceeds the critical value. Higher-educated employees report noticeably higher satisfaction levels.
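To see which cells drive this result, one can list each cell's contribution to \(U_{obs}\) together with the sign of the gap between observed and expected counts. The sketch below is an illustration only, and the variable names are hypothetical.

```python
row_labels = ["No degree", "Bachelor", "Master or above"]
col_labels = ["Low", "Medium", "High"]
observed = [
    [25, 30, 10],
    [15, 45, 40],
    [5, 20, 55],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# One entry per cell: (row label, column label, observed - expected, contribution to U_obs).
cells = []
for i, row_label in enumerate(row_labels):
    for j, col_label in enumerate(col_labels):
        expected_count = row_totals[i] * col_totals[j] / n
        gap = observed[i][j] - expected_count
        cells.append((row_label, col_label, gap, gap ** 2 / expected_count))

# Largest contributions first.
cells.sort(key=lambda cell: cell[3], reverse=True)
for row_label, col_label, gap, contribution in cells[:3]:
    direction = "more than expected" if gap > 0 else "fewer than expected"
    print(f"{row_label} / {col_label}: {contribution:.2f} ({direction})")
```

The three largest contributions come from the corner cells: more low-satisfaction answers than expected among employees without a degree, more high-satisfaction answers than expected among Master-level employees, and fewer high-satisfaction answers than expected among employees without a degree. This is consistent with the interpretation given in the conclusion.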
