Session 14 — Practical Work: ANOVA, Chi-Square & Correlation in Excel
Decision Making Statistics — S04
This session is a computer lab devoted to applying the advanced statistical tests from Sessions 11–13 in Excel.
1 Session goals
By the end of this practical session, you should be able to:
- perform and interpret an ANOVA test,
- perform and interpret a Chi-square test of independence,
- perform and interpret a test on the linear correlation coefficient,
- use Excel tools and functions to automate the decision process.
Review Sessions 11–13 and the appendices below before completing the Excel workbooks.
2 Exercise 1 — ANOVA (based on Session 11)
2.1 Context
The following table shows measurements of the height (in mm) of a fern, taken in several different environments in France on randomly sampled specimens.
| Env 1 | Env 2 | Env 3 | Env 4 | Env 5 |
|---|---|---|---|---|
| 112 | 141 | 156 | 187 | 241 |
| 115 | 146 | 167 | 105 | 264 |
| 112 | 135 | 143 | 179 | 225 |
| 118 | 147 | 178 | 123 | 257 |
| 124 | 154 | 145 | 114 | 248 |
| 132 | 169 | 258 | ||
| 131 | 136 | |||
| 115 |
A researcher wants to know whether the environment has an influence on plant size.
Mission 1.1 — After reading Session 11, Appendix 1, and using the Fisher table, complete Tables 1, 2, 3, and 4 of PW - Session 14 - Exercise1.xlsx.
Goal: perform and interpret an ANOVA test.
To go further:
- use
VLOOKUPto determine the critical value \(f\), - use
IFto automate the conclusion, - or use Data \(\rightarrow\) Data Analysis \(\rightarrow\) Anova: Single Factor.
Use the following workflow:
- compute the group sizes \(n_j\), group means \(\bar{x}_j\), and the overall mean \(\bar{x}\),
- compute \(SSD_{inter}\) and \(SSD_{intra}\),
- compute
\[ U_{obs}=\frac{AS_{inter}}{AS_{intra}} \]
- read the Fisher critical value with the appropriate degrees of freedom,
- conclude with an Excel formula such as
IF(Uobs>f,"Reject H0","Do not reject H0").
3 Exercise 2 — Chi-square test (based on Session 12)
3.1 Context
In the early 1970s, the association Prévention Routière wanted to encourage the French population to wear seat belts. To study the link between injury severity and seat belt use, a random sample of 10,779 drivers involved in an accident yielded:
| Nature of injury | Seat belt: YES | Seat belt: NO |
|---|---|---|
| Severe or fatal | 5 | 141 |
| Serious | 25 | 330 |
| No or little injury | 1,229 | 9,049 |
Can we say, with a \(5\%\) risk, that there is a link between seat belt use and the nature of the injury? In other words, is the seat belt effective?
Mission 2.1 — From the 0 - Data tab of PW - Session 14 - Exercise2.xlsx, create a pivot table to recreate the contingency table.
Mission 2.2 — Recreate the contingency table needed to perform the Chi-square test.
Mission 2.3 — Complete Tables 3, 4, 5, and 6 in the 2 - KHI-SQUARE Test sheet.
Goal: perform and interpret a Chi-square test.
To go further:
- use
VLOOKUPto determine the critical value \(k\), - use
IFto automate the decision, - or use
CHISQ.TESTandCHISQ.INVdirectly.
Method to follow:
- compute the observed contingency table,
- compute row totals, column totals, and the grand total,
- compute the expected counts with
\[ E_{i,j}=\frac{n_{i,.}\times n_{.,j}}{n} \]
- verify that all expected counts are at least \(5\),
- compute the Chi-square statistic,
- compare it with the critical value or use the p-value returned by
CHISQ.TEST.
4 Exercise 3 — Linear correlation test (based on Session 13)
4.1 Context
Let \(X\) be the net salary of women and \(Y\) the net salary of men in the same household. It is assumed that \((X,Y)\) follows a bivariate Normal distribution. The two variables are studied on \(15\) households.
At the \(5\%\) risk level, can we say that there is a link between salaries and gender?
Mission — After reading Session 13 and Appendix 1, complete Tables 2, 3, and 4 of PW - Session 14 - Exercise3.xlsx.
Goal: perform and interpret a test on the linear correlation coefficient.
To go further:
- use
IFto automate the conclusion, - use
TINV(\alpha, n-2)to determine \(t\), - here \(\alpha=5\%\) and \(n-2=13\).
Suggested method:
- compute \(\bar{x}\), \(\bar{y}\), \(s_x\), \(s_y\), and the covariance,
- compute the empirical correlation coefficient \(r\),
- calculate
\[ U_{obs}=r\sqrt{\frac{n-2}{1-r^2}} \]
- compare \(|U_{obs}|\) with the Student critical value,
- conclude with a sentence about the existence, sign, and strength of the linear relationship.
5 Appendix 1 — Standard Excel functions
| Function | Description |
|---|---|
SUM(range) |
Sum of values |
AVERAGE(range) |
Arithmetic mean |
COUNT(range) |
Count numeric entries |
IF(condition, val_true, val_false) |
Conditional logic |
6 Appendix 2 — VLOOKUP function
=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
Used to search for a value in the first column of a table and return a value from a specified column. A typical use here is to look up critical values in a statistical table.
Set the fourth argument to FALSE to request an exact match.
7 Appendix 3 — Pivot tables
A pivot table is a tool for summarizing and analyzing data quickly.
Access it through:
Insert \(\rightarrow\) Tables \(\rightarrow\) PivotTable
To create a contingency table from raw data, place one variable in the rows, the other in the columns, and the count in the values area.