Session 7 — Estimation & Confidence Intervals

Decision Making Statistics — S04

Author

M. Kachour

Published

June 24, 2026

This session introduces the basic tools of inferential statistics: estimation and confidence intervals.

1 Overview

1.1 Modeling framework

1.1.1 Problem 1 — Average life of an electronic circuit

The quality department of factory U is interested in the average life of electronic circuit CE110.

Modeling elements

Population: all CE110 electronic circuits manufactured and marketed by factory U
Variable studied: the lifetime of a CE110 circuit
Type of variable: quantitative continuous
Modeling assumption: the studied variable, denoted by \(X\), follows a distribution \(\mathcal{L}\)
Unknown parameter: \(\mu\), the mean lifetime

1.1.2 Problem 2 — Defective rate of a machine

Factory U is interested in the rate of defective parts produced by machine M.

Modeling elements

Variable studied: \(X=1\) if the part is defective, \(X=0\) otherwise
Unknown parameter: \(p\), the proportion of defective parts

1.2 What do we know about the distribution?

The distribution \(\mathcal{L}\) may be:

totally unknown, or
partially unknown: we know the family of distributions but not the values of its parameters.

2 Inferential statistics

Definition

Inferential statistics is a set of methods that makes it possible to formulate, in probabilistic terms, a judgment about the characteristics of a population from the observations made on a sample.

Important remark

When moving from a sample to a population, we take a risk of error. Inferential statistics does not remove uncertainty; it manages it.

3 Sampling and empirical estimators

Suppose we observe a random sample of size \(n\):

\[ x_1, x_2, \dots, x_n \]

Empirical estimators

Empirical mean:

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \]

Empirical variance:

\[ s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 \]

Empirical proportion:

\[ \hat{p} = \frac{\text{number with the property}}{n} \]
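The three estimators above translate directly into a few lines of Python. The sketch below is only an illustration: the function names and the sample of eight lifetimes are hypothetical and do not come from the session's data.

```python
def empirical_mean(xs):
    """Empirical mean: (1/n) * sum of the observations."""
    return sum(xs) / len(xs)


def empirical_variance(xs):
    """Empirical variance with the 1/n convention used in this session."""
    m = empirical_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def empirical_proportion(xs, has_property):
    """Share of observations for which has_property(x) is True."""
    return sum(1 for x in xs if has_property(x)) / len(xs)


# Hypothetical sample of 8 lifetimes (in months), for illustration only.
sample = [12, 15, 9, 20, 17, 14, 11, 14]

x_bar = empirical_mean(sample)
s2 = empirical_variance(sample)
p_hat = empirical_proportion(sample, lambda x: x > 12)

print(f"mean = {x_bar}, variance = {s2}, proportion above 12 = {p_hat}")
```

In the standard library, `statistics.pvariance` follows the same \(1/n\) convention, whereas `statistics.variance` divides by \(n-1\).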

4 Central Limit Theorem

Key result

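For large samples (typically \(n \geq 30\)) of independent observations with mean \(\mu\) and finite variance \(\sigma^2\), the sample mean \(\bar{X}\) satisfies: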

\[ \bar{X} \approx \mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right) \]

which is equivalent to

\[ \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \approx \mathcal{N}(0,1). \]
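A small simulation can make this result more concrete. The sketch below draws many samples from an exponential distribution (clearly not Normal, with \(\mu=1\) and \(\sigma=1\)) and compares the mean and spread of the sample means with \(\mu\) and \(\sigma/\sqrt{n}\). The distribution, sample size, number of replications, and seed are all assumptions chosen for the demonstration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 50               # sample size (assumed for the demo)
replications = 2000  # number of simulated samples (assumed)

# Exponential distribution with rate 1: mu = 1 and sigma = 1, far from Normal.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"average of the sample means:   {mean_of_means:.3f} (theory: 1.000)")
print(f"std. dev. of the sample means: {sd_of_means:.3f} (theory: {1 / n ** 0.5:.3f})")
```

The two printed values should land close to \(1\) and \(1/\sqrt{50}\approx 0.141\), although the exact figures depend on the seed.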

5 Confidence interval for the mean

For a large sample and confidence level \(1-\alpha\):

Mean confidence interval

\[ CI_{\mu}=\left[\bar{x}-z_{\alpha/2}\frac{s}{\sqrt{n}},\;\bar{x}+z_{\alpha/2}\frac{s}{\sqrt{n}}\right] \]

where \(z_{\alpha/2}\) satisfies

\[ P(Z\leq z_{\alpha/2}) = 1-\frac{\alpha}{2}, \qquad Z\sim \mathcal{N}(0,1). \]

Confidence level	\(\alpha\)	\(z_{\alpha/2}\)
90%	10%	1.645
95%	5%	1.960
99%	1%	2.576
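The quantiles in this table, and the interval itself, can be recomputed with Python's standard library (`statistics.NormalDist`, available from Python 3.8). This is a minimal sketch; `z_quantile` and `mean_ci` are hypothetical helper names, not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def z_quantile(alpha):
    """Return z_{alpha/2}, the value with P(Z <= z) = 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci(x_bar, s, n, alpha):
    """Large-sample confidence interval for the mean at level 1 - alpha."""
    half_width = z_quantile(alpha) * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  ->  z = {z_quantile(alpha):.3f}")
```

The same `z_quantile` call also handles levels that are not in the table, such as \(\alpha=7\%\) or \(\alpha=4\%\).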

Interpretation

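At confidence level \((1-\alpha)\times 100\%\), the values in the interval are those compatible with the unknown mean \(\mu\): if the sampling were repeated many times, about \((1-\alpha)\times 100\%\) of the intervals built this way would contain \(\mu\).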

6 Confidence interval for a proportion

For a large sample such that \(n\hat{p}\geq 5\) and \(n(1-\hat{p})\geq 5\):

Proportion confidence interval

\[ CI_p=\left[\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\;\hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right] \]
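The formula and its validity conditions can be combined in one small function. The sketch below is an illustration under the large-sample assumption stated above; `proportion_ci` is a hypothetical name and the counts in the example are invented.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, alpha):
    """Large-sample confidence interval for a proportion at level 1 - alpha."""
    p_hat = successes / n
    # Validity conditions stated above: n*p_hat >= 5 and n*(1 - p_hat) >= 5.
    if successes < 5 or n - successes < 5:
        raise ValueError("sample too small for the Normal approximation")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical counts: 30 items with the property out of 200 observed.
for alpha in (0.10, 0.05, 0.01):
    low, high = proportion_ci(30, 200, alpha)
    print(f"{1 - alpha:.0%} interval: [{low:.3f}, {high:.3f}]")
```

Raising an error when the conditions fail is a design choice for this sketch; a warning would work equally well.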

Exam tip

A higher confidence level means a wider confidence interval. Greater security comes with less precision.

7 Exercises

7.1 Exercise 1 — Number of calls

Exercise

The table below represents the number of calls received between 12:00 noon and 2:00 p.m. by a service department, observed over 200 randomly selected days.

Number of calls	0	1	2	3	4	5	6	7	8
Number of days	4	11	26	45	52	39	15	5	3

Compute the confidence interval of the average number of calls received for \(\alpha=10\%\), \(5\%\), and \(1\%\). Comment.
Let \(p\) be the probability that the number of calls exceeds 6. Compute the confidence interval of \(p\) for \(\alpha=10\%\), \(5\%\), and \(1\%\). Comment.

Solution

From the table,

\[ n=200, \qquad \bar{x}=3.75, \qquad s\approx 1.568. \]

For the mean:

90% confidence level (\(\alpha=10\%\)):

\[ [3.568,\;3.932] \]

95% confidence level (\(\alpha=5\%\)):

\[ [3.533,\;3.967] \]

99% confidence level (\(\alpha=1\%\)):

\[ [3.464,\;4.036] \]

Comment: the interval becomes wider when the confidence level increases.
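The summary statistics and the three intervals can be recomputed from the frequency table. The sketch below assumes the \(1/n\) convention for the variance, as in Section 3; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Frequency table from the exercise: number of calls -> number of days.
days_by_calls = {0: 4, 1: 11, 2: 26, 3: 45, 4: 52, 5: 39, 6: 15, 7: 5, 8: 3}

n = sum(days_by_calls.values())
x_bar = sum(x * f for x, f in days_by_calls.items()) / n
# Empirical standard deviation with the 1/n convention of this session.
s = sqrt(sum(f * (x - x_bar) ** 2 for x, f in days_by_calls.items()) / n)

intervals = {}
for alpha in (0.10, 0.05, 0.01):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)
    intervals[alpha] = (x_bar - half_width, x_bar + half_width)
    print(f"alpha = {alpha:.2f}: [{x_bar - half_width:.3f}, {x_bar + half_width:.3f}]")
```

Each frequency weights its value, so the table is never expanded into 200 individual observations.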

For \(p=P(X>6)\), there are \(5+3=8\) such days, so

\[ \hat{p}=\frac{8}{200}=0.04. \]

The confidence intervals are approximately:

90%: \([0.017,\;0.063]\)
95%: \([0.013,\;0.067]\)
99%: \([0.004,\;0.076]\)

Comment: the probability that the number of calls exceeds 6 is small, around 4%; even at the 99% level the upper bound stays below 8%.

7.2 Exercise 2 — Amount of taxes

Exercise

The table below represents the tax amount in euros of 300 randomly selected taxpayers.

Tax amount (€)	[600, 900[	[900, 1200[	[1200, 1500[	[1500, 1800[	[1800, 2100[
Number of taxpayers	18	60	90	87	45

Compute the confidence interval of the average amount paid (in taxes) for \(\alpha=7\%\), \(5\%\), and \(1\%\). Comment.
Let \(p\) be the rate of taxpayers who pay less than 1400€. Compute the confidence interval of \(p\) for \(\alpha=10\%\), \(4\%\), and \(1\%\). Comment.

Solution

Using class midpoints \(750, 1050, 1350, 1650, 1950\):

\[ n=300, \qquad \bar{x}=1431, \qquad s\approx 336.361. \]

Confidence intervals for the mean are approximately:

for \(\alpha=7\%\):

\[ [1395.813,\;1466.187] \]

for \(\alpha=5\%\):

\[ [1392.938,\;1469.062] \]

for \(\alpha=1\%\):

\[ [1380.978,\;1481.022] \]

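Comment: the mean tax amount is centered around 1431€, with moderate sampling uncertainty: the half-width grows from about 35€ (\(\alpha=7\%\)) to about 50€ (\(\alpha=1\%\)), that is roughly 2.5% to 3.5% of the mean.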
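The level \(\alpha=7\%\) does not appear in the table of Section 5, so its quantile has to be read from a Normal table or computed. A sketch of that intermediate step, with values rounded:

\[ P(Z\leq z_{0.035}) = 1-\frac{0.07}{2} = 0.965 \quad\Longrightarrow\quad z_{0.035}\approx 1.812, \]

\[ z_{0.035}\frac{s}{\sqrt{n}} \approx 1.812\times\frac{336.361}{\sqrt{300}} \approx 1.812\times 19.420 \approx 35.19. \]

This gives \(1431 \pm 35.19\), in agreement with the first interval of the solution.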

To estimate \(p=P(X<1400)\), we approximate the class \([1200,1500[\) uniformly. Since 1400 is two-thirds of the way through the class,

\[ \text{count below 1400} \approx 18+60+\frac{200}{300}\times 90 = 138. \]

Hence

\[ \hat{p}=\frac{138}{300}=0.46. \]

Approximate confidence intervals:

for \(\alpha=10\%\): \([0.413,\;0.507]\)
for \(\alpha=4\%\): \([0.401,\;0.519]\)
for \(\alpha=1\%\): \([0.386,\;0.534]\)

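Comment: the proportion of taxpayers paying less than 1400€ is estimated at about 46%. This figure is itself approximate because it relies on interpolation inside the class \([1200,1500[\), an error that the confidence intervals do not account for.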
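The interpolation step can be written as a short function. The sketch below assumes, as in the solution, that amounts are spread uniformly inside each class; `count_below` is a hypothetical helper name, and only the \(\alpha=4\%\) interval is recomputed.

```python
from math import sqrt
from statistics import NormalDist

# Grouped data from the exercise: (lower bound, upper bound, count).
classes = [(600, 900, 18), (900, 1200, 60), (1200, 1500, 90),
           (1500, 1800, 87), (1800, 2100, 45)]


def count_below(threshold, classes):
    """Estimated count below threshold, assuming a uniform spread in each class."""
    total = 0.0
    for low, high, count in classes:
        if threshold >= high:
            total += count  # the whole class lies below the threshold
        elif threshold > low:
            # Only a fraction of the class lies below the threshold.
            total += count * (threshold - low) / (high - low)
    return total


n = sum(count for _, _, count in classes)
p_hat = count_below(1400, classes) / n

z = NormalDist().inv_cdf(1 - 0.04 / 2)  # alpha = 4%
half_width = z * sqrt(p_hat * (1 - p_hat) / n)
print(f"p_hat = {p_hat:.2f}, interval: [{p_hat - half_width:.3f}, {p_hat + half_width:.3f}]")
```

A threshold that coincides with a class boundary needs no interpolation, which is why boundary thresholds give exact counts.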

7.3 Exercise 3 — Monthly invoice amounts

Exercise

The finance department of a company randomly selected \(80\) invoices from last quarter. The average invoice amount is \(\bar{x} = 245\)€ with standard deviation \(s = 48\)€. Among these invoices, \(12\) exceed \(300\)€.

Compute the confidence interval for the average invoice amount at confidence levels \(95\%\) and \(99\%\). Comment.
Let \(p\) be the proportion of invoices exceeding \(300\)€. Compute the confidence interval for \(p\) at the \(95\%\) level. Comment.

Solution

We have \(n=80\), \(\bar{x}=245\)€, and \(s=48\)€.

For the mean, the half-width is:

\[ z_{\alpha/2}\frac{s}{\sqrt{n}} = z_{\alpha/2}\times\frac{48}{\sqrt{80}} = z_{\alpha/2}\times 5.367. \]

95% confidence level (\(z_{0.025}=1.96\)):

\[ CI_\mu = [245 - 1.96\times 5.367;\; 245 + 1.96\times 5.367] \approx [234.5;\; 255.5]. \]

99% confidence level (\(z_{0.005}=2.576\)):

\[ CI_\mu = [245 - 2.576\times 5.367;\; 245 + 2.576\times 5.367] \approx [231.2;\; 258.8]. \]

Comment: the 99% interval is wider; the added security comes at the cost of precision.

For the proportion, \(\hat{p}=12/80=0.15\). Validity check: \(n\hat{p}=12\geq 5\) and \(n(1-\hat{p})=68\geq 5\) ✓.

\[ CI_p = \left[0.15\pm 1.96\sqrt{\frac{0.15\times 0.85}{80}}\right] = [0.15\pm 0.078] \approx [0.072;\; 0.228]. \]

Comment: the proportion of high-value invoices is estimated between roughly \(7\%\) and \(23\%\); the interval is wide relative to the estimate (half-width \(0.078\) for \(\hat{p}=0.15\)) because only \(80\) invoices were sampled.

7.4 Exercise 4 — Employee satisfaction survey

Exercise

A firm surveyed \(150\) randomly selected employees. \(87\) declared that they were satisfied with the remote-work policy.

Compute the confidence interval for the proportion of satisfied employees at confidence levels \(90\%\), \(95\%\), and \(99\%\). Comment.
The HR director wants a \(95\%\) confidence interval with a width strictly less than \(0.10\). What minimum sample size \(n\) is required?

Solution

We have \(n=150\) and \(\hat{p}=87/150=0.58\).

The standard error is

\[ \sqrt{\frac{0.58\times 0.42}{150}} \approx 0.04030. \]

Confidence intervals:

90% (\(z_{0.05}=1.645\)): \([0.58 \pm 0.066] \approx [0.514;\; 0.646]\)
95% (\(z_{0.025}=1.96\)): \([0.58 \pm 0.079] \approx [0.501;\; 0.659]\)
99% (\(z_{0.005}=2.576\)): \([0.58 \pm 0.104] \approx [0.476;\; 0.684]\)

Comment: as the confidence level increases, the interval widens. At 95%, we can say that between roughly \(50\%\) and \(66\%\) of employees are satisfied with remote work.

For question 2, the width of a 95% interval is

\[ 2\times 1.96\times\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < 0.10, \]

\[ \sqrt{\frac{0.58\times 0.42}{n}} < \frac{0.10}{2\times 1.96} = 0.02551. \]

Squaring both sides:

\[ \frac{0.2436}{n} < 0.0006508 \quad\Longrightarrow\quad n > \frac{0.2436}{0.0006508} \approx 374.3. \]

Assuming the observed proportion stays close to \(0.58\), the firm must survey at least \(\mathbf{n=375}\) employees.
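The same reasoning can be packaged as a function of the planning proportion, the target width, and \(\alpha\). This is a sketch only: `min_sample_size` is a hypothetical name, and the second call uses \(p=0.5\), a worst-case value that is an added assumption rather than part of the exercise.

```python
from math import floor
from statistics import NormalDist


def min_sample_size(p_planning, max_width, alpha):
    """Smallest n whose width 2*z*sqrt(p(1-p)/n) is strictly below max_width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = p_planning * (1 - p_planning) * (2 * z / max_width) ** 2
    return floor(bound) + 1  # strict inequality: n must exceed the bound


print(min_sample_size(0.58, 0.10, 0.05))  # planning value taken from the survey
print(min_sample_size(0.50, 0.10, 0.05))  # worst case p = 0.5 (an assumption)
```

Because \(p(1-p)\) is largest at \(p=0.5\), the second call gives a sample size that is sufficient whatever the true proportion turns out to be.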

7.5 Exercise 5 — Manufacturing tolerance

Exercise

A quality engineer randomly selects \(50\) bolts from a production line. The measured diameters give \(\bar{x}=12.03\) mm and \(s=0.08\) mm. The engineering specification requires a target diameter of exactly \(12\) mm.

Compute the confidence interval for the true mean diameter at confidence levels \(95\%\) and \(99\%\). Comment.
Based on the intervals, does the production line appear to be centred on the target? Interpret.

Solution

We have \(n=50\), \(\bar{x}=12.03\) mm, and \(s=0.08\) mm.

The standard error is

\[ \frac{s}{\sqrt{n}} = \frac{0.08}{\sqrt{50}} \approx 0.01131. \]

95% confidence level (\(z_{0.025}=1.96\)):

\[ CI_\mu = [12.03 - 1.96\times 0.01131;\; 12.03 + 1.96\times 0.01131] \approx [12.008;\; 12.052]. \]

99% confidence level (\(z_{0.005}=2.576\)):

\[ CI_\mu = [12.03 - 2.576\times 0.01131;\; 12.03 + 2.576\times 0.01131] \approx [12.001;\; 12.059]. \]

Comment on the target: the target value of \(12\) mm lies outside the 95% interval and just below the lower bound of the 99% interval (\(12 < 12.001\)). This is statistical evidence that the production line is systematically producing bolts slightly above the target diameter. A recalibration of the machine should be considered.
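The position of the target relative to each interval can also be checked in one step, by measuring the distance between \(\bar{x}\) and \(12\) in units of standard error. This is a sketch of the arithmetic, using the rounded standard error from above:

\[ \frac{\bar{x}-12}{s/\sqrt{n}} = \frac{12.03-12}{0.01131} \approx 2.65 > 2.576 > 1.96. \]

The distance exceeds both \(z_{0.025}\) and \(z_{0.005}\), so \(12\) is more than one half-width away from \(\bar{x}\) at the 95% level and, narrowly, at the 99% level.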

7.6 Application — Lifetime of machines

Exercise

The research officer of an insurance company is interested in the lifetime (in months) of a machine of brand M. He randomly chose 100 machines and recorded their lifetime. The empirical mean is \(17.4\) and the empirical standard deviation is \(7.15821\).

Calculate the confidence interval for the average life of machines M, with confidence level 95%. Interpret.
Let \(p\) be the probability that the lifetime of a machine M exceeds 1 year (12 months). Compute the confidence interval for \(p\), with confidence levels 95% and 99%. Interpret.

Solution

For the mean, with \(n=100\), \(\bar{x}=17.4\), \(s=7.15821\), and \(z_{0.025}=1.96\):

\[ CI_{\mu} = \left[17.4-1.96\frac{7.15821}{10},\;17.4+1.96\frac{7.15821}{10}\right] \]

\[ CI_{\mu} \approx [15.997,\;18.803]. \]

Interpretation: with 95% confidence, the mean lifetime is compatible with values between about 16.0 and 18.8 months.

For \(p=P(X>12)\), the raw count of machines above 12 months is not given. If we additionally use a Normal approximation with mean \(17.4\) and standard deviation \(7.15821\), then

\[ \hat{p} \approx P(X>12) = P\left(Z>\frac{12-17.4}{7.15821}\right) \approx P(Z>-0.754) \approx 0.775. \]

This gives approximate confidence intervals:

95%: \([0.693,\;0.857]\)
99%: \([0.667,\;0.882]\)
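These figures can be reproduced numerically. The sketch below relies on the same additional assumption as the solution, namely that lifetimes are roughly Normal with the empirical mean and standard deviation; that assumption is not checked here.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s = 100, 17.4, 7.15821  # summary statistics from the exercise

# Modeling assumption (not verified): lifetimes are roughly Normal(x_bar, s).
p_hat = 1 - NormalDist(mu=x_bar, sigma=s).cdf(12)

intervals = {}
for alpha in (0.05, 0.01):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    intervals[alpha] = (p_hat - half_width, p_hat + half_width)
    print(f"{1 - alpha:.0%}: [{p_hat - half_width:.3f}, {p_hat + half_width:.3f}]")
```

If the raw count of machines lasting more than 12 months were available, `p_hat` would simply be that count divided by `n`, with no distributional assumption.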

Methodological remark

For a proportion, the most direct method would be to count how many of the 100 machines lasted more than 12 months. Since this count is absent, the result above relies on an additional modeling assumption.