Sampling From a Normal Distribution With Unknown $\mu$ and Known $\sigma^2$.
The basic form of the confidence interval is:
\[ P[\,|\bar{X}_n - \mu| < c\,] = 1 - \alpha, \]
or
\[ P[\bar{X}_n - c < \mu < \bar{X}_n + c] = 1 - \alpha, \]
where typically $\alpha = .01$, $.05$, or $.10$.
Note that the random quantity here is the interval itself (a line segment centered at $\bar{X}_n$), not $\mu$. The proper interpretation is that, in the long run, $100(1 - \alpha)$ percent of the time the interval around $\bar{X}_n$ will contain $\mu$.
Also note that, once we insert an observed value for $\bar{X}_n$, we no longer have a random variable; we have a fixed interval. Hence, we say that we are "$100(1 - \alpha)$ percent confident that the true mean is in the interval."
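To find the value of $c$:
\[ P[\,|\bar{X}_n - \mu| < c\,] = P[-c < \bar{X}_n - \mu < c] = P\left[-\frac{c\sqrt{n}}{\sigma} < \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} < \frac{c\sqrt{n}}{\sigma}\right] \]
\[ = P\left[-\frac{c\sqrt{n}}{\sigma} < Z < \frac{c\sqrt{n}}{\sigma}\right] = \Phi\left(\frac{c\sqrt{n}}{\sigma}\right) - \Phi\left(-\frac{c\sqrt{n}}{\sigma}\right) = \Phi(z_{\alpha/2}) - \Phi(-z_{\alpha/2}) = 1 - \alpha \]
Hence: $z_{\alpha/2} = c\sqrt{n}/\sigma$ and $c = z_{\alpha/2}\,\sigma/\sqrt{n}$.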
This gives us our confidence interval:
\[ P\left[\bar{X}_n - z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X}_n + z_{\alpha/2}\,\sigma/\sqrt{n}\right] = 1 - \alpha \]
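The long-run interpretation can be checked by simulation. The following is a minimal sketch, not part of the original notes: the function name `coverage`, the chosen values of the mean, standard deviation, sample size, and the seed are all assumptions made for illustration. It draws repeated samples from a normal distribution with known standard deviation and counts how often the interval contains the true mean.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated intervals xbar +/- c that contain mu."""
    rng = random.Random(seed)                  # fixed seed: repeatable runs
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}
    c = z * sigma / n ** 0.5                   # half-width c = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - c < mu < xbar + c:
            hits += 1
    return hits / trials


# Illustrative (assumed) settings: mu = 10, sigma = 2, n = 25, alpha = .05
rate = coverage(mu=10.0, sigma=2.0, n=25, alpha=0.05, trials=2000)
print(f"observed coverage: {rate:.3f}")        # should be near 1 - alpha
```

The observed fraction fluctuates around $1 - \alpha$ from run to run; it is the procedure, not any single interval, that has the stated coverage.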
Example: Suppose we take a random sample of 25 from $N(\mu, 1)$. Construct a 95% confidence interval for $\mu$.
We are given that $\alpha = .05$, so that $\alpha/2 = .025$ and $z_{.025} = 1.96$. Hence
\[ z_{\alpha/2}\,\sigma/\sqrt{n} = (1.96)(1)/5 = .392, \]
so the interval is:
\[ \bar{X}_n \pm .392 \]
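As a quick check of the arithmetic, the half-width can be computed directly. This is only a sketch: the helper name `z_half_width` is hypothetical, and it uses the exact normal quantile from Python's standard library rather than the rounded table value 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_half_width(sigma, n, alpha):
    """Half-width z_{alpha/2} * sigma / sqrt(n) of the known-variance interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = .05
    return z * sigma / sqrt(n)


c = z_half_width(sigma=1.0, n=25, alpha=0.05)
print(f"interval: xbar +/- {c:.3f}")
```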
Sampling From a Normal Distribution With Unknown $\mu$ and Unknown $\sigma^2$ With Large Sample Size ($n > 30$).
In this case we simply substitute the sample variance $s^2$ for $\sigma^2$ by appeal to the Central Limit Theorem and obtain the confidence interval:
\[ P\left[\bar{X}_n - z_{\alpha/2}\,s/\sqrt{n} < \mu < \bar{X}_n + z_{\alpha/2}\,s/\sqrt{n}\right] = 1 - \alpha \]
Suppose we have a large order of bolts delivered to our
factory. We are concerned about the precision with which these bolts
have been machined. In particular, we want to construct 95% confidence
limits for the true mean length of the bolts. Assume that the length
of the bolts is normally distributed.
We are given: $n = 500$, $\bar{X}_n = 6.1$ cm, $s = .1$ cm, $1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$.
Hence, $z_{.025} = 1.96$.
So the confidence limits are:
\[ 6.1 \pm (1.96)(.1)/\sqrt{500} = 6.1 \pm .0088, \]
or $(6.091, 6.109)$.
We are 95 percent confident that the true mean length of the bolts is in the
interval.
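The bolt calculation can be reproduced in a few lines. The sketch below is illustrative only: the function name `large_sample_mean_ci` is an assumption, and the sample standard deviation simply takes the place of the unknown population value, as described above.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_mean_ci(xbar, s, n, conf):
    """Large-sample z interval for a mean, with s standing in for sigma."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin


# Bolt data from the example: n = 500, mean 6.1 cm, s = .1 cm
lo, hi = large_sample_mean_ci(xbar=6.1, s=0.1, n=500, conf=0.95)
print(f"({lo:.3f}, {hi:.3f})")
```

Because $n$ is large, the margin is small even though individual bolts vary by about a millimeter.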
Large Sample ($n > 50$) Confidence Interval for Proportions.
With large sample sizes, we can appeal to the Central Limit Theorem and assume:
\[ \hat{p} \sim N[\,p,\ \hat{p}(1 - \hat{p})/n\,] \]
so that the confidence interval is:
\[ P\left\{\hat{p} - z_{\alpha/2}[\hat{p}(1 - \hat{p})/n]^{1/2} < p < \hat{p} + z_{\alpha/2}[\hat{p}(1 - \hat{p})/n]^{1/2}\right\} = 1 - \alpha \]
Problem 8.47 p.346
We are given: $n = 1506$, $\hat{p} = .73$, $1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$.
Hence, $z_{.025} = 1.96$.
So the confidence limits are:
\[ .73 \pm 1.96[(.73)(.27)/1506]^{1/2} = .73 \pm .0224, \]
or $(.7076, .7524)$.
We are 95 percent confident that the true proportion is in the interval.
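A short sketch of the same computation follows. The function name `proportion_ci` is hypothetical; the formula is the large-sample interval given above, with the estimated proportion plugged into the variance.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, conf):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Values from Problem 8.47: n = 1506, p_hat = .73
lo, hi = proportion_ci(p_hat=0.73, n=1506, conf=0.95)
print(f"({lo:.4f}, {hi:.4f})")
```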
Confidence Interval for $\sigma^2$ when the random sample is drawn from a Normal Distribution.
Ideally, the confidence interval would be built around the probability distribution for $s^2$, our unbiased estimator for $\sigma^2$. Unfortunately, this distribution is not so easily used. However, the distribution of $(n-1)s^2/\sigma^2$ is known to be Chi-Square with $n-1$ degrees of freedom.
Since the Chi-Square is an asymmetric distribution, we define $c_1$ to be the point below which $\alpha/2$ of the probability lies, and define $c_2$ to be the point above which $\alpha/2$ of the probability lies.
Hence:
\[ P\left[c_1 < \frac{(n-1)s^2}{\sigma^2} < c_2\right] = P\left[\frac{(n-1)s^2}{c_2} < \sigma^2 < \frac{(n-1)s^2}{c_1}\right] = 1 - \alpha \]
Problem 8.79 p.362
We are given: $n = 6$, $df = n - 1 = 5$, $1 - \alpha = .90$, $\alpha = .1$, $\alpha/2 = .05$.
Hence, $c_1 = 1.145476$ and $c_2 = 11.0705$.
$s^2 = .502667$
$(n-1)s^2/c_2 = (5)(.502667)/11.0705 = .227$, and
$(n-1)s^2/c_1 = (5)(.502667)/1.145476 = 2.194$
Hence: $(.227, 2.194)$
We are 90 percent confident that $\sigma^2$ is in the interval.
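The interval for the variance is easy to script once the two Chi-Square points are known. The sketch below takes them as inputs from a table, since Python's standard library has no Chi-Square quantile function; the name `variance_ci` is hypothetical. If SciPy happens to be installed, `scipy.stats.chi2.ppf` is one way to obtain the points instead.

```python
def variance_ci(s2, n, c1, c2):
    """Interval ((n-1)s^2/c2, (n-1)s^2/c1) for sigma^2.

    c1 and c2 are the lower and upper Chi-Square points with n - 1 degrees
    of freedom, supplied by the caller (here: read from a table).
    """
    df = n - 1
    return df * s2 / c2, df * s2 / c1


# Values from Problem 8.79: n = 6, s^2 = .502667, 5 df, alpha/2 = .05
lo, hi = variance_ci(s2=0.502667, n=6, c1=1.145476, c2=11.0705)
print(f"({lo:.3f}, {hi:.3f})")
```

Note that the larger critical point $c_2$ produces the lower limit, and the interval is not symmetric about $s^2$.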
Sampling From a Normal Distribution With Unknown $\mu$ and Unknown $\sigma^2$ With Small Sample Size ($n < 30$).
In this case we build our confidence interval from the $t$ distribution. In particular,
\[ \frac{\bar{X}_n - \mu}{s/\sqrt{n}} \sim t_{n-1} \]
so that we can write the confidence interval as:
\[ P\left[\bar{X}_n - t_{\alpha/2}\,s/\sqrt{n} < \mu < \bar{X}_n + t_{\alpha/2}\,s/\sqrt{n}\right] = 1 - \alpha \]
Problem 8.68 p.358
We are given: $n = 20$, $s = 57$, $1 - \alpha = .9$, $\alpha = .1$, $\alpha/2 = .05$.
Hence, $t_{.05,\,19\,df} = 1.729$.
Confidence limits:
\[ 419 \pm (1.729)(57)/\sqrt{20} = 419 \pm 22.04, \]
or $(396.96, 441.04)$.
Yes, the population mean is in the interval. All values in the
interval have 90% confidence.
Confidence limits:
\[ 455 \pm (1.729)(69)/\sqrt{20} = 455 \pm 26.68, \]
or $(428.32, 481.68)$.
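Both small-sample intervals use the same recipe, so a small helper covers them. This is a sketch under the assumption that the critical value is read from a $t$ table and passed in by hand (the standard library has no $t$ quantile function); the name `t_ci` is hypothetical.

```python
from math import sqrt


def t_ci(xbar, s, n, t_crit):
    """Small-sample interval xbar +/- t_crit * s / sqrt(n).

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, taken from a table.
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


T_05_19DF = 1.729   # tabled value used in Problem 8.68

first = t_ci(xbar=419, s=57, n=20, t_crit=T_05_19DF)
second = t_ci(xbar=455, s=69, n=20, t_crit=T_05_19DF)
for lo, hi in (first, second):
    print(f"({lo:.2f}, {hi:.2f})")
```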
Confidence Interval for the Difference Between Two Means when sampling from two separate, independent, Normal Distributions with known variances.
Here we use the same technique as the testing problem discussed in notes #10 (1). Namely, let the distributions of the sample means be:
\[ \bar{X}_n \sim N[\mu_x,\ \sigma_x^2/n] \quad \text{and} \quad \bar{Y}_m \sim N[\mu_y,\ \sigma_y^2/m] \]
Then:
\[ \bar{X}_n - \bar{Y}_m \sim N[\mu_x - \mu_y,\ \sigma_x^2/n + \sigma_y^2/m] \]
And the confidence interval is:
\[ P\left\{\bar{X}_n - \bar{Y}_m - z_{\alpha/2}[\sigma_x^2/n + \sigma_y^2/m]^{1/2} < \mu_x - \mu_y < \bar{X}_n - \bar{Y}_m + z_{\alpha/2}[\sigma_x^2/n + \sigma_y^2/m]^{1/2}\right\} = 1 - \alpha \]
Confidence Interval for the Difference Between Two Means when sampling from two separate, independent, Normal Distributions with unknown variances but large ($n > 30$ and $m > 30$) sample sizes.
Here the confidence interval is the same as in (3), but the sample variances $s_x^2$ and $s_y^2$ are used in the formula in place of $\sigma_x^2$ and $\sigma_y^2$. Namely:
\[ P\left\{\bar{X}_n - \bar{Y}_m - z_{\alpha/2}[s_x^2/n + s_y^2/m]^{1/2} < \mu_x - \mu_y < \bar{X}_n - \bar{Y}_m + z_{\alpha/2}[s_x^2/n + s_y^2/m]^{1/2}\right\} = 1 - \alpha \]
Problem 8.52 p.347
We are given: $n = 252$, $m = 307$, $\bar{X}_n = 11.48$, $\bar{Y}_m = 13.21$, $s_x = 5.69$, $s_y = 5.31$, $1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$.
Hence, $z_{.025} = 1.96$.
Our confidence limits are:
\[ 11.48 - 13.21 \pm 1.96[5.69^2/252 + 5.31^2/307]^{1/2} = -1.73 \pm .92 \]
which produces the interval $(-2.65, -.81)$.
We are given: $n = 252$, $m = 307$, $\bar{X}_n = 22.05$, $\bar{Y}_m = 25.96$, $s_x = 5.12$, $s_y = 5.07$, $1 - \alpha = .90$, $\alpha = .10$, $\alpha/2 = .05$.
Hence, $z_{.05} = 1.645$.
Our confidence limits are:
\[ 22.05 - 25.96 \pm 1.645[5.12^2/252 + 5.07^2/307]^{1/2} = -3.91 \pm .71 \]
which produces the interval $(-4.62, -3.20)$.
Note that neither interval includes 0. Hence, we are 95 and 90 percent confident, respectively, that there is a significant difference between men and women on these two scales.
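The two-sample calculation, together with the check for whether 0 falls inside the interval, can be sketched as follows. The function name `two_mean_ci` is hypothetical, and exact normal quantiles are used in place of the rounded table values.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_ci(xbar, ybar, sx, sy, n, m, conf):
    """Large-sample interval for mu_x - mu_y using the sample variances."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(sx ** 2 / n + sy ** 2 / m)
    diff = xbar - ybar
    return diff - margin, diff + margin


# The two scales from Problem 8.52
scale_1 = two_mean_ci(11.48, 13.21, 5.69, 5.31, 252, 307, conf=0.95)
scale_2 = two_mean_ci(22.05, 25.96, 5.12, 5.07, 252, 307, conf=0.90)

for lo, hi in (scale_1, scale_2):
    contains_zero = lo < 0 < hi
    print(f"({lo:.2f}, {hi:.2f})  contains 0: {contains_zero}")
```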
Confidence Interval for the Difference Between Two Proportions with large ($n > 50$ and $m > 50$) sample sizes.
Here we appeal to the Central Limit Theorem to write:
\[ \hat{p}_1 - \hat{p}_2 \sim N[\,p_1 - p_2,\ \hat{p}_1(1 - \hat{p}_1)/n + \hat{p}_2(1 - \hat{p}_2)/m\,] \]
And the confidence limits can be computed from:
\[ \hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}[\hat{p}_1(1 - \hat{p}_1)/n + \hat{p}_2(1 - \hat{p}_2)/m]^{1/2} \]
Problem 8.49 p.347
We are given: $\hat{p}_1 = .19$, $\hat{p}_2 = .70$, $n = 1250$, $m = 1251$, $1 - \alpha = .9$, $\alpha = .1$, $\alpha/2 = .05$.
Hence, $z_{.05} = 1.645$.
The confidence limits are:
\[ .19 - .70 \pm 1.645[(.19)(.81)/1250 + (.7)(.3)/1251]^{1/2} = -.51 \pm .028 \]
The interval, $(-.538, -.482)$, is well below 0, so we are 90% confident, based upon this evidence, that there was a change of opinion between the two periods.
Problem 8.50 p.347
We are given: $\hat{p}_1 = .67$, $\hat{p}_2 = .90$, $n = 1250$, $m = 1251$, $1 - \alpha = .98$, $\alpha = .02$, $\alpha/2 = .01$.
Hence, $z_{.01} = 2.33$.
The confidence limits are:
\[ .67 - .90 \pm 2.33[(.67)(.33)/1250 + (.90)(.1)/1251]^{1/2} = -.23 \pm .0368 \]
The interval, $(-.2668, -.1932)$, is well below 0, so we are 98% confident, based upon this evidence, that there was a change of opinion regarding smoke detectors between the two periods.
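A sketch of the two-proportion interval follows, applied to the figures from Problem 8.49. The function name `two_proportion_ci` is hypothetical. Because it uses the exact normal quantile rather than a two-decimal table value, results may differ from the hand calculations in the last digit.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(p1, p2, n, m, conf):
    """Large-sample interval for p1 - p2 from two independent samples."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / m)
    diff = p1 - p2
    return diff - margin, diff + margin


# Problem 8.49: p1 = .19, p2 = .70, n = 1250, m = 1251, 90% confidence
lo, hi = two_proportion_ci(p1=0.19, p2=0.70, n=1250, m=1251, conf=0.90)
print(f"({lo:.3f}, {hi:.3f})")
print("entirely below 0:", hi < 0)
```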
Confidence Interval for the Difference Between Two Means when sampling from two separate, independent, Normal Distributions with unknown variances and small ($n < 30$ and $m < 30$) sample sizes.
Here we must assume that $\sigma_x^2 = \sigma_y^2$ and use this assumption to combine the two sample sums of squares to obtain $s^2$. Namely:
\[ s^2 = \frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2} \]
The confidence interval is:
\[ P\left\{\bar{X}_n - \bar{Y}_m - t_{\alpha/2}\,s\,(1/n + 1/m)^{1/2} < \mu_x - \mu_y < \bar{X}_n - \bar{Y}_m + t_{\alpha/2}\,s\,(1/n + 1/m)^{1/2}\right\} = 1 - \alpha \]
where $t_{\alpha/2}$ has $n + m - 2$ degrees of freedom.
Problem 8.71 p.359
We are given: $\bar{X}_n = 11$, $\bar{Y}_m = 12$, $n = 16$, $m = 20$, $s_x = 6$, $s_y = 8$, $1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$.
Hence, $t_{.025,\,34\,df} = 2.032$.
Pooling the sample sums of squares:
\[ s^2 = \frac{(15)(36) + (19)(64)}{34} = 51.647 \]
The confidence limits are:
\[ 11 - 12 \pm 2.032[51.647(1/16 + 1/20)]^{1/2} = -1 \pm 4.90 \]
for an interval of $(-5.90, 3.90)$.
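The pooling step and the resulting interval can be sketched as below. The function names are hypothetical, and the critical value is passed in by hand because the standard library has no $t$ quantile function. The call uses 2.032, the tabled $t_{.025}$ for 34 degrees of freedom, rather than the large-sample normal value 1.96, so its limits are slightly wider than a 1.96-based calculation would give. The means 11 and 12 are the ones used in the confidence-limit computation above.

```python
from math import sqrt


def pooled_variance(sx, sy, n, m):
    """Pooled estimate [(n-1)sx^2 + (m-1)sy^2] / (n + m - 2)."""
    return ((n - 1) * sx ** 2 + (m - 1) * sy ** 2) / (n + m - 2)


def pooled_t_ci(xbar, ybar, sx, sy, n, m, t_crit):
    """Equal-variance interval for mu_x - mu_y.

    t_crit is t_{alpha/2} with n + m - 2 degrees of freedom, from a table.
    """
    s2 = pooled_variance(sx, sy, n, m)
    margin = t_crit * sqrt(s2 * (1 / n + 1 / m))
    diff = xbar - ybar
    return diff - margin, diff + margin


s2 = pooled_variance(sx=6, sy=8, n=16, m=20)
lo, hi = pooled_t_ci(xbar=11, ybar=12, sx=6, sy=8, n=16, m=20, t_crit=2.032)
print(f"pooled s^2 = {s2:.3f}")
print(f"({lo:.2f}, {hi:.2f})")
```

Since the interval contains 0, these data do not rule out equal population means at this confidence level.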