Confidence Intervals for One Population Mean
A common problem in statistics is to obtain information about the mean, μ, of a population. One way to obtain information about a population mean μ without taking a census is to estimate it by a sample mean x̄. A point estimate of a parameter is the value of a statistic used to estimate the parameter. More generally, a statistic is called an unbiased estimator of a parameter if the mean of all its possible values equals the parameter; otherwise, the statistic is called a biased estimator of the parameter. Ideally, we want our statistic to be unbiased and to have a small standard error. In that case, chances are good that our point estimate (the value of the statistic) will be close to the parameter.
However, a sample mean is usually not equal to the population mean, especially when the standard error is not small. Therefore, we should accompany any point estimate of μ with information that indicates the accuracy of that estimate. This information is called a confidence-interval estimate for μ. By definition, a confidence interval (CI) is an interval of numbers obtained from a point estimate of a parameter. The confidence level is the confidence we have that the parameter lies in the confidence interval. And the confidence-interval estimate consists of the confidence level and the confidence interval. A confidence interval for a population mean depends on the sample mean, x̄, which in turn depends on the sample selected.
The margin of error, E, indicates how accurate the sample mean x̄ is as an estimate of the unknown parameter μ. With the point estimate and a 95% confidence-interval estimate, we can be 95% confident that μ is within E of the sample mean. Simply put, μ lies in the interval point estimate ± E.
Summary
- Point estimate
- Confidence-interval estimate
- Margin of error
Computing the Confidence Interval for One Population Mean (σ known)
We now develop a step-by-step procedure to obtain a confidence interval for a population mean when the population standard deviation is known. In doing so, we assume that the variable under consideration is normally distributed. Because of the central limit theorem, however, the procedure will also work to obtain an approximately correct confidence interval when the sample size is large, regardless of the distribution of the variable. The basis of our confidence-interval procedure is the sampling distribution of the sample mean for a normally distributed variable: Suppose that a variable x of a population is normally distributed with mean μ and standard deviation σ. Then, for samples of size n, the variable x̄ is also normally distributed and has mean μ and standard deviation σ/√n. As a consequence, we have the procedure to compute the confidence interval.
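The interval described here can be sketched in a few lines of Python. The computation is x̄ ± z_{α/2}·σ/√n, with the critical value taken from the standard normal distribution via the standard library's NormalDist; the sample values below are made up for illustration.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """One-mean z-interval: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    E = z * sigma / math.sqrt(n)             # margin of error
    return xbar - E, xbar + E

# Illustrative numbers: xbar = 25.0 from a sample of size 36, known sigma = 4.0
lo, hi = z_interval(xbar=25.0, sigma=4.0, n=36, conf=0.95)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")  # prints 95% CI: (23.693, 26.307)
```

Note that the interval is centered at x̄ and its half-width is exactly the margin of error E.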
PS: The one-mean z-interval procedure is also known as the one-sample z-interval procedure and the one-variable z-interval procedure. We prefer "one-mean" because it makes clear the parameter being estimated.
PS: By saying that the confidence interval is exact, we mean that the true confidence level equals 1 – α; by saying that the confidence interval is approximately correct, we mean that the true confidence level only approximately equals 1 – α.
Before applying Procedure 8.1, we need to make several comments about it and the assumptions for its use, including:
- We use the term normal population as an abbreviation for "the variable under consideration is normally distributed."
- The z-interval procedure works reasonably well even when the variable is not normally distributed and the sample size is small or moderate, provided the variable is not too far from being normally distributed. Thus we say that the z-interval procedure is robust to moderate violations of the normality assumption.
- Watch for outliers because their presence calls into question the normality assumption. Moreover, even for large samples, outliers can sometimes unduly affect a z-interval because the sample mean is not resistant to outliers.
- A statistical procedure that works reasonably well even when one of its assumptions is violated (or moderately violated) is called a robust procedure relative to that assumption.
Summary
Key Fact 8.1 makes it clear that you should conduct preliminary data analyses before applying the z-interval procedure. More generally, the following fundamental principle of data analysis is relevant to all inferential procedures: Before performing a statistical-inference procedure, examine the sample data. If any of the conditions required for using the procedure appear to be violated, do not apply the procedure. Instead use a different, more appropriate procedure, if one exists. Even for small samples, where graphical displays must be interpreted carefully, it is far better to examine the data than not to. Remember, though, to proceed cautiously when conducting graphical analyses of small samples, especially very small samples – say, of size 10 or less.
Sample Size Estimation
If the margin of error and confidence level are specified in advance, then we must determine the sample size needed to meet those specifications. To find the formula for the required sample size, we solve the margin-of-error formula, E = z_{α/2} · σ/√n, for n, which gives n = (z_{α/2} · σ/E)², rounded up to the nearest whole number. See the computing formula in Formula 8.2.
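This sample-size computation can be sketched as follows; rounding up guarantees the achieved margin of error is no larger than the one specified. The values of E and σ below are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(E, sigma, conf=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= E, i.e.
    n = ceil((z_{alpha/2} * sigma / E)^2)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / E) ** 2)

# Illustrative: margin of error 0.5 at 95% confidence with sigma = 3.0
n = required_sample_size(E=0.5, sigma=3.0, conf=0.95)
print(n)  # prints 139
```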
Computing the Confidence Interval for One Population Mean (σ unknown)
So far, we have discussed how to obtain the confidence-interval estimate when the population standard deviation, σ, is known. What if, as is usual in practice, the population standard deviation is unknown? Then we cannot base our confidence-interval procedure on the standardized version of x̄. The best we can do is estimate the population standard deviation, σ, by the sample standard deviation, s; in other words, we replace σ by s in Procedure 8.1 and base our confidence-interval procedure on the resulting variable t = (x̄ − μ)/(s/√n), the studentized version of x̄. Unlike the standardized version, the studentized version of x̄ does not have a normal distribution.
Suppose that a variable x of a population is normally distributed with mean μ. Then, for samples of size n, the variable t has the t-distribution with n − 1 degrees of freedom. A variable with a t-distribution has an associated curve, called a t-curve. Although there is a different t-curve for each number of degrees of freedom, all t-curves are similar and resemble the standard normal curve. As the number of degrees of freedom becomes larger, t-curves look increasingly like the standard normal curve.
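The convergence of t-curves to the standard normal curve can be checked numerically by comparing critical values; the sketch below uses SciPy's t-distribution (the choice of degrees of freedom is arbitrary).

```python
from statistics import NormalDist
from scipy.stats import t

# Two-sided 95% critical value from the standard normal curve
z = NormalDist().inv_cdf(0.975)  # about 1.96

# The corresponding t critical value shrinks toward z as df grows
for df in (2, 10, 30, 100):
    print(df, round(t.ppf(0.975, df), 4))
```

For small degrees of freedom the t critical value is noticeably larger than 1.96 (reflecting the heavier tails of the t-curve), and by df = 100 it is nearly indistinguishable from the normal value.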
Having discussed t-distributions and t-curves, we can now develop a procedure for obtaining a confidence interval for a population mean when the population standard deviation is unknown. The procedure is called the one-mean t-interval procedure or, when no confusion can arise, simply the t-interval procedure.
Properties and guidelines for use of the t-interval procedure are the same as those for the z-interval procedure. In particular, the t-interval procedure is robust to moderate violations of the normality assumption but, even for large samples, can sometimes be unduly affected by outliers because the sample mean and sample standard deviation are not resistant to outliers.
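The t-interval computation, x̄ ± t_{α/2}·s/√n with n − 1 degrees of freedom, can be sketched as follows; the sample data are made up for illustration, and SciPy supplies the t critical value.

```python
import math
from statistics import mean, stdev
from scipy.stats import t

def t_interval(sample, conf=0.95):
    """One-mean t-interval: xbar +/- t_{alpha/2} * s / sqrt(n), df = n - 1."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    tcrit = t.ppf(1 - (1 - conf) / 2, df=n - 1)
    E = tcrit * s / math.sqrt(n)  # margin of error
    return xbar - E, xbar + E

# Hypothetical small sample
lo, hi = t_interval([10, 12, 11, 13, 12, 11, 14, 10])
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```

Because s replaces σ and the t critical value exceeds the corresponding z value, a t-interval is typically somewhat wider than the z-interval would be for the same data.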
What If the Assumptions Are Not Satisfied?
Suppose you want to obtain a confidence interval for a population mean based on a small sample, but preliminary data analyses indicate either the presence of outliers or that the variable under consideration is far from normally distributed. As neither the z-interval procedure nor the t-interval procedure is appropriate, what can you do? Under certain conditions, you can use a nonparametric method. Most nonparametric methods do not require even approximate normality, are resistant to outliers and other extreme values, and can be applied regardless of sample size. However, parametric methods, such as the z-interval and t-interval procedures, tend to give more accurate results than nonparametric methods when the normality assumption and other requirements for their use are met.