**The Random Sampling Distribution of Means**

Imagine you have a hat containing 100 cards, numbered from 0 to 99. At random, you take out five cards, record the number written on each one, and find the mean of these five numbers. Then you put the cards back in the hat and draw another random sample, repeating the same process for about 10 minutes.

Do you expect the means of all of these samples to be exactly the same? Of course not. Because of sampling error, they vary somewhat. If you plot all the means on a frequency distribution, the sample means form a distribution, called **the random sampling distribution of means**. If you actually try this, you will note that this distribution looks much like a normal distribution. If you continued drawing samples and plotting their means ad infinitum, you would find that the distribution actually becomes a normal distribution! **This holds true even if the underlying population is not at all normally distributed**: in our population of cards in the hat, there is just one card with each number, so the shape of the population distribution is actually rectangular, yet the random sampling distribution of its means still tends to be normal.
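The card-drawing experiment is easy to simulate. A minimal sketch in Python; the seed and the 10,000 repetitions are arbitrary choices standing in for the "10 minutes" of drawing:

```python
# Simulate the hat experiment: repeatedly draw 5 cards at random from a
# uniform (rectangular) population of the numbers 0-99, record each
# sample mean, and inspect the resulting distribution of means.
import random
import statistics

random.seed(0)
population = list(range(100))          # one card per number, 0 to 99

sample_means = [
    statistics.mean(random.sample(population, 5))  # draw 5 cards, no replacement
    for _ in range(10_000)
]

# The population is rectangular, yet the sample means pile up around the
# population mean (49.5) in a roughly bell-shaped distribution.
print(round(statistics.mean(sample_means), 1))   # close to the population mean, 49.5
```

Plotting a histogram of `sample_means` would show the bell shape directly; printing the mean is enough to see that the center of the sampling distribution sits at the population mean.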

These principles are stated by the **central limit theorem**, which holds that the random sampling distribution of means will always tend to be normal, irrespective of the shape of the population distribution from which the samples were drawn. **According to the theorem, the mean of the random sampling distribution of means is equal to the mean of the original population**.

Like all distributions, the random sampling distribution of means has not only a mean but also a standard deviation. This particular standard deviation, the standard deviation of the population of all the sample means, has its own name: the **standard error**, or **standard error of the mean**. It is a measure of the extent to which the sample means deviate from the true population mean.

When repeated random samples are drawn from a population, most of the means of those samples are going to cluster around the original population mean. If the samples each consisted of just two cards what would happen to the shape of the random sampling distribution of means? Clearly, with an n of just 2, there would be quite a high chance of any particular sample mean falling out toward the tails of the distribution, giving a broader, fatter shape to the curve, and hence a higher standard error. On the other hand, if the samples consisted of 25 cards each (n = 25), it would be very unlikely for many of their means to lie far from the center of the curve. Therefore, there would be a much thinner, narrower curve and a lower standard error.
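The effect of sample size on the spread of the sampling distribution can be checked with a small simulation; the 5,000 repetitions per sample size are an arbitrary choice for illustration:

```python
# Compare the spread (standard error) of the sampling distribution of
# means for samples of n = 2 cards versus n = 25 cards.
import random
import statistics

random.seed(1)
population = list(range(100))   # the 100 cards, numbered 0 to 99

def sd_of_sample_means(n, reps=5000):
    """Standard deviation of `reps` sample means, each from a sample of size n."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

se_n2 = sd_of_sample_means(2)    # broad, fat curve: high standard error
se_n25 = sd_of_sample_means(25)  # thin, narrow curve: low standard error
print(round(se_n2, 1), round(se_n25, 1))
```

The n = 2 curve comes out several times wider than the n = 25 curve, matching the reasoning above.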

So the shape of the random sampling distribution of means, as reflected by its standard error, is affected by the size of the samples. In fact, the standard error (SE) is equal to the population standard deviation (σ) divided by the square root of the size of the samples (n): SE = σ/√n.
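The formula σ/√n can be checked against simulation. Here the samples are drawn with replacement, since the formula assumes independent draws; n = 25 and the 20,000 repetitions are arbitrary choices:

```python
# Verify SE = sigma / sqrt(n) empirically for the card population.
import math
import random
import statistics

random.seed(2)
population = list(range(100))
sigma = statistics.pstdev(population)    # population SD, about 28.87
n = 25

theoretical_se = sigma / math.sqrt(n)    # sigma / sqrt(n), about 5.77

means = [statistics.mean(random.choices(population, k=n)) for _ in range(20_000)]
empirical_se = statistics.stdev(means)   # SD of the simulated sample means

print(round(theoretical_se, 2), round(empirical_se, 2))  # the two agree closely
```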

**Using the Standard Error**

Because the random sampling distribution of means is normal, the z score of any sample mean can be expressed as z = (x̄ − μ)/SE, where x̄ is the sample mean, μ is the population mean, and SE is the standard error. This makes it possible to find the limits between which 95% of all possible random sample means would be expected to fall (z score = ±1.96).
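A sketch of this calculation for the card population, using `NormalDist` from the Python standard library; μ = 49.5 and σ ≈ 28.87 come from the population of cards, and n = 25 is an assumed sample size:

```python
# Find the limits between which 95% of all possible sample means fall.
import math
import statistics

mu, sigma, n = 49.5, 28.87, 25          # card-population parameters, assumed n
se = sigma / math.sqrt(n)               # standard error

z = statistics.NormalDist().inv_cdf(0.975)   # z cutting off the upper 2.5%, ~1.96
lower, upper = mu - z * se, mu + z * se      # central 95% of sample means
print(round(z, 2), round(lower, 1), round(upper, 1))
```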

**Estimating the Mean of a Population**

It has been shown that 95% of all possible sample means will lie within approximately ±2 (or, more exactly, ±1.96) standard errors of the population mean. Because the sample mean lies within ±1.96 standard errors of the population mean 95% of the time, it follows conversely that **the population mean lies within ±1.96 standard errors of the sample mean 95% of the time**. These limits of ±1.96 standard errors are called the **confidence limits**.

Therefore, 95% confidence limits are approximately equal to the sample mean plus or minus two standard errors. The difference between the upper and lower confidence limits is called the **confidence interval** – sometimes abbreviated as **CI**. Researchers obviously want the confidence interval to be as narrow as possible. The formula for confidence limits shows that to make the confidence interval narrower (for a given level of confidence, such as 95%), **the standard error must be made smaller**.
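A worked sketch of these confidence limits, using a hypothetical single sample mean of 52.0, with σ = 28.87 treated as known and n = 25 assumed; it also shows that quadrupling n halves the interval width:

```python
# 95% confidence limits: sample mean +/- 1.96 standard errors.
import math

sample_mean, sigma, n = 52.0, 28.87, 25   # hypothetical sample mean; known sigma
se = sigma / math.sqrt(n)

lower = sample_mean - 1.96 * se           # lower confidence limit
upper = sample_mean + 1.96 * se           # upper confidence limit
ci_width = upper - lower                  # the confidence interval (CI)
print(round(lower, 1), round(upper, 1), round(ci_width, 1))

# A smaller standard error gives a narrower interval: quadrupling n
# halves the SE, so the CI width shrinks by a factor of 2.
se_big = sigma / math.sqrt(4 * n)
print(round(ci_width / (2 * 1.96 * se_big), 1))   # width ratio
```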

**Estimating the Standard Error**

According to the formula above, we cannot calculate the standard error unless we know the population standard deviation (σ). In practice, σ will not be known: researchers hardly ever know the standard deviation of the population (and if they did, they would probably not need to use inferential statistics anyway).

As a result, the standard error cannot be calculated, and so z scores cannot be used. However, **the standard error can be estimated using data that are available from the sample alone**. The resulting statistic is the estimated standard error of the mean, usually called the estimated standard error, given by SE(est) = S/√n, where S is the sample standard deviation.
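Computing the estimated standard error from sample data alone; the sample values below are made up for illustration:

```python
# Estimated standard error: S / sqrt(n), using only the sample itself.
import math
import statistics

sample = [12, 55, 74, 30, 61, 18, 90, 44, 67, 25]   # hypothetical sample, n = 10
n = len(sample)
S = statistics.stdev(sample)       # sample SD (n - 1 in the denominator)

estimated_se = S / math.sqrt(n)
print(round(S, 2), round(estimated_se, 2))
```

Note the use of `statistics.stdev` (the sample standard deviation) rather than `statistics.pstdev` (the population standard deviation), since only sample data are available.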

**t Scores**

**The estimated standard error is used to find a statistic, t, that can be used in place of the z score**. The t score, rather than the z score, must be used when making inferences about means that are based on estimates of population parameters rather than on the population parameters themselves. The t score is Student's t, which is calculated in much the same way as the z score: t = (x̄ − μ)/SE(est). But while z was expressed in terms of the number of standard errors by which a sample mean lies above or below the population mean, t is expressed in terms of the number of estimated standard errors by which the sample mean lies above or below the population mean.

Just as z score tables give the proportions of the normal distribution that lie above and below any given z score, t score tables provide the same information for any given t score. However, there is one difference: while the value of z for any given proportion of the distribution is constant, **the value of t for any given proportion is not constant; it varies according to sample size**. When the sample size is large (n > 100), the values of t and z are similar, but as samples get smaller, t and z scores become increasingly different.
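This divergence between t and z for small samples can be demonstrated by simulation, without any t table: generate many t statistics from small normal samples and take the empirical 97.5th percentile. The sample size n = 5 and the 100,000 repetitions are arbitrary choices:

```python
# Show that for small n, the 97.5th percentile of t exceeds z = 1.96.
import math
import random
import statistics

random.seed(3)
mu, sigma, n, reps = 0.0, 1.0, 5, 100_000

t_stats = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = statistics.mean(sample)
    s = statistics.stdev(sample)                 # estimated, not known, SD
    t_stats.append((m - mu) / (s / math.sqrt(n)))

t_stats.sort()
t_crit = t_stats[int(0.975 * reps)]   # empirical 97.5th percentile of t
print(round(t_crit, 2))               # near 2.78 (t with 4 df), well above 1.96
```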

**Degrees of Freedom and t Tables**

**Table 2-1** is an abbreviated t score table that shows the values of t corresponding to different areas under the t distribution for various sample sizes. Sample size (n) is not stated directly in t score tables; instead, the tables express sample size in terms of **degrees of freedom** (df). The mathematical concept behind degrees of freedom is complex and not needed for the purposes of the USMLE or of understanding statistics in medicine: for present purposes, df can be defined as simply equal to n – 1. Therefore, to determine the values of t that delineate the central 95% of the sampling distribution of means based on a sample size of 15, we would look in the table for the appropriate value of t for df = 14; this is sometimes written as t14. Table 2-1 shows that this value is 2.145.
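Putting the table value to use: a sketch of 95% confidence limits for n = 15, using t14 = 2.145 from Table 2-1. The sample mean of 100.0 and sample standard deviation S = 14.0 are hypothetical numbers chosen for illustration:

```python
# 95% confidence limits with an estimated SE: mean +/- t(df) * SE(est).
import math

sample_mean, S, n = 100.0, 14.0, 15   # hypothetical sample summary
t_14 = 2.145                          # from Table 2-1, for df = n - 1 = 14

se_est = S / math.sqrt(n)             # estimated standard error
lower = sample_mean - t_14 * se_est
upper = sample_mean + t_14 * se_est
print(round(lower, 1), round(upper, 1))
```

Note that 2.145 is wider than the z-based 1.96, reflecting the extra uncertainty from estimating the standard error out of the sample itself.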

As n becomes larger (100 or more), the values of t are very close to the corresponding values of z.