## How to compute the expected 95% CI

The Random Sampling Distribution of Means

Imagine you have a hat containing 100 cards, numbered from 0 to 99. At random, you take out five cards, record the number written on each one, and find the mean of these five numbers. Then you put the cards back in the hat and draw another random sample, repeating the same process for about 10 minutes.

Do you expect that the means of each of these samples will be exactly the same? Of course not. Because of sampling error, they vary somewhat. If you plot all the means on a frequency distribution, the sample means form a distribution, called the random sampling distribution of means. If you actually try this, you will notice that this distribution looks very much like a normal distribution. If you continued drawing samples and plotting their means ad infinitum, the distribution would become very close to normal, and it would get closer still with larger samples. This holds true even though the underlying population is not normally distributed at all: in our hat there is exactly one card with each number, so the population distribution is actually rectangular (flat), yet the random sampling distribution of means still tends to be normal.
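The card experiment is easy to simulate. The following is a minimal Python sketch. The function name `sampling_distribution_of_means`, the seed, and the choice of 10,000 samples are illustrative assumptions and not part of the original example. It draws samples of five cards without replacement, records each sample mean, and prints a crude text histogram. The bell shape should be visible even though the population of cards is rectangular.

```python
import random
import statistics

def sampling_distribution_of_means(population, sample_size, n_samples, seed=None):
    """Draw n_samples random samples (each without replacement, like cards
    from the hat) and return the list of their means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

hat = list(range(100))  # cards numbered 0 to 99: a flat (rectangular) population
means = sampling_distribution_of_means(hat, sample_size=5, n_samples=10_000, seed=1)

print(f"population mean:          {statistics.mean(hat):.2f}")
print(f"mean of the sample means: {statistics.mean(means):.2f}")

# Crude text histogram: one '#' per 50 sample means in each bin of width 10
counts = [0] * 10
for m in means:
    counts[min(int(m // 10), 9)] += 1
for i, c in enumerate(counts):
    print(f"[{i * 10:2d}, {i * 10 + 10:3d}) {'#' * (c // 50)}")
```

The mean of the sample means should come out very close to the population mean of 49.5, and the middle bins should be far fuller than the outer ones.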

These principles are stated by the central limit theorem. According to the theorem, the random sampling distribution of means tends to be normal, irrespective of the shape of the population distribution from which the samples were drawn, and this approximation improves as the sample size increases. The mean of the random sampling distribution of means is equal to the mean of the original population.

Like all distributions, the random sampling distribution of means has not only a mean but also a standard deviation. This particular standard deviation (the standard deviation of the population of all possible sample means) has its own name: the standard error, or standard error of the mean. It measures the extent to which sample means deviate from the true population mean.

When repeated random samples are drawn from a population, most of the sample means cluster around the population mean. If each sample consisted of just two cards, what would happen to the shape of the random sampling distribution of means? With an n of just 2, there would be quite a high chance of any particular sample mean falling out toward the tails of the distribution, giving a broader, flatter curve and hence a higher standard error. On the other hand, if each sample consisted of 25 cards (n = 25), it would be very unlikely for many of the means to lie far from the center of the curve, giving a much narrower curve and a lower standard error.

So the shape of the random sampling distribution of means, as reflected by its standard error, is affected by the size of the samples. In fact, the standard error is equal to the population standard deviation (σ) divided by the square root of the sample size (n):

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
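To see how the standard error shrinks as the samples grow, σ/√n can be applied to the hat of cards. This is a small illustrative sketch, not a prescribed procedure. The helper `standard_error` is a hypothetical name. The formula is applied as if cards were replaced after each draw. Strictly, drawing without replacement from only 100 cards makes the true value slightly smaller.

```python
import math
import statistics

hat = list(range(100))
sigma = statistics.pstdev(hat)  # population SD (divides by N): about 28.87

def standard_error(sigma, n):
    """Standard error of the mean for samples of size n: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (2, 5, 25):
    print(f"n = {n:2d}: standard error = {standard_error(sigma, n):.2f}")
```

With these numbers the standard error falls from about 20.4 for n = 2 to about 5.8 for n = 25. This matches the broader and narrower curves described for small and large samples.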

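Using the Standard Error

Because the random sampling distribution of means is normal, the position of any sample mean $\bar{X}$ can be expressed as a z score. This is the number of standard errors by which the sample mean lies above or below the population mean:

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

It is therefore possible to find the limits between which 95% of all possible random sample means would be expected to fall. The central 95% of a normal distribution lies between z = -1.96 and z = +1.96, so these limits are $\mu \pm 1.96\,\sigma_{\bar{X}}$.

Estimating the Mean of a Population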

It has been shown that 95% of all possible sample means lie within approximately ±2 (or, more exactly, ±1.96) standard errors of the population mean. The sample mean lies within ±1.96 standard errors of the population mean 95% of the time. Conversely, in 95% of samples the population mean lies within ±1.96 standard errors of the sample mean. These limits of ±1.96 standard errors around the sample mean are called the 95% confidence limits:

$$\bar{X} \pm 1.96\,\sigma_{\bar{X}}$$

Therefore, 95% confidence limits are approximately equal to the sample mean plus or minus two standard errors. The range between the lower and upper confidence limits is called the confidence interval, sometimes abbreviated as CI. Researchers obviously want the confidence interval to be as narrow as possible. The formula for the confidence limits shows that to make the confidence interval narrower (for a given level of confidence, such as 95%), the standard error must be made smaller. Since $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, in practice this usually means increasing the sample size.
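When σ is known, the confidence limits are simply the sample mean plus or minus 1.96 standard errors. Below is a minimal sketch that assumes a known population standard deviation. The function name `z_confidence_interval` and the numbers in the usage line are hypothetical. The sketch uses `statistics.NormalDist` from the Python standard library to obtain the 1.96 critical value rather than hard-coding it.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Confidence limits for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    se = sigma / math.sqrt(n)                  # standard error of the mean
    return sample_mean - z * se, sample_mean + z * se

# Hypothetical example: sample mean 120, known sigma 15, sample size 25
low, high = z_confidence_interval(120, 15, 25)
print(f"95% CI: {low:.2f} to {high:.2f}")      # 95% CI: 114.12 to 125.88
```

Quadrupling n to 100 would halve the standard error, and so halve the width of the interval.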

Estimating the Standard Error

According to the formula above, we cannot calculate standard error unless we know population standard deviation (σ). In practice, σ will not be known: researchers hardly ever know the standard deviation of the population (and if they did, they would probably not need to use inferential statistics anyway).

As a result, the standard error cannot be calculated, and so z scores cannot be used. However, the standard error can be estimated using data that are available from the sample alone. The resulting statistic is the estimated standard error of the mean, usually called the estimated standard error:

$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

where s is the sample standard deviation.
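In code, the estimated standard error needs only the sample itself. This is a minimal sketch with a hypothetical helper name and made-up data. It relies on `statistics.stdev`, which uses the n - 1 divisor that the sample standard deviation s requires.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [4, 8, 6, 5, 7]  # hypothetical measurements
print(round(estimated_standard_error(sample), 4))  # 0.7071
```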

t Scores

The estimated standard error is used to find a statistic, t, that can be used in place of the z score. The t score, rather than the z score, must be used when making inferences about means that are based on estimates of population parameters rather than on the population parameters themselves. The t score is Student’s t, which is calculated in much the same way as the z score. But z is expressed in terms of the number of standard errors by which a sample mean lies above or below the population mean. By contrast, t is expressed in terms of the number of estimated standard errors:

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$

Just as z score tables give the proportions of the normal distribution that lie above and below any given z score, t score tables provide the same information for any given t score. However, there is one difference. The value of z for any given proportion of the distribution is constant, but the value of t for any given proportion is not: it varies according to sample size. When the sample size is large (n > 100), the values of t and z are similar, but as samples get smaller, t and z scores become increasingly different.
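The definition of t translates directly into a few lines of Python. This sketch reuses made-up data. The function name `t_score` and the hypothesised population mean of 5 are assumptions for illustration only.

```python
import math
import statistics

def t_score(sample, mu):
    """Number of estimated standard errors by which the sample mean lies
    above (positive) or below (negative) the population mean mu."""
    estimated_se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu) / estimated_se

sample = [4, 8, 6, 5, 7]  # hypothetical data, sample mean 6
print(round(t_score(sample, mu=5), 3))  # 1.414
```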

Degrees of Freedom and t Tables

Table 2-1 (right-upper) is an abbreviated t score table that shows the values of t corresponding to different areas under the t distribution for various sample sizes. Sample size (n) is not stated directly in t score tables. Instead, the tables express sample size in terms of degrees of freedom (df). The mathematical concept behind degrees of freedom is complex and not needed for the purposes of USMLE or for understanding statistics in medicine. For present purposes, df can be defined simply as n - 1. Therefore, to determine the values of t that delineate the central 95% of the sampling distribution of means based on a sample size of 15, we would look in the table for the value of t for df = 14; this is sometimes written as $t_{14}$. Table 2-1 shows that this value is 2.145.
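Putting the pieces together, a 95% confidence interval based on t replaces 1.96 with the tabulated value of t for n - 1 degrees of freedom. The sketch below hard-codes a small excerpt of standard two-sided 95% t values, including the 2.145 for df = 14 quoted above, instead of relying on a statistics package. The dictionary name `T_95`, the function name, and the sample (the numbers 1 to 15) are illustrative assumptions.

```python
import math
import statistics

# Two-sided 95% critical values of t, excerpted from standard t tables
T_95 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 29: 2.045, 99: 1.984}

def t_confidence_interval(sample):
    """95% CI for the population mean when sigma must be estimated.
    Raises KeyError if n - 1 is not in the T_95 excerpt."""
    n = len(sample)
    t = T_95[n - 1]                                   # look up t for df = n - 1
    estimated_se = statistics.stdev(sample) / math.sqrt(n)
    mean = statistics.mean(sample)
    return mean - t * estimated_se, mean + t * estimated_se

sample = list(range(1, 16))  # hypothetical sample of size 15, so df = 14
low, high = t_confidence_interval(sample)
print(f"95% CI: {low:.2f} to {high:.2f}")  # 95% CI: 5.52 to 10.48
```

Using z = 1.96 instead would give a slightly narrower interval. With a sample this small, that would understate the uncertainty that comes from estimating σ.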

As n becomes larger (100 or more), the values of t are very close to the corresponding values of z.

## Basic Concepts in Statistics

Three Kinds of Data

There are three types of data: interval data, in which variables are measured on a scale with constant intervals; nominal (categorical) data; and ordinal data. For interval data, the absolute difference between two values can always be determined by subtraction. Interval variables include temperature, blood pressure, height, weight, and so on.

Other data, such as gender, state of birth, or whether or not a person has a certain disease, are not measured on an interval scale. These variables are examples of nominal or categorical data, where individuals are classified into two or more mutually exclusive and exhaustive categories. For example, people could be categorised as male or female, dead or alive, or as being born in one of the 50 states, the District of Columbia, or outside the United States. In every case, it is possible to categorise each individual into one and only one category. In addition, there is no arithmetic relationship or even ordering between the categories.

Ordinal data fall between interval and nominal data. Like nominal data, ordinal data fall into categories, but there is an inherent ordering (or ranking) of the categories. Level of health (excellent, very good, good, fair, or poor) is a common example of a variable measured on an ordinal scale. The different values have a natural order, but the differences or “distances” between adjoining values on an ordinal scale are not necessarily the same. For example, excellent health is better than very good health, but this difference is not necessarily the same as the difference between fair and poor health. Indeed, these differences may not even be strictly comparable.
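One way to see the difference between the three kinds of data is to look at which operations make sense on each. The following Python sketch is only an illustration. The class names, category labels, and readings are hypothetical.

```python
from enum import Enum, IntEnum

# Interval data: differences are meaningful, so subtraction makes sense
systolic_bp = [118.0, 135.0]  # hypothetical readings in mmHg
print(systolic_bp[1] - systolic_bp[0])  # 17.0

# Nominal data: mutually exclusive categories with no order or arithmetic
class VitalStatus(Enum):
    ALIVE = "alive"
    DEAD = "dead"

try:
    VitalStatus.ALIVE < VitalStatus.DEAD
except TypeError:
    print("nominal categories cannot be ordered")

# Ordinal data: categories with an inherent order
class Health(IntEnum):
    POOR = 1
    FAIR = 2
    GOOD = 3
    VERY_GOOD = 4
    EXCELLENT = 5

print(Health.EXCELLENT > Health.VERY_GOOD)  # True
# The integer codes only record the order: the fact that
# EXCELLENT - VERY_GOOD == FAIR - POOR is an artefact of the coding,
# not evidence that the two differences in health are equal.
```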

The Normal Distribution

If each observed measurement is the sum of many independent, small random factors, the resulting measurements will tend to follow a normal (Gaussian) distribution. Such a distribution is completely defined by the population mean μ and the population standard deviation σ. If the values follow a normal distribution, μ, σ, and the size of the population are all the information one needs to describe the population fully.
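The claim that sums of many small independent random factors end up approximately normal can be checked by simulation. In this minimal sketch, each hypothetical “measurement” is the sum of 50 factors drawn uniformly from -1 to 1, an arbitrary choice made for illustration. The fitted `statistics.NormalDist` is specified by nothing more than μ and σ. The observed share of values within one standard deviation of the mean is then compared with the roughly 68% that the normal model predicts.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(42)

# Each hypothetical measurement is the sum of 50 small, independent random factors
measurements = [sum(rng.uniform(-1, 1) for _ in range(50)) for _ in range(20_000)]

mu = statistics.fmean(measurements)
sigma = statistics.pstdev(measurements)
model = NormalDist(mu, sigma)  # a normal distribution is fully defined by mu and sigma

observed = sum(abs(x - mu) <= sigma for x in measurements) / len(measurements)
expected = model.cdf(mu + sigma) - model.cdf(mu - sigma)
print(f"within 1 SD: observed {observed:.3f}, normal model {expected:.3f}")
```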

Getting The Data

We could get the data by examining every single member of the population. Usually, however, it is physically or financially impossible to do this. We are then limited to examining a sample of n individuals drawn from the population, in the hope that it is representative of the complete population. Without knowledge of the entire population, we cannot know the population mean μ or the population standard deviation σ. Nevertheless, we can estimate them from the sample, provided that the sample is “representative” of the population from which it is drawn.

• Random Sample

The statistical methods discussed here are built on the assumption that the individuals included in your sample represent a random sample from the underlying (and unobserved) population. In a random sample, every member of the population has an equal probability (chance) of being selected for the sample. The most direct way to create a simple random sample is to obtain a list of every member of the population of interest and number them from 1 to N (where N is the number of population members). A computerised random number generator is then used to select the n individuals for the sample. Every number has the same chance of appearing, and there is no relationship between adjacent numbers. To create several random samples, we simply keep selecting members that have not been selected before. The important point is never to reuse a sequence of random numbers that has already been used. In this way, we ensure that every member of the population is equally likely to be selected for observation in the sample.
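With a computerised random number generator this takes only a few lines. The sketch below uses Python's `random.sample`, which draws without replacement, so each member is equally likely to be chosen and none is chosen twice. The population size, sample size, seed, and the helper `several_random_samples` are hypothetical. Shuffling the numbered list once and cutting it into consecutive blocks is one way to obtain several samples that never reuse a member.

```python
import random

N = 1_000                                # hypothetical population size
population_ids = list(range(1, N + 1))   # members numbered 1 to N (the sampling frame)
rng = random.Random(2024)                # fixed seed only so results can be reproduced

# One simple random sample of n = 20 members
sample_ids = rng.sample(population_ids, k=20)

def several_random_samples(ids, n, how_many, rng):
    """Return how_many non-overlapping simple random samples of size n."""
    shuffled = ids[:]
    rng.shuffle(shuffled)
    return [shuffled[i * n:(i + 1) * n] for i in range(how_many)]

samples = several_random_samples(population_ids, n=20, how_many=5, rng=rng)
```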

The list of population members from which we draw the random sample is known as the sampling frame. Sometimes it is possible to obtain such a list (for example, a list of all people hospitalised in a given hospital on a given day), but often no such list exists. When there is no list, investigators use other techniques for creating a random sample, such as dialing telephone numbers at random for public opinion polling or selecting geographic locations at random from maps. How the sampling frame is constructed can strongly affect how well, and to whom, the results of a given study generalize beyond the specific individuals in the sample.

The procedure described above is known as simple random sampling, in which individuals are selected at random from the population as a whole. Alternatively, investigators sometimes use stratified random samples. They first divide the population into different subgroups (perhaps based on gender, race, or geographic location), then construct simple random samples within each subgroup (stratum). This procedure is used when the subpopulations contain widely varying numbers of people. In that situation, a simple random sample large enough to give adequate numbers in the smaller subgroups would collect more data than necessary in the larger subpopulations. Stratification reduces data collection costs by reducing the total sample size necessary to obtain the desired precision in the results, but it makes the data analysis more complicated. The basic requirement is the same as in a simple random sample: each member of a given subpopulation (stratum) must have the same chance of being selected.
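A stratified random sample is simply a simple random sample taken separately within each stratum. This is a minimal sketch, assuming the strata are already known and listed. The stratum names, sizes, and per-stratum sample sizes are hypothetical. Members of the small stratum end up with a higher chance of selection than members of the large one, which is why the later analysis has to take the sampling design into account.

```python
import random

def stratified_random_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of the requested size within each stratum.

    strata maps a stratum name to the list of its members;
    n_per_stratum maps the same names to the sample size wanted there.
    """
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

# Hypothetical strata of very different sizes
strata = {
    "large_region": [f"L{i}" for i in range(5_000)],
    "small_region": [f"S{i}" for i in range(200)],
}
sample = stratified_random_sample(
    strata, {"large_region": 50, "small_region": 50}, random.Random(3))

for name, chosen in sample.items():
    print(name, len(chosen), "of", len(strata[name]))
```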

• The Mean and Standard Deviation

Having obtained a random sample from a population of interest, we are ready to use information from that sample to estimate the characteristics of the underlying population.

The sample mean is the sum of the values divided by the number of observations in the sample:

$$\bar{X} = \frac{\sum X}{n}$$

in which the bar over the X denotes that it is the mean of the n observations of X.

The estimate of the population standard deviation is called the sample standard deviation, s or SD, and is defined as

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

The definition of the sample standard deviation differs from the definition of the population standard deviation σ in two ways:

1. the population mean μ has been replaced by our estimate of it, the sample mean $\bar{X}$; and
2. we compute the “average” squared deviation of a sample by dividing by n - 1 rather than n.

The precise reason for dividing by n - 1 rather than n requires substantial mathematical arguments, but we can present the following intuitive justification. Because the deviations are measured from the sample mean rather than from the true population mean, a sample tends to show less variability than the entire population. Dividing by n - 1 instead of n compensates for the resulting tendency of the sample standard deviation to underestimate the population standard deviation.

In conclusion, suppose you are willing to assume that the sample was drawn from a normal distribution. In that case, summarise the data with the sample mean and sample standard deviation, the best estimates of the population mean and population standard deviation, because these two parameters completely define the normal distribution. When there is evidence that the population under study does not follow a normal distribution, summarise the data with the median and upper and lower percentiles, discussed later in this thread.
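The sample mean and sample standard deviation are short enough to compute directly and compare with Python's `statistics` module. In this sketch the data are made up and the helper names are hypothetical. `statistics.stdev` divides by n - 1 (the sample standard deviation), while `statistics.pstdev` divides by n, so the two differ exactly as described above.

```python
import math
import statistics

def sample_mean(xs):
    """Sum of the values divided by the number of observations."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Square root of the sum of squared deviations from the sample mean, divided by n - 1."""
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

data = [4, 8, 6, 5, 7]  # hypothetical observations
print(sample_mean(data))                  # 6.0
print(round(sample_sd(data), 4))          # 1.5811
print(round(statistics.stdev(data), 4))   # 1.5811 (divides by n - 1)
print(round(statistics.pstdev(data), 4))  # 1.4142 (divides by n)
```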

• Standard Error of The Mean/How Good Are These Estimates (sample mean and sample standard deviation)

The mean and standard deviation computed from a random sample are estimates of the mean and standard deviation of the entire population from which the sample was drawn. There is nothing special about the specific random sample used to compute these statistics, and different random samples will yield slightly different estimates of the true population mean and standard deviation. To quantitate how accurate these estimates are likely to be, we can compute their standard errors. It is possible to compute a standard error for any statistic, but here we shall focus on the standard error of the mean. This statistic quantifies the certainty with which the mean computed from a random sample estimates the true mean of the population from which the sample was drawn.

Conceptually, the standard error of the mean is found by taking two or more samples of the same size and computing the mean of the individuals in each sample. For example, if you have four samples, you would get four means. The standard error of the mean could then be calculated by applying the equation for the SD, described above, to these means. In practice, it is usually estimated from a single sample as $s/\sqrt{n}$. We denote the standard error of the mean $\sigma_{\bar{X}}$.
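The procedure described above can be sketched with four small, made-up samples of five observations each; the data are hypothetical. With so few samples the result is only a rough estimate. It is shown here to make the definition concrete.

```python
import statistics

# Four hypothetical samples, all of the same size (n = 5)
samples = [
    [4, 8, 6, 5, 7],
    [5, 9, 7, 6, 8],
    [3, 7, 5, 4, 6],
    [6, 6, 7, 5, 6],
]
means = [statistics.mean(s) for s in samples]  # one mean per sample: 6, 7, 5, 6
se_of_mean = statistics.stdev(means)           # SD of the sample means
print(round(se_of_mean, 4))                    # 0.8165
```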

The standard deviation of an original sample of a given size is an estimate of the variability of the whole population. In the same way, $\sigma_{\bar{X}}$ is an estimate of the variability of the possible values of means of samples of that size. Because extreme values tend to balance each other when a mean is computed, there is less variability in the sample means than in the original population (that is, SD > $\sigma_{\bar{X}}$). $\sigma_{\bar{X}}$ is a measure of the precision with which a sample mean $\bar{X}$ estimates the population mean μ.

The standard error of the mean tells us not about variability in the original population, as the standard deviation does, but about the certainty with which a sample mean estimates the true population mean.

Since the precision with which we can estimate the mean increases as the sample size increases, the standard error of the mean decreases as the sample size increases. Conversely, the more variability in the original population, the more variability will appear in possible mean values of samples. Therefore, the standard error of the mean increases as the population standard deviation increases. The true standard error of the mean of samples of size n drawn from a population with standard deviation σ is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Mathematicians have shown that, provided the samples are not too small, the distribution of mean values will always approximately follow a normal distribution. This holds regardless of how the population from which the original samples were drawn is distributed. This result is what statisticians call the Central Limit Theorem. It says:

• The distribution of sample means will be approximately normal regardless of the distribution of values in the original population from which the samples were drawn.
• The mean value of the collection of all possible sample means will equal the mean of the original population.
• The standard deviation of the collection of all possible means of samples of a given size, called the standard error of the mean, depends on both the standard deviation of the original population and the size of the sample.
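The three statements can be checked by simulation with a population that is far from normal. In this sketch, the exponential population, the sample sizes, the 5,000 repetitions, and the seed are all arbitrary choices for illustration. The population has mean 1 and standard deviation 1 and is strongly right-skewed. As n increases, three things should happen. The mean of the sample means should stay near 1. Their standard deviation should track σ/√n. Their skewness (0 for a normal distribution) should shrink toward 0.

```python
import math
import random
import statistics

rng = random.Random(0)

def skewness(xs):
    """Average cubed z score: 0 for a symmetric distribution such as the normal."""
    m, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)

results = {}
for n in (4, 16, 64):
    # 5,000 samples of size n from an exponential population (mean 1, SD 1)
    means = [math.fsum(rng.expovariate(1.0) for _ in range(n)) / n
             for _ in range(5_000)]
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    m, sd, sk = results[n]
    print(f"n = {n:2d}: mean of means {m:.3f}, SD of means {sd:.3f} "
          f"(sigma/sqrt(n) = {1 / math.sqrt(n):.3f}), skewness {sk:.2f}")
```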