The data in a case-control study represent two samples: The cases are drawn from a population of people who have the disease and the controls from a population of people who do not have the disease. The predictor variable (risk factor) is measured, and the results can be summarized in a 2 × 2 table like the following one:

| | Disease present (cases) | Disease absent (controls) |
|---|---|---|
| Risk factor present | a | b |
| Risk factor absent | c | d |

If this 2 × 2 table represented data from a cohort study, then the incidence of the disease in those with the risk factor would be a/(a + b) and the relative risk would be simply [a/(a + b)]/[c/(c + d)]. However, it is not appropriate to compute either incidence or relative risk in this way in a case-control study **because the two samples are not drawn from the population in the same proportions**. Usually, there are roughly equal numbers of cases and controls in the study samples but many fewer cases than controls in the population. Instead, relative risk in a case-control study can be approximated by the odds ratio, computed as the cross-product of the 2 × 2 table, ad/bc.
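As a minimal sketch of the cross-product calculation, the Python snippet below computes ad/bc for a made-up case-control sample of 100 cases and 100 controls. The counts and the function names `odds_ratio` and `cohort_style_relative_risk` are illustrative assumptions, not data or terminology from any study.

```python
from fractions import Fraction


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc of a 2 x 2 table.

    a: risk factor present, disease present (exposed cases)
    b: risk factor present, disease absent  (exposed controls)
    c: risk factor absent,  disease present (unexposed cases)
    d: risk factor absent,  disease absent  (unexposed controls)
    """
    return Fraction(a * d, b * c)


def cohort_style_relative_risk(a, b, c, d):
    """[a/(a + b)] / [c/(c + d)]; valid for cohort data only."""
    return Fraction(a, a + b) / Fraction(c, c + d)


# Hypothetical sample: 100 cases (a + c) and 100 controls (b + d).
a, b, c, d = 60, 30, 40, 70

print("odds ratio ad/bc:", float(odds_ratio(a, b, c, d)))
# The next figure is NOT a valid relative risk here, because the ratio of
# cases to controls was fixed by the investigator, not by the population.
print("cohort-style ratio:", float(cohort_style_relative_risk(a, b, c, d)))
```

With these counts the odds ratio is 3.5, while the cohort-style formula gives about 1.83, a number that has no interpretation for case-control data.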

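This extremely useful fact is difficult to grasp intuitively but easy to demonstrate algebraically. Consider the situation for the full population, represented by a’, b’, c’, and d’:

| | Disease present | Disease absent |
|---|---|---|
| Risk factor present | a’ | b’ |
| Risk factor absent | c’ | d’ |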

Here it is appropriate to calculate the risk of disease among people with the risk factor as a’/(a’ + b’), the risk among those without the risk factor as c’/(c’ + d’), and the relative risk as [a’/(a’ + b’)]/[c’/(c’ + d’)]. We have already discussed the fact that a’/(a’ + b’) is not equal to a/(a + b). However, if the disease is relatively uncommon in both those with and without the risk factor (as most are), then a’ is much smaller than b’, and c’ is much smaller than d’. This means that a’/(a’ + b’) is closely approximated by a’/b’ and that c’/(c’ + d’) is closely approximated by c’/d’. Therefore, the relative risk in the population can be approximated as follows:

$$\frac{a'/(a' + b')}{c'/(c' + d')} \approx \frac{a'/b'}{c'/d'}$$

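The latter term is the odds ratio of the population (literally, the ratio of the odds of disease in those with the risk factor, a’/b’, to the odds of disease in those without the risk factor, c’/d’). It can be rearranged as the cross-product:

$$\frac{a'/b'}{c'/d'} = \frac{a'd'}{b'c'} = \left(\frac{a'}{c'}\right)\left(\frac{d'}{b'}\right)$$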

**a’/c’ in the population equals a/c in the sample if the cases are representative of all cases in the population** (i.e., have the same prevalence of the risk factor). Similarly, b’/d’ equals b/d if the controls are representative.

Therefore, the population parameters in this last term can be replaced by their sample counterparts, and we are left with the fact that the odds ratio observed in the sample, ad/bc, is a close approximation of the relative risk in the population, [a’/(a’ + b’)]/[c’/(c’ + d’)], provided that the disease is rare.
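The argument can be checked numerically. The sketch below uses invented population counts (all numbers are assumptions chosen for easy arithmetic) to show three things: for a rare disease the population odds ratio is close to the relative risk, a representative case-control sample reproduces that odds ratio exactly, and for a common disease the approximation breaks down.

```python
from fractions import Fraction


def relative_risk(a, b, c, d):
    """[a/(a + b)] / [c/(c + d)]: risk in exposed over risk in unexposed."""
    return Fraction(a, a + b) / Fraction(c, c + d)


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return Fraction(a * d, b * c)


# Hypothetical population, rare disease: risk is 2% in the 10,000 people
# with the risk factor and 1% in the 30,000 people without it.
rare_population = (200, 9_800, 300, 29_700)  # a', b', c', d'
population_rr = relative_risk(*rare_population)
population_or = odds_ratio(*rare_population)

# Case-control sample: every case, plus 1 in 50 of the people without the
# disease as controls, so a/c = a'/c' and b/d = b'/d'.
a_pop, b_pop, c_pop, d_pop = rare_population
sample = (a_pop, b_pop // 50, c_pop, d_pop // 50)  # (200, 196, 300, 594)
sample_or = odds_ratio(*sample)
naive_sample_rr = relative_risk(*sample)  # not a valid estimate

# Hypothetical population, common disease: risks of 40% and 20%.
common_population = (4_000, 6_000, 6_000, 24_000)
common_rr = relative_risk(*common_population)
common_or = odds_ratio(*common_population)

print(f"rare:   RR = {float(population_rr):.3f}, OR = {float(population_or):.3f}")
print(f"sample: OR = {float(sample_or):.3f}, naive RR = {float(naive_sample_rr):.3f}")
print(f"common: RR = {float(common_rr):.3f}, OR = {float(common_or):.3f}")
```

In the rare-disease population the relative risk is 2 and the odds ratio is about 2.02, and the sample odds ratio is identical to the population odds ratio, whereas the cohort-style formula applied to the sample gives about 1.51. In the common-disease population the relative risk is still 2 but the odds ratio rises to about 2.67.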

**Why can't you calculate risk in a case-control study?**

For most people, the risk of some particular outcome, being akin to probability, makes more sense and is easier to interpret than the odds for that same outcome. To calculate the risk, you need to know two things: the total number of people who were exposed to the risk factor and the number of those who went on to have the outcome. You would then divide the latter by the former. In a cohort study, for example, you start with healthy individuals and follow them to measure the proportion of those exposed to the risk factor who subsequently develop the illness. This proportion is an estimate of the risk in the population.

However, in a case-control study, you select on the basis of whether people have some illness or condition or not. So you have one group composed of individuals who've had an illness, and one group who have not had the illness, but **both groups will contain individuals who were, and were not, exposed to the risk**. Moreover, **you can select whatever number of cases and controls you want**. You could, for example, halve the number of cases and double the number of controls. This means that the totals of exposed and unexposed individuals, which you would otherwise need as denominators for your risk calculation, are meaningless: they depend on how many cases and controls you chose to recruit. As a result, the population at risk cannot be estimated using a case-control study, and so risks and risk ratios cannot be calculated.
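The effect of choosing different numbers of cases and controls can be illustrated with a short sketch. The table counts and the helper `rescale` are hypothetical, introduced only for this example. Halving the cases and doubling the controls changes the apparent "risk" in the exposed group but leaves the odds ratio untouched.

```python
from fractions import Fraction


def apparent_risk_in_exposed(a, b, c, d):
    """a/(a + b): looks like a risk, but depends on the sampling design."""
    return Fraction(a, a + b)


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return Fraction(a * d, b * c)


def rescale(table, case_factor, control_factor):
    """Multiply the case cells (a, c) and the control cells (b, d)."""
    a, b, c, d = table
    return (a * case_factor, b * control_factor,
            c * case_factor, d * control_factor)


# Hypothetical study: 100 cases (60 exposed) and 100 controls (30 exposed).
original = (60, 30, 40, 70)

# Same exposure prevalence in each group, but half as many cases and
# twice as many controls: (30, 60, 20, 140).
resampled = rescale(original, Fraction(1, 2), Fraction(2))

for name, table in (("original", original), ("resampled", resampled)):
    risk = float(apparent_risk_in_exposed(*table))
    ratio = float(odds_ratio(*table))
    print(f"{name}: apparent risk in exposed = {risk:.3f}, odds ratio = {ratio:.3f}")
```

The apparent risk in the exposed drops from about 0.667 to 0.333 purely because of the recruitment choice, while the odds ratio stays at 3.5 in both tables.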
