Factorial Designs

March 5, 2018 Clinical Trials, Medical Statistics, Research

In this section we will describe the completely randomized factorial design. This design is commonly used when there are two or more factors of interest. Recall, in particular, the difference between an observational study and a designed experiment. Observational studies involve simply observing characteristics and taking measurements, as in a sample survey. A designed experiment involves imposing treatments on experimental units, controlling extraneous sources of variation that might affect the experiment, and then observing characteristics and taking measurements on the experimental units.

Also recall that in an experiment, the response variable is the characteristic of the experimental outcome that is measured or observed. A factor is a variable whose effect on the response variable is of interest to the experimenter. Generally a factor is a categorical variable whose possible values are referred to as the levels of the factor. In a single-factor experiment, we assign experimental units to the treatments (or vice versa). Experimental units should be assigned to the treatments in such a way as to eliminate any bias that might be associated with the assignment. This is generally accomplished by randomly assigning the experimental units to the treatments.

In certain medical experiments, called clinical trials, randomization is essential. To compare two or more methods of treating illness, it is important to eliminate any bias that could be introduced by medical personnel assigning patients to the treatments in a nonrandom fashion. For example, a doctor might erroneously assign patients who exhibit less severe symptoms of the illness to a less risky treatment.

PS: Advantages of randomized design over other methods for selecting controls

  • First, randomization removes the potential for bias in the allocation of participants to the intervention group or to the control group. Such selection bias could easily occur, and cannot necessarily be prevented, in the non-randomized concurrent or historical control study because the investigator or the participant may influence the choice of intervention. This influence can be conscious or subconscious and can be due to numerous factors, including the prognosis of the participant. The direction of the allocation bias may go either way and can easily invalidate the comparison. This advantage of randomization assumes that the procedure is performed in a valid manner and that the assignment cannot be predicted.
  • Second, somewhat related to the first, is that randomization tends to produce comparable groups; that is, measured as well as unknown or unmeasured prognostic factors and other characteristics of the participants at the time of randomization will be, on the average, evenly balanced between the intervention and control groups. This does not mean that in any single experiment all such characteristics, sometimes called baseline variables or covariates, will be perfectly balanced between the two groups. However, it does mean that for independent covariates, whatever the detected or undetected differences that exist between the groups, the overall magnitude and direction of the differences will tend to be equally divided between the two groups. Of course, many covariates are strongly associated; thus, any imbalance in one would tend to produce imbalances in the others.
  • Third, the validity of statistical tests of significance is guaranteed. As has been stated, “although groups compared are never perfectly balanced for important covariates in any single experiment, the process of randomization makes it possible to ascribe a probability distribution to the difference in outcome between treatment groups receiving equally effective treatments and thus to assign significance levels to observed differences.” The validity of the statistical tests of significance is not dependent on the balance of prognostic factors between the randomized groups.

Often in clinical trials, double-blind studies are used. In this type of study, patients (the experimental units) are randomly assigned to treatments, and neither the doctor nor the patient knows which treatment has been assigned to the patient. This is an effective way to eliminate bias in treatment assignment so that the treatment effects are not confounded (associated) with other nonexperimental and uncontrolled factors.

Factorial designs involve two or more factors. Consider the experiment in this example. There the researchers studied the effects of two factors (hydrophilic polymer and irrigation regimen) on weight gain (the response variable) of Golden Torch cacti. The two levels of the polymer factor were: used and not used. The irrigation regimen had five levels to indicate the amount of water usage: none, light, medium, heavy, and very heavy. This is an example of a two-factor or two-way factorial design.

In this experiment every level of polymer occurred with every level of irrigation regimen, for a total of 2 * 5 = 10 treatments. Often these 10 treatments are called treatment combinations to indicate that we combine the levels of the various factors together to obtain the actual collection of treatments. Since, in this case, every level of one factor is combined with every level of the other factor, we say that the levels of one factor are crossed with the levels of the other factor. When all the possible treatment combinations obtained by crossing the levels of the factors are included in the experiment, we call the design a complete factorial design, or simply a factorial design.

It is possible to extend the two-way factorial design to include more factors. For example, in the Golden Torch cacti experiment, the amount of sunlight the cacti receive could have an effect on weight gain. If the amount of sunlight is controlled in the two-way study so that all plants receive the same amount of sunlight, then the amount of sunlight would not be considered a factor in the experiment.

However, since the amount of sunlight a cactus receives might have an effect on its growth, the experimenter might want to introduce this additional factor. Suppose we consider three levels of sunlight: high, medium, and low. The levels of sunlight could be achieved by placing screens of various mesh sizes over the cacti. If amount of sunlight is added as a third factor, there would be 2 * 5 * 3 = 30 different treatment combinations in a complete factorial design.

Possibly we could add even more factors to the experiment to take into account other factors that might affect weight gain of the cacti. Adding more factors will increase the number of treatment combinations for the experiment (unless the added factor has only one level). In general, the total number of treatment combinations for a complete factorial design is the product of the number of levels of all factors in the experiment.
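As a small illustration of the product rule, the sketch below enumerates the treatment combinations for the three-factor cacti example by crossing the levels of each factor. The factor names and level labels come from the text; the dictionary layout and variable names are just one convenient way to write it down.

```python
from itertools import product
from math import prod

# Factor levels as described for the Golden Torch cacti example.
factors = {
    "polymer": ["used", "not used"],
    "irrigation": ["none", "light", "medium", "heavy", "very heavy"],
    "sunlight": ["high", "medium", "low"],
}

# Crossing the factors: every level of each factor is combined
# with every level of all the other factors.
treatment_combinations = list(product(*factors.values()))

# The number of combinations equals the product of the numbers of levels.
n_expected = prod(len(levels) for levels in factors.values())

print(len(treatment_combinations), n_expected)  # 30 30
```

Dropping the "sunlight" entry from the dictionary gives the 10 treatment combinations of the original two-way design.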

Obviously, as the number of factors increases, the number of treatment combinations increases. A large number of factors can result in so many treatment combinations that the experiment is unwieldy, too costly, or too time consuming to carry out. Most complete factorial designs involve only two or three factors.

To handle many factors, statisticians have devised experimental designs that use only a fraction of the total number of possible treatment combinations. These designs are called fractional factorial designs and are usually restricted to the case of all factors having two or three levels each. Fractional factorial designs cannot provide as much information as a complete factorial design, but they are very useful when a large number of factors is involved and the number of experimental units is limited by availability, cost, time, or other considerations. Fractional factorial designs are beyond the scope of this thread.

Once the treatment combinations are determined, the experimental units need to be assigned to the treatment combinations. In a completely randomized design, the experimental units are randomly assigned to the treatment combinations. If this random assignment is not done or is not possible, the treatment effects might become confounded with other uncontrolled factors that would make it difficult or impossible to determine whether an effect is due to the treatment or due to the confounding with uncontrolled factors.
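One way to carry out such an assignment is to shuffle the experimental units and deal them out to the treatment combinations in turn. The sketch below does this for the two-factor cacti design. The number of plants (40), the plant labels, the function name, and the seed are hypothetical choices made for illustration, not details from the study.

```python
import random
from itertools import product

def completely_randomized_assignment(units, treatments, seed=None):
    """Randomly assign units to treatments in (nearly) equal numbers."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the treatments in turn.
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

# The 10 treatment combinations of the two-way cacti design.
treatments = list(product(["used", "not used"],
                          ["none", "light", "medium", "heavy", "very heavy"]))

# Hypothetical: 40 plants, so each treatment combination receives 4.
plants = [f"plant_{i:02d}" for i in range(1, 41)]
assignment = completely_randomized_assignment(plants, treatments, seed=2018)

for plant in plants[:3]:
    print(plant, assignment[plant])
```

Fixing the seed makes the allocation reproducible for auditing; in a real study the allocation list would normally be generated once and kept away from whoever enrolls the units.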

Besides the random assignment of experimental units to treatment combinations, it is important that we use randomization in other ways when conducting an experiment. Often experiments are conducted in sequence. One treatment combination is applied to an experimental unit, and then the next treatment combination is applied to the next experimental unit, and so forth. It is essential that the order in which the experiments are conducted be randomized.

For example, consider an experiment in which measurements are made that are sensitive to heat or humidity. If all experiments associated with the first level of a factor are conducted on a hot and humid day, all experiments associated with the second level of the factor are conducted on a cooler, less humid day, and so on, then the factor effect is confounded with the heat/humidity conditions on the days that the experiments are conducted. If the analysis indicates an effect due to the factor, we do not know whether there is actually a factor effect or a heat/humidity effect (or both). Randomization of the order in which the experiments are conducted would help keep the heat/humidity effect from being confounded with any factor effect.

Experimental and Classification Factors

In the description of designing experiments for factorial designs, we emphasized the idea of being able to assign experimental units to treatment combinations. If the experimental units are assigned randomly to the levels of a factor, the factor is called an experimental factor. If all the factors of a factorial design are experimental factors, we consider the study a designed experiment.

In some factorial studies, however, the experimental units cannot be assigned at random to the levels of a factor, as in the case when the levels of the factor are characteristics associated with the experimental units. A factor whose levels are characteristics of the experimental unit is called a classification factor. If all the factors of a factorial design are classification factors, we consider the study an observational study.

Consider, for instance, the household energy consumption study, in which the response variable is household energy consumption and the factor of interest is the region of the United States in which a household is located. A household cannot be randomly assigned to a region of the country. The region of the country is a characteristic of the household and, thus, a classification factor. If we were to add home type as a second factor, the levels of this factor would also be a characteristic of a household, and, hence, home type would also be a classification factor. This two-way factorial design would be considered an observational study, since both of its factors are classification factors.

There are many studies that involve a mixture of experimental and classification factors. For example, in studying the effect of four different medications on relieving headache pain, the age of an individual might play a role in how long it takes before headache pain dissipates. Suppose a researcher decides to consider four age groups: 21 to 35 years old, 36 to 50 years old, 51 to 65 years old, and 66 years and older. Obviously, since age is a characteristic of an individual, age group is a classification factor.

Suppose that the researcher randomly selects 40 individuals from each age group and then, within each age group, randomly assigns 10 individuals to each of the four medications. Since each person is assigned at random to a medication, the medication factor is an experimental factor. Although one of the factors here is a classification factor and the other is an experimental factor, we would consider this a designed experiment.
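The allocation described here randomizes only the experimental factor, and does so separately within each level of the classification factor. A minimal sketch follows, assuming 40 people per age group and 10 per medication as in the text. The medication labels A to D, the person identifiers, and the function name are hypothetical.

```python
import random

def assign_within_groups(people_by_group, treatments, per_treatment, seed=None):
    """Within each group, randomly assign an equal number of people
    to each treatment. Returns {person: (group, treatment)}."""
    rng = random.Random(seed)
    assignment = {}
    for group, people in people_by_group.items():
        if len(people) != per_treatment * len(treatments):
            raise ValueError(f"group {group!r} has the wrong number of people")
        shuffled = list(people)
        rng.shuffle(shuffled)
        # Consecutive blocks of the shuffled list go to successive treatments.
        for i, treatment in enumerate(treatments):
            block = shuffled[i * per_treatment:(i + 1) * per_treatment]
            for person in block:
                assignment[person] = (group, treatment)
    return assignment

age_groups = ["21-35", "36-50", "51-65", "66+"]
medications = ["A", "B", "C", "D"]  # hypothetical labels

# Hypothetical identifiers for the 40 people sampled from each age group.
people_by_group = {g: [f"{g}_person_{i}" for i in range(40)] for g in age_groups}

assignment = assign_within_groups(people_by_group, medications,
                                  per_treatment=10, seed=1)
print(len(assignment))  # 160
```

Age group is never randomized: a person's group is fixed by who they are, which is what makes it a classification factor.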

Fixed and Random Effect Factors

There is another important way to classify factors that depends on the way the levels of a factor are selected. If the levels of a factor are the only levels of interest to the researcher, then the factor is called a fixed effect factor. For example, in the Golden Torch cacti experiment, both factors (polymer and irrigation regimen) are fixed effect factors because the levels of each factor are the only levels of interest to the experimenter.

If the levels of a factor are selected at random from a collection of possible levels, and if the researcher wants to make inferences to the entire collection of possible levels, the factor is called a random effect factor. For example, consider a study to be done on the effect of different types of advertising on sales of a new sandwich at a national fast-food chain. The marketing group conducting the study feels that the city in which a franchise store is located might have an effect on sales. So they decide to include a city factor in the study, and randomly select eight cities from the collection of cities in which the company’s stores are located. They are not interested in these eight cities alone, but want to make inferences to the entire collection of cities. In this case the city factor is a random effect factor.

Evaluate The Article About Therapy (Randomized Trials)

January 28, 2016 Clinical Trials, Evidence-Based Medicine

Section 1 How Serious Is The Risk of Bias?

Did Intervention and Control Groups Start With The Same Prognosis?

Consider the question of whether hospital care prolongs life. A study finds that more sick people die in the hospital than in the community. We would easily reject the naive conclusion that hospital care kills people because we recognize that hospitalized patients are sicker (worse prognosis) than patients in the community. Although the logic of prognostic balance is vividly clear in comparing hospitalized patients with those in the community, it may be less obvious in other contexts.

Were Patients Randomized?

The purpose of randomization is to create groups whose prognosis, with respect to the target outcomes, is similar. The reason that studies in which patient or physician preference determines whether a patient receives treatment or control (observational studies) often yield misleading results is that morbidity and mortality result from many causes. Treatment studies attempt to determine the impact of an intervention on events such as stroke, myocardial infarction, and death – occurrences that we call the trial's target outcomes. A patient's age, the underlying severity of illness, the presence of comorbidity, and a host of other factors typically determine the frequency with which a trial's target outcome occurs (prognostic factors or determinants of outcome). If prognostic factors – either those we know about or those we do not know about – prove unbalanced between a trial's treatment and control groups, the study's outcome will be biased, either underestimating or overestimating the treatment's effect. Because known prognostic factors often influence clinicians' recommendations and patients' decisions about taking treatment, observational studies often yield biased results that may get the magnitude or even the direction of the effect wrong.

Observational studies can theoretically match patients, either in the selection of patients for study or in the subsequent statistical analysis, for known prognostic factors. However, not all prognostic factors are easily measured or characterized, and in many diseases only a few are known. Therefore, even the most careful patient selection and statistical methods are unable to completely address the bias in the estimated treatment effect. The power of randomization is that treatment and control groups are more likely to have a balanced distribution of known and unknown prognostic factors. However, although randomization is a powerful technique, it does not always succeed in creating groups with similar prognosis. Investigators may make mistakes that compromise randomization, or randomization may fail because of chance – unlikely events sometimes happen.

Was Randomization Concealed?

When those enrolling patients are unaware of, and cannot control, the arm to which the patient is allocated, we refer to randomization as concealed. In unconcealed trials, those responsible for recruitment may systematically enroll sicker – or less sick – patients to either a treatment or control group. This behavior will compromise the purpose of randomization, and the study will yield a biased result (imbalance in prognosis).

Were Patients in the Treatment and Control Groups Similar With Respect to Known Prognostic Factors? (The Importance of Sample Size)

The purpose of randomization is to create groups whose prognosis, with respect to the target outcomes, is similar. Sometimes, through bad luck, randomization will fail to achieve this goal. The smaller the sample size, the more likely the trial will have prognostic imbalance.

Picture a trial testing a new treatment for heart failure that is enrolling patients classified as having New York Heart Association functional class III and class IV heart failure. Patients with class IV heart failure have a much worse prognosis than those with class III heart failure. The trial is small, with only 8 patients. One would not be surprised if all 4 patients with class III heart failure were allocated to the treatment group and all 4 patients with class IV heart failure were allocated to the control group. Such a result of the allocation process would seriously bias the study in favor of the treatment. Were the trial to enroll 800 patients, one would be startled if randomization placed all 400 patients with class III heart failure in the treatment arm. The larger the sample size, the more likely randomization will achieve its goal of prognostic balance.
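To put rough numbers on this intuition, the sketch below computes the chance of the completely unbalanced allocation described above. It assumes, purely for illustration, that the trial has equal numbers of class III and class IV patients and that exactly half of all patients are allocated at random to the treatment arm; the function name is hypothetical. Other randomization schemes would give different figures.

```python
from math import comb

def prob_complete_separation(n_per_class):
    """Probability that the treatment arm consists of exactly the
    n_per_class class III patients, when n_per_class of the
    2 * n_per_class patients are chosen at random for treatment.

    Every possible treatment arm is equally likely, and only one of
    the comb(2n, n) possible arms is 'all class III'.
    """
    return 1 / comb(2 * n_per_class, n_per_class)

print(prob_complete_separation(4))    # 1/70, about 0.014
print(prob_complete_separation(400))  # vanishingly small
```

Under these assumptions the 8-patient trial ends up completely separated about once in 70 allocations, whereas for 800 patients the probability is so small that it would essentially never happen.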

We can check how effectively randomization has balanced known prognostic factors by looking for a display of patient characteristics of the treatment and control groups at the study's commencement – the baseline or entry prognostic features. Although we will never know whether similarity exists for the unknown prognostic factors, we are reassured when the known prognostic factors are well balanced. All is not lost if the treatment groups are not similar at baseline. Statistical techniques permit adjustment of the study result for baseline differences. When both adjusted analyses and unadjusted analyses generate the same conclusion, clinicians gain confidence that the risk of bias is not excessive.

Was Prognostic Balance Maintained as the Study Progressed?

To What Extent Was the Study Blinded?

If randomization succeeds, treatment and control groups begin with a similar prognosis. Randomization, however, provides no guarantee that the 2 groups will remain prognostically balanced. Blinding is the optimal strategy for maintaining prognostic balance. Five groups should, if possible, be blind to treatment assignment:

  • Patients – to avoid placebo effects
  • Clinicians – to prevent differential administration of therapies that affect the outcome of interest (cointervention)
  • Data collectors – to prevent bias in data collection
  • Adjudicators of outcome – to prevent bias in decisions about whether or not a patient has had an outcome of interest
  • Data analysts – to avoid bias in decisions regarding data analysis

Ideally, these 5 groups involved in clinical trials remain unaware of whether patients are receiving the experimental therapy or the control therapy.

Were the Groups Prognostically Balanced at the Study's Completion?

It is possible for investigators to effectively conceal and blind treatment assignment and still fail to achieve an unbiased result.

Was Follow-up Complete?

Ideally, at the conclusion of a trial, investigators will know the status of each patient with respect to the target outcome. The greater the number of patients whose outcome is unknown – patients lost to follow-up – the more a study is potentially compromised. The reason is that patients who are lost often have different prognoses from those who are retained – they may disappear because they have adverse outcomes or because they are doing well and so did not return for assessment. The magnitude of the bias may be substantial. See two examples in Pharmacy Profession Forum at http://forum.tomhsiung.com/pharmacy-practice/clinical-trials/852-example-how-lost-to-follow-up-affect-the-outcome-of-a-rct.html

Loss to follow-up may substantially increase the risk of bias. If assuming a worst-case scenario does not change the inferences arising from study results, then loss to follow-up is unlikely to be a problem. If such an assumption would significantly alter the results, the extent to which bias is introduced depends on how likely it is that treatment patients lost to follow-up fared badly, whereas control patients lost to follow-up fared well. That decision is a matter of judgement.
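The worst-case check can be sketched as a small calculation: recompute the effect assuming every treatment patient lost to follow-up had the event and no lost control patient did. The two trials below use made-up numbers chosen only to show how the same amount of loss can matter very differently depending on the event rate; the function and variable names are hypothetical.

```python
def observed_and_worst_case_rrr(t_events, t_followed, t_lost,
                                c_events, c_followed, c_lost):
    """Return (observed RRR, worst-case RRR) as proportions.

    Observed: risks computed among patients with known outcomes.
    Worst case: all lost treatment patients are counted as events,
    all lost control patients are counted as event-free.
    """
    observed = 1 - (t_events / t_followed) / (c_events / c_followed)
    worst_t_risk = (t_events + t_lost) / (t_followed + t_lost)
    worst_c_risk = c_events / (c_followed + c_lost)
    return observed, 1 - worst_t_risk / worst_c_risk

# Hypothetical trial A: common events, 30 of 1000 lost in each arm.
obs_a, worst_a = observed_and_worst_case_rrr(200, 970, 30, 400, 970, 30)

# Hypothetical trial B: rare events, same loss to follow-up.
obs_b, worst_b = observed_and_worst_case_rrr(30, 970, 30, 60, 970, 30)

print(f"Trial A: observed {obs_a:.1%}, worst case {worst_a:.1%}")
print(f"Trial B: observed {obs_b:.1%}, worst case {worst_b:.1%}")
```

In trial A the worst-case assumption shrinks the apparent benefit but leaves it large, so the inference is fairly robust. In trial B the number lost is as large as the number of events in the treatment arm, and the worst case wipes out the apparent benefit entirely.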

Was the Trial Stopped Too Early?

Stopping a trial early (i.e., before enrolling the planned sample size) when one sees an apparent large benefit is risky and may compromise randomization. Trials stopped early run the risk of greatly overestimating the treatment effect.

A trial designed with too short a follow-up also may miss crucial information that an adequate length of follow-up would reveal. For example, consider a trial that randomly assigned patients with an abdominal aortic aneurysm to either an open surgical repair or a less invasive, endovascular repair technique. At the end of the 30-day follow-up, mortality was significantly lower in the endovascular technique group. The investigators followed up participants for an additional 2 years and found that there was no difference in mortality between groups after the first year. Had the trial ended earlier, the endovascular technique might have been considered substantially better than the open surgical technique.

Were Patients Analyzed in the Groups to Which They Were Randomized?

Investigators will undermine the benefits of randomization if they omit from the analysis patients who do not receive their assigned treatment or, worse yet, count events that occur in nonadherent patients who were assigned to treatment against the control group. Such analyses will bias the results if the reasons for nonadherence are related to prognosis. In a number of randomized trials, patients who did not adhere to their assigned drug regimens fared worse than those who took their medication as instructed, even after taking into account all known prognostic factors. When adherent patients are destined to have a better outcome, omitting those who do not receive assigned treatment undermines the unbiased comparison provided by randomization. Investigators prevent this bias when they follow the intention-to-treat principle and analyze all patients in the group to which they were randomized irrespective of what treatment they actually received. Following the intention-to-treat principle does not, however, reduce bias associated with loss to follow-up.
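The difference between the two analyses can be seen in a toy data set. The records below are entirely hypothetical: 100 patients per arm, with 20 nonadherent patients in the treatment arm who have a higher event rate than the adherent ones. The sketch compares an intention-to-treat analysis, which keeps everyone in their randomized arm, with an analysis that drops the nonadherent patients.

```python
# Hypothetical trial records: one dict per randomized patient.
records = (
    # Treatment arm: 80 adherent patients with 8 events ...
    [{"assigned": "treatment", "adherent": True, "event": i < 8}
     for i in range(80)]
    # ... and 20 nonadherent patients with 6 events.
    + [{"assigned": "treatment", "adherent": False, "event": i < 6}
       for i in range(20)]
    # Control arm: 100 patients with 20 events.
    + [{"assigned": "control", "adherent": True, "event": i < 20}
       for i in range(100)]
)

def risk_by_arm(rows):
    """Proportion of patients with the event, grouped by assigned arm."""
    arms = {}
    for row in rows:
        arms.setdefault(row["assigned"], []).append(row["event"])
    return {arm: sum(events) / len(events) for arm, events in arms.items()}

# Intention-to-treat: everyone is analyzed in the arm they were randomized to.
itt = risk_by_arm(records)

# Dropping nonadherent patients removes a poor-prognosis subgroup
# from the treatment arm only.
adherent_only = risk_by_arm([r for r in records if r["adherent"]])

print("ITT:          ", itt)
print("Adherent only:", adherent_only)
```

With these made-up numbers the treatment-arm risk is 14% under intention-to-treat but 10% when nonadherent patients are omitted, while the control risk stays at 20%, so the omission makes the treatment look better than the randomized comparison supports.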

Section 2 What Are the Results?

How Large Was the Treatment Effect?

Most frequently, RCTs monitor dichotomous outcomes (e.g., "yes" or "no" classifications for cancer recurrence, myocardial infarction, or death). Patients either have such an event or they do not, and the article reports the proportion of patients who develop such events. Consider, for example, a study in which 20% of a control group died but only 15% of those receiving a new treatment died. How might one express these results?

One possibility is the absolute difference (known as the absolute risk reduction [ARR] or risk difference) between the proportion who died in the control group (control group risk [CGR]) and the proportion who died in the experimental group (experimental group risk [EGR]), or CGR – EGR = 0.20 – 0.15 = 0.05. Another way to express the impact of treatment is as the relative risk (RR): the risk of events among patients receiving the new treatment relative to that risk among patients in the control group, or EGR/CGR = 0.15/0.20 = 0.75.

The most commonly reported measure of dichotomous treatment effects is the complement of the RR, the relative risk reduction (RRR). It is expressed as a percentage: [1 – (EGR/CGR)] x 100% = (1 – 0.75) x 100% = 25%. An RRR of 25% means that of those who would have died had they been in the control group, 25% will not die if they receive treatment; the greater the RRR, the more effective the therapy. Investigators may compute the RR during a specific period, as in a survival analysis; the relative measure of effect in such a time-to-event analysis is called the hazard ratio. When people do not specify whether they are talking about RRR or ARR – for instance, "Drug X was 30% effective in reducing the risk of death" or "The efficacy of the vaccine was 92%" – they are almost invariably talking about RRR.
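The three measures are simple functions of the two group risks. The sketch below reproduces the arithmetic for the example of 20% mortality in the control group and 15% in the treatment group; the function name is just an illustrative choice.

```python
def effect_measures(cgr, egr):
    """Return (ARR, RR, RRR) from the control group risk (CGR)
    and the experimental group risk (EGR), all as proportions."""
    arr = cgr - egr   # absolute risk reduction
    rr = egr / cgr    # relative risk
    rrr = 1 - rr      # relative risk reduction
    return arr, rr, rrr

arr, rr, rrr = effect_measures(cgr=0.20, egr=0.15)
print(f"ARR = {arr:.2f}, RR = {rr:.2f}, RRR = {rrr:.0%}")
# ARR = 0.05, RR = 0.75, RRR = 25%
```

Note that the same RRR of 25% would arise from risks of 2% and 1.5%, where the ARR would be only 0.005, which is why it matters which measure a report is quoting.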

How Precise Was the Estimate of the Treatment Effect?

We can never be sure of the true risk reduction; the best estimate of the true treatment effect is what we observe in a well-designed randomized trial. This estimate is called a point estimate to remind us that, although the true value lies somewhere in its neighborhood, it is unlikely to be precisely correct. Investigators often tell us the neighborhood within which the true effect likely lies by calculating confidence intervals (CIs), a range of values within which one can be confident the true effect lies.

We usually use the 95% CI. You can consider the 95% CI as defining the range that – assuming the study has low risk of bias – includes the true RRR 95% of the time. The true RRR will generally lie beyond these extremes only 5% of the time, a property of the CI that relates closely to the conventional level of statistical significance of P < 0.05.


If a trial randomized 100 patients each to experimental and control groups, and there were 20 deaths in the control group and 15 deaths in the experimental group, the authors would calculate a point estimate for the RRR of 25% [(1 – 0.15/0.20) x 100% = 25%]. You might guess, however, that the true RRR might be much smaller or much greater than 25%, based on a difference of only 5 deaths. In fact, you might surmise that the treatment might provide no benefit (an RRR of 0%) or might even do harm (a negative RRR). And you would be right; in fact, these results are consistent with both an RRR of -38% and an RRR of nearly 59%. In other words, the 95% CI on this RRR is -38% to 59%, and the trial really has not helped us decide whether or not to offer the new treatment.

Now suppose the trial enrolled 1000 patients per group rather than 100 patients per group, and the same event rates were observed as before: 200 deaths in the control group and 150 deaths in the experimental group. Again, the point estimate of the RRR is 25%. In this larger trial, you might think that our confidence that the true reduction in risk is close to 25% is much greater. Indeed, in the larger trial the 95% CI on the RRR for this set of results is all on the positive side of 0 and runs from 9% to 41%.
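The effect of sample size on the width of the interval can be checked with a short calculation. The sketch below uses one common large-sample approximation, a Wald interval on the log of the relative risk, which is then converted to the RRR scale. This is an assumption about method: the text does not say how its intervals were computed, so small differences from the quoted limits are to be expected. The function name and the use of z = 1.96 are illustrative choices.

```python
from math import exp, log, sqrt

def rrr_with_ci(events_exp, n_exp, events_ctl, n_ctl, z=1.96):
    """Point estimate and approximate 95% CI for the relative risk
    reduction, using a Wald interval on log(RR).

    Returns (rrr, lower, upper) as proportions.
    """
    rr = (events_exp / n_exp) / (events_ctl / n_ctl)
    # Standard large-sample standard error of log(RR).
    se_log_rr = sqrt(1 / events_exp - 1 / n_exp + 1 / events_ctl - 1 / n_ctl)
    rr_low = exp(log(rr) - z * se_log_rr)
    rr_high = exp(log(rr) + z * se_log_rr)
    # RRR = 1 - RR, so the upper RR limit gives the lower RRR limit.
    return 1 - rr, 1 - rr_high, 1 - rr_low

small = rrr_with_ci(events_exp=15, n_exp=100, events_ctl=20, n_ctl=100)
large = rrr_with_ci(events_exp=150, n_exp=1000, events_ctl=200, n_ctl=1000)

for label, (rrr, lower, upper) in [("100 per group", small),
                                   ("1000 per group", large)]:
    print(f"{label}: RRR {rrr:.0%}, 95% CI {lower:.0%} to {upper:.0%}")
```

For 100 patients per group this approximation gives an interval of roughly -38% to 59%, spanning both harm and substantial benefit. For 1000 patients per group the interval lies entirely above 0, with a lower limit near 9%; the upper limit it produces need not match the quoted figure exactly, since that depends on the interval method used.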

These two examples show that the larger the sample size and the higher the number of outcome events in a trial, the greater our confidence that the true RRR (or any other measure of effect) is close to what we observed. As one considers values farther and farther from the point estimate, they become less and less likely to represent the truth. By the time one crosses the upper or lower boundaries of the 95% CI, the values are unlikely to represent the true RRR. All of this assumes the study is at low risk of bias.

Section 3 How Can I Apply the Results to Patient Care?

Were the Study Patients Similar to the Patient in My Practice?

If the patient before you would have qualified for enrollment in the study, you can apply the results with considerable confidence or consider the results generalizable. Often, your patient has different attributes or characteristics from those enrolled in the trial and would not have met a study's eligibility criteria. Patients may be older or younger, may be sicker or less sick, or may have comorbid disease that would have excluded them from participation in the study.

A study result probably applies even if, for example, an adult patient is 2 years too old for enrollment in the study, has more severe disease, has previously been treated with a competing therapy, or has a comorbid condition. A better approach than rigidly applying the study inclusion and exclusion criteria is to ask whether there is some compelling reason why the results do not apply to the patient. You usually will not find a compelling reason, in which case you can generalize the results to your patient with confidence.

A related issue has to do with the extent to which we can generalize findings from a study using a particular drug to another closely (or not so closely) related agent. The issue of drug class effects and how conservative one should be in assuming class effects remains controversial. Generalizing findings of surgical treatment may be even riskier. Randomized trials of carotid endarterectomy, for instance, demonstrate much lower perioperative rates of stroke and death than one might expect in one's own community, which may reflect on either the patients or surgeons (and their relative expertise) selected to participate in randomized trials.

A final issue arises when a patient fits the features of a subgroup of patients in the trial report. We encourage you to be skeptical of subgroup analyses. The treatment is likely to benefit the subgroup more or less than the other patients only if the difference in the effects of treatment in the subgroups is large and unlikely to occur by chance. Even when these conditions apply, the results may be misleading, particularly when investigators did not specify their hypotheses before the study began, when they had a large number of hypotheses, or when other studies fail to replicate the finding.

Were All Patient-Important Outcomes Considered?

Treatments are indicated when they provide important benefits. Demonstrating that a bronchodilator produces small increments in forced expiratory volume in patients with chronic airflow limitation, that a vasodilator improves cardiac output in heart failure patients, or that a lipid-lowering agent improves lipid profiles does not provide sufficient justification for administering these drugs. In these instances, investigators have chosen substitute outcomes or surrogate outcomes rather than those that patients would consider important. What clinicians and patients require is evidence that treatments improve outcomes that are important to patients, such as reducing shortness of breath during the activities required for daily living, avoiding hospitalization for heart failure, or decreasing the risk of a major stroke.

Substitute/Surrogate Outcomes

Trials of the impact of antiarrhythmic drugs after myocardial infarction illustrate the danger of using substitute outcomes or end points. Because abnormal ventricular depolarizations were associated with a high risk of death and antiarrhythmic drugs demonstrated a reduction in abnormal ventricular depolarizations (the substitute end point), it made sense that they should reduce death. A group of investigators performed randomized trials on 3 agents (encainide, flecainide, and moricizine) that were previously found to be effective in suppressing the substitute end point of abnormal ventricular depolarizations. The investigators had to stop the trials when they discovered that mortality was substantially higher in patients receiving antiarrhythmic treatment than in those receiving placebo. Clinicians relying on the substitute end point of arrhythmia suppression would have continued to administer the 3 drugs, to the considerable detriment of their patients.

Even when investigators report favorable effects of treatment on a patient-important outcome, you must consider whether there may be deleterious effects on other outcomes. For instance, cancer chemotherapy may lengthen life but decrease its quality. Randomized trials often fail to adequately document the toxicity or adverse effects of the experimental intervention.

Composite End Points

Composite end points represent a final dangerous trend in presenting outcomes. Like surrogate outcomes, composite end points are attractive for reducing sample size and decreasing length of follow-up. Unfortunately, they can mislead. For example, a trial that reduced a composite outcome of death, nonfatal myocardial infarction, and admission for an acute coronary syndrome actually demonstrated a trend toward increased mortality with the experimental therapy and convincing effects only on admission for an acute coronary syndrome. The composite outcome would most strongly reflect the treatment effect of the most common of the components, admission for an acute coronary syndrome, even though there is no convincing evidence the treatment reduces the risk of death or myocardial infarction.
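How a composite can hide what happens to its components may be easier to see with numbers. The sketch below is illustrative only: the event counts, the arm size of 1,000, and the assumption that each patient is counted once (under the most severe event experienced) are all hypothetical and are not taken from the trial mentioned above.

```python
# Hypothetical event counts per arm (NOT data from any real trial).
# Assumption: each patient is counted once, under the most severe event.
N_PER_ARM = 1000
control = {"death": 40, "nonfatal MI": 60, "ACS admission": 300}
treated = {"death": 48, "nonfatal MI": 60, "ACS admission": 212}


def relative_risk(events_treated, events_control,
                  n_treated=N_PER_ARM, n_control=N_PER_ARM):
    """Risk in the treated arm divided by risk in the control arm."""
    return (events_treated / n_treated) / (events_control / n_control)


# Relative risk for each component of the composite.
component_rr = {
    name: relative_risk(treated[name], control[name]) for name in control
}

# Relative risk for the composite (any of the three events).
composite_rr = relative_risk(sum(treated.values()), sum(control.values()))

for name, rr in component_rr.items():
    print(f"{name:>14}: RR = {rr:.2f}")
print(f"{'composite':>14}: RR = {composite_rr:.2f}")
```

With these made-up counts the composite looks favorable (RR below 1) because admissions dominate the event total, while the relative risk of death is above 1 and that of nonfatal myocardial infarction is unchanged.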

Another long-neglected outcome is the resource implications of alternative management strategies. Health care systems face increasing resource constraints that mandate careful attention to economic analysis.

PS: Substitute/surrogate end points

In clinical trials, a surrogate endpoint (or marker) is a measure of effect of a specific treatment that may correlate with a real clinical endpoint but does not necessarily have a guaranteed relationship. The National Institutes of Health (USA) defines surrogate endpoint as "a biomarker intended to substitute for a clinical endpoint".[1][2]

Surrogate markers are used when the primary endpoint is undesired (e.g., death), or when the number of events is very small, thus making it impractical to conduct a clinical trial to gather a statistically significant number of endpoints. The FDA and other regulatory agencies will often accept evidence from clinical trials that show a direct clinical benefit to surrogate markers. [3]

A surrogate endpoint of a clinical trial is a laboratory measurement or a physical sign used as a substitute for a clinically meaningful endpoint that measures directly how a patient feels, functions or survives. Changes induced by a therapy on a surrogate endpoint are expected to reflect changes in a clinically meaningful endpoint. [6]

A commonly used example is cholesterol. While elevated cholesterol levels increase the likelihood for heart disease, the relationship is not linear – many people with normal cholesterol develop heart disease, and many with high cholesterol do not. "Death from heart disease" is the endpoint of interest, but "cholesterol" is the surrogate marker. A clinical trial may show that a particular drug (for example, simvastatin (Zocor)) is effective in reducing cholesterol, without showing directly that simvastatin prevents death.

Are the Likely Treatment Benefits Worth the Potential Harm and Costs?

If the results of a study apply to your patient and the outcomes are important to your patient, the next question concerns whether the probable treatment benefits are worth the associated risks, burden, and resource requirements. A 25% reduction in the RR of death may sound impressive, but its impact on your patient may nevertheless be minimal. This notion is illustrated by using a concept called number needed to treat (NNT), the number of patients who must receive an intervention of therapy during a specific period to prevent 1 adverse outcome or produce 1 positive outcome. See here for how to calculate NNT: http://forum.tomhsiung.com/pharmacy-practice/pharmacy-informatics-and-drug-information/424-evidence-based-medicine-what-is-number-needed-to-treat-and-number-needed-to-harm.html

The impact of a treatment is related not only to its RRR but also to the risk of the adverse outcome it is designed to prevent. One large trial in myocardial infarction suggests that clopidogrel in addition to aspirin reduces the RR of death from a cardiovascular cause, nonfatal myocardial infarction, or stroke by approximately 20% in comparison to aspirin alone. Table 6-3 considers 2 patients presenting with acute myocardial infarction without elevation of ST segments on their electrocardiograms. Compared with aspirin alone, both patients have an RRR of approximately 20%, but the ARR differs considerably between the two patients, which results in a significantly different NNT.

A key element of the decision to start therapy, therefore, is to consider the patient's risk of the event if left untreated. For any given RRR, the higher the probability that a patient will experience an adverse outcome if we do not treat, the more likely the patient will benefit from treatment and the fewer such patients we need to treat to prevent 1 adverse outcome. Knowing the NNT assists clinicians in helping patients weigh the benefits and downsides associated with their management options. What if the situation is reversed, so that treatment induces harm compared with control (adverse events are inherent to drugs; in this example, the harm is the increased risk of bleeding)? The answer is that, for any given relative risk increase (RRI), the higher the probability that a patient will experience an adverse outcome if we treat, the more likely the patient is to be harmed by treatment and the fewer such patients we need to treat to cause 1 adverse outcome.
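The arithmetic behind these statements can be sketched in a few lines of Python. The sketch uses the standard relationships ARR = baseline risk × RRR and NNT = 1 / ARR, with the same form for harm (absolute risk increase = baseline risk × RRI, number needed to harm = 1 / absolute risk increase). The baseline risks below are hypothetical values chosen for illustration and are not the figures from Table 6-3. The function names and the choice to round up to a whole patient are also assumptions of this sketch.

```python
import math


def absolute_risk_change(baseline_risk, relative_change):
    """Absolute risk change = baseline (control) risk x relative change.

    Pass an RRR to get the ARR, or an RRI to get the absolute risk increase.
    """
    return baseline_risk * relative_change


def number_needed(baseline_risk, relative_change):
    """NNT (or NNH) = 1 / absolute risk change, rounded up to a whole patient.

    The inner round() only guards against floating-point noise.
    """
    change = absolute_risk_change(baseline_risk, relative_change)
    return math.ceil(round(1.0 / change, 6))


RRR = 0.20  # approximately 20%, as in the clopidogrel example

# Hypothetical baseline risks (assumptions, not values from Table 6-3).
low_risk_patient = 0.05
high_risk_patient = 0.30

print("NNT, low-risk patient: ", number_needed(low_risk_patient, RRR))
print("NNT, high-risk patient:", number_needed(high_risk_patient, RRR))

# Harm side: hypothetical baseline bleeding risk and relative risk increase.
bleeding_risk = 0.02
RRI = 0.40
print("NNH for bleeding:      ", number_needed(bleeding_risk, RRI))
```

With the same RRR, the hypothetical low-risk patient has an NNT of 100 and the high-risk patient an NNT of 17, which is the pattern the paragraph describes: the higher the untreated risk, the fewer patients need to be treated to prevent 1 event.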

Trading off benefits and risk also requires an accurate assessment of the adverse effects of treatment. Randomized trials with relatively small sample sizes are unsuitable for detecting rare but catastrophic adverse effects of therapy. Clinicians often must look to other sources of information – often characterized by higher risk of bias – to obtain an estimate of the adverse effects of therapy.

When determining the optimal treatment choice based on the relative benefits and harms of a therapy, the values and preferences of each individual patient must be considered. How best to communicate information to patients and how to incorporate their values into clinical decision making remain areas of active investigation in evidence-based medicine.

(The End)