Section 1 How Serious Is The Risk of Bias?
Did Intervention and Control Groups Start With The Same Prognosis?
Consider the question of whether hospital care prolongs life. A study finds that more sick people die in the hospital than in the community. We would easily reject the naive conclusion that hospital care kills people because we recognize that hospitalized patients are sicker (worse prognosis) than patients in the community. Although the logic of prognostic balance is vividly clear in comparing hospitalized patients with those in the community, it may be less obvious in other contexts.
Were Patients Randomized?
The purpose of randomization is to create groups whose prognosis, with respect to the target outcomes, is similar. The reason that studies in which patient or physician preference determines whether a patient receives treatment or control (observational studies) often yield misleading results is that morbidity and mortality result from many causes. Treatment studies attempt to determine the impact of an intervention on events such as stroke, myocardial infarction, and death – occurrences that we call the trial's target outcomes. A patient's age, the underlying severity of illness, the presence of comorbidity, and a host of other factors typically determine the frequency with which a trial's target outcome occurs (prognostic factors or determinants of outcome). If prognostic factors – either those we know about or those we do not know about – prove unbalanced between a trial's treatment and control groups, the study's outcome will be biased, either underestimating or overestimating the treatment's effect. Because known prognostic factors often influence clinicians' recommendations and patients' decisions about taking treatment, observational studies often yield biased results that may get the magnitude or even the direction of the effect wrong.
Observational studies can theoretically match patients, either in the selection of patients for study or in the subsequent statistical analysis, for known prognostic factors. However, not all prognostic factors are easily measured or characterized, and in many diseases only a few are known. Therefore, even the most careful patient selection and statistical methods are unable to completely address the bias in the estimated treatment effect. The power of randomization is that treatment and control groups are more likely to have a balanced distribution of known and unknown prognostic factors. However, although randomization is a powerful technique, it does not always succeed in creating groups with similar prognosis. Investigators may make mistakes that compromise randomization, or randomization may fail because of chance – unlikely events sometimes happen.
Was Randomization Concealed?
When those enrolling patients are unaware of, and cannot control, the arm to which the patient is allocated, we refer to randomization as concealed. In unconcealed trials, those responsible for recruitment may systematically enroll sicker – or less sick – patients to either a treatment or control group. This behavior will compromise the purpose of randomization, and the study will yield a biased result (imbalance in prognosis).
Were Patients in the Treatment and Control Groups Similar With Respect to Known Prognostic Factors? (The Importance of Sample Size)
The purpose of randomization is to create groups whose prognosis, with respect to the target outcomes, is similar. Sometimes, through bad luck, randomization will fail to achieve this goal. The smaller the sample size, the more likely the trial will have prognostic imbalance.
Picture a trial testing a new treatment for heart failure that is enrolling patients classified as having New York Heart Association functional class III and class IV heart failure. Patients with class IV heart failure have a much worse prognosis than those with class III heart failure. The trial is small, with only 8 patients. One would not be surprised if all 4 patients with class III heart failure were allocated to the treatment group and all 4 patients with class IV heart failure were allocated to the control group. Such a result of the allocation process would seriously bias the study in favor of the treatment. Were the trial to enroll 800 patients, one would be startled if randomization placed all 400 patients with class III heart failure in the treatment arm. The larger the sample size, the more likely randomization will achieve its goal of prognostic balance.
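The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration under assumptions that the text does not state: exactly half of the patients go to each arm, and every such split is equally likely. The function name is hypothetical.

```python
from math import comb


def prob_all_class_iii_in_treatment(n_class_iii: int, n_class_iv: int) -> float:
    """Chance that every class III patient lands in the treatment arm.

    Assumption (not from the text): the two arms are equal in size and
    every way of picking the treatment arm is equally likely.
    """
    total = n_class_iii + n_class_iv
    arm_size = total // 2
    if n_class_iii > arm_size:
        return 0.0  # the treatment arm cannot hold all class III patients
    # Treatment arms that contain all class III patients: fill the
    # remaining places (if any) with class IV patients.
    favourable = comb(n_class_iv, arm_size - n_class_iii)
    return favourable / comb(total, arm_size)


small_trial = prob_all_class_iii_in_treatment(4, 4)      # 8 patients
large_trial = prob_all_class_iii_in_treatment(400, 400)  # 800 patients

print(f"8 patients:   {small_trial:.4f}")
print(f"800 patients: {large_trial:.3e}")
```

Under these assumptions the chance is 1 in 70 for the 8-patient trial and effectively zero for the 800-patient trial.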
We can check how effectively randomization has balanced known prognostic factors by looking for a display of patient characteristics of the treatment and control groups at the study's commencement – the baseline or entry prognostic features. Although we will never know whether similarity exists for the unknown prognostic factors, we are reassured when the known prognostic factors are well balanced. All is not lost if the treatment groups are not similar at baseline. Statistical techniques permit adjustment of the study result for baseline differences. When both adjusted analyses and unadjusted analyses generate the same conclusion, clinicians gain confidence that the risk of bias is not excessive.
Was Prognostic Balance Maintained as the Study Progressed?
To What Extent Was the Study Blinded?
If randomization succeeds, treatment and control groups begin with a similar prognosis. Randomization, however, provides no guarantee that the 2 groups will remain prognostically balanced. Blinding is the optimal strategy for maintaining prognostic balance. There are 5 groups that should, if possible, be blind to treatment assignment:
- Patients – to avoid placebo effects
- Clinicians – to prevent differential administration of therapies that affect the outcome of interest (cointervention)
- Data collectors – to prevent bias in data collection
- Adjudicators of outcome – to prevent bias in decisions about whether or not a patient has had an outcome of interest
- Data analysts – to avoid bias in decisions regarding data analysis
Ideally, these 5 groups involved in clinical trials will remain unaware of whether patients are receiving the experimental therapy or control therapy.
Were the Groups Prognostically Balanced at the Study's Completion?
It is possible for investigators to effectively conceal and blind treatment assignment and still fail to achieve an unbiased result.
Was Follow-up Complete?
Ideally, at the conclusion of a trial, investigators will know the status of each patient with respect to the target outcome. The greater the number of patients whose outcome is unknown – patients lost to follow-up – the more a study is potentially compromised. The reason is that patients who are lost often have a different prognosis from those who are retained – they may disappear because they have adverse outcomes or because they are doing well and so did not return for assessment. The magnitude of the bias may be substantial. See two examples in Pharmacy Profession Forum at http://forum.tomhsiung.com/pharmacy-practice/clinical-trials/852-example-how-lost-to-follow-up-affect-the-outcome-of-a-rct.html
Loss to follow-up may substantially increase the risk of bias. If assuming a worst-case scenario does not change the inferences arising from study results, then loss to follow-up is unlikely to be a problem. If such an assumption would significantly alter the results, the extent to which bias is introduced depends on how likely it is that treatment patients lost to follow-up fared badly, whereas control patients lost to follow-up fared well. That decision is a matter of judgement.
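A worst-case check of this kind might look like the following sketch. All counts are hypothetical and chosen only for illustration, and the function names are not from any library. The worst case assumes that every treatment patient lost to follow-up had the event and that no lost control patient did.

```python
def risk_ratio(events_t: int, n_t: int, events_c: int, n_c: int) -> float:
    """Risk in the treatment group divided by risk in the control group."""
    return (events_t / n_t) / (events_c / n_c)


def worst_case_risk_ratio(events_t: int, followed_t: int, lost_t: int,
                          events_c: int, followed_c: int, lost_c: int) -> float:
    """Risk ratio if all lost treatment patients had the event
    and all lost control patients did not."""
    return risk_ratio(events_t + lost_t, followed_t + lost_t,
                      events_c, followed_c + lost_c)


# Hypothetical trial A: many events, 30 of 1000 patients lost per arm.
observed_a = risk_ratio(200, 970, 400, 970)
worst_a = worst_case_risk_ratio(200, 970, 30, 400, 970, 30)

# Hypothetical trial B: few events, same loss to follow-up.
observed_b = risk_ratio(30, 970, 60, 970)
worst_b = worst_case_risk_ratio(30, 970, 30, 60, 970, 30)

print(f"Trial A: observed RR {observed_a:.3f}, worst case {worst_a:.3f}")
print(f"Trial B: observed RR {observed_b:.3f}, worst case {worst_b:.3f}")
```

In the hypothetical trial A the worst case still shows a clear benefit, so the loss to follow-up would not change the inference. In trial B the same number of lost patients is enough to erase the apparent effect, because events are few relative to the number lost.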
Was the Trial Stopped Too Early?
Stopping a trial early (i.e., before enrolling the planned sample size) when one sees an apparent large benefit is risky and may compromise randomization. Trials stopped early run the risk of greatly overestimating the treatment effect.
A trial designed with too short a follow-up also may miss crucial information that an adequate length of follow-up would reveal. For example, consider a trial that randomly assigned patients with an abdominal aortic aneurysm to either an open surgical repair or a less invasive, endovascular repair technique. At the end of the 30-day follow-up, mortality was significantly lower in the endovascular technique group. The investigators followed up participants for an additional 2 years and found that there was no difference in mortality between groups after the first year. Had the trial ended earlier, the endovascular technique might have been considered substantially better than the open surgical technique.
Were Patients Analyzed in the Groups to Which They Were Randomized?
Investigators will undermine the benefits of randomization if they omit from the analysis patients who do not receive their assigned treatment or, worse yet, count events that occur in nonadherent patients who were assigned to treatment against the control group. Such analyses will bias the results if the reasons for nonadherence are related to prognosis. In a number of randomized trials, patients who did not adhere to their assigned drug regimens fared worse than those who took their medication as instructed, even after taking into account all known prognostic factors. When adherent patients are destined to have a better outcome, omitting those who do not receive assigned treatment undermines the unbiased comparison provided by randomization. Investigators prevent this bias when they follow the intention-to-treat principle and analyze all patients in the group to which they were randomized irrespective of what treatment they actually received. Following the intention-to-treat principle does not, however, reduce bias associated with loss to follow-up.
Section 2 What Are the Results?
How Large Was the Treatment Effect?
Most frequently, RCTs monitor dichotomous outcomes (e.g., "yes" or "no" classifications for cancer recurrence, myocardial infarction, or death). Patients either have such an event or they do not, and the article reports the proportion of patients who develop such events. Consider, for example, a study in which 20% of a control group died but only 15% of those receiving a new treatment died. How might one express these results?
One possibility is the absolute difference (known as the absolute risk reduction [ARR] or risk difference) between the proportion who died in the control group (control group risk [CGR]) and the proportion who died in the experimental group (experimental group risk [EGR]), or CGR – EGR = 0.20 – 0.15 = 0.05. Another way to express the impact of treatment is as the relative risk (RR): the risk of events among patients receiving the new treatment relative to that risk among patients in the control group, or EGR/CGR = 0.15/0.20 = 0.75.
The most commonly reported measure of dichotomous treatment effects is the complement of the RR, the relative risk reduction (RRR). It is expressed as a percentage: [1 – (EGR/CGR)] x 100% = (1 – 0.75) x 100% = 25%. An RRR of 25% means that of those who would have died had they been in the control group, 25% will not die if they receive treatment; the greater the RRR, the more effective the therapy. Investigators may compute the RR during a specific period, as in a survival analysis; the relative measure of effect in such a time-to-event analysis is called the hazard ratio. When people do not specify whether they are talking about RRR or ARR – for instance, "Drug X was 30% effective in reducing the risk of death" or "The efficacy of the vaccine was 92%" – they are almost invariably talking about RRR.
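The three measures defined above can be reproduced with a short Python sketch. The function and key names are illustrative only; the risks are the 20% and 15% from the example in the text.

```python
def effect_measures(control_risk: float, experimental_risk: float) -> dict:
    """Return ARR, RR and RRR for a dichotomous outcome."""
    arr = control_risk - experimental_risk   # absolute risk reduction
    rr = experimental_risk / control_risk    # relative risk
    rrr = 1 - rr                             # relative risk reduction
    return {"ARR": arr, "RR": rr, "RRR": rrr}


measures = effect_measures(control_risk=0.20, experimental_risk=0.15)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

This gives an ARR of 0.05, an RR of 0.75, and an RRR of 0.25 (25%), matching the figures above.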
How Precise Was the Estimate of the Treatment Effect?
We can never be sure of the true risk reduction; the best estimate of the true treatment effect is what we observe in a well-designed randomized trial. This estimate is called a point estimate to remind us that, although the true value lies somewhere in its neighborhood, it is unlikely to be precisely correct. Investigators often tell us the neighborhood within which the true effect likely lies by calculating CIs, a range of values within which one can be confident the true effect lies.
We usually use the 95% CI. You can consider the 95% CI as defining the range that – assuming the study has low risk of bias – includes the true RRR 95% of the time. The true RRR will generally lie beyond these extremes only 5% of the time, a property of the CI that relates closely to the conventional level of statistical significance of P <0.05.
If a trial randomized 100 patients each to experimental and control groups, and there were 20 deaths in the control group and 15 deaths in the experimental group, the authors would calculate a point estimate for the RRR of 25% [(1 – 0.15/0.20) x 100% = 25%]. You might guess, however, that the true RRR might be much smaller or much greater than 25%, based on a difference of only 5 deaths. In fact, you might surmise that the treatment might provide no benefit (an RRR of 0%) or might even do harm (a negative RRR). And you would be right; in fact, these results are consistent with both an RRR of -38% and an RRR of nearly 59%. In other words, the 95% CI on this RRR is -38% to 59%, and the trial really has not helped us decide whether or not to offer the new treatment.
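The text does not say how this interval was calculated. One common approach, assumed here, is a normal approximation on the logarithm of the RR; the sketch below uses it with z = 1.96 and a hypothetical function name.

```python
from math import exp, log, sqrt


def rrr_with_ci(events_e: int, n_e: int, events_c: int, n_c: int,
                z: float = 1.96) -> tuple:
    """Point estimate and approximate 95% CI for the RRR.

    Assumption: the CI is built on the log risk ratio with a normal
    approximation; other methods can give slightly different limits.
    """
    rr = (events_e / n_e) / (events_c / n_c)
    se_log_rr = sqrt(1 / events_e - 1 / n_e + 1 / events_c - 1 / n_c)
    rr_low = exp(log(rr) - z * se_log_rr)
    rr_high = exp(log(rr) + z * se_log_rr)
    # A high RR corresponds to a low RRR, so the limits swap places.
    return 1 - rr, 1 - rr_high, 1 - rr_low


point, lower, upper = rrr_with_ci(events_e=15, n_e=100, events_c=20, n_c=100)
print(f"RRR {point:.0%} (95% CI {lower:.0%} to {upper:.0%})")
```

With 15 versus 20 deaths among 100 patients per group, this approximation gives an interval of roughly -38% to 59%, in line with the figures quoted above.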
Now suppose the trial enrolled 1000 patients per group rather than 100 patients per group, and the same event rates were observed as before: there were 200 deaths in the control group and 150 deaths in the experimental group. Again, the point estimate of the RRR is 25%. In this larger trial, you might think that our confidence that the true reduction in risk is close to 25% is much greater. Indeed, in the larger trial the 95% CI on the RRR for this set of results is all on the positive side of 0 and runs from 9% to 41%.
These two examples show that the larger the sample size and the higher the number of outcome events in a trial, the greater our confidence that the true RRR (or any other measure of effect) is close to what we observed. As one considers values farther and farther from the point estimate, they become less and less likely to represent the truth. By the time one crosses the upper or lower boundaries of the 95% CI, the values are unlikely to represent the true RRR. All of this assumes the study is at low risk of bias.
Section 3 How Can I Apply the Results to Patient Care?
Were the Study Patients Similar to the Patient in My Practice?
If the patient before you would have qualified for enrollment in the study, you can apply the results with considerable confidence or consider the results generalizable. Often, your patient has different attributes or characteristics from those enrolled in the trial and would not have met a study's eligibility criteria. Patients may be older or younger, may be sicker or less sick, or may have comorbid disease that would have excluded them from participation in the study.
A study result probably applies even if, for example, your adult patient is 2 years too old for enrollment in the study, has more severe disease, has previously been treated with a competing therapy, or has a comorbid condition. A better approach than rigidly applying the study inclusion and exclusion criteria is to ask whether there is some compelling reason why the results do not apply to the patient. You usually will not find a compelling reason, in which case you can generalize the results to your patient with confidence.
A related issue has to do with the extent to which we can generalize findings from a study using a particular drug to another closely (or not so closely) related agent. The issue of drug class effects and how conservative one should be in assuming class effects remains controversial. Generalizing findings of surgical treatment may be even riskier. Randomized trials of carotid endarterectomy, for instance, demonstrate much lower perioperative rates of stroke and death than one might expect in one's own community, which may reflect on either the patients or surgeons (and their relative expertise) selected to participate in randomized trials.
A final issue arises when a patient fits the features of a subgroup of patients in the trial report. We encourage you to be skeptical of subgroup analyses. The treatment is likely to benefit the subgroup more or less than the other patients only if the difference in the effects of treatment in the subgroups is large and unlikely to occur by chance. Even when these conditions apply, the results may be misleading, particularly when investigators did not specify their hypotheses before the study began, if they had a large number of hypotheses, or if other studies fail to replicate the finding.
Were All Patient-Important Outcomes Considered?
Treatments are indicated when they provide important benefits. Demonstrating that a bronchodilator produces small increments in forced expiratory volume in patients with chronic airflow limitation, that a vasodilator improves cardiac output in heart failure patients, or that a lipid-lowering agent improves lipid profiles does not provide sufficient justification for administering these drugs. In these instances, investigators have chosen substitute outcomes or surrogate outcomes rather than those that patients would consider important. What clinicians and patients require is evidence that treatments improve outcomes that are important to patients, such as reducing shortness of breath during the activities required for daily living, avoiding hospitalization for heart failure, or decreasing the risk of a major stroke.
Trials of the impact of antiarrhythmic drugs after myocardial infarction illustrate the danger of using substitute outcomes or end points. Because abnormal ventricular depolarizations were associated with a high risk of death and antiarrhythmic drugs demonstrated a reduction in abnormal ventricular depolarizations (the substitute end point), it made sense that they should reduce death. A group of investigators performed randomized trials on 3 agents (encainide, flecainide, and moricizine) that were previously found to be effective in suppressing the substitute end point of abnormal ventricular depolarizations. The investigators had to stop the trials when they discovered that mortality was substantially higher in patients receiving antiarrhythmic treatment than in those receiving placebo. Clinicians relying on the substitute end point of arrhythmia suppression would have continued to administer the 3 drugs, to the considerable detriment of their patients.
Even when investigators report favorable effects of treatment on a patient-important outcome, you must consider whether there may be deleterious effects on other outcomes. For instance, cancer chemotherapy may lengthen life but decrease its quality. Randomized trials often fail to adequately document the toxicity or adverse effects of the experimental intervention.
Composite End Points
Composite end points represent a final dangerous trend in presenting outcomes. Like surrogate outcomes, composite end points are attractive for reducing sample size and decreasing length of follow-up. Unfortunately, they can mislead. For example, a trial that reduced a composite outcome of death, nonfatal myocardial infarction, and admission for an acute coronary syndrome actually demonstrated a trend toward increased mortality with the experimental therapy and convincing effects only on admission for an acute coronary syndrome. The composite outcome would most strongly reflect the treatment effect of the most common of the components, admission for an acute coronary syndrome, even though there is no convincing evidence the treatment reduces the risk of death or myocardial infarction.
Another long-neglected outcome is the resource implications of alternative management strategies. Health care systems face increasing resource constraints that mandate careful attention to economic analysis.
PS: Substitute/surrogate end points:
In clinical trials, a surrogate endpoint (or marker) is a measure of effect of a specific treatment that may correlate with a real clinical endpoint but does not necessarily have a guaranteed relationship. The National Institutes of Health (USA) defines surrogate endpoint as "a biomarker intended to substitute for a clinical endpoint".
Surrogate markers are used when the primary endpoint is undesired (e.g., death), or when the number of events is very small, thus making it impractical to conduct a clinical trial to gather a statistically significant number of endpoints. The FDA and other regulatory agencies will often accept evidence from clinical trials that show a direct clinical benefit to surrogate markers. 
A surrogate endpoint of a clinical trial is a laboratory measurement or a physical sign used as a substitute for a clinically meaningful endpoint that measures directly how a patient feels, functions or survives. Changes induced by a therapy on a surrogate endpoint are expected to reflect changes in a clinically meaningful endpoint. 
A commonly used example is cholesterol. While elevated cholesterol levels increase the likelihood for heart disease, the relationship is not linear – many people with normal cholesterol develop heart disease, and many with high cholesterol do not. "Death from heart disease" is the endpoint of interest, but "cholesterol" is the surrogate marker. A clinical trial may show that a particular drug (for example, simvastatin (Zocor)) is effective in reducing cholesterol, without showing directly that simvastatin prevents death.
Are the Likely Treatment Benefits Worth the Potential Harm and Costs?
If the results of a study apply to your patient and the outcomes are important to your patient, the next question concerns whether the probable treatment benefits are worth the associated risks, burden, and resource requirements. A 25% reduction in the RR of death may sound impressive, but its impact on your patient may nevertheless be minimal. This notion is illustrated by using a concept called number needed to treat (NNT), the number of patients who must receive an intervention of therapy during a specific period to prevent 1 adverse outcome or produce 1 positive outcome. See here for how to calculate NNT: http://forum.tomhsiung.com/pharmacy-practice/pharmacy-informatics-and-drug-information/424-evidence-based-medicine-what-is-number-needed-to-treat-and-number-needed-to-harm.html
The impact of a treatment is related not only to its RRR but also to the risk of the adverse outcome it is designed to prevent. One large trial in myocardial infarction suggests that clopidogrel in addition to aspirin reduces the RR of death from a cardiovascular cause, nonfatal myocardial infarction, or stroke by approximately 20% in comparison to aspirin alone. Table 6-3 considers 2 patients presenting with acute myocardial infarction without elevation of ST segments on their electrocardiograms. Compared with aspirin alone, both patients have an RRR of approximately 20%, but the ARR is quite different between the two patients, which results in a significantly different NNT.
A key element of the decision to start therapy, therefore, is to consider the patient's risk of the event if left untreated. For any given RRR, the higher the probability that a patient will experience an adverse outcome if we do not treat, the more likely the patient will benefit from treatment and the fewer such patients we need to treat to prevent 1 adverse outcome. Knowing the NNT assists clinicians in helping patients weigh the benefits and downsides associated with their management options. What if we look at harm rather than benefit? Treatment usually induces some harm compared with control (adverse events are in the nature of drugs); in this example, the harm is an increased risk of bleeding. The answer is that, for any given relative risk increase (RRI), the higher the probability that a patient will experience an adverse outcome if we treat, the more likely the patient will be harmed by treatment and the fewer such patients we need to treat to cause 1 adverse outcome.
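The dependence of the NNT on baseline risk can be sketched as follows. The function name and the two baseline risks in the second part (2% and 20%) are hypothetical, chosen only to show the pattern; they are not the values from Table 6-3.

```python
def number_needed_to_treat(baseline_risk: float, rrr: float) -> float:
    """NNT = 1 / ARR, where ARR = baseline risk x RRR."""
    arr = baseline_risk * rrr
    return 1 / arr


# Running example from the text: 20% control risk and an RRR of 25%.
nnt_example = number_needed_to_treat(baseline_risk=0.20, rrr=0.25)
print(f"ARR 0.05 -> NNT {nnt_example:.0f}")

# Same RRR of 20% applied to two hypothetical baseline risks.
nnt_low_risk = number_needed_to_treat(baseline_risk=0.02, rrr=0.20)
nnt_high_risk = number_needed_to_treat(baseline_risk=0.20, rrr=0.20)
print(f"Low-risk patient:  NNT {nnt_low_risk:.0f}")
print(f"High-risk patient: NNT {nnt_high_risk:.0f}")
```

With the same RRR, the hypothetical low-risk patient has an NNT of about 250 and the high-risk patient an NNT of about 25. The same formula, with a relative risk increase in place of the RRR, gives the number needed to harm.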
Trading off benefits and risk also requires an accurate assessment of the adverse effects of treatment. Randomized trials with relatively small sample sizes are unsuitable for detecting rare but catastrophic adverse effects of therapy. Clinicians often must look to other sources of information – often characterized by higher risk of bias – to obtain an estimate of the adverse effects of therapy.
When determining the optimal treatment choice based on the relative benefits and harms of a therapy, the values and preferences of each individual patient must be considered. How best to communicate information to patients and how to incorporate their values into clinical decision making remain areas of active investigation in evidence-based medicine.