## The Logic Behind Meta-analysis – Random-effects Model

The fixed-effect model starts with the assumption that the true effect size is the same in all studies. However, in many systematic reviews this assumption is implausible. When we decide to incorporate a group of studies in a meta-analysis, we assume that the studies have enough in common that it makes sense to synthesize the information, but there is generally no reason to assume that they are identical in the sense that the true effect size is exactly the same in all the studies. For example, suppose that we are working with studies that compare the proportion of patients developing a disease in two groups (vaccinated versus placebo). If the treatment works we would expect the effect size (say, the risk ratio) to be similar but not identical across studies. The effect size might be higher (or lower) when the participants are older, or more educated, or healthier than others, or when a more intensive variant of an intervention is used, and so on. Because studies will differ in the mixes of participants and in the implementations of interventions, among other reasons, there may be different effect sizes underlying different studies.

Or suppose that we are working with studies that assess the impact of an educational intervention. The magnitude of the impact might vary depending on the other resources available to the children, the class size, the age, and other factors, which are likely to vary from study to study. We might not have assessed these covariates in each study. Indeed, we might not even know what covariates actually are related to the size of the effect. Nevertheless, logic dictates that such factors do exist and will lead to variations in the magnitude of the effect.

One way to address this variation across studies is to perform a random-effects meta-analysis. In a random-effects meta-analysis we usually assume that the true effects are normally distributed. For example, in Figure 12.1 the mean of all true effect sizes is 0.60 but the individual effect sizes are distributed about this mean, as indicated by the normal curve. The width of the curve suggests that most of the true effects fall in the range of 0.50 to 0.70. Suppose that our meta-analysis includes three studies drawn from the distribution of studies depicted by the normal curve, and that the true effects in these studies happen to be 0.50, 0.55, and 0.65. If each study had an infinite sample size the sampling error would be zero and the observed effect for each study would be the same as the true effect for that study. If we were to plot the observed effects rather than the true effects, the observed effects would exactly coincide with the true effects.

Of course, the sample size in any study is not infinite and therefore the sampling error is not zero. If the true effect size for a study is $\theta_i$, then the observed effect for that study will be less than or greater than $\theta_i$ because of sampling error. The distance between the overall mean and the observed effect in any given study consists of two distinct parts: true variation in effect sizes ($\zeta_i$) and sampling error ($\varepsilon_i$). More generally, the observed effect $Y_i$ for any study is given by the grand mean, the deviation of the study’s true effect from the grand mean, and the deviation of the study’s observed effect from the study’s true effect. That is,

$$Y_i = \mu + \zeta_i + \varepsilon_i$$

Therefore, to predict how far the observed effect $Y_i$ is likely to fall from $\mu$ in any given study we need to consider both the variance of $\zeta_i$ and the variance of $\varepsilon_i$. The distance from $\mu$ to each $\theta_i$ depends on the standard deviation of the distribution of the true effects across studies, called $\tau$ (or $\tau^2$ for its variance). The same value of $\tau^2$ applies to all studies in the meta-analysis, and in Figure 12.4 is represented by the normal curve at the bottom, which extends roughly from 0.50 to 0.70. The distance from $\theta_i$ to $Y_i$ depends on the sampling distribution of the sample effects about $\theta_i$. This depends on the variance of the observed effect size from each study, $V_{Y_i}$, and so will vary from one study to the next. In Figure 12.4 the curve for Study 1 is relatively wide while the curve for Study 2 is relatively narrow.
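The two-part decomposition can be checked with a small simulation. The following is only an illustrative sketch: the function name and the numeric values (grand mean 0.60, standard deviation of true effects 0.05, within-study variance 0.01) are assumptions chosen for the example, and both components are assumed to be normally distributed. If the decomposition holds, the variance of the simulated observed effects should be close to the between-studies variance plus the within-study variance.

```python
import random
import statistics


def simulate_observed_effects(mu, tau, v_within, n_draws, seed=0):
    """Draw observed effects Y = mu + zeta + epsilon (hypothetical helper)."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n_draws):
        zeta = rng.gauss(0.0, tau)                 # deviation of the true effect from mu
        epsilon = rng.gauss(0.0, v_within ** 0.5)  # sampling error within the study
        observed.append(mu + zeta + epsilon)
    return observed


# Assumed illustrative values, not taken from any real study.
draws = simulate_observed_effects(mu=0.60, tau=0.05, v_within=0.01, n_draws=20000)
expected_total_variance = 0.05 ** 2 + 0.01  # tau^2 plus within-study variance

print("simulated mean:    ", round(statistics.mean(draws), 4))
print("simulated variance:", round(statistics.variance(draws), 5))
print("tau^2 + V:         ", expected_total_variance)
```

With a large number of draws, the simulated mean should sit near the grand mean and the simulated variance near the sum of the two variance components.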

Performing a Random-Effects Meta-Analysis

In an actual meta-analysis, of course, rather than start with the population effect and make projections about the observed effects, we start with the observed effects and try to estimate the population effect. In other words, our goal is to use the collection of $Y_i$ to estimate the overall mean, $\mu$. In order to obtain the most precise estimate of the overall mean (to minimize the variance) we compute a weighted mean, where the weight assigned to each study is the inverse of that study’s variance. To compute a study’s variance under the random-effects model, we need to know both the within-study variance and $\tau^2$, since the study’s total variance is the sum of these two values.

The parameter $\tau^2$ (tau-squared) is the between-studies variance (the variance of the effect size parameters across the population of studies). In other words, if we somehow knew the true effect size for each study, and computed the variance of these effect sizes (across an infinite number of studies), this variance would be $\tau^2$. One method for estimating $\tau^2$ is the method of moments (or DerSimonian and Laird method), which gives the estimate $T^2$ as follows:

$$T^2 = \frac{Q - df}{C}$$

where

$$Q = \sum_{i=1}^{k} W_i Y_i^2 - \frac{\left(\sum_{i=1}^{k} W_i Y_i\right)^2}{\sum_{i=1}^{k} W_i}$$

$$df = k - 1$$

where k is the number of studies, and

$$C = \sum_{i=1}^{k} W_i - \frac{\sum_{i=1}^{k} W_i^2}{\sum_{i=1}^{k} W_i}$$

Here $W_i = 1/V_{Y_i}$ is the fixed-effect weight for study i.

In the fixed-effect analysis each study was weighted by the inverse of its variance. In the random-effects analysis, each study is again weighted by the inverse of its variance. The difference is that the variance now includes the original (within-studies) variance plus the estimate of the between-studies variance, $T^2$. To highlight the parallel between the formulas here (random effects) and those for the fixed-effect model, we use the same notation but add an asterisk (*) to represent the random-effects version. Under the random-effects model the weight assigned to each study is

$$W_i^* = \frac{1}{V_{Y_i}^*}$$

where $V_{Y_i}^*$ is the within-study variance for study i plus the between-studies variance, $T^2$. That is,

$$V_{Y_i}^* = V_{Y_i} + T^2$$

The weighted mean, $M^*$, is then computed as

$$M^* = \frac{\sum_{i=1}^{k} W_i^* Y_i}{\sum_{i=1}^{k} W_i^*}$$

that is, the sum of the products (effect size multiplied by weight) divided by the sum of the weights.

The variance of the summary effect is estimated as the reciprocal of the sum of the weights, or

$$V_{M^*} = \frac{1}{\sum_{i=1}^{k} W_i^*}$$

and the estimated standard error of the summary effect is then the square root of the variance,

$$SE_{M^*} = \sqrt{V_{M^*}}$$

Summary

• Under the random-effects model, the true effects in the studies are assumed to have been sampled from a distribution of true effects.
• The summary effect is our estimate of the mean of all relevant true effects, and the null hypothesis is that the mean of these effects is 0.0 (equivalent to a ratio of 1.0 for ratio measures).
• Since our goal is to estimate the mean of the distribution, we need to take account of two sources of variance. First, there is within-study error in estimating the effect in each study. Second (even if we knew the true mean for each of our studies), there is variation in the true effects across studies. Study weights are assigned with the goal of minimizing both sources of variance.
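The random-effects computation described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name is hypothetical, the inputs are assumed to be per-study effect sizes with their within-study variances, and truncating a negative method-of-moments estimate to zero is an added convention (a variance cannot be negative) rather than something stated above.

```python
import math


def random_effects_summary(effects, variances):
    """Return (T2, M_star, SE_star) using the method-of-moments estimate of tau^2."""
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are needed to estimate tau^2")

    w = [1.0 / v for v in variances]  # fixed-effect weights W_i = 1 / V_Yi
    sum_w = sum(w)
    sum_wy = sum(wi * yi for wi, yi in zip(w, effects))

    q = sum(wi * yi ** 2 for wi, yi in zip(w, effects)) - sum_wy ** 2 / sum_w
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Assumption: negative estimates are set to zero.
    t2 = max(0.0, (q - df) / c)

    w_star = [1.0 / (v + t2) for v in variances]  # W_i* = 1 / (V_Yi + T^2)
    m_star = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_star = math.sqrt(1.0 / sum(w_star))
    return t2, m_star, se_star


# Made-up inputs: three studies with equal within-study variance.
t2, m_star, se_star = random_effects_summary([0.2, 0.4, 0.6], [0.01, 0.01, 0.01])
print(t2, m_star, se_star)
```

When the observed effects are no more dispersed than sampling error alone would predict, the estimate of the between-studies variance is zero and the result coincides with the fixed-effect weighted mean.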

## The Logic Behind Meta-analysis – Fixed-effect Model

Effect Size (Based on Means)

When the studies report means and standard deviations, the preferred effect size is usually the raw mean difference, the standardized mean difference, or the response ratio. When the outcome is reported on a meaningful scale and all studies in the analysis use the same scale, the meta-analysis can be performed directly on the raw difference in means.

Consider a study that reports means for two groups (Treated and Control), and suppose we wish to compare the means of these two groups. The population mean difference (effect size) is defined as

$$\Delta = \mu_1 - \mu_2$$

and the standard error of the sample mean difference (pooled) is

$$SE_D = S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Overview

Most meta-analyses are based on one of two statistical models, the fixed-effect model or the random-effects model. Under the fixed-effect model we assume that there is one true effect size (hence the term fixed effect) which underlies all the studies in the analysis, and that all differences in observed effects are due to sampling error. While we follow the practice of calling this a fixed-effect model, a more descriptive term would be a common-effect model.

By contrast, under the random-effects model we allow that the true effect could vary from study to study. For example, the effect size might be higher (or lower) in studies where the participants are older, or more educated, or healthier than in others, or when a more intensive variant of an intervention is used, and so on. Because studies will differ in the mixes of participants and in the implementations of interventions, among other reasons, there may be different effect sizes underlying different studies.

Under the fixed-effect model, all studies share the same true effect, so the observed effect size varies from one study to the next only because of the random error inherent in each study. If each study had an infinite sample size the sampling error would be zero and the observed effect for each study would be the same as the true effect. If we were to plot the observed effects rather than the true effects, the observed effects would exactly coincide with the true effects.

In practice, of course, the sample size in each study is not infinite, and so there is sampling error and the effect observed in the study is not the same as the true effect. In Figure 11.2 the true effect for each study is still 0.60 but the observed effect differs from one study to the next.

While the error in any given study is random, we can estimate the sampling distribution of the errors. In Figure 11.3 we have placed a normal curve about the true effect size for each study, with the width of the curve being based on the variance in that study. In Study 1 the sample size was small, the variance large, and the observed effect is likely to fall anywhere in the relatively wide range of 0.20 to 1.00. By contrast, in Study 2 the sample size was relatively large, the variance is small, and the observed effect is likely to fall in the relatively narrow range of 0.40 to 0.80. Note that the width of the normal curve is based on the square root of the variance, or standard error.

Meta-analysis Procedure

In an actual meta-analysis, of course, rather than starting with the population effect and making projections about the observed effects, we work backwards, starting with the observed effects and trying to estimate the population effect. In order to obtain the most precise estimate of the population effect (to minimize the variance) we compute a weighted mean, where the weight assigned to each study is the inverse of that study’s variance. Concretely, the weight assigned to each study in a fixed-effect meta-analysis is

$$W_i = \frac{1}{V_{Y_i}}$$

where $V_{Y_i}$ is the within-study variance for study i. The weighted mean (M) is then computed as

$$M = \frac{\sum_{i=1}^{k} W_i Y_i}{\sum_{i=1}^{k} W_i}$$

that is, the sum of the products $W_i Y_i$ (effect size multiplied by weight) divided by the sum of the weights.

The variance of the summary effect is estimated as the reciprocal of the sum of the weights, or

$$V_M = \frac{1}{\sum_{i=1}^{k} W_i}$$

Once $V_M$ is estimated, the standard deviation of the weighted mean (that is, the standard error of the weighted mean) is computed as the square root of the variance of the summary effect,

$$SE_M = \sqrt{V_M}$$

Now we know the distribution, the point estimate, and the standard deviation of the weighted mean. Thus, the confidence interval of the summary effect can be computed with the Z-procedure for confidence intervals.
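As an illustration of the procedure, the following sketch computes the fixed-effect weighted mean, its standard error, and a Z-based confidence interval. The function name and the input values are hypothetical, and the default multiplier of 1.96 assumes a 95% confidence level.

```python
import math


def fixed_effect_summary(effects, variances, z=1.96):
    """Return (M, SE_M, (lower, upper)) for a fixed-effect meta-analysis."""
    weights = [1.0 / v for v in variances]  # W_i = 1 / V_Yi
    sum_w = sum(weights)
    m = sum(w * y for w, y in zip(weights, effects)) / sum_w  # weighted mean
    v_m = 1.0 / sum_w                       # variance of the summary effect
    se_m = math.sqrt(v_m)                   # standard error of the summary effect
    return m, se_m, (m - z * se_m, m + z * se_m)


# Made-up inputs: the more precise study (variance 0.01) dominates the mean.
m, se_m, ci = fixed_effect_summary([0.2, 1.0], [0.01, 0.04])
print(m, se_m, ci)
```

Because the weights are inverse variances, a study with a quarter of the variance receives four times the weight, which pulls the summary effect toward that study's observed effect.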

Effect Size Measurements

Raw Mean Difference

When the studies report means and standard deviations (continuous variables), the preferred effect size is usually the raw mean difference, the standardized mean difference (SMD), or the response ratio. When the outcome is reported on a meaningful scale and all studies in the analysis use the same scale, the meta-analysis can be performed directly on the raw difference in means, or the raw mean difference. The primary advantage of the raw mean difference is that it is intuitively meaningful, either inherently or because of widespread use. Examples of outcomes suited to the raw mean difference include systolic blood pressure (mm Hg), serum LDL-C level (mg/dL), body surface area (m²), and so on.

We can estimate the mean difference D from a study that used two independent groups by applying the inference procedure for two population means (independent samples). Recall that the sampling distribution of the difference between two sample means has these characteristics:

• Its mean is the difference between the population means, $\mu_1 - \mu_2$.
• Its standard deviation is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.
• It is approximately normal when the samples are large.

All of this rests on the central limit theorem: if the sample size is large, the sample mean is approximately normally distributed, regardless of the distribution of the variable under consideration.

Given each group’s mean, standard deviation, and size, we can compute the sample mean difference D and, in light of the central limit theorem, its variance, using either the pooled sample standard deviation (Sp) or the nonpooled method. The variance of D is what the meta-analysis procedures (fixed-effect or random-effects model) use to compute the weight ($W_i = 1 / V_{Y_i}$). And once the standard error is known, the synthesized confidence interval can be computed.
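A minimal sketch of this computation follows. The function name and the summary statistics are hypothetical. The pooled branch assumes the two populations share a common standard deviation and uses the textbook pooled variance, while the nonpooled branch adds the two per-group variances of the mean.

```python
import math


def raw_mean_difference(mean1, mean2, sd1, sd2, n1, n2, pooled=True):
    """Return (D, V_D, SE_D) for two independent groups."""
    d = mean1 - mean2
    if pooled:
        # Pooled variance, assuming a common population standard deviation.
        s2_pooled = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
        var_d = (n1 + n2) / (n1 * n2) * s2_pooled
    else:
        # Nonpooled method: no equal-variance assumption.
        var_d = sd1 ** 2 / n1 + sd2 ** 2 / n2
    return d, var_d, math.sqrt(var_d)


# Made-up summary statistics for a treated and a control group.
d, var_d, se_d = raw_mean_difference(103.0, 100.0, 5.5, 4.5, 50, 50)
print(d, var_d, se_d)
print("study weight:", 1.0 / var_d)
```

With equal group sizes the pooled and nonpooled variances coincide; they diverge when the group sizes and standard deviations both differ.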

Standardized Mean Difference, d and g

As noted, the raw mean difference is a useful index when the measure is meaningful, either inherently or because of widespread use. By contrast, when the measure is less well known, the use of a raw mean difference has less to recommend it. In any event, the raw mean difference is an option only if all the studies in the meta-analysis use the same scale. If different studies use different instruments to assess the outcome, then the scale of measurement will differ from study to study and it would not be meaningful to combine raw mean differences.

In such cases we can divide the mean difference in each study by that study’s standard deviation to create an index (the standardized mean difference, SMD) that would be comparable across studies. This is the same approach suggested by Cohen in connection with describing the magnitude of effects in statistical power analysis. The standardized mean difference can be considered as being comparable across studies based on either of two arguments (Hedges and Olkin, 1985). If the outcome measures in all studies are linear transformations of each other, the standardized mean difference can be seen as the mean difference that would have been obtained if all data were transformed to a scale where the standard deviation within-groups was equal to 1.0.

The other argument for comparability of standardized mean differences is the fact that the standardized mean difference is a measure of overlap between distributions. In this telling, the standardized mean difference reflects the difference between the distributions in the two groups (and how each represents a distinct cluster of scores) even if they do not measure exactly the same outcome.

Computing d and g from studies that use independent groups

We can estimate the standardized mean difference from studies that used two independent groups as

$$d = \frac{\bar{X}_1 - \bar{X}_2}{S_{within}}$$

where $S_{within}$ is the pooled standard deviation across groups,

$$S_{within} = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$

Here $n_1$ and $n_2$ are the sample sizes in the two groups, and $S_1$ and $S_2$ are the standard deviations in the two groups. The reason that we pool the two sample estimates of the standard deviation is that even if we assume that the underlying population standard deviations are the same, it is unlikely that the sample estimates $S_1$ and $S_2$ will be identical. By pooling the two estimates of the standard deviation, we obtain a more accurate estimate of their common value.
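The computation for two independent groups can be sketched as follows. The function name and input values are hypothetical. Besides d itself, the sketch computes the usual large-sample variance of d and the small-sample correction to Hedges’ g (using the common approximation for the correction factor J), both of which the text turns to next.

```python
import math


def standardized_mean_difference(mean1, mean2, sd1, sd2, n1, n2):
    """Return a dict with d, its variance, and the bias-corrected g and variance."""
    df = n1 + n2 - 2
    s_within = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / s_within
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # approximate correction factor J
    return {
        "d": d,
        "var_d": var_d,
        "J": j,
        "g": j * d,               # Hedges' g
        "var_g": j ** 2 * var_d,  # variance of g
    }


# Made-up summary statistics for two groups of 50.
result = standardized_mean_difference(103.0, 100.0, 5.5, 4.5, 50, 50)
for name, value in result.items():
    print(name, round(value, 4))
```

Since J is always slightly below 1, g is slightly smaller in absolute value than d, and the gap shrinks as the sample sizes grow.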

The sample estimate of the standardized mean difference is often called Cohen’s d in research synthesis. Some confusion about the terminology has resulted from the fact that the index 𝛿, originally proposed by Cohen as a population parameter for describing the size of effects for statistical power analysis, is also sometimes called d. The variance of d is given by

$$V_d = \frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}$$

Again, with the standardized mean difference and its variance known, we can compute the confidence interval of the standardized mean difference. However, it turns out that d has a slight bias, tending to overestimate the absolute value of 𝛿 in small samples. This bias can be removed by a simple correction that yields an unbiased estimate of 𝛿, with the unbiased estimate sometimes called Hedges’ g (Hedges, 1981). To convert from d to Hedges’ g we use a correction factor, which is called J. Hedges (1981) gives the exact formula for J, but in common practice researchers use an approximation,

$$J = 1 - \frac{3}{4\,df - 1}$$

where $df = n_1 + n_2 - 2$ for two independent groups, so that $g = J \times d$ and $V_g = J^2 \times V_d$.

Summary

• Under the fixed-effect model all studies in the analysis share a common true effect.
• The summary effect is our estimate of this common effect size, and the null hypothesis is that this common effect is zero (for a difference) or one (for a ratio).
• All observed dispersion reflects sampling error, and study weights are assigned with the goal of minimizing this within-study error.

Converting Among Effect Sizes

Although widely used outcome measures are often shared across the studies under investigation, it is not uncommon for individual studies to report different outcome measures. When we convert between different measures we make certain assumptions about the nature of the underlying traits or effects. Even if these assumptions do not hold exactly, the decision to use these conversions is often better than the alternative, which is to simply omit the studies that happened to use an alternate metric. This would involve loss of information, and possibly the systematic loss of information, resulting in a biased sample of studies. A sensitivity analysis that compares the meta-analysis results with and without the converted studies would be important. Figure 7.1 outlines the mechanism for incorporating multiple kinds of data in the same meta-analysis. First, each study is used to compute an effect size and variance in its native index: the log odds ratio for binary data, d for continuous data, and r for correlational data. Then, we convert all of these indices to a common index, which would be either the log odds ratio, d, or r. If the final index is d, we can move from there to Hedges’ g. This common index and its variance are then used in the analysis.

We can convert from a log odds ratio to the standardized mean difference d using

$$d = \text{LogOddsRatio} \times \frac{\sqrt{3}}{\pi}$$

where 𝜋 is the mathematical constant. The variance of d would then be

$$V_d = V_{\text{LogOddsRatio}} \times \frac{3}{\pi^2}$$

where $V_{\text{LogOddsRatio}}$ is the variance of the log odds ratio. This method was originally proposed by Hasselblad and Hedges (1995) but variations have been proposed. It assumes that an underlying continuous trait exists and has a logistic distribution (which is similar to a normal distribution) in each group. In practice, it will be difficult to test this assumption.
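The conversion is a simple rescaling, as the following sketch shows. The function name and the input values (a log odds ratio of 0.9069 with variance 0.0676) are chosen only for illustration.

```python
import math


def log_odds_ratio_to_d(log_odds_ratio, var_log_odds_ratio):
    """Convert a log odds ratio and its variance to d and V_d."""
    d = log_odds_ratio * math.sqrt(3.0) / math.pi
    var_d = var_log_odds_ratio * 3.0 / math.pi ** 2
    return d, var_d


# Illustrative inputs only.
d, var_d = log_odds_ratio_to_d(0.9069, 0.0676)
print(round(d, 4), round(var_d, 4))
```

The effect size is multiplied by a constant, and the variance by the square of that constant, so the ratio of the effect to its standard error is unchanged by the conversion.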