## The Logic Behind Meta-analysis – Fixed-Effect Model

Effect Size (Based on Means)

When the studies report means and standard deviations (continuous outcomes), the preferred effect size is usually the raw mean difference, the standardized mean difference, or the response ratio. When the outcome is reported on a meaningful scale and all studies in the analysis use the same scale, the meta-analysis can be performed directly on the raw mean differences.

Consider a study that reports means for two groups (Treated and Control). If we wish to compare the means of these two groups, the population mean difference (effect size) is defined as

Population mean difference = 𝜇1 – 𝜇2

Population standard error of the mean difference (pooled) = Spooled × √(1/n1 + 1/n2)

Overview

Most meta-analyses are based on one of two statistical models, the fixed-effect model or the random-effects model. Under the fixed-effect model we assume that there is one true effect size (hence the term fixed effect) which underlies all the studies in the analysis, and that all differences in observed effects are due to sampling error. While we follow the practice of calling this a fixed-effect model, a more descriptive term would be a common-effect model.

By contrast, under the random-effects model we allow that the true effect could vary from study to study. For example, the effect size might be higher (or lower) in studies where the participants are older, or more educated, or healthier than in others, or when a more intensive variant of an intervention is used, and so on. Because studies will differ in the mixes of participants and in the implementations of interventions, among other reasons, there may be different effect sizes underlying different studies.

Since all studies share the same true effect, it follows that the observed effect size varies from one study to the next only because of the random error inherent in each study. If each study had an infinite sample size, the sampling error would be zero and the observed effect for each study would be the same as the true effect; a plot of the observed effects would then coincide exactly with a plot of the true effects.

In practice, of course, the sample size in each study is not infinite, and so there is sampling error and the effect observed in the study is not the same as the true effect. In Figure 11.2 the true effect for each study is still 0.60, but the observed effect differs from one study to the next.

While the error in any given study is random, we can estimate the sampling distribution of the errors. In Figure 11.3 we have placed a normal curve about the true effect size for each study, with the width of the curve based on the variance in that study. In Study 1 the sample size was small, the variance is large, and the observed effect is likely to fall anywhere in the relatively wide range of 0.20 to 1.00. By contrast, in Study 2 the sample size was relatively large, the variance is small, and the observed effect is likely to fall in the relatively narrow range of 0.40 to 0.80. Note that the width of the normal curve is based on the square root of the variance, the standard error.

Meta-analysis Procedure

In an actual meta-analysis, of course, rather than starting with the population effect and making projections about the observed effects, we work backwards, starting with the observed effects and trying to estimate the population effect. In order to obtain the most precise estimate of the population effect (that is, to minimize the variance) we compute a weighted mean, where the weight assigned to each study is the inverse of that study's variance. Concretely, the weight assigned to each study in a fixed-effect meta-analysis is

Wi = 1 / VYi

where VYi is the within-study variance for study i. The weighted mean M is then computed as

M = Σ(Wi Yi) / Σ Wi

That is, the sum of the products WiYi (effect size multiplied by weight) divided by the sum of the weights.

The variance of the summary effect is estimated as the reciprocal of the sum of the weights, or

VM = 1 / Σ Wi

Once VM is estimated, the standard error of the weighted mean is computed as the square root of the variance of the summary effect, SEM = √VM. Now we know the distribution, the point estimate, and the standard error of the weighted mean. Thus, the confidence interval of the summary effect can be computed with the Z-procedure, M ± 1.96 × SEM for a 95% interval.
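The steps above can be sketched in a few lines of Python; the five effect sizes and within-study variances below are invented purely for illustration:

```python
import math

# Hypothetical effect sizes (Yi) and within-study variances (VYi) for five studies
effects = [0.45, 0.60, 0.52, 0.70, 0.58]
variances = [0.020, 0.010, 0.030, 0.015, 0.025]

# Fixed-effect weights: Wi = 1 / VYi
weights = [1.0 / v for v in variances]

# Weighted mean: M = sum(Wi * Yi) / sum(Wi)
M = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

# Variance and standard error of the summary effect
V_M = 1.0 / sum(weights)
SE_M = math.sqrt(V_M)

# 95% confidence interval via the Z-procedure
ci_lower = M - 1.96 * SE_M
ci_upper = M + 1.96 * SE_M
```

Note how the most precise study (variance 0.010) receives the largest weight, pulling the summary effect toward its estimate.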

Effect Size Measurements

Raw Mean Difference

When the studies report means and standard deviations (continuous variables), the preferred effect size is usually the raw mean difference, the standardized mean difference (SMD), or the response ratio. When the outcome is reported on a meaningful scale and all studies in the analysis use the same scale, the meta-analysis can be performed directly on the raw difference in means, or the raw mean difference. The primary advantage of the raw mean difference is that it is intuitively meaningful, either inherently or because of widespread use. Examples of outcomes suited to the raw mean difference include systolic blood pressure (mm Hg), serum LDL-C level (mg/dL), body surface area (m2), and so on.

We can estimate the mean difference D from a study that used two independent groups by the inference procedure for two population means (independent samples). Recall that the sampling distribution of the difference between two sample means has these characteristics: its mean is 𝜇1 – 𝜇2, its standard deviation (the standard error of the difference) is estimated from the group standard deviations and sample sizes, and it is approximately normal when the samples are large.

Note: all of this rests on the central limit theorem – if the sample size is large, the mean is approximately normally distributed, regardless of the distribution of the variable under consideration.

Once we know the sample mean difference D and its standard error, and in the light of the central limit theorem, we can compute the variance of D. Given the group means, the group standard deviations, and the group sizes, we can also compute the pooled sample standard deviation (Sp) or use the nonpooled method. We then have the variance of D, which the meta-analysis procedure (fixed-effect or random-effects model) uses to compute the weight (Wi = 1 / VYi). And once the standard error is known, the synthesized confidence interval can be computed.
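As a sketch, here is how D, the pooled standard deviation, and the variance of D might be computed from two-group summary data; the means, standard deviations, and group sizes below are invented:

```python
import math

# Hypothetical summary data for two independent groups (Treated vs. Control)
mean1, sd1, n1 = 103.5, 5.5, 50   # treated group
mean2, sd2, n2 = 100.0, 4.5, 50   # control group

# Raw mean difference D
D = mean1 - mean2

# Pooled sample standard deviation Sp
Sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

# Variance of D (pooled method); this is the VYi used for the study weight
V_D = Sp**2 * (1.0 / n1 + 1.0 / n2)
SE_D = math.sqrt(V_D)

# Weight the study would receive in a fixed-effect meta-analysis
W = 1.0 / V_D
```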

Standardized Mean Difference, d and g

As noted, the raw mean difference is a useful index when the measure is meaningful, either inherently or because of widespread use. By contrast, when the measure is less well known, the use of a raw mean difference has less to recommend it. In any event, the raw mean difference is an option only if all the studies in the meta-analysis use the same scale. If different studies use different instruments to assess the outcome, then the scale of measurement will differ from study to study and it would not be meaningful to combine raw mean differences.

In such cases we can divide the mean difference in each study by that study's standard deviation to create an index (the standardized mean difference, SMD) that is comparable across studies. This is the same approach suggested by Cohen in connection with describing the magnitude of effects in statistical power analysis. The standardized mean difference can be considered comparable across studies based on either of two arguments (Hedges and Olkin, 1985). If the outcome measures in all studies are linear transformations of each other, the standardized mean difference can be seen as the mean difference that would have been obtained if all data were transformed to a scale where the within-groups standard deviation was equal to 1.0.

The other argument for comparability of standardized mean differences is the fact that the standardized mean difference is a measure of overlap between distributions. In this telling, the standardized mean difference reflects the difference between the distributions in the two groups (and how each represents a distinct cluster of scores) even if they do not measure exactly the same outcome.

Computing d and g from studies that use independent groups

We can estimate the standardized mean difference from studies that used two independent groups as

d = (X̄1 – X̄2) / Swithin

where Swithin is the pooled standard deviation across groups,

Swithin = √[ ((n1 – 1)S1² + (n2 – 1)S2²) / (n1 + n2 – 2) ]

in which n1 and n2 are the sample sizes in the two groups, and S1 and S2 are the standard deviations in the two groups. The reason that we pool the two sample estimates of the standard deviation is that even if we assume that the underlying population standard deviations are the same, it is unlikely that the sample estimates S1 and S2 will be identical. By pooling the two estimates of the standard deviation, we obtain a more accurate estimate of their common value.

The sample estimate of the standardized mean difference is often called Cohen's d in research synthesis. Some confusion about the terminology has resulted from the fact that the index 𝛿, originally proposed by Cohen as a population parameter for describing the size of effects for statistical power analysis, is also sometimes called d. The variance of d is given by

Vd = (n1 + n2) / (n1 × n2) + d² / (2(n1 + n2))

Again, with the standardized mean difference and its variance known, we could compute the confidence interval of the standardized mean difference. However, it turns out that d has a slight bias, tending to overestimate the absolute value of 𝛿 in small samples. This bias can be removed by a simple correction that yields an unbiased estimate of 𝛿, with the unbiased estimate sometimes called Hedges' g (Hedges, 1981). To convert from d to Hedges' g we use a correction factor, J. Hedges (1981) gives the exact formula for J, but in common practice researchers use the approximation

J = 1 – 3 / (4df – 1)

where df = n1 + n2 – 2 for two independent groups. Then g = J × d and Vg = J² × Vd.
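Putting the pieces together, a minimal sketch of computing d, its variance, and the Hedges' g correction; the two-group summary data are invented:

```python
import math

# Hypothetical two-group summary data
mean1, sd1, n1 = 25.0, 6.0, 20
mean2, sd2, n2 = 20.0, 5.0, 20

# Pooled (within-groups) standard deviation
s_within = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

# Cohen's d
d = (mean1 - mean2) / s_within

# Variance of d
V_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

# Hedges' correction factor J (common approximation), df = n1 + n2 - 2
df = n1 + n2 - 2
J = 1 - 3 / (4 * df - 1)

# Hedges' g and its variance
g = J * d
V_g = J**2 * V_d
```

Because J is slightly less than 1, g is always slightly smaller in absolute value than d, and the shrinkage matters most when the samples are small.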

Summary

• Under the fixed-effect model all studies in the analysis share a common true effect.
• The summary effect is our estimate of this common effect size, and the null hypothesis is that this common effect is zero (for a difference) or one (for a ratio).
• All observed dispersion reflects sampling error, and study weights are assigned with the goal of minimizing this within-study error.

Converting Among Effect Sizes

Although ideally all studies under investigation would use the same outcome measure, it is not uncommon for the outcome measures of individual studies to differ. When we convert between different measures we make certain assumptions about the nature of the underlying traits or effects. Even if these assumptions do not hold exactly, the decision to use these conversions is often better than the alternative, which is to simply omit the studies that happened to use an alternate metric. Omission would involve loss of information, and possibly the systematic loss of information, resulting in a biased sample of studies. A sensitivity analysis comparing the meta-analysis results with and without the converted studies is therefore important. Figure 7.1 outlines the mechanism for incorporating multiple kinds of data in the same meta-analysis. First, each study is used to compute an effect size and variance in its native index: the log odds ratio for binary data, d for continuous data, and r for correlational data. Then, we convert all of these indices to a common index, which would be either the log odds ratio, d, or r. If the final index is d, we can move from there to Hedges' g. This common index and its variance are then used in the analysis.

We can convert from a log odds ratio to the standardized mean difference d using

d = LogOddsRatio × √3 / 𝜋

where 𝜋 is the mathematical constant (approximately 3.14159). The variance of d would then be

Vd = VLogOddsRatio × 3 / 𝜋²

where VlogOddsRatio is the variance of the log odds ratio. This method was originally proposed by Hasselblad and Hedges (1995) but variations have been proposed. It assumes that an underlying continuous trait exists and has a logistic distribution (which is similar to a normal distribution) in each group. In practice, it will be difficult to test this assumption.
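A short sketch of this conversion; the log odds ratio and its variance below are made-up values:

```python
import math

# Hypothetical log odds ratio and its variance from a study with a binary outcome
log_or = 0.8
v_log_or = 0.16

# Hasselblad-Hedges conversion to the standardized mean difference
d = log_or * math.sqrt(3) / math.pi
V_d = v_log_or * 3 / math.pi**2
```

The converted d and V_d can then enter the meta-analysis alongside effect sizes computed directly from continuous data.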

## Basic Concepts in Statistics

Three Kinds of Data

There are three types of data: interval data, where variables are measured on a scale with constant intervals; nominal/categorical data; and ordinal data. For interval data, the absolute difference between two values can always be determined by subtraction. Interval variables include temperature, blood pressure, height, weight, and so on.

Other data, such as gender, state of birth, or whether or not a person has a certain disease, are not measured on an interval scale. These variables are examples of nominal or categorical data, where individuals are classified into two or more mutually exclusive and exhaustive categories. For example, people could be categorised as male or female, dead or alive, or as being born in one of the 50 states, the District of Columbia, or outside the United States. In every case, it is possible to place each individual into one and only one category. In addition, there is no arithmetic relationship, or even an ordering, between the categories.

Ordinal data fall between interval and nominal data. Like nominal data, ordinal data fall into categories, but there is an inherent ordering (or ranking) of the categories. Level of health (excellent, very good, good, fair, or poor) is a common example of a variable measured on an ordinal scale. The different values have a natural order, but the differences or “distances” between adjoining values on an ordinal scale are not necessarily the same and may not even be comparable. For example, excellent health is better than very good health, but this difference is not necessarily the same as the difference between fair and poor health. Indeed, these differences may not even be strictly comparable.

The Normal Distribution

If an observed measurement is the sum of many independent small random factors, the resulting measurements will follow a normal (Gaussian) distribution. Note that the distribution is completely defined by the population mean μ and the population standard deviation σ. These two parameters are all the information one needs to describe the population fully, if the distribution of values follows a normal distribution.

Getting The Data

We could get the data by examining every single member of the population; however, it is usually physically or fiscally impossible to do this, and we are limited to examining a sample of n individuals drawn from the population in the hope that it is representative of the complete population. Without knowledge of the entire population, we can no longer know the population mean μ and the population standard deviation σ. Nevertheless, we can estimate them from the sample. To do so, the sample has to be “representative” of the population from which it is drawn.

• Random Sample

All statistical methods are built on the assumption that the individuals included in your sample represent a random sample from the underlying (and unobserved) population. In a random sample every member of the population has an equal probability (chance) of being selected for the sample. The most direct way to create a simple random sample would be to obtain a list of every member of the population of interest, number them from 1 to N (where N is the number of population members), then use a computerised random number generator to select the n individuals for the sample. Every number has the same chance of appearing and there is no relationship between adjacent numbers. We could create several random samples by simply selecting individuals who have not been selected before; the important point is not to reuse any sequence of random numbers already used to make a selection. In this way, we ensure that every member of the population is equally likely to be selected for observation in the sample.
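The selection procedure just described can be sketched as follows; the population size and sample sizes are arbitrary, and the generator is seeded only to make the example reproducible:

```python
import random

# Hypothetical sampling frame: a numbered list of N = 1000 population members
N = 1000
frame = list(range(1, N + 1))

# Draw a simple random sample of n = 50 without replacement:
# every member has the same probability of selection
rng = random.Random(12345)
sample = rng.sample(frame, 50)

# A second, non-overlapping sample can be drawn from the members
# not already selected (no random draws are reused)
remaining = [m for m in frame if m not in set(sample)]
second_sample = rng.sample(remaining, 50)
```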

The list of population members from which we drew the random sample is known as the sampling frame. Sometimes it is possible to obtain such a list (for example, a list of all people hospitalised in a given hospital on a given day), but often no such list exists. When there is no list, investigators use other techniques for creating a random sample, such as dialing telephone numbers at random for public opinion polling or selecting geographic locations at random from maps. How the sampling frame is constructed can be very important in terms of how well, and to whom, the results of a given study generalise beyond the specific individuals in the sample.

The procedure of random selection described above is known as simple random sampling, in which we sample at random from the population as a whole. Conversely, investigators sometimes use stratified random samples, in which they first divide the population into subgroups (perhaps based on gender, race, or geographic location), then construct simple random samples within each subgroup (stratum). This procedure is used when there are widely varying numbers of people in the different subpopulations, so that obtaining adequate sample sizes in the smaller subgroups would require collecting more data than necessary in the larger subpopulations if simple random sampling were used. Stratification reduces data collection costs by reducing the total sample size necessary to obtain the desired precision in the results, but it makes the data analysis more complicated. The basic requirement, that each member of each subpopulation (stratum) has the same chance of being selected, is the same as in a simple random sample.

• The Mean and Standard Deviation

Having obtained a random sample from a population of interest, we are ready to use information from that sample to estimate the characteristics of the underlying population.

Sample mean = Sum of values / Number of observations in sample, that is

X̄ = ΣX / n

in which the bar over the X denotes that it is the mean of the n observations of X.

The estimate of the population standard deviation is called the sample standard deviation s or SD and is defined as

SD = √[ Σ(X – X̄)² / (n – 1) ]

The definition of the sample standard deviation, SD, differs from the definition of the population standard deviation σ in two ways: (1) the population mean μ has been replaced by our estimate of it, the sample mean X̄, and (2) we compute the “average” squared deviation by dividing by n – 1 rather than n. The precise reason for dividing by n – 1 rather than n requires a substantial mathematical argument, but we can offer the following intuitive justification: the sample will never show as much variability as the entire population, and dividing by n – 1 instead of n compensates for the resulting tendency of the sample standard deviation to underestimate the population standard deviation. In conclusion, if you are willing to assume that the sample was drawn from a normal distribution, summarise the data with the sample mean and sample standard deviation, the best estimates of the population mean and population standard deviation, because these two parameters completely define the normal distribution. When there is evidence that the population under study does not follow a normal distribution, summarise the data with the median and upper and lower percentiles, discussed later.
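A minimal sketch of the sample mean and sample standard deviation, using invented data; note the division by n – 1 in the standard deviation:

```python
import math

# Hypothetical sample of n = 8 observations
xs = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7]
n = len(xs)

# Sample mean: sum of values divided by the number of observations
mean = sum(xs) / n

# Sample standard deviation: sum of squared deviations divided by n - 1
ss = sum((x - mean) ** 2 for x in xs)
sd = math.sqrt(ss / (n - 1))
```

The same values can be obtained with `statistics.mean` and `statistics.stdev` from the standard library, which also divides by n – 1.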

• Standard Error of The Mean/How Good Are These Estimates (sample mean and sample standard deviation)

The mean and standard deviation computed from a random sample are estimates of the mean and standard deviation of the entire population from which the sample was drawn. There is nothing special about the specific random sample used to compute these statistics, and different random samples will yield slightly different estimates of the true population mean and standard deviation. To quantitate how accurate these estimates are likely to be, we can compute their standard errors. It is possible to compute a standard error for any statistic, but here we shall focus on the standard error of the mean. This statistic quantifies the certainty with which the mean computed from a random sample estimates the true mean of the population from which the sample was drawn.

To understand the standard error of the mean, imagine drawing two or more samples, each of a certain size, from the population. For each sample you could compute the mean of the individuals in that sample; for example, with four samples you would get four means. The standard deviation of this collection of sample means, calculated with the SD equation described above, is the standard error of the mean, which we denote σX.

Just as the standard deviation of an original sample of a certain number of individuals is an estimate of the variability of the whole population, σX is an estimate of the variability of possible values of means of samples of that size. Since extreme values tend to balance each other when one computes a mean, there will be less variability in the values of the sample means than in the original population (that is, SD > σX). σX is a measure of the precision with which a sample mean estimates the population mean μ.

The standard error of the mean tells not about variability in the original population, as the standard deviation does, but about the certainty with which a sample mean estimates the true population mean.

Since the precision with which we can estimate the mean increases as the sample size increases, the standard error of the mean decreases as the sample size increases. Conversely, the more variability in the original population, the more variability will appear in possible mean values of samples; therefore, the standard error of the mean increases as the population standard deviation increases. The true standard error of the mean of samples of size n drawn from a population with standard deviation σ is

σX = σ / √n
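A tiny numeric illustration of this formula; the values of σ and n are arbitrary:

```python
import math

# Population standard deviation (taken as known, for illustration) and sample size
sigma = 10.0
n = 25

# True standard error of the mean: sigma / sqrt(n)
sem = sigma / math.sqrt(n)

# Quadrupling the sample size halves the standard error
sem_big = sigma / math.sqrt(4 * n)
```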

Mathematicians have shown that the distribution of mean values will always approximately follow a normal distribution, regardless of how the population from which the original samples were drawn is distributed. This result is known as the Central Limit Theorem. It says:

• The distribution of sample means will be approximately normal regardless of the distribution of values in the original population from which the samples were drawn.
• The mean value of the collection of all possible sample means will equal the mean of the original population.
• The standard deviation of the collection of all possible means of samples of a given size, called the standard error of the mean, depends on both the standard deviation of the original population and the size of the sample.
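The three statements above can be checked with a small simulation. Here we draw from an exponential population (decidedly non-normal) with mean 1, so for samples of size 40 the theoretical standard error is 1/√40 ≈ 0.158; the seed and sample counts are arbitrary:

```python
import random
import statistics

# Decidedly non-normal population: exponential with mean 1
rng = random.Random(0)
n = 40

# Means of many samples of size n
sample_means = [statistics.mean(rng.expovariate(1.0) for _ in range(n))
                for _ in range(2000)]

# The mean of the sample means approaches the population mean (1.0),
# and their standard deviation approaches sigma / sqrt(n) = 1 / sqrt(40)
grand_mean = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
```

A histogram of `sample_means` would look approximately normal even though the underlying exponential population is strongly skewed.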