A systematic review of the literature on a specific research question qualitatively compiles all the studies performed on that question and can help the clinician have a global and summarised view of the available evidence. A meta-analysis goes further and can resolve the following questions quantitatively using statistical techniques:
- What is the mean effect size of individual studies?
- Is this magnitude significant?
- Are the individual effects homogeneous around the mean effect?
- If not, what characteristics of the studies may be causing the heterogeneity?
- Is it possible to formulate a model that explains and corrects for the heterogeneity?
Meta-analysis is a statistical technique, and a quantitative effect size is needed for each study. This effect size can belong to the family of risks (risk ratios or risk differences between the groups to be compared) when the outcome is qualitative, or to the family of means (mean differences between groups) when the outcome is quantitative. For a meta-analysis, the unit of analysis is the effect size of each individual study. The mean effect is calculated as a weighted average of the individual effects, and the weight given to each study is usually the inverse of the variance of its effect. The variance of the effect of a study is inversely related to its sample size: studies with small sample sizes will have effects with more variance than studies with large sample sizes. Thus, the overall mean effect gives more importance to studies with larger sample sizes. Finally, the estimation of the possible benefit of research therapies in real-world patients is also a feasible and useful strategy to address clinical trials and eventual approval by regulatory agencies.
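As an illustration of the weighting described above, the following is a minimal sketch of an inverse-variance (fixed-effect) pooled estimate. The function name `pooled_effect` and the three log risk ratios with their variances are hypothetical values chosen for illustration, not data from any trial cited here.

```python
import math

def pooled_effect(effects, variances):
    """Inverse-variance weighted mean effect with a 95% confidence interval."""
    weights = [1.0 / v for v in variances]      # w_i = 1 / variance_i
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)                 # standard error of the pooled effect
    ci = (mean - 1.96 * se, mean + 1.96 * se)
    return mean, se, ci

# Hypothetical log risk ratios and variances (illustrative only).
log_rr = [-0.30, -0.10, -0.25]
var_log_rr = [0.04, 0.01, 0.02]

pooled_mean, pooled_se, pooled_ci = pooled_effect(log_rr, var_log_rr)
print("Pooled risk ratio:", round(math.exp(pooled_mean), 3))
```

The study with the smallest variance (the second one) receives the largest weight, which is how larger studies come to dominate the overall mean effect.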
Forest plots are graphical representations used to display the results of individual studies included in a meta-analysis or a systematic review, along with the overall combined result. Each study is represented by a horizontal line that shows the confidence interval (usually 95%) of the effect size, and a central marker that indicates the point estimate of the effect, such as an odds ratio or mean difference. The weight of each study is represented by the size of the marker, usually based on its sample size or variance. At the bottom of the forest plot, a diamond represents the overall effect size and its confidence interval, derived by pooling the results of all included studies.
Key aspects for the interpretation of a forest plot are the position and spread of the confidence intervals in relation to the vertical line of no effect; if most studies and the overall diamond lie on one side of this line, it suggests a consistent effect across studies. However, substantial variation between studies, particularly confidence intervals that barely overlap, may indicate heterogeneity, meaning that the results differ more than would be expected by chance, whereas wide confidence intervals mainly reflect imprecision. Tools like the I² statistic, often reported alongside forest plots, provide an objective quantification of heterogeneity. Understanding forest plots involves assessing both the direction and consistency of effects to judge the strength and reliability of the evidence.
The main limitation of forest plots is that they can oversimplify complex data by focusing primarily on effect sizes and confidence intervals without fully capturing the quality or heterogeneity of the included studies. Visual impressions from forest plots can be misleading if studies with very different sample sizes or methodologies are presented without sufficient context. In addition, the use of summary statistics may mask publication bias or the influence of outlier studies. Forest plots also rely on correct statistical assumptions, such as homogeneity in fixed-effects models, which, if violated, can lead to incorrect interpretations. Therefore, although forest plots are useful, they should be interpreted alongside other analyses and assessments of study quality, bias and heterogeneity.
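When the homogeneity assumption of the fixed-effects model is doubtful, a random-effects model is a common alternative. The sketch below uses the DerSimonian-Laird estimator of the between-study variance, which is one widely used option and an assumption here rather than a method stated in this article. The function name and the input values are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Fixed- and random-effects pooled means plus the between-study variance tau^2.

    Assumes at least two studies, so that the scaling constant c is positive.
    """
    w = [1.0 / v for v in variances]
    total = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / total
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = total - sum(wi ** 2 for wi in w) / total
    tau2 = max(0.0, (q - df) / c)               # truncated at zero
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    random_mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return fixed_mean, random_mean, tau2

# Hypothetical log risk ratios and variances (illustrative only).
dl_effects = [-0.50, -0.05, -0.30]
dl_variances = [0.04, 0.01, 0.02]
dl_fixed, dl_random, dl_tau2 = dersimonian_laird(dl_effects, dl_variances)
```

Because tau² is added to every study's variance, the random-effects weights are more even than the fixed-effects weights, so small studies have relatively more influence on the pooled estimate.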
Pros and Cons of Meta-analysis
Meta-analysis makes it easy to analyse different studies jointly and to obtain an overall effect. Nevertheless, a robust finding in a large-scale and specifically designed trial should be considered far more reliable than the inferences of most meta-analyses.1 The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement is a practical guide for performing a meta-analysis that offers a feasible way to standardise the methods.2 It is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses that focuses on the reporting of reviews evaluating the effects of interventions but can also be used as a basis for reporting systematic reviews with objectives other than evaluating interventions. It includes a 27-item checklist and flowchart to follow when reporting a systematic review or meta-analysis.
Meta-analyses can be methodologically accurate but also meaningless. In an excellent critical viewpoint, Packer states three key messages for not trusting all meta-analyses:
- Conclusions of meta-analyses should not rely on small numbers of events.
- Meta-analyses that rely on indirect comparison should be cautiously interpreted.
- Meta-analyses should not tell us what is already known or obscure what should be remembered.3
Once these issues are accepted, it must be recognised that there are many gaps in knowledge that can be addressed by meta-analyses. As shown in Figure 1A, meta-analyses of studies that obtained similar results only duplicate existing findings and provide no new evidence. In contrast, the most efficient use of meta-analysis may be to assess an outcome that had different results in different studies (Figure 1B). Another important use of meta-analysis is to assess the main effect of studies with short-term follow-up (Figure 1C) or small sample sizes (Figure 1D) to increase the statistical power. For example, treatment with dapagliflozin in patients with heart failure and impaired ejection fraction reduced cardiovascular and all-cause mortality in the DAPA-HF trial.4 Nonetheless, these results were not statistically significant in the EMPEROR trial that tested the effect of empagliflozin in the same clinical setting. Investigators of both studies conducted a meta-analysis and provided evidence that sodium–glucose cotransporter 2 inhibitors reduce mortality in patients with heart failure and reduced ejection fraction; interestingly, Packer was the last author of this meta-analysis. Similarly, despite the well-proven superiority of direct oral anticoagulants compared to vitamin K antagonists, there were concerns about a possible increase of MI with dabigatran after percutaneous coronary interventions in patients requiring oral anticoagulation. We conducted a meta-analysis that showed that there was no increase in any ischaemic event.5–7
Another clear example is provided by the trials conducted with proprotein convertase subtilisin/kexin type 9 (PCSK9) inhibitors. Peculiarly, the clinical trials that provided the evidence for evolocumab and alirocumab were stopped once the number of events needed to assess the primary endpoint of each trial was achieved. The follow-up period (2.8 months) was much shorter than in the studies previously performed with any lipid-lowering therapy and, as a consequence, there was a lack of statistical power to assess the effect on the individual components of the primary endpoint.8,9 We performed a meta-analysis with all the trials available that reported a significant reduction in MI and stroke.10
An important aspect is the critical evaluation of meta-analyses that challenge the results of well-designed and adequately powered trials. An example is the meta-analysis by Kelly et al. of the effect of the combination of moderate-intensity statins with ezetimibe compared to high-intensity statins in terms of lipids, major adverse cardiovascular events (MACE) and drug-related events. The results suggested that the combination provides the same benefit in clinical outcomes, a higher rate of lipid control and similar rates of discontinuation.11 In contrast, the largest trial designed for this purpose, the RACING trial involving 3,780 patients, concluded that the combination of moderate-intensity statins with ezetimibe conferred a lower risk of intolerance without a higher risk of discontinuation. The results of this trial are overshadowed in the meta-analysis by Kelly et al. by the results of two small trials with shorter follow-up and very few events. Although the analyses of other endpoints might be accurate, the conclusion by Kelly et al. on this particular aspect is inadequate.11,12
Meta-analyses can also describe the main effect of different strategies or risk factors, especially in areas with more gaps in knowledge. The 2020 COVID-19 pandemic was a social and clinical challenge that represented a new and constantly changing scenario for the medical community.13,14 The first reports relating to the clinical course of patients admitted for pneumonia due to COVID-19 showed that age, cardiovascular risk factors and chronic conditions, such as cardiovascular disease, were leading predictors of poorer outcomes and mortality.15,16 We tried to shed some light on this challenging setting and performed a meta-analysis using the national reports of five countries and more than 600,000 patients; we were able to describe that mortality increased exponentially at >50 years of age and was 33% in octogenarians.17 Subsequently, we analysed the actual prognosis of patients with established cardiovascular disease using hospital reports and were able to show that they had a twofold higher risk of mortality.18 Finally, the generalisation of the COVID-19 vaccines was followed by an unprecedented alarm related to myocarditis as a side-effect. We collected seven large cohorts, involving >17 million subjects, and performed a meta-analysis that clearly showed that the incidence of myocarditis following RNA-based vaccines was 0.0018%.19 Similar results were reported later.20
One of the most relevant issues when conducting a meta-analysis is to establish a clear definition of the endpoints to be analysed, as well as the inclusion and exclusion criteria. These aspects guide the selection of trials, which has a critical impact on the results. An example of the effect of study selection on the results of meta-analyses is provided by those that assessed the efficacy of colchicine in reducing cardiovascular events among patients with coronary artery disease. Results of the COLCOT trial showed a reduction of MACE (HR 0.77; 95% CI [0.61–0.96]; p=0.02) in patients with a recent MI treated with colchicine 0.5 mg, with no significant difference in cardiovascular mortality (HR 0.84; 95% CI [0.46–1.52]).21 Similar results were reported in the LoDoCo2 trial that included patients with chronic coronary artery disease, where treatment with colchicine 0.5 mg was associated with a reduction in the incidence of MACE (HR 0.69; 95% CI [0.57–0.83]; p<0.001) but had no effect on cardiovascular mortality. Subsequent meta-analyses concluded that treatment with colchicine 0.5 mg daily reduced the risk of MACE but had no effect on cardiovascular mortality; these results were challenged by another meta-analysis that included only patients with recent MI and found no effect on MACE.22–24 The most conclusive and contemporary results were provided by the CLEAR study, which concluded that colchicine had no effect on MACE or mortality when started in the first 30 days after an acute coronary syndrome. The controversial results can be explained by differences in the inclusion criteria and endpoints analysed.25
Finally, meta-analysis should not be used to predict the effect of future therapies, but it can provide reliable approximations using Phase II and III trials that were not designed, or were underpowered, to assess their clinical benefit. Meta-regression can also be used to estimate the effect of a variable, as described below. Therefore, meta-analysis can provide reliable conclusions in areas of uncertainty when it is performed accurately and correctly directed.
Further Analysis with Meta-analyses
Heterogeneity
Heterogeneity is the degree of between-study variability in effects that is not due to chance. Once the overall mean effect has been calculated, the presence of heterogeneity is tested for using Cochran’s Q test or the I² or H² indices. If moderate or high heterogeneity is observed, this is a problem, since the overall mean effect does not adequately represent the individual effects, and the conclusions may be biased. There are several ways to try to explain and correct for heterogeneity: subgroup analysis, meta-regression and sensitivity analysis.
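The three heterogeneity measures named above can be computed directly from the study effects and their variances. The sketch below assumes inverse-variance weights and uses the usual definitions Q = Σ wᵢ(yᵢ − ȳ)², I² = max(0, (Q − df)/Q) × 100 and H² = Q/df. The function name and data are hypothetical, and the p-value of the Q test (from a chi-squared distribution with k − 1 degrees of freedom) is omitted because the Python standard library does not provide that distribution.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, I^2 (as a percentage) and H^2 for a set of study effects."""
    w = [1.0 / v for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1                       # degrees of freedom, k - 1
    i2 = 100.0 * (q - df) / q if q > df else 0.0
    h2 = q / df
    return q, i2, h2

# Hypothetical log risk ratios and variances (illustrative only).
het_effects = [-0.50, -0.05, -0.30]
het_variances = [0.04, 0.01, 0.02]
q_stat, i2_pct, h2_index = heterogeneity(het_effects, het_variances)
print(f"Q = {q_stat:.2f}, I2 = {i2_pct:.1f}%, H2 = {h2_index:.2f}")
```

With these illustrative numbers Q exceeds its degrees of freedom, so I² is above zero; when Q is at or below df, I² is truncated to 0%.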
Subgroup Analyses
Subgroup analysis can correct for heterogeneity when there is a qualitative variable that is the source of the heterogeneity. An overall mean effect is then estimated for each category of this variable, correcting for heterogeneity within each subgroup. This approach has the limitation that it can only be performed on a single qualitative variable. Another application of subgroup analysis is simply to assess whether the overall effect shows a differential pattern for some subgroups of interest defined a priori in the objectives.
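A subgroup analysis can be sketched by pooling each category separately and then asking how much of the total heterogeneity lies between the subgroups. The example assumes a fixed-effect model in each subgroup and uses the partition Q_total = Q_within + Q_between. The function names, the follow-up labels and the effect sizes are hypothetical.

```python
def fixed_effect(effects, variances):
    """Return the inverse-variance pooled mean and Cochran's Q."""
    w = [1.0 / v for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, effects))
    return mean, q

def subgroup_analysis(effects, variances, labels):
    """Pooled mean per subgroup and the between-subgroup Q statistic."""
    groups = {}
    for y, v, g in zip(effects, variances, labels):
        ys, vs = groups.setdefault(g, ([], []))
        ys.append(y)
        vs.append(v)
    _, q_total = fixed_effect(effects, variances)
    means, q_within = {}, 0.0
    for g, (ys, vs) in groups.items():
        means[g], q_g = fixed_effect(ys, vs)
        q_within += q_g                         # heterogeneity left inside subgroups
    return means, q_total - q_within            # Q_between

# Hypothetical studies grouped by length of follow-up (illustrative only).
sg_effects = [-0.10, -0.12, -0.40, -0.38]
sg_variances = [0.01, 0.01, 0.01, 0.01]
sg_labels = ["short", "short", "long", "long"]
sg_means, q_between = subgroup_analysis(sg_effects, sg_variances, sg_labels)
```

Here almost all of the heterogeneity sits between the two subgroups, which is the pattern expected when the grouping variable is its source.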
Sensitivity Analyses
When a meta-regression cannot identify the source of heterogeneity with the available moderator variables, a sensitivity analysis can be performed which consists of removing each individual study one-by-one and calculating the heterogeneity without that study; this is the ‘leave one out’ method.26 In this way, possible influential studies can be identified and their elimination from the meta-analysis can be assessed.
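The leave-one-out procedure can be sketched as a loop that recomputes the pooled effect and I² after dropping each study in turn. The function names and the four effect sizes, one of which is a deliberate outlier, are hypothetical.

```python
def pooled_and_i2(effects, variances):
    """Inverse-variance pooled mean and I^2 (as a percentage)."""
    w = [1.0 / v for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = 100.0 * (q - df) / q if q > df else 0.0
    return mean, i2

def leave_one_out(effects, variances):
    """Recompute the pooled mean and I^2 with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        rest_y = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        mean, i2 = pooled_and_i2(rest_y, rest_v)
        results.append((i, mean, i2))           # (omitted study index, mean, I^2)
    return results

# Hypothetical effects; the last study is an outlier (illustrative only).
loo_effects = [-0.20, -0.25, -0.15, 0.40]
loo_variances = [0.02, 0.02, 0.02, 0.02]
full_mean, full_i2 = pooled_and_i2(loo_effects, loo_variances)
loo_results = leave_one_out(loo_effects, loo_variances)
for index, mean, i2 in loo_results:
    print(f"without study {index}: mean = {mean:.3f}, I2 = {i2:.1f}%")
```

A study whose removal makes I² fall sharply, as the last one does here, is a candidate influential study whose exclusion can then be assessed.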
Bias Detection
Possible biases in meta-analysis have been described, such as small-study effects bias, better known as publication bias. This consists of the tendency not to publish non-significant results, which is more frequent in studies with small sample sizes.27 By not taking these studies into account in the meta-analysis, the overall mean effect may be overestimated, producing a bias. This phenomenon can be assessed by graphical methods such as the funnel plot, by parametric tests such as Egger’s test or by non-parametric tests such as the Begg-Mazumdar test.28,29
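A minimal sketch of Egger's regression follows, in its classical form: the standardised effect (effect divided by its standard error) is regressed on precision (one divided by the standard error) using ordinary least squares, and an intercept far from zero suggests funnel plot asymmetry. The function name and data are hypothetical. The p-value, which comes from a t distribution with k − 2 degrees of freedom, is not computed because the Python standard library does not include that distribution.

```python
import math

def egger_regression(effects, std_errors):
    """Intercept of Egger's regression, its standard error and t statistic."""
    x = [1.0 / se for se in std_errors]                  # precision
    z = [y / se for y, se in zip(effects, std_errors)]   # standardised effect
    k = len(x)
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    residual_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = residual_ss / (k - 2)                           # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    if se_intercept > 0:
        t_stat = intercept / se_intercept
    else:
        t_stat = math.copysign(math.inf, intercept)      # perfect fit, no residual noise
    return intercept, se_intercept, t_stat

# Hypothetical data in which smaller studies show larger effects (illustrative only).
egger_effects = [-0.28, -0.43, -0.44, -0.71]
egger_se = [0.10, 0.20, 0.25, 0.50]
egger_intercept, egger_se_int, egger_t = egger_regression(egger_effects, egger_se)
```

The t statistic would be compared with a t distribution on k − 2 degrees of freedom; with as few studies as in this toy example the test has very little power.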
Multivariate Meta-analysis
In many clinical studies it is common to study more than one outcome, for example, in clinical trials where a primary MACE endpoint is analysed, but results are also obtained on its components: non-fatal events such as stroke or ischaemic heart disease, and fatal cardiovascular events. In these situations, it is natural to perform a joint meta-analysis, considering all outcomes at the same time. If the correlation between outcomes in each study is available, this multivariate approach can provide more precise estimates than performing univariate meta-analyses for each outcome. The limitation arises from the assumption that outcomes follow a multivariate normal distribution, which cannot be tested and becomes less plausible as the number of outcomes increases.30
If individual data from a set of clinical trials (individual participant data) are available, special applications of multivariate meta-analysis are possible: accounting for covariate interactions with treatment; accounting for baseline adjustment variables; modelling multiple outcomes with repeated measures; and developing predictive models with different outcomes simultaneously.31
Network Meta-analysis
A network meta-analysis is a generalisation of a pairwise meta-analysis that combines direct information on treatment comparisons from clinical trials with indirect estimates of comparisons between treatments that have not been compared directly.32 For example, if there is a set of clinical trials comparing treatment A with B, and other trials comparing A with C, a network meta-analysis allows conclusions to be drawn about a comparison between B and C, without any trial doing so directly.
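The simplest building block of such a network is an adjusted indirect comparison through a common comparator, often attributed to Bucher. Given pooled effects for B versus A and for C versus A on an additive scale such as the log hazard ratio, the effect for the pair never compared head to head (C versus B) is the difference of the two, and its variance is the sum of their variances. The sketch below shows only this single step, not a full network model. The function name and input values are hypothetical.

```python
import math

def indirect_comparison(d_ab, se_ab, d_ac, se_ac):
    """Indirect estimate of C versus B from B-versus-A and C-versus-A effects."""
    d_bc = d_ac - d_ab                           # effects on an additive (log) scale
    se_bc = math.sqrt(se_ab ** 2 + se_ac ** 2)   # variances add, so precision is lost
    ci = (d_bc - 1.96 * se_bc, d_bc + 1.96 * se_bc)
    return d_bc, se_bc, ci

# Hypothetical pooled log hazard ratios versus common comparator A (illustrative only).
d_bc, se_bc, ci_bc = indirect_comparison(d_ab=-0.20, se_ab=0.08, d_ac=-0.35, se_ac=0.06)
print("Indirect hazard ratio, C versus B:", round(math.exp(d_bc), 3))
```

The indirect estimate is always less precise than either direct comparison, which is one reason why meta-analyses relying on indirect comparisons call for cautious interpretation.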
Mixed-effects models can be fitted, and the presence of heterogeneity can be assessed using generalised Q-tests and generalised Higgins I² indices. The results are presented as a network plot, where the nodes are the possible treatments, and the compared treatments are connected by lines. When all treatments are compared against the same treatment (placebo), the forest plot shows the effects of each treatment versus placebo. It is also possible to produce a net heat plot, which shows the effects of each pair of comparisons between treatments. Network meta-analysis has been applied to analyse antihypertensive medication to prevent cardiovascular events in a recent paper.33
Meta-regression: A Step Beyond Meta-analyses
Linear regression is commonly used to assess the correlation between variables, and is fundamental to meta-regression. In a meta-regression, an outcome variable is predicted according to the results of the studies included in a meta-analysis and can be adjusted for one or more explanatory variables that might influence the size of the intervention effect. A linear model is fitted by taking the study effect sizes as the dependent variable and the moderator variables as independent variables. This model can have fixed and/or random effects. When a moderator is significant, it implies that the overall average effect is not fixed but is a function of this variable and depends on its values. These explanatory variables are often called potential effect modifiers or covariates. Figure 2 shows the graphical representation of a meta-regression where the change of a variable is represented on the x-axis and the reduction in the risk associated with that change is represented on the y-axis.
Meta-regression is not an independent tool and is generally performed after the meta-analysis so that study effects are converted into comparable parameters. It uses study-level data and can help understand the dose–effect relationship at a study level but carries the risk of ecological fallacy and bias.
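A meta-regression with a single moderator can be sketched as a weighted least-squares line through the study effects, with inverse-variance weights. The example assumes a fixed-effect model; a mixed-effects version would add an estimate of the between-study variance to each study's variance before weighting. The function name, the moderator (LDL cholesterol reduction in mmol/l) and all values are hypothetical.

```python
import math

def meta_regression(effects, variances, moderator):
    """Weighted least-squares fit of study effects on one study-level moderator."""
    w = [1.0 / v for v in variances]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / total   # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)              # valid when the variances are taken as known
    return intercept, slope, se_slope

# Hypothetical studies: LDL cholesterol reduction versus log risk ratio (illustrative only).
ldl_reduction = [0.5, 1.0, 1.5, 2.0]
mr_log_rr = [-0.10, -0.20, -0.30, -0.40]
mr_variances = [0.01, 0.02, 0.01, 0.04]
mr_intercept, mr_slope, mr_se_slope = meta_regression(mr_log_rr, mr_variances, ldl_reduction)
```

The slope is the estimated change in the log risk ratio per unit change in the moderator across studies. Because it is estimated from study-level averages, it does not necessarily describe the relationship within individual patients, which is the ecological fallacy noted above.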
Meta-regressions also have strengths and weaknesses. Their main strength is that they weight studies by their sample size, giving greater influence to larger studies on the outcome variable.34 In contrast, conclusions of a meta-regression might be misleading or biased if methodology is not accurate.35
There are different examples of moderating variables that explain heterogeneity, such as the years of follow-up or the percentage of men to explain the effect of the size and number of lipid particles on cardiovascular risk.36 Another recent paper identified the type of statistical model as a source of heterogeneity in estimating the mean incubation period of severe acute respiratory syndrome coronavirus 2 and the mean age of the subjects in each study as a source of heterogeneity of the 95th percentile of the incubation period.37 Meta-regression can also encompass subgroup analysis, as it allows several variables (moderators) to be considered at the same time, both qualitative and quantitative.
Finally, meta-regression can also be used to estimate the effect of molecules under investigation or those without much clinical evidence. For example, the first long-term studies with inclisiran, the first RNA-silencing therapy that inhibits the synthesis of PCSK9, reported an efficacy for lowering LDL cholesterol similar to that of the monoclonal antibodies that inhibit the protein; moreover, a clear trend towards a reduction of MACE was also observed. We included these results in the above-mentioned meta-analysis of the trials with PCSK9 inhibitors and estimated that the effect on MACE would be similar for inclisiran and monoclonal antibodies.38,39 Similarly, the newest lipid-lowering therapy, bempedoic acid, demonstrated a reduction of MACE and LDL cholesterol in the CLEAR OUTCOMES trial.40 We performed a meta-regression with all the intensive lipid-lowering therapies, which showed that the reduction of LDL cholesterol with all of them was proportional to the reduction in the risk of MACE.41
Conclusion
Meta-analysis and meta-regression are good statistical tools, when used correctly, to obtain solid evidence in fields of uncertainty. In addition, they allow estimation of the possible effect of some interventions.