Benefits and Harms of Treatments, Screening, and Tests.

Clinicians' and Patients' Expectations of the Benefits and Harms of Treatments, Screening, and Tests: A Systematic Review

In 2015 and 2017 Hoffmann and Del Mar published two systematic reviews (Hoffmann and Del Mar, 2015, 2017): the first reviewed unrealistic patient expectations of the benefits and harms of various medical interventions, and the second reviewed inaccurate clinician expectations. The introduction to both papers discussed the increasing demand for, and cost of, medical care and how this relates to overdiagnosis, a ‘more is better’ culture, a future funding crisis, and defensive practice. The research question for both papers was similar but was only expressed clearly in the second paper:

‘Do clinicians have accurate expectations of the benefits and harms of medical treatment, tests and screening tests?’

The authors searched the electronic databases MEDLINE, Embase, Cumulative Index to Nursing and Allied Health Literature, and PsycINFO, with no language or study type restrictions. All quantitative primary study designs were eligible as long as the participants were asked to estimate the expected harms and/or benefits of various medical interventions; qualitative estimates without quantification were excluded. No risk of bias assessment was undertaken, and after data extraction meta-analysis was not performed because of the range of outcomes and response options.

In the results sections, 36 papers were eligible for the first review, relating to patients’ expectations, and 48 were eligible for the clinicians’ review.

The results were presented as a narrative and as stacked bar charts. The data could not be summed because some papers answered only one question (e.g. overestimate) while others answered all three.

In the abstracts, only the significant findings, based on the results of the individual papers, were presented as summary results (Table 1).

Table 1. Summary results for both systematic reviews (N/D = no data)

Group        Question   Underestimate     Correct estimate   Overestimate
                        (n of studies, %) (n of studies, %)  (n of studies, %)
Patients     Benefit    N/D               N/D                15/17 (88%)
Patients     Harm       10/15 (67%)       N/D                N/D
Clinicians   Benefit    2/22 (9%)         3/28 (11%)         7/22 (32%)
Clinicians   Harm       20/58 (34%)       N/D                3/58 (5%)

The conclusion (in the authors’ own words) of the first, patient-based, systematic review was:

‘The majority of participants overestimated intervention benefit and underestimated harm. Clinicians should discuss accurate and balanced information about intervention benefits and harms with patients, providing the opportunity to develop realistic expectations and make informed decisions.’

For the clinicians’ systematic review:

‘Clinicians rarely had accurate expectations of benefits or harms, with inaccuracies in both directions. However, clinicians more often underestimated rather than overestimated harms and overestimated rather than underestimated benefits. Inaccurate perceptions about the benefits and harms of interventions are likely to result in suboptimal clinical management choices.’


These were two well-constructed systematic reviews attempting to answer a very difficult question where the primary studies were highly variable and included a mixture of quantitative and qualitative data. I would say it is unusual for such recent systematic reviews not to follow a standardised protocol such as PRISMA or to provide a risk of bias/quality assessment. The lack of a protocol made the methodology and results harder to assess; however, I do not want that to distract from the importance of the reviews.

The presentation of the results as a combination of narrative and complex stacked bar charts lacked clarity. By changing the unit of analysis from the number of papers to the number of participants I was able to extract the data from the charts and transform it into a format suitable for meta-analysis using the ‘metaprop’ function of the ‘meta’ package in R. The heterogeneity of the combined studies was close to 100%, reflecting the high variability, and the four forest plots plus subgroup analyses were quite large, so I have combined the summary estimates (Table 2) into two charts: the first describes patient expectations (Figure 1), and the second, clinicians’ expectations (Figure 2).
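For readers who want to see what a function like ‘metaprop’ is doing under the hood, the core calculation can be sketched in plain Python. This is a minimal sketch of one common approach (logit-transformed proportions pooled with a DerSimonian-Laird random-effects model), using made-up study data rather than the values extracted from the reviews:

```python
import math

def pool_proportions(events, totals):
    """Random-effects pooling of proportions on the logit scale,
    in the spirit of metaprop (DerSimonian-Laird tau^2 estimate)."""
    # Per-study logit proportions and their approximate variances
    ys = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    vs = [1 / e + 1 / (n - e) for e, n in zip(events, totals)]
    w = [1 / v for v in vs]                       # fixed-effect weights
    y_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-study variance
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)
    # Random-effects weights, pooled logit and its standard error
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    # Back-transform the estimate and a 95% CI to the proportion scale
    expit = lambda x: 1 / (1 + math.exp(-x))
    return expit(y_re), expit(y_re - 1.96 * se), expit(y_re + 1.96 * se)

# Three hypothetical studies: participants answering 'overestimate'
p, lo, hi = pool_proportions([30, 50, 12], [100, 80, 40])
```

In practice metaprop offers several transformations and between-study variance estimators; the sketch above uses only the simplest inverse-variance versions.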

Table 2. Summary estimates (%) with 95% confidence intervals

Outcome            Group     Patients, mean (95% CI)   Clinicians, mean (95% CI)
Underestimate      Benefit   23 (15 to 32)             29 (19 to 41)
Underestimate      Harm      60 (50 to 69)             42 (33 to 51)
Correct estimate   Benefit   19 (14 to 26)             28 (21 to 35)
Correct estimate   Harm      23 (16 to 30)             28 (23 to 34)
Overestimate       Benefit   56 (47 to 64)             31 (19 to 44)
Overestimate       Harm      22 (15 to 30)             20 (15 to 26)

Figure 1. Patient expectations



Figure 2. Clinicians’ expectations


The re-analysis of the data does not change the authors’ discussion or conclusions; it simply adds clarity. The heterogeneity in the data extraction and meta-analysis should be treated as a feature of the research question rather than a weakness. We can see how dramatically patients can overestimate benefits and underestimate harms, leading to a potential risk of overtreatment (an unfavourable benefit-to-harm ratio) as the two effects combine to increase the chance of the patient making a poor choice. Optimism bias has been well reported and is a normal human reaction in decision making (Weinstein, 2001; Hanoch, Rolison and Freund, 2019). I am sceptical that the principle of shared decision making (NICE, 2011; Ryan and Cunningham, 2014) will override this basic human trait of overestimating the upside of a situation and discounting the downside. The clinicians’ results show that the effect persists even after specialist training: harms are underestimated and benefits overestimated. Although the effect is weaker than in the patients’ case, clinicians are almost as likely to overestimate or underestimate a benefit or harm as to be correct. The authors state in their discussion section:

‘Shared decision making is a logical mechanism for bringing evidence into consultations, but this requires clinicians to know the best current evidence about the benefits and harms of the interventions being contemplated’

I would argue that, for the average clinician outside academia, it is very hard to search out and interrogate the best evidence necessary to assist the patient in the shared decision process because of restricted access to full-text literature. Beyond the problems of access there is also endemic publication bias (Landewé, 2014) in primary research, which favours positive outcomes and thereby contaminates systematic reviews, clinical guidelines and healthcare policy.

In summary, we need to embrace the concept of shared decision making but, more importantly, acknowledge the presence of optimism bias and harm discounting and their ability to undermine the rational decision-making process.



Hanoch, Y., Rolison, J. and Freund, A. M. (2019) ‘Reaping the Benefits and Avoiding the Risks: Unrealistic Optimism in the Health Domain’, Risk Analysis, 39(4), pp. 792–804. doi: 10.1111/risa.13204.

Hoffmann, T. C. and Del Mar, C. (2015) ‘Patients’ expectations of the benefits and harms of treatments, screening, and tests: a systematic review.’, JAMA internal medicine, 175(2), pp. 274–86. doi: 10.1001/jamainternmed.2014.6016.

Hoffmann, T. C. and Del Mar, C. (2017) ‘Clinicians’ expectations of the benefits and harms of treatments, screening, and tests: A systematic review’, JAMA Internal Medicine, 177(3), pp. 407–419. doi: 10.1001/jamainternmed.2016.8254.

Landewé, R. B. M. (2014) ‘Editorial: How publication bias may harm treatment guidelines’, Arthritis and Rheumatology, 66(10), pp. 2661–2663. doi: 10.1002/art.38783.

NICE (2011) Shared Decision Making Collaborative. Available at:

Ryan, F. and Cunningham, S. (2014) ‘Shared decision making in healthcare’, Faculty Dental Journal, 5(3), pp. 124–127. doi: 10.1308/204268514X14017784505970.

Weinstein, N. D. (2001) ‘Health Risk Appraisal and Optimistic Bias’, in International Encyclopedia of the Social & Behavioral Sciences, pp. 6612–6615.

Reflections on EBM Live 2019 – Oxford

Professor John Ioannidis presenting the keynote lecture

This month I attended my first Evidence-Based Medicine conference, at Oxford University’s Saïd Business School. Wow, this was so different from the usual dental congresses I attend around Europe: instead of 10,000 delegates there were only 300, made up of senior academic staff, researchers, clinicians and patient representatives. The other big difference was the lack of trade stands, corporate sponsorship and paper (the programme was app-based).

Subjects covered over the three days included:

  • Increasing the systematic use of existing evidence
  • Reducing questionable research practices and bias
  • Finding better evidence (TRIP database)
  • Healthcare Value with Sir Muir Gray
  • Enhancing real world practice
  • Making research evidence relevant, replicable and accessible to the public
  • The complete guide to BREAST CANCER – a personal view of cancer when the doctor becomes the patient.
  • Conflicts of interests and the spinning of weak research results
  • Reproducible evidence for Healthcare: Current and Future  – Prof Ioannidis

The take-home message from this excellent conference was that we still have a long way to go in delivering the best healthcare for our patients. This challenge requires a general improvement in healthcare literacy and a basic knowledge of statistics, for both the profession and the public.

We don’t need more research quantity, but we do need considerably more reproducible, protocol-driven quality. Unregistered/unreported trial findings can’t be analysed, potentially hiding benefits and harms from the population.



Spin the Odds


I recently attended the Evidence-Based Medicine Live19 conference at Oxford University, where Professor Isabelle Boutron from the Paris Descartes University presented a lecture entitled ‘Spin or Distortion of Research Results’. Simply put, research spin is ‘reporting to convince readers that the beneficial effect of the experimental treatment is greater than shown by the results’ (Boutron et al., 2014). In a study of oncology trials, spin was present in 59% of the 92 trials where the primary outcome was negative (Vera-Badillo et al., 2013). I would argue that spin also affects a large proportion of dental research papers.

To illustrate how subtle this problem can be, I have selected a recent systematic review (SR) posted on the Dental Elf website regarding pulpotomy (Li et al., 2019). Pulpotomy is the removal of a portion of the diseased pulp, in this case from a decayed permanent tooth, with the intent of maintaining the vitality of the remaining nerve tissue by means of a therapeutic dressing. Li’s SR compared the effectiveness of calcium hydroxide with the newer therapeutic dressing material mineral trioxide aggregate (MTA).

In the abstract Li states that the meta-analysis favours mineral trioxide aggregate (MTA), and in the results section of the SR that ‘MTA had higher success rates in all trials at 12 months’ (odds ratio 2.23, p = 0.02, I² = 0%), finally concluding that ‘mineral trioxide aggregate appears to be the best pulpotomy medicament in carious permanent teeth with pulp exposures’. I do not agree with this conclusion, and would argue that the results show substantial spin. Close appraisal of Li’s paper reveals several methodological problems that have magnified the beneficial effect of MTA.

The first problem concerns the use of reporting guidelines, in this case the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher et al., 2009). The author states this was adhered to, but there is no information regarding registration of a review protocol to establish predefined primary and secondary outcomes or methods of analysis. To quote Shamseer:

‘Without review protocols, how can we be assured that decisions made during the research process aren’t arbitrary, or that the decision to include/exclude studies/data in a review aren’t made in light of knowledge about individual study findings?’(Shamseer & Moher, 2015)

In the ‘Data synthesis and statistical analysis’ section the author states that the primary and secondary outcomes for this SR were formulated only after data collection. This post hoc selection makes the data vulnerable to selection bias. Additionally, there is no predefined rationale for the choice of an appropriate summary measure or method of synthesising the data.

The second problem relates to the post hoc choice of summary measure, in this case the odds ratio, and the use of a fixed-effect model in the meta-analysis (Figure 1).


Figure 1. Forest plot of 12-month clinical success (original).

Of all the options available to analyse the 5 randomised controlled trials, the odds ratio combined with a fixed-effect model produced the largest significant effect size (OR 2.23, p = 0.02). There was no explanation as to why the odds ratio was selected over the relative risk (RR), the risk difference (RD), or the arcsine difference (ASD, useful when event proportions are close to 0 or 1). Since the data for the SR are dichotomous, the three most common effect measures are:

  • the risk difference (RD = p1 − p2), the actual size of the difference between the two event probabilities. This is probably the most straightforward and useful analysis.
  • the relative risk (RR = p1/p2), which highlights the relative difference between two probabilities.
  • the odds ratio (OR = [p1/(1 − p1)] / [p2/(1 − p2)]). Odds ratios are approximately equal to the RR when outcomes are rare, but they are easy to misinterpret (in Li’s results an OR of 2.23 represents a 2.23-fold increase in odds, not risk). ORs are best used for case-control studies.
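The practical difference between the three measures is easy to demonstrate with a hypothetical two-arm trial; the success rates below are illustrative and are not Li’s data:

```python
def effect_measures(e1, n1, e2, n2):
    """Risk difference, relative risk and odds ratio for a two-arm
    trial with e1/n1 events in arm 1 and e2/n2 events in arm 2."""
    p1, p2 = e1 / n1, e2 / n2
    rd = p1 - p2                                     # absolute difference in risk
    rr = p1 / p2                                     # ratio of risks
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))   # ratio of odds
    return rd, rr, odds_ratio

# Hypothetical success rates of 90% vs 80%:
rd, rr, odds_ratio = effect_measures(18, 20, 16, 20)
# rd = 0.10, rr = 1.125, odds_ratio = 2.25: with common outcomes the
# OR looks dramatically larger than the same difference expressed as RR
```

Note how close the illustrative OR of 2.25 is to Li’s 2.23, while the underlying risks differ by only 10 percentage points.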

The authors specifically chose a fixed-effect model for the meta-analysis based on the small number of studies. There are two problems with this. Firstly, there is too much variability between the 5 studies in terms of methodology and patient factors such as age (in 4 studies the average age is approximately 8 years; in one study it is 30 years). Secondly, we do not need to use a fixed-effect model: with 5 studies we can use a random-effects model with the Hartung-Knapp adjustment, which is designed specifically for handling small numbers of studies (Röver, Knapp & Friede, 2015; Guolo & Varin, 2017).

Below I have reanalysed the original data using a more plausible random-effects model (Hartung-Knapp), with RR to show the relative difference between treatments and RD to highlight the actual difference (Figures 2 and 3), using the ‘metabin’ function of the ‘meta’ package in R (Schwarzer, 2007).


Figure 2. 12-month clinical success using the Hartung-Knapp adjustment for a random-effects model and relative risk

Figure 3. 12-month clinical success using the Hartung-Knapp adjustment for a random-effects model and risk difference
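The Hartung-Knapp calculation behind these plots is not mysterious. A minimal Python sketch, using hypothetical trial data rather than Li’s, shows how it differs from a conventional random-effects analysis: the pooled log relative risk is unchanged, but the standard error is replaced by a weighted residual variance and the confidence interval uses a t rather than a normal quantile:

```python
import math

# Hypothetical (events, total) per arm for five small trials
mta = [(28, 30), (19, 22), (25, 28), (40, 45), (33, 38)]
caoh = [(24, 30), (17, 22), (21, 28), (35, 45), (28, 38)]

def hk_meta_rr(arm1, arm2):
    """Random-effects meta-analysis of log relative risks with the
    Hartung-Knapp variance adjustment (as in metabin's hakn option)."""
    ys, vs = [], []
    for (e1, n1), (e2, n2) in zip(arm1, arm2):
        ys.append(math.log((e1 / n1) / (e2 / n2)))    # log relative risk
        vs.append(1 / e1 - 1 / n1 + 1 / e2 - 1 / n2)  # its variance
    w = [1 / v for v in vs]
    y_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    k = len(ys)
    tau2 = max(0.0, (q - (k - 1)) / c)                # DerSimonian-Laird
    w_re = [1 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    # Hartung-Knapp: weighted residual variance with k - 1 df ...
    se_hk = math.sqrt(sum(wi * (yi - mu) ** 2 for wi, yi in zip(w_re, ys))
                      / ((k - 1) * sum(w_re)))
    t_crit = 2.776  # ... and a t quantile (0.975, df = 4) instead of 1.96
    return (math.exp(mu), math.exp(mu - t_crit * se_hk),
            math.exp(mu + t_crit * se_hk))

rr, lo, hi = hk_meta_rr(mta, caoh)
```

With few studies the residual variance is estimated poorly, so the wider t-based interval is a more honest statement of the uncertainty.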

Both analyses now show a small effect size (8% to 9%) that slightly favours MTA but is non-significant, as opposed to a 2.23-fold increase in odds. In the pulpotomy review the OR magnifies the effect size by 51% compared with the equivalent RR (using the conversion RR = OR / (1 − p0 + p0 × OR), where p0 is the baseline risk). In a paper by Holcomb reviewing 151 studies using odds ratios, 26% had interpreted the odds ratio as a risk ratio (Holcomb et al., 2001).
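The size of that magnification is easy to check with Zhang and Yu’s conversion formula; the baseline risks used below are my assumptions for illustration:

```python
def or_to_rr(odds_ratio, p0):
    """Convert an odds ratio to the equivalent relative risk given the
    baseline (control-arm) risk p0: RR = OR / (1 - p0 + p0 * OR)."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

# When the outcome is rare the two measures nearly agree ...
print(or_to_rr(2.23, 0.01))   # close to 2.23
# ... but at a plausible pulpotomy success rate the OR overstates the RR
print(or_to_rr(2.23, 0.8))    # much closer to 1
```

The higher the baseline event rate, the further the OR drifts from the RR, which is exactly why ORs mislead for common outcomes like pulpotomy success.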

There are a couple of further observations to note. Regarding the 5 studies: even combined, one would need 199 individuals in each arm for the study to be sufficiently powered (α err prob = 0.05, power (1 − β) = 0.8), casting doubt on the significance of the authors’ results.
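That figure can be reproduced with the standard normal-approximation sample-size formula for comparing two proportions; the 90% and 80% success rates below are my assumption of the inputs used:

```python
import math

def n_per_arm(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate sample size per arm for detecting a difference
    between two proportions (two-sided alpha = 0.05, power = 0.80)."""
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Assumed success rates of 90% vs 80% reproduce the 199-per-arm figure
n = n_per_arm(0.9, 0.8)   # 199
```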

I have included a prediction interval in both my forest plots to signify the range of possible true values one could expect in future RCTs, which is more useful in clinical practice than the confidence interval (IntHout et al., 2016). Using the RD meta-analysis, a future RCT could produce a result that favours calcium hydroxide by 20% or MTA by 35%, which is quite a wide range of uncertainty.
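For completeness, the prediction interval follows IntHout et al.’s formula: add the between-study variance tau-squared to the squared standard error of the pooled estimate, and use a t quantile on k − 2 degrees of freedom. The numerical inputs below are hypothetical, not the values from my re-analysis:

```python
import math

def prediction_interval(mu, se_mu, tau2, t_crit=3.182):
    """95% prediction interval for the effect in a new study:
    mu +/- t(k-2) * sqrt(tau^2 + SE(mu)^2)  (IntHout et al., 2016).
    t_crit defaults to the 0.975 t quantile for k - 2 = 3 df (5 studies)."""
    half_width = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return mu - half_width, mu + half_width

# Hypothetical risk-difference inputs: pooled RD 0.08, SE 0.05, tau^2 0.01
lo, hi = prediction_interval(0.08, 0.05, 0.01)
```

Because tau-squared enters the width directly, even a precisely estimated pooled effect can carry a wide prediction interval when heterogeneity is high.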

One of Li’s primary outcomes was cost-effectiveness, and the paper concluded there was insufficient data to determine a result; it also mentions the high cost and technique sensitivity of MTA compared with calcium hydroxide. I would argue that since there appears to be no significant difference between outcomes, on the evidence available calcium hydroxide must be the more cost-effective option.

In conclusion, researchers, reviewers and editors need to be aware of the harm spin can do. Many clinicians are unable to interrogate the main body of a research paper because it is hidden behind a paywall, and they rely heavily on the abstract for information (Boutron et al., 2014). Registration of a research protocol prespecifying appropriate outcomes and methodology will help prevent post hoc changes to the outcomes and analysis. I would urge researchers to limit the use of odds ratios to case-control studies and to use relative risk or risk difference instead, as they are easier to interpret. For meta-analysis, avoid a fixed-effect model if the studies do not share a common true effect, and include a prediction interval to explore possible future outcomes.


Boutron, I., Altman, D.G., Hopewell, S., Vera-Badillo, F., et al. (2014) Impact of spin in the abstracts of articles reporting results of randomized controlled trials in the field of cancer: The SPIIN randomized controlled trial. Journal of Clinical Oncology. [Online] 32 (36), 4120–4126. Available from: doi:10.1200/JCO.2014.56.7503.

Guolo, A. & Varin, C. (2017) Random-effects meta-analysis: The number of studies matters. Statistical Methods in Medical Research. [Online] 26 (3), 1500–1518. Available from: doi:10.1177/0962280215583568.

Holcomb, W.L., Chaiworapongsa, T., Luke, D.A. & Burgdorf, K.D. (2001) An Odd Measure of Risk. Obstetrics & Gynecology. [Online] 98 (4), 685–688. Available from: doi:10.1097/00006250-200110000-00028.

IntHout, J., Ioannidis, J.P.A., Rovers, M.M. & Goeman, J.J. (2016) Plea for routinely presenting prediction intervals in meta-analysis. British Medical Journal Open. [Online] 6 (7), e010247. Available from: doi:10.1136/bmjopen-2015-010247.

Li, Y., Sui, B., Dahl, C., Bergeron, B., et al. (2019) Pulpotomy for carious pulp exposures in permanent teeth: A systematic review and meta-analysis. Journal of Dentistry. [Online] 84 (January), 1–8. Available from: doi:10.1016/j.jdent.2019.03.010.

Moher, D., Liberati, A., Tetzlaff, J. & Altman, D.G. (2009) Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. Annals of Internal Medicine. [Online] 151 (4), 264–269. Available from: doi:10.1371/journal.pmed1000097.

Röver, C., Knapp, G. & Friede, T. (2015) Hartung-Knapp-Sidik-Jonkman approach and its modification for random-effects meta-analysis with few studies. BMC Medical Research Methodology. [Online] 15 (1), 1–8. Available from: doi:10.1186/s12874-015-0091-1.

Schwarzer, G. (2007) meta: An R package for meta-analysis. [Online]. 2007. R News. Available from:

Shamseer, L. & Moher, D. (2015) Planning a systematic review? Think protocols. [Online]. 2015. Research in progress blog. Available from:

Vera-Badillo, F.E., Shapiro, R., Ocana, A., Amir, E., et al. (2013) Bias in reporting of end points of efficacy and toxicity in randomized, clinical trials for women with breast cancer. Annals of Oncology. [Online] 24 (5), 1238–1244. Available from: doi:10.1093/annonc/mds636.