Spin the Odds


I recently attended the Evidence-Based Medicine Live19 conference at Oxford University, where Professor Isabelle Boutron from Paris Descartes University presented a lecture entitled ‘Spin or Distortion of Research Results’. Simply put, research spin is ‘reporting to convince readers that the beneficial effect of the experimental treatment is greater than shown by the results’ (Boutron et al., 2014). In a study of oncology trials, spin was prevalent in 59% of the 92 trials where the primary outcome was negative (Vera-Badillo et al., 2013). I would argue that spin also affects a large proportion of dental research papers.

To illustrate how subtle this problem can be, I have selected a recent systematic review (SR) on pulpotomy that was posted on the Dental Elf website (Li et al., 2019). Pulpotomy is the removal of a portion of the diseased pulp, in this case from a decayed permanent tooth, with the intent of maintaining the vitality of the remaining nerve tissue by means of a therapeutic dressing. Li’s SR compared the effectiveness of calcium hydroxide with that of a newer therapeutic dressing material, mineral trioxide aggregate (MTA).

In the abstract Li states that the meta-analysis favours mineral trioxide aggregate (MTA), and in the results section of the SR that ‘MTA had higher success rates in all trial at 12 months (odds ratio, 2.23, p = 0.02, I² = 0%)’, finally concluding that ‘mineral trioxide aggregate appears to be the best pulpotomy medicament in carious permanent teeth with pulp exposures’. I do not agree with this conclusion, and would argue that the results show substantial spin. Close appraisal of Li’s paper reveals several methodological problems that have magnified the beneficial effect of MTA.

The first problem concerns the use of reporting guidelines, which in this case was the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher et al., 2009). The authors state that this was adhered to, but there is no information regarding registration of a review protocol to establish predefined primary and secondary outcomes or methods of analysis. To quote Shamseer:

‘Without review protocols, how can we be assured that decisions made during the research process aren’t arbitrary, or that the decision to include/exclude studies/data in a review aren’t made in light of knowledge about individual study findings?’(Shamseer & Moher, 2015)

In the ‘Data synthesis and statistical analysis’ section the author states that the primary and secondary outcomes for this SR were only formulated after data collection. This post hoc selection makes the data vulnerable to selection bias. Additionally, there is no predefined rationale relating to the choice of an appropriate summary measure or method of synthesising the data.

The second problem relates to the post hoc choice of summary measure, in this case the odds ratio, and the use of a fixed-effects model in the meta-analysis (Figure 1).


Figure 1. Forest plot of 12-month clinical success (original).

Of all the options available to analyse the 5 randomised controlled trials, the odds ratio with a fixed-effects model produced the largest significant effect size (OR 2.23, p = 0.02). There was no explanation as to why the odds ratio was selected over relative risk (RR), risk difference (RD), or arcsine difference (ASD) if the values were close to 0 or 1. Since the data for the SR is dichotomous, the three most common effect measurements are:

  • the risk difference (RD), which is the actual size of the difference between the two groups. This is probably the most straightforward and useful analysis.
  • the relative risk (RR), which highlights the relative difference between two probabilities.
  • the odds ratio (OR), which is the ratio of the odds of the outcome in the two groups. Odds ratios are approximately equal to the RR when outcomes are rare; however, they are easy to misinterpret (in Li’s results an OR of 2.23 represents a 2.23-fold increase in odds, not in the probability of success). ORs are best used for case-control studies.
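To make the difference between the three measures concrete, here is a minimal Python sketch. The counts are hypothetical (90/100 successes versus 80/100) and are not taken from Li’s review; the function name is my own.

```python
def effect_measures(events_treat, n_treat, events_ctrl, n_ctrl):
    """Return (risk difference, relative risk, odds ratio) for a 2x2 table."""
    p_t = events_treat / n_treat   # success probability, treatment arm
    p_c = events_ctrl / n_ctrl     # success probability, control arm
    rd = p_t - p_c                                      # absolute difference
    rr = p_t / p_c                                      # ratio of probabilities
    odds_ratio = (p_t / (1 - p_t)) / (p_c / (1 - p_c))  # ratio of odds
    return rd, rr, odds_ratio


# Hypothetical trial: 90/100 successes with treatment, 80/100 with control.
rd, rr, odds_ratio = effect_measures(90, 100, 80, 100)
print(f"RD = {rd:.3f}, RR = {rr:.3f}, OR = {odds_ratio:.2f}")
```

With these assumed counts the risk difference is 10 percentage points and the relative risk is 1.125, yet the odds ratio is 2.25. When the outcome is common, as clinical success is here, the OR sits much further from 1 than the RR does.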

The authors specifically chose a fixed-effects model for meta-analysis based on the small number of studies. There are two problems with this. Firstly, there is too much variability between the 5 studies in terms of methodology and patient factors such as age (in 4 studies the average age is approximately 8 years and in one study it is 30 years). Secondly, we don’t need to use a fixed-effects model just because there are only 5 studies; we can use a random-effects model with a Hartung-Knapp adjustment, which is designed specifically for handling a small number of studies (Röver, Knapp & Friede, 2015; Guolo & Varin, 2017).
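For readers who want to see what the Hartung-Knapp adjustment does mechanically, the following is a rough Python sketch. It is not the R workflow used for this post. It assumes a DerSimonian-Laird estimate of the between-study variance, and the five effect sizes and variances are invented for illustration. The t quantile is passed in by hand (2.776 for 4 degrees of freedom) to keep the example dependency-free.

```python
import math


def random_effects_hk(effects, variances, t_crit):
    """DerSimonian-Laird random-effects pooling with a Hartung-Knapp interval.

    effects   -- per-study effect estimates (e.g. risk differences)
    variances -- their within-study variances
    t_crit    -- 97.5% quantile of Student's t with k - 1 degrees of freedom
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to every study's variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    # Hartung-Knapp variance: weighted residual spread divided by (k - 1).
    resid = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w_re, effects))
    var_hk = resid / ((k - 1) * sum(w_re))
    half_width = t_crit * math.sqrt(var_hk)
    return pooled, tau2, (pooled - half_width, pooled + half_width)


# Hypothetical risk differences and variances for k = 5 studies.
effects = [0.05, 0.12, 0.02, 0.15, 0.08]
variances = [0.004, 0.006, 0.005, 0.008, 0.003]
pooled, tau2, ci = random_effects_hk(effects, variances, t_crit=2.776)
print(f"pooled = {pooled:.3f}, tau^2 = {tau2:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval uses a t distribution with k − 1 degrees of freedom instead of the normal distribution, which usually makes it wider when there are only a handful of studies.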

Below I have reanalysed the original data using a more plausible random-effects model (Hartung-Knapp), with RR to show the relative difference between treatments and RD to highlight the actual difference (Figures 2 and 3), using the ‘metabin’ function from the ‘meta’ package in R (Schwarzer, 2007).


Figure 2. 12-month clinical success using Hartung-Knapp adjustment for random effects model and relative risk

Figure 3. 12-month clinical success using Hartung-Knapp adjustment for random effects model and risk difference

Both analyses now show a small effect size (8% to 9%) that slightly favours MTA but is non-significant, as opposed to a 2.23-fold increase in odds. In the pulpotomy review the OR magnifies the effect size by 51% using the formula   . In a review of 151 studies that used odds ratios, Holcomb found that 26% had interpreted the odds ratio as a risk ratio (Holcomb et al., 2001).

There are a couple of further observations to note. Even with the 5 studies combined, one would need 199 individuals in each arm of the study for it to be sufficiently powered (α error probability = 0.05, power (1 − β error probability) = 0.8), which calls the significance of the authors’ result into question.
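A sample size of this kind can be sketched with the standard normal-approximation formula for comparing two proportions. The post does not say which inputs or software were used, so the success rates below (90% versus 80%) are assumptions chosen for illustration, as is the function name.

```python
import math
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Participants needed per arm to detect p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Assumed success rates: 90% in one arm, 80% in the other.
print(n_per_arm(0.90, 0.80))
```

With these assumed rates the formula returns 199 per arm, which agrees with the figure quoted above, although that agreement does not confirm the inputs actually used.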

I have included a prediction interval in both my forest plots to signify the range of possible true values one could expect in a future RCT, which is more useful to know in clinical practice than the confidence interval (IntHout et al., 2016). Using the RD meta-analysis, a future RCT could produce a result that favours calcium hydroxide by 20% or MTA by 35%, which is quite a wide range of uncertainty.
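The usual approximate prediction interval combines the between-study variance with the uncertainty of the pooled estimate and uses a t distribution with k − 2 degrees of freedom. The Python sketch below uses made-up inputs, not the values behind the forest plots in this post, and the t quantile (3.182 for 3 degrees of freedom) is supplied by hand.

```python
import math


def prediction_interval(pooled, se_pooled, tau2, t_crit):
    """Approximate 95% prediction interval for the true effect in a new study.

    pooled    -- random-effects summary estimate
    se_pooled -- standard error of the summary estimate
    tau2      -- estimated between-study variance
    t_crit    -- 97.5% quantile of Student's t with k - 2 degrees of freedom
    """
    half_width = t_crit * math.sqrt(tau2 + se_pooled ** 2)
    return pooled - half_width, pooled + half_width


# Hypothetical inputs: pooled RD 0.08, SE 0.03, tau^2 0.004, k = 5 studies.
low, high = prediction_interval(0.08, 0.03, 0.004, t_crit=3.182)
print(f"95% prediction interval: {low:.3f} to {high:.3f}")
```

Because the between-study variance is added to the standard error, the prediction interval is wider than the confidence interval whenever the studies disagree.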

One of Li’s primary outcomes was cost effectiveness, and the paper concluded that there was insufficient data to determine a result. It also mentions the high cost and technique sensitivity of MTA compared with calcium hydroxide. I would argue that since there appears to be no significant difference between outcomes, we could conclude that, on the evidence available, calcium hydroxide must be more cost effective.

In conclusion, researchers, reviewers and editors need to be aware of the harm spin can do. Many clinicians are not able to interrogate the main body of a research paper for detail, as it is hidden behind a paywall, and they rely heavily on the abstract for information (Boutron et al., 2014). Registration of a research protocol prespecifying appropriate outcomes and methodology will help prevent post hoc changes to the outcomes and analysis. I would urge researchers to limit the use of odds ratios to case-control studies and to use relative risk or risk difference, as they are easier to interpret. For the meta-analysis, avoid using a fixed-effects model if the studies don’t share a common true effect, and include a prediction interval to explore possible future outcomes.


Boutron, I., Altman, D.G., Hopewell, S., Vera-Badillo, F., et al. (2014) Impact of spin in the abstracts of articles reporting results of randomized controlled trials in the field of cancer: The SPIIN randomized controlled trial. Journal of Clinical Oncology. [Online] 32 (36), 4120–4126. Available from: doi:10.1200/JCO.2014.56.7503.

Guolo, A. & Varin, C. (2017) Random-effects meta-analysis: The number of studies matters. Statistical Methods in Medical Research. [Online] 26 (3), 1500–1518. Available from: doi:10.1177/0962280215583568.

Holcomb, W.L., Chaiworapongsa, T., Luke, D.A. & Burgdorf, K.D. (2001) An Odd Measure of Risk. Obstetrics & Gynecology. [Online] 98 (4), 685–688. Available from: doi:10.1097/00006250-200110000-00028.

IntHout, J., Ioannidis, J.P.A., Rovers, M.M. & Goeman, J.J. (2016) Plea for routinely presenting prediction intervals in meta-analysis. British Medical Journal Open. [Online] 6 (7), e010247. Available from: doi:10.1136/bmjopen-2015-010247.

Li, Y., Sui, B., Dahl, C., Bergeron, B., et al. (2019) Pulpotomy for carious pulp exposures in permanent teeth: A systematic review and meta-analysis. Journal of Dentistry. [Online] 84 (January), 1–8. Available from: doi:10.1016/j.jdent.2019.03.010.

Moher, D., Liberati, A., Tetzlaff, J. & Altman, D.G. (2009) Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. Annals of Internal Medicine. [Online] 151 (4), 264–269. Available from: doi:10.1371/journal.pmed1000097.

Röver, C., Knapp, G. & Friede, T. (2015) Hartung-Knapp-Sidik-Jonkman approach and its modification for random-effects meta-analysis with few studies. BMC Medical Research Methodology. [Online] 15 (1), 1–8. Available from: doi:10.1186/s12874-015-0091-1.

Schwarzer, G. (2007) meta: An R package for meta-analysis. [Online]. 2007. R News. Available from: https://cran.r-project.org/doc/Rnews/Rnews_2007-3.pdf.

Shamseer, L. & Moher, D. (2015) Planning a systematic review? Think protocols. [Online]. 2015. Research in progress blog. Available from: http://blogs.biomedcentral.com/bmcblog/2015/01/05/planning-a-systematic-review-think-protocols/.

Vera-Badillo, F.E., Shapiro, R., Ocana, A., Amir, E., et al. (2013) Bias in reporting of end points of efficacy and toxicity in randomized, clinical trials for women with breast cancer. Annals of Oncology. [Online] 24 (5), 1238–1244. Available from: doi:10.1093/annonc/mds636.


Diagnosis under pressure

It is true to say that our judgement is not always at its best when we are stressed or under time pressure. In a recent paper (Plessas et al., 2019) a research team placed 40 dentists under time pressure to assess a number of dental radiographs and compared the results with those of the same group when they had as much time as they felt necessary. Their conclusion was:

Time pressure negatively impacts one aspect of dentists’ diagnostic performance, namely sensitivity (increased diagnostic errors and omissions of pathology), which can potentially affect patient safety and the quality of care delivered.

Since this study concerned diagnostic test accuracy (DTA), I felt it might be interesting to express this result graphically, and the author was kind enough to provide me with the raw data from the research. The data was extracted into Excel and then analysed using the statistical package ‘mada’ in R to create a summary receiver operating characteristic (sROC) curve.

The Results

The ellipse with circle surrounded by the solid line represents the summary estimate for diagnosis with time pressure (TP) and 95% confidence interval, and the ellipse with triangle surrounded by the dotted line represents the summary estimate for diagnosis with no time pressure (NTP).

Diagnostic test accuracy (sROC curve)

Time pressure

Sensitivity = 0.551 (95%CI: 0.439 to 0.658)

False positive rate = 0.006 (95%CI: 0.004 to 0.01)

No time pressure

Sensitivity = 0.797 (95% CI: 0.692 to 0.873)

False positive rate = 0.009 (95% CI: 0.006 to 0.013)

Summary result

The difference between TP and NTP is statistically significant (p = 0.0007).

In absolute terms there was a 24.6% reduction in correct diagnosis under TP.

In relative terms there was a 40% reduction in correct diagnosis under TP.
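The absolute figure can be checked directly from the two sensitivity point estimates reported above. The short Python sketch below does that arithmetic. The variable names are mine, and the relative figure it prints is only the simple ratio of the two point estimates.

```python
# Summary sensitivities reported above.
sens_ntp = 0.797  # no time pressure
sens_tp = 0.551   # time pressure

absolute_drop = sens_ntp - sens_tp        # difference in percentage points
relative_drop = absolute_drop / sens_ntp  # drop as a share of the NTP value

print(f"absolute: {absolute_drop:.1%}, relative: {relative_drop:.1%}")
```

The absolute difference matches the 24.6% quoted. The simple ratio of point estimates comes to roughly 31%, so the 40% relative figure above was presumably derived in another way that is not shown in the post.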

My conclusion

When under stress or significant time pressure there was a significant reduction in the clinicians’ ability to diagnose pathology (sensitivity fell from 80% to 55%), but the rate at which normal radiographs were misdiagnosed as showing pathology (the false positive rate) was very low in both scenarios.


I would like to thank the authors for allowing me access to their data.

Plessas, A., Nasser, M., Hanoch, Y., O’Brien, T., Bernardes Delgado, M. and Moles, D. (2019) ‘Impact of time pressure on dentists’ diagnostic performance’, Journal of Dentistry, 82, pp. 38–44. doi: 10.1016/j.jdent.2019.01.011.

Four Ways to Optimise an Outcome



As a dental surgeon, I have spent my entire career trying to keep up-to-date with the latest evidence as surgical techniques have evolved. Over the past 10 years I have started to question the validity of some of this evidence as I was seeing more complications relating to dental implant treatment than the research would suggest. To explore this hypothetical mismatch between clinical and research outcomes I chose to undertake an updated systematic review (SR) and sensitivity meta-analysis on the ‘Long-term survival of titanium dental implants’ [1].

Observations from the research

Following completion of my SR, I identified four areas where the previous SRs had potentially over-optimised their conclusions:

  1. Definitions of implant failure
    Problem: Most of the research defined the failure of a dental implant using the most extreme outcome (loss from the oral cavity). In clinical practice it is generally accepted that an implant has failed if it causes pain or is mobile when in use, has lost most of its supporting bone, or presents with uncontrollable infection.
    Solution: By universally adopting these real-world definitions of implant failure the research will produce results closer to what a patient might consider a failure.
  2. Patients lost to follow-up
    Problem: In all the papers reviewed the researchers had assumed that any patient unavailable for assessment at 10 years was ‘missing at random’, that is to say that their absence had nothing to do with the treatment they received, so the missing data could be ignored and only complete data analysed. In clinical practice there is anecdotal evidence, and a handful of research papers, showing that patients who don’t come back for review may have had a higher failure rate (up to ten times higher) for clinical, psychological or financial reasons [2,3].
    Solution: In the real-world clinical environment there is less control over patient monitoring, and it is not plausible to assume either that all the patients are missing at random or that all patients lost to follow-up had complete success or complete failure. There needs to be some plausible imputation model to account for the missing data. In my review we set the implant failure rate among patients lost to follow-up at five times the authors’ published rate, based on previous lost-to-follow-up studies, and then imputed the number of additional failed implants this would add if all the patients had been followed up [4]. One could argue about using a multiplier of 5, but it is more plausible than ignoring these patients altogether or substituting a probability of 0 or 1 for the missing outcome (Cromwell’s Law) [5].
  3. Risk of Bias (RoB) assessment
    Problem: In the initial review most of the previously published SRs did not employ a risk of bias tool. If one was used, it was either the Cochrane Collaboration’s tool for assessing risk of bias in randomised trials or the Newcastle-Ottawa Scale for comparing non-randomised studies. The problem with both these tools is that they assume a comparator group, and the implant survival studies had none, so neither tool is suitable for assessing the risk of bias in these SRs. Using an inappropriate RoB tool risks presenting the evidence in a better light by concentrating on the internal validity of the study.
    Solution: I used a risk of bias tool specifically designed for prevalence studies so there is no comparator group [6]. This tool places an emphasis on the external validity (how closely the group under study represent the national population that may benefit from the treatment).
  4. Presentation of the prediction interval
    Problem: The results of the meta-analysis were only presented as a summary estimate and 95% confidence interval. It must be remembered that this is the mean survival rate of all the studies and that the 95% confidence interval represents the precision of that estimate. This does not help us predict the possible outcome of a future study conducted in a similar fashion.
    Solution: It is possible to add a prediction interval (PI) to the summary estimate, which represents the distribution of the true effects and the heterogeneity in the same metric as the original effect size measure [7,8].
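The loss-to-follow-up adjustment described in point 2 can be illustrated with a small Python sketch. This is only a simplified reading of the approach; the exact imputation procedure in the review may differ. The function name and the patient counts are hypothetical.

```python
def impute_lost_failures(n_followed, failures_observed, n_lost, multiplier=5.0):
    """Adjust a failure rate for patients lost to follow-up.

    Assumes the lost patients fail at `multiplier` times the observed rate
    (capped at 100%) and returns (observed rate, adjusted rate).
    """
    observed_rate = failures_observed / n_followed
    lost_rate = min(1.0, multiplier * observed_rate)
    imputed_failures = n_lost * lost_rate
    adjusted_rate = (failures_observed + imputed_failures) / (n_followed + n_lost)
    return observed_rate, adjusted_rate


# Hypothetical study: 800 implants followed (40 failed), 200 lost to follow-up.
observed, adjusted = impute_lost_failures(800, 40, 200)
print(f"observed failure rate = {observed:.1%}, adjusted = {adjusted:.1%}")
```

In this made-up example the failure rate rises from 5% among the implants that were followed up to 9% once the imputed failures are included, which shows how much a complete-case analysis can flatter the result when a fifth of the sample is missing.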


A traditional analysis produced similar 10-year survival estimates to previous systematic reviews. A more realistic sensitivity meta-analysis accounting for loss to follow-up data and the calculation of prediction intervals demonstrated a possible doubling of the risk of implant loss in the older age groups.

Link to CEBM blog: https://www.cebm.net/2019/05/four-ways-that-a-systematic-review-can-over-optimise-an-outcome/

[1] M.-S. Howe, W. Keys, D. Richards, Long-term (10-year) dental implant survival: A systematic review and sensitivity meta-analysis, J. Dent. 84 (2019) 9–21. doi:10.1016/j.jdent.2019.03.008.

[2] A.C.P. Sims, The importance of a high tracking rate in long term medical follow-up studies, Lancet. 302 (1973) 433–435.

[3] E.H. Geng, N. Emenyonu, M.B. Bwana, et al., Sampling-Based Approach to Determining Outcomes of Patients Lost to Follow-Up in Antiretroviral Therapy Scale-Up Programs in Africa, J. Am. Med. Assoc. 300 (2008) 506–507. doi:10.1001/jama.300.5.506.

[4] E.A. Akl, M. Briel, J.J. You, et al., Potential impact on estimated treatment effects of information lost to follow-up in randomised controlled trials (LOST-IT): systematic review, Br. Med. J. 344 (2012) e2809. doi:10.1136/bmj.e2809.

[5] D.V. Lindley, Understanding Uncertainty [electronic book], Wiley, Hoboken, New Jersey, 2014, pp. 129–130.

[6] D. Hoy, P. Brooks, A. Woolf, et al., Assessing risk of bias in prevalence studies: Modification of an existing tool and evidence of interrater agreement, J. Clin. Epidemiol. 65 (2012) 934–939. doi:10.1016/j.jclinepi.2011.11.014.

[7] M. Borenstein, L. V Hedges, J.P.T. Higgins, Introduction to Meta-Analysis, Wiley & Sons, Chichester, UK, 2009.

[8] J. IntHout, J.P.A. Ioannidis, M.M. Rovers, Plea for routinely presenting prediction intervals in meta-analysis, Br. Med. J. Open. 6 (2016) e010247. doi:10.1136/bmjopen-2015-010247.