Influence of blinding on treatment effect size estimate in randomized controlled trials of oral health interventions



The coronavirus pandemic has highlighted many of the strengths and weaknesses in trying to develop evidence-based healthcare strategies from the huge amount of data created in the past 8 months. The most obvious strength is the rapid sharing of research papers via the preprint servers; the limitations are a lack of high-quality peer review and weaker research methodology in the rush to publish. In a recent article Aronson and co-workers looked at the large number of Covid-19 drug trials and the lack of suitable blinding/masking, which is essential in reducing the effects of bias within a trial. The authors found that out of 142 trials only 36% were masked (Aronson et al., 2020), raising important ethical issues about bias and unreliable reporting of study results.

The importance of blinding/masking has been highlighted in an oral health review by Saltaji and co-workers (Saltaji et al., 2018), which showed that studies that did not mask both participants and investigators gave treatment effect size estimates that were, on average, higher than those of masked studies. This review is described below. Their research questions were:

  • Do oral health randomised controlled trials (RCTs) with adequate blinding of participants, outcome assessors, and health care providers yield different treatment effect sizes (ESs) than trials with lack or unclear blinding?
  • Do specific non-methodological meta-analysis characteristics (e.g., dental specialty, type of treatment, type of outcome [objective vs. subjective], magnitude of the treatment ES estimate, heterogeneity of meta-analysis) modify the association between blinding and treatment ES estimates?


This paper was part of a larger oral health study regarding the methodology used in randomised controlled trials. Their protocol was registered on PROSPERO (CRD42014014070).

The authors searched six electronic databases (PubMed, MEDLINE, EMBASE, ISI Web of Science, Evidence-Based Medicine Reviews–Cochrane Database of Systematic Reviews, and Health STAR) from database inception to May 2014. The search was not limited to the English language.

Two independent assessors extracted the relevant data. Study quality was assessed on the different levels of study blinding and whether they were present/absent or unclear. Meta-analysis was undertaken using the raw data extracted from the individual studies with the summary estimate being the difference in ESs plus 95% confidence intervals. Subgroup analysis was undertaken in a similar fashion.


  • 64 systematic reviews (32 Cochrane and 32 non-Cochrane reviews) satisfied the eligibility criteria for the present report. A total of 540 trials analysing 137,957 patients were considered for this study:
    • Periodontics (36 reviews; 271 trials)
    • Dental public health and paediatric dentistry (10 reviews; 145 trials)
    • Oral medicine and pathology (11 reviews; 80 trials)
    • Oral and maxillofacial surgery (4 reviews; 26 trials)
    • Orthodontics and dentofacial orthopaedics (2 reviews; 12 trials)
    • Restorative dentistry (1 review; 6 trials).
  • Risk of Bias – Blinding of patients was judged as adequate (low risk of bias) in 71.5% (n = 386) of the trials, and blinding of the outcome assessment was judged as adequate (low risk of bias) in 59.4% of the trials.
  • Quality assessment – Only 33.5% (n=181) of studies were described as double blind.
  • Trials with inadequate patient blinding had significantly larger treatment ES estimates (difference = 0.12; 95% CI: 0.00 to 0.23; p = 0.046).
  • Trials with a lack of blinding of both patients and assessors had significantly larger treatment ES estimates (difference = 0.19; 95% CI: 0.06 to 0.32).
  • Trials with a lack of blinding of patients, assessors, and care-providers concurrently also had significantly larger treatment ES estimates (difference = 0.14; 95% CI: 0.03 to 0.25).
  • Subgroup analysis stratified by other characteristics of meta-analyses (heterogeneity of meta-analysis, type of outcome, and dental speciality) was not statistically significant for any of the characteristics.
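As a rough cross-check of the blinding results above, a standard error and two-sided p-value can be recovered from each reported difference and its 95% confidence interval. This is a minimal sketch that assumes the intervals are symmetric normal-approximation intervals, which the review summary does not state; the helper name `p_from_ci` is hypothetical, and the 0.14 difference is the value given in the authors' conclusion. Because the published figures are rounded, the recovered p-values are only approximate.

```python
from statistics import NormalDist


def p_from_ci(diff, lower, upper, level=0.95):
    """Approximate two-sided p-value from an estimate and its CI.

    ASSUMPTION: the interval is a symmetric normal-approximation
    interval; the review summary does not confirm this.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)       # half-width divided by z
    z = diff / se
    return 2 * (1 - nd.cdf(abs(z)))


# (difference in ES, lower 95% limit, upper 95% limit) as reported
reported = {
    "patients not blinded": (0.12, 0.00, 0.23),
    "patients and assessors not blinded": (0.19, 0.06, 0.32),
    "patients, assessors and care-providers not blinded": (0.14, 0.03, 0.25),
}

p_values = {label: p_from_ci(*vals) for label, vals in reported.items()}

for label, p in p_values.items():
    print(f"{label}: approximate p = {p:.3f}")
```

All three recovered p-values fall below 0.05, in line with the intervals excluding (or just touching) zero.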

Conclusion (the authors concluded)

We found significant differences in treatment ESs between oral health RCTs based on lack of patient and assessor blinding. RCTs that lacked patient and assessor blinding had significantly larger treatment ES estimates. Treatment ES estimates were 0.19 and 0.14 larger in trials with lack of blinding of both patients and assessors and blinding of patients, assessors, and care-providers concurrently. No significant differences were identified in other blinding criteria. Future meta-epidemiological assembling of a greater number of meta-analyses and trials that takes other biases and different degrees of blinding into account is needed.


This was a well-constructed methodological meta-analysis demonstrating the importance of including blinding wherever it is possible within a study design to reduce the effects of bias. Poorly designed trials can overestimate benefits and therefore obscure harms to patients. Good methodology must not be sacrificed on the altar of producing results with amplified effect sizes in an attempt to get published, as this not only represents bad science but is also unethical (WHO, 2014).


ARONSON, J., DEVITO, N. & FERNER, R. 2020. The ethics of COVID-19 treatment studies: too many are open, too few are double-masked [Online]. Oxford COVID-19 Evidence Service. Available: [Accessed].

SALTAJI, H., ARMIJO-OLIVO, S., CUMMINGS, G. G., AMIN, M., DA COSTA, B. R. & FLORES-MIR, C. 2018. Influence of blinding on treatment effect size estimate in randomized controlled trials of oral health interventions. BMC Med Res Methodol, 18, 42.

WHO. 2014. Ethical considerations for use of unregistered interventions for Ebola virus disease: Report of an advisory panel to WHO [Online]. Available: [Accessed].

Myth busting and Covid-19


Last night I was asked to participate in an excellent webinar with Professors Jennie Wilson, Nairne Wilson, Ross Hobson, Drs Jimmy Walker, Ian Mills, and Dominic O’Hooley, run through a collaboration of the Faculty of General Dental Practice and ProDental CPD.


FFP3 fit testing accuracy and Covid-19 prevalence update



The Bottom Line

The prevalence of Covid-19 in England has dropped to 1:1000, reducing the risk of treating an asymptomatic/presymptomatic patient down to 1:3330. Depending on usage, between 5% and 55% of fit-tested FFP3 masks will truly fit correctly. As the prevalence of Covid-19 drops, policy makers will need to be aware of the changing risk/benefits of complex PPE usage.


On the 5th June the Office for National Statistics (ONS) updated their prevalence statistics for Covid-19 infections down to 0.10% (95% CI: 0.05% to 0.18%) of the population in England (ONS, 2020). From my previous two blogs the estimated true asymptomatic prevalence for Covid-19 was 16% (95% CI: 12% to 20%) and the combined asymptomatic/presymptomatic prevalence was estimated at 27% (95% CI: 12% to 45%). From this updated data it is now possible to revise down the chance of treating a Covid-19 patient from 1:1333 to 1:3330, so in 20 days the risk has reduced about 2.5 times.
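The arithmetic behind these odds can be sketched in a few lines of Python. The post does not spell out the multiplier applied to the ONS prevalence, so the fraction of 0.30 used below is an assumption, chosen because it reproduces the quoted 1:3330 to within rounding; the 27% point estimate would instead give roughly 1:3700. The function name `one_in_n` is hypothetical.

```python
def one_in_n(prevalence, symptom_free_fraction):
    """Return N such that the chance of treating an infected but
    symptom-free patient is about 1 in N."""
    return 1 / (prevalence * symptom_free_fraction)


ons_prevalence = 0.0010     # 0.10% of the population (ONS, 5 June 2020)
assumed_fraction = 0.30     # ASSUMPTION: not stated in the post

risk_now = one_in_n(ons_prevalence, assumed_fraction)
risk_with_27 = one_in_n(ons_prevalence, 0.27)   # using the 27% estimate

# Fold reduction using the odds quoted in the post (1:1333 to 1:3330)
fold_reduction = 3330 / 1333

print(f"assumed 30%: 1:{risk_now:.0f}")
print(f"using 27%:   1:{risk_with_27:.0f}")
print(f"fold reduction: {fold_reduction:.1f}")
```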

At the same time we have had three major Standard Operating Procedures (SOPs) published by the British Dental Association, the Faculty of General Dental Practice, and the Office of the Chief Dental Officer (England) (BDA, 2020, FGDP, 2020, OCDOE, 2020). A large proportion of these documents are dedicated to aerosol generating procedures and the need for properly fit-tested FFP3 respirators. Below I have outlined the diagnostic accuracy of fit testing based on data from the Health and Safety Executive (HSE, 2015).


In 2015 the HSE produced a document specifically reviewing the fit test criteria for FFP3 respirators. In total 25 volunteers were tested with four consecutive fit test methods: qualitative Bitrex, quantitative Portacount (both with and without the N95-Companion technology), and the laboratory-generated salt aerosol (Total Inward Leakage – TIL) chamber fit test method. The tests were conducted on the same subject wearing an FFP3, without adjustment to the fit between tests. I carried out a meta-analysis using the ‘mada’ package in R; the reference test used was the laboratory test chamber (TIL) fit test. The summary estimate for the qualitative/quantitative fit-tests for sensitivity was 89.3% (95% CI: 80.1% to 96.7%) and for specificity 58.7% (31.7% to 56.1%). The results have also been charted on a Summary Receiver Operating Characteristic (sROC) curve (see Figure 1).

Figure 1. sROC curve for FFP3 fit test


The main problem with both the qualitative and quantitative tests is the low specificity, which produces a high number of false passes: masks that fail the reference test but pass the fit test. The reference test found that 37% of the masks tested passed, so if we tested 1000 masks, we would get the following results (see Figure 2).

Figure 2. Frequency tree for fit tests


From the frequency tree we can see that a mask that truly fails can appear as a pass, so if a clinician passes a standard fit test there is a 55% probability that the result is true. On refitting and passing a fit check, 58% of FFP3 masks failed their fit test, so in real terms this reduces the overall pass rate to 31% of fit-checked respirators passing their fit test if reused. In long-term real-world usage the ability of a mask to retain the required level of filtration could drop as low as 5% with incorrect usage, with 18% of theatre staff wearing their face masks incorrectly (Herron et al., 2019).
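The first branch of the frequency tree can be reproduced with a short calculation. This is a minimal sketch assuming that a "positive" result means a pass, and using the summary sensitivity (89.3%), specificity (58.7%) and reference pass rate (37%) quoted earlier; the variable names are illustrative. With unrounded counts the chance that a pass is genuine comes out at about 0.56, close to the 55% read from the tree.

```python
n_masks = 1000
true_pass_rate = 0.37    # share of masks passing the reference (TIL) test
sensitivity = 0.893      # true passes correctly passed by the fit test
specificity = 0.587      # true fails correctly failed by the fit test

true_pass = n_masks * true_pass_rate
true_fail = n_masks - true_pass

tp = true_pass * sensitivity          # genuine passes
fn = true_pass - tp                   # good fits wrongly failed
tn = true_fail * specificity          # genuine fails
fp = true_fail - tn                   # poor fits wrongly passed

# Probability that a mask passing the fit test truly fits
ppv = tp / (tp + fp)
# Probability that a mask failing the fit test truly does not fit
npv = tn / (tn + fn)

print(f"passes: {tp + fp:.0f} of which genuine: {tp:.0f}")
print(f"P(true fit | pass) = {ppv:.2f}")
print(f"P(poor fit | fail) = {npv:.2f}")
```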

Attack rates, hospital AGPs and the relative protection of face masks

Interestingly I found a paper on real-world SARS infection rates for acute care nursing staff performing medical AGPs (Loeb et al., 2004). The SARS infection rate with consistently used FFP2 respirators was 13%, and for surgical masks it was 25%; with inconsistent use this rose to 56%. Combining this data with the updated prevalence data we can model the revised infection natural frequencies (see Table 1).

Table 1. Face mask protection model

Mask usage                 Infection rate   Protection factor   Infection risk
Base rate                  100%             1.0                 1:3330
Inconsistent mask usage    56%              1.7                 1:5661
Surgical face mask         25%              4.0                 1:13320
FFP2                       13%              7.7                 1:25641
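The table can be rebuilt from the base risk and the protection factors. This is a minimal sketch, assuming the infection risk column is simply the base odds (1:3330) multiplied by the rounded protection factor shown in the table; the unrounded reciprocal of each infection rate is printed alongside for comparison. The names are illustrative.

```python
BASE_ODDS = 3330  # 1:3330 base risk quoted in the post

# label: (infection rate, protection factor as tabulated in the post)
rows = {
    "Base rate": (1.00, 1.0),
    "Inconsistent mask usage": (0.56, 1.7),
    "Surgical face mask": (0.25, 4.0),
    "FFP2": (0.13, 7.7),
}

infection_risk = {}
for label, (rate, factor) in rows.items():
    # Risk expressed as "1 in N": base odds scaled by the protection factor
    infection_risk[label] = round(BASE_ODDS * factor)
    unrounded = 1 / rate
    print(f"{label}: factor {factor} (1/rate = {unrounded:.2f}), "
          f"risk 1:{infection_risk[label]}")
```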


The problem now is that when the data is placed into an a priori power calculator (G*Power 3.1.9.2) with an alpha error probability of 0.05 and power (1 - beta error probability) of 0.8, one will need a total sample size of 1,087,610 to test mask effectiveness.
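To give a feel for why the required sample is so large, here is a hedged sketch of a textbook normal-approximation sample size for comparing two proportions. The post does not say which test or which pair of risks was entered into G*Power, so the choices below (a one-sided comparison of the surgical mask and FFP2 risks from Table 1, with equal group sizes) are assumptions, and the function name `n_per_group` is hypothetical. Under those assumptions the total comes out at a little over a million, the same order as the figure quoted, though G*Power's own routines will not match this approximation exactly.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80, one_sided=True):
    """Unpooled normal-approximation sample size per group for
    detecting a difference between two independent proportions."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha if one_sided else 1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# ASSUMPTION: the two risks compared are the Table 1 odds for
# surgical masks and FFP2 respirators.
p_surgical = 1 / 13320
p_ffp2 = 1 / 25641

total = 2 * n_per_group(p_surgical, p_ffp2)
total_two_sided = 2 * n_per_group(p_surgical, p_ffp2, one_sided=False)

print(f"one-sided total: {total:,}")
print(f"two-sided total: {total_two_sided:,}")
```

The rarer the outcome, the smaller the absolute difference between groups, which is what drives the sample size into the millions.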


The prevalence of Covid-19 in the population has changed dramatically over the past three weeks. This will have knock-on effects on the real-world exposure risks of clinical staff to Covid-19 and on the application of the advice given in the current SOPs. Clinicians need to be aware of the high failure rates in the true protection offered by properly fit-tested FFP2 and FFP3 masks, and as the risk status drops the clinical benefit of these respirator masks will be harder to detect.


BDA. 2020. RETURNING TO FACE-TO-FACE CARE [Online]. Available: [Accessed].

FGDP. 2020. Implications of COVID-19 for the safe management of general dental practice A practical guide [Online]. Available: [Accessed].

HERRON, J. B. T., KUHT, J. A., HUSSAIN, A. Z., GENS, K. K. & GILLIAM, A. D. 2019. Do theatre staff use face masks in accordance with the manufacturers’ guidelines of use? J Infect Prev, 20, 99-106.

HSE. 2015. Review of fit test pass criteria for Filtering Facepieces Class 3 (FFP3) Respirators [Online]. Available: [Accessed].

LOEB, M., MCGEER, A., HENRY, B., OFNER, M., ROSE, D., HLYWKA, T., LEVIE, J., MCQUEEN, J., SMITH, S. & MOSS, L. 2004. SARS among critical care nurses, Toronto. Emerging infectious diseases, 10, 251.

OCDOE. 2020. Standard operating procedure transition to recovery [Online]. Available: [Accessed].

ONS. 2020. Coronavirus (COVID-19) Infection Survey pilot: 5 June 2020 [Online]. Available: [Accessed].