Clinical assessment for the detection of oral cavity cancer and potentially malignant disorders in apparently healthy adults

Background
The early detection of oral cavity squamous cell carcinoma (OSCC) and oral potentially malignant disorders (OPMD), followed by appropriate treatment, may improve survival and reduce the risk for malignant transformation respectively. This is an update of a Cochrane Review first published in 2013.

Objectives
To estimate the diagnostic test accuracy of conventional oral examination, vital rinsing, light-based detection, mouth self-examination, remote screening, and biomarkers, used singly or in combination, for the early detection of OPMD or OSCC in apparently healthy adults.

Search methods
Cochrane Oral Health's Information Specialist searched the following databases: Cochrane Oral Health's Trials Register (to 20 October 2020), MEDLINE Ovid (1946 to 20 October 2020), and Embase Ovid (1980 to 20 October 2020). The US National Institutes of Health Trials Registry (ClinicalTrials.gov) and the World Health Organization International Clinical Trials Registry Platform were searched for ongoing trials. No restrictions were placed on the language or date of publication when searching the electronic databases. We conducted citation searches, and screened reference lists of included studies for additional references.

Selection criteria
We selected studies that reported the test accuracy of any of the aforementioned tests in detecting OPMD or OSCC during a screening procedure. Diagnosis of OPMD or OSCC was provided by specialist clinicians or pathologists, or alternatively through follow-up.

Data collection and analysis
Two review authors independently screened titles and abstracts for relevance. Eligibility, data extraction, and quality assessment were carried out by at least two authors independently and in duplicate. Studies were assessed for methodological quality using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2). We reported the sensitivity and specificity of the included studies.
We provided judgement of the certainty of the evidence using a GRADE assessment.

Clinical assessment for the detection of oral cavity cancer and potentially malignant disorders in apparently healthy adults (Review) Copyright © 2021 The Cochrane Collaboration. Published by John Wiley & Sons, Ltd. Cochrane Library Trusted evidence. Informed decisions. Better health. Cochrane Database of Systematic Reviews

Main results
We included 18 studies, recruiting 72,202 participants, published between 1986 and 2019. These studies evaluated the diagnostic test accuracy of conventional oral examination (10 studies, none new to this update), mouth self-examination (four studies, two new to this update), and remote screening (three studies, all new to this update). One randomised controlled trial of test accuracy directly evaluated conventional oral examination plus vital rinsing versus conventional oral examination alone. There were no eligible studies evaluating light-based detection or blood or salivary sample analysis (which tests for the presence of biomarkers for OPMD and OSCC). Only one study of conventional oral examination was judged as at overall low risk of bias and overall low concern regarding applicability. Given the clinical heterogeneity of the included studies in terms of the participants recruited, setting, prevalence of the target condition, the application of the index test and reference standard, and the flow and timing of the process, the data could not be pooled within the broader categories of index test. For conventional oral examination (10 studies, 25,568 participants), prevalence in the test accuracy sample ranged from 1% to 51%.
For the seven studies with prevalence of 10% or lower, a prevalence more comparable to the general population, the sensitivity estimates were variable, and ranged from 0.50 (95% confidence interval (CI) 0.07 to 0.93) to 0.99 (95% CI 0.97 to 1.00); the specificity estimates were more consistent and ranged from 0.94 (95% CI 0.88 to 0.97) to 0.99 (95% CI 0.98 to 1.00). We judged the overall certainty of the evidence to be low, and downgraded for inconsistency and indirectness. Evidence for mouth self-examination and remote screening was more limited. We judged the overall certainty of the evidence for these index tests to be very low, and downgraded for imprecision, inconsistency, and indirectness. We judged the evidence for vital rinsing (toluidine blue) as an adjunct to conventional oral examination compared to conventional oral examination to be moderate, and downgraded for indirectness as the trial was undertaken in a high-risk population.

Authors' conclusions
There is a lack of high-certainty evidence to support the use of screening programmes for oral cavity cancer and OPMD in the general population. Frontline screeners such as general dentists, dental hygienists, other allied professionals, and community healthcare workers should remain vigilant for signs of OPMD and OSCC.

P L A I N   L A N G U A G E   S U M M A R Y

What are the most accurate tests for screening for cancer of the mouth (oral cancer) and conditions that may lead to oral cancer?

Key messages
There is a lack of high-certainty evidence to support the use of screening tests for cancer of the mouth and conditions that may lead to mouth cancer in the general population. General dental practitioners and healthcare professionals should be watchful for signs of oral potentially malignant disorders (OPMD) and malignancies whilst performing routine oral examinations in practice for other common oral lesions/conditions.
Detection of oral cancer
Cancer of the mouth (oral cancer) is a serious condition, and only half of those that develop the disease will survive after 5 years. This is because it is often detected late. Early detection when the oral cancer is small or as a 'preceding' condition or lesion (which can become cancer) can result in simpler treatment and much better outcomes. As a result, there is a need to understand how good different types of tests are at the early detection of oral cancer and the lesions that precede it.

What did we want to find out?
The aim of this review was to find out the accuracy of different screening tests for cancer of the mouth and conditions that may lead to mouth cancer.

What did we do?
We searched for studies that reported the test accuracy of different screening tests in detecting cancer of the mouth or OPMDs during a screening procedure. Diagnosis of cancer of the mouth or OPMDs was provided by specialist clinicians or pathologists, or alternatively through follow-up. We compared and summarised the results of the studies and rated our confidence in the evidence, based on factors such as study methods and sizes.

What did we find?
We included 18 studies recruiting 72,202 participants, published between 1986 and 2019. These studies evaluated a conventional oral examination (COE) or visual inspection (10 studies), mouth self-examination (four studies), and remote screening (three studies). One randomised controlled trial of test accuracy directly compared conventional oral examination plus vital rinsing with conventional oral examination alone. No eligible studies evaluated the accuracy of tests of blood or saliva.
There was substantial variation in the participants that were recruited, the setting, the prevalence of mouth cancer or OPMDs, and how the different tests were carried out, and so we were unable to pool the data. Most studies evaluated the accuracy of the different COEs (10 studies, 25,568 participants). The prevalence of mouth cancer or OPMDs in these studies ranged from 1% to 51%. For the seven COE studies with a prevalence of 10% or lower, a prevalence more comparable to the general population, the sensitivity estimates (proportion of true positives) ranged from 0.50 to 0.99 with specificity estimates (proportion of true negatives) from 0.94 to 0.99. Evidence for mouth self-examination (4 studies, 35,059 participants) and remote screening (3 studies, 3,600 participants) was more limited.

What are the limitations of the evidence?
We judged the overall certainty of the evidence for COE to be low and downgraded for the variation across studies and applicability of the study samples. We judged the overall certainty of the evidence for mouth self-examination and remote screening to be very low, and downgraded for variation across studies, applicability of the study samples, and imprecise accuracy estimates.

How up to date is this evidence?
The evidence is up to date to October 2020.

S U M M A R Y   O F   F I N D I N G S

Summary of findings 1.
Summary of findings: conventional oral examination/visual inspection for the detection of oral cavity cancer and oral potentially malignant disorders in apparently healthy adults

Question
What is the performance of conventional oral examination/visual inspection for the detection of oral cavity cancer and oral potentially malignant disorders in apparently healthy adults?

Population
OSCC or OPMD symptom-free individuals screened opportunistically, or through an organised screening programme

Index test
Oral examination (conventional oral examination by a dentist or visual inspection by trained healthcare workers)

Target condition
OSCC or OPMD

Reference standard
Examination and clinical evaluation by a physician with specialist knowledge or training. Long-term follow-up was accepted as a suitable reference standard for those participants who screened negative

Study type
Individuals attending for opportunistic screening, organised screening programme, validation as part of an organised screening programme, or randomised controlled trial, or screening as part of a routine surveillance appointment

Quantity of evidence
10 studies including 25,568 participants. The prevalence varied widely across the studies from 1.4% to 50.9%



Findings
Due to differences in region, setting, nature of the index test, and reference standard, we elected not to pool the studies. For the 7 studies with low prevalence (10% or less), the sensitivity estimates were highly variable, and ranged from 0.50 (95% CI 0.07 to 0.93) to 0.99 (95% CI 0.97 to 1.00), but the specificity estimates were more consistent and ranged from 0.94 (95% CI 0.88 to 0.97) to 0.99 (95% CI 0.98 to 1.00). For the 3 studies with higher prevalence, sensitivity estimates ranged from 0.94 (95% CI 0.90 to 0.97) to 0.97 (95% CI 0.96 to 0.98), and specificities ranged from 0.75 (95% CI 0.73 to 0.77) to 0.98 (95% CI 0.98 to 0.99). For many of the studies the sensitivity estimates were imprecise, often reflecting the low disease prevalence in the samples.

Limitations
Test accuracy certainty of the evidence

Risk of bias
3 studies were judged to be at low risk of bias overall. 5 studies were judged to be at unclear risk of bias primarily due to insufficient information regarding blinding of the results of the index test. 2 studies were judged to be at high risk of bias arising from the flow and timing domain (high levels of attrition following a positive screen and time from positive screen to receipt of the reference standard).

In 2 studies the sensitivity was much lower than the specificity (sensitivity 0.18 (95% CI 0.13 to 0.24), specificity 1.00 (95% CI 1.00 to 1.00) and sensitivity 0.09 (95% CI 0.04 to 0.15), specificity 0.95 (95% CI 0.88 to 0.99), respectively). Sensitivity and specificity values were similar for 2 other studies (sensitivity 0.43 (95% CI 0.24 to 0.63), specificity 0.44 (95% CI 0.20 to 0.70) and sensitivity 0.33 (95% CI 0.10 to 0.65), specificity 0.54 (95% CI 0.37 to 0.69), respectively).

Limitations
Test accuracy certainty of the evidence

Risk of bias
The overall risk of bias for the studies that evaluated mouth self-examination was judged to be unclear for 3 studies and high for 1 study.

Applicability of evidence to question
We judged concern regarding the applicability of the studies to the review question to be high for the patient selection domain for 1 study that recruited and evaluated participants with Fanconi anaemia in a hospital setting, and participants that were identified and invited to participate based on their physician assessed risk of oral cancer

Overall certainty of the evidence
We judged the overall certainty of the evidence to be very low, and downgraded for imprecision, inconsistency, and indirectness.
⨁◯◯◯ VERY LOW

CI = confidence interval; OPMD = oral potentially malignant disorders; OSCC = oral squamous cell carcinoma.

Summary of findings 3. Summary of findings: vital rinsing (toluidine blue) as an adjunct to conventional oral examination compared to conventional oral examination alone for the detection of oral cavity cancer and oral potentially malignant disorders in apparently healthy adults

B A C K G R O U N D

Target condition being diagnosed
The target conditions of interest are oral squamous cell carcinoma (OSCC) and oral potentially malignant disorders (OPMD) of the oral cavity. OSCC is the most common form of oral cavity cancer (Bagan 2020; Chi 2015) and a proportion of carcinomas are preceded by OPMD. OPMD represent a heterogeneous group of conditions including leukoplakia, erythroplakia, proliferative verrucous leukoplakia, oral lichen planus/oral lichenoid lesions, oral submucous fibrosis, and actinic keratosis (Warnakulasuriya 2007; Warnakulasuriya 2020).
The natural history of OSCC is not fully understood; not all OPMD undergo malignant transformation, some remain stable, and some affected sites can revert back to health (Speight 2017). Equally, some OSCC can develop from lesions in which epithelial dysplasia was not previously diagnosed (Dost 2014), or from apparently normal mucosa that may contain significant molecular aberrations that increase the likelihood of cancer (Farah 2019; Nikitakis 2018; Thomson 2017). Proliferative verrucous leukoplakia has the highest malignant transformation rate (MTR) followed by erythroplakia (Locca 2020). Oral leukoplakia is the most common OPMD but has a varied MTR (Arduino 2013; Chaturvedi 2020; Warnakulasuriya 2016). In a systematic review of the literature, Warnakulasuriya 2016 reported the MTR of oral leukoplakia to be between 0.1% and 34%, and more recently a review from 2015 to 2020 reported that the MTR varied between 1.1% and 40.8%, with a pooled proportion of 9.8% (95% confidence interval (CI) 7.9% to 11.7%) (Aguirre-Urizar 2021). Petti 2003 calculated a global MTR of oral leukoplakia of 1.4% per year (95% CI 0.7% to 2%), but when this is applied to the prevalence of the condition, it far exceeds the numbers of actual cases of OSCC reported. However, the MTR in hospital-based studies is consistently higher than in community-based studies.
Several recent systematic reviews have reported an MTR for oral lichen planus close to 1%. For example, in a meta-analysis of 78 studies with 25,848 patients, Gonzalez-Moles 2019 reported a malignant transformation rate of 1.1% (95% CI 0.8% to 1.5%), results similar to Fitzpatrick 2014 (1.1%), Giuliani 2019 (1.4%), and Locca 2020 (1.4% (95% CI 0.9% to 1.9%)). In a meta-analysis of 33 studies with 12,838 oral lichen planus patients, Idrees 2021 reported that 151 cases were initially considered to have progressed to carcinoma (1.2%). Following the application of strict criteria (the presence of a properly verified oral lichen planus histological diagnosis with absence of epithelial dysplasia, a clear description of the cancerous lesion developing at the same site as the verified oral lichen planus lesion, and a follow-up period of a minimum of 6 months prior to carcinoma development), this figure was reduced to 0.4%, with an overall pooled proportion MTR of 0.2% (95% CI 0.1% to 0.3%) (Idrees 2021). Ramos-García 2021 summarised the systematic reviews in this area.
The early detection and excision of high-risk oral leukoplakias (OL) may reduce the risk of malignant transformation (Mehanna 2009a). Leukoplakias can be treated by a number of different methods; however, there is relatively little empirical evidence from randomised controlled trials and there remains some debate in the literature as to their effectiveness (Holmstrup 2006; Lodi 2016). Systematic reviews have evaluated the evidence for surgical interventions (including laser therapy). Surgical laser excision of OL may decrease recurrence rates but has no effect on malignant transformation when compared with conventional treatments (de Pauli 2020). There is scant experimental evidence for non-surgical interventions; though clinical resolution was observed, relapses were common (Lodi 2016).
In the United Kingdom, patients presenting with any new growth, an ulcer, or a white and red or red lesion persisting for more than 2 to 3 weeks, are urgently referred to Oral Medicine Units or Oral and Maxillofacial Surgery Units for further investigation (NICE 2016). Technologies to manage OSCC have progressed substantially (Bulsara 2018; Furness 2011; Glenny 2010; Shaw 2020), but surgery, radiotherapy, chemotherapy, and now immunotherapy are associated with significant morbidity. Despite this, mortality and survival rates have remained high (approximately 50%) and have typically remained unchanged over several decades (Warnakulasuriya 2009; Warnakulasuriya 2020), and this appears to point to the late presentations or aggressive biological behaviour of some OSCCs. There is a need for centralization of expertise while remaining accessible to the patient (Ogden 2020). If the lesion is diagnosed as OSCC the traditional treatment is surgery and radiotherapy, but the associated morbidity is high. These unchanged rates are in marked contrast to the improved mortality and survival rates in many other cancers, such as those of the breast and the colon (Cancer Research UK 2020). One reason may be that late presentation of OSCC is related to delayed diagnosis (a combination of patient factors such as infrequent visits to the dentist or physician, and clinician factors, such as failure to screen the entire mouth, failure to raise the index of suspicion regarding any lesion they may see or delays in onward referral) (Seoane 2016). Yet early OSCC can often be asymptomatic and is more amenable to a cure if detected as localised stage I or II disease (Ganly 2012).
Vital rinsing/staining has been an available adjunct to a conventional oral examination (COE) for several years (Lestón 2010; Lingen 2008), whilst light-based detection systems have become commercially available more recently. Blood analysis and saliva analysis are at a relatively early stage of evaluation (Additional Table 1). Index tests evaluated up to this point of specific interest to opportunistic screening or mass screening programmes outside of a clinical setting include conventional oral examination by clinicians or trained healthcare workers, mouth self-examination, blood and saliva analyses, or remote assessment.
Where access to clinicians, such as dentists/dental hygienists or physicians/allied medical workers is limited, population screening using oral examination, vital staining or rinsing, light-based detection, blood and saliva analyses, and remote examination could, in principle, be undertaken by trained community healthcare workers.
Mouth self-examination is a simple technique with universal application. This is usually undertaken in the home setting in accordance with instructional material, and the target condition is typically the presence of a visible lesion in the oral cavity. It is simple to carry out and has a limited cost, but the significant disadvantage is that it is being performed by a trained or untrained novice who can only determine, at best, the presence or absence of oral lesions. Mouth self-examination cannot definitively differentiate between OSCC, OPMDs, and benign lesions. Studies examining the ability of individuals to perform mouth self-examination have reported the quality of examinations of adolescents and adults to be unsatisfactory in terms of retraction and visualisation of the oral mucosa, and care and attention whilst carrying out the examination (Furquim 2014; Pivovar 2017a). The participants in these studies received no supporting literature or instruction prior to carrying out the self-examination.
A companion Cochrane Review evaluates the diagnostic accuracy of index tests in individuals presenting with clinically evident lesions (Walsh 2021).

Clinical pathway
Typically, individuals receive a COE as part of a routine dental appointment. The COE involves a standard visual and tactile examination of the oral mucosa under normal (incandescent) light. Alternatively, patients may occasionally present to the dental clinic with symptoms. Upon discovering a lesion, the clinician, i.e. the dentist or dental hygienist, makes a subjective judgement based upon clinical presentation. If an OPMD or OSCC is suspected, the frontline clinician refers onward to an oral specialist for a scalpel biopsy to render the definitive diagnosis. In some healthcare systems, for example in Spain, the biopsy is often carried out by the dentist.
Not all individuals regularly attend for a routine dental appointment, particularly in countries where access to healthcare resources is limited. Given the clear benefits of early detection, the screening of asymptomatic individuals would seem sensible. Screening can be carried out opportunistically, when an individual presents for a dental appointment, as part of a routine surveillance appointment for patients with a history of OPMDs or OSCC who need close monitoring, or as part of an organised screening programme carried out by a dentist or other healthcare worker. If the screening activity detects a lesion that elicits concern, the individual is usually referred for further investigation by a specialist; this could be an examination/biopsy by an oral medicine specialist, oral pathologist, oral surgeon, or otolaryngologist at a secondary or tertiary clinic.
The policies for promoting screening programmes for OPMD and OSCC remain controversial, with the US Preventive Services Task Force concluding that there is insufficient evidence regarding the benefits and harms of screening for OPMD and OSCC by primary care providers in asymptomatic adults (Moyer 2014). In asymptomatic high-risk individuals, however, the picture may be different. A population-based national screening programme in Taiwan targeting betel-quid-chewing or cigarette-smoking individuals deemed to be at high risk of oral cancer compared health outcomes between screened and non-screened individuals. With an overall screening rate of 55.1%, the study reported a risk ratio (RR) of death from oral cancer of 0.53 (95% CI 0.51 to 0.56) compared with the expected risk of oral cancer deaths in the absence of screening (RR 0.74 (95% CI 0.72 to 0.77) after adjusting for self-selection bias), and a RR of 0.62 (95% CI 0.59 to 0.64) for advanced oral cancer (RR 0.79 (95% CI 0.76 to 0.82) after adjustment for self-selection bias) (Chuang 2017). A re-analysis of the Kerala Oral Cancer Screening Trial where healthcare workers performed visual oral examinations reported that mortality was reduced by 27% in the screening arm compared to the control arm (hazard ratio (HR) 0.73; 95% CI 0.54 to 0.98), including a 29% reduction in ever-tobacco or ever-alcohol users or both (HR 0.71; 95% CI 0.51 to 0.99) (Cheung 2021). Galvão-Moreira 2017 suggested that screening strategies for OSCC should target populations at greater risk of disease in areas with high incidence of disease through visual examination by trained health workers or specialists in order to decrease the burden of disease. Similarly, Mandal et al suggested that screening of habitual tobacco or alcohol users with oral examination may be prudent in countries with a high burden of oral cancer where healthcare resources are sparse or where competing healthcare priorities exist (Mandal 2018).
There is limited evidence available, but the addition of adjunct tools to the COE by dentists may not prove fruitful in terms of reducing oral cancer incidence in a screening programme. For example, in a randomised controlled trial of screening with COE plus toluidine blue versus COE alone carried out in Taiwan amongst 28,167 high-risk individuals, a non-significant reduction of 21% in oral cancer incidence was reported in the individuals screened with COE plus toluidine blue (28.0 × 10⁻⁵ versus 35.4 × 10⁻⁵) (Su 2010).
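The 21% figure can be recovered from the two incidence values quoted for the Taiwan trial. The short sketch below shows that arithmetic only; the variable and function names are illustrative assumptions, and it does not reproduce the trial's own analysis or significance testing.

```python
# Incidence values as quoted in the text for the two trial arms (Su 2010).
# The variable names are illustrative, not taken from the trial report.
incidence_coe_plus_tb = 28.0e-5  # COE plus toluidine blue
incidence_coe_alone = 35.4e-5    # COE alone


def relative_reduction(intervention: float, comparator: float) -> float:
    """Proportional reduction in incidence relative to the comparator arm."""
    return 1.0 - intervention / comparator


reduction = relative_reduction(incidence_coe_plus_tb, incidence_coe_alone)
print(f"Relative reduction: {reduction:.0%}")
```

The ratio 28.0/35.4 is about 0.79, which corresponds to the reported reduction of roughly 21%.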

Rationale
Cochrane Oral Health undertook an extensive prioritisation exercise in 2020 to identify a core portfolio of titles that were the most clinically important ones to maintain in the Cochrane Library. Consequently, this review was identified as a priority title (COH priority reviews).
Oral cancer is a significant global health problem with an estimated 354,864 new cases and 177,384 deaths in 2018 (Bray 2018), and reported increases in incidence and mortality rates in many countries across the globe (Jin 2016; Shield 2017; Warnakulasuriya 2009). More recently, the trends of oral cancer incidence indicated two contrasting patterns between the sexes; in males, most cancer registry populations exhibited decreasing trends while in females, rising rates were seen in most populations (Miranda-Filho 2020). There is wide geographic variation in disease incidence and mortality, with almost double the incidence in lower- and middle-income countries compared to high-income countries, and a threefold increase in mortality. Tobacco use, alcohol consumption, betel-quid-chewing and low socioeconomic status are the most important risk factors for oral cancer (Conway 2008; IARC 2012). Human papillomavirus is not considered a significant risk for oral cavity cancers but is a major risk factor for oropharyngeal cancer (Kreimer 2020). Men have a higher incidence of oral cancer than women, but the gender difference has narrowed in recent decades from a ratio of five males to one female diagnosed with OSCC in the 1960s to less than two to one in 2008 (Ferlay 2010). Although traditionally the risk of oral cancer increases with age, since the 1980s the incidence amongst younger adults has increased in the European Union and the United States (Warnakulasuriya 2009).
Oral cancer mortality can be reduced by: (i) primary prevention, (ii) secondary prevention (screening and early detection), and (iii) improved treatment (ERO-FDI 2019). Accurate case detection and early treatment of oral cancers can substantially improve an individual's outlook with respect to morbidity, mortality, and quality of life (Speight 2017). However, no national population-based screening programme for oral cancer has yet been implemented in high-income countries, although opportunistic screening has been advocated (Speight 2017). Oral cancer screening models feasible for high-risk countries have recently been reviewed (Nagao 2020).
There is some debate in the literature on anticipated differences in diagnostic accuracy of prospective population-based invitational screening programmes and a more opportunistic approach (when patients attend their dental practitioner or to a lesser extent their physician, for routine examination or for treatment). In Downer et al's systematic review of test performance in screening for OSCC and OPMDs, only prospective investigations of population screening with specified reference standards were included. The pooled sensitivities and specificities were 0.85 (95% CI 0.730 to 0.919) and 0.97 (95% CI 0.930 to 0.982) respectively (Downer 2004). An opportunistic approach that focuses on high-risk groups is also possible (McGurk 2010; Sankaranarayanan 1997). A simulation study which used neural network and machine learning techniques suggested opportunistic screening aimed at high-risk groups may be both effective and cost-effective (Speight 2006). However, many individuals with risk factors may not attend the dentist (or the physician) and are therefore not amenable to an opportunistic approach (Netuveli 2006; Yusof 2006). A review of the literature on screening models for OPMDs and OSCC has identified a huge potential for new research directions in this area (Warnakulasuriya 2021).
In this systematic review we have identified screening tests for OPMD and OSCC to evaluate the diagnostic accuracy of the COE and other index tests, used alone or in combination, in asymptomatic adults. The index tests proposed for evaluation in this review are suitable for use in a general dental practitioner's office as part of a dental examination, or in an organized community screening event. The proposed index tests cannot confirm whether a 'positive' finding is indeed an OSCC or dysplastic OPMD before deciding on referral to secondary care; biopsy with histopathology is currently the only confirmatory method of diagnosing OSCC or dysplasia.
This diagnostic test accuracy review complements a number of intervention reviews undertaken by Cochrane Oral Health on the treatment of oral and oropharynx cancers (Bulsara 2018; Furness 2011; Glenny 2010) and oral leukoplakia (Lodi 2016), and on screening programmes for the early detection and prevention of OSCC (Brocklehurst 2013). This review was originally published in 2013 as clinical assessment to screen for the detection of oral cavity cancer and oral potentially malignant disorders in apparently healthy adults (Walsh 2013). In this updated Cochrane Review we have included contemporary studies irrespective of publication language and status, and assessed the body of evidence using GRADE (Schünemann 2020; Schünemann 2020a) to facilitate the production of summary of findings tables.

O B J E C T I V E S
To estimate the diagnostic test accuracy of index tests (conventional oral examination (COE), vital rinsing, light-based detection, mouth self-examination (MSE), remote screening, and biomarkers), used singly or in combination, for the early detection of oral potentially malignant disorders (OPMD) or oral squamous cell carcinoma (OSCC) in apparently healthy adults.

Types of studies
Eligible study designs were cross-sectional studies (or prospective consecutive series) and randomised controlled trials (RCTs) of test accuracy. Where randomised or paired comparative designs were available these were included in the review and analysed separately. We excluded case series and diagnostic case-control studies, which have been shown to lead to inflated estimates of prevalence and test accuracy (Whiting 2004). We also excluded studies reported in abstract form alone, uncontrolled reports, and randomised controlled trials of the effectiveness of screening programmes (intervention studies). Only studies reporting test accuracy data in the form of a 2 x 2 table or where a 2 x 2 table could be constructed from the information in the study report were included.

Participants
Apparently healthy adults not reporting symptoms of oral potentially malignant disorders (OPMD) or oral squamous cell carcinoma (OSCC), attending an organised screening or surveillance programme, or screened during attendance at a dental or physician examination. We did not exclude specific subgroups of patients in this review, such as high-risk cohorts or surveillance cohorts.

Target conditions
Following the consensus views of the expert working group of the World Health Organization (WHO) Collaborating Centre for Oral Cancer/Precancer (Warnakulasuriya 2007; Warnakulasuriya 2021a), the following OPMDs and malignancies were considered as constituting a diseased classification: OSCC; and OPMD, which represent a heterogeneous group of conditions including leukoplakia, erythroplakia, proliferative verrucous leukoplakia, oral lichen planus/oral lichenoid lesions, oral submucous fibrosis, and actinic keratosis (Warnakulasuriya 2007; Warnakulasuriya 2020). Where studies evaluated COE by someone other than a dentist or physician, or mouth self-examination, the target condition of the index test was typically expressed as the presence or absence of an oral lesion.

Reference standards
The reference standard was examination and clinical evaluation by a clinician with specialist knowledge or training, working to the current diagnostic guidelines of their locality. At the most experienced level, this would be an oral and maxillofacial pathologist or oral medicine specialist, possibly utilising biopsy with histology where clinically appropriate. More commonly, this was expected to include general dentists in receipt of supplementary training in the detection and identification of OPMDs and OSCCs. We included studies where confirmation of individuals who were screened as negative by the index test was obtained from extended follow-up. To be eligible for inclusion in the review, at least a proportion of the screened negatives were required to be verified. For each study we noted the diagnostic protocol, guidelines or registry used for follow-up in the Characteristics of included studies table. Studies with confirmatory biopsy of individuals who were screened as negative by the index test were eligible for inclusion although ethically questionable (Downer 2004).
Where a histopathological reference standard was employed this review classified any level of dysplasia (mild, moderate, or severe) as disease-positive.

Searching other resources
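The following trial registries were searched for ongoing studies: the US National Institutes of Health Trials Registry (ClinicalTrials.gov) and the World Health Organization International Clinical Trials Registry Platform.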

Selection of studies
Two review authors independently assessed the titles and abstracts of all articles identified from the searches. Full-text reports were obtained for those appearing to meet the inclusion criteria, or where a clear decision could not be made from scanning the title and abstract alone. Where disagreements occurred, these were resolved by discussion with the review team.

Data extraction and management
Two review authors independently extracted data using a piloted data collection form. Discrepancies were resolved through discussion with the review team. Study authors were contacted to obtain relevant missing data if these were not available in the printed report.
From each study, we extracted the following data.

Assessment of methodological quality
We used the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool (Whiting 2011) to assess the quality of the included studies over four key domains: patient selection, index test, reference standard, and flow and timing of participants through the study. The QUADAS-2 tool was tailored specifically for this review (Additional Table 2). Review specific guidance was used to facilitate documentation of the pertinent descriptive information contained in the studies. Two core signalling questions were removed: 'Was a case-control design avoided?' (this study design was excluded from the review); and 'Did all patients receive a reference standard?' (this was a criterion for inclusion). Two additional signalling items relating to commercial funding and multiple index tests were added to the core signalling questions.
Responses to the signalling questions, risk of bias, and applicability judgements are presented in the Characteristics of included studies tables and summarised graphically.

Statistical analysis and data synthesis
Data for the true-positive, true-negative, false-positive, and false-negative values for each test in each study were entered into Review Manager (Review Manager 2020). Estimates of diagnostic accuracy were expressed as sensitivity and specificity with 95% confidence intervals (CI) for each study and for each available data point if there were multiple index tests or lesions reported within a single study. Study estimates of sensitivity and specificity were plotted on coupled forest plots and in receiver operating characteristic (ROC) space.
Where studies directly evaluated the comparative accuracy of more than one index test with the reference standard, i.e. randomising individuals to different index tests, we planned to report the results of these studies separately.
For the primary analysis we had intended to undertake a meta-analysis to combine the results of the studies for each index test. However, the substantial diversity of characteristics of the included studies meant that this was not appropriate.
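As an illustration of the per-study estimates described above, the following minimal Python sketch derives sensitivity and specificity with approximate 95% confidence intervals from a 2 x 2 table. It uses the Wilson score interval, which is an assumption made here for illustration and is not necessarily the method implemented in Review Manager; the function names and the counts are hypothetical and are not taken from any included study.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (approximate 95% CI when z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (estimate, lower, upper) for sensitivity and specificity."""
    sens = tp / (tp + fn)  # proportion of diseased who screen positive
    spec = tn / (tn + fp)  # proportion of disease-free who screen negative
    return {
        "sensitivity": (sens, *wilson_interval(tp, tp + fn)),
        "specificity": (spec, *wilson_interval(tn, tn + fp)),
    }


# Hypothetical 2 x 2 counts, for illustration only.
result = accuracy_from_2x2(tp=45, fp=30, fn=5, tn=920)
for name, (est, lower, upper) in result.items():
    print(f"{name}: {est:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With few diseased participants (a small `tp + fn`), the interval for sensitivity widens markedly, which is the pattern seen in low-prevalence screening samples.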

Investigations of heterogeneity
We planned to explore possible sources of heterogeneity through meta-regression including the following covariates:
• characteristics of the study sample (prevalence of OSCC or OPMD in the study (> 50% prevalence), inclusion of human papillomavirus (HPV) + adults, tobacco users/high alcohol consumption);
• target condition (OSCC alone or OSCC and potentially malignant disorders (PMD));
• aspects of study design (prospective organised or opportunistic);
• type of reference standard (examination and clinical evaluation by physician with specialist knowledge or extended follow-up) and operator (dentist, physician, or other healthcare workers).

Sensitivity analyses
No sensitivity analyses were planned.

Assessment of reporting bias
Tests for reporting bias were not conducted because current tests are misleading when applied to systematic reviews of diagnostic test accuracy (Leeflang 2008).

Summary of findings and assessment of the certainty of the evidence
We reported our results for the different index tests following GRADE methods (Schünemann 2020; Schünemann 2020a), and using the GRADEPro online tool (www.guidelinedevelopment.org).
To enhance readability and understanding, we planned to present test accuracy results in natural frequencies to indicate numbers of false positives and false negatives. We assessed the certainty of the body of evidence with reference to the overall risk of bias of the included studies, the indirectness of the evidence, the inconsistency of the results, and the imprecision of the estimates. We categorised the certainty of the body of evidence as high, moderate, low, or very low.
One study is ongoing, one previously ongoing study has not yet reported results and is awaiting classification, and one study is awaiting classification pending further details of the study design from the authors. Eight studies were excluded (Figure 1).

Methodological quality of included studies
The assessment of methodological quality is presented graphically in Figure 2 and summarised by index test in Figure 3. The accuracy of detecting oral potentially malignant disorders (OPMDs) and oral squamous cell carcinoma (OSCC) was evaluated in a variety of different settings. In Tokoname, Japan, all residents of 60 years of age were invited by mail to attend a dental screening programme at a health centre (Ikeda 1995). In Kerala, India, basic healthcare workers incorporated screening into their routine house visits (Mathew 1997; Mehta 1986) as in Sri Lanka (Warnakulasuriya 1990; Warnakulasuriya 1991). In the UK, the feasibility and accuracy of workplace screening was evaluated in one study (Downer 1995), of screening patients at a medical practice in another (Jullien 1995a), and opportunistically in patients attending a dental hospital for an outpatient appointment (Jullien 1995). In Taiwan, screening was offered to individuals attending a tertiary referral centre (Chang 2011). In the USA, screening was part of the routine surveillance visit of patients attending an otolaryngology clinic (Sweeny 2011).
Risk of bias for the patient selection domain was low for all studies with one exception (Jullien 1995). This study was judged as unclear as the method of patient selection for this opportunistic screening study was not reported. Two studies were judged to be of low concern for applicability.
We assessed the risk of bias for the index test domain as low in nine studies. The index test was carried out prior to the reference standard and a positivity threshold for the target condition was specified a priori. One study (Sweeny 2011) was judged to be at unclear risk of bias as there was a lack of clear definition of the target condition and the positivity threshold. All studies were judged to be at low concern regarding applicability.
We judged four studies (Downer 1995; Ikeda 1995; Jullien 1995; Jullien 1995a) to be at low risk of bias for the reference standard domain. In these studies the reference standard was carried out by experienced specialist physicians and the results were interpreted without knowledge of the results of the index tests. For the remaining studies it was unclear whether the reference standard personnel were unaware of the results of the index test when interpreting the reference standard. One study (Sweeny 2011) was judged to be at unclear concern regarding applicability as the target definition was recurrence of head and neck cancer; all other studies were judged as low concern.
For the flow and timing domain, two studies were judged to be at high risk of bias as a result of attrition following positive screen (37.5% of screen positive) and differential verification (Chang 2011) and time from screen positive to receiving reference standard (Warnakulasuriya 1990). Three studies (Chang 2011; Mehta 1986; Sweeny 2011) were judged as having high overall concerns regarding applicability, arising from patient selection of high-risk groups. Two studies (Jullien 1995; Jullien 1995a) were judged as having low overall concerns regarding applicability. For the remaining five studies an unclear concern regarding applicability in the patient selection domain resulted in an overall applicability judgement of unclear (Downer 1995; Ikeda 1995; Mathew 1997; Warnakulasuriya 1990; Warnakulasuriya 1991).
There was high concern for applicability in one study that recruited and evaluated participants with Fanconi anaemia (which carries an increased risk of oral cancer) in a hospital setting (Furquim 2014), and in another where participants were identified and invited to participate based on their physician-assessed risk of oral cancer (smokers aged 45 years or older) (Scott 2010).

Mouth self-examination
We gave a judgement of unclear risk of bias to three studies for the index test domain: in two studies it was not reported whether the results of the index test were interpreted without knowledge of the reference test (Elango 2011; Scott 2010), and in one study there was insufficient information on the target condition and threshold to ascertain whether a pre-specified threshold was used (Furquim 2014). We gave a judgement of high concern regarding applicability for this domain to one study (Furquim 2014) where the mouth self-examination was undertaken without instruction, and low concern for the remaining three studies.
The risk of bias judgement for the reference standard domain was low for one study (Scott 2010), being evaluated by a dentist with specialist training and the reference test being carried out prior to the index test. We judged three studies to be at unclear risk of bias (Elango 2011; Furquim 2014; Ghani 2019) as there was a lack of information as to whether the reference standard was interpreted without knowledge of the index test. We judged three studies to be of low concern for the reference standard (Furquim 2014; Ghani 2019; Scott 2010) and one study that used general health workers specifically trained for the study to be of unclear concern (Elango 2011). The manuscript states that "the competence of the health workers [reference standard] was confirmed by a trained oral cancer specialist" but details of this were not reported. It is reasonable to assume that the implicit threshold for disease of the trained health workers would differ from that of an experienced oral medicine specialist.
Risk of bias was judged to be low for the flow and timing domain (Furquim 2014; Ghani 2019; Scott 2010) where there was (or could be assumed to be) an appropriate time interval between the index test and reference standard, all patients received the same reference standard, and all patients were included in the analysis.
There was a significant number of withdrawals and exclusions for non-compliance in one study (Elango 2011) which we judged to be at high risk of bias for this domain.
The overall risk of bias for the studies that evaluated mouth self-examination was judged to be unclear (Furquim 2014; Ghani 2019; Scott 2010) and high (Elango 2011). Concern regarding the overall applicability of the studies to the review question was high for two studies (Furquim 2014; Scott 2010), unclear for one study (Elango 2011), and low for the remaining study (Ghani 2019).

Conventional oral examination compared to conventional oral examination plus vital rinsing (toluidine blue)
No new eligible studies evaluating screening using conventional oral examination were included in this update.
One study (Su 2010) that directly compared two index tests in a randomised controlled trial was judged to be at low risk of bias for patient selection and index test domains. We judged the trial to be of high concern regarding applicability for the patient selection domain as individuals who "lacked oral habits" such as smoking or betel quid chewing were ineligible for the trial.
We judged the study to be at unclear risk of bias for the reference standard domain, as it was unclear whether the reference standard was interpreted without knowledge of the results of the index tests. There was low concern regarding applicability of the reference standard. Risk of bias for the flow and timing domain was judged as low.
Overall risk of bias for this study was judged as unclear, based on the interpretation of the reference standard. Concern regarding the overall applicability of the study was high, arising from patient selection.

Remote screening (mobile applications)
Three studies, new to this review, evaluated remote screening in India (Birur 2019; Vinayagamoorthy 2019) and Brazil (Gomes 2017). One study was an organised screening programme in a workplace setting (Birur 2019) and two studies (Gomes 2017; Vinayagamoorthy 2019) were smaller feasibility or pilot studies that focused on the use of the technology.
Risk of bias for the patient selection domain was judged to be unclear for two studies that stated that a convenience sample was selected but gave limited information on the methods for selecting participants (Gomes 2017; Vinayagamoorthy 2019), and low for the organised screening programme study (Birur 2019). There was high concern for applicability in two studies where smokers made up a large majority or the total sample, where the participants were all male, or where most participants were over 60 years of age (Birur 2019; Gomes 2017). We judged the applicability of the patient selection domain as unclear where there was little detail on the characteristics of the convenience sample (Vinayagamoorthy 2019).
We gave a judgement of unclear risk of bias to two studies for the index test domain as the threshold for the target condition was not explicitly reported (Gomes 2017; Vinayagamoorthy 2019), and a low risk of bias for one study (Birur 2019). We judged all studies to be at low concern for applicability for the index test domain.
The risk of bias judgements for the reference standard were low for all three studies as the reference standard personnel were typically oral medicine specialists, and we judged all studies to be at low concern for applicability.

Cochrane Database of Systematic Reviews
Risk of bias was judged to be low for the flow and timing domain (Birur 2019; Vinayagamoorthy 2019) where there was (or could be assumed to be) an appropriate time interval between the index test and reference standard, and unclear where this was not explicitly stated or could not be assumed (Ghani 2019). All participants received the same reference standard; any exclusion from analysis was minimal and related to poor quality of images (Birur 2019; Vinayagamoorthy 2019).
The overall risk of bias was judged to be low (Birur 2019) and unclear (Gomes 2017; Vinayagamoorthy 2019). Concern regarding the overall applicability of the studies to the review question was high for two studies (Birur 2019; Gomes 2017) on account of patient selection and unclear for one study (Vinayagamoorthy 2019).

Conventional oral examination/visual inspection
No new studies of conventional oral examination/visual inspection were included in this update. For the three studies with higher prevalence (Mathew 1997; Warnakulasuriya 1990; Warnakulasuriya 1991) sensitivity estimates ranged from 0.94 to 0.97, and specificities ranged from 0.75 to 0.98 (Figure 4; Figure 5). For many of the studies the sensitivity estimates were imprecise, reflective of the low disease prevalence in the samples.

Figure 5. Summary receiver operating characteristic (ROC) plot of 1. Conventional oral examination.
In the three studies with higher prevalence, prevalence ranged from 10.3% to 50.9%. Due to differences in region, setting, nature of the index test, and reference standard we elected not to pool the studies.
A summary is given in the Summary of findings 1. We judged the overall certainty of the evidence to be low, and downgraded for inconsistency and indirectness.
0.33 (95% CI 0.10 to 0.65), specificity 0.54 (95% CI 0.37 to 0.69), respectively) (Figure 6; Figure 7).
A summary is given in the Summary of findings 2. We judged the overall certainty of the evidence to be very low, and downgraded for indirectness, inconsistency, and imprecision.

Conventional oral examination compared to conventional oral examination plus vital rinsing (toluidine blue)
We included one randomised controlled trial which directly compared the performance of conventional oral examination (COE) alone (3895 individuals) with COE plus vital staining (4080 individuals) with biopsy and long-term follow-up through a National Cancer Registry (Su 2010).
When we considered the trial arms independently, the estimates of sensitivity and specificity for the target condition of oral cancer in the trial arm of COE alone were 0.50 (95% CI 0.12 to 0.88) and 0.92 (95% CI 0.91 to 0.93) with a prevalence of 0.15%; the corresponding sensitivity and specificity values for the COE with vital rinsing adjunct were 0.40 (95% CI 0.05 to 0.85) and 0.91 (95% CI 0.90 to 0.91) with a prevalence of 0.12%.
A summary is given in the Summary of findings 3. We judged the certainty of the evidence as moderate, and downgraded one level due to indirectness in patient selection.

Remote screening (mobile applications)
Three studies (

Figure 9. Summary receiver operating characteristic (ROC) plot of 3. Remote screening (mobile app).
Due to differences in region, setting, and lack of information on case definition in some studies we elected not to pool the studies.
A summary is given in the Summary of findings 4. We judged the certainty of the evidence as very low, and downgraded two levels due to indirectness (applicability of the study sample) and for inconsistency.

Summary of main results
Eighteen studies were identified for inclusion evaluating the diagnostic accuracy of conventional oral examination (COE)/visual inspection, mouth self-examination, vital staining, and remote screening with mobile applications. The studies were diverse in nature with substantial variations in sample prognostic risk factors, nature of the screening test, the clinical specialty of personnel conducting the index test, verification of screened-negative and screened-positive individuals, exclusion of individuals from the analysis, and large variation in incidence of disease (including registry-based studies) across included studies. Consequently, the decision was taken that a meta-analysis of the included studies by index test was inappropriate. This is in contrast to some previously published systematic reviews (Downer 2004; Moles 2002).
Taken as a body of evidence, the overall quality of the studies was variable both within and between index tests with only one study (Jullien 1995a) of COE being judged as overall low risk of bias and overall low concern regarding applicability (Figure 2). Many of the studies did not fully report on the characteristics and risk factors of the study sample, which precluded us from assessing the applicability of the results to a general screening population.
In eight studies the participants could be considered as 'high-risk' individuals and consequently their findings elicit high concern judgements for the applicability of participant sample to the review question.
Prevalence of oral potentially malignant disorders (OPMDs) or oral squamous cell carcinoma (OSCC) in the test accuracy study samples ranged from 1.4% to 59% over the different index tests. Estimates should be interpreted with respect to the diagnostic test accuracy study prevalence levels. A low prevalence of the target condition effectively results in a lower sample size for diseased participants and for the calculation of sensitivity.
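A rough sketch of this effect is given below. It assumes a hypothetical sample of 2000 screened individuals and a hypothetical true sensitivity of 0.80, and uses a simple normal (Wald) approximation for the half-width of the 95% confidence interval; only the two prevalence values (1.4% and 59%) are taken from the range reported above, and the function name is illustrative.

```python
from math import sqrt


def sensitivity_precision(sample_size, prevalence, sensitivity, z=1.96):
    """Expected number of diseased participants and the approximate
    half-width of the 95% CI for sensitivity (Wald approximation)."""
    diseased = round(sample_size * prevalence)
    if diseased == 0:
        raise ValueError("no diseased participants expected in this sample")
    # Sensitivity is estimated only from the diseased participants,
    # so its precision depends on their number, not on the total sample.
    half_width = z * sqrt(sensitivity * (1 - sensitivity) / diseased)
    return diseased, half_width


# Hypothetical sample size and sensitivity; prevalences span the reported range.
for prevalence in (0.014, 0.59):
    diseased, half_width = sensitivity_precision(2000, prevalence, 0.80)
    print(f"prevalence {prevalence:.1%}: {diseased} diseased, "
          f"sensitivity 0.80 +/- {half_width:.3f}")
```

Under these assumptions the same total sample yields far fewer diseased participants at low prevalence, and therefore a much wider interval around the sensitivity estimate.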
For COE/visual inspection, sensitivity estimates were highly variable for study level prevalence analogous to those in the population, and ranged from 0.50 (95% confidence interval (CI) 0.07 to 0.93) to 0.99 (95% CI 0.97 to 1.00) for the largest study (Summary of findings 1). Lower specificity values were observed in the two studies where the disease prevalence was higher than would normally be observed (20% and 50%) in the general population, and can be explained at least in part by the higher prevalence.
In the within-study, between-person study of COE plus vital staining versus COE alone, estimates of sensitivity were slightly higher for COE alone, but specificities were similar across the trial arms (Summary of findings 3). Remote screening shows promise in terms of performance, but the estimates were imprecise in two of the three studies as these were pilot/feasibility studies on very small samples (Summary of findings 4).
Index tests at a prevalence reported in the population (between 1% and 5%) were better at correctly classifying the absence of OPMD or OSCC in disease-free individuals than classifying the presence in diseased individuals. A false-negative result from a screening programme would mean that the individuals with OPMD or OSCC would not be referred for further investigations; a false-positive result would mean a number of individuals without OPMD or OSCC would receive a positive-screening result, and would typically be referred for further investigations, possibly resulting in further excisional investigations for the patient. Whereas the false-positive results could and would no doubt have financial and other resource implications following inappropriate referral, the false-negative results indicate that people with OPMD or OSCC will be missed, possibly to be diagnosed at a later date when the disease becomes more advanced.
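The consequences described above can be expressed as natural frequencies. The minimal sketch below converts a prevalence, sensitivity, and specificity into expected counts per 1000 people screened. The prevalence values of 1% and 5% reflect the population range quoted above; the sensitivity of 0.70 and specificity of 0.90 are illustrative assumptions rather than pooled estimates from this review, and the function name is hypothetical.

```python
def natural_frequencies(prevalence, sensitivity, specificity, cohort=1000):
    """Expected screening outcomes per `cohort` people, rounded to whole persons."""
    diseased = cohort * prevalence
    disease_free = cohort - diseased
    true_pos = diseased * sensitivity
    false_neg = diseased - true_pos        # cases missed by the screen
    true_neg = disease_free * specificity
    false_pos = disease_free - true_neg    # unnecessary referrals
    return {
        "true_positives": round(true_pos),
        "false_negatives": round(false_neg),
        "false_positives": round(false_pos),
        "true_negatives": round(true_neg),
    }


# Illustrative accuracy values; prevalences span the quoted population range.
for prevalence in (0.01, 0.05):
    counts = natural_frequencies(prevalence, sensitivity=0.70, specificity=0.90)
    print(f"prevalence {prevalence:.0%}: {counts}")
```

Under these assumed values, false positives outnumber true positives at both prevalence levels, which illustrates why the referral burden and the missed cases need to be considered together.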
For this update we were able to provide judgement of the certainty of the evidence using a GRADE assessment. We judged the certainty of the evidence to be moderate for the within-study, between-person randomised controlled trial, low for COE/visual inspection, and very low for remote screening and mouth self-examination.

Strengths and weaknesses of the review
The utility of this review is limited in part by the number of included studies. A number of potentially eligible studies of sizeable organised screening programmes were excluded on the basis that the screened-negative individuals were not confirmed by a reference standard, or the results of the reference standard for the screened-negative individuals were not reported. Consequently, the number of false negatives could not be determined. In large screening programmes establishing a reference standard for all screened-negative individuals may not be possible. In such instances researchers could consider the possibility of a random subset of disease-free individuals to receive the index test.
We took the decision to exclude case-control or 'two-gate' accuracy studies, where two (or more) sets of eligibility criteria are used to recruit participants, owing to the potential for overestimation of diagnostic accuracy estimates with this design. However, this has meant that the index tests evaluated in this review do not include those based on newer technologies. We would anticipate that those index tests showing promise at this present time would be further evaluated with a more robust study design and therefore be eligible for inclusion in future updates of this review.
Following on from previous systematic reviews in this area (e.g. Downer 2004), a further five test accuracy studies have been identified and included in this review, along with one ongoing study and one study awaiting classification as the results have yet to be reported. A key strength of this review is the inclusion of studies that evaluated a range of index tests. With this update we have included studies that evaluated remote screening, as well as additional studies for mouth self-examination, along with existing studies that evaluated conventional oral examination usually by dental professionals and visual inspection by other healthcare professionals. Whilst the diverse nature of the studies within the different categories of index tests precluded pooling of the studies, the reader is provided with an overview of the body of evidence, including the methodological quality, of different tests to screen for OPMDs and malignancies. Simultaneous consideration of accuracy estimates along with methodological strengths and weaknesses is essential in making appropriate inferences from the primary studies.
Due to the substantial diversity in the nature of the included studies and the characteristics of the participants it was not appropriate to pool the data, even within each category of index test. Whilst this is not a weakness of the review, the failure to provide summary estimates of sensitivity and specificity, in contrast to previous systematic reviews, could be regarded as a limitation. The range of accuracy estimates observed in this review is reflective of the considerable clinical and methodological heterogeneity across the included studies. In future updates should more homogeneous studies be included in the review, it would be informative to evaluate the influence of risk factors on estimates of diagnostic accuracy. However, we acknowledge that there was a lack of reported detail in a number of the included studies regarding the presence or absence of important risk factors such as smoking, betel quid chewing, and alcohol consumption.
The methods of recruitment and eligibility criteria differed widely across the included studies. The World Health Organization defines screening as "the application of a test or tests to people who are apparently free from the disease in question in order to distinguish between those that have the disease from those who probably do not" (Wilson 1968). A difficulty with a number of the included studies was determining how representative the screened population was, given the settings for recruitment such as company headquarters, hospital outpatient departments, and tertiary treatment centres. It could be argued that the latter sample represents a distinct population with a much higher risk of developing new disease and one where clinicians are likely to encounter disease with a higher index of suspicion.
Prevalence of the included studies was in line with what would be expected; Napier 2008 argues that most authorities agree that this lies between 1% and 5%. However, the sample prevalence was particularly high in two studies of COE (Mathew 1997 10.3%, Ikeda 1995 9.7%) where a larger proportion of the population consumed tobacco (and engaged in other risk factors), and one study of mouth self-examination (Scott 2010 22.6%). In two studies of COE (Warnakulasuriya 1990; Warnakulasuriya 1991) the sample prevalence calculated from the 2 x 2 tables evaluating the test accuracy was particularly high at 21.6% and 50.9%. The screened-positive prevalence for these studies was more in line with population prevalence at 4.2% and 6.2%.
The use of cancer registries or other registries as a reference standard (e.g. Chang 2011; Su 2010) can be methodologically problematic, particularly if there is a mismatch in the target condition being evaluated and the outcome documented in the registry. For example, cancer registries are unlikely to hold data on OPMDs that have not undergone malignant transformation, inducing a disconnect in the target condition being detected by the index test and the outcome recorded in the registry. Differential verification bias can occur if screened-positive participants receive biopsy as a reference standard whilst the screened-negative participants are assessed through a national cancer registry alone. If there is potential for malignant transformation within the duration of follow-up then follow-up through registry could be appropriate. Careful thought should be given to the target condition of the index and reference standard and whether this information will be adequately recorded in the registry.

Applicability of findings to the review question
Only three studies were judged to be at overall low concern for applicability across the three domains of patient selection, index test, and reference standard. Concerns regarding applicability arose from targeted patient selection of high-risk groups for the patient selection domain, where participants had either a previous history of head and neck cancer or other medical conditions that put them at increased risk compared to that in the general population, or were older, typically male, and tobacco smokers. For example, participants in one study conducted in a tertiary care clinic (Chang 2011) were all males; and another study recruited former head and neck cancer patients undergoing routine surveillance visits (Sweeny 2011). One study recruited participants with Fanconi anaemia (Furquim 2014), a condition where there is a significantly higher incidence of head and neck squamous cell carcinoma compared with that observed in the general population (Kutler 2003). Studies with unclear concerns in this domain were those that had omitted important information on patient or study characteristics, which meant that we were unable to determine whether the participants and settings matched the review question. There was low concern regarding applicability for the index test domain for most studies. An unclear judgement for applicability of the reference standard was given to one study where six people had been identified from the target population to act as the reference standard (Elango 2011). Although exposed to training, it is questionable whether trained lay people could act as a reference standard, and there was some concern that the index test and reference test may have been conducted simultaneously for those who had not responded initially. A second study (Sweeny 2011) was also judged to be at unclear applicability on this domain. There was low concern regarding applicability for the remaining studies in this domain.

Implications for practice
There are known clinical and methodological difficulties associated with screening for oral potentially malignant disorders (OPMDs) and oral squamous cell carcinoma (OSCC) that include relatively low incidence rates, the reluctance of screened-positive individuals to attend for follow-up, a lack of linear transition between premalignant and malignant states (Reibel 2003), disagreement over disease management (Warnakulasuriya 2009), and the relative cost-effectiveness of mass, selective, and opportunistic screening programmes (Brocklehurst 2011).
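One reason why low incidence complicates screening is that, for a fixed sensitivity and specificity, the positive predictive value falls sharply as prevalence falls. The following is a minimal sketch of that relationship using Bayes' theorem; the sensitivity (0.80), specificity (0.95), and prevalence values are illustrative assumptions and are not taken from any included study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test accuracy applied across hypothetical prevalences.
for prevalence in (0.001, 0.01, 0.05, 0.10):
    ppv, npv = predictive_values(0.80, 0.95, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.4f}")
```

Under these assumed values, most screen-positive results at low prevalence would be false positives, which is relevant to the follow-up burden noted above.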
The lack of any formal registry for reporting OPMDs, in contrast to malignancy, makes it challenging to estimate possible reductions in mortality due to a screening programme aimed at precursor lesions. A recent population-based cohort study using electronic medical records has followed patients with oral leukoplakia and estimated the short- and long-term progression to OSCC (Chaturvedi 2020). Moreover, the efficacy of the early management of OPMDs is controversial: even if lesions are surgically removed, the risk of malignant change may remain since the lesion represents only a small area in a field of damaged mucosa, any part of which may progress to malignancy (Holmstrup 2007; Holmstrup 2009).
The results of this review suggest that using the conventional oral examination (COE) or visual inspection for screening for OPMD and OSCC has a variable degree of sensitivity (greater than 0.70 in six of the 10 studies), and a consistently high value for specificity (greater than 0.90 in eight studies). However, there was considerable clinical heterogeneity in the study participants, the application of the index test and reference standard, and the flow and timing of the process. Exploring the primary studies for sources of heterogeneity has not shown any single factor to consistently influence the accuracy of the screening test. Further, even though the evidence of accuracy is not consistently strong, there is some evidence (Cheung 2021) that implementing COE as a component of a population screening programme can reduce mortality and produce stage shift in a high-risk population. Should similar findings be replicated in other studies then it could be argued that explicit evaluation of COE accuracy per se would no longer be necessary, given the positive outcomes on mortality. Emphasis could instead be placed on the effectiveness of screening programmes, of which COE is a component, in reducing morbidity and mortality. This should be supplemented with information on the consequences of false-negative and false-positive screens.
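For readers less familiar with how the sensitivity and specificity values quoted above are derived, the following minimal sketch computes both from a 2x2 table of index test against reference standard, with Wilson score intervals. The counts are hypothetical and do not correspond to any included study; the choice of the Wilson interval is an assumption made for illustration, not a description of the method used in this review.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (default approx. 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z ** 2 / (4 * total ** 2)
    ) / denom
    return centre - half_width, centre + half_width


def accuracy_summary(tp, fp, fn, tn):
    """Sensitivity and specificity, each with a Wilson interval."""
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical screening study: 40 with the target condition, 960 without.
summary = accuracy_summary(tp=30, fp=48, fn=10, tn=912)
for name, value in summary.items():
    print(name, value)
```

With few diseased participants, as is typical in screening studies, the interval around sensitivity is much wider than that around specificity.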
The potential for vital staining, brush cytology, or light-based devices to be used as an adjunct to the COE in screening to detect OPMDs and malignancies in apparently healthy individuals warrants further investigation (Moyer 2014). In the randomised controlled trial (RCT) of screening strategies, vital staining as an adjunct to COE was compared to COE alone, with a clinically important but not statistically significant difference observed in health outcomes (Su 2010), and therefore the cost-effectiveness of using adjunctive methods over and above the standard COE would need to be justified. The concept of combining technologies to improve test accuracy seems reasonable; however, it is not possible to support the combining of such tests as the data from this review were limited; more studies are needed. Ideally, the role of adjunctive tests is to reduce uncertainty in the diagnostic decision. With some tests this can be achieved by exploring different threshold levels. However, this is not possible with any of these tests as they all dichotomise patients as either diseased or healthy.
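The point about thresholds can be illustrated with a short sketch. For a test that returns a continuous score, moving the positivity threshold trades sensitivity against specificity, whereas a dichotomous test yields a single fixed pair of values. The scores and disease labels below are invented for illustration only and do not represent any test evaluated in this review.

```python
def sens_spec_at_threshold(scores, diseased, threshold):
    """Classify score >= threshold as test positive; return (sens, spec)."""
    pairs = list(zip(scores, diseased))
    tp = sum(1 for s, d in pairs if d and s >= threshold)
    fn = sum(1 for s, d in pairs if d and s < threshold)
    tn = sum(1 for s, d in pairs if not d and s < threshold)
    fp = sum(1 for s, d in pairs if not d and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical continuous test scores and reference standard results.
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
diseased = [False, False, False, True, False, False, True, True, True, True]

for threshold in (0.3, 0.5, 0.7):
    sens, spec = sens_spec_at_threshold(scores, diseased, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A test that only reports "positive" or "negative" offers no such adjustable operating point, which is the limitation described above.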

Implications for research
It is clear that there are some methodological shortcomings in the studies included in this review. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool has provided a robust means of assessing the methodological quality of the included studies. There is now an opportunity to use this framework to ensure that future studies are conducted in a robust manner, with particular attention paid to the design of the study in the four domains of the QUADAS-2 tool. It is imperative that studies are reported with sufficient information to allow judgement of the merits of the study and its applicability to the review being undertaken. Reporting according to the Standards for Reporting Diagnostic accuracy studies (STARD) checklist should facilitate this process. In particular, results have been promising in the workplace setting, and for some opportunistic screening studies.
The population and participant selection should be clearly stated and carried out to reduce the possibility of sampling bias, preferably using a consecutive sample. The study setting is particularly important as, for example, studies undertaken within an academic referral centre are rarely directly applicable to studies in a primary care setting. Only by undertaking studies in different settings with different assessors will we be able to attain a comprehensive picture of the diagnostic test accuracy of different testing mechanisms across different contexts. The index test should be undertaken by trained and calibrated screeners, whose threshold for agreement should be stated a priori.
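Agreement between a screener and a calibrating examiner is commonly summarised with Cohen's kappa, which corrects observed agreement for agreement expected by chance. The sketch below is one possible implementation, given here only as an illustration; the ratings are hypothetical, and any a priori threshold for acceptable agreement would need to be set by the study team.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same participants."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1:
        # Both raters used one identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical calibration exercise on 20 participants:
# 4 agreed positive, 13 agreed negative, 3 disagreements.
screener = ["pos"] * 4 + ["neg"] * 13 + ["pos"] * 2 + ["neg"] * 1
examiner = ["pos"] * 4 + ["neg"] * 13 + ["neg"] * 2 + ["pos"] * 1

print(f"kappa = {cohens_kappa(screener, examiner):.3f}")
```

In this invented example raw agreement is 85%, but kappa is lower because most participants are lesion-free and some agreement on negatives is expected by chance.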
The definition of the target condition as identified by the index test is crucial. Often this is recorded and reported as a 'suspicious lesion.' The term is ambiguous and is interpreted in the context of the assessor's experience. A suspicious lesion for a clinical specialist is one that has a high likelihood of malignancy or a high-grade dysplasia, and most oral medicine specialists are able to make a correct risk stratification on almost any mucosal abnormality. A suspicious lesion for a healthcare worker is any white patch. Remote evaluation of digital images by clinical experts largely overcomes this problem but does require intensive training of the screener to ensure that the images are of sufficient quality. The reference standard should be both accurate and pragmatic to account for the practical considerations involved in establishing the initial diagnostic test accuracy component of large population screening programmes. For such programmes it is not necessary to apply the reference standard to the entire programme's participants, rather an initial evaluation of test accuracy should be established on a sizeable number of participants prior to commencement of the screening programme proper. It is also important to utilise reference standards that capture all the target conditions under question, not just those that are likely to be identified through cancer registries. Finally, the flow and timing of the diagnostic test accuracy study should ensure that the reference standard is undertaken within a short time frame after the index test, given the potential for pre-malignant disorders to undergo malignant transformation, and for it to be applied after the index test to avoid bias being introduced. Where long-term follow-up is used as a reference standard, measures should be taken to minimise attrition. Further research on ways to maximise initial participation rates and also follow-up rates for those who screen positive is warranted.
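Where the reference standard is applied to only part of the screened population, naive accuracy estimates can be biased. One design sometimes used is to verify all screen positives and a random sample of screen negatives; the sketch below shows how inverse-probability weighting could then recover sensitivity and specificity. This is an illustrative assumption about study design, not a method prescribed by this review, and it is valid only if verification depends on nothing other than the index test result. All counts and the 10% sampling fraction are hypothetical.

```python
def weighted_accuracy(tp, fp, fn_sampled, tn_sampled, negative_sampling_fraction):
    """Sensitivity and specificity when all screen positives are verified
    but only a random fraction of screen negatives is verified.

    Verified screen negatives are up-weighted by 1 / sampling fraction.
    """
    if not 0 < negative_sampling_fraction <= 1:
        raise ValueError("sampling fraction must be in (0, 1]")
    weight = 1 / negative_sampling_fraction
    fn = fn_sampled * weight
    tn = tn_sampled * weight
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical programme: 200 screen positives all verified (60 diseased),
# 10% of screen negatives verified (480 examined, 2 found diseased).
naive_sens = 60 / (60 + 2)
naive_spec = 478 / (478 + 140)
sens, spec = weighted_accuracy(
    tp=60, fp=140, fn_sampled=2, tn_sampled=478, negative_sampling_fraction=0.10
)
print(f"naive:    sensitivity={naive_sens:.3f}, specificity={naive_spec:.3f}")
print(f"weighted: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this invented example the unweighted figures overstate sensitivity and understate specificity, because screen negatives are under-represented among verified participants.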

Study characteristics
Patient Sampling Method of patient selection: workplace-based organised screening programme. Quote: "Before screening all employees attended an educative talk on the importance of prevention and early detection of oral cancer and were encouraged to participate in the screening program. All employees were potential participants in the screening"

Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Low concern DOMAIN 2: Index Test (Fluorescence)

DOMAIN 3: Reference Standard
Is the reference standards likely to correctly classify the target condition?

Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?

Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?

Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?
Low concern
DOMAIN 4: Flow and Timing
Cochrane Database of Systematic Reviews
could be obtained." "In order not to confound further analyses, we excluded those with positive lesions/yet no further biopsy during the follow-up period. Although 272 participants were excluded from the final analysis, there was little impact on the power of the statistic analysis due to the large population size"
Characteristics and proportion of individuals who received a reference standard other than examination and clinical evaluation by a specialist physician: quote: "We further cross linked the entire screened cohort with the Taiwan
Prevalence of the target condition on the sample: 17/309 (5.5%)
Flow and timing
Time interval and any interventions between index test(s) and reference standard: immediately following attendance at screening session, quote: "After screening..."
Characteristics and proportion of individuals who did not receive the index test(s) and/or reference standard or excluded from analysis: quote: "A number of staff who were screened will not have been included in the evaluation since they were unable to attend at one of the dedicated sessions and were therefore not examined by the specialist diagnostician." Separate values for those attending the screening and reference standard examination not reported
Characteristics and proportion of individuals who received a reference standard other than examination and clinical evaluation by a specialist physician: none
Comparative
Notes
68.2% proportion of participants at management grade or above. 53% participation rate

Low risk
Are there concerns that the included patients and setting do not match the review question?

DOMAIN 2: Index Test (Conventional oral examination)
Were the index test results interpreted without knowledge of the results of the reference standard?

Yes
If a threshold was used, was it prespecified?

Yes Was conflict of interest avoided? Yes
Where multiple index tests were used, were the results of the second index test interpreted without knowledge of the results of the first index test?

Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Low concern DOMAIN 2: Index Test (Mouth self-examination)

Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?
Low concern

DOMAIN 4: Flow and Timing
Was there an appropriate interval between index test and reference standard?
Yes Did all patients receive the same reference standard?

Yes
Were all patients included in the analysis?

Could the patient flow have introduced bias?
Low risk

Study characteristics
Patient Sampling
Method of patient selection: quote: "The study population was distributed in two Panchayats (local administrative unit in villages) with 33 subunits. Brochures were sequentially distributed to all the houses in the subunits." After a lapse of 4 weeks "Health workers attempted to locate individuals up to a maximum of three times, incase they were unavailable during the first visit"
Characteristics and proportion of individuals who did not receive the index test(s) and/or reference standard or excluded from analysis: from 48,080 participants initially eligible, 5761 unavailable for examination by reference standard, and a further 7553 "who did not comply with the study procedure were excluded from the study population." Results available for 34,766 participants (28% attrition: 13,314 of 48,080 initially eligible)
Characteristics and proportion of individuals who received a reference standard other than examination and clinical evaluation by a specialist physician: reference standard carried out by a trained health worker
Comparative
Notes
Possible bias introduced through exclusion of participants that did not comply with the procedure. Participants located in area of high prevalence of oral cancer and potentially malignant lesions

Did the study avoid inappropriate exclusions? Yes

Low risk
Are there concerns that the included patients and setting do not match the review question?
Low concern DOMAIN 2: Index Test (Conventional oral examination)

DOMAIN 2: Index Test (Mouth self-examination)
Were the index test results interpreted without knowledge of the results of the reference standard?
Unclear If a threshold was used, was it pre-specified?

Yes
Was conflict of interest avoided?

Yes
Where multiple index tests were used, were the results of the second index test interpreted without knowledge of the results of the first index test?

Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Low concern DOMAIN 2: Index Test (Remote screening)

Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?

DOMAIN 4: Flow and Timing
Was there an appropriate interval between index test and reference standard?
Yes Did all patients receive the same reference standard?

Yes
Were all patients included in the analysis?

No
Could the patient flow have introduced bias?

Description of positive case definition by index test as reported: not explicitly reported, presence or absence of abnormality, quote: "Immediately after MSE, participants were asked to answer questions about the presence and location of oral lesions"

Sequence of tests: index test followed by reference standard
Training or calibration: none provided until after the MSE, quote: "Finally, all participants were taught to perform MSE correctly using verbal and demonstrative instruction with the support of an educational banner and a pamphlet" Blinding of examiners: can be assumed, quote: "Immediately after MSE, participants were asked to answer questions about the presence and location of oral lesions." This was followed by the clinical examination by the oral specialist Flow and timing Time interval and any interventions between index test(s) and reference standard: not explicitly stated but assumed to be at the same appointment Characteristics and proportion of individuals who did not receive the index test(s) and/or reference standard or excluded from analysis: quote: "Three patients were excluded because 1 had oral cancer and was not able to perform the examination properly and 2 others did not complete all the questionnaires." Results available for 44 participants (6% attrition) Characteristics and proportion of individuals who received a reference standard other than examination and clinical evaluation by a specialist physician: reference standard carried out by an oral medicine specialist Where multiple index tests were used, were the results of the second index test interpreted without knowledge of the results of the first index test?

Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Low concern DOMAIN 2: Index Test (Remote screening)

DOMAIN 2: Index Test (Fluorescence) DOMAIN 3: Reference Standard
Is the reference standards likely to correctly classify the target condition?

Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?

Unclear
Could the reference standard, its conduct, or its interpretation have introduced bias?

Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?

DOMAIN 4: Flow and Timing
Was there an appropriate interval between index test and reference standard?

DOMAIN 3: Reference Standard
Is the reference standards likely to correctly classify the target condition?

Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?

Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?

Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?
Low concern

DOMAIN 4: Flow and Timing
Was there an appropriate interval between index test and reference standard?
Unclear Did all patients receive the same reference standard?

Yes
Were all patients included in the analysis? Yes

Could the patient flow have introduced bias?
Unclear risk
Characteristics and proportion of individuals who did not receive the index test(s) and/or reference standard or excluded from analysis: all received index and reference (data fully reported for results of most recent screening exercise only)

Gomes 2017 (Continued)
Characteristics and proportion of individuals who received a reference standard other than examination and clinical evaluation by a specialist physician: screened positive did receive biopsy but data taken from

Could the conduct or interpretation of the index test have introduced bias?
Low risk

Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Low concern DOMAIN 2: Index Test (Mouth self-examination) DOMAIN 2: Index Test (Remote screening)

DOMAIN 2: Index Test (Fluorescence) DOMAIN 3: Reference Standard
Is the reference standards likely to correctly classify the target condition?

Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?

Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?

Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?
Low concern

DOMAIN 4: Flow and Timing
Was there an appropriate interval between index test and reference standard? Description of positive case definition by index test as reported: quotes: "A lesion was defined as positive when a white patch, red patch, or an ulcer of longer than two weeks duration was detected." "The screeners were also instructed to include lesions of lupus erythematosus, submucous fibrosis or actinic keratosis as positive." All types of lichen planus were also regarded as positive Sequence of tests: index followed by reference Training or calibration: quote: "..screeners advised of diagnostic criteria which should result in a positive or negative screen ...no formal training or standardisation was undertaken"

Blinding of examiners: index test completed before reference
Conflict of interests: supported by grant from the Department of Health, UK
Target condition and reference standard(s)
Reference standard: visual examination by second dental specialist who was able to refer subjects for further tests or review as appropriate (single specialist)
Description of positive case definition by reference test as reported: as for index test. Quotes: "A lesion was defined as positive when a white patch, red patch, or an ulcer of longer than two weeks duration was detected." "The screeners were also instructed to include lesions of lupus erythematosus, submucous fibrosis or actinic keratosis as positive." All types of lichen planus were also regarded as positive
Training or calibration: not stated but quoted as "a specialist." Single examiner so no calibration
Blinding of examiners: index test completed before reference. Quotes: "The results were also recorded on a standard form which was collated with the screeners' form only after completion." "All subjects were examined by a specialist who provided an independent definitive diagnosis"
Prevalence of the target condition on the sample: 32/1042 (3.1%)
Flow and timing
Time interval and any interventions between index test(s) and reference standard: not explicit, however, reasonable to assume both conducted on same visit
Characteristics and proportion of individuals who did not receive the index test(s) and/or reference standard or excluded from analysis: none
Characteristics and proportion of individuals who received a reference standard other than examination and clinical evaluation by a specialist physician: none
Comparative
Notes
Participant characteristics reported for Jullien 1995 and Jullien 1995a together

Item
Authors' judgement Risk of bias Applicability concerns

DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?
Unclear Did the study avoid inappropriate exclusions?

Unclear risk
Are there concerns that the included patients and setting do not match the review question?
Low concern DOMAIN 2: Index Test (Conventional oral examination) Were the index test results interpreted without knowledge of the results of the reference standard?

Yes
If a threshold was used, was it prespecified?
Yes
Jullien 1995 (Continued)

Was conflict of interest avoided? Yes
Where multiple index tests were used, were the results of the second index test interpreted without knowledge of the results of the first index test?
Could the conduct or interpretation of the index test have introduced bias?

Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Low concern DOMAIN 2: Index Test (Mouth self-examination) DOMAIN 2: Index Test (Remote screening)

DOMAIN 2: Index Test (Fluorescence) DOMAIN 3: Reference Standard
Is the reference standards likely to correctly classify the target condition?

Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?

Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?

Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?
Low concern

DOMAIN 4: Flow and Timing
Was there an appropriate interval between index test and reference standard?
Yes Did all patients receive the same reference standard?

Yes
Were all patients included in the analysis?

Yes
Could the patient flow have introduced bias?

Low risk
Jullien 1995 (Continued)
Description of positive case definition by index test as reported: quotes: "A lesion was defined as positive when a white patch, red patch, or an ulcer of longer than two weeks duration was detected." "The screeners were also instructed to include lesions of lupus erythematosus, submucous fibrosis or actinic keratosis as positive." All types of lichen planus were also regarded as positive
Reference standard: visual examination by second dental specialist who was able to refer subjects for further tests or review as appropriate (single specialist)
Description of positive case definition by reference test as reported: as for index test. Quotes: "A lesion was defined as positive when a white patch, red patch, or an ulcer of longer than two weeks duration was detected." "The screeners were also instructed to include lesions of lupus erythematosus, submucous fibrosis or actinic keratosis as positive." All types of lichen planus were also regarded as positive
Training or calibration: not stated but quoted as "a specialist". Single examiner so no calibration
completion." "All subjects were examined by a specialist who provided an independent definitive diagnosis"
Prevalence of the target condition on the sample: 22/985 (2.2%)
Flow and timing
Time interval and any interventions between index test(s) and reference standard: not explicit, however, reasonable to assume both conducted on same visit
Characteristics and proportion of individuals who did not receive the index test(s) and/or reference standard or excluded from analysis: none
Characteristics and proportion of individuals who received a reference standard other than examination and clinical evaluation by a specialist physician: none
Comparative
Notes
Participant characteristics reported for Jullien 1995 and Jullien 1995a together

Item Authors' judgement Risk of bias Applicability concerns DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?
Yes Did the study avoid inappropriate exclusions?

Low risk
Are there concerns that the included patients and setting do not match the review question?
Low concern DOMAIN 2: Index Test (Conventional oral examination) Were the index test results interpreted without knowledge of the results of the reference standard?

Yes
If a threshold was used, was it prespecified?

Was conflict of interest avoided? Yes
Where multiple index tests were used, were the results of the second index test interpreted without knowledge of the results of the first index test?

Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Low concern DOMAIN 2: Index Test (Mouth self-examination) DOMAIN 2: Index Test (Remote screening) DOMAIN 2: Index Test (Fluorescence)

DOMAIN 3: Reference Standard
Is the reference standards likely to correctly classify the target condition?

Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?

Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?

Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?
Low concern

DOMAIN 4: Flow and Timing
Was there an appropriate interval between index test and reference standard?
Yes Did all patients receive the same reference standard?

Yes
Were all patients included in the analysis?

Yes
Could the patient flow have introduced bias?

Study characteristics
Patient Sampling Method of patient selection: re-examination of 2069 eligible participants from the 9000 participants recruited in January to May 1996, shortly after commencement of the study. Quote: "Subjects were selected by choosing densely inhabited areas to allow re-examination of as many subjects as possible in two weeks." Study looking at the reproducibility and validity of oral visual inspection by health workers within a randomised controlled intervention trial of visual screening Sequence of tests: initial screen by health worker followed by second screen (the index test) by same health worker (1 to 6 months later) to establish reliability. 2069 received the index test (second screen by health worker) and this formed the sample for the sensitivity and specificity calculations Training or calibration: quote: "Training sessions spread over 6 weeks composed of lectures, practical demonstrations and field work conducted by Faculty... At the end of training sessions written and practical tests were conducted identifying the best health workers.... They were also given manuals and photographic documentation to identify different types of oral lesions." The "best performing" health workers were retained for the study Reference standard: visual examination by a specialist physician (decision made by single physician, 1 of 3). Quote: "....comparison with pathological findings is not possible as biopsy has not been performed for most case. Biopsy is performed for cases of nodular leukoplakias, erythroplakias and suspicious growths only, and this is currently being undertaken" Training or calibration: 100 participants formed the basis of comparability of findings evaluation. Kappa value of 0.85 was reported for the findings of the 3 physicians Blinding of examiners: reference test undertaken immediately after index test. 
Both health worker and specialist in participants' home at the same visit Prevalence of the target condition on the sample: 212/2069 10.3% Flow and timing Time interval and any interventions between index test(s) and reference standard: quote: "This was immediately followed by an independent examination of the same subject by one of three physicians" Characteristics and proportion of individuals who received a reference standard other than examination and clinical evaluation by a specialist physician: none Comparative Notes

Item
Authors' judgement Risk of bias Applicability concerns

DOMAIN 1: Patient Selection
Was a consecutive or random sample of patients enrolled?
Yes Did the study avoid inappropriate exclusions?

Low risk
Are there concerns that the included patients and setting do not match the review question?

DOMAIN 2: Index Test (Conventional oral examination)
Were the index test results interpreted without knowledge of the results of the reference standard?

Yes
If a threshold was used, was it prespecified?

Yes
Was conflict of interest avoided? Yes Where multiple index tests were used, were the results of the second index test interpreted without knowledge of the results of the first index test?

Low risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Low concern DOMAIN 2: Index Test (Mouth self-examination)

DOMAIN 3: Reference Standard
Is the reference standards likely to correctly classify the target condition?

Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?

Unclear
Could the reference standard, its conduct, or its interpretation have introduced bias?

Unclear risk
Are there concerns that the target condition as defined by the reference standard does not match the question?
Low concern

DOMAIN 4: Flow and Timing
Was there an appropriate interval between index test and reference standard?
Yes Did all patients receive the same reference standard?

Yes
Were all patients included in the analysis?

Yes
Could the patient flow have introduced bias?

Study characteristics
Patient Sampling Method of patient selection: for the screening study, a basic health worker visited each household to report on health status in an area of high oral cancer prevalence. Quote: "Four adjacent blocks, two as study area I (pop 218728) and two as study area II (pop 250,399) were selected for this investigation." Field checking of the diagnosis of the health worker by the study dentist was initiated after 6 months and completed for 40 health workers. For each of the health workers' lists "A house with a lesion case was selected as a nodal point and all the available individuals from nearby houses who figured in the list were examined." Carried out on high risk individuals within a household "..i.e. people aged 35 years and above with tobacco habits"

Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?

DOMAIN 3: Reference Standard
Is the reference standards likely to correctly classify the target condition?

Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?

Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?

Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?
Low concern DOMAIN 4: Flow and Timing

Unclear risk
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Low concern DOMAIN 2: Index Test (Fluorescence)

DOMAIN 3: Reference Standard
Is the reference standards likely to correctly classify the target condition?

Yes
Were the reference standard results interpreted without knowledge of the results of the index tests?

Yes
Could the reference standard, its conduct, or its interpretation have introduced bias?

Low risk
Are there concerns that the target condition as defined by the reference standard does not match the question?
Low concern
Flow and timing
Time interval and any interventions between index test(s) and reference standard: re-examination of "660 cases who arrived at the referral centre within 18 months (January 1981 to June 1982) after case detection." Quote: "...negative cases randomly selected from PHC files.. were re-examined, during the three month period of initial PHC examinations"
Characteristics and proportion of individuals who did not receive the index test(s) and/or reference standard or excluded from analysis: 87,277 adults were eligible for the screening programme of whom 29,295 were screened. Quotes: "All referred (screened positive) participants who arrived at the referral centre were re-examined by the project dentist to validate the PHC diagnosis." "A sample of negative cases was randomly selected from PHC files (in whom PHC workers had not recorded a lesion) were re-examined, during the three month period of initial examination.
Biopsy was performed for high-risk cases (clinical suspicion of OPMD or OSCC, and in lesions detected by fluorescence visualization but not by conventional oral examination), and the presence of epithelial dysplasia or malignancy was assessed
Flow and timing
Quote: "Patients who had any oral mucosa lesion, either by COE or by FV, were referred to a specialist in oral diagnosis and oral pathology at a second level healthcare center. This professional conducted the final diagnosis process applicable for each case"
Comparative -

Notes
Awaiting clarification from the authors of reference standard for the screened negative participants

ADDITIONAL TABLES

Test: Conventional oral examination (COE)
Characteristics: A standard visual and tactile examination of the oral mucosa under normal (incandescent) light
Classification of response: The presence of an oral mucosal abnormality is classified as a positive test result; the absence of any oral mucosal abnormalities is classified as a negative test result
Traditionally been used as an oral cancer screen, but its utility is debated (Lingen 2008)
Advantages: quick and easy once trained, minimally invasive
Disadvantages: oral mucosal abnormalities are not necessarily clinically or biologically malignant; only a small percentage of leukoplakias are progressive or become malignant; COE cannot distinguish between those that are or are not; some pre-cancerous lesions may exist within oral mucosa that appears clinically normal by COE alone (Lingen 2008)

Test: Vital rinsing (e.g. toluidine blue, tolonium chloride)
Characteristics: Vital rinsing refers to the use of dyes such as toluidine blue or tolonium chloride to stain oral mucosa tissues for PMD or malignancy (Lestón 2010; Lingen 2008; Patton 2008). The procedure is as follows:
Classification of response: The result of the test is classified as positive if tissue is stained and negative if no tissue is stained, or equivocal if no definitive result can be obtained
Advantages: ability to define areas that could be malignant or abnormal but cannot be seen; assess the extent of the PMD for excision
Disadvantages: benign inflammatory lesions subject to stain; failure of some cancerous lesions to stain; variation in test performance depending on how thorough the test procedures