Derivation and validation of the CFracture competing risk fracture prediction tool compared with QFracture in older people and people with comorbidity

UK guidelines recommend the QFracture tool to predict the risk of major osteoporotic fracture and hip fracture, but QFracture calibration is poor, partly because it does not account for competing mortality risk. The aim of this study was to derive and validate a competing risk model to predict major osteoporotic fracture and hip fracture (CFracture) and compare its performance with that of QFracture in UK primary care. Methods We used UK linked primary care data from the Clinical Practice Research Datalink GOLD database to identify people aged 30–99 years, split into derivation and validation cohorts. In the derivation cohort, we derived models (CFracture) using the same covariates as QFracture with Fine-Gray competing risk modelling, and included the Charlson Comorbidity Index score as an additional predictor of non-fracture death. In a separate validation cohort, we examined discrimination (using Harrell’s C-statistic) and calibration of CFracture compared with QFracture. Reclassification analysis examined differences in the characteristics of patients reclassified as higher risk by CFracture but not by QFracture.


Introduction
Fragility fractures are a major cause of morbidity and mortality. The disease burden associated with fragility fractures has increased globally and is largest in India, China, the USA, Japan, and Germany. 1 In England and Wales, over 2 million people have osteoporosis, with one in three women and one in five men estimated to experience a fracture during their lifetime. 2 Osteoporosis treatment options include antiresorptive agents (eg, bisphosphonates, denosumab, raloxifene, and hormone replacement therapy) and anabolic agents (eg, teriparatide and romosozumab). These treatments are often recommended by international guidelines on the basis of clinical, bone density, or risk prediction stratification. In the UK, risk prediction models are recommended to calculate the 10-year risk of major osteoporotic fractures (ie, wrist, proximal humerus, vertebral, or hip fractures) or hip fracture alone to guide decisions about investigation of bone mineral density and initiation of preventive treatment. 2 The QFracture and FRAX 3,4 tools are recommended by the UK National Institute for Health and Care Excellence (NICE) for risk stratification in all women older than 65 years and men older than 75 years, or in individuals older than 50 years with additional risk factors. For example, measurement of bone mineral density is recommended for people with a 10-year fracture risk of 10% or greater, although local pathways might vary. In contrast, the UK National Osteoporosis Guideline Group recommends FRAX, and outside the UK other models such as the Garvan Fracture Risk Calculator are used. 5,6 Unlike FRAX, QFracture does not account for competing mortality risk, but in contrast with QFracture, the FRAX prediction equation has never been published.
Cox models are commonly used to estimate how predictors affect the hazard of an outcome in survival analysis. Censored data (due to loss to follow-up) are common in such analyses. Cox models and the Kaplan-Meier estimate of risk assume that individuals lost to follow-up have the same fracture risk (or other predicted outcome) as those who remain in follow-up. 8,9 These types of events (ie, deaths from other causes) are referred to as competing risk events. Failure to account for competing risk events leads to systematic overprediction of risk using standard Cox regression models, although the effect of this overprediction depends on how frequently competing risk events occur. 10 Prediction tools that do not account for competing mortality risk will overestimate fracture risk in older people and those with multimorbidity, meaning that some patients might be unnecessarily recommended for treatments that come with a risk of harm and treatment disutility. Despite having excellent discrimination in the population as a whole, QFracture has recently been shown to systematically overpredict fracture risk in older and comorbid people with competing mortality risks, while simultaneously underpredicting fracture risk in younger and healthier people due to insufficient ascertainment of fracture events in the original derivation study. 11 The aim of this study was to derive and internally validate a prediction tool for major osteoporotic fracture and hip fracture that accounts for competing mortality risk, and to compare the new model's performance with that of QFracture.
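The overprediction described above can be illustrated with a minimal Python sketch on a made-up cohort of ten people (event code 1 = fracture, 2 = non-fracture death, 0 = censored). The function names and the data are hypothetical and are not taken from the study; the sketch only contrasts one minus the Kaplan-Meier estimate, which treats deaths as censored, with the Aalen-Johansen cumulative incidence, which does not.

```python
def _counts_at(times, events, t):
    """Return (fractures, deaths, total leaving the risk set) at time t."""
    d1 = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
    d2 = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 2)
    leaving = sum(1 for ti in times if ti == t)
    return d1, d2, leaving


def one_minus_km(times, events, horizon):
    """1 - Kaplan-Meier for fracture, treating non-fracture death as censoring."""
    surv = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        d1, _, leaving = _counts_at(times, events, t)
        surv *= 1 - d1 / at_risk
        at_risk -= leaving
    return 1 - surv


def aalen_johansen(times, events, horizon):
    """Cumulative incidence of fracture, with death as a competing event."""
    event_free = 1.0  # probability of no fracture and no death just before t
    cif = 0.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        d1, d2, leaving = _counts_at(times, events, t)
        cif += event_free * d1 / at_risk
        event_free *= 1 - (d1 + d2) / at_risk
        at_risk -= leaving
    return cif


# Hypothetical toy cohort: follow-up time in years and event code per person.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_events = [2, 1, 2, 1, 2, 1, 2, 0, 0, 0]

km_risk = one_minus_km(toy_times, toy_events, horizon=10)
aj_risk = aalen_johansen(toy_times, toy_events, horizon=10)
print(f"1 - KM: {km_risk:.3f}  Aalen-Johansen: {aj_risk:.3f}")
```

In this toy cohort the Kaplan-Meier-based estimate is about 0.39, whereas the Aalen-Johansen estimate is 0.30, which equals the simple proportion of people who fractured (three of ten) because nobody is censored before the last event.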

Data source and population
We performed a cohort study using data from patients in the Clinical Practice Research Datalink (CPRD) GOLD database. 12,13 CPRD-GOLD contains electronic health records from primary care, including data on health conditions, prescriptions, laboratory measurements (taken in primary care), and lifestyle values, with linked data for UK hospitalisation and death registration, and is broadly representative of the UK population. Eligible patients had to be permanently registered with a general practice contributing up-to-standard data for at least 1 year; have linkage to Hospital Episode Statistics and Office for National Statistics mortality data; be aged 30-99 years; and have observable records on or after Jan 1, 2004. Cohort exit was the earliest of: first fracture outcome event; non-fracture death; deregistration from the general practice; date of the last data collection from the practice; or March 31, 2016. The study was approved by the Medicines and Healthcare products Regulatory Agency's Independent Scientific Advisory Committee for database studies (reference ISAC 16/248) and was therefore exempt from ethical approval and patient consent.

Outcomes
The outcomes of interest were major osteoporotic fracture and hip fracture, as in QFracture. 3 Major osteoporotic fracture was defined as hip, vertebral, wrist, or proximal

Research in context
Evidence before this study
Decisions to start long-term medication to prevent fracture events are guided by estimation of fracture risk, with investigation or treatment offered if patients exceed a particular risk threshold. UK National Institute for Health and Care Excellence guidelines recommend the QFracture risk prediction tool to inform clinical decisions, but QFracture has been shown to underpredict fracture risk, particularly in young people, and to overpredict fracture risk in older adults and individuals with multimorbidity. We searched PubMed from inception to June 16, 2022, for articles in English, using the search strategy [fracture[Title/Abstract] AND (predict*[Title/Abstract]) AND ('competing risk'[Title/Abstract]) AND (osteoporosis[Title/Abstract]) AND (mortality[Title/Abstract])]. We identified seven articles, of which four performed a competing risk analysis in the context of fracture prediction. All studies predicted fracture risk in subpopulations only. Two studies were conducted in older people (≥60 years): one quantified the effects of the predictors on the transition risks to fracture and mortality over 5 years, whereas the other quantified residual lifetime fracture risk. One study estimated fracture risk in people with type 2 diabetes, finding that failing to account for competing risk mortality overestimates fracture risk. Lastly, a competing risk model was used to predict fracture risk in post-menopausal women only, showing good discrimination and calibration.

Added value of this study
This study shows that accounting for competing mortality risk and modelling a predictor of mortality leads to more accurate (better calibrated) prediction of major osteoporotic fracture and hip fracture risk than QFracture. This new model, CFracture, also reclassifies patients with different characteristics, leading to a lower estimated number needed to treat to prevent a fracture, depending on different levels of predicted risk and fracture type.

Implications of all the available evidence
CFracture has better calibration than QFracture and might be more accurate in older people and in those with multimorbidity. Prediction might be improved using CFracture or an alternative competing risk model instead of QFracture. Prediction models aimed at older people and those with multimorbidity should consider accounting for competing risk.

Prediction model
Variables included in the QFracture and CFracture models are outlined in the appendix (p 3). In contrast to QFracture derivation and internal validation, which used data on BMI, alcohol, and smoking status recorded after the date of cohort entry but before any fracture outcome in prediction, we restricted predictor values to those recorded before cohort entry.

Comorbidity
For each patient at baseline, a Charlson Comorbidity Index (CCI) score was additionally calculated on the basis of Read Codes. 14 CCI category (defined as 0, 1, 2, or ≥3) was included in the competing risk model as a predictor of competing mortality risk.

Missing data
Missing data were managed as detailed in the appendix (p 4). Individuals with missing ethnicity were assumed to be White (as for QFracture derivation). For missing BMI, smoking status, and alcohol status, multivariate imputation by chained equations 15 was used to generate five imputed datasets that were combined using Rubin's rules. Morbidities and prescription medicines used for prediction were assumed to be absent if not recorded (similarly to QFracture's derivation). 3
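As an illustration of how estimates from the five imputed datasets can be combined, the following minimal Python sketch applies Rubin's rules to a single coefficient. The function name and the numbers are hypothetical (they are not study estimates); in practice the pooled quantity could be, for example, a log subdistribution hazard ratio fitted separately in each imputed dataset.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)            # average of the m point estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficient estimates and variances from five imputed datasets.
toy_estimates = [0.50, 0.52, 0.48, 0.51, 0.49]
toy_variances = [0.01, 0.01, 0.01, 0.01, 0.01]

pooled_estimate, total_variance = pool_rubin(toy_estimates, toy_variances)
print(f"pooled estimate: {pooled_estimate:.3f}, total variance: {total_variance:.5f}")
```

The total variance exceeds the average within-imputation variance whenever the estimates differ between imputed datasets, which is how the uncertainty due to missing data is carried through to the pooled result.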

Statistical analysis
No formal power calculation was done because the study size was determined by the data available in CPRD, which was greater than in QFracture derivation. We implemented the published QFracture-2012 risk model (under GNU Lesser General Public Licence, version 3) and calculated QFracture-predicted 10-year risk of a major osteoporotic fracture and hip fracture, using coefficients and baseline hazard based on QFracture's original derivation cohort. Patients were randomly allocated to a fixed derivation and test dataset in a 2:1 ratio, with the split balanced in terms of age and outcome status. The derivation dataset was used to derive CFracture, a Fine-Gray model to predict the 10-year risk of a major osteoporotic fracture or hip fracture event accounting for the competing risk of non-fracture death. Separate models were estimated for men and women. Fine-Gray models calculate the subdistribution hazard ratio (ie, the instantaneous risk of a fracture event in individuals who have not yet experienced a fracture event), simultaneously accounting for the occurrence of non-fracture death. 16 Because we wished to explicitly compare prediction in a model accounting for competing risk versus QFracture, we included the same main effects and age interactions as in QFracture, using the same fractional polynomials for age and BMI. However, in CFracture, we also accounted for non-fracture death as a second (competing) outcome and included the CCI score as a validated predictor of mortality. 17 These models allow the cumulative incidence function or probability of a fracture outcome occurring over time to be directly estimated. The proportional hazards assumption was assessed by plots of Schoenfeld residuals against time for each predictor, and by fitting and testing time-dependent terms; no evidence was found to reject this assumption.
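For orientation, the general form of a Fine-Gray model can be written as follows. This is the standard textbook formulation rather than the fitted CFracture equations, and the symbols (the covariate vector x, coefficients β, and baseline subdistribution hazard λ₁₀) are notation assumed here for illustration.

```latex
\begin{align}
\lambda_{1}(t \mid x) &= \lambda_{10}(t)\,\exp\!\left(\beta^{\top} x\right), \\
F_{1}(t \mid x) &= 1 - \exp\!\left\{-\Lambda_{10}(t)\,\exp\!\left(\beta^{\top} x\right)\right\},
\qquad \Lambda_{10}(t) = \int_{0}^{t} \lambda_{10}(s)\,\mathrm{d}s .
\end{align}
```

Here λ₁(t | x) is the subdistribution hazard of fracture and F₁(t | x) is the cumulative incidence function, so under this formulation the predicted 10-year fracture risk for a person with covariates x would be F₁(10 | x).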

The performance of CFracture was compared with QFracture in the independent validation dataset by examining the new model's calibration (ie, how closely the predicted and observed probabilities agree) and discrimination (ie, the ability to differentiate those who experience the outcome during the study from those who do not, expressed through Harrell's C-statistic). A C-statistic of 0·5 indicates discrimination that is no better than chance, whereas a C-statistic of 1 indicates perfect discrimination. 18 Calibration was evaluated by plotting the observed versus predicted risk for CFracture and QFracture. Observed risk was estimated using the Aalen-Johansen estimator, which accounts for competing mortality risk. 16,17 Plots were generated separately by sex, for all patients and for prespecified subgroups of age and CCI on the basis of summary statistics pooled across the imputed datasets. We also calculated a quantitative summary measure of calibration using the ratio of observed 10-year risk to mean predicted risk, which assesses how close the predicted risk is to the overall observed outcome proportion. A ratio of 1 indicates perfect calibration; a ratio of less than 1 indicates that, on average, the model overpredicts; and a ratio of more than 1 indicates that, on average, the model underpredicts. Discrimination and calibration were additionally measured using a complete case analysis in the validation cohort.
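The two summary measures described above can be sketched in a few lines of Python. This is a minimal illustration on made-up data with hypothetical function names; it assumes simple right censoring for the C-statistic (how competing events were handled when computing discrimination is not detailed in this section) and takes the observed 10-year risk as a given input rather than estimating it.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C-statistic for right-censored data.

    events[i] is 1 if the outcome was observed for person i, 0 if censored.
    A pair (i, j) is comparable when i had the outcome before j's follow-up ended.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0      # earlier event had the higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5      # tied predictions count as half
    return concordant / comparable


def calibration_ratio(observed_risk, predicted_risks):
    """Ratio of observed risk to mean predicted risk (<1 means overprediction)."""
    mean_predicted = sum(predicted_risks) / len(predicted_risks)
    return observed_risk / mean_predicted


# Hypothetical data: follow-up years, outcome indicator, and predicted 10-year risk.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.6, 0.7, 0.1]

c_stat = harrell_c(toy_times, toy_events, toy_scores)
oe_ratio = calibration_ratio(0.08, [0.05, 0.10, 0.15])
print(f"C-statistic: {c_stat:.2f}, observed/predicted ratio: {oe_ratio:.2f}")
```

In this toy example four of the five comparable pairs are concordant, giving a C-statistic of 0.8, and the observed-to-predicted ratio of 0.8 would indicate that the hypothetical model overpredicts on average.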
We also examined changes in patients reclassified either side of several potential thresholds of 10-year fracture risk, namely 5%, 10%, and 20%. To visualise reclassification, we generated Sankey diagrams and scatterplots of risk predicted by CFracture versus QFracture. We examined the characteristics of reclassified patients and estimated the number needed to treat (NNT) to prevent one new major osteoporotic fracture or hip fracture, assuming that all people recommended for treatment take an osteoporosis treatment, using a hypothetical relative risk reduction of 20% and 40% for new fracture events. The NNT was calculated as:
Analyses were conducted using R version 4.2.0. Specific R packages and versions included mice 3.14.0, survival 3.
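The threshold-based reclassification and NNT estimation described above can be sketched as follows. Because the study's own NNT equation is not reproduced in this text, the sketch assumes the conventional definition (NNT = 1 / absolute risk reduction, with absolute risk reduction equal to the observed risk among treated patients multiplied by the relative risk reduction); the function names and all numbers are hypothetical.

```python
def nnt(observed_risk, relative_risk_reduction):
    """Number needed to treat under the conventional definition (an assumption here)."""
    absolute_risk_reduction = observed_risk * relative_risk_reduction
    return 1 / absolute_risk_reduction


def reclassify(q_risks, c_risks, threshold):
    """Count people moved across a threshold by the second model relative to the first."""
    counts = {"up": 0, "down": 0, "unchanged": 0}
    for q, c in zip(q_risks, c_risks):
        q_high = q >= threshold
        c_high = c >= threshold
        if c_high and not q_high:
            counts["up"] += 1          # below threshold on first model, at or above on second
        elif q_high and not c_high:
            counts["down"] += 1        # at or above on first model, below on second
        else:
            counts["unchanged"] += 1
    return counts


# Hypothetical predicted 10-year risks from two models for four people.
toy_q = [0.08, 0.12, 0.25, 0.04]
toy_c = [0.11, 0.09, 0.22, 0.03]

toy_counts = reclassify(toy_q, toy_c, threshold=0.10)
toy_nnt = nnt(observed_risk=0.25, relative_risk_reduction=0.20)
print(toy_counts, f"NNT: {toy_nnt:.1f}")
```

Under this definition, a hypothetical observed 10-year risk of 25% among people at or above a threshold and a 20% relative risk reduction would correspond to an NNT of 20; a higher observed risk in the treated group gives a lower NNT.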

Role of the funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.

Results
The derivation cohort included 1 831 606 women and 1 789 820 men, and the validation cohort included 915 803 women and 894 910 men. Baseline characteristics were similarly distributed in the derivation and validation cohorts (table 1). The number of women with missing data ranged from 441 510 (16·1%) for smoking status to 1 278 931 (46·6%) for ethnicity, and the number of men with missing data ranged from 705 402 (26·3%) for smoking status to 1 494 450 (55·7%) for ethnicity. […] (table 2). Discrimination of CFracture varied by age group and CCI stratum in women, and was best in the younger women (30-64 years age group) and in the least comorbid groups (CCI 0-2) and worst in the oldest women (age groups >64 years) and in the most comorbid groups (CCI ≥3). For hip fracture, discrimination in men generally followed a similar pattern to that in women, being best in younger men (30-64 years age group) and in the least comorbid groups (CCI 0-2), and worst in the oldest men (85-99 years age group) and in the most comorbid groups (CCI ≥3). However, for major osteoporotic fracture, discrimination of CFracture and QFracture in men was poor for all age groups, with some improvement in higher-comorbidity groups (CCI ≥1).
Calibration of CFracture for major osteoporotic fracture was better than QFracture in the whole population cohort, with some underprediction in women at higher levels of predicted risk.Calibration of CFracture for major osteoporotic fracture in men was similar, but additionally showed some overprediction in the middle range of predicted risk (figure 1).The ratio of observed 10-year risk to mean predicted risk in both men and women was closer to 1 with CFracture than with QFracture for prediction of major osteoporotic fracture (appendix p 14). Calibration of CFracture by sex and age group is shown in figure 2, and calibration by CCI stratification is shown in figure 3.
Calibration of CFracture for hip fracture was better than that of QFracture in the whole population cohort. Calibration was good in women across all levels of predicted risk, and good in men except in the highest decile of predicted risk, where there was underprediction (figure 1). The ratio of observed 10-year risk to mean predicted risk in women was closer to 1 with CFracture than with QFracture for prediction of hip fracture, but was similar in men (appendix p 14). Calibration was good in women aged 30-64 and 75-84 years, with some overprediction in women aged 65-74 years (appendix p 15). In men, calibration was reasonable in those aged 30-64, 65-74, and 75-84 years, with some underprediction at the highest level of predicted risk. In both men and women aged 85-99 years, calibration was poor, with overprediction at most levels of predicted risk, although it was considerably better than QFracture in this age group as well as all others. Stratified by CCI, calibration in women was good except in the highest decile of predicted risk, where there was some overprediction in those with a CCI of 1, 2, or 3 or higher (appendix p 16). Calibration in men stratified by CCI was good apart from underprediction in the highest decile of predicted risk for a CCI of 0 or 1, underprediction in the middle of the predicted risk range for a CCI of 2, and some overprediction in the highest decile of predicted risk for a CCI of 3 or higher.
The proportion of people reclassified by CFracture compared with QFracture above and below each risk threshold examined is shown in the appendix (pp 17-20). For major osteoporotic fracture in women, net reclassification up occurred for those with a QFracture predicted risk below 15%, and net reclassification down occurred for those at or above 15%. For major osteoporotic fracture in men, net reclassification down occurred for those with a QFracture predicted risk at or above 10%, with little movement at lower risk. For hip fracture, reclassification down was higher than reclassification up in women with a QFracture predicted risk at or above 10%, and across all levels of QFracture predicted risk in men. Scatterplots of CFracture versus QFracture predicted risk were in keeping with these differences (appendix p 21).
The number of patients classified at or above potential thresholds, number of fractures, and estimated NNT to prevent one fracture are shown in table 3. At a 20% risk threshold, CFracture had a lower estimated NNT than QFracture (eg, major osteoporotic fracture in women: NNT 30·6 [95% CI 28·7-32·7] vs 38·6 [36·5-40·9] with 20% treatment effectiveness). At a 10% risk threshold, CFracture had a similar NNT in women and a lower NNT in men than QFracture. At a 5% threshold, CFracture had a higher estimated NNT for both major osteoporotic fracture and hip fracture (except for hip fracture in men, where CFracture and QFracture produced similar NNT estimates).
Compared with women classified at or above a 10% risk threshold for major osteoporotic fracture by QFracture, women reclassified at or above a 10% risk threshold for major osteoporotic fracture by CFracture were younger and had a higher prevalence of current smoking and previous fracture, but a lower prevalence of dementia, cancer, cardiovascular disease, renal disease, and type 2 diabetes (appendix p 22). This pattern was reproduced in men, apart from current smoking status, which was similar in both groups. The pattern was similar for hip fracture but with smaller differences in these covariates (appendix p 23).
Discrimination and calibration of both QFracture and CFracture in the overall population cohort with complete case analysis were similar to the main results (appendix pp 24-25).

Discussion
Discrimination of CFracture in the whole population validation cohort ranged from good to excellent for major osteoporotic fracture and was excellent for hip fracture, similarly to QFracture discrimination. Worse discrimination occurred with increasing age except for major osteoporotic fracture in men, for whom no obvious relationship with age was noted. Stratified by CCI, no clear pattern with increasing CCI for discrimination of major osteoporotic fracture was noted, whereas discrimination for hip fracture was similar to the whole population cohort for a CCI of 0 and declined with increasing comorbidity (although it was good or excellent in all strata). CFracture was better calibrated than QFracture in the whole population cohort and in every stratum, although it was better calibrated in women than in men, and for hip fracture than for major osteoporotic fracture. CFracture was poorly calibrated in both women and men aged 85-99 years (although less so than QFracture).
The first version of QFracture was externally validated in The Health Improvement Network (a UK database), and the updated version of QFracture was validated in CPRD-GOLD. Both validations showed excellent discrimination and calibration in the whole population cohort. 19,20 However, these evaluated QFracture in the context of incomplete fracture ascertainment because fractures were only identified in primary care records (with or without linked mortality data) and did not include hospitalisations, as reflected in their lower fracture incidence, potentially leading to underprediction of fracture risk. QFracture has not been validated across age groups or in those with comorbidity, in whom we show it has worse calibration than a model accounting for competing mortality risk, consistent with systematic overprediction by QFracture.
Other than FRAX, few tools account for competing mortality risk in the context of fracture prediction. Two studies were conducted in older people (aged ≥60 years): one to quantify the effects of the predictors on the transition risks to fracture and mortality over 5 years, and the other to quantify residual lifetime fracture risk. 21,22 One study estimated fracture risk in patients with type 2 diabetes, showing that failing to account for competing risk mortality overestimates fracture risk over a 5-year time horizon. 23 Another competing risk model used to predict fracture risk showed good discrimination and calibration, but was only derived and validated for use in post-menopausal women. 24
Despite similar discrimination to QFracture, CFracture reclassified different patients around thresholds of predicted risk. Compared with QFracture, CFracture classified more women below a potential 10% risk threshold when predicting major osteoporotic fracture, similar numbers of women when predicting hip fracture, and fewer men for both major osteoporotic fracture and hip fracture. At a potential 20% risk threshold, CFracture also tended to have a lower estimated NNT for a 20% and 40% hypothetical treatment effectiveness. At lower risk thresholds (ie, 5% and 10%), differences tended to be smaller. CFracture recommended treatment in fewer patients with a history of comorbidities associated with death (as seen, for example, with the lower prevalence of […] 9 which could be important if such characteristics are associated with the likelihood of treatment response or adherence to therapy.
Our study was robust and used data from a large, representative population. We included linked data for better fracture outcome ascertainment, which partly accounts for the differences in hip fracture and major osteoporotic fracture incidence seen in previous QFracture derivation and validation studies published in peer-reviewed journals. 3 However, our study has limitations. We only allowed data collected before cohort entry to be used in prediction (because using future data in prediction models can lead to bias), which meant our study has greater amounts of missing data than those in the QFracture derivation. We have provided all codes used to define variables and all model estimates (unlike FRAX, for example, for which the algorithm has never been published), facilitating transparency and reproducibility; however, we could not compare the Read codes we used to define variables with QFracture because QFracture code lists were not publicly available at the time. Although we explicitly accounted for censoring due to death, our model assumes that people who deregister from a general practice have the same fracture risk as those who do not. However, deregistration from a general practice might be informative, eg, if some older people with many risk factors for fracture move into residential care. We are unable to identify subclinical vertebral fractures, which is true for any other fracture model using electronic health records or administrative health data, including QFracture. As has been done with QFracture and elsewhere, we used multiple imputation for these missing data, which relies on the assumption that all data are missing at random. 3 We had a higher proportion of missing data than in QFracture derivation because we avoided including forward-looking values in prediction. Despite this, sensitivity analysis using a complete case analysis showed results that were similar to those from the main analysis. For other variables, we assumed that the condition, or family history, was not present if it was not recorded. We also used a later index date for cohort entry than QFracture, because we wished to better account for improved data recording in electronic health records and because using increasingly historical data to derive clinical prediction tools might result in bias. 25 CFracture was derived and validated in the same dataset (internally validated), whereas QFracture is externally validated, having been derived from a different dataset. Although the performance of QFracture in the derivation dataset and in CPRD-GOLD has been shown to be comparable, further external validation of CFracture is required. 26 Fracture incidence increases with age and QFracture is recommended to be used in people aged 30-84 years. We found that QFracture and CFracture performed poorly in patients aged 85 years and older, in keeping with this guidance. This suggests that other risk factors that are not currently accounted for in this population might exist. The thresholds we used to estimate the NNT were arbitrary. A 10% threshold is the threshold at which intravenous bisphosphonates are considered cost-effective by NICE. 27 The UK National Osteoporosis Guideline Group uses age-dependent thresholds to guide bone mineral density measurement until the age of 70 years, at which point a fixed lower (11·1%) and upper (20·3%) threshold is recommended. 5 We also examined reclassification around a 5% threshold to explore impact if future guidelines were to recommend larger numbers of people for treatment (eg, as drug costs fall, the cost-effectiveness threshold also falls). CPRD-GOLD also does not include all general practices in England, and small numbers of people might not be registered with a general practice. Data collected within electronic health records might also be influenced by patients in poorer health, who are more likely to interact with health professionals.
We reported prediction over a 10-year period to show how accounting for competing mortality risk and including a predictor of non-fracture death improves performance compared with a nationally recommended model for 10-year fracture prediction. Other tools also predict fracture risk using a 10-year period, including FRAX and the Garvan Fracture Risk Calculator. This time period is likely to cover the duration of osteoporosis treatment and potential benefits that might continue if treatment is discontinued. Although shorter time periods have been reported to have limited benefit in aiding risk categorisation, they have been suggested to be considered in older adults. 28,29 NICE recommends using either QFracture or FRAX to assess fracture risk, but recognises that they are designed differently and are not interchangeable. Since its release, FRAX models have been validated in many countries, but independent external validation is challenging at scale because the FRAX equation is not published. 30 FRAX accounts for competing risk and can be used with or without bone mineral density. However, FRAX has been shown to overpredict fracture risk when an incomplete method of fracture ascertainment was used 31 and to underestimate major osteoporotic fracture risk in studies with more complete fracture ascertainment. 32 FRAX has also been criticised for only using binary clinical risk factors that do not account for exposure response. 33 This makes robust head-to-head comparisons between CFracture and FRAX difficult (although they should still be performed). The Garvan calculator has also been used to individualise the risk of fragility fractures over 10 years. 6 Garvan includes risk factors such as history of previous fracture, history of fall during the past 12 months, age, and bone mineral density, but does not account for competing risk. Omnibus models predicting for very wide age ranges will always have good discrimination because age is such a strong predictor, but good discrimination does not mean accurate prediction, as shown by differences in calibration. Therefore, competing risks should be routinely considered if prediction models are aimed at older populations with comorbidity.
In conclusion, QFracture underpredicted fracture risk in young and healthy individuals, and overpredicted fracture risk in older individuals and those with comorbidity. A tool such as CFracture that accounts for competing risk should therefore be considered for clinical use.

Contributors
BG, DRM, and PTD conceived and designed the study and obtained the funding. All authors contributed to study design and interpretation. SJL, MM, CE, and DRM led the data management. SJL led the analysis, supported by BG, DRM, and PTD. SJL and DRM drafted the paper, which all authors revised. SJL, BG, and DRM verified the underlying data. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Declaration of interests
PTD reports a grant from AbbVie, outside the submitted work, and is a member of the NHS Scottish Medicines Consortium. DRM reports grants from the Chief Scientist Office, Health Data Research UK, and National Institute for Health and Care Research (NIHR), outside the submitted work. DRM was a member of the European Medicines Agency Pharmacovigilance Risk Assessment Committee. All other authors declare no competing interests.

Figure 1: Overall calibration of CFracture and QFracture for major osteoporotic fracture and hip fracture
Observed risk is based on the Aalen-Johansen estimator, which accounts for competing mortality risk. Ideal calibration lies on the reference line, below the line is overprediction, and above the line is underprediction.

Figure 2: Calibration of CFracture and QFracture for major osteoporotic fracture by age group
Observed risk is based on the Aalen-Johansen estimator, which accounts for competing mortality risk. Ideal calibration lies on the reference line, below the line is overprediction, and above the line is underprediction.

Figure 3: Calibration of CFracture and QFracture for major osteoporotic fracture by comorbidity group
Observed risk is based on the Aalen-Johansen estimator, which accounts for competing mortality risk. Ideal calibration lies on the reference line, below the line is overprediction, and above the line is underprediction. CCI=Charlson Comorbidity Index.

Table 1: Baseline data in the derivation and validation cohorts
www.thelancet.com/healthy-longevityVol