Improved breast cancer risk models enable targeted screening strategies that achieve earlier detection and less screening harm than existing guidelines. To bring deep learning risk models to clinical practice, we need to further refine their accuracy, validate them across diverse populations, and demonstrate their potential to improve clinical workflows. We developed Mirai, a mammography-based deep learning model designed to predict risk at multiple timepoints, leverage potentially missing risk factor information, and produce predictions that are consistent across mammography machines. Mirai was trained on a large dataset from Massachusetts General Hospital (MGH) in the United States and tested on held-out test sets from MGH, Karolinska University Hospital in Sweden, and Chang Gung Memorial Hospital (CGMH) in Taiwan, obtaining C-indices of 0.76 (95% confidence interval, 0.74 to 0.80), 0.81 (0.79 to 0.82), and 0.79 (0.79 to 0.83), respectively. Mirai obtained significantly higher 5-year ROC AUCs than the Tyrer-Cuzick model (P < 0.001) and prior deep learning models Hybrid DL (P < 0.001) and Image-Only DL (P < 0.001), trained on the same dataset. Mirai more accurately identified high-risk patients than prior methods across all datasets. On the MGH test set, 41.5% (34.4 to 48.5) of patients who would develop cancer within 5 years were identified as high risk, compared with 36.1% (29.1 to 42.9) by Hybrid DL (P = 0.02) and 22.9% (15.9 to 29.6) by the Tyrer-Cuzick model (P < 0.001).
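For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored follow-up data. It is an illustration only: the abstract does not state which C-index variant was used, and the function name, argument names, and toy numbers are assumptions, not study data.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data (illustrative).

    times  : follow-up time for each patient
    events : 1 if cancer was observed at that time, 0 if censored
    risks  : model risk score (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the two times differ and the
        # earlier time is an observed event rather than a censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier event got the higher risk score
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example with made-up values (not from the study).
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
toy_risks = [0.5, 0.9, 0.3, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by time to diagnosis.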
Background
Mammographic density improves the accuracy of breast cancer risk models. However, the use of breast density is limited by subjective assessment, variation across radiologists, and restricted data. A mammography-based deep learning (DL) model may provide more accurate risk prediction.
Purpose
To develop a mammography-based DL breast cancer risk model that is more accurate than established clinical breast cancer risk models.
Materials and Methods
This retrospective study included 88 994 consecutive screening mammograms in 39 571 women between January 1, 2009, and December 31, 2012. For each patient, all examinations were assigned to either training, validation, or test sets, resulting in 71 689, 8554, and 8751 examinations, respectively. Cancer outcomes were obtained through linkage to a regional tumor registry. By using risk factor information from patient questionnaires and electronic medical records review, three models were developed to assess breast cancer risk within 5 years: a risk-factor-based logistic regression model (RF-LR) that used traditional risk factors, a DL model (image-only DL) that used mammograms alone, and a hybrid DL model that used both traditional risk factors and mammograms. Comparisons were made to an established breast cancer risk model that included breast density (Tyrer-Cuzick model, version 8 [TC]). Model performance was compared by using areas under the receiver operating characteristic curve (AUCs) with DeLong test (P < .05).
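The AUC used for these comparisons can be read as the probability that a randomly chosen woman who developed cancer received a higher risk score than a randomly chosen woman who did not. The sketch below computes it by direct pair counting. It does not implement the DeLong test, and the function name and toy values are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels : 1 for cancer within the horizon, 0 otherwise
    scores : model risk scores (higher means higher predicted risk)
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores (not from the study).
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Pair counting is quadratic in the number of examinations; it is shown here for clarity rather than efficiency.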
Results
The test set included 3937 women with a mean age of 56.20 years ± 10.04. Hybrid DL and image-only DL showed AUCs of 0.70 (95% confidence interval [CI]: 0.66, 0.75) and 0.68 (95% CI: 0.64, 0.73), respectively. RF-LR and TC showed AUCs of 0.67 (95% CI: 0.62, 0.72) and 0.62 (95% CI: 0.57, 0.66), respectively. Hybrid DL showed a significantly higher AUC (0.70) than TC (0.62; P < .001) and RF-LR (0.67; P = .01).
Conclusion
Deep learning models that use full-field mammograms yield substantially improved risk discrimination compared with the Tyrer-Cuzick (version 8) model.
Contributors: Constance Lehman, Tal Schuster, Tally Portnoi
Introduction:
Most risk assessment tools assume that the impact of risk factors is linear and cumulative. Using novel machine-learning techniques, we sought to design an interactive, nonlinear risk calculator for Emergency Surgery (ES).
Methods:
All ES patients in the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) 2007 to 2013 database were included (derivation cohort). Optimal Classification Trees (OCT) were leveraged to train machine-learning algorithms to predict postoperative mortality, morbidity, and 18 specific complications (eg, sepsis, surgical site infection). Unlike classic heuristics (eg, logistic regression), OCT is adaptive and reboots itself with each variable, thus accounting for nonlinear interactions among variables. An application [Predictive OpTimal Trees in Emergency Surgery Risk (POTTER)] was then designed as the algorithms’ interactive and user-friendly interface. POTTER performance was measured (c-statistic) using the 2014 ACS-NSQIP database (validation cohort) and compared with the American Society of Anesthesiologists (ASA), Emergency Surgery Score (ESS), and ACS-NSQIP calculators’ performance.
Results:
Based on 382,960 ES patients, comprehensive decision-making algorithms were derived, and POTTER was created where the provider's answer to a question interactively dictates the subsequent question. For any specific patient, the number of questions needed to predict mortality ranged from 4 to 11. The mortality c-statistic was 0.9162, higher than ASA (0.8743), ESS (0.8910), and ACS (0.8975). The morbidity c-statistic was similarly the highest (0.8414).
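The interactive behavior described above, where each answer selects the next question and the number of questions depends on the path taken, can be sketched as a walk down a binary decision tree. This is a hypothetical illustration, not the POTTER implementation: the class names, question labels, and leaf values are placeholders with no clinical meaning.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    risk: float  # placeholder predicted probability, not a clinical value


@dataclass
class Question:
    prompt: str
    yes: Union["Question", Leaf]
    no: Union["Question", Leaf]


def ask_until_leaf(
    node: Union[Question, Leaf], answer: Callable[[str], bool]
) -> Tuple[float, List[str]]:
    """Follow answers down the tree; return the leaf risk and prompts asked."""
    asked: List[str] = []
    while isinstance(node, Question):
        asked.append(node.prompt)
        # The answer to the current question decides which question is next.
        node = node.yes if answer(node.prompt) else node.no
    return node.risk, asked


# Hypothetical toy tree; prompts "A", "B", "C" stand in for real questions.
TOY_TREE = Question(
    "A",
    yes=Question("B", yes=Leaf(0.9), no=Question("C", yes=Leaf(0.5), no=Leaf(0.2))),
    no=Leaf(0.05),
)

short_path = ask_until_leaf(TOY_TREE, {"A": False}.__getitem__)
long_path = ask_until_leaf(TOY_TREE, {"A": True, "B": False, "C": True}.__getitem__)
```

In this toy tree one patient is resolved after a single question and another after three, which mirrors how the number of questions can vary from patient to patient.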
Conclusion:
POTTER is a highly accurate and user-friendly ES risk calculator with the potential to continuously improve accuracy with ongoing machine learning. POTTER might prove useful as a tool for bedside preoperative counseling of ES patients and families.
Contributors: Jack Dunn, George Velmahos, Haytham Kaafarani