Abstract |
Conventional algorithms for modeling clinical events focus on characterizing the differences between patients with varying outcomes in the historical data sets used for model derivation. For many low-prevalence clinical conditions for which only small data sets are available, this approach to developing models is challenging because of the limited number of positive (that is, event) examples available for model training. Here, we investigate how the approach of developing clinical models might be improved across three distinct patient populations (patients with acute coronary syndrome enrolled in the DISPERSE2-TIMI33 and MERLIN-TIMI36 trials, patients undergoing inpatient surgery in the National Surgical Quality Improvement Program registry, and patients undergoing percutaneous coronary intervention in the Blue Cross Blue Shield of Michigan Cardiovascular Consortium registry). For each of these cases, we supplement an incomplete characterization of patient outcomes in the derivation data set (uncensored view of the data) with an additional characterization of the extent to which patients differ from the statistical support of their clinical characteristics (censored view of the data). Our approach exploits the same training data within the derivation cohort in multiple ways to improve the accuracy of prediction. We position this approach within the context of traditional supervised (2-class) and unsupervised (1-class) learning methods and present a 1.5-class approach for clinical decision-making. We describe a 1.5-class support vector machine (SVM) classification algorithm that implements this approach, and report on its performance relative to logistic regression and 2-class SVM classification with cost-sensitive weighting and oversampling. The 1.5-class SVM algorithm improved prediction accuracy relative to other approaches and may have value in predicting clinical events both at the bedside and for risk-adjusted quality of care assessment.
|
Authors | Chih-Chun Chia, Ilan Rubinfeld, Benjamin M Scirica, Sean McMillan, Hitinder S Gurm, Zeeshan Syed |
Journal | Science translational medicine (Sci Transl Med), Vol. 4, Issue 131, Pg. 131ra49 (Apr 25 2012), ISSN: 1946-6242 [Electronic], United States |
PMID | 22539773 (Publication Type: Comparative Study, Journal Article, Research Support, Non-U.S. Gov't)
|
Topics |
- Acute Coronary Syndrome (epidemiology, therapy)
- Algorithms
- Angioplasty, Balloon, Coronary (adverse effects, mortality)
- Artificial Intelligence
- Data Mining (statistics & numerical data)
- Decision Support Techniques
- Discriminant Analysis
- Humans
- Logistic Models
- Models, Statistical
- Myocardial Infarction (epidemiology)
- Postoperative Complications (epidemiology)
- Prevalence
- Quality Improvement (standards)
- Quality Indicators, Health Care (statistics & numerical data)
- Registries
- Reproducibility of Results
- Risk Assessment
- Risk Factors
- Surgical Procedures, Operative (adverse effects)
- Treatment Outcome
- United States
|