Data Mining - Mehmed Kantardzic [86]
One example deployment was to the South Carolina Heart Center where the goal was to identify among 61,027 patients those at risk of Sudden Cardiac Death (SCD). Patients who have previously suffered a myocardial infarction (MI; heart attack) are at the highest risk of SCD. In 1997 a study was performed on the efficacy of implantable cardioverter defribillators (ICDs). It was found that patients, with a prior MI and low ventricular function, had their 20-month mortality rate drop from 19.8% to 14.2%. This implantation is now a standard recommendation. Previous to the REMIND system one had two options to find who would require the implantation of an ICD. The first option is to manually review the records of all patients to identify those who were eligible for an ICD. This would be extremely time-consuming considering the large number of records. The other approach would be to evaluate the need for an ICD during regular checkups. However, not all patients come in for regular checkups and there would be a high chance that not every patient would be carefully considered for the need of an ICD. The REMIND system was given access to billing and demographics databases and transcribed free text including histories, physical reports, physician progress notes, and lab reports. From these data the REMIND system processed all records on a single laptop in 5 h and found 383 patients who qualified for an ICD.
To check the validity of the 383 found patients, 383 randomly chosen patients were mixed with the 383 found previously. Then 150 patients were chosen from the 766 patient samples. An electrophysiologist manually reviewed the 150 patients being blinded to the selection made by the REMIND system. The REMIND system concurred with the manual analysis in 94% (141/150) of the patients. The sensitivity was 99% (69/70) and the specificity was 90% (72/80). Thus it was shown that the REMIND system could fairly accurately identify at-risk patients in a large database. An expert was required to verify the results of the system. Additionally, all of the patients found would be reviewed by a physician before implantation would occur.
From the previous cases we see that a great deal of time was required from experts to prepare data for mining, and careful analysis of a model application needed to take place after deployment. Although applied data-mining techniques (neural and Bayesian networks) will be explained in the following chapters, the emphasis of these stories is on complexity of a data-mining process, and especially deployment