Machine learning models for predicting adverse events after percutaneous coronary intervention

The data source

The Japan Cardiovascular Database-Keio Interhospital Cardiovascular Studies (JCD-KiCS) is a large, prospective, multicenter (n = 15) PCI registry that collects clinical data from consecutive patients undergoing PCI in Japan; it was developed in collaboration with the National Cardiovascular Data Registry (NCDR) CathPCI9,10,11. In JCD-KiCS, all PCI procedures were conducted under the direction of the intervention team of each participating hospital according to standard care. Participating hospitals were instructed to record data from consecutive PCIs using an electronic data entry system equipped with a data query engine and validation checks to maintain data quality. Data entry was performed by dedicated clinical research coordinators specifically trained for JCD-KiCS. Data quality was ensured through an automatic validation system and bimonthly standardized education and training for the clinical research coordinators. The principal study coordinator (IU) and thorough on-site audits by the investigator (SK) ensured proper registration of each patient.
The study protocol was in accordance with the principles of the Declaration of Helsinki and was approved by the Ethics Committee of the Faculty of Medicine of Keio University and by the committee of each participating hospital (National Hospital Organization Review Board for Clinical Trials; Eiju General Hospital Ethics Committee; Saiseikai Utsunomiya Hospital Ethics Committee; Research Ethics Committee, Tokyo Central Hospital Saiseikai; Ethics Committee of Japanese Red Cross Ashikaga Hospital; Institutional Review Board of Kawasaki Municipal Hospital; Ethical Review Board of Saitama City Hospital; Institutional Review Board of Isehara Kyodo Hospital; Tokyo Dental College Institutional Review Board of Ichikawa General Hospital; Hiratsuka Municipal Hospital Independent Ethics Board; Saint Luke’s Health System Institutional Review Board; Municipal Hospital Institutional Review Board of Hino; and Yokohama Municipal Citizen Hospital Ethics Committee). All participants gave verbal or written consent for baseline data collection, and informed consent was obtained individually from each participant.

Study population

We extracted 24,848 consecutive patients who underwent PCI between July 2008 and September 2020. Because several parameters serve as input variables for one model and as exclusion criteria for another (e.g., hemodialysis before PCI is an input variable of the in-hospital mortality model and an exclusion criterion of the AKI model), we constructed each outcome-specific cohort using a two-step procedure. First, we excluded patients with missing indications (n = 967), those missing both pre- and post-procedural hemoglobin (n = 901), and those missing both pre- and post-procedural serum creatinine (n = 22), yielding the analytical cohort. Next, we applied outcome-specific exclusion criteria, followed by imputation of missing values, to create each cohort (detailed in Fig. 1). Each cohort was randomly divided into a training set containing 75% of patients and a test set containing the remaining 25%, with approximately the same proportion of events in both.
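The 75%/25% split with matched event proportions can be sketched as follows. This is a minimal illustration in Python, not the authors' code (the analysis itself was run in R); the function name `stratified_split`, the seed, and the toy outcome vector are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices into train/test while keeping the event
    proportion approximately equal in both sets (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within each outcome class, then cut at the train fraction.
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy outcome vector: 1,000 patients with a 10% event rate (illustrative only).
labels = [1] * 100 + [0] * 900
train_idx, test_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

Splitting within each outcome class, rather than over the pooled cohort, is what keeps the event rate of a rare outcome from drifting between the training and test sets.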

Figure 1

Study flowchart. Abbreviations: CAD, coronary artery disease; PCI, percutaneous coronary intervention; JCD-KiCS, Japan Cardiovascular Database-Keio Interhospital Cardiovascular Studies; Hb, hemoglobin; Cr, creatinine; AKI, acute kidney injury; LR, logistic regression model; XGB, extreme gradient boosting model.

Definitions and outcomes

The definitions of AKI, bleeding, and in-hospital mortality were consistent with the original NCDR-CathPCI models4,5,6. Briefly, AKI was defined as an absolute increase of ≥ 0.3 mg/dl or a relative increase of ≥ 1.5 times in post-PCI creatinine, or new initiation of dialysis. Bleeding was defined as any of the following occurring within 72 hours of PCI or before hospital discharge (whichever came first): site-reported arterial access site bleeding; retroperitoneal, gastrointestinal, or genitourinary bleeding; intracranial hemorrhage; cardiac tamponade; a post-procedure hemoglobin decrease of ≥ 3 g/dl in patients with pre-procedure hemoglobin ≤ 16 g/dl; or post-procedure non-bypass surgery-related blood transfusion in patients with pre-procedure hemoglobin ≥ 8 g/dl. In-hospital mortality was defined as any post-procedural death during the same hospitalization. Because JCD-KiCS was developed in collaboration with NCDR-CathPCI, the majority of clinical variables were defined in accordance with its data dictionary (version 4.1)9. For example, cardiogenic shock was defined as a sustained episode (> 30 min) of systolic blood pressure < 90 mmHg and/or cardiac index < 2.2 L/min/m2 determined to be secondary to cardiac dysfunction, and/or the need for intravenous inotropic or vasopressor agents or mechanical support to maintain blood pressure and cardiac index above those specified levels within 24 h of the procedure.
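The laboratory-based parts of these outcome definitions reduce to a few threshold rules, sketched below in Python. This is an illustration under stated assumptions rather than the registry's implementation: the function and argument names are hypothetical, the hemoglobin decrease is read as "at least 3 g/dl", the clinically reported bleeding events are collapsed into one flag, and the 72-hour window is assumed to have been applied upstream.

```python
EPS = 1e-9  # tolerance so values such as 1.2 - 0.9 are not lost to float rounding


def is_aki(pre_cr, post_cr, dialysis_started=False):
    """AKI: creatinine rise of >= 0.3 mg/dl, or >= 1.5-fold, or dialysis."""
    if pre_cr <= 0:
        raise ValueError("pre-procedural creatinine must be positive")
    absolute_rise = (post_cr - pre_cr) >= 0.3 - EPS
    relative_rise = (post_cr / pre_cr) >= 1.5 - EPS
    return absolute_rise or relative_rise or dialysis_started


def is_bleeding(reported_bleed, pre_hb, post_hb, transfused):
    """Bleeding: any reported bleeding event (access site, retroperitoneal,
    gastrointestinal, genitourinary, intracranial, tamponade), or a
    hemoglobin drop of >= 3 g/dl when pre-procedure Hb <= 16 g/dl, or a
    transfusion when pre-procedure Hb >= 8 g/dl."""
    hb_drop = pre_hb <= 16.0 and (pre_hb - post_hb) >= 3.0 - EPS
    transfusion = transfused and pre_hb >= 8.0
    return reported_bleed or hb_drop or transfusion


# Illustrative patients (values invented for the example).
aki_by_absolute_rise = is_aki(pre_cr=0.9, post_cr=1.2)
aki_by_relative_rise = is_aki(pre_cr=0.4, post_cr=0.65)
bleed_by_hb_drop = is_bleeding(False, pre_hb=13.5, post_hb=10.0, transfused=False)
```

The small tolerance matters in practice because creatinine is usually recorded to one or two decimals, and a nominal rise of exactly 0.3 mg/dl can evaluate to slightly less than 0.3 in floating point.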

Treatment of missing data

After enrollment of the analytical cohort, we imputed missing pre-procedural hemoglobin values with post-procedural hemoglobin values for the AKI and in-hospital mortality models, and imputed missing pre-procedural creatinine values with post-procedural creatinine values for the bleeding and in-hospital mortality models. Since the missing rate was

Model development

We developed two types of models: LR models and XGB (extreme gradient boosting) models. XGB is an ML algorithm that builds a series of relatively simple decision trees and combines them through boosting to produce a more robust final prediction. In the LR models, we used the same categorized variables as the original NCDR-CathPCI risk scores (original models); in the XGB models, we used the same variables but kept the raw continuous values of those that were categorized in the original models. The full list of variables was as follows:

  1. AKI model: age (categorized), diabetes mellitus, prior heart failure, prior cerebrovascular disease, non-ST-segment elevation acute coronary syndrome (NSTEACS), ST-segment elevation myocardial infarction (STEMI), cardiogenic shock at presentation, cardiopulmonary arrest at presentation, anemia (defined as admission hemoglobin less than 10 g/dL), and use of IABP.

  2. Bleeding model: STEMI, age (categorized), prior PCI, eGFR (categorized), cardiogenic shock at presentation, female sex, hemoglobin at presentation (categorized as

  3. In-hospital mortality model: age (categorized), NYHA class IV at presentation, STEMI, and PCI status (emergency, salvage, urgent, and elective).

To optimize the hyperparameters of the XGB models, we used stratified threefold cross-validation with random search. After determining the best hyperparameters, the XGB models were developed using the entire training set (hold-out method; see the Supplementary Material for further explanation). Additionally, we constructed extended LR and XGB models using additional variables selected for their clinical significance. The additional variables were:

  • Extended AKI model: contrast volume and timing of PCI (i.e., during working hours or holiday periods).

  • Extended bleeding model: number of antiplatelet agents, use of anticoagulants at PCI, and timing of PCI.

  • Extended in-hospital mortality model: PCI technical failure, defined as failure to cross the lesion with a guidewire or a post-PCI TIMI flow grade of 1 or 0 (slow flow or no flow), and timing of PCI.
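The cross-validated random search used for hyperparameter tuning can be sketched as below. This Python outline is an illustration only: the authors tuned their models with tidymodels in R, and the search space, the number of iterations, and the `score_fn` callback (which would fit a model on the training indices and return a validation metric such as the C-statistic) are all hypothetical. The toy `score_fn` here fits nothing and exists only to make the example executable.

```python
import random


def stratified_folds(labels, k=3, seed=0):
    """Assign every sample to one of k folds, dealing the members of each
    outcome class round-robin so that event rates stay similar across folds."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            fold_of[idx] = position % k
    return fold_of


def random_search(param_space, score_fn, labels, n_iter=20, k=3, seed=0):
    """Draw n_iter random configurations and keep the one with the best
    mean validation score over the k stratified folds."""
    rng = random.Random(seed)
    fold_of = stratified_folds(labels, k, seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        fold_scores = []
        for fold in range(k):
            train = [i for i, f in enumerate(fold_of) if f != fold]
            valid = [i for i, f in enumerate(fold_of) if f == fold]
            fold_scores.append(score_fn(params, train, valid))
        mean_score = sum(fold_scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical search space; the candidate values are placeholders.
space = {"max_depth": [2, 4, 6], "eta": [0.01, 0.1, 0.3]}


def toy_score(params, train_idx, valid_idx):
    # Stand-in for "fit on train_idx, evaluate on valid_idx".
    return -((params["max_depth"] - 4) ** 2) - abs(params["eta"] - 0.1)


labels = [1] * 30 + [0] * 270
folds = stratified_folds(labels, k=3)
best_params, best_score = random_search(space, toy_score, labels)
```

Once the best configuration is chosen, the final model would be refitted on the entire training set, with the test set kept aside for evaluation.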

Statistics and key indicators

Continuous variables were summarized as medians with interquartile ranges and compared using Mann-Whitney U tests, and categorical variables were summarized as frequencies and compared using chi-square tests or Fisher’s exact tests, as appropriate.

C-statistics with 95% confidence intervals (95% CI) based on DeLong’s method and the area under the precision-recall curve (PRAUC) were used to estimate model discrimination. Model calibration was assessed using the Brier score and the calibration curve. The Brier score is defined as the mean squared difference between observed outcomes and predicted probabilities and ranges from 0 to 1.00, with 0 representing the best possible calibration. The two main decomposed components of the Brier score, reliability and resolution, were also assessed. Calibration plots were used to plot the mean predicted risk against the observed outcome rate for each quintile of predicted risk. Additionally, we used the net reclassification index (NRI) to assess the clinical utility of the LR and XGB models, with cut-off values of 10%, 4%, and 2.5% for AKI, bleeding, and in-hospital mortality, respectively. A P value
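The calibration and reclassification metrics above can be made concrete with a short Python sketch. It is an illustration under assumptions, not the authors' code (they used R packages): the function names are hypothetical, the decomposition is the standard Murphy decomposition computed over equal-size risk bins, and the NRI shown is the categorical form with a single cut-off. The identity Brier = reliability - resolution + uncertainty holds exactly only when predictions are constant within each bin, as in the toy data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def murphy_decomposition(y_true, y_prob, n_bins=5):
    """Return (reliability, resolution, uncertainty) using equal-size bins
    of patients ordered by predicted risk (quintiles by default)."""
    n = len(y_true)
    base_rate = sum(y_true) / n
    order = sorted(range(n), key=lambda i: y_prob[i])
    reliability = resolution = 0.0
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            continue
        mean_pred = sum(y_prob[i] for i in members) / len(members)
        obs_rate = sum(y_true[i] for i in members) / len(members)
        reliability += len(members) * (mean_pred - obs_rate) ** 2 / n
        resolution += len(members) * (obs_rate - base_rate) ** 2 / n
    return reliability, resolution, base_rate * (1 - base_rate)


def categorical_nri(y_true, prob_old, prob_new, cutoff):
    """Net reclassification index for one risk cut-off: net upward moves
    among events minus net upward moves among non-events."""
    def net_upward(indices):
        up = sum(prob_old[i] < cutoff <= prob_new[i] for i in indices)
        down = sum(prob_new[i] < cutoff <= prob_old[i] for i in indices)
        return (up - down) / len(indices)

    events = [i for i, y in enumerate(y_true) if y == 1]
    nonevents = [i for i, y in enumerate(y_true) if y == 0]
    return net_upward(events) - net_upward(nonevents)


# Toy data: two risk groups of ten patients each (invented values).
y = [1] + [0] * 9 + [1] * 6 + [0] * 4
p = [0.1] * 10 + [0.5] * 10
bs = brier_score(y, p)
rel, res, unc = murphy_decomposition(y, p, n_bins=2)

# Toy reclassification at a 10% cut-off (the cut-off used for AKI).
nri = categorical_nri([1, 1, 0, 0], [0.05, 0.2, 0.2, 0.05], [0.15, 0.2, 0.05, 0.05], 0.10)
```

Lower reliability and higher resolution are both desirable: the first measures how far predicted risks sit from observed rates, the second how well the bins separate from the overall event rate.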

Sensitivity analysis

As a sensitivity analysis, we used multiple imputation instead of median imputation to handle missing values. The multiple imputation model included all predefined predictors and outcomes, as recommended12. Ten imputed data sets were generated, and C-statistics were combined using Rubin’s rules.
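Rubin's rules for combining an estimate across imputed data sets can be sketched as follows. This Python example is illustrative only: the ten C-statistics and their variances are invented, the function name is hypothetical, and pooling on the original scale is an assumption (the source does not state whether a transformation such as the logit was applied first).

```python
import math


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    Returns the pooled estimate and its total variance, which adds the
    mean within-imputation variance to the inflated between-imputation
    variance: T = W + (1 + 1/m) * B.
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Invented C-statistics and variances from ten imputed data sets.
c_stats = [0.80, 0.81, 0.79, 0.80, 0.82, 0.80, 0.81, 0.79, 0.80, 0.81]
c_vars = [0.0004] * 10
pooled_c, total_var = pool_rubin(c_stats, c_vars)
pooled_se = math.sqrt(total_var)
```

The between-imputation term is what distinguishes this from simply averaging: it widens the interval to reflect the uncertainty introduced by the missing values themselves.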

Software implementation

All analyses were conducted in R (version 4.0.4; R Project for Statistical Computing, Vienna, Austria) with the tidymodels (version 0.1.2) set of packages for data preprocessing, hyperparameter tuning, training, and performance measurement13,14,15. We used xgboost (version 1.3.2.1) for extreme gradient boosting16, pROC (version 1.17.0.1) to calculate C-statistics17, verification (version 1.42) to calculate Brier scores18, PredictABEL (version 1.2.4) to calculate the NRI19, and mice (version 3.14.0) to perform multiple imputation20.