[Paper] TransformEHR: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records (2023)
15 Feb 2024 #bio #ehr #transformer
Yang, Zhichao, et al. “TransformEHR: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records.” Nature Communications 14.1 (2023): 7857.
Points
- New pre-training objective: predicting all diseases or outcomes of a future visit
- Helps the model uncover the complex interrelations among different diseases and outcomes
- TransformEHR: a generative encoder-decoder framework that predicts patients' ICD codes from their longitudinal EHRs
- Validation of the generalizability using both internal and external datasets
- Demonstrated a strong transfer learning capability of the model
- Could be useful in settings with limited data and computing resources
Background
- Longitudinal electronic health records (EHRs) have been successfully used to predict clinical diseases or outcomes (congestive heart failure, sepsis mortality, mechanical ventilation, septic shock, diabetes, PTSD, etc.)
- With the availability of large cohorts and computational resources, deep learning (DL) based models outperform traditional machine learning (ML) models (Med-BERT, BEHRT, BRLTM, etc.)
- Existing pre-training tasks predicted only a fraction of the ICD codes within each visit → a novel pre-training strategy that predicts the complete set of diseases and outcomes within a visit might improve clinical predictive modeling
Method
Data
*VHA: Veterans Health Administration, the largest integrated healthcare system in the US, providing care at 1,321 healthcare facilities
Pre-training data: around 6M patients who received care from more than 1,200 facilities of the US VHA
- Two disease/outcome agnostic prediction (DOAP) datasets, covering common and uncommon diseases/outcomes (see the selection sketch after this list)
- Common dataset: ICD-10-CM codes with more than 2% prevalence
- Uncommon dataset: ICD-10-CM codes with 0.04%-0.05% prevalence
- Non-VHA dataset: MIMIC-IV (29,482 patients)
- Only selected subjects with ICD-10-CM records to match the VHA cohorts
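A minimal sketch (pandas, with assumed column names and file) of how the common/uncommon DOAP code sets could be selected by prevalence:

```python
import pandas as pd

# Assumed schema: one row per (patient_id, icd10cm) diagnosis record
diagnoses = pd.read_csv("diagnoses.csv")  # hypothetical file
n_patients = diagnoses["patient_id"].nunique()

# Prevalence = fraction of patients with at least one record of the code
prevalence = (
    diagnoses.drop_duplicates(["patient_id", "icd10cm"])
    .groupby("icd10cm")["patient_id"].nunique() / n_patients
)

common_codes = prevalence[prevalence > 0.02].index.tolist()                                   # >2% prevalence
uncommon_codes = prevalence[(prevalence >= 0.0004) & (prevalence <= 0.0005)].index.tolist()   # 0.04%-0.05%
```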
Longitudinal EHRs
- Include demographic information (gender, age, race, and marital status) and ICD-10-CM codes as predictors
- Group ICD codes at the visit level
- Order the codes by priority, where the primary diagnosis is typically given the highest priority
- Form multiple visits into a time-stamped input sequence ordered by visit date
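A minimal sketch of turning a patient's longitudinal EHR into a model input: codes grouped per visit in priority order, visits sorted and time-stamped by date (the data structures and field names below are assumptions, not the paper's code):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Visit:
    visit_date: date
    icd_codes: list[str]   # ordered by priority, primary diagnosis first

@dataclass
class Patient:
    demographics: dict     # e.g. {"gender": ..., "age": ..., "race": ..., "marital_status": ...}
    visits: list[Visit]

def to_sequence(patient: Patient):
    """Flatten visits into (code, visit_index, days_before_last_visit) triples."""
    visits = sorted(patient.visits, key=lambda v: v.visit_date)
    last_date = visits[-1].visit_date          # reference point for relative time
    seq = []
    for i, visit in enumerate(visits):
        delta = (last_date - visit.visit_date).days
        for code in visit.icd_codes:           # priority order preserved within the visit
            seq.append((code, i, delta))
    return patient.demographics, seq
```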
Embeddings
Multi-level embeddings: visit embeddings + time embeddings + code embeddings
- Time embeddings: embed the day difference between each visit and the last visit in the EHR as relative time information
- Include the date of each visit to integrate temporal information, not only sequential order
- Dates matter because the importance of a predictor in a visit can vary over time
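A minimal PyTorch sketch of the multi-level embedding, summing code, visit, and time (day-difference) embeddings per token; the vocabulary sizes and clamping are assumptions:

```python
import torch.nn as nn

class MultiLevelEmbedding(nn.Module):
    def __init__(self, n_codes, d_model=256, max_visits=512, max_days=10_000):
        super().__init__()
        self.code_emb = nn.Embedding(n_codes, d_model)       # one vector per ICD-10-CM code
        self.visit_emb = nn.Embedding(max_visits, d_model)   # which visit the code belongs to
        self.time_emb = nn.Embedding(max_days, d_model)      # day difference to the last visit

    def forward(self, code_ids, visit_ids, day_deltas):
        # all inputs: (batch, seq_len) integer tensors; token representation is the sum
        day_deltas = day_deltas.clamp(max=self.time_emb.num_embeddings - 1)
        return self.code_emb(code_ids) + self.visit_emb(visit_ids) + self.time_emb(day_deltas)
```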
Model Architecture
Encoder-decoder transformer-based architecture
- Encoder: encodes the longitudinal input into representations and, unlike BERT, lets the decoder cross-attend over them, assigning an attention weight to each representation
- The pre-training objective masks the complete set of ICD codes of a future visit, as shown in Fig. 2b
- Decoder: generates the ICD codes of the masked future visit using the weighted representations from the encoder
- Generates the codes following the order of code priority within the visit
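A minimal sketch of the encoder-decoder idea with PyTorch's nn.Transformer: the encoder reads the visit history, the decoder cross-attends to the encoder output and generates the masked future visit's ICD codes in priority order. Hyperparameters and the simple embedding are placeholders, not the paper's configuration:

```python
import torch.nn as nn

class EHRSeq2Seq(nn.Module):
    def __init__(self, n_codes, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(n_codes, d_model)  # stand-in for the multi-level embedding above
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, n_codes)

    def forward(self, history_ids, target_ids):
        # history_ids: codes from visits up to the current one (future visit held out)
        # target_ids: ICD codes of the masked future visit, shifted right, in priority order
        src = self.embed(history_ids)
        tgt = self.embed(target_ids)
        causal = self.transformer.generate_square_subsequent_mask(target_ids.size(1)).to(src.device)
        out = self.transformer(src, tgt, tgt_mask=causal)  # decoder cross-attends to encoder output
        return self.lm_head(out)  # logits over ICD codes at each target position
```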
Evaluation
Metrics: PPV (precision), AUROC, AUPRC
Baseline models: logistic regression, LSTM, BERT without pre-training, BERT with pre-training # what’s the objective when pre-training BERT? MLM or the objective proposed in this paper?
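A small sketch of the reported metrics using scikit-learn; the 0.5 threshold used to binarize scores for PPV is an assumption:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, precision_score

def evaluate(y_true, y_score, threshold=0.5):
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)
    return {
        "PPV": precision_score(y_true, y_pred),              # positive predictive value = precision
        "AUROC": roc_auc_score(y_true, y_score),
        "AUPRC": average_precision_score(y_true, y_score),   # area under the precision-recall curve
    }
```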
Pre-training
Task: disease or outcome agnostic prediction (DOAP), i.e. predicting the ICD codes of a patient's future visit based on longitudinal information up to the current visit
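A minimal sketch of building one DOAP pre-training example: pick a future visit, use its complete ICD code set as the target, and keep the preceding visits as the input history (the sampling strategy is an assumption):

```python
import random

def make_doap_example(visits):
    """visits: chronologically ordered list of visits, each a priority-ordered list of ICD codes."""
    idx = random.randrange(1, len(visits))                        # choose a "future" visit (assumes >= 2 visits)
    history = [code for visit in visits[:idx] for code in visit]  # flattened input sequence
    target = visits[idx]                                          # complete code set of the masked visit
    return history, target
```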
Ablation study
- Visit masking vs. code (part-of-visit) masking for an encoder-decoder model
- Visit masking performed better; pre-training on all diseases of a visit outperformed the traditional pre-training objective by 2.52-2.96% in AUROC
- Encoder-decoder vs. encoder-only (BERT) on DOAP
- Encoder-decoder outperformed by 0.74-1.16% in AUROC # could the difference in parameter size have affected this result?
- With vs. without time embeddings
- The model with time embeddings outperformed moderately, by 0.43% in AUROC
- Using the day difference is more effective than using the specific date as the embedding
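A sketch contrasting the ablation's two masking schemes: visit masking hides every code of the chosen visit, while code masking (the traditional MLM-style objective) hides only a random subset of codes; the 15% masking rate is an assumption:

```python
import random

MASK = "[MASK]"

def visit_masking(visits, idx):
    # hide the complete code set of visit idx; the targets are all of its codes
    masked = [[MASK] * len(v) if i == idx else v for i, v in enumerate(visits)]
    return masked, visits[idx]

def code_masking(visits, rate=0.15):
    # hide a random subset of codes across all visits; the targets are the hidden codes
    masked, targets = [], []
    for v in visits:
        row = []
        for code in v:
            if random.random() < rate:
                row.append(MASK)
                targets.append(code)
            else:
                row.append(code)
        masked.append(row)
    return masked, targets
```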
Fine-tuning
Tasks: pancreatic cancer onset prediction (Table 3) and intentional self-harm prediction in patients with PTSD (Table 4)
- TransformEHR outperforms the baselines on both tasks
- AUPRC was consistent when using different sets of demographics
- Results with all visits were better than with only the most recent few (five) visits
- Generalizability evaluation: when testing on an internal dataset that includes data from VHA facilities not used for pre-training, there was no statistically significant difference in AUPRC on the intentional self-harm prediction task among patients with PTSD
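A minimal fine-tuning sketch: reuse the pre-trained backbone and attach a binary classification head for a downstream outcome such as pancreatic cancer onset; the mean pooling and single-logit head are assumptions:

```python
import torch.nn as nn

class OutcomeClassifier(nn.Module):
    """Binary disease/outcome prediction on top of the pre-trained encoder (sketch)."""
    def __init__(self, pretrained: nn.Module, d_model=256):
        super().__init__()
        self.backbone = pretrained          # e.g. the EHRSeq2Seq sketched above
        self.head = nn.Linear(d_model, 1)   # single logit for the outcome

    def forward(self, history_ids):
        hidden = self.backbone.transformer.encoder(self.backbone.embed(history_ids))
        pooled = hidden.mean(dim=1)         # mean-pool over the token sequence (assumption)
        return self.head(pooled).squeeze(-1)

# Training would use a standard binary cross-entropy loss on the logits, e.g.
# loss = nn.BCEWithLogitsLoss()(model(history_ids), labels.float())
```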