[Paper] TransformEHR: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records (2023)
15 Feb 2024 #bio #ehr #transformer
Yang, Zhichao, et al. “TransformEHR: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records.” Nature Communications 14.1 (2023): 7857.
Points
- New pre-training objective: predicting all diseases or outcomes of a future visit
- Helps the model uncover the complex interrelations among different diseases and outcomes
- TransformEHR: a generative encoder-decoder framework that predicts patients' ICD codes from their longitudinal EHRs
- Validation of the generalizability using both internal and external datasets
- Demonstrated a strong transfer learning capability of the model
- Could be useful in settings with limited data and computing resources
Background
- Longitudinal electronic health records (EHRs) have been successfully used to predict clinical diseases or outcomes (congestive heart failure, sepsis mortality, mechanical ventilation, septic shock, diabetes, PTSD, etc.)
- With the availability of large cohorts and computational resources, deep learning (DL) based models outperform traditional machine learning (ML) models (Med-BERT, BEHRT, BRLTM, etc.)
- Existing pre-training tasks predicted only a fraction of the ICD codes within each visit → a novel pre-training strategy that predicts the complete set of diseases and outcomes within a visit might improve clinical predictive modeling
Method
Data
*VHA: Veterans Health Administration, the largest integrated healthcare system in the US, providing care at 1,321 healthcare facilities
Pre-training data: around 6M patients who received care from more than 1,200 facilities of the US VHA
- Two disease/outcome agnostic prediction (DOAP) datasets, covering common and uncommon diseases/outcomes (see the selection sketch after this list)
- Common dataset: ICD-10-CM codes with more than 2% prevalence
- Uncommon dataset: ICD-10-CM codes with 0.04%-0.05% prevalence
- Non-VHA dataset: MIMIC-IV (29,482 patients)
- Only selected subjects with ICD-10-CM records to match the VHA cohorts
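A minimal sketch (pandas, with assumed column names and file) of how the common/uncommon DOAP code sets could be selected by prevalence:

```python
import pandas as pd

# Assumed schema: one row per (patient_id, icd10cm) diagnosis record
diagnoses = pd.read_csv("diagnoses.csv")  # hypothetical file
n_patients = diagnoses["patient_id"].nunique()

# Prevalence = fraction of patients with at least one record of the code
prevalence = (
    diagnoses.drop_duplicates(["patient_id", "icd10cm"])
    .groupby("icd10cm")["patient_id"].nunique() / n_patients
)

common_codes = prevalence[prevalence > 0.02].index.tolist()                                   # >2% prevalence
uncommon_codes = prevalence[(prevalence >= 0.0004) & (prevalence <= 0.0005)].index.tolist()   # 0.04%-0.05%
```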
Longitudinal EHRs
- Include demographic information (gender, age, race, and marital status) and ICD-10-CM codes as predictors
- Group ICD codes at the visit level
- Order the codes by priority, where the primary diagnosis is typically given the highest priority
- Form multiple visits into a time-stamped input sequence ordered by visit date
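A minimal sketch of turning a patient's longitudinal EHR into a model input: codes grouped per visit in priority order, visits sorted and time-stamped by date (the data structures and field names below are assumptions, not the paper's code):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Visit:
    visit_date: date
    icd_codes: list[str]   # ordered by priority, primary diagnosis first

@dataclass
class Patient:
    demographics: dict     # e.g. {"gender": ..., "age": ..., "race": ..., "marital_status": ...}
    visits: list[Visit]

def to_sequence(patient: Patient):
    """Flatten visits into (code, visit_index, days_before_last_visit) triples."""
    visits = sorted(patient.visits, key=lambda v: v.visit_date)
    last_date = visits[-1].visit_date          # reference point for relative time
    seq = []
    for i, visit in enumerate(visits):
        delta = (last_date - visit.visit_date).days
        for code in visit.icd_codes:           # priority order preserved within the visit
            seq.append((code, i, delta))
    return patient.demographics, seq
```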
Embeddings
Multi-level embeddings: visit embeddings + time embeddings + code embeddings
- Time embeddings: embed the day difference between each visit and the last visit in the EHR as relative time information
- Include the date of each visit to integrate temporal information, not only sequential order
- Dates matter because the importance of a predictor in a visit can vary over time
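A minimal PyTorch sketch of the multi-level embedding, summing code, visit, and time (day-difference) embeddings per token; the vocabulary sizes and clamping are assumptions:

```python
import torch.nn as nn

class MultiLevelEmbedding(nn.Module):
    def __init__(self, n_codes, d_model=256, max_visits=512, max_days=10_000):
        super().__init__()
        self.code_emb = nn.Embedding(n_codes, d_model)       # one vector per ICD-10-CM code
        self.visit_emb = nn.Embedding(max_visits, d_model)   # which visit the code belongs to
        self.time_emb = nn.Embedding(max_days, d_model)      # day difference to the last visit

    def forward(self, code_ids, visit_ids, day_deltas):
        # all inputs: (batch, seq_len) integer tensors; token representation is the sum
        day_deltas = day_deltas.clamp(max=self.time_emb.num_embeddings - 1)
        return self.code_emb(code_ids) + self.visit_emb(visit_ids) + self.time_emb(day_deltas)
```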
Model Architecture
Encoder-decoder transformer-based architecture
- Encoder: encodes the longitudinal input into representations and, unlike BERT, lets the decoder cross-attend over them, assigning an attention weight to each representation
- The pre-training objective masks the complete set of ICD codes of a future visit, as shown in Fig. 2b
- Decoder: generates the ICD codes of the masked future visit using the weighted representations from the encoder
- Generates the codes following the order of code priority within the visit
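A minimal sketch of the encoder-decoder idea with PyTorch's nn.Transformer: the encoder reads the visit history, the decoder cross-attends to the encoder output and generates the masked future visit's ICD codes in priority order. Hyperparameters and the simple embedding are placeholders, not the paper's configuration:

```python
import torch.nn as nn

class EHRSeq2Seq(nn.Module):
    def __init__(self, n_codes, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(n_codes, d_model)  # stand-in for the multi-level embedding above
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, n_codes)

    def forward(self, history_ids, target_ids):
        # history_ids: codes from visits up to the current one (future visit held out)
        # target_ids: ICD codes of the masked future visit, shifted right, in priority order
        src = self.embed(history_ids)
        tgt = self.embed(target_ids)
        causal = self.transformer.generate_square_subsequent_mask(target_ids.size(1)).to(src.device)
        out = self.transformer(src, tgt, tgt_mask=causal)  # decoder cross-attends to encoder output
        return self.lm_head(out)  # logits over ICD codes at each target position
```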
Evaluation
Metrics: PPV (precision), AUROC, AUPRC
Baseline models: logistic regression, LSTM, BERT without pre-training, BERT with pre-training # what’s the objective when pre-training BERT? MLM or the objective proposed in this paper?
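A small sketch of the reported metrics using scikit-learn; the 0.5 threshold used to binarize scores for PPV is an assumption:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, precision_score

def evaluate(y_true, y_score, threshold=0.5):
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)
    return {
        "PPV": precision_score(y_true, y_pred),              # positive predictive value = precision
        "AUROC": roc_auc_score(y_true, y_score),
        "AUPRC": average_precision_score(y_true, y_score),   # area under the precision-recall curve
    }
```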
Pre-training
Task: disease or outcome agnostic prediction (DOAP), i.e. predicting the ICD codes of a patient's future visit based on longitudinal information up to the current visit
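A minimal sketch of building one DOAP pre-training example: pick a future visit, use its complete ICD code set as the target, and keep the preceding visits as the input history (the sampling strategy is an assumption):

```python
import random

def make_doap_example(visits):
    """visits: chronologically ordered list of visits, each a priority-ordered list of ICD codes."""
    idx = random.randrange(1, len(visits))                        # choose a "future" visit (assumes >= 2 visits)
    history = [code for visit in visits[:idx] for code in visit]  # flattened input sequence
    target = visits[idx]                                          # complete code set of the masked visit
    return history, target
```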
Ablation study
- Visit masking vs. code (part-of-visit) masking for an encoder-decoder model
- Visit masking performed better; pre-training on all diseases of a visit outperformed the traditional pre-training objective by 2.52-2.96% in AUROC
- Encoder-decoder vs. encoder-only (BERT) on DOAP
- Encoder-decoder outperformed by 0.74-1.16% in AUROC # could the difference in parameter size have affected this result?
- With vs. without time embeddings
- The model with time embeddings outperformed moderately, by 0.43% in AUROC
- Using the day difference is more effective than using the specific date as the embedding
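A sketch contrasting the ablation's two masking schemes: visit masking hides every code of the chosen visit, while code masking (the traditional MLM-style objective) hides only a random subset of codes; the 15% masking rate is an assumption:

```python
import random

MASK = "[MASK]"

def visit_masking(visits, idx):
    # hide the complete code set of visit idx; the targets are all of its codes
    masked = [[MASK] * len(v) if i == idx else v for i, v in enumerate(visits)]
    return masked, visits[idx]

def code_masking(visits, rate=0.15):
    # hide a random subset of codes across all visits; the targets are the hidden codes
    masked, targets = [], []
    for v in visits:
        row = []
        for code in v:
            if random.random() < rate:
                row.append(MASK)
                targets.append(code)
            else:
                row.append(code)
        masked.append(row)
    return masked, targets
```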
Fine-tuning
Tasks: pancreatic cancer onset prediction (Table 3) and intentional self-harm prediction in patients with PTSD (Table 4)
- TransformEHR outperforms the baselines on both tasks
- AUPRC was consistent when using different sets of demographics
- Results with all visits were better than with only the most recent few (five) visits
- Generalizability evaluation: when testing on an internal dataset that includes data from VHA facilities not used for pre-training, there was no statistically significant difference in AUPRC on the intentional self-harm prediction task among patients with PTSD
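A minimal fine-tuning sketch: reuse the pre-trained backbone and attach a binary classification head for a downstream outcome such as pancreatic cancer onset; the mean pooling and single-logit head are assumptions:

```python
import torch.nn as nn

class OutcomeClassifier(nn.Module):
    """Binary disease/outcome prediction on top of the pre-trained encoder (sketch)."""
    def __init__(self, pretrained: nn.Module, d_model=256):
        super().__init__()
        self.backbone = pretrained          # e.g. the EHRSeq2Seq sketched above
        self.head = nn.Linear(d_model, 1)   # single logit for the outcome

    def forward(self, history_ids):
        hidden = self.backbone.transformer.encoder(self.backbone.embed(history_ids))
        pooled = hidden.mean(dim=1)         # mean-pool over the token sequence (assumption)
        return self.head(pooled).squeeze(-1)

# Training would use a standard binary cross-entropy loss on the logits, e.g.
# loss = nn.BCEWithLogitsLoss()(model(history_ids), labels.float())
```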