[Paper] BEHRT: Transformer for Electronic Health Records (2020)
22 Jan 2024 #bio #ehr #transformer
Li, Yikuan, et al. “BEHRT: transformer for electronic health records.” Scientific reports 10.1 (2020): 7155.
Points
- BEHRT (BERT for EHR): a BERT-based architecture
- In EHR, certain diseases can be reversed, or the time interval between two diagnoses can be shorter or longer than recorded.
→ Bidirectional contextual awareness of the model’s representation is a big advantage with EHR data.
- Transfer learning: pre-training on prediction of masked disease words (Masked Language Modeling, MLM), then fine-tuning on three disease prediction tasks
- Disease embeddings: show the relations between the various diseases
Background
- In traditional research on EHR data, individuals are represented as features for the models; experts had to define the appropriate features by hand.
- Studies applying Deep Learning (DL) to EHR started to show that DL models can outperform the traditional Machine Learning (ML) methods
- With DL architectures for sequence data, such as Recurrent Neural Networks (RNNs), the application of DL models for EHR was improved in terms of capturing the long-term dependencies among events.
- Similarities between sequences in EHR and natural language led to the successful transferability of techniques
Method
Data
Clinical Practice Research Datalink (CPRD)
- One of the largest linked primary care EHR systems
- Contains longitudinal data from a network of 674 general practitioner (GP) practices in the UK
- Cohort narrowed from 8 million patients (eligible for linkage to HES and meeting CPRD’s quality standards) to 1.6 million patients with at least 5 visits in their EHR
- Only data from GP practices that consented to record linkage with HES were considered in this study
Input Features
- Only diagnoses and ages are considered as input features
The patient $p$’s EHR: \[ V_p = \{v^1_p, v^2_p, v^3_p, \dots, v^{n_p}_p\} \]
The patient $p$’s EHR of the $j$th visit: $v^j_p$
$v^j_p$ is a list of one or more ($m$) diagnoses: $v^j_p = \{d_1, \dots, d_m\}$
Input sequence: \[ I_p = \{CLS, v^1_p, SEP, v^2_p, SEP, \dots, v^{n_p}_p, SEP\} \]
- $CLS$ token: the start of a medical history
- $SEP$ token: between visits
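As a minimal illustration (not code from the paper), flattening a patient's visits into this token sequence could look like the sketch below; the diagnosis names and the helper function are hypothetical.

```python
# Sketch: flatten a patient's visits into a BEHRT-style token sequence.
# Diagnosis names below are hypothetical illustrations.

def build_input_sequence(visits):
    """visits: list of visits, each a list of diagnosis codes (strings)."""
    tokens = ["CLS"]                 # CLS marks the start of the medical history
    for visit in visits:
        tokens.extend(visit)         # one or more diagnoses per visit
        tokens.append("SEP")         # SEP separates consecutive visits
    return tokens

visits = [["hypertension"], ["type2_diabetes", "hyperlipidaemia"], ["angina"]]
print(build_input_sequence(visits))
# ['CLS', 'hypertension', 'SEP', 'type2_diabetes', 'hyperlipidaemia', 'SEP', 'angina', 'SEP']
```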
Embeddings
- A combination of 4 embeddings: disease + position + age + visit segment
- Disease embeddings: past diseases can improve the accuracy of the prediction for future diagnoses
- Positional encodings: determine the relative position in the EHR sequence
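A rough PyTorch sketch, under my own assumptions about sizes and module names, of how the four embeddings could be combined by element-wise summation (as in BERT); the dimensions below are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BEHRTEmbedding(nn.Module):
    """Sketch: sum of disease, position, age, and visit-segment embeddings.
    All sizes here are illustrative assumptions."""

    def __init__(self, n_diseases=305, n_ages=120, max_len=512, dim=288):
        super().__init__()
        self.disease = nn.Embedding(n_diseases, dim)   # diagnosis vocabulary
        self.age = nn.Embedding(n_ages, dim)           # age at each visit
        self.segment = nn.Embedding(2, dim)            # alternating visit segment (A/B)
        self.position = nn.Embedding(max_len, dim)     # position in the sequence

    def forward(self, disease_ids, age_ids, segment_ids):
        # disease_ids, age_ids, segment_ids: (batch, seq_len) integer tensors
        positions = torch.arange(disease_ids.size(1), device=disease_ids.device)
        return (self.disease(disease_ids)
                + self.age(age_ids)
                + self.segment(segment_ids)
                + self.position(positions))
```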
Tasks
Pre-training: prediction of masked disease words (MLM)
- 86.5% of the disease words were left unchanged, 12% were replaced with the mask token, and the remaining 1.5% were replaced with randomly chosen disease words
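The split above can be implemented roughly as follows; this is my own sketch (the handling of special tokens and whether unchanged tokens contribute to the loss are assumptions, not details stated in the notes).

```python
import random

def mask_diseases(tokens, disease_vocab, p_keep=0.865, p_mask=0.12):
    """Sketch of the MLM corruption: keep 86.5% of disease tokens unchanged,
    replace 12% with MASK, and the remaining 1.5% with a random disease."""
    corrupted, labels = [], []
    for tok in tokens:
        if tok in ("CLS", "SEP"):                 # never corrupt special tokens
            corrupted.append(tok)
            labels.append(None)
            continue
        r = random.random()
        if r < p_keep:
            corrupted.append(tok)                 # unchanged, no prediction target
            labels.append(None)
        elif r < p_keep + p_mask:
            corrupted.append("MASK")              # model must recover the original
            labels.append(tok)
        else:
            corrupted.append(random.choice(disease_vocab))  # random replacement
            labels.append(tok)
    return corrupted, labels
```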
Fine-tuning: 3 different disease prediction tasks
- Prediction of diseases in the next visit (T1)
- Randomly choose an index $j$ $(3 < j < n_p)$ for each patient
- Form the input as $x_p = \{v^1_p, \dots, v^j_p\}$
- Output $y_p = w_{j+1}$, where $w_{j+1}$ is a multi-hot vector indicating the diseases that occur in $v^{j+1}_p$ (see the label-construction sketch after this list)
- Each patient contributes only one input-output pair to the training and evaluation process.
- Prediction of diseases in the next 6 months (T2) & in the next 12 months (T3)
- Patients that don’t have at least 6 or 12 months’ worth of EHR are not included in the analysis
- Choose $j$ randomly from $(3, n^*)$, where $n^*$ is the highest index that still leaves 6 or 12 months of follow-up
- Output $y_p = w_{6m}$ (T2) or $y_p = w_{12m}$ (T3)
- The number of patients for each task: 699K, 391K, 342K
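For all three tasks the target is a multi-hot vector over the disease vocabulary; a small sketch of building it from the diagnoses in the prediction window (next visit for T1, next 6/12 months for T2/T3) follows, with a hypothetical disease-to-index mapping.

```python
import numpy as np

def multi_hot_label(future_diagnoses, disease_to_index):
    """Sketch: turn the diseases appearing in the prediction window
    into a multi-hot target vector y_p."""
    y = np.zeros(len(disease_to_index), dtype=np.float32)
    for d in set(future_diagnoses):
        y[disease_to_index[d]] = 1.0
    return y

# Hypothetical tiny vocabulary for illustration:
vocab = {"hypertension": 0, "type2_diabetes": 1, "angina": 2, "asthma": 3}
print(multi_hot_label(["angina", "hypertension"], vocab))   # [1. 0. 1. 0.]
```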
Results
Disease Embeddings
t-SNE Results
- The natural stratification of gender-specific diseases
- Natural clusters are formed that in most cases consist of diseases from the same chapter
- The 10 closest diseases by cosine similarity of their embeddings were found to be as similar as those provided by a clinical researcher (0.757 overlap) - Supplementary table S3
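Given the learned embedding matrix, this nearest-neighbour comparison can be reproduced with a straightforward cosine-similarity lookup; the sketch below is my own illustration of the procedure, not the authors' code.

```python
import numpy as np

def closest_diseases(embeddings, names, query, k=10):
    """Sketch: the k diseases whose embeddings have the highest cosine
    similarity to the query disease's embedding."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[names.index(query)]
    order = np.argsort(-sims)
    return [(names[i], float(sims[i])) for i in order if names[i] != query][:k]
```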
Attention and Interpretability
Found relationships among events that go beyond temporal/sequence adjacency
Analysed the attention patterns using the visualization approach of Vig
- Strong connections between rheumatoid arthritis and enthesopathies and synovial disorders → attention can go beyond recent events and find long-range dependencies among diseases
Disease Prediction
Metrics: average precision score (APS), AUROC
- APS: a weighted mean of the precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight
- Performance scores
→ Outperformed the best existing model by more than 8% (in APS) when predicting over a range of more than 300 diseases
- Comparing performance for each disease
APS and AUROC scores computed over all the $y_p$ and $y^*_p$ vectors of a disease
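Both metrics are available in scikit-learn; below is a sketch of the per-disease computation over a label matrix and a predicted-probability matrix (the array names are assumptions for illustration).

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def per_disease_scores(y_true, y_score):
    """Sketch: APS and AUROC for each disease (column), averaged over
    the diseases that have both positive and negative labels."""
    aps, auroc = [], []
    for d in range(y_true.shape[1]):
        labels = y_true[:, d]
        if len(np.unique(labels)) < 2:     # skip degenerate columns
            continue
        aps.append(average_precision_score(labels, y_score[:, d]))
        auroc.append(roc_auc_score(labels, y_score[:, d]))
    return float(np.mean(aps)), float(np.mean(auroc))
```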
Discussion
- Flexibility of BEHRT comes from employing 4 key concepts from EHR: disease, age, segment, position
- Gains insights about the underlying generative process of EHR
- Distributed/complex representations that are capable of capturing concepts of disease
- Future work: adding new concepts to the embeddings
- Disease embeddings provide great insights into how various diseases are related to each other: co-occurrence and closeness of diseases
- Could be used for future research as reliable disease vectors
- Important features of EHR for prediction
- Robust, gender-specific predictions without the inclusion of gender
- Position was important
- Age embeddings might be vital in diagnosing age-related diseases