[Paper] Amyloid-β prediction machine learning model using source-based morphometry across neurocognitive disorders (2024)

18 Apr 2024 #bio #brainImaging #demensia #atn #amyloid

Momota, Yuki, et al. “Amyloid-β prediction machine learning model using source-based morphometry across neurocognitive disorders.” Scientific Reports 14.1 (2024): 7633.

Paper Link

Points

Objective

The study aims to predict Alzheimer’s disease (AD) using a machine learning model based on MRI from a diverse group of patients.
Source-based morphometry (SBM) is used to estimate the degree of amyloid-beta (A$\beta$) deposition.

Methodology

3D T1-weighted images (T1 WI) were pre-processed into voxel-based gray matter (GM) images and then passed to SBM.
A support vector machine (SVM) was used as the classifier.
SHapley Additive exPlanations (SHAP) was used for model interpretability.

Results

The final model, which used MR images, cognitive test results, and apolipoprotein E (APOE) as input features, achieved an accuracy of 89.8%.
The model based on MR images alone achieved 84.7%.

Background

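AD is a neurodegenerative disease characterized by A$\beta$ plaques, neurofibrillary tangles, and brain atrophy.
A$\beta$ is one of the defining features of AD, but it is difficult to detect in routine clinical practice.
- Methods such as positron emission tomography (PET), cerebrospinal fluid (CSF) testing, and blood biomarkers have not yet been adopted in routine practice.
MRI-based A$\beta$ prediction could serve as a useful tool ahead of a definitive diagnosis by the methods above.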

Method

supfig1

Features

Participants and clinical measurements

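Participants were recruited at the memory clinic of Keio University Hospital between June 2018 and August 2021.
Diagnoses: AD, mild cognitive impairment (MCI), healthy controls (HC)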

Cognitive assessment (9 measures)

Global cognition: Mini-Mental State Examination (MMSE), Clinical Dementia Rating (CDR), Functional Activities Questionnaire (FAQ)
Memory: Wechsler Memory Scale-Revised (WMS-R) Logical Memory immediate recall (LM I) and delayed recall (LM II)
Executive function and attention: Word Fluency, Trail Making Test (TMT)
Specific cognitive abilities: Japanese version of the Alzheimer’s Disease Assessment Scale-Cognitive subscale (ADAS-cog-J), Japanese Adult Reading Test (JART)

APOE genotyping

Magnetic nanoparticle DNA extraction kit (EX1 DNA Blood 200 $\mu$L Kit)
real-time polymerase chain reaction (PCR)

[¹⁸F] Florbetaben (FBB) amyloid-PET imaging

[¹⁸F] Florbetaben (FBB)

Florbetaben is a diagnostic radiotracer developed for routine clinical use to visualize amyloid-beta plaques. [reference]

MRI

Acquisition - 3D T1-weighted MR images (T1 WI)

MRI scanner: Discovery MR750 3.0 T scanner (GE Healthcare)
Coil: 32-channel head coil
Imaging parameters: field of view (FOV) 230 mm, matrix size 256$\times$256, slice thickness 1.0 mm, voxel size 0.9$\times$0.9$\times$1.0 mm

Pre-processing

Segmentation: MR images are segmented by tissue type (GM, white matter (WM), CSF) using the CAT12 toolbox for Statistical Parametric Mapping.
Normalization: the segmented GM images are normalized to the Montreal Neurological Institute (MNI) template.
- Montreal Neurological Institute (MNI) template: a standard brain template commonly used in neuroimaging research.
  
  Standard anatomical templates are widely used in human neuroimaging processing pipelines to facilitate group level analyses and comparisons across different subjects and populations. The MNI-ICBM152 template is the most commonly used standard template, representing an average of 152 healthy young adult brains. [reference]
Resampling and smoothing: images are resampled to an isotropic voxel size of 2$\times$2$\times$2 mm³ and then smoothed with a 5 mm full-width-at-half-maximum Gaussian kernel.
- This standardizes the image size and helps reduce noise in the images.
Source-based morphometry (SBM): incorporates independent component analysis (ICA) to decompose the anatomical brain images into independent spatial maps.

In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that at most one subcomponent is Gaussian and that the subcomponents are statistically independent from each other. [reference]
ICA processing
- Each 3D GM image (91$\times$109$\times$91 voxels) is flattened into a 1D array (1$\times$902,629).
- A brain mask selects the voxels that enter ICA, which is run with scikit-learn's FastICA.
- The number of extracted independent components (ICs) is treated as a hyperparameter during modeling.
Spatial regression: the extracted ICs serve as spatial regressors for each GM image, and the weighting coefficient $\beta$ determines how much each IC contributes to that GM image.
\[I_{GM}=\beta_1 IC_1 + \beta_2 IC_2 + ... + \beta_K IC_K\]
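
The spatial regression above can be sketched with ordinary least squares. This is only an illustration under stated assumptions, not the paper's pipeline: the ICs and loadings are random stand-ins, the voxel count is shrunk to a toy size, and names such as `ics` and `true_beta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumption); the post quotes 902,629 voxels and 7 ICs.
n_voxels, n_components = 500, 7

ics = rng.normal(size=(n_components, n_voxels))  # rows stand in for IC_1 ... IC_K
true_beta = rng.normal(size=n_components)        # hypothetical loadings of one subject
gm_image = true_beta @ ics                       # I_GM = sum_k beta_k * IC_k

# Spatial regression: find beta so that ics.T @ beta is closest to gm_image.
beta, *_ = np.linalg.lstsq(ics.T, gm_image, rcond=None)

# Length of one flattened GM image as quoted in the text.
flat_len = 91 * 109 * 91
```

The recovered `beta` values play the role of the per-subject features that are later fed to the classifier.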

Machine learning

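Input features: ICA $\beta$ values, demographic characteristics (age and sex), cognitive assessments, APOE genotype
Input conditions: various combinations of input features were used for model training and testing.
1. All input features
2. Different combinations of features: brain images only, brain images + cognitive assessments, etc.
3. Different combinations of data by diagnosis: AD+HC, AD+MCI+HC, etc.
Model: Gaussian support vector machine (SVM)
- Trained with 5-fold cross-validation
- Tested on every fold
Interpretability: SHapley Additive exPlanations (SHAP)
- SHAP values, derived from game theory, indicate how much each feature contributes to the model's prediction.
- The larger the absolute SHAP value of a feature, the stronger its influence on the prediction.
- The sign of a clinical feature's SHAP value indicates whether it pushes the prediction toward A$\beta$ positivity or negativity.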
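
To make the game-theoretic idea concrete, here is a minimal sketch that computes exact Shapley values for a tiny, hypothetical linear score. It is not the SHAP library's API and not the paper's model; `weights`, `x`, and `baseline` are made-up values, and "absent" features are simply set to the baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model: a linear score over three features.
weights = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0]


def value_fn(present):
    # Features outside the coalition fall back to the baseline value.
    return sum(w * (x[j] if j in present else baseline[j])
               for j, w in enumerate(weights))


phi = shapley_values(value_fn, 3)
```

For a linear score each value reduces to weight times (feature minus baseline), so the sign and magnitude of `phi[i]` show the direction and strength of that feature's push on the prediction.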

Statistical analysis

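Relationships between variables were explored to check whether they were associated with diagnosis and consistent with existing hypotheses from Alzheimer’s disease research.

Two-tailed t-test / Chi-square test
- Two-tailed t-test: compares the means of two groups to determine whether they differ significantly.
- Chi-square test: tests the independence of categorical variables.
Relationships between features: Pearson’s correlation analysis for continuous variables
- Measures the strength and direction of the linear relationship between pairs of continuous variables.
- Helps in understanding how the variables relate to one another.
Association with diagnosis: analysis of variance (ANOVA)
- Analyzes differences in group means within a sample.
- Determines whether group means differ significantly, so it is especially useful when there are more than two groups to compare.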
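
As a small worked example of the correlation analysis, the sketch below computes Pearson's r from its definition. The numbers are invented for illustration (hypothetical IC loadings and MMSE scores), not data from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in xs))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ys))
    return cov / (spread_x * spread_y)


# Hypothetical values: higher IC loading paired with lower MMSE score.
ic_loading = [0.1, 0.4, 0.35, 0.8, 0.9]
mmse = [29, 27, 26, 22, 20]

r = pearson_r(ic_loading, mmse)
```

A value of `r` close to -1 means the two variables move in opposite directions along a nearly straight line; a value near 0 would mean no linear relationship.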

Results

A total of 118 samples were used to build the final model.

table1

Model performance

table2 fig1 table3

A$\beta$ positivity prediction

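Final model: the model using brain images + cognitive scores + APOE as input features
The final model achieved the best performance (accuracy 89.8%, AUC 0.888).
The model using brain images alone recorded the lowest performance (accuracy 84.7%, AUC 0.830).

Results of testing A$\beta$ positivity prediction with the final model on data from each diagnosis

Using data from all diagnoses gave the best performance (accuracy 89.8%).
Testing on MCI data alone gave the lowest performance (accuracy 75.9%).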

SBM

table4 addfig2 fig2

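Seven ICs were extracted in the final SBM model.

Each component represents a spatially maximally independent pattern of GM volume.
IC 1 showed significant correlations with cognitive test results and A$\beta$ positivity.
Among the diagnoses, only AD was significantly associated with IC 1; no other diagnosis was associated with any IC.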

Discussion

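The proposed model successfully predicted A$\beta$ positivity (accuracy 89.8%, AUC 0.888).

It achieved good results with only 118 samples composed of several feature types.
It also correctly classified non-Alzheimer’s disease (non-AD) cases, such as FTLD syndromes and other psychiatric disorders.
Among the covariates of the final model, IC 1 had a strong influence on A$\beta$ positivity prediction.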

Performance

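Heterogeneity of features in non-AD cases
- The model trained only on AD cases performed slightly worse (88.4%) than the model trained on all cases.
Advantages of SBM
- A model built on a diverse clinical population should be better suited to real clinical settings, since patients who come to the clinic present with a range of cognitive disorders besides AD.
- The model trained on brain images alone (accuracy 84.7%) could help screen potential participants for AD-related clinical trials.
- SBM does not depend on a predefined atlas and can detect subtle morphological changes and previously unknown patterns in brain structure associated with neurodegenerative disease.
Reasonable prediction performance for MCI patients
- Clinicians diagnose AD with about 70% accuracy, and the model exceeded this (75.9%) using MCI data alone.
- This is also comparable to the accuracy of other MRI-based models on MCI cases.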

Feature Importance of the model - SHAP

fig3 supfig3

All ICs turned out to be more important to the model's prediction than demographics and cognitive measures such as MMSE. The three most important features were IC 1, LM I, and LM II.

IC 1: showed significant correlations with A$\beta$ positivity and cognitive test results.
- The spatial pattern of IC 1 resembled the cortical pattern of AD neurodegeneration (ND) observed in the parietal lobe.
- Atrophy of the medial temporal lobe (MTL), a typical feature of AD, was not observed in any IC. This may be because MTL atrophy points to tau pathology rather than A$\beta$ pathology.
LM scores: reflect memory impairment, a core symptom of AD.
The presence of APOE-$\epsilon$4 also emerged as an important factor.

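In addition, IC 1 was clearly associated with A$\beta$ positivity, and IC 4 with age.

This indicates that the model can distinguish AD-related ND from normal aging in brain imaging.
In other words, the pathological process of AD may not be strictly tied to age: the pattern of brain damage seen in normal aging can be distinguished from that of neurodegenerative disease.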

Limitation

A$\beta$ positivity determined by PET alone: in the preclinical stage, CSF A$\beta$ may give a more accurate determination.
Small sample size: this may affect the accuracy of the model.
Cross-sectional design: longitudinal follow-up data might further improve model performance.

[Paper] Tabtransformer: Tabular data modeling using contextual embeddings (2020)

11 Apr 2024 #tabular #transformer

Huang, Xin, et al. “Tabtransformer: Tabular data modeling using contextual embeddings.” arXiv preprint arXiv:2012.06678 (2020).

Paper Link

Points

TabTransformer: a novel tabular data model that uses contextual embeddings
A two-phase pre-training procedure extracts high-quality feature representations.
It achieves SOTA in both supervised and semi-supervised learning.
Its performance is stable on data with missing values and on noisy data.

Background

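Existing models for tabular data are mainly tree-based ensembles, with gradient boosted decision trees (GBDT) as the representative example. However, these models have several limitations compared with deep learning models:

They are not suited to continual learning from streaming data.
They are not effective for end-to-end learning of multi-modal tabular data that includes images or text.
They are not suited to semi-supervised learning.

Multi-layer perceptrons (MLPs), on the other hand, allow image and text encoders to be trained end-to-end, but they have drawbacks of their own:

They are hard to interpret.
They are vulnerable to missing and noisy data.
Their performance is limited in semi-supervised settings.
They perform worse than tree-based models.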

Method

archi

The Transformer layers take only the categorical inputs.
The continuous inputs are concatenated with the output of the Transformer.
During pre-training, the Transformer layers are trained on two tasks using unlabeled data.
- Pre-training uses only the categorical inputs and excludes the continuous inputs.
The pre-trained model is fine-tuned on labeled data, together with the MLP head, for the target prediction task.
During fine-tuning, the continuous values are concatenated with the categorical values.

Model Architecture

fig1

Each input $x\equiv \lbrace x_{cat}, x_{cont}\rbrace$ has a corresponding label $y$: $(x, y)$.
$x_{cat} \equiv \lbrace x_1, x_2, …, x_m\rbrace$ denotes the categorical features, where each $x_i$ ($i \in \lbrace 1, …, m\rbrace$) is a categorical value.
$x_{cat}$ is embedded by $E_\phi$ (column embedding):
\[E_\phi(x_{cat}) \equiv \lbrace e_{\phi_1}(x_1), ..., e_{\phi_m}(x_m) \rbrace, \ e_{\phi_i}(x_i) \in \mathbb{R}^d\]
The embeddings pass through several Transformer layers (contextual embedding):
\[\{h_1, ..., h_m\}=f_\theta(E_\phi(x_{cat})), \ h_i\in \mathbb{R}^d\]
The contextual embeddings of $x_{cat}$ are concatenated with $x_{cont} \in \mathbb{R}^c$, giving a vector of dimension $(d\times m+c)$.
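
The dimension count above can be checked with a short sketch. It only mimics the final concatenation step: `contextual` is a random stand-in for the Transformer output $\lbrace h_1, ..., h_m \rbrace$, and the sizes `m`, `d`, and `c` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: m categorical columns, embedding dimension d, c continuous features.
m, d, c = 4, 32, 3

contextual = rng.normal(size=(m, d))  # stand-in for {h_1, ..., h_m}
x_cont = rng.normal(size=c)           # continuous features bypass the Transformer

# Flatten the m contextual embeddings and append the continuous features.
mlp_input = np.concatenate([contextual.reshape(-1), x_cont])
```

With these sizes the MLP input has 4 x 32 + 3 = 131 entries, matching $(d\times m+c)$.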

Column Embedding

colemb

Each categorical feature $x_i$ has its own embedding lookup table $e_{\phi_i}(.)$.
For the $i$th feature with $d_i$ classes, the embedding table $e_{\phi_i}(.)$ consists of $(d_i+1)$ embeddings, where the $(d_i+1)$th embedding is added to represent a missing (masked) value.
Each embedding $e_{\phi_i}(j)$ is expressed as $[c_{\phi_i}, w_{\phi_{ij}}]$, where
- $c_{\phi_i}$ distinguishes the classes in column $i$ from the classes in other columns.
- $w_{\phi_{ij}}$ distinguishes class $j$ of the feature in column $i$ from the other classes in the same column.
*Judging from the code, the dimension $d$ appears to be the same as the hidden size $h$.
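
A minimal sketch of one column's lookup table may help here. It is not the authors' implementation: the class name `ColumnEmbedding`, the random initialization, and the split size `shared_dim` (how many of the $d$ entries belong to the column-wide part $c_{\phi_i}$) are all assumptions made for illustration.

```python
import numpy as np


class ColumnEmbedding:
    """Lookup table for one categorical column: d_i classes plus one 'missing' slot."""

    def __init__(self, n_classes, d, shared_dim, rng):
        self.missing_index = n_classes                  # the (d_i + 1)-th row
        self.c = rng.normal(size=shared_dim)            # c_{phi_i}: shared by the column
        self.w = rng.normal(size=(n_classes + 1, d - shared_dim))  # w_{phi_ij}: per class

    def __call__(self, j=None):
        # A missing (or masked) value maps to the extra last row.
        j = self.missing_index if j is None else j
        return np.concatenate([self.c, self.w[j]])


rng = np.random.default_rng(0)
emb = ColumnEmbedding(n_classes=5, d=32, shared_dim=4, rng=rng)
e_known = emb(2)       # embedding of class 2
e_missing = emb(None)  # embedding used for a missing value
```

Both vectors share the same leading `shared_dim` entries, which identify the column, and differ in the remaining entries, which identify the class within it.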

Pre-training

The Transformer layers are pre-trained on two tasks, using inputs made up of the categorical values $x_{cat}=\lbrace x_1, x_2, …, x_m\rbrace$.

Masked language modeling (MLM)
- $k\%$ of the input features are masked at random. $k$ was set to 30 in the experiments.
- The model is trained to minimize the cross-entropy loss of a multi-class classifier that predicts the values of the masked features.
Replaced token detection (RTD)
- Some of the input features are replaced with randomly generated values.
- The model is trained to minimize the loss of a binary classifier that predicts whether each feature has been replaced.
- Because each column has its own embedding lookup table, a separate binary classifier is defined for each column.
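
The two corruption schemes can be sketched as follows. This covers only the input corruption and the training targets, not the classifiers themselves. The `MASK` token, the helper names, and the choice to draw replacement values from each column's own value list are assumptions for illustration.

```python
import random

MASK = "<MASK>"  # hypothetical placeholder for a masked feature


def mask_features(row, k_percent, rng):
    """MLM-style corruption: mask k% of the features, return the masked positions."""
    n_mask = max(1, round(len(row) * k_percent / 100))
    positions = sorted(rng.sample(range(len(row)), n_mask))
    corrupted = [MASK if i in positions else v for i, v in enumerate(row)]
    return corrupted, positions


def replace_features(row, column_values, k_percent, rng):
    """RTD-style corruption: swap k% of the features for a random column value.

    Returns the corrupted row and one binary label per column
    (1 = the value differs from the original, 0 = unchanged).
    """
    n_rep = max(1, round(len(row) * k_percent / 100))
    positions = set(rng.sample(range(len(row)), n_rep))
    corrupted = [rng.choice(column_values[i]) if i in positions else v
                 for i, v in enumerate(row)]
    labels = [int(c != v) for c, v in zip(corrupted, row)]
    return corrupted, labels


rng = random.Random(0)
row = ["red", "S", "KR", "yes", "A", "x", "low", "w", "p", "q"]
column_values = [[v, v + "_alt"] for v in row]  # made-up value lists per column

masked, positions = mask_features(row, 30, rng)
replaced, labels = replace_features(row, column_values, 30, rng)
```

With ten features and $k=30$, three positions are masked; in the RTD case a random draw can coincide with the original value, in which case the label stays 0.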

Experiments

Settings

Data

All models were evaluated on 15 public binary classification datasets, drawn from the UCI repository, the AutoML Challenge, and Kaggle.
Each dataset was split into five folds for cross-validation.
Training : validation : testing ratio = 65:15:20 (%)
The number of categorical features ranges from 2 to 136 depending on the dataset.
Semi-supervised and supervised experiments
- Semi-supervised: the training data consisted of $p$ labeled data points and unlabeled data. $p$ was set to one of $(50, 200, 500)$ depending on the experimental setting.
- Supervised: all data are labeled.

Setup

Hidden dimension: 32
Number of Transformer layers: 6
Number of attention heads: 8
MLP layer sizes: $\lbrace 4\times l, \ 2\times l \rbrace$, where $l$ is the size of its input.
Hyperparameter optimization (HPO) was run for 20 rounds on each cross-validation split.
Pre-training was applied only in the semi-supervised setting.
- When all data were labeled, pre-training made little noticeable difference.
- The benefit of pre-training was clearer when unlabeled data were plentiful and labeled data were scarce: pre-training appears to let the model form representations that it could not learn from the labeled data alone.

Baseline model: MLP

TabTransformer with the Transformer layers removed
It was chosen as the baseline to assess the effect of the Transformer layers.

The effectiveness of the Transformer Layers

Performance comparison

TabTransformer and the MLP were compared in the supervised learning setting.
- TabTransformer outperformed the MLP on 14 datasets, with an average gain of about 1.0% in AUC.
t-SNE visualization of contextual embeddings
- Each point is the average of the 2D coordinates of the test data points belonging to a given class.
- In the t-SNE plot of the last Transformer layer (left), semantically similar classes cluster close together in the embedding space.
- Even before the Transformer layers (center), embeddings of features with different characteristics begin to separate.
- The MLP embeddings (right) show no clear pattern.
Prediction performance of linear models using the embeddings from different Transformer layers

Logistic regression models were used to assess the quality of the learned embeddings.
- Each model predicts $y$ from the embeddings and the continuous values.
- Metrics: cross-validation score in AUC on the test data
- Normalization: each prediction score is normalized by the best score obtained when TabTransformer is trained on the same dataset.
- Features: the embeddings are averaged and then max-pooled instead of being concatenated.
- Findings: embeddings from deeper Transformer layers appear to be more effective.

The robustness of TabTransformer

TabTransformer's robustness was evaluated on noisy data and on data with missing values.

fig4_5

Noisy data
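- Method: to add noise, selected values are replaced with random values drawn from those present in the same column. The already-trained model is then evaluated on this data.
- Findings: the noisier the data, the more clearly TabTransformer outperforms the MLP (fig. 4).
- The contextual nature of TabTransformer's embeddings is thought to account for this advantage on noisy data.
Data with missing values
- Method: some values are deliberately removed from the data, and the already-trained model is then evaluated.
  - A missing value is handled by using the average of the learned embeddings over all classes in the corresponding column.
- Findings: TabTransformer was also more stable than the MLP on data with missing values (fig. 5).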
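
Both perturbations can be sketched in a few lines. This is an illustration under assumptions, not the paper's evaluation code: the helper names, the toy column, and the tiny 2-dimensional embedding table are made up.

```python
import numpy as np


def inject_noise(column, noise_rate, rng):
    """Replace a fraction of the values with values drawn from the same column."""
    column = np.asarray(column)
    noisy = column.copy()
    n_noisy = int(round(noise_rate * len(column)))
    idx = rng.choice(len(column), size=n_noisy, replace=False)
    noisy[idx] = rng.choice(column, size=n_noisy)
    return noisy, idx


def missing_value_embedding(class_embeddings):
    """Stand-in embedding for a missing value: mean over the column's class embeddings."""
    return np.asarray(class_embeddings).mean(axis=0)


rng = np.random.default_rng(0)
column = np.array(["a", "b", "c", "a", "b", "c", "a", "b"])
noisy, idx = inject_noise(column, 0.25, rng)

# Hypothetical learned embeddings for the three classes of one column.
class_embeddings = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
missing = missing_value_embedding(class_embeddings)
```

Because replacements are drawn from the column itself, a noisy entry can happen to keep its original value; only the positions in `idx` are ever touched.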

Supervised learning

In the supervised learning setting, TabTransformer was compared with four categories of models:

Logistic Regression and GBDT
MLP and sparse MLP
TabNet model
Variational Information Bottleneck (VIB) model

table2

Findings:

TabTransformer performed on par with GBDT, the strongest of the compared models.
It clearly outperformed recent deep learning models designed for tabular data, such as TabNet and VIB.

Semi-supervised learning

In the semi-supervised learning setting, TabTransformer was compared with the following models:

Entropy Regularization (ER)
Pseudo Labeling (PL) combined with MLP, TabTransformer, and GBDT
MLP (DAE): An unsupervised pre-training method designed for deep models on tabular data, specifically the swap noise Denoising AutoEncoder

table3_4

Method:

The pre-trained models (TabTransformer-RTD/MLM and MLP (DAE)) were pre-trained on the unlabeled data and then fine-tuned on the labeled data.
The semi-supervised learning methods (ER and PL) were trained on the labeled and unlabeled data together.

Findings:

The two models TabTransformer-RTD and TabTransformer-MLM outperformed the other models.
TabTransformer (ER), TabTransformer (PL), and GBDT (PL) performed worse than the average of the other models.
TabTransformer-RTD performed better as the amount of unlabeled data decreased, overtaking TabTransformer-MLM.
- This is attributed to RTD's binary classification being an easier pre-training task than the multi-class classification of MLM.
With 50 labeled data points, MLP (ER) and MLP (PL) outperformed the TabTransformer models.
- This is attributed to the fact that, when training on unlabeled data, the TabTransformer models mainly learn to extract useful embeddings and do not update the weights of the classifier itself.
Overall, the TabTransformer models excel at extracting useful information from unlabeled data, so they should be useful in supervised settings and especially when unlabeled data are plentiful.

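When markdown math syntax is not applied on Github.io

15 Mar 2024

I wrote math in a Github blog post, but the markdown math syntax was not rendered. I am posting the fix here for the record.

1. Edit the _config.yml file

Check the markdown processing settings and edit them, or add them if they are missing. The markdown engine reportedly needs to be set to kramdown.

2. Write an HTML file for the math syntax in the _includes folder

A github blog normally has an _includes folder. In that folder, write a script that lets math syntax be applied to posts. The HTML file should contain the content below.

The inlineMath and displayMath items set the delimiters for each math style. As with displayMath in the example above, several delimiters can be given in a list. With the example above, wrapping a formula in $$ or placing it between \\[ \\] renders it in display style.

*When I set the syntax to \[ \] instead of \\[ \\] and applied it to posts, ordinary text in [ ] brackets was sometimes treated as math as well.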

Inline and Display style

There are two ways to write math: inline style and display style.

Inline style: the formula is written within a sentence, without a line break
Display style: the formula is rendered as a separate block
```
$2$ plus $3$ is $5$: $$2+3=5$$
```
$2$ plus $3$ is $5$: \[2+3=5\]

3. Apply the HTML script from step 2 to posts

To apply the script written above when posting, edit the layout HTML file. Among the HTML files in the _layouts folder, find the appropriate one and include the content of the newly written HTML file in the part that holds the post content. I found the part of ‘default.html’ where the content is inserted and edited it, as in the example below.

The body of the post is rendered at the position of { content } inside the "content" block. include file.html pulls in the content of ‘file.html’. The code therefore applies the math settings written in ‘math.html’ within that block.

Changing the code above as follows lets you choose per post whether math syntax is applied:

This code applies the content of ‘math.html’ when page.use_math is true. Here, page refers to each post. To set page.use_math, add use_math: true to the Front Matter of each post.

For posts that do not need math, or where you do not want it applied, leave use_math out or set it to false.

Reference

https://junia3.github.io/blog/markdown
https://an-seunghwan.github.io/github.io/mathjax-error/

When mathematical expression syntax isn't applying on GitHub Pages

15 Mar 2024

I wrote a math expression in a GitHub blog post, but the markdown math syntax was not being applied. I’m posting this to document the solution I used.

1. Modify the _config.yml file

Check the markdown-related settings in the _config.yml file and modify them as shown below, or add them if they don’t exist. Setting the markdown engine to kramdown is recommended.

2. Write an HTML file for the math expression syntax within the _includes folder

Generally, GitHub blogs contain an _includes folder. Write a script within this folder so that math expression syntax can be applied to posts. Assume we create an HTML file named ‘math.html’.

You can set the delimiters for inlineMath and displayMath. As with the displayMath item in the code above, you can specify multiple delimiters in the list. Following the example, a formula wrapped in $$ or in \\[ and \\] is rendered in display style.

*When the syntax is set to \[ and \] instead of \\[ \\], ordinary text enclosed in square brackets may also be treated as a math expression.

Inline and Display style

The inline style and the display style are two styles of math expression.

Inline style: Representing math expression within a sentence without line breaks
Display style: Generating math expression as blocks for representation
```
$2$ plus $3$ is $5$: $$2+3=5$$
```
$2$ plus $3$ is $5$: \[2+3=5\]

3. Apply the HTML script created in 2. to the post

To apply the script created above to an actual post, you’ll need to modify the HTML file related to the layout. Find an appropriate file in the _layouts folder and include the content of the HTML file in the section where the post’s content is inserted. For example, I found and modified the ‘default.html’ file like the example below:

{ content } displays the main body of the post. include file.html means it includes the content of ‘file.html’. Therefore, this block applies the math syntax settings written in ‘math.html’.

You can modify the code as follows to choose per post whether the math syntax is applied:

When page.use_math is true, the content of ‘math.html’ is applied. Here, page refers to each post. To set page.use_math, simply add use_math: true to the Front Matter of each post.

For posts where math expressions are not needed or you prefer not to apply them, simply omit the use_math tag or set it to false.

Reference

https://junia3.github.io/blog/markdown
https://an-seunghwan.github.io/github.io/mathjax-error/

Docker command summary

20 Feb 2024 #docker

Image

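Search for images

docker search [OPTIONS] IMAGE_NAME

--automated=false show only automated builds
--no-trunc=false show full, untruncated output
-s=n --stars=n show only images with at least n stars (newer Docker versions use --filter stars=n instead)
e.g.,
```
docker search --filter stars=100 mysql
```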

Download an image

docker pull [OPTIONS] IMAGE_NAME[:TAG_NAME]

-a download all tagged versions of the image
TAG_NAME the version to download; if omitted, the latest tag is downloaded
e.g.,
```
docker pull ubuntu:22.04
```

List images

docker images [OPTIONS] [REPOSITORY]

-a --all show all images
--digests show digests
-q --quiet show image IDs only
*To list containers: docker ps

Inspect image details

docker image inspect IMAGE_ID

A prefix of the ID is enough; the full ID is not required.

Remove an image

docker rmi [OPTIONS] IMAGE_NAME:TAG_NAME

-f force removal of an image that a running container is using; in practice the image is only untagged, and neither the image nor the container is actually deleted.

Remove several images at once:

docker rmi IMAGE_NAME_1 IMAGE_NAME_2 ...

Stop and remove all containers created from an image, then remove the image:
```
docker rm -f $(docker ps -aq --filter ancestor=IMAGE_NAME)
docker rmi IMAGE_NAME
```

Save and load images

docker save -o FILE IMAGE_NAME

-o (output) path of the tar archive file to write the image to

docker load -i FILE

-i (input) path of the tar archive file to read; the images it contains are loaded.

Tag an image

docker tag IMAGE_NAME:TAG NEW_NAME:NEW_TAG

Creates a new name and tag that refer to an existing image.
e.g.,
```
docker tag ubuntu:22.04 abcd:0.1
```

Container

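List containers

docker ps [OPTIONS]

Lists running containers.
-a show all containers, including stopped ones

Inspect container details

docker inspect CONTAINER_NAME

Run a container from an image

docker run [OPTIONS] IMAGE_NAME

--name CONTAINER_NAME set the container name
--rm remove the container after the run command finishes; for one-off containers
-it keep passing terminal input to the container
- -i keep stdin open even when not attached to the container
- -t allocate a pseudo-TTY (TTY mode) so that commands can be typed in a shell
-d run in the background; the ID of the started container is printed.
-e add an environment variable; repeat the option once for each variable.
- e.g.,
```
docker run -e APP_ENV=production -e APP2_ENV=dev ubuntu:22.04 env
```
-p HOST_PORT:CONTAINER_PORT bind a container port to a host port; typically used to expose a web server's port externally.
-w DIR change the working directory
-v HOST_DIR:CONTAINER_DIR mount a host directory into the container
- e.g., (a source without a leading / is treated as a named volume)
```
docker run -v volume:/data ubuntu:22.04
```
- Mount the current working directory into the container
```
docker run -v `pwd`:/opt ubuntu:22.04
```
-u USER_ID run the container as a specific user ID; the account must have been added when the image was built.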

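Run a command in a running container

docker exec [OPTIONS] CONTAINER_ID_OR_NAME CMD

-it run an interactive shell inside the container

Difference between `run` and `exec`

run: starts a container from an image
exec: runs a command in a container that is already running

Stop a container

docker stop CONTAINER_ID/NAME

Stops a running container (graceful shutdown)

docker kill CONTAINER_ID/NAME

Forcibly terminates a running container

Restart a container

docker start CONTAINER_ID/NAME

Starts a stopped container again

docker restart CONTAINER_ID/NAME

Stops the container and starts it again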
