[Paper Reading] Patient Subtyping via Time-Aware LSTM Networks

Posted by Un-Defined on 2024-03-05

Patient Subtyping via Time-Aware LSTM Networks

3.1.2 Time-Aware LSTM (T-LSTM).

T-LSTM is proposed to incorporate elapsed-time information into the standard LSTM architecture so that the temporal dynamics of sequential data with time irregularities can be captured. The proposed T-LSTM architecture is shown in Figure 2, where the input sequence is the temporal data of a patient. The elapsed time between two consecutive records of a patient can be highly irregular; for example, the time between two successive admissions/hospital visits may be weeks, months, or years. If there is a gap of years between two consecutive records, the dependency on the previous record is not strong enough to affect the current output, and therefore the contribution of the previous memory to the current state should be discounted. The main component of the T-LSTM architecture is a subspace decomposition of the memory of the previous time step. While the amount of information contained in the previous memory is being adjusted, we do not want to lose the global profile of the patient. In other words, long-term effects should not be discarded entirely, but the short-term memory should be adjusted in proportion to the time span between time steps \(t\) and \(t-1\). If the gap between times \(t\) and \(t-1\) is large, it means that no new information about the patient was recorded for a long time; therefore, the dependence on the short-term memory should not play a significant role in predicting the current output.

[Figure 2: The proposed T-LSTM architecture]

T-LSTM applies memory discounting by weighting the short-term memory content according to the elapsed time between consecutive elements. To achieve this, we propose using a non-increasing function of the elapsed time that converts the time interval into an appropriate weight. The mathematical expressions of the subspace decomposition procedure are given below. First, the short-term memory component \(\left(C_{t-1}^S\right)\) is obtained by a network. Note that this decomposition is data-driven, and the parameters of the decomposition network are learned simultaneously with the rest of the network via back-propagation. There is no specific requirement on the type of activation function of the decomposition network; we experimented with several functions and did not observe a significant difference in the prediction performance of the T-LSTM unit, although the tanh activation performed slightly better. After the short-term memory is obtained, it is adjusted by the elapsed-time weight to yield the discounted short-term memory \(\left(\hat{C}_{t-1}^S\right)\). Finally, to compose the adjusted previous memory \(\left(C_{t-1}^*\right)\), the complementary subspace of the long-term memory \(\left(C_{t-1}^T=C_{t-1}-C_{t-1}^S\right)\) is combined with the discounted short-term memory. The subspace decomposition stage of T-LSTM is followed by the standard gating architecture of the LSTM. The detailed mathematical expressions of the proposed T-LSTM architecture are given below:

\[\begin{array}{rlr} C_{t-1}^S & =\tanh \left(W_d C_{t-1}+b_d\right) & \text { (Short-term memory) } \\ \hat{C}_{t-1}^S & =C_{t-1}^S * g\left(\Delta_t\right) & \text { (Discounted short-term memory) } \\ C_{t-1}^T & =C_{t-1}-C_{t-1}^S & \text { (Long-term memory) } \\ C_{t-1}^* & =C_{t-1}^T+\hat{C}_{t-1}^S & \text { (Adjusted previous memory) } \\ f_t & =\sigma\left(W_f x_t+U_f h_{t-1}+b_f\right) & \text { (Forget gate) } \\ i_t & =\sigma\left(W_i x_t+U_i h_{t-1}+b_i\right) & \text { (Input gate) } \\ o_t & =\sigma\left(W_o x_t+U_o h_{t-1}+b_o\right) & \text { (Output gate) }\\ \tilde{C} & =\tanh \left(W_c x_t+U_c h_{t-1}+b_c\right) & \text { (Candidate memory) } \\ C_t & =f_t * C_{t-1}^*+i_t * \tilde{C} & \text { (Current memory) } \\ h_t & =o_t * \tanh \left(C_t\right) & \text { (Current hidden state) } \end{array} \]

Here, \(x_t\) denotes the current input, \(h_{t-1}\) and \(h_t\) denote the previous and current hidden states, and \(C_{t-1}\) and \(C_t\) denote the previous and current cell memories. \(\left\{W_f, U_f, b_f\right\}\), \(\left\{W_i, U_i, b_i\right\}\), \(\left\{W_o, U_o, b_o\right\}\), and \(\left\{W_c, U_c, b_c\right\}\) are the network parameters of the forget gate, input gate, output gate, and candidate memory, respectively, and \(\left\{W_d, b_d\right\}\) are the network parameters of the subspace decomposition. The dimensionalities of the parameters are determined by the dimensions of the input, the output, and the chosen hidden state. \(\Delta_t\) is the elapsed time between \(x_{t-1}\) and \(x_t\), and \(g(\cdot)\) is a heuristic decay function such that the larger the value of \(\Delta_t\), the smaller the effect of the short-term memory. Depending on how the time intervals are measured in a given application domain, different monotonically non-increasing functions can be chosen as \(g(\cdot)\). If we are dealing with time-series data such as videos, the elapsed time is typically measured in seconds. On the other hand, if the elapsed time varies from days to years, as in the healthcare domain, we need to convert the intervals between consecutive elements into a single unit such as days; in that case, the elapsed time can take large numeric values when consecutive records are years apart. As a guideline, \(g\left(\Delta_t\right)=1 / \Delta_t\) can be chosen for datasets with small elapsed times, whereas \(g\left(\Delta_t\right)=1 / \log \left(e+\Delta_t\right)\) is recommended for datasets with large elapsed times.
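
To make the update rule concrete, here is a minimal NumPy sketch of a single T-LSTM step following the equations above. The parameter dictionary layout, the `init_params` helper, and the default choice of \(g\) are illustrative conventions of this note, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two decay choices suggested in the text: 1/Δt for small gaps, 1/log(e+Δt) for large gaps.
def g_small_gap(delta_t):
    return 1.0 / delta_t

def g_large_gap(delta_t):
    return 1.0 / np.log(np.e + delta_t)

def init_params(input_dim, hidden_dim, rng=np.random.default_rng(0)):
    """Random parameters for one T-LSTM cell (shapes follow the equations above)."""
    p = {}
    for gate in ("f", "i", "o", "c"):
        p[f"W_{gate}"] = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        p[f"U_{gate}"] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        p[f"b_{gate}"] = np.zeros(hidden_dim)
    p["W_d"] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))  # subspace decomposition
    p["b_d"] = np.zeros(hidden_dim)
    return p

def tlstm_step(x_t, h_prev, c_prev, delta_t, p, g=g_large_gap):
    """One T-LSTM step following the equations in Section 3.1.2."""
    # Subspace decomposition of the previous cell memory
    cs_prev = np.tanh(p["W_d"] @ c_prev + p["b_d"])   # short-term memory
    cs_hat = cs_prev * g(delta_t)                     # discounted short-term memory
    ct_prev = c_prev - cs_prev                        # long-term memory
    c_star = ct_prev + cs_hat                         # adjusted previous memory

    # Standard LSTM gating applied to the adjusted memory
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])      # forget gate
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])      # input gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])      # output gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # candidate memory

    c_t = f_t * c_star + i_t * c_tilde                # current memory
    h_t = o_t * np.tanh(c_t)                          # current hidden state
    return h_t, c_t

# Example: one step with a 90-day gap since the previous visit
p = init_params(input_dim=5, hidden_dim=8)
h, c = tlstm_step(np.ones(5), np.zeros(8), np.zeros(8), delta_t=90.0, p=p)
```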

In the literature, studies proposing different ways to incorporate the elapsed time into the learning process can be found. For instance, in [24] the elapsed time is used to modify the forget gate. In T-LSTM, one reason for adjusting the memory cell rather than the forget gate is to avoid altering the effect of the current input on the current output: the current input runs through the forget gate, and the information coming from the input plays a role in deciding how much memory should be kept from the previous cell. As can be seen from the equation for the current hidden state, modifying the forget gate directly may eliminate the effect of the input on the current hidden state. Another important point is that the subspace decomposition allows us to selectively modify the short-term effects without losing the relevant information in the long-term memory. Section 4 compares T-LSTM against the approach of modifying the forget gate directly, referred to in this paper as the modified forget gate LSTM (MF-LSTM). Two methods from [24] are adopted for comparison. The first, denoted MF1-LSTM, multiplies the output of the forget gate by \(g\left(\Delta_t\right)\), i.e., \(f_t=g\left(\Delta_t\right) * f_t\), whereas MF2-LSTM utilizes a parametric time weight, i.e., \(f_t=\sigma\left(W_f x_t+U_f h_{t-1}+Q_f q_{\Delta_t}+b_f\right)\), where \(q_{\Delta_t}=\left(\frac{\Delta_t}{60},\left(\frac{\Delta_t}{180}\right)^2,\left(\frac{\Delta_t}{360}\right)^3\right)\) when \(\Delta_t\) is measured in days, similar to [24].
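
For comparison, here is a rough sketch of how the two baselines alter only the forget gate, following the formulas above; the parameter dictionary layout (including a \(Q_f\) of shape hidden_dim × 3) is an assumption of this note rather than code from [24], and the rest of the LSTM step is left unchanged.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mf1_forget_gate(x_t, h_prev, delta_t, p, g=lambda d: 1.0 / np.log(np.e + d)):
    """MF1-LSTM: scale the standard forget gate output by g(Δt)."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])
    return g(delta_t) * f_t

def mf2_forget_gate(x_t, h_prev, delta_t_days, p):
    """MF2-LSTM: add a parametric time feature q_Δt inside the forget gate (Δt in days)."""
    q = np.array([delta_t_days / 60.0,
                  (delta_t_days / 180.0) ** 2,
                  (delta_t_days / 360.0) ** 3])
    return sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["Q_f"] @ q + p["b_f"])
```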

Another way to handle time irregularity could be to sample new data points between two consecutive time steps so as to obtain regular intervals, and then apply an LSTM to the augmented data. However, when the elapsed time is measured in days, a very large number of new records would have to be sampled for time steps that are years apart. Second, data imputation methods may seriously affect performance: patient records contain detailed information, and it is difficult to guarantee that imputed records reflect reality. Therefore, modifying the architecture of the standard LSTM to handle time irregularity is the preferred approach.

3.2 Patient Subtyping with T-LSTM Auto-Encoder

In this paper, patient subtyping is posed as an unsupervised clustering problem, since we do not have any prior information about the groups within the patient cohort. An efficient representation that summarizes the structure of a patient's temporal records is required in order to cluster temporal and complex EHR data. Auto-encoders provide an unsupervised way to learn a mapping directly from the original data [2], and LSTM auto-encoders have been used to encode sequences such as sentences [32] in the literature. Therefore, we propose to use a T-LSTM auto-encoder to learn an effective single representation of a patient's sequential records. The T-LSTM auto-encoder has T-LSTM encoder and T-LSTM decoder units with different parameters, which are jointly learned to minimize the reconstruction error. The proposed auto-encoder can capture long- and short-term dependencies by incorporating the elapsed time into the system, and it learns a single representation from which the input sequence can be reconstructed. Therefore, the mapping learned by the T-LSTM auto-encoder preserves the temporal dynamics of the original sequence with variable time lapses.
In Figure 3, a single-layer T-LSTM auto-encoder mechanism is shown for a small sequence with three elements \(\left[X_1, X_2, X_3\right]\). The hidden state and the cell memory of the T-LSTM encoder at the end of the input sequence are used as the initial hidden state and memory content of the T-LSTM decoder. The first input element and elapsed time of the decoder are set to zero, and its first output is the reconstruction \(\left(\hat{X}_3\right)\) of the last element of the original sequence \(\left(X_3\right)\). Once the reconstruction error \(E_r\) given in Equation 1 is minimized, the T-LSTM encoder is applied to the original sequence to obtain the learned representation, which is the hidden state of the encoder at the end of the sequence.

\[E_r=\sum_{i=1}^L\left\|X_i-\hat{X}_i\right\|_2^2, \]

where \(L\) is the length of the sequence, \(X_i\) is the \(i\)th element of the input sequence, and \(\hat{X}_i\) is the \(i\)th element of the reconstructed sequence. The hidden state at the end of the sequence carries concise information about the input, such that the original sequence can be reconstructed from it. In other words, the representation learned by the encoder is a summary of the input sequence [7]. The number of layers of the auto-encoder can be increased when the input dimension is high. A single-layer auto-encoder requires a larger number of iterations to minimize the reconstruction error when the learned representation has a lower dimensionality than the original input; furthermore, learning a mapping to a low-dimensional space requires more complexity in order to capture the details of the high-dimensional input sequence. For these reasons, a two-layer T-LSTM auto-encoder, where the output of the first layer is the input of the second layer, is used in our experiments.
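
As a rough illustration of the mechanism described above, the following is a single-layer sketch of the T-LSTM auto-encoder pass and the reconstruction error of Equation 1. It assumes the `tlstm_step` and `init_params` helpers from the sketch in Section 3.1.2; the output projection `W_out`/`b_out` and the teacher-forced, reverse-order decoder inputs are one plausible reading of Figure 3 rather than the authors' exact implementation, and the paper stacks two such layers.

```python
import numpy as np

def tlstm_autoencode(X, deltas, enc_p, dec_p, hidden_dim):
    """Single-layer T-LSTM auto-encoder pass.

    X      : (L, input_dim) array, one visit per row
    deltas : deltas[i] is the elapsed time between X[i-1] and X[i]; deltas[0] = 0
    enc_p  : encoder cell parameters, e.g. init_params(input_dim, hidden_dim)
    dec_p  : decoder cell parameters plus an output projection
             W_out of shape (input_dim, hidden_dim) and b_out of shape (input_dim,)
    """
    L, input_dim = X.shape
    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)

    # Encoder: run the T-LSTM over the sequence with its elapsed times.
    for i in range(L):
        h, c = tlstm_step(X[i], h, c, deltas[i], enc_p)
    representation = h  # final hidden state = learned patient representation

    # Decoder: initialised with the encoder's final hidden state and cell memory.
    # Its first input and elapsed time are zero; the sequence is reconstructed in
    # reverse order (X_L first), feeding the target element back as the next input.
    x_in, d_in = np.zeros(input_dim), 0.0
    recon_reversed = []
    for i in range(L):
        h, c = tlstm_step(x_in, h, c, d_in, dec_p)
        recon_reversed.append(dec_p["W_out"] @ h + dec_p["b_out"])
        x_in, d_in = X[L - 1 - i], deltas[L - 1 - i]
    recon = np.array(recon_reversed[::-1])  # back to original order

    # Reconstruction error E_r from Equation 1
    error = float(np.sum((X - recon) ** 2))
    return representation, recon, error
```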

Given a single representation of each patient, patients are grouped by the \(k\)-means clustering algorithm. Since we do not make any assumption about the structure of the clusters, the simplest clustering algorithm, \(k\)-means, is preferred. In Figure 3, a small illustration of clustering a cohort of 8 patients is shown, where the learned representations are denoted by \(R\). If \(R\) can capture the distinctive structure of a patient's sequence, then the clustering algorithm can group patients with similar features (diagnoses, lab results, medications, conditions, and so on) together. Thus, each patient group has a subtype, which is a collection of the common medical features present in that cluster. Given a new patient, the learned T-LSTM encoder is used to compute the patient's representation, and the patient is assigned the subtype of the cluster whose centroid is closest to this representation. As a result, the T-LSTM auto-encoder learns a powerful single representation of temporal patient data that can readily be used to obtain the subtypes in a patient population.
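
The clustering and assignment step can be sketched with scikit-learn's k-means as below; the representations here are random stand-ins for the encoder outputs, and the number of clusters is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
R = rng.normal(size=(8, 16))        # stand-in for the 8 learned patient representations

# Group patients into subtypes with k-means on the learned representations.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(R)
subtype_labels = kmeans.labels_     # one subtype label per patient

# New patient: encode their sequence with the learned T-LSTM encoder, then assign
# the subtype of the nearest cluster centroid.
r_new = rng.normal(size=(1, 16))    # stand-in for the new patient's representation
nearest = int(np.argmin(np.linalg.norm(kmeans.cluster_centers_ - r_new, axis=1)))
# equivalently: kmeans.predict(r_new)[0]
```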
