Medical AI paper walkthrough | Circulation | 2018 | Fully automated echocardiogram interpretation in clinical practice

Posted by 忽逢桃林 on 2020-11-19

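This article comes from the WeChat public account 機器學習煉丹術. The account owner, 煉丹兄, is on WeChat at cyx645016617; questions and discussion about the article are welcome.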


0 The paper

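The paper, "Fully Automated Echocardiogram Interpretation in Clinical Practice", was published in the medical journal Circulation in 2018. This post reviews what I learned from it and can be read as a reading guide. Algorithmically the paper is not difficult: the classification model is a VGG, the segmentation model is a U-Net, and the loss function and image processing are conventional, so it is essentially an established approach applied to a medical problem. The post mixes three kinds of content: the original English text of the paper, a translation of each excerpt (originally produced with Baidu Translate and set in Songti), and my own understanding and condensed notes (originally set in bold).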

1 Overview

Using 14 035 echocardiograms spanning a 10-year period, we trained and evaluated convolutional neural network models for multiple tasks, including automated identification of 23 viewpoints and segmentation of cardiac chambers across 5 common views. The segmentation output was used to quantify chamber volumes and left ventricular mass, determine ejection fraction, and facilitate automated determination of longitudinal strain through speckle tracking. Results were evaluated through comparison to manual segmentation and measurements from 8666 echocardiograms obtained during the routine clinical workflow. Finally, we developed models to detect 3 diseases: hypertrophic cardiomyopathy, cardiac amyloid, and pulmonary arterial hypertension.

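Using 14,035 echocardiograms collected over 10 years, the authors trained and evaluated convolutional neural network models for several tasks, including automatic identification of 23 viewpoints and segmentation of the cardiac chambers in 5 common views. The segmentation output was used to quantify chamber volumes and left ventricular mass, to determine ejection fraction, and to determine longitudinal strain automatically through speckle tracking. Results were evaluated against manual segmentation and against measurements from 8,666 echocardiograms obtained in the routine clinical workflow. Finally, models were built to detect three diseases: hypertrophic cardiomyopathy, cardiac amyloidosis, and pulmonary arterial hypertension.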

Convolutional neural networks accurately identified views (eg, 96% for parasternal long axis), including flagging partially obscured cardiac chambers, and enabled the segmentation of individual cardiac chambers. The resulting cardiac structure measurements agreed with study report values (eg, median absolute deviations of 15% to 17% of observed values for left ventricular mass, left ventricular diastolic volume, and left atrial volume). In terms of function, we computed automated ejection fraction and longitudinal strain measurements (within 2 cohorts), which agreed with commercial software-derived values (for ejection fraction, median absolute deviation=9.7% of observed, N=6407 studies; for strain, median absolute deviation=7.5%, n=419, and 9.0%, n=110) and demonstrated applicability to serial monitoring of patients with breast cancer for trastuzumab cardiotoxicity. Overall, we found automated measurements to be comparable or superior to manual measurements across 11 internal consistency metrics (eg, the correlation of left atrial and ventricular volumes). Finally, we trained convolutional neural networks to detect hypertrophic cardiomyopathy, cardiac amyloidosis, and pulmonary arterial hypertension with C statistics of 0.93, 0.87, and 0.85, respectively.

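The convolutional neural networks identified views accurately (for example, 96% for the parasternal long axis), including flagging partially obscured cardiac chambers, and made it possible to segment individual chambers. The resulting structural measurements agreed with the values in the study reports (for example, median absolute deviations of 15% to 17% of the observed values for left ventricular mass, left ventricular diastolic volume, and left atrial volume). For function, automated ejection fraction and longitudinal strain were computed (in 2 cohorts) and agreed with values from commercial software (ejection fraction: median absolute deviation = 9.7% of observed, N=6407 studies; strain: median absolute deviation = 7.5%, n=419, and 9.0%, n=110); the approach also proved applicable to serial monitoring of patients with breast cancer for trastuzumab cardiotoxicity. Overall, automated measurements were comparable or superior to manual measurements across 11 internal consistency metrics (for example, the correlation of left atrial and ventricular volumes). Finally, convolutional neural networks were trained to detect hypertrophic cardiomyopathy, cardiac amyloidosis, and pulmonary arterial hypertension, with C statistics of 0.93, 0.87, and 0.85, respectively.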

2 Pipeline


The data is first classified by view and then segmented.

Preprocessing entailed automated downloading of echocardiograms in Digital Imaging and Communications in Medicine format, separating videos from still images, extracting metadata (eg, frame rate, heart rate), converting them into numeric arrays for matrix computations, and deidentifying images by overwriting patient health information. We next used convolutional neural networks (described later) for automatically determining echocardiographic views. Based on the identified views, videos were routed to specific segmentation models (parasternal long axis [PLAX], parasternal short axis, apical 2-chamber [A2c], apical 3-chamber, and apical 4-chamber [A4c]), and the output was used to derive chamber measurements, including lengths, areas, volumes, and mass estimates. Next, we generated 2 commonly used automated measures of left ventricular (LV) function: ejection fraction and longitudinal strain. Finally, we derived models to detect 3 diseases: hypertrophic cardiomyopathy, pulmonary arterial hypertension, and cardiac amyloidosis.

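Preprocessing consisted of automatically downloading the echocardiograms in DICOM (Digital Imaging and Communications in Medicine) format, separating videos from still images, extracting metadata (for example, frame rate and heart rate), converting the data into numeric arrays for matrix computation, and de-identifying the images by overwriting patient health information. Next, convolutional neural networks (described later) were used to determine the echocardiographic view automatically. [This step is simply image classification.] Based on the identified view, each video was routed to a specific segmentation model (parasternal long axis [PLAX], parasternal short axis, apical 2-chamber [A2c], apical 3-chamber, and apical 4-chamber [A4c]). [So five views go on to segmentation, one model each; the view classifier itself, described later, has 23 classes.] The output was used to derive chamber measurements, including lengths, areas, volumes, and mass estimates. Two commonly used automated measures of left ventricular function were then generated: ejection fraction and longitudinal strain. Finally, models were built to detect 3 diseases: hypertrophic cardiomyopathy, pulmonary arterial hypertension, and cardiac amyloidosis.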

3 Technical details

Specifically, 277 echocardiograms collected over a 10-year period were used to derive a view classification model (Table II in the online-only Data Supplement). The image segmentation model was trained from 791 images divided over 5 separate views (Table III in the online-only Data Supplement). Comparison of automated and manual measurements was made against 8666 echocardiograms, with the majority of measurements made from 2014 to 2017 (Table IV in the online-only Data Supplement). For this purpose, we used all studies where these measurements were available (ie, there was no selection bias). The number of images used for training the different segmentation models was not planned in advance, and models were retrained as more data accrued over time. From initial testing, we recognized that at least 60 images would be needed, and we allocated more training data and resources to A2c and A4c views because these were more central to measurements for both structure and function.

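Specifically, 277 echocardiograms collected over a 10-year period were used to derive the view classification model (Table II in the online-only Data Supplement). The image segmentation models were trained on 791 images spread over 5 separate views (Table III in the online-only Data Supplement). Automated and manual measurements were compared on 8,666 echocardiograms, most of them measured between 2014 and 2017 (Table IV in the online-only Data Supplement). For this comparison the authors used every study for which these measurements were available (that is, there was no selection bias). The number of training images for the different segmentation models was not planned in advance, and the models were retrained as more data accumulated. Initial testing showed that at least 60 images were needed, and more training data and resources were allocated to the A2c and A4c views because they matter most for both structural and functional measurements.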

The takeaway: [a few hundred training samples were enough: 277 studies for view classification and 791 images for segmentation]

3.1 Preprocessing

We identified 260 patients at UCSF who met guideline-based criteria for hypertrophic cardiomyopathy: “unexplained left ventricular (LV) hypertrophy (maximal LV wall thickness ≥ 15 mm) associated with nondilated ventricular chambers in the absence of another cardiac or systemic disease that itself would be capable of producing the magnitude of hypertrophy evident in a given patient.” [9] These patients were selected from 2 sources: the UCSF Familial Cardiomyopathy Clinic and the database of clinical echocardiograms. Patients had a variety of thickening patterns, including upper septal hypertrophy, concentric hypertrophy, and predominantly apical hypertrophy. A subset of patients underwent genetic testing. Overall, 18% of all patients had pathogenic or likely pathogenic mutations. We downloaded all echocardiograms within the UCSF database corresponding to these patients and confirmed evidence of hypertrophy. We excluded bicycle, treadmill, and dobutamine stress echocardiograms because these tend to include slightly modified views or image annotations that could have confounding effects on models trained for disease detection. We also excluded studies of patients conducted after septal myectomy or alcohol septal ablation and studies of patients with pacemakers or implantable defibrillators. Control patients were also selected from the UCSF echocardiographic database. For each hypertrophic cardiomyopathy (HCM) case study, ≤5 matched control studies were selected, with matching by age (in 10-year bins), sex, year of study, ultrasound device manufacturer, and model. This process was simplified by organizing all of our studies in a nested format in a python dictionary so we can look up studies by these characteristics. Given that the marginal cost of analyzing additional samples is minimal in our automated system, we did not perform a greedy search for matched controls. Case, control, and study characteristics are described in Table V in the online-only Data Supplement.
We did not require that cases were disease-free, only that they did not have HCM.

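The authors identified 260 patients at UCSF who met the guideline criteria for hypertrophic cardiomyopathy: unexplained left ventricular (LV) hypertrophy (maximal LV wall thickness ≥ 15 mm) with nondilated ventricular chambers, in the absence of another cardiac or systemic disease that could itself produce the degree of hypertrophy seen in the patient. The patients came from 2 sources: the UCSF Familial Cardiomyopathy Clinic and the database of clinical echocardiograms. They showed a variety of thickening patterns, including upper septal hypertrophy, concentric hypertrophy, and predominantly apical hypertrophy. A subset underwent genetic testing; overall, 18% of all patients had pathogenic or likely pathogenic mutations.
All echocardiograms in the UCSF database belonging to these patients were downloaded, and the evidence of hypertrophy was confirmed. Bicycle, treadmill, and dobutamine stress echocardiograms were excluded because they tend to include slightly modified views or image annotations that could confound models trained for disease detection. Studies performed after septal myectomy or alcohol septal ablation, and studies of patients with pacemakers or implantable defibrillators, were excluded as well. Control patients were also drawn from the UCSF echocardiographic database. For each hypertrophic cardiomyopathy (HCM) case study, up to 5 matched control studies were selected, matched by age (in 10-year bins), sex, year of study, ultrasound device manufacturer, and model. Organizing all studies as a nested Python dictionary, so that studies can be looked up by these characteristics, simplified the process. Because the marginal cost of analyzing additional samples is minimal in the automated system, no greedy search for matched controls was performed. Case, control, and study characteristics are described in Table V in the online-only Data Supplement.
The authors did not require that cases were disease-free, only that they did not have HCM.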
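The paper only says that the studies were organized "in a nested format in a python dictionary" and that up to 5 controls were matched per case. A minimal sketch of what such a lookup could look like is below; the field names (`age`, `sex`, `year`, `manufacturer`, `model`, `hcm`), the helper functions, and the "take the first five" rule are all assumptions made for illustration, not the authors' code.

```python
def match_values(study):
    """Matching characteristics from the paper; age is placed in a 10-year bin."""
    return (
        (study["age"] // 10) * 10,
        study["sex"],
        study["year"],
        study["manufacturer"],
        study["model"],
    )


def build_index(studies):
    """Nest studies as index[age_bin][sex][year][manufacturer][model] -> list."""
    index = {}
    for study in studies:
        *outer, last = match_values(study)
        node = index
        for value in outer:
            node = node.setdefault(value, {})
        node.setdefault(last, []).append(study)
    return index


def lookup(index, study):
    """Return all indexed studies sharing this study's matching characteristics."""
    node = index
    for value in match_values(study):
        node = node.get(value)
        if node is None:
            return []
    return node


def matched_controls(index, case, max_controls=5):
    """Pick up to max_controls non-HCM studies (assumption: simply the first ones found)."""
    candidates = [s for s in lookup(index, case) if not s["hcm"]]
    return candidates[:max_controls]
```

Calling `matched_controls(build_index(all_studies), case)` returns at most five studies from the same age bin, sex, year, and device, which mirrors the "≤5 matched control studies" described above.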

(補充資料)Additionally, each echocardiogram contains periphery information unique to different output settings on ultrasound machines used to collect the data. This periphery information details additional details collected (i.e. electrocardiogram, blood pressure, etc.). To improve generalizability across institutions, we wanted the classification of views to use ultrasound data and not metadata presented in the periphery. To address this issue, every image is randomly cropped between 0-20 pixels from each edge and resized to 224x224 during training. This provides variation in the periphery information, which guides the network to target more relevant features and improves the overall robustness of our view classification models.

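In addition, every echocardiogram carries peripheral information that depends on the output settings of the ultrasound machine used to collect it, such as the electrocardiogram or blood pressure. To generalize better across institutions, the authors wanted view classification to rely on the ultrasound data itself rather than on metadata displayed in the periphery. To that end, every image is randomly cropped by 0 to 20 pixels from each edge and resized to 224x224 during training. This varies the peripheral information, steers the network toward more relevant features, and makes the view classification models more robust overall.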
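The crop-and-resize step can be sketched in a few lines of NumPy. This is only an illustration: the function names are mine, and the nearest-neighbour resize is a dependency-free stand-in (the paper states elsewhere that scikit-image with linear interpolation was used for resizing).

```python
import numpy as np


def random_edge_crop(image, max_crop=20, rng=None):
    """Remove a random 0..max_crop pixels from each of the four edges."""
    rng = np.random.default_rng() if rng is None else rng
    top, bottom, left, right = rng.integers(0, max_crop + 1, size=4)
    h, w = image.shape[:2]
    return image[top:h - bottom, left:w - right]


def resize_nearest(image, size=224):
    """Nearest-neighbour resize to size x size (stand-in for linear interpolation)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols]


def augment_for_view_classifier(image, rng=None):
    """Random edge crop followed by a resize to the 224x224 network input."""
    return resize_nearest(random_edge_crop(image, rng=rng), size=224)
```

Because the crop changes on every call, the text and traces at the image border move around between epochs, which is the variation in the periphery that the paper is after.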

(Supplementary material) Training data comprised of 10 random frames from each manually labeled echocardiographic video. We trained our network on approximately 70,000 pre-processed images. For stochastic optimization, we used the ADAM optimizer [2] with an initial learning rate of 1e-5 and mini-batch size of 64. For regularization, we applied a weight decay of 1e-8 on all network weights and dropout with probability 0.5 on the fully connected layers. We ran our tests for 20 epochs or ~20,000 iterations, which takes ~3.5 hours on a Nvidia GTX 1080. Runtime per video was 600 ms on average.
Accuracy was assessed by 5-fold cross-validation at the individual image level. When deploying the model, we would average the prediction probabilities for 10 randomly selected images from each video.

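The training data consisted of 10 random frames from each manually labeled echocardiographic video, about 70,000 preprocessed images in total. For stochastic optimization the authors used the ADAM optimizer [2] with an initial learning rate of 1e-5 and a mini-batch size of 64. For regularization they applied a weight decay of 1e-8 to all network weights and dropout with probability 0.5 on the fully connected layers. Training ran for 20 epochs, or about 20,000 iterations, which takes roughly 3.5 hours on an Nvidia GTX 1080. Runtime per video averaged 600 ms.
Accuracy was assessed by 5-fold cross-validation at the level of individual images. At deployment, the prediction probabilities of 10 randomly selected images from each video are averaged.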
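The deployment rule (average the class probabilities of 10 random frames, then pick the most probable view) is easy to sketch. Here `predict_proba` is a hypothetical callable standing in for the trained 2D classifier; it is not an API from the paper.

```python
import numpy as np


def predict_video(frames, predict_proba, n_frames=10, rng=None):
    """Average per-frame class probabilities over randomly chosen frames.

    frames: sequence of 2D images from one video.
    predict_proba: hypothetical function mapping one frame to a probability vector.
    Returns (index of the predicted view, averaged probability vector).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = min(n_frames, len(frames))
    chosen = rng.choice(len(frames), size=n, replace=False)
    probs = np.stack([predict_proba(frames[i]) for i in chosen])
    mean_probs = probs.mean(axis=0)
    return int(mean_probs.argmax()), mean_probs
```

Averaging over several frames smooths out single frames where the view is ambiguous, for example at a point in the cardiac cycle where a chamber is barely visible.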

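[Preprocessing done in the paper]

  1. Bicycle, treadmill, and dobutamine stress echocardiograms were excluded, because they tend to include slightly modified views or image annotations that could confound the models;
  2. Studies performed after septal myectomy or alcohol septal ablation, and studies of patients with pacemakers or implantable defibrillators, were excluded as well.
  3. Control patients were also selected from the UCSF echocardiographic database.
  4. To improve robustness across institutions, each image is randomly cropped by 0 to 20 pixels from each edge and resized to 224x224 during training.
  5. 10 frames are drawn from each labeled video as training input, which gives about 70,000 inputs. The convolutions are 2D, and at inference the mean of the predictions for 10 frames is used as the prediction for the video.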

3.2 Convolutional networks

We first developed a model for view classification. Typical echocardiograms consist of ≥70 separate videos representing multiple viewpoints. Furthermore, with rotation and adjustment of the zoom level of the ultrasound probe, sonographers actively focus on substructures within an image, thus creating many variations of these views. Unfortunately, none of these views is labeled explicitly. Thus, the first learning step involves teaching the machine to recognize individual echocardiographic views. Models are trained using manual labels assigned to individual images. Using the 277 studies described earlier, we assigned 1 of 30 labels to each video (eg, parasternal long axis or subcostal view focusing on the abdominal aorta). Because discrimination of all views (subcostal, hepatic vein versus subcostal, inferior vena cava) was not necessary for our downstream analyses, we ultimately used only 23 view classes for our final model (Table IX in the online-only Data Supplement). The training data consisted of 7168 individually labeled videos.

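The authors first developed a view classification model. A typical echocardiogram consists of at least 70 separate videos representing multiple viewpoints. Moreover, by rotating the ultrasound probe and adjusting the zoom level, sonographers actively focus on substructures within an image, which creates many variations of these views. Unfortunately, none of the views is labeled explicitly, so the first learning step is to teach the machine to recognize individual echocardiographic views.
The models are trained with manual labels assigned to individual images. Using the 277 studies described earlier, each video was assigned 1 of 30 labels (for example, parasternal long axis, or subcostal view focusing on the abdominal aorta). Because the downstream analyses did not need every pair of views to be distinguished (for example, subcostal hepatic vein versus subcostal inferior vena cava), only 23 view classes were used in the final model (Table IX in the online-only Data Supplement). The training data consisted of 7,168 individually labeled videos.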

3.3 VGG classification network architecture

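[A quick summary]: echocardiogram data contains many different views, so the views have to be classified first. The videos from 277 studies were labeled by hand with one of 30 view labels, 7,168 labeled videos in total. Because some of the 30 views did not need to be told apart, they were collapsed into the 23 classes that the final classification model predicts and that the later stages build on.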

The VGG network [1] takes a fixed-sized input of grayscale images with dimensions 224x224 pixels (we use scikit-image to resize by linear interpolation). Each image is passed through ten convolution layers, five max-pool layers, and three fully connected layers. (We experimented with a larger number of convolution layers but saw no improvement for our task). All convolutional layers consist of 3x3 filters with stride 1 and all max-pooling is applied over a 2x2 window with stride 2. The convolution layers consist of 5 groups of 2 convolution layers, which are each followed by 1 max pool layer. The stack of convolutions is followed by two fully connected layers, each with 4096 hidden units, and a final fully connected layer with 23 output units. The output is fed into a 23-way softmax layer to represent 23 different echocardiographic views. This final step represents a standard multinomial logistic regression with 23 mutually exclusive classes. The predictors in this model are the output nodes of the neural network. The view with the highest probability was selected as the predicted view.

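The VGG network [1] takes a fixed-size grayscale input of 224x224 pixels (scikit-image is used to resize with linear interpolation). Each image passes through ten convolutional layers, five max-pool layers, and three fully connected layers. (The authors tried a larger number of convolutional layers but saw no improvement on this task.) All convolutional layers use 3x3 filters with stride 1, and all max-pooling uses a 2x2 window with stride 2. The convolutional layers are arranged as 5 groups of 2 layers, each group followed by 1 max-pool layer. The convolutional stack is followed by two fully connected layers with 4096 hidden units each and a final fully connected layer with 23 output units. The output feeds a 23-way softmax layer representing the 23 echocardiographic views. This last step is a standard multinomial logistic regression with 23 mutually exclusive classes, whose predictors are the output nodes of the neural network. The view with the highest probability is taken as the predicted view.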
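A small worked check of the architecture description: five 2x2, stride-2 pooling steps take a 224x224 input down to 7x7, and 5 groups of 2 give the ten convolutional layers. The sketch below traces that arithmetic in plain Python. Two things are assumptions not stated in the excerpt: the 3x3 convolutions are padded so they preserve spatial size, and the last group has 512 channels as in the standard VGG design.

```python
def trace_vgg(input_size=224, groups=5, convs_per_group=2):
    """Return (final feature-map side length, number of conv layers)."""
    size, conv_layers = input_size, 0
    for _ in range(groups):
        # 3x3 convolutions, stride 1; size-preserving padding is assumed.
        conv_layers += convs_per_group
        # 2x2 max-pool with stride 2 halves each spatial dimension.
        size //= 2
    return size, conv_layers


def fc_parameters(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


side, n_conv = trace_vgg()
LAST_CHANNELS = 512  # assumption: channel width of the final conv group
flat = side * side * LAST_CHANNELS
head = [
    fc_parameters(flat, 4096),  # first fully connected layer
    fc_parameters(4096, 4096),  # second fully connected layer
    fc_parameters(4096, 23),    # output layer feeding the 23-way softmax
]
print(side, n_conv, flat, head)
```

Under these assumptions almost all parameters of the classifier head sit in the first fully connected layer, which helps explain why dropout is applied there.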

3.4 Image segmentation

To train image segmentation models, we derived a CNN based on the U-net architecture described by Ronneberger et al. [3]. The U-net-based network we used accepts a 384x384 pixel fixed-sized image as input, and is composed of a contracting path and an expanding path with a total of 23 convolutional layers. The contracting path is composed of twelve convolutional layers with 3x3 filters followed by a rectified linear unit and four max pool layers each using a 2x2 window with stride 2 for down-sampling. The expanding path is composed of ten convolutional layers with 3x3 filters followed by a rectified linear unit, and four 2x2 up-convolution layers. Every up-convolution in the expansion path is concatenated with a feature map from the contracting path with same dimension. This is performed to recover the loss of pixel and feature locality due to downsampling images, which in turn enables pixel-level classification. The final layer uses a 1x1 convolution to map each feature vector to the output classes. Separate U-net CNN networks were trained to perform segmentation on images from PLAX, PSAX (at the level of the papillary muscle), A4c, A3c, and A2c views. Training data was derived for each class of echocardiographic view via manual segmentation. We performed data augmentation techniques including cropping and blacking out random areas of the echocardiographic image in order to improve model performance in the setting of a limited amount of training data. The rationale is that models that are robust to such variation are likely to generalize better to unseen data. Training data underwent varying degrees of cropping (or no cropping) at random amounts for each edge of the image. Similarly, circular areas of random size set at random locations in the echocardiographic image were set to 0-pixel intensity to achieve "blackout". This U-net architecture and the data augmentation techniques enabled highly efficient training, achieving accurate segmentation from a relatively low number of training examples. Finally, in addition to pixelwise cross-entropy loss, we included a distance-based loss penalty for misclassified pixels. The loss function was based on the distance from the closest pixel with the same misclassified class in the ground truth image. This helped mitigate erroneous pixel predictions across the images. We used an Intersection Over Union (IoU) metric for assessment of results. The IoU takes the number of pixels which overlap between the ground truth and automated segmentation (for a given class, such as left atrial blood pool) and divides them by the total number of pixels assigned to that class by either method. It ranges between 0 and 100.

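To train the image segmentation models, the authors derived a CNN from the U-net architecture described by Ronneberger et al. The network accepts a fixed-size 384x384 pixel image and consists of a contracting path and an expanding path with 23 convolutional layers in total. The contracting path has 12 convolutional layers with 3x3 filters, each followed by a rectified linear unit, and 4 max-pool layers that downsample with a 2x2 window and stride 2. The expanding path has 10 convolutional layers with 3x3 filters, each followed by a rectified linear unit, and 4 2x2 up-convolution layers. Every up-convolution in the expanding path is concatenated with the feature map of the same dimension from the contracting path. This recovers the pixel and feature locality lost through downsampling, which in turn makes pixel-level classification possible. The final layer uses a 1x1 convolution to map each feature vector to the output classes.
Separate U-net networks were trained to segment images from the PLAX, PSAX (at the level of the papillary muscle), A4c, A3c, and A2c views. Training data for each view was obtained by manual segmentation. To improve performance with a limited amount of training data, the authors used data augmentation, including cropping and blacking out random areas of the echocardiographic image. The rationale is that models robust to such variation are likely to generalize better to unseen data. The training images were cropped by random amounts (possibly zero) at each edge. Likewise, circular areas of random size at random locations were set to zero pixel intensity to produce a "blackout". The U-net architecture and these augmentations made training highly efficient, giving accurate segmentation from relatively few training examples. Finally, in addition to the pixelwise cross-entropy loss, the authors added a distance-based penalty for misclassified pixels, based on the distance to the closest pixel in the ground truth image that has the class the pixel was wrongly assigned. This helped reduce erroneous pixel predictions across the images.
Results were assessed with the Intersection over Union (IoU) metric. For a given class, such as the left atrial blood pool, the IoU takes the number of pixels on which the ground truth and the automated segmentation overlap and divides it by the total number of pixels assigned to that class by either method. It ranges between 0 and 100.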
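The "blackout" augmentation described above can be sketched with a boolean disk mask. The paper gives no ranges for the circle size, so the `max_radius_frac` cap below is an assumption, as are the function name and the choice to return the mask alongside the image.

```python
import numpy as np


def blackout_circle(image, rng=None, max_radius_frac=0.25):
    """Set a random circular area of the image to zero intensity.

    max_radius_frac caps the radius relative to the shorter image side
    (an assumed value; the paper only says the size is random).
    Returns (augmented copy, boolean mask of the blacked-out pixels).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    radius = rng.uniform(0, max_radius_frac * min(h, w))
    cy, cx = rng.uniform(0, h), rng.uniform(0, w)
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    out = image.copy()
    out[mask] = 0
    return out, mask
```

For segmentation training the label image would normally be left unchanged, so the network has to infer the chamber outline even where the input has been hidden; whether the authors did exactly this is not stated in the excerpt.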

[A brief reading]

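The model is a U-Net that takes 384x384 2D images as input. The architecture is fairly standard: a contracting path and an expanding path with 23 convolutional layers in total. The contracting path has 12 convolutional layers with 3x3 filters and 4 max-pool layers, each downsampling with a 2x2 window and stride 2. The expanding path has 10 convolutional layers with 3x3 filters and 4 2x2 up-convolution layers. The 12 + 10 = 22 3x3 layers plus the final 1x1 convolution account for the total of 23.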

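Because the data was limited, the authors used cropping (the edge cropping mentioned earlier) and blacking out random regions (the "blackout" augmentation, which sets random circular areas to zero intensity). Besides cross-entropy, the loss includes a distance-based penalty for misclassified pixels. As the paper describes it, the distance is measured from a misclassified pixel to the closest pixel in the ground truth that carries the class it was wrongly assigned. My guess is that the farther away it is, the larger the penalty, and the closer it is, the smaller the penalty, which encourages the pixels of each class to stay clustered together.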
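The paper does not give a formula for the distance-based penalty, so the following is only one possible reading, written to make the idea concrete: for every pixel whose predicted class is wrong, take the Euclidean distance to the nearest ground-truth pixel of the predicted class. It works on hard label maps and uses a brute-force search, so it illustrates the penalty map rather than a differentiable training loss; the fallback to the image diagonal when a class is absent from the ground truth is also my assumption.

```python
import numpy as np


def distance_penalty(pred, truth):
    """Per-pixel penalty map for misclassified pixels (assumed formulation).

    pred, truth: 2D integer label maps of the same shape.
    A pixel wrongly predicted as class c is penalised by its distance to the
    nearest ground-truth pixel of class c; correct pixels get 0.
    """
    h, w = pred.shape
    penalty = np.zeros((h, w), dtype=float)
    diagonal = float(np.hypot(h, w))  # fallback when class c is absent in truth
    wrong = pred != truth
    for c in np.unique(pred[wrong]):
        ys, xs = np.nonzero(wrong & (pred == c))
        ty, tx = np.nonzero(truth == c)
        if ty.size == 0:
            penalty[ys, xs] = diagonal
            continue
        # Brute-force pairwise distances; fine for small illustrative images.
        d = np.hypot(ys[:, None] - ty[None, :], xs[:, None] - tx[None, :])
        penalty[ys, xs] = d.min(axis=1)
    return penalty
```

Under this reading a stray "left ventricle" pixel far from the true ventricle costs more than a slightly misplaced boundary pixel, which matches the intuition that the penalty grows with distance.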

The segmentation is evaluated with the IoU metric.
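For reference, IoU for one class follows directly from the definition quoted above: overlapping pixels divided by pixels assigned to the class by either method, scaled to 0 to 100 as in the paper. The function name and the NaN convention for a class that appears in neither mask are my own choices.

```python
import numpy as np


def iou_percent(pred, truth, cls):
    """Intersection over Union for one class, on a 0-100 scale."""
    p = pred == cls
    t = truth == cls
    union = np.logical_or(p, t).sum()
    if union == 0:
        return float("nan")  # class absent from both masks (assumed convention)
    return 100.0 * np.logical_and(p, t).sum() / union
```

A quick sanity check: two 2x2 squares that share a single column overlap in 2 pixels and cover 6 in total, so the IoU is 2/6, or about 33 on this scale.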

4 Problems encountered

During the training process, we found that our CNN models readily segmented the LV across a wide range of videos from hundreds of studies, and we were thus interested in understanding the origin of the extreme outliers in our Bland-Altman plots (Figure 4). We undertook a formal analysis of the 20 outlier cases where the discrepancy between manual and automated measurements for LV end diastolic volume was highest (>99.5th percentile). This included 10 studies where the automated value was estimated to be much higher than manual (DiscordHI) and 10 where the reverse was seen (DiscordLO). For each study, we repeated the manual LV end diastolic volume measurement. For every 1 of the 10 studies in DiscordHI, we determined that the automated result was in fact correct (median absolute deviation=8.6% of the repeat manual value), whereas the prior manual measurement was markedly inaccurate (median absolute deviation=70%). It is unclear why these incorrect values had been entered into our clinical database. For DiscordLO (ie, much lower automated value), the results were mixed. For 2 of the 10 studies, the automated value was correct and the previous manual value erroneous; for 3 of the 10, the repeated value was intermediate between automated and manual. For 5 of the 10 studies in DiscordLO, there were clear problems with the automated segmentation. In 2 of the 5, intravenous contrast had been used in the study, but the segmentation algorithm, which had not been trained on these types of data, attempted to locate a black blood pool. The third poorly segmented study involved a patient with complex congenital heart disease with a double outlet right ventricle and membranous ventricular septal defect. The fourth study involved a mechanical mitral valve with strong acoustic shadowing and reverberation artifact. Finally, the fifth poorly segmented study had a prominent calcified false tendon in the LV combined with a moderately sized pericardial effusion. This outlier analysis thus highlighted the presence of inaccuracies in our clinical database as well as the types of studies that remain challenging for our automated segmentation algorithms.

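During training, the authors found that their CNN models readily segmented the LV across a wide range of videos from hundreds of studies, so they wanted to understand where the extreme outliers in their Bland-Altman plots (Figure 4) came from. They formally analyzed the 20 outlier cases with the largest discrepancy between manual and automated LV end-diastolic volume (above the 99.5th percentile): 10 studies where the automated value was much higher than the manual one (DiscordHI) and 10 where the reverse was true (DiscordLO). For each study, the manual LV end-diastolic volume measurement was repeated.
For every one of the 10 DiscordHI studies, the automated result turned out to be correct (median absolute deviation = 8.6% of the repeated manual value), whereas the earlier manual measurement was markedly inaccurate (median absolute deviation = 70%). It is unclear why these incorrect values had been entered into the clinical database. For DiscordLO (a much lower automated value) the results were mixed. In 2 of the 10 studies the automated value was correct and the earlier manual value was wrong; in 3 of the 10 the repeated value lay between the automated and manual values. In the remaining 5 of the 10 there were clear problems with the automated segmentation.
In 2 of those 5, intravenous contrast had been used, and the segmentation algorithm, which had not been trained on such data, tried to locate a black blood pool. The third poorly segmented study involved a patient with complex congenital heart disease, with a double outlet right ventricle and a membranous ventricular septal defect. The fourth involved a mechanical mitral valve with strong acoustic shadowing and reverberation artifact. The fifth had a prominent calcified false tendon in the LV together with a moderately sized pericardial effusion. The outlier analysis thus exposed inaccuracies in the clinical database as well as the kinds of studies that remain challenging for the automated segmentation algorithms.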
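The outlier selection can be sketched as follows. The paper states the 99.5th-percentile cutoff and the split into DiscordHI and DiscordLO, but not the exact statistic, so thresholding the absolute difference and computing the median absolute deviation as a percentage of the reference value are assumptions on my part.

```python
import numpy as np


def extreme_discordant(manual, automated, pct=99.5):
    """Indices of studies whose |automated - manual| exceeds the pct-th percentile.

    Returns (hi, lo): automated much higher than manual, and the reverse.
    """
    manual = np.asarray(manual, dtype=float)
    automated = np.asarray(automated, dtype=float)
    diff = automated - manual
    cutoff = np.percentile(np.abs(diff), pct)
    outliers = np.nonzero(np.abs(diff) > cutoff)[0]
    hi = outliers[diff[outliers] > 0]
    lo = outliers[diff[outliers] < 0]
    return hi, lo


def median_abs_dev_percent(estimate, reference):
    """Median of |estimate - reference| expressed as a percentage of reference."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return 100.0 * np.median(np.abs(estimate - reference) / reference)
```

Re-measuring only the studies returned by `extreme_discordant` is what makes this kind of audit affordable: a handful of repeat measurements is enough to tell database errors apart from segmentation failures.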

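[My understanding]
During training the CNN was able to locate the left ventricle (LV) across a large number of videos, so the authors were curious about the extreme outliers. Re-measuring the 20 most discordant studies showed that many of the outliers were errors in the manual values stored in the clinical database (all 10 DiscordHI studies and 2 of the DiscordLO studies), while 5 were genuine segmentation failures. Of those 5, 2 involved intravenous contrast, which the model had never been trained on, and the other 3 involved unusual hearts or imaging artifacts: complex congenital heart disease with a double outlet right ventricle and a membranous ventricular septal defect, a mechanical mitral valve, and a calcified false tendon with a pericardial effusion.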
