Figure 7: Computational pathology tasks to which machine learning is applied.

CNNs work well on pathology images because a single biopsy or resection specimen yields a very large number of pixels that can be used for training. Given enough valid samples, DL algorithms can automatically learn features for a variety of classification tasks [24]. In practical image analysis tasks, most problems are tackled with a combination of DL and traditional image analysis algorithms. There are several reasons for this.

First, although DL has been shown to match or exceed human performance on very specific problems (for example, glomerulus detection), it is still not a good general-purpose image analysis tool. Because it lacks flexibility, development times remain long. Expert labels for a given classification task are also generally scarce, because they are expensive to produce. Ways to mitigate this include using immunohistochemical staining to give pathologists additional information when annotating samples, and increasing the availability of expert annotation labels for widely used cases (cancer cells versus normal cells), which is an active community effort.

The second reason is transparency. DL methods are known for being black boxes: the rationale behind a classification decision is not clear. This is hard to accept in drug research, and especially in pathology analysis.

The third reason is the large sample size needed to apply DL directly in clinical trials to infer treatment response. DL typically needs tens of thousands of samples to learn from, and clinical trials usually do not produce that many. In some cases data from several clinical trials can be pooled, but pooling may introduce bias that makes the results harder to interpret.

3. Summary

In this article we presented several example tasks in which machine learning assists drug development. These models and algorithms can also be applied in public health management, and combining them with drug discovery may lead to major advances in personalized medicine. Within medicine, machine learning can also be applied to electronic health records and real-world evidence to improve clinical trial outcomes and to optimize the assessment of clinical trial eligibility.

However, a typical problem with deep neural networks is their lack of interpretability: it is hard to obtain a suitable explanation from a trained network of how it arrived at its result. The problem exists in other application areas of machine learning too, but it is more serious in medicine and drug development, where the lack of interpretability may deter scientists, regulators, doctors, and patients from adopting the technology even when the network outperforms human experts. Would patients trust the diagnosis of a black-box machine learning algorithm more than that of a human expert? Would a pharmaceutical company add a small molecule to its portfolio and take it into the clinic because a machine learning algorithm selected it, when the algorithm cannot clearly explain why it chose that molecule? At present, machine learning results serve mainly as a starting point, a guess or estimate that researchers then develop further. The protein structure predictions that DeepMind released for the virus that causes COVID-19, mentioned at the beginning of this article, are one example.

Another important issue is reproducibility. The output of a machine learning model depends heavily on the initial values of the network parameters (the weights), and even on the order in which training samples are presented to the network, since both are usually chosen at random. Will the network always select the same disease target when given the same expression data as input? Will the drug structures proposed by a machine learning method always be the same? A further important consideration is whether large amounts of high-quality, accurate, curated data are available to train and develop machine learning models. The amount and accuracy required depend on the complexity of the data type and on the problem to be solved, so generating these data sets can be expensive.

Medicine and drug development are highly specialized fields. How to make effective use of machine learning algorithms and models in them deserves in-depth study, and we will continue to follow the latest research on these questions.

References

[1] Ma, J. et al. (2015) Deep neural nets as a method for quantitative structure–activity relationships. J. Chem. Inf. Model. 55, 263–274
[2] Mayr, A. et al. (2016) DeepTox: toxicity prediction using deep learning. Front. Environ. Sci. http://dx.doi.org/10.3389/fenvs.2015.00080
[3] Duvenaud, D. et al. (2015) Convolutional networks on graphs for learning molecular fingerprints. In Proceedings of the 28th International Conference on Neural Information Processing Systems, MIT Press. pp. 2224–2232
[4] Gómez-Bombarelli, R. et al. (2016) Automatic chemical design using a data-driven continuous representation of molecules. ArXiv arXiv:1610.02415
[5] Coley, C.W. et al. (2017) Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 3, 434–443
[6] Segler, M.H.S. and Waller, M.P. (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry 23, 5966–5971
[7] Costa, P. R., Acencio, M. L. & Lemke, N. A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data. BMC Genomics 11, S9–S9 (2010)
[8] Jeon, J. et al. A systematic approach to identify novel cancer drug targets using machine learning, inhibitor design and high-throughput screening. Genome Med. 6, 57 (2014)
[9] Bravo, A., Pinero, J., Queralt-Rosinach, N., Rautschka, M. & Furlong, L. I. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinformatics 16, 55 (2015)
[10] Jha, A., Gazzara, M. R. & Barash, Y. Integrative deep models for alternative splicing. Bioinformatics 33, i274–i282 (2017)
[11] Vaquero-Garcia, J. et al. A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 5, e11752 (2016)
[12] Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016). This paper applies ML to data from somatic mutations, copy number alterations, DNA methylation and gene expression from 1,000 cancer cell lines to model drug response of the cell lines and demonstrates the importance of genomic features for prediction.
[13] Wang, Q., Feng, Y., Huang, J., Wang, T. & Cheng, G. A novel framework for the identification of drug target proteins: combining stacked auto-encoders with a biased support vector machine. PLOS ONE 12, e0176486 (2017)
[14] Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure–activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015)
[15] Barati Farimani, A., Feinberg, E. & Pande, V. Binding pathway of opiates to μ-opioid receptors revealed by machine learning. Biophys. J. 114, 62a–63a (2018)
[16] Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure–activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015)
[17] Shi, L. et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat. Biotechnol. 28, 827–838 (2010)
[18] Costello, J. C. et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol. 32, 1202–1212 (2014). This paper is an effort to collect and objectively evaluate various ML approaches by teams around the world on multi-omics data sets and various compounds. The data sets and results are continuously used as benchmarks for new method developments and validation.
[19] Tasaki, S. et al. Multi-omics monitoring of drug response in rheumatoid arthritis in pursuit of molecular remission. Nat. Commun. 9, 2755 (2018). This work identifies molecular signatures that are resistant to drug treatments and illustrates a multi-omics approach to understanding drug response.
[20] Paré, G., Mao, S. & Deng, W. Q. A machine-learning heuristic to improve gene score prediction of polygenic traits. Sci. Rep. 7, 12665 (2017)
[21] Ding, J., Condon, A. & Shah, S. P. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models. Nat. Commun. 9, 2002 (2018)
[22] Tan, J., Hammond, J. H., Hogan, D. A. & Greene, C. S. ADAGE-based integration of publicly available Pseudomonas aeruginosa gene expression data with denoising autoencoders illuminates microbe-host interactions. mSystems 1, e00025–15 (2016)
[23] Way, G. P. & Greene, C. S. Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders. Pac. Symp. Biocomput. 23, 80–91 (2018)
[24] Janowczyk, A. & Madabhushi, A. Deep learning for digital pathology image analysis: a comprehensive tutorial with selected use cases. J. Pathol. Informat. 7, 29 (2016). This article is the first comprehensive review of DL in the context of digital pathology images. The paper also systematically explains and presents approaches for training and validating DL classifiers for a number of image-based problems in digital pathology, including cell detection, segmentation and tissue classification.
[25] https://deepmind.com/blog/article/AlphaFold-Using-AI-for-scientific-discovery
[26] https://deepmind.com/research/open-source/computational-predictions-of-protein-structures-associated-with-COVID-19
[27] Hong Ming Chen, et al. The rise of deep learning in drug discovery, Drug Discovery Today.
[28] Stephenson, Natalie, Survey of Machine Learning Techniques in Drug Discovery, Current Drug Metabolism.
[29] Liu, B. et al. (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Central Science 3, 1103–1113
[30] Jin, W. et al. (2017) Predicting organic reaction outcomes with Weisfeiler–Lehman network. ArXiv arXiv:1709.04555
[31] Morgan, H.L. (1965) The generation of a unique machine description for chemical structures - a technique developed at Chemical Abstracts Service. J. Chem. Doc. 5, 107–113
[32] Vamathevan, Jessica; Clark, Dominic; Czodrowski, Paul; Dunham, Ian; Ferran, Edgardo; Lee, George; Li, Bin; Madabhushi, Anant; Shah, Parantu; Spitzer, Michaela; Zhao, Shanrong, Applications of machine learning in drug discovery and development, Nature Reviews Drug Discovery, 2019