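Rikelab Paper Search (リケラボ論文検索) is a search service that covers doctoral theses and faculty papers held in university repositories across Japan in a single query.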


Active learning efficiently converges on rational limits of toxicity prediction and identifies patterns for molecule design

Ahsan, Habib Polash, Kyoto University, DOI:10.14989/doctor.k23092

2021.03.23

Abstract

Government organizations use a range of assays to assess the safety of chemicals. Most of the established assays have potential drawbacks, which include but are not limited to a lack of cost effectiveness, long evaluation times, and false negative results. Moreover, animal-based assays are increasingly discouraged by animal welfare organizations. As a consequence, toxicologists are encouraging the development of new types of toxicity detection assays that overcome the drawbacks of current ones.

A number of recent scientific studies have employed machine learning (ML) to predict the binding of compounds to proteins. Such studies are accelerating computational chemistry, methods in toxicology, and the use of artificial intelligence in drug discovery. In 2017, Reker et al. employed an ML method called Active Learning (AL) to model several kinases and G-protein coupled receptor proteins with high performance. One advantage of AL is that, instead of learning from all of the available bioactivity data, it iteratively selects a subset of the data and builds a reduced-size model that is as good as a model constructed from the whole dataset. In doing so, AL can substantially reduce the computational resources required. In 2019, Polash and co-workers employed AL to predict highly selective inhibitors for matrix metalloproteinases, a protein family known to play various roles in cancer cell proliferation. They also deconvoluted the ML model's decision-making process.
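The iterative selection described above can be sketched as a pool-based loop. The following is only a minimal illustration, not the method used in the thesis or by Reker et al.: it assumes a toy nearest-centroid classifier and an uncertainty-based picking rule, and the names `NearestCentroid` and `active_learning` are hypothetical.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy stand-in classifier (assumption): predicts the class whose centroid is closer."""

    def fit(self, X, y):
        self.c0 = centroid([x for x, lab in zip(X, y) if lab == 0])
        self.c1 = centroid([x for x, lab in zip(X, y) if lab == 1])
        return self

    def margin(self, x):
        # Positive: closer to class 1. Near zero: the model is uncertain.
        return sq_dist(x, self.c0) - sq_dist(x, self.c1)

    def predict(self, x):
        return 1 if self.margin(x) > 0 else 0


def active_learning(X, y, n_rounds, seed=0):
    """Grow a labeled subset one example per round, always picking the
    pool example the current model is least certain about."""
    rng = random.Random(seed)
    # Seed the model with one randomly chosen example of each class.
    labeled = [
        rng.choice([i for i, lab in enumerate(y) if lab == 0]),
        rng.choice([i for i, lab in enumerate(y) if lab == 1]),
    ]
    pool = [i for i in range(len(X)) if i not in labeled]
    model = NearestCentroid().fit([X[i] for i in labeled], [y[i] for i in labeled])
    for _ in range(n_rounds):
        if not pool:
            break
        pick = min(pool, key=lambda i: abs(model.margin(X[i])))
        pool.remove(pick)
        labeled.append(pick)  # the label y[pick] is "revealed" only now
        model.fit([X[i] for i in labeled], [y[i] for i in labeled])
    return model, labeled


# Toy one-dimensional data: class 0 on the left, class 1 on the right.
X_demo = [[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]
demo_model, demo_labeled = active_learning(X_demo, y_demo, n_rounds=3)
print(f"labeled {len(demo_labeled)} of {len(X_demo)} examples")
```

In practice the classifier, the picking rule, and the stopping criterion are all design choices; the sketch only shows the shape of the loop, in which the model is refit after each newly labeled example.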

Following the elucidation of an ML model architecture, in this study AL was applied to a dataset of approximately 9000 compounds that were tested for acute oral toxicity in rats. The data have been curated by the US government and made publicly available for the evaluation and development of predictive methodologies. Notably, the sources of in vivo toxicity are more diverse than those underlying biochemical assay data, which amplifies the predictive challenge. Unlike many previous studies that used mathematically complex ML algorithms and the full activity dataset, this study showed that a strategically selected subset of the data was sufficient to build a model that could predict toxic compounds with high performance.

Instead of developing a "black box" model that offers no insight into the model building steps, the authors tried to deconvolute the decision-making steps. In-depth analyses showed that some compounds were predictable from the early stages of model building, whereas others became predictable only gradually. However, some compounds could never be predicted correctly; subsequent analysis revealed that these compounds frequently formed a "toxicity cliff". A toxicity cliff can be described as a minor change in structure that leads to a large change in toxicity (activity). Some of the toxic compounds in the validation data were found to have nearest neighbors in the training data that are not toxic, and as a result the model failed to classify them accurately; this suggests the rational limits of toxicity prediction. Furthermore, removing compounds whose toxicity values lay near the borderline separating the toxic and non-toxic classes yielded even higher performance. Finally, compound structure analysis revealed that some substructures are differentially present or absent across toxic and non-toxic compounds.
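The nearest-neighbor view of a toxicity cliff, and the removal of borderline compounds, can be illustrated with a small sketch. It is a hedged example rather than the thesis's procedure: compounds are represented as plain sets of fingerprint feature IDs, and the function names, the similarity cutoff of 0.7, and the threshold and band values are all hypothetical choices made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_toxicity_cliffs(train, valid, min_similarity=0.7):
    """Flag validation compounds whose most similar training compound is
    highly similar yet carries the opposite toxicity label.

    train, valid: lists of (name, feature_set, is_toxic) tuples.
    Returns a list of (validation_name, neighbor_name, similarity).
    """
    cliffs = []
    for name, feats, toxic in valid:
        nn_name, nn_feats, nn_toxic = max(train, key=lambda t: tanimoto(feats, t[1]))
        sim = tanimoto(feats, nn_feats)
        if sim >= min_similarity and nn_toxic != toxic:
            cliffs.append((name, nn_name, sim))
    return cliffs


def drop_borderline(records, threshold, band):
    """Keep only (name, toxicity_value) records lying more than `band`
    away from the class-separating `threshold`."""
    return [r for r in records if abs(r[1] - threshold) > band]


# Toy data (made up): feature IDs stand in for substructure fingerprint bits.
train_demo = [("t1", {1, 2, 3, 4}, False), ("t2", {5, 6, 7, 8}, True)]
valid_demo = [
    ("v1", {1, 2, 3, 4, 9}, True),  # nearly identical to non-toxic t1
    ("v2", {5, 6, 7}, True),        # similar to toxic t2, same label
    ("v3", {10, 11}, False),        # unlike anything in training
]
print(find_toxicity_cliffs(train_demo, valid_demo))

values_demo = [("a", 100.0), ("b", 290.0), ("c", 310.0), ("d", 2000.0)]
print(drop_borderline(values_demo, threshold=300.0, band=50.0))
```

Here only `v1` is flagged, because its closest training neighbor is structurally similar but labeled non-toxic, and `b` and `c` are dropped because they sit within the band around the threshold.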

In summary, the study demonstrated efficient selection of compounds for generating computational models of toxicity prediction in rats and provided insights into the chemical substructures and patterns that are crucial for classifying toxic compounds. Computational toxicity prediction is still a fairly nascent discipline with a variety of challenges. However, insights from this research will contribute to the development of better toxicity prediction models, and an understanding of chemical fragment-toxicity associations will help flag risk-associated structures in drug discovery to avoid potential toxicity.

References

[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, et al. (2016). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. Available at: http://arxiv.org/abs/1603.04467 [Accessed November 22, 2019].

[2] A. Abdelaziz, H. Spahn-Langguth, K.-W. Schramm, I.V. Tetko, Consensus modeling for HTS assays using in silico descriptors calculates the best balanced accuracy in Tox21 challenge, Front. Environ. Sci. 4 (2016) 2, https://doi.org/10.3389/fenvs.2016.00002.

[3] D. Alberga, D. Trisciuzzi, K. Mansouri, G.F. Mangiatordi, O. Nicolotti, Prediction of acute oral systemic toxicity using a multifingerprint similarity approach, Toxicol. Sci. 167 (2018) 484–495, https://doi.org/10.1093/toxsci/kfy255.

[4] B.N. Ames, J. McCann, E. Yamasaki, Methods for detecting carcinogens and mutagens with the salmonella/mammalian-microsome mutagenicity test, Mutat. Res. Mutagen. Relat. Subj. 31 (1975) 347–363, https://doi.org/10.1016/0165-1161(75)90046-1.

[5] P. Anastas, K. Teichman, E.C. Hubal, Ensuring the safety of chemicals, J. Expo. Sci. Environ. Epidemiol. 20 (2010) 395–396, https://doi.org/10.1038/jes.2010.28.

[6] D. Ballabio, F. Grisoni, V. Consonni, R. Todeschini, Integrated QSAR models to predict acute oral systemic toxicity, Mol. Inform. 38 (2019) 1800124, https://doi.org/10.1002/minf.201800124.

[7] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32, https://doi.org/10.1023/A:1010933404324.

[8] J. Brown, Adaptive mining and model building of medicinal chemistry data with a multi-metric perspective, Future Med. Chem. 10 (2018) 1885–1887, https://doi.org/10.4155/fmc-2018-0188.

[9] J.B. Brown, Classifiers and their metrics quantified, Mol. Inform. 37 (2018), https://doi.org/10.1002/minf.201700127.

[10] C.M. Fonseca, P.J. Fleming (1993). Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. Available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.9077 [Accessed May 1, 2020].

[11] D. Gadaleta, K. Vuković, C. Toma, G.J. Lavado, A.L. Karmaus, K. Mansouri, et al., SAR and QSAR modeling of a large collection of LD50 rat acute oral toxicity data, J. Cheminform. 11 (2019) 58, https://doi.org/10.1186/s13321-019-0383-2.

[12] R. Huang, M. Xia, Editorial: Tox21 challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental toxicants and drugs, Front. Environ. Sci. 5 (2017) 3, https://doi.org/10.3389/fenvs.2017.00003.

[13] D. Kirkland, M. Aardema, L. Henderson, L. Müller, Evaluation of the ability of a battery of three in vitro genotoxicity tests to discriminate rodent carcinogens and non-carcinogens: I. Sensitivity, specificity and relative predictivity, Mutat. Res. Toxicol. Environ. Mutagen. 584 (2005) 1–256, https://doi.org/10.1016/J.MRGENTOX.2005.02.004.

[14] D. Kirkland, S. Pfuhler, D. Tweats, M. Aardema, R. Corvi, F. Darroudi, et al., How to reduce false positive results when undertaking in vitro genotoxicity testing and thus avoid unnecessary follow-up animal tests: report of an ECVAM workshop, Mutat. Res. Toxicol. Environ. Mutagen. 628 (2007) 31–55, https://doi.org/10.1016/J.MRGENTOX.2006.11.008.

[15] N.C. Kleinstreuer, A.L. Karmaus, K. Mansouri, D.G. Allen, J.M. Fitzpatrick, G. Patlewicz, Predictive models for acute oral systemic toxicity: a workshop to bridge the gap from research to regulation, Comput. Toxicol. (2018) 21–24, https://doi.org/10.1016/j.comtox.2018.08.002.

[16] T. Lang, F. Flachsenberg, U. von Luxburg, M. Rarey, Feasibility of active machine learning for multiclass compound classification, J. Chem. Inf. Model. 56 (2016) 12–20, https://doi.org/10.1021/acs.jcim.5b00332.

[17] J.C.D. Lopes, F.M. Dos Santos, A. Martins-José, K. Augustyns, H. De Winter, The power metric: a new statistically robust enrichment-type metric for virtual screening applications with early recovery capability, J. Cheminform. 9 (2017) 7, https://doi.org/10.1186/s13321-016-0189-4.

[18] K. Mansouri, J. Fitzpatrick, W. Casey, D. Allen, G. Patlewicz, A. Karmaus, et al., Developing predictive models for acute oral systemic toxicity: lessons learned from a global collaboration, CICSJ Bull. 37 (2019) 23, https://doi.org/10.11546/cicsj.37.23.

[19] B.W. Matthews, Comparison of the predicted and observed secondary structure of T4 phage lysozyme, BBA - Protein Struct. 405 (1975) 442–451, https://doi.org/10.1016/0005-2795(75)90109-9.

[20] A. Mayr, G. Klambauer, T. Unterthiner, S. Hochreiter, DeepTox: toxicity prediction using deep learning, Front. Environ. Sci. 3 (2016) 80, https://doi.org/10.3389/fenvs.2015.00080.

[21] N.S.H.N. Moorthy, S. Kumar, V. Poongavanam, Classification of carcinogenic and mutagenic properties using machine learning method, Comput. Toxicol. 3 (2017) 33–43, https://doi.org/10.1016/j.comtox.2017.07.002.

[22] A.W. Naik, J.D. Kangas, C.J. Langmead, R.F. Murphy, Efficient modeling and active learning discovery of biological responses, PLoS One 8 (2013) e83996, https://doi.org/10.1371/journal.pone.0083996.

[23] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, et al., Scikit-learn: machine learning in Python, J. Mach. Learn. Res. 12 (2011) 2825–2830.

[24] A.H. Polash, T. Nakano, S. Takeda, J.B. Brown, Applicability domain of active learning in chemical probe identification: convergence in learning from non-specific compounds and decision rule clarification, Molecules 24 (2019) 2716, https://doi.org/10.3390/molecules24152716.

[25] A.H. Polash, N. Takumi, S. Takeda, J.B. Brown, Systematic approaches to build predictive models for rat oral toxicity, CICSJ Bull. 37 (2019) 12, https://doi.org/10.11546/cicsj.37.12.

[26] C. Rakers, R.A. Najnin, A.H. Polash, S. Takeda, J.B. Brown, Chemogenomic active learning’s domain of applicability on small, sparse qHTS matrices: a study using cytochrome P450 and nuclear hormone receptor families, ChemMedChem 13 (2018) 511–521, https://doi.org/10.1002/cmdc.201700677.

[27] C. Rakers, R.A. Najnin, A.H. Polash, S. Takeda, J.B. Brown, Chemogenomic active learning’s domain of applicability on small, sparse qHTS matrices: a study using cytochrome P450 and nuclear hormone receptor families, ChemMedChem (2018), https://doi.org/10.1002/cmdc.201700677.

[28] D. Reker, J.B. Brown, Selection of informative examples in chemogenomic datasets, Methods Mol. Biol. (Humana Press, New York, NY) (2018) 369–410, https://doi.org/10.1007/978-1-4939-8639-2_13.

[29] D. Reker, G. Schneider, Active-learning strategies in computer-assisted drug discovery, Drug Discov. Today 20 (2015) 458–465, https://doi.org/10.1016/J.DRUDIS.2014.12.004.

[30] D. Reker, P. Schneider, G. Schneider, J. Brown, Active learning for computational chemogenomics, Future Med. Chem. 9 (2017) 381–402, https://doi.org/10.4155/fmc-2016-0197.

[31] I. Rusyn, G.P. Daston, Computational toxicology: realizing the promise of the toxicity testing in the 21st century, Environ. Health Perspect. 118 (2010) 1047–1050, https://doi.org/10.1289/ehp.1001925.

[32] G. Schneider, W. Neidhart, T. Giller, G. Schmid, “Scaffold-Hopping” by topological pharmacophore search: a contribution to virtual screening, Angew. Chem. Int. Ed. 38 (1999) 2894–2896.

[33] J. Shawe-Taylor, N. Cristianini (2004). Kernel methods for pattern analysis. Cambridge University Press. Available at: https://books.google.co.jp/books?hl=en&lr=&id=9i0vg12lti4C&oi=fnd&pg=PR8&dq=Kernel+Methods+for+Pattern+Analysis&ots=olCFrl3F5R&sig=mzYdGeZt1vEmfL65QRbZlIzX_Uo#v=onepage&q=Kernel Methods for Pattern Analysis&f=false [Accessed June 30, 2019].

[34] S.J. Shukla, R. Huang, C.P. Austin, M. Xia, The future of toxicity testing: a focus on in vitro methods using a quantitative high-throughput screening platform, Drug Discov. Today 15 (2010) 997–1007, https://doi.org/10.1016/j.drudis.2010.07.007.

[35] K. Taylor, Ten years of REACH—an animal protection perspective, Altern. Lab. Anim. 46 (2018) 347–373, https://doi.org/10.1177/026119291804600610.

[36] O. Tcheremenskaia, C.L. Battistelli, A. Giuliani, R. Benigni, C. Bossa, In silico approaches for prediction of genotoxic and carcinogenic potential of cosmetic ingredients, Comput. Toxicol. 11 (2019) 91–100, https://doi.org/10.1016/j.comtox.2019.03.005.

[37] R.R. Tice, C.P. Austin, R.J. Kavlock, J.R. Bucher, Improving the human hazard characterization of chemicals: a Tox21 update, Environ. Health Perspect. 121 (2013) 756–765, https://doi.org/10.1289/ehp.1205784.

[38] Y. Uesawa, Rigorous selection of random forest models for identifying compounds that activate toxicity-related pathways, Front. Environ. Sci. 4 (2016) 9, https://doi.org/10.3389/fenvs.2016.00009.

[39] M.K. Warmuth, J. Liao, G. Rätsch, M. Mathieson, S. Putta, C. Lemmen, Active learning with support vector machines in the drug discovery process, J. Chem. Inf. Comput. Sci. (2003) 667–673, https://doi.org/10.1021/ci025620t.

[40] A.M. Wassermann, M. Wawer, J. Bajorath, Activity landscape representations for structure-activity relationship analysis, J. Med. Chem. 53 (2010) 8209–8223, https://doi.org/10.1021/jm100933w.

[41] M. Zaslavskiy, S. Jégou, E.W. Tramel, G. Wainrib, ToxicBlend: virtual screening of toxic compounds with ensemble predictors, Comput. Toxicol. 10 (2019) 81–88, https://doi.org/10.1016/j.comtox.2019.01.001.
