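リケラボ論文検索 is a paper search service that searches, in a single query, the degree theses and faculty papers held in university repositories across Japan.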


Different Recognition of Protein Features Depending on Deep Learning Models: A Case Study of Aromatic Decarboxylase UbiD

Watanabe, Naoki; Kuriya, Yuki; Murata, Masahiro; Yamamoto, Masaki; Shimizu, Masayuki; Araki, Michihiro (Kobe University)

2023.06

Abstract

The number of unannotated protein sequences is increasing explosively owing to advances in genome sequencing technology. A more comprehensive understanding of protein function, as required for annotation, depends on discovering new features that conventional methods cannot capture. Deep learning can extract important features from input data and predict protein functions from those features. Here, protein feature vectors generated by three deep learning models are analyzed with Integrated Gradients to identify which amino acid sites carry important features. As a case study, prediction and feature extraction models for UbiD enzymes were built with these models. The important amino acid residues extracted from the models did not coincide with the secondary structures, conserved regions, or active sites known for UbiD. Interestingly, which residues within UbiD sequences were regarded as important depended on the model type and on the sequence. The Transformer models focused on more specific regions than the other models did. These results suggest that each deep learning model captures protein features from aspects different from existing knowledge, and that such models have the potential to reveal new principles of protein function. This study is expected to help extract new protein features for annotating other proteins.
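The abstract names Integrated Gradients (IG; Sundararajan, Taly and Yan, 2017) as the attribution method used to score amino acid sites. The sketch below is a hypothetical illustration of the technique, not the authors' code: it assumes a toy logistic classifier over a one-hot encoded sequence (standing in for the paper's three deep learning models, which are not reproduced here), an all-zero baseline, placeholder weights, and a midpoint Riemann sum to approximate the path integral IG_i = (x_i - x'_i) * ∫₀¹ ∂F(x' + α(x - x'))/∂x_i dα. The helper names (`one_hot`, `make_logistic_model`, `integrated_gradients`, `residue_scores`) are invented for this example. Per-residue importance is taken as the sum of attributions over the 20 amino acid channels at each position.

```python
import math
from typing import Callable, List, Sequence, Tuple

# The 20 standard amino acids; any other letter raises ValueError in one_hot().
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq: str) -> List[float]:
    """Flatten a protein sequence into a len(seq) * 20 one-hot vector."""
    n = len(AMINO_ACIDS)
    vec = [0.0] * (len(seq) * n)
    for pos, aa in enumerate(seq):
        vec[pos * n + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def make_logistic_model(
    weights: Sequence[float], bias: float
) -> Tuple[Callable[[Sequence[float]], float], Callable[[Sequence[float]], List[float]]]:
    """Hypothetical toy classifier F(x) = sigmoid(w . x + b) and its analytic gradient."""

    def f(x: Sequence[float]) -> float:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))

    def grad(x: Sequence[float]) -> List[float]:
        p = f(x)  # dF/dx_i = F(x) * (1 - F(x)) * w_i
        return [p * (1.0 - p) * w for w in weights]

    return f, grad


def integrated_gradients(
    x: Sequence[float],
    baseline: Sequence[float],
    grad_fn: Callable[[Sequence[float]], List[float]],
    steps: int = 50,
) -> List[float]:
    """Midpoint Riemann-sum approximation of
    IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a * (x - x')) da."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def residue_scores(attributions: Sequence[float], seq_len: int) -> List[float]:
    """Per-position importance: sum of attributions over the 20 amino acid channels."""
    n = len(AMINO_ACIDS)
    return [sum(attributions[p * n:(p + 1) * n]) for p in range(seq_len)]


# Usage with an arbitrary 5-residue sequence and placeholder weights (not from the paper).
seq = "MKVLA"
x = one_hot(seq)
baseline = [0.0] * len(x)  # all-zero baseline, one common choice for IG
weights = [0.1 * ((i * 7) % 11 - 5) for i in range(len(x))]
f, grad = make_logistic_model(weights, bias=-0.2)
ig = integrated_gradients(x, baseline, grad, steps=200)
scores = residue_scores(ig, len(seq))
ranked = sorted(range(len(seq)), key=lambda p: abs(scores[p]), reverse=True)
print("completeness gap:", sum(ig) - (f(x) - f(baseline)))
print("positions ranked by |attribution|:", ranked)
```

Because IG satisfies completeness, the attributions should sum to approximately F(x) - F(x'); printing that gap is a cheap check that `steps` is large enough. With a real network, `grad_fn` would instead come from automatic differentiation (for example TensorFlow's `tf.GradientTape`), and attributions on embedded inputs are often summed over the embedding dimension in the same way `residue_scores` sums over channels.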


Published in Biology 2023, 12, 795.


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
