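リケラボ論文検索 is a paper search service that searches theses, dissertations and faculty papers held in university repositories across Japan in a single query.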



PFresGO: an attention mechanism-based deep-learning approach for protein annotation by integrating gene ontology inter-relationships

Pan, Tong; Li, Chen; Bi, Yue; Wang, Zhikang; Gasser, Robin B; Purcell, Anthony W; Akutsu, Tatsuya; Webb, Geoffrey I; Imoto, Seiya; Song, Jiangning

Kyoto University

DOI: 10.1093/bioinformatics/btad094

2023.03

Abstract

MOTIVATION: The rapid accumulation of high-throughput sequence data demands the development of effective and efficient data-driven computational methods to functionally annotate proteins. However, most current approaches used for functional annotation simply focus on the use of protein-level information but ignore inter-relationships among annotations.

RESULTS: Here, we established PFresGO, an attention-based deep-learning approach that incorporates hierarchical structures in Gene Ontology (GO) graphs and advances in natural language processing algorithms for the functional annotation of proteins. PFresGO employs a self-attention operation to capture the inter-relationships of GO terms, updates its embedding accordingly and uses a cross-attention operation to project protein representations and GO embedding into a common latent space to identify global protein sequence patterns and local functional residues. We demonstrate that PFresGO consistently achieves superior performance across GO categories when compared with 'state-of-the-art' methods. Importantly, we show that PFresGO can identify functionally important residues in protein sequences by assessing the distribution of attention weightings. PFresGO should serve as an effective tool for the accurate functional annotation of proteins and functional domains within proteins.

AVAILABILITY AND IMPLEMENTATION: PFresGO is available for academic purposes at https://github.com/BioColLab/PFresGO.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
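The two attention steps named in the abstract can be illustrated with a minimal sketch. This is not the PFresGO implementation: it omits the learned query/key/value projections, multiple heads and the classifier, and the toy embeddings, dimensions and variable names (`go_terms`, `residues`, `residue_weights`) are assumptions made for illustration only. It shows GO-term embeddings attending to each other (self-attention), then attending over per-residue protein embeddings (cross-attention), so that each GO term ends up with a weight distribution over residues.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, weights)."""
    scale = math.sqrt(len(keys[0]))
    weights = [softmax([dot(q, k) / scale for k in keys]) for q in queries]
    outputs = [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights


random.seed(0)
dim = 4
# Hypothetical toy inputs: 3 GO-term embeddings and 6 residue embeddings.
go_terms = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
residues = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]

# Self-attention: every GO term attends to all GO terms and is updated.
go_updated, _ = attention(go_terms, go_terms, go_terms)

# Cross-attention: updated GO terms (queries) attend over residues (keys/values).
go_protein, residue_weights = attention(go_updated, residues, residues)

# For each GO term, the index of the residue with the largest attention weight.
top_residue = [max(range(len(row)), key=row.__getitem__) for row in residue_weights]
```

Each row of `residue_weights` sums to 1, so it can be read as a distribution of one GO term's attention over the sequence positions, which is the quantity the abstract describes inspecting to locate functionally important residues.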


Downloaded from https://academic.oup.com/bioinformatics/article/39/3/btad094/7043095 by Kyoto Daigaku Bungakubu Toshokan user on 10 January 2024

Fig. 4. PFresGO locates functional residues based on attention weights: (a) attention weights of rat α-parvalbumin (PDB: 1S3P, Chain A) with function calcium ion binding (GO:0005509), the dots correspond to calcium-binding residues annotated in BioLiP; (b) attention weights of lactose operon repressor (PDB: 2PE5, Chain B) with function DNA binding (GO:0003677); (c) ROC curves of residues identified by attention weights and functional residues of protein examples retrieved from BioLiP; and (d) an example of the percentage of attention on binding sites. The left, middle and right bars show the percentage of attention of every head in attention Layer 1, Layer 2 and the maximum percentage of each head, respectively.
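The two quantities named in panels (c) and (d) can be sketched on toy data. This is one plausible reading of the caption, not the authors' evaluation procedure: the weights and binding-site indices below are hypothetical, and the function names `attention_on_sites` and `roc_auc` are introduced here for illustration. The AUC is computed by pairwise ranking, which equals the area under the ROC curve.

```python
def attention_on_sites(weights, site_indices):
    """Percentage of the total attention mass that falls on the given residues."""
    total = sum(weights)
    on_sites = sum(weights[i] for i in site_indices)
    return 100.0 * on_sites / total


def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical attention weights for an 8-residue protein (not from the paper).
weights = [0.02, 0.30, 0.05, 0.25, 0.03, 0.10, 0.05, 0.20]
binding_sites = [1, 3, 5]  # hypothetical annotated binding residues
labels = [1 if i in binding_sites else 0 for i in range(len(weights))]

pct = attention_on_sites(weights, binding_sites)
auc = roc_auc(weights, labels)
```

With these toy numbers, about 65% of the attention falls on the three annotated residues, and the AUC is 14/15 (about 0.93), because one non-binding residue (weight 0.20) outranks one binding residue (weight 0.10).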

