Bairoch,A. (2000) The ENZYME database in 2000. Nucleic Acids Res., 28, 304–305.
Cai,C.Z. et al. (2003) Protein function classification via support vector machine approach. Math. Biosci., 185, 111–122.
Cao,Y. and Shen,Y. (2021) TALE: transformer-based protein function annotation with joint sequence–label embedding. Bioinformatics, 37, 2825–2833.
Chen,X. and Ishwaran,H. (2012) Random forests for genomic data analysis. Genomics, 99, 323–329.
Das,S. et al. (2015) Functional classification of CATH superfamilies: a domain-based approach for protein function annotation. Bioinformatics, 31, 3460–3467.
Day-Richter,J. et al. (2007) OBO-Edit—an ontology editor for biologists.
Bioinformatics, 23, 2198–2200.
Downloaded from https://academic.oup.com/bioinformatics/article/39/3/btad094/7043095 by Kyoto Daigaku Bungakubu Toshokan user on 10 January 2024
Fig. 4. PFresGO locates functional residues based on attention weights: (a) attention weights of rat α-parvalbumin (PDB: 1S3P, Chain A) with function calcium ion binding (GO:0005509); the dots correspond to calcium-binding residues annotated in BioLiP; (b) attention weights of lactose operon repressor (PDB: 2PE5, Chain B) with function DNA binding (GO:0003677); (c) ROC curves of residues identified by attention weights and functional residues of protein examples retrieved from BioLiP; and (d) an example of the percentage of attention on binding sites. The left, middle and right bars show the percentage of attention of every head in attention Layer 1, in attention Layer 2, and the maximum percentage of each head, respectively.
Lichtarge,O. et al. (1996) An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol., 257, 342–358.
Merino,G.A. et al. (2022) Hierarchical deep learning for predicting GO annotations by integrating protein knowledge. Bioinformatics, 38, 4488–4496.
Ng,A. et al. (2011) Sparse Autoencoder. CS294A Lecture Notes, 72, 1–19.
Ouzounis,C.A. et al. (2003) Classification schemes for protein structure and function. Nat. Rev. Genet., 4, 508–519.
Sapoval,N. et al. (2022) Current progress and open challenges for applying deep learning across the biosciences. Nat. Commun., 13, 1728.
Schaeffer,R.D. et al. (2017) ECOD: new developments in the evolutionary classification of domains. Nucleic Acids Res., 45, D296–D302.
Sharma,V.S. et al. (2022) PCfun: a hybrid computational framework for systematic characterization of protein complex function. Brief. Bioinform., 23, bbac239.
Sureyya Rifaioglu,A. et al. (2019) DEEPred: automated protein function prediction with multi-task feed-forward deep neural networks. Sci. Rep., 9, 7344.
The Gene Ontology Consortium. (2008) The gene ontology project in 2008. Nucleic Acids Res., 36(Database issue), D440–D444.
The UniProt Consortium. (2021) UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res., 49, D480–D489.
Yang,J. et al. (2013) BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res., 41, D1096–D1103.
Ye,J. et al. (2006) BLAST: improvements for better sequence analysis. Nucleic Acids Res., 34, W6–W9.
Duong,D. et al. (2020) Annotating Gene Ontology terms for protein sequences with the Transformer model. bioRxiv.
Edera,A.A. et al. (2022) Anc2vec: embedding gene ontology terms by preserving ancestors relationships. Brief. Bioinform., 23.
Edgar,R.C. and Batzoglou,S. (2006) Multiple sequence alignment. Curr. Opin. Struct. Biol., 16, 368–373.
Elnaggar,A. et al. (2021) ProtTrans: towards cracking the language of life’s code through self-supervised learning. bioRxiv.
Fu,L. et al. (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics, 28, 3150–3152.
Gligorijevic,V. et al. (2021) Structure-based protein function prediction using graph convolutional networks. Nat. Commun., 12, 3168.
Hasin,Y. et al. (2017) Multi-omics approaches to disease. Genome Biol., 18, 83.
Kanehisa,M. et al. (2021) KEGG: integrating viruses and cellular organisms. Nucleic Acids Res., 49, D545–D551.
Kulmanov,M. and Hoehndorf,R. (2022) DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms. Bioinformatics, 38, i238–i245.
Kulmanov,M. et al. (2018) DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics, 34, 660–668.
Lee,D. et al. (2007) Predicting protein function from sequence and structure. Nat. Rev. Mol. Cell Biol., 8, 995–1005.
T.Pan et al.
...