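リケラボ論文検索 is a paper search service that lets you search, in a single query, the theses and faculty papers held in university repositories across Japan.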


HPOFiller: identifying missing protein–phenotype associations by graph convolutional network

Liu, Lizhi; Mamitsuka, Hiroshi; Zhu, Shanfeng. Kyoto University. DOI: 10.1093/bioinformatics/btab224

2021.10.01

Abstract

[Motivation] Exploring the relationship between human proteins and abnormal phenotypes is of great importance in the prevention, diagnosis and treatment of diseases. The Human Phenotype Ontology (HPO) is a standardized vocabulary that describes the phenotypic abnormalities encountered in human diseases. However, the current HPO annotations of proteins are incomplete, so it is important to identify missing protein–phenotype associations.

[Results] We propose HPOFiller, a graph convolutional network (GCN)-based approach for predicting missing HPO annotations. HPOFiller has two key GCN components for capturing embeddings from complex network structures: (i) S-GCN, applied to both the protein–protein interaction network and the HPO semantic similarity network to utilize network weights; and (ii) Bi-GCN, applied to the protein–phenotype bipartite graph to conduct message passing between proteins and phenotypes. The core idea of HPOFiller is to run these two GCN modules repeatedly, one after the other, over the three networks to refine the embeddings. Empirical results from an extremely stringent evaluation that avoids potential information leakage, including cross-validation and temporal validation, demonstrate that HPOFiller significantly outperforms all other state-of-the-art methods. In particular, the ablation study shows that batch normalization contributes the most to the performance. Further examination offers literature evidence for highly ranked predictions. Finally, using known disease–HPO term associations, HPOFiller could suggest promising, unknown disease–gene associations, presenting possible genetic causes of human disorders.
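The alternation between a similarity-network step and a bipartite step can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the abstract does not give the layer definitions, so the code below assumes the standard GCN propagation rule with symmetric normalization, uses untrained random weights, and omits batch normalization and training entirely. All function names, the toy matrices, and the parameters `dim` and `rounds` are hypothetical.

```python
import numpy as np


def sym_normalize(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a weighted, symmetric adjacency matrix."""
    a = adj + np.eye(adj.shape[0])      # add self-loops so every degree is >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def bipartite_normalize(assoc):
    """Scale a protein x phenotype matrix by the square roots of row and column degrees."""
    row = np.maximum(assoc.sum(axis=1), 1.0)   # guard against unannotated proteins
    col = np.maximum(assoc.sum(axis=0), 1.0)   # guard against unused phenotype terms
    return assoc / np.sqrt(row)[:, None] / np.sqrt(col)[None, :]


def relu(x):
    return np.maximum(x, 0.0)


def refine(ppi, hpo_sim, assoc, dim=8, rounds=2, seed=0):
    """Alternate similarity-network and bipartite propagation (assumed layer forms)."""
    rng = np.random.default_rng(seed)
    n_prot, n_term = assoc.shape
    p = rng.normal(scale=0.1, size=(n_prot, dim))   # protein embeddings
    t = rng.normal(scale=0.1, size=(n_term, dim))   # phenotype embeddings
    ppi_hat, sim_hat = sym_normalize(ppi), sym_normalize(hpo_sim)
    b_hat = bipartite_normalize(assoc)
    for _ in range(rounds):
        # Untrained random weights; a real model would learn these from data.
        w = rng.normal(scale=0.5, size=(4, dim, dim))
        # Step 1: propagate within each weighted similarity network.
        p = relu(ppi_hat @ p @ w[0])
        t = relu(sim_hat @ t @ w[1])
        # Step 2: pass messages across the protein-phenotype bipartite graph,
        # keeping a residual copy of the current embeddings.
        p_next = relu(b_hat @ t @ w[2]) + p
        t_next = relu(b_hat.T @ p @ w[3]) + t
        p, t = p_next, t_next
    return p, t


# Toy data: 4 proteins, 3 phenotype terms (values are made up for illustration).
ppi = np.array([[0.0, 0.9, 0.0, 0.2],
                [0.9, 0.0, 0.4, 0.0],
                [0.0, 0.4, 0.0, 0.7],
                [0.2, 0.0, 0.7, 0.0]])
hpo_sim = np.array([[0.0, 0.5, 0.1],
                    [0.5, 0.0, 0.3],
                    [0.1, 0.3, 0.0]])
assoc = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 0.0]])

p, t = refine(ppi, hpo_sim, assoc)
scores = p @ t.T   # one score per protein-phenotype pair
```

In this sketch, an entry of `scores` for a pair that is zero in `assoc` plays the role of a candidate missing annotation. With random weights the scores carry no meaning, so the example only shows how information flows across the three networks.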


A self-archived copy in the Kyoto University Research Information Repository: https://repository.kulib.kyoto-u.ac.jp

Table 6. Top disease–gene associations found by HPOFiller that are newly added to the latest OMIM database

| Rank | Protein ID | Gene | Protein name | HPO term ID | HPO term name | Disease ID | Disease name |
|---|---|---|---|---|---|---|---|
| 114 | P05231 | IL6 | Interleukin-6 | HP:0002408 | Cerebral arteriovenous malformation | OMIM:108010 | Arteriovenous malformations of the brain (BAVM) |
| 1323 | Q30201 | HFE | Hereditary hemochromatosis protein | HP:0000726 | Dementia | OMIM:104300 | Alzheimer disease (AD) |
| 4032 | P05164 | MPO | Myeloperoxidase | HP:0002423 | Long-tract signs | OMIM:104300 | Alzheimer disease (AD) |

Note: ‘HPO term’ refers to the missing HPO annotation that HPOFiller predicts for the corresponding protein.
