リケラボ論文検索 is a paper search service that searches dissertations and faculty papers held in university repositories across Japan in a single query.


ReCGBM: a gradient boosting-based method for predicting human dicer cleavage sites

Liu, Pengyu; Song, Jiangning; Lin, Chun-Yu; Akutsu, Tatsuya. Kyoto University. DOI:10.1186/s12859-021-03993-0

2021

Abstract

[Background] Human dicer is an enzyme that cleaves pre-miRNAs into miRNAs. Several models have been developed to predict human dicer cleavage sites, including PHDCleav and LBSizeCleav. Given an input sequence, these models predict whether the sequence contains a cleavage site. However, they consider each sequence independently and lack interpretability. An accurate and explainable predictor that exploits relations between different sequences is therefore needed to improve understanding of the mechanism by which human dicer cleaves pre-miRNA.

[Results] In this study, we develop ReCGBM, an accurate and explainable predictor of human dicer cleavage sites. We design relational features and class features as inputs to a LightGBM model. Computational experiments show that ReCGBM outperforms the existing methods. Further, we find that features close to the center of the pre-miRNA are more important and contribute significantly to the performance improvement of the developed method.

[Conclusions] The results of this study show that ReCGBM is an interpretable and accurate predictor. In addition, the feature-importance analysis suggests that future predictors may benefit from more informative features close to the center of the pre-miRNA.
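To make the idea of reading per-position feature importance off a gradient boosting model concrete, the following is a minimal sketch and not the authors' implementation. It does not use LightGBM, and it does not build the relational or class features described above. Instead it assumes a plain one-hot encoding of short RNA windows, boosts single-split decision stumps on the logistic loss in pure Python, and sums split gain per feature. The data are synthetic, and the labeling rule (a window is positive when its center base is G) is invented purely for illustration, so the importance concentrates at the center by construction. All function names are hypothetical.

```python
import math
import random

ALPHABET = "ACGU"

def one_hot(seq):
    """Encode an RNA string as a flat 0/1 vector, 4 entries per position."""
    return [1.0 if base == a else 0.0 for base in seq for a in ALPHABET]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, grad, hess, reg=1.0):
    """Pick the binary feature whose split gives the largest second-order gain."""
    G, H = sum(grad), sum(hess)
    best = None
    for j in range(len(X[0])):
        g1 = sum(g for x, g in zip(X, grad) if x[j] == 1.0)
        h1 = sum(h for x, h in zip(X, hess) if x[j] == 1.0)
        g0, h0 = G - g1, H - h1
        gain = g1 * g1 / (h1 + reg) + g0 * g0 / (h0 + reg) - G * G / (H + reg)
        if best is None or gain > best[0]:
            # Leaf values are regularized Newton steps for each side of the split.
            best = (gain, j, -g0 / (h0 + reg), -g1 / (h1 + reg))
    return best

def train(X, y, rounds=20, lr=0.3):
    """Boost stumps on the logistic loss; return the stumps and per-feature gain."""
    scores = [0.0] * len(X)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(rounds):
        probs = [sigmoid(s) for s in scores]
        grad = [p - t for p, t in zip(probs, y)]       # d(loss)/d(score)
        hess = [p * (1.0 - p) for p in probs]          # second derivative
        gain, j, v0, v1 = fit_stump(X, grad, hess)
        stumps.append((j, lr * v0, lr * v1))
        importance[j] += gain
        scores = [s + (lr * v1 if x[j] == 1.0 else lr * v0)
                  for s, x in zip(scores, X)]
    return stumps, importance

def predict(stumps, x):
    """Probability of the positive class for one encoded window."""
    return sigmoid(sum(v1 if x[j] == 1.0 else v0 for j, v0, v1 in stumps))

# Synthetic toy data: 200 random windows of length 7.
# ASSUMPTION: the label rule below is made up and has no biological meaning.
random.seed(0)
WINDOW = 7
seqs = ["".join(random.choice(ALPHABET) for _ in range(WINDOW)) for _ in range(200)]
labels = [1.0 if s[WINDOW // 2] == "G" else 0.0 for s in seqs]
X = [one_hot(s) for s in seqs]

stumps, importance = train(X, labels)

# Aggregate gain over the four one-hot entries of each position.
per_position = [sum(importance[4 * i:4 * i + 4]) for i in range(WINDOW)]
top = max(range(len(importance)), key=importance.__getitem__)
print("most important feature: position", top // 4, "base", ALPHABET[top % 4])
print("gain per position:", [round(v, 2) for v in per_position])
```

In a real setting one would more likely train a library model such as LightGBM on the actual features and inspect its reported importances; the stump-based version here only keeps the example dependency-free and the gain bookkeeping visible.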


A self-archived copy in the Kyoto University Research Information Repository: https://repository.kulib.kyoto-u.ac.jp

Liu et al. BMC Bioinformatics (2021) 22:63
