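Rikelab (リケラボ) Paper Search is a service for searching theses, dissertations and faculty papers held in university repositories across Japan in a single query.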


BERTMeSH: deep contextual representation learning for large-scale high-performance MeSH indexing with full text

You, Ronghui; Liu, Yuxuan; Mamitsuka, Hiroshi; Zhu, Shanfeng
Kyoto University
DOI: 10.1093/bioinformatics/btaa837

2021.03.01

Abstract

[Motivation] With the rapid increase in biomedical articles, large-scale automatic Medical Subject Headings (MeSH) indexing has become increasingly important. FullMeSH, the only method for large-scale MeSH indexing with full text, suffers from three major drawbacks: it (i) uses learning to rank, which is time-consuming, (ii) can capture only certain predefined sections of the full text and (iii) ignores the whole MEDLINE database.

[Results] We propose BERTMeSH, a computationally lighter, deep-learning-based MeSH indexing method that uses full text and is flexible with respect to how sections are organized in the full text. BERTMeSH combines two technologies: (i) the state-of-the-art pre-trained deep contextual representation, Bidirectional Encoder Representations from Transformers (BERT), which allows BERTMeSH to capture the deep semantics of full text, and (ii) a transfer learning strategy that uses both full texts from PubMed Central (PMC) and MEDLINE citations that provide only a title and abstract (no full text), to take advantage of both. In our experiments, BERTMeSH was pre-trained on 3 million MEDLINE citations and trained on ∼1.5 million full texts from PMC. BERTMeSH outperformed various cutting-edge baselines. For example, on 20K test articles from PMC, BERTMeSH achieved a Micro F-measure of 69.2%, 6.3% higher than FullMeSH, with the difference being statistically significant. Moreover, predicting the 20K test articles took 5 min with BERTMeSH but more than 10 h with FullMeSH, demonstrating the computational efficiency of BERTMeSH.
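The abstract reports accuracy as a Micro F-measure. A minimal sketch of that metric follows, assuming the standard micro-averaged definition, in which true positives, false positives and false negatives are pooled over all test articles and MeSH headings before the score is computed. The paper's own evaluation code is not reproduced here and may differ in details; the function name micro_f_measure and the toy headings are hypothetical.

```python
from typing import Sequence, Set


def micro_f_measure(gold: Sequence[Set[str]], predicted: Sequence[Set[str]]) -> float:
    """Micro-averaged F-measure for multi-label (MeSH-style) indexing.

    Hypothetical helper: counts are pooled over every (article, heading)
    decision before the score is computed, so frequent headings weigh more.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same articles")

    tp = fp = fn = 0
    for gold_set, pred_set in zip(gold, predicted):
        tp += len(gold_set & pred_set)  # headings correctly assigned
        fp += len(pred_set - gold_set)  # headings assigned but absent from the gold set
        fn += len(gold_set - pred_set)  # gold headings that were missed

    # Equivalent to 2PR / (P + R) with P = tp/(tp+fp) and R = tp/(tp+fn);
    # this form avoids 0/0 when nothing is predicted for any article.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: two articles with illustrative MeSH headings.
gold = [{"Humans", "Neoplasms"}, {"Animals", "Mice"}]
pred = [{"Humans", "Neoplasms", "Adult"}, {"Animals"}]
print(f"Micro F-measure: {micro_f_measure(gold, pred):.3f}")  # Micro F-measure: 0.750
```

Because counts are pooled, frequent headings such as Humans dominate the micro-averaged score; a macro-averaged F-measure, which averages per-heading scores, would weight rare headings equally.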



