Medical concept normalization aims to construct a semantic mapping between mentions and concepts and to uniformly represent mentions that belong to the same concept. In large-scale biomedical literature databases, a fast concept normalization method is essential to process large numbers of requests and documents. To this end, we propose a hierarchical concept normalization method, named FastMCN, with much lower computational cost, together with a variant of the transformer encoder, named stack and index optimized self-attention (SISA), to improve efficiency and performance. During training, FastMCN uses SISA as a word encoder to encode word representations from character sequences and uses a mention encoder that summarizes the word representations to represent a mention. During inference, FastMCN indexes and summarizes word representations to represent a query mention and outputs the concept with the maximum cosine similarity to that mention. To further improve performance, SISA was pre-trained using the continuous bag-of-words architecture on 18.6 million PubMed abstracts. All experiments were evaluated on two publicly available datasets: NCBI disease and BC5CDR disease. The results showed that SISA was three times faster than the transformer encoder for encoding word representations and achieved better performance. Benefiting from SISA, FastMCN was efficient in both training and inference, i.e. it reached the peak performance of most baseline methods within 30 seconds and was 3000-5600 times faster than the state-of-the-art method in inference.
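As an illustration of the inference step described above (not the authors' implementation), the minimal Python sketch below represents a query mention by summarizing cached word representations and returns the concept whose embedding has the maximum cosine similarity. The word-embedding cache, concept IDs, and mean-pooling summarizer are hypothetical stand-ins for the SISA word encoder and the paper's mention encoder.

```python
# Minimal sketch of cosine-similarity concept normalization at inference time.
# All embeddings here are random placeholders; in FastMCN they would come from
# the pre-trained SISA word encoder and a trained concept embedding table.
import numpy as np

# Hypothetical cache: word -> d-dimensional representation (indexed once, reused per query).
word_cache = {
    "breast": np.random.rand(64),
    "cancer": np.random.rand(64),
    "carcinoma": np.random.rand(64),
}

# Hypothetical concept embeddings: one row per concept ID.
concept_ids = ["D001943", "D002277"]
concept_matrix = np.random.rand(len(concept_ids), 64)

def encode_mention(words):
    """Summarize word representations into a single mention vector (mean pooling)."""
    vecs = np.stack([word_cache[w] for w in words])
    return vecs.mean(axis=0)

def normalize(words):
    """Return the concept ID with maximum cosine similarity to the mention vector."""
    m = encode_mention(words)
    m = m / np.linalg.norm(m)
    c = concept_matrix / np.linalg.norm(concept_matrix, axis=1, keepdims=True)
    sims = c @ m
    return concept_ids[int(np.argmax(sims))]

print(normalize(["breast", "cancer"]))
```

Because word representations are indexed ahead of time, each query reduces to a pooling step plus one matrix-vector product, which is consistent with the speedups reported in the abstract.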
Funding: National Natural Science Foundation of China (Grant No. 61871141), Natural Science Foundation of Guangdong Province (Grant No. 2021A1515011339), Collaborative Innovation Team of Guangzhou University of Traditional Chinese Medicine (Grant No. 2021XK08).
First author's affiliation: [1] South China Normal Univ, Sch Comp Sci, 55 Zhongshan Ave West, Guangzhou 510631, Peoples R China
Corresponding author:
Recommended citation (GB/T 7714):
Liang Likeng, Hao Tianyong, Zhan Choujun, et al. Fast medical concept normalization for biomedical literature based on stack and index optimized self-attention[J]. NEURAL COMPUTING & APPLICATIONS, 2022, 34(19): 16311-16324. doi:10.1007/s00521-022-07228-y.
APA:
Liang, Likeng, Hao, Tianyong, Zhan, Choujun, Qiu, Hong, Wang, Fu Lee, ... & Qu, Yingying. (2022). Fast medical concept normalization for biomedical literature based on stack and index optimized self-attention. NEURAL COMPUTING & APPLICATIONS, 34(19), 16311-16324.
MLA:
Liang, Likeng, et al. "Fast medical concept normalization for biomedical literature based on stack and index optimized self-attention." NEURAL COMPUTING & APPLICATIONS 34.19 (2022): 16311-16324.