
Fast medical concept normalization for biomedical literature based on stack and index optimized self-attention

Document Details

Resource Type:
WOS System:

Indexed in: ◇ SCIE ◇ CPCI(ISTP)

Affiliations:
[1] South China Normal Univ, Sch Comp Sci, 55 Zhongshan Ave West, Guangzhou 510631, Peoples R China
[2] Nanfang Coll, Sch Elect & Comp Engn, 882 Wenquan Ave, Guangzhou 510970, Peoples R China
[3] Shanghai Mental Hlth Ctr, Informat & Stat Dept, 600 Wanping South Rd, Shanghai 200030, Peoples R China
[4] Hong Kong Metropolitan Univ, Sch Sci & Technol, Homantin, Hong Kong 999077, Peoples R China
[5] Yidu Cloud Beijing Technol Co Ltd, AI Lab, 35 Huayuan North Rd, Beijing 100191, Peoples R China
[6] Guangzhou Univ Chinese Med, Affiliated Hosp 2, State Key Lab Dampness Syndrome Chinese Med, 111 Dade Rd, Guangzhou 510120, Peoples R China
[7] Guangdong Univ Foreign Studies, Business Coll, 2 Baiyun Ave North, Guangzhou 510420, Peoples R China
Source:
ISSN:

Keywords: Medical concept normalization; Transformer; Distributed word embedding

Abstract:
Medical concept normalization aims to construct a semantic mapping between mentions and concepts and to uniformly represent mentions that belong to the same concept. In large-scale biomedical literature databases, a fast concept normalization method is essential for processing large volumes of requests and documents. To this end, we propose a hierarchical concept normalization method, named FastMCN, with much lower computational cost, and a variant of the transformer encoder, named stack and index optimized self-attention (SISA), to improve efficiency and performance. During training, FastMCN uses SISA as a word encoder to encode word representations from character sequences, and a mention encoder that summarizes these word representations to represent a mention. During inference, FastMCN indexes and summarizes word representations to represent a query mention and outputs the concept with the maximum cosine similarity to it. To further improve performance, SISA was pre-trained using the continuous bag-of-words architecture on 18.6 million PubMed abstracts. All experiments were evaluated on two publicly available datasets: NCBI disease and BC5CDR disease. The results showed that SISA encoded word representations three times faster than the transformer encoder while performing better. Benefiting from SISA, FastMCN is efficient in both training and inference: it matched the peak performance of most baseline methods within 30 s of training and was 3000-5600 times faster than the state-of-the-art method at inference.
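
The inference step described in the abstract (represent a query mention by summarizing its word vectors, then return the concept with maximum cosine similarity) can be illustrated with a minimal sketch. The sketch below is purely illustrative, not the authors' implementation: the random word_vectors table stands in for the SISA word encoder, mean pooling stands in for the paper's mention encoder, and the concept entries are example MeSH-style identifiers.

```python
import numpy as np

EMB_DIM = 8
rng = np.random.default_rng(0)

# Hypothetical word-embedding index; in the paper these vectors come from
# the SISA word encoder applied to character sequences.
word_vectors = {w: rng.standard_normal(EMB_DIM)
                for w in ["breast", "cancer", "carcinoma", "lung", "tumor"]}

def embed_mention(mention: str) -> np.ndarray:
    """Summarize word vectors into one unit-length mention vector.
    Mean pooling is a stand-in for the paper's mention encoder."""
    vecs = [word_vectors[w] for w in mention.lower().split() if w in word_vectors]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

# Hypothetical concept table: concept ID -> embedding of its canonical name,
# precomputed once so that inference is a pure similarity lookup.
concepts = {
    "D001943": embed_mention("breast cancer"),
    "D008175": embed_mention("lung carcinoma"),
}

def normalize(mention: str) -> str:
    """Return the concept ID whose embedding has maximum cosine similarity
    with the query mention."""
    q = embed_mention(mention)
    # All vectors are unit-normalized, so cosine similarity is a dot product.
    return max(concepts, key=lambda cid: float(q @ concepts[cid]))

print(normalize("breast tumor"))  # nearest concept ID (here likely D001943)
```

Because the concept embeddings are precomputed and unit-normalized, the cosine search reduces to dot products over an index, which is consistent with the large inference speedups the abstract reports.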

Funding:
Language:
WOS:
CAS (Chinese Academy of Sciences) Ranking:
Publication-year [2021] edition:
Major category | Zone 3, Computer Science
Subcategory | Zone 3, Computer Science: Artificial Intelligence
Latest [2025] edition:
JCR Quartile:
Publication-year [2020] edition:
Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Latest [2023] edition:
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Impact Factor: Latest [2023 edition] | Latest 5-year average | Publication year [2020 edition] | 5-year average at publication | Year before publication [2019 edition] | Year after publication [2021 edition]

First Author:
First Author's Affiliation: [1] South China Normal Univ, Sch Comp Sci, 55 Zhongshan Ave West, Guangzhou 510631, Peoples R China
Corresponding Author:
Recommended Citation (GB/T 7714):
APA:
MLA:

