Affiliations: [1] Dept. of Electrical Engineering & Computer Science, York University, Toronto, Canada; [2] Guangdong Prov Acad Chinese Med Sci, Guangzhou Univ Chinese Med, Guangzhou, China (广东省中医院, Guangdong Provincial Hospital of Chinese Medicine); [3] School of Medical Information Engineering, Guangzhou Univ Chinese Med, Guangzhou, China; [4] School of Information Technology, York University, Toronto, Canada; [5] Guangdong Prov Acad Chinese Med Sci, Guangzhou Univ Chinese Med, Guangzhou, China (广东省中医院, Guangdong Provincial Hospital of Chinese Medicine); [6] Dapasoft INC., Toronto, Canada
Patient entity recognition, or patient entity extraction, detects the electronic health records (EHRs) across multiple data sources that belong to the same patient and links them together. It is a useful technology for cross-system electronic health data analysis: it helps define commonality, synthesize multiple data sources, and reduce data redundancy. In this paper, we propose a deep learning solution, a sequential LSTM + word embedding network model (WE + LSTM), that filters and represents unstructured electronic health records by measuring their context similarity and links them to the matching patient entities. The text context features are first passed through a trained bidirectional LSTM network to filter out irrelevant patient entities. A trained shallow word embedding network then estimates the word vector similarity between the remaining patient information context and the existing entities in the database. Finally, the new input patient data is linked to the existing patient entity in the dataset with the greatest context similarity. Our hypothesis is that records pointing to the same patient have the closest context similarity, so the patterns can be encoded by a trained word embedding network. An infectious disease registration dataset (5304 patient entities) is used to evaluate the performance of the proposed WE + LSTM model. The classification accuracy is 0.837 and the F score is 0.843, the highest among the compared models, which include a single word embedding model, a random forest model, and a conventional neural network model. In addition, the WE + LSTM model has the greatest area under the curve (AUC) when the ROC curves of the four models are compared. This result indicates that the proposed WE + LSTM model provides a feasible solution for correctly recognizing patient identities from electronic records by measuring text context similarity.
It provides a solution for patient identity recognition through multi-source health big data integration, which is an urgent task for health big data projects.
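The final linking step described in the abstract (attach a new record to the existing entity with the greatest context similarity) can be illustrated with a minimal sketch. This is not the authors' WE + LSTM model: the bidirectional LSTM filter is omitted, and the trained word embedding network is replaced by a hypothetical hand-written table of toy vectors (`TOY_VECTORS`). The function names `embed`, `cosine`, and `link_record`, and the sample patient contexts, are assumptions made for illustration only.

```python
import math

# Hypothetical toy word vectors (assumption); a real system would use
# vectors from a trained word embedding network.
TOY_VECTORS = {
    "fever": [1.0, 0.0, 0.0],
    "cough": [0.8, 0.2, 0.0],
    "tuberculosis": [0.9, 0.1, 0.1],
    "rash": [0.0, 1.0, 0.0],
    "measles": [0.1, 0.9, 0.1],
    "guangzhou": [0.0, 0.0, 1.0],
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_record(record_text, entities):
    """Return (entity_id, similarity) for the most similar entity context.

    Returns (None, 0.0) when the record or every entity has no known words.
    """
    query = embed(record_text)
    if query is None:
        return None, 0.0
    best_id, best_sim = None, -1.0
    for entity_id, context in entities.items():
        vec = embed(context)
        if vec is None:
            continue
        sim = cosine(query, vec)
        if sim > best_sim:
            best_id, best_sim = entity_id, sim
    if best_id is None:
        return None, 0.0
    return best_id, best_sim


# Hypothetical existing entities and a new incoming record.
entities = {
    "patient_A": "fever cough tuberculosis guangzhou",
    "patient_B": "rash measles guangzhou",
}
best_id, best_sim = link_record("cough fever", entities)
print(best_id, round(best_sim, 3))
```

In this toy setup the new record shares respiratory terms with `patient_A`, so it is linked there. A fuller version would first discard irrelevant candidates, as the abstract describes for the LSTM stage, before ranking the rest by similarity.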
Funding:
National Natural Science Foundation of China (NSFC) [81573827]
Language:
Foreign language
First author's affiliation: [1] Dept. of Electrical Engineering & Computer Science, York University, Toronto, Canada
Recommended citation (GB/T 7714):
Liang Zhaohui, Liu Jun, Zhang Honglai, et al. Patient Entity Recognition by Automatic EHR Context Understanding and Deep Learning[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM). 2019, 1096-1099.
APA:
Liang, Zhaohui, Liu, Jun, Zhang, Honglai, Huang, Jimmy Xiangji, Li, Ziping, & Chan, Stephen. (2019). Patient Entity Recognition by Automatic EHR Context Understanding and Deep Learning. 2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 1096-1099.
MLA:
Liang, Zhaohui, et al. "Patient Entity Recognition by Automatic EHR Context Understanding and Deep Learning". 2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM). (2019): 1096-1099.