HPO2Vec+: Leveraging heterogeneous knowledge resources to enrich node embeddings for the Human Phenotype Ontology

Publication details

Resource type:
WOS category:

Indexed in: ◇ SCIE ◇ EI

Affiliations: [1]Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA [2]The Second Clinical College Guangzhou University of Chinese Medicine, China [3]Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
Source:
ISSN:

Keywords: Enriched node embeddings; Human Phenotype Ontology; Heterogeneous knowledge resources; Phenotypic relevance detection; Deep phenotyping

Abstract:
Background: In precision medicine, deep phenotyping is defined as the precise and comprehensive analysis of phenotypic abnormalities, aiming to acquire a better understanding of the natural history of a disease and its genotype-phenotype associations. Detecting phenotypic relevance is an important task when translating precision medicine into clinical practice, especially for patient stratification tasks based on deep phenotyping. In our previous work, we developed node embeddings for the Human Phenotype Ontology (HPO) to assist in phenotypic relevance measurement incorporating distributed semantic representations. However, the derived HPO embeddings hold only distributed representations for IS-A relationships among nodes, hampering the ability to fully explore the graph.
Methods: In this study, we developed a framework, HPO2Vec+, to enrich the produced HPO embeddings with heterogeneous knowledge resources (i.e., DECIPHER, OMIM, and Orphanet) for detecting phenotypic relevance. Specifically, we parsed disease-phenotype associations contained in these three resources to enrich non-inheritance relationships among phenotypic nodes in the HPO. To generate node embeddings for the HPO, node2vec was applied to perform node sampling on the enriched HPO graphs based on random walks, followed by feature learning over the sampled nodes to generate enriched node embeddings. Four HPO embeddings were generated based on different graph structures, which we hereafter label as HPOEmb-Original, HPOEmb-DECIPHER, HPOEmb-OMIM, and HPOEmb-Orphanet. We evaluated the derived embeddings quantitatively through an HPO link prediction task with four edge embedding operations and six machine learning algorithms. The resulting best embeddings were then evaluated for patient stratification of 10 rare diseases using electronic health records (EHR) collected at Mayo Clinic.
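The enrichment and link prediction steps described in the Methods can be illustrated with a minimal sketch. It rests on two assumptions that the abstract does not confirm: that a non-inheritance edge is added between every pair of phenotypes annotated to the same disease, and that the four edge embedding operations are the binary operators defined in the node2vec paper (average, Hadamard, weighted-L1, weighted-L2). The function names and the placeholder identifiers are hypothetical.

```python
from itertools import combinations


def enrich_edges(is_a_edges, disease_to_phenotypes):
    """Return the IS-A edges plus one undirected edge for every pair of
    phenotypes annotated to the same disease (an assumed enrichment rule)."""
    edges = {frozenset(edge) for edge in is_a_edges}
    for phenotypes in disease_to_phenotypes.values():
        for a, b in combinations(sorted(set(phenotypes)), 2):
            edges.add(frozenset((a, b)))
    return edges


# Binary operators from the node2vec paper; assumed, not named in the abstract.
EDGE_OPERATORS = {
    "average": lambda a, b: (a + b) / 2,
    "hadamard": lambda a, b: a * b,
    "weighted_l1": lambda a, b: abs(a - b),
    "weighted_l2": lambda a, b: (a - b) ** 2,
}


def edge_features(u, v, operator):
    """Combine two node embeddings element-wise into one edge feature vector."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    combine = EDGE_OPERATORS[operator]
    return [combine(a, b) for a, b in zip(u, v)]


# Placeholder identifiers, not real HPO or disease IDs.
is_a = [("HP:child_1", "HP:parent"), ("HP:child_2", "HP:parent")]
annotations = {"DISEASE:example": ["HP:child_1", "HP:other_branch"]}
enriched = enrich_edges(is_a, annotations)
features = edge_features([1.0, 2.0], [3.0, -2.0], "hadamard")
```

In a link prediction setup of this kind, the edge feature vectors for existing and sampled non-existing edges would be passed to a binary classifier, which is then scored with metrics such as AUROC and average precision.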
We assessed our framework qualitatively by visualizing phenotypic clusters and conducting a use case study on primary hyperoxaluria (PH), a rare disease, on the task of inferring relevant phenotypes given 22 annotated PH-related phenotypes.
Results: The quantitative link prediction task shows that HPOEmb-Orphanet achieved an optimal AUROC of 0.92 and an average precision of 0.94. In addition, HPOEmb-Orphanet achieved an optimal F1 score of 0.86. The quantitative patient similarity measurement task indicates that HPOEmb-Orphanet achieved the highest average detection rate for similar patients over 10 rare diseases and performed better than other similarity measures implemented by an existing tool, HPOSim, especially for patient pairs with fewer shared phenotypes. The qualitative evaluation shows that the enriched HPO embeddings are generally able to detect relationships among nodes with fine granularity and that HPOEmb-Orphanet is particularly good at associating phenotypes across different disease systems. For the use case of detecting relevant phenotypic characterizations for given PH-related phenotypes, HPOEmb-Orphanet outperformed the other three HPO embeddings by achieving the highest average P@5 of 0.81 and the highest P@10 of 0.79. Compared to seven conventional similarity measurements provided by HPOSim, HPOEmb-Orphanet is able to detect more relevant phenotypic pairs, especially for pairs not in inheritance relationships.
Conclusion: We drew the following conclusions based on the evaluation results. First, with additional non-inheritance edges, enriched HPO embeddings can detect more associations between fine-granularity phenotypic nodes regardless of their topological structures in the HPO graph.
Second, HPOEmb-Orphanet not only achieves the optimal performance in link prediction and patient stratification based on phenotypic similarity, but is also able to detect relevant phenotypes closer to domain experts' judgments than other embeddings and conventional similarity measurements. Third, incorporating heterogeneous knowledge resources does not necessarily result in better performance for detecting relevant phenotypes. From a clinical perspective, in our use case study, clinical-oriented knowledge resources (e.g., Orphanet) can achieve better performance in detecting relevant phenotypic characterizations compared to biomedical-oriented knowledge resources (e.g., DECIPHER and OMIM).
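The P@5 and P@10 figures reported in the abstract are precision-at-k scores: the fraction of the k highest-ranked candidate phenotypes that are judged relevant. The sketch below shows one way such a score could be computed from node embeddings. Ranking by cosine similarity is an assumption (the abstract does not state the similarity function), and the toy vectors, node names, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query, embeddings, k):
    """Rank all other nodes by similarity to the query node, highest first."""
    scored = [
        (cosine(embeddings[query], vector), node)
        for node, vector in embeddings.items()
        if node != query
    ]
    # Sort by descending similarity, breaking ties by node name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [node for _, node in scored[:k]]


def precision_at_k(ranked, relevant, k):
    """Fraction of the first k ranked nodes that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for node in ranked[:k] if node in relevant)
    return hits / k


# Toy two-dimensional embeddings with placeholder node names.
toy_embeddings = {
    "q": [1.0, 0.0],
    "a": [0.9, 0.1],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
    "d": [-1.0, 0.0],
}
ranked = top_k("q", toy_embeddings, 3)
score = precision_at_k(ranked, {"a", "b"}, 2)
```

An average P@k over several query phenotypes, as reported for the PH use case, would be the mean of this score across the queries.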

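Funding:
Language:
Times cited:
WOS:
PubmedID:
Chinese Academy of Sciences (CAS) journal ranking:
Year of publication [2018] edition:
Major category | Zone 3 Medicine
Subcategory | Zone 3 Computer Science: Interdisciplinary Applications; Zone 3 Medicine: Informatics
Latest [2025] edition:
Major category | Zone 2 Medicine
Subcategory | Zone 3 Computer Science: Interdisciplinary Applications; Zone 3 Medicine: Informatics
JCR quartile:
Year of publication [2017] edition:
Q2 MEDICAL INFORMATICS; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Latest [2023] edition:
Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Q2 MEDICAL INFORMATICS

Impact factor: latest [2023 edition]; latest five-year average; year of publication [2017 edition]; five-year average in year of publication; year before publication [2016 edition]; year after publication [2018 edition]

First author:
First author affiliation: [1]Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA [*1]Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55901, USA.
Corresponding author:
Corresponding author affiliation: [1]Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA [*1]Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55901, USA.
Recommended citation (GB/T 7714):
APA:
MLA: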
