
SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data

Publication details


Indexed in: SCIE

Affiliations: [1]BGI-Shenzhen, Shenzhen 518083 [2]Geneplus-Beijing, Beijing 102206 [3]Department of Oncology, Fujian Medical University Union Hospital, Fuzhou 350001 [4]Fujian Key Laboratory of Translational Cancer Medicine, Fuzhou 350014 [5]Department of Stem Cell Research Institute, Fujian Medical University Stem Cell Research Institute, Fuzhou 350000 [6]Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, Changsha 410073 [7]Intel China Ltd., Shanghai 200336 [8]Guangdong Provincial Hospital of Chinese Medicine, Guangzhou 510120 [9]Department of Surgery, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong [10]James D. Watson Institute of Genome Sciences, Hangzhou 310058, China

Keywords: high-throughput sequencing; quality control; preprocessing; MapReduce

Abstract:
Quality control (QC) and preprocessing are essential steps in sequencing data analysis to ensure the accuracy of results. However, existing tools do not offer a satisfactory solution that combines comprehensive integrated functions, a suitable architecture, and highly scalable acceleration. In this article, we present SOAPnuke, a tool with abundant functions for a "QC-Preprocess-QC" workflow and a MapReduce acceleration framework. Four modules with different preprocessing functions are designed for processing datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments, respectively. As a workflow-like tool, SOAPnuke centralizes processing functions into one executable and predefines their order, which avoids the need to reformat files when switching between tools. Furthermore, the MapReduce framework provides high scalability by distributing all the processing work across an entire compute cluster. We conducted a benchmark in which SOAPnuke and other tools were used to preprocess a ~30x NA12878 dataset published by GIAB. The standalone operation of SOAPnuke struck a balance between resource occupancy and performance. When accelerated on 16 working nodes with MapReduce, SOAPnuke achieved ~5.7 times the fastest speed of the other tools.
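The "QC-Preprocess-QC" pattern and the map/reduce split mentioned in the abstract can be illustrated with a small Python sketch. This is not SOAPnuke's code or interface: the names `Stats`, `map_read`, `merge`, `keep`, and `qc_preprocess_qc`, the Phred+33 quality encoding, and the filter thresholds are all assumptions made for illustration. Each read is mapped to partial statistics, and the partial statistics are reduced by addition, which is the property that would let such work be spread across nodes.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Partial QC statistics; hypothetical fields chosen for illustration."""
    reads: int = 0
    bases: int = 0
    q20_bases: int = 0


def parse_fastq(text):
    """Yield (sequence, quality) pairs from 4-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        yield lines[i + 1], lines[i + 3]


def map_read(read):
    """Map step: one read -> its partial statistics (Phred+33 assumed)."""
    seq, qual = read
    q20 = sum(1 for c in qual if ord(c) - 33 >= 20)
    return Stats(1, len(seq), q20)


def merge(a, b):
    """Reduce step: combine two partial statistics by addition."""
    return Stats(a.reads + b.reads, a.bases + b.bases, a.q20_bases + b.q20_bases)


def qc(reads):
    """Summarize a collection of reads with map followed by reduce."""
    return reduce(merge, map(map_read, reads), Stats())


def keep(read, max_n_frac=0.1, min_mean_q=20):
    """Hypothetical filter: drop reads with many N bases or low mean quality."""
    seq, qual = read
    if not seq or not qual:
        return False
    mean_q = sum(ord(c) - 33 for c in qual) / len(qual)
    return seq.count("N") / len(seq) <= max_n_frac and mean_q >= min_mean_q


def qc_preprocess_qc(text):
    """Run QC, filter the reads, then run QC again on what remains."""
    reads = list(parse_fastq(text))
    before = qc(reads)
    kept = [r for r in reads if keep(r)]
    after = qc(kept)
    return before, kept, after


# Two toy reads: one high-quality read and one all-N, low-quality read.
EXAMPLE = "\n".join([
    "@r1", "ACGT", "+", "IIII",
    "@r2", "NNNN", "+", "!!!!",
])

if __name__ == "__main__":
    before, kept, after = qc_preprocess_qc(EXAMPLE)
    print(before)
    print(after)
```

Because `merge` is associative and `Stats()` acts as its identity, the reads could be partitioned arbitrarily, summarized per partition, and the partial results merged afterwards without changing the totals.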

Grant numbers: 2013YZ0002-2; 2015J01397

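Chinese Academy of Sciences (CAS) journal ranking:
Publication-year edition [2016]:
Major category | Tier 2, Multidisciplinary journals
Subcategory | Tier 2, Multidisciplinary journals
Latest edition [2025]:
Major category | Tier 2, Biology
Subcategory | Tier 2, Multidisciplinary journals
JCR quartile:
Publication-year edition [2015]:
Q1 MULTIDISCIPLINARY SCIENCES
Latest edition [2023]:
Q1 MULTIDISCIPLINARY SCIENCES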


First author affiliation: [1]BGI-Shenzhen, Shenzhen 518083
Corresponding author affiliations: [1]BGI-Shenzhen, Shenzhen 518083 [3]Department of Oncology, Fujian Medical University Union Hospital, Fuzhou 350001 [4]Fujian Key Laboratory of Translational Cancer Medicine, Fuzhou 350014 [5]Department of Stem Cell Research Institute, Fujian Medical University Stem Cell Research Institute, Fuzhou 350000 [6]Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, Changsha 410073
