PLEDGER: Embedded Whole Genome Read Mapping using Algorithm-HW Co-design and Memory-aware Implementation

Sidharth Maheshwari1,a, Rishad Shafik1,b, Ian Wilson1, Alex Yakovlev1, Venkateshwarlu Y. Gudur2 and Amit Acharyya2
1Newcastle University, Newcastle upon Tyne, UK
aS.Maheshwari2@newcastle.ac.uk
bRishad.Shafik@newcastle.ac.uk
2Department of Electrical Engineering, Indian Institute of Technology Hyderabad, Hyderabad, India

ABSTRACT


With over 6000 known genetic disorders, genomics is a key driver for transforming the current generation of healthcare from a reactive form to a personalized, predictive, preventive and participatory (P4) form. High-throughput sequencing technologies produce large volumes of genomic data, making genome reassembly and analysis computationally expensive in terms of performance and energy. In this paper, we propose an algorithm-hardware co-design driven acceleration approach for enabling translational genomics. Core to our approach is a Pyopencl based tooL for gEnomic workloaDs tarGeting Embedded platforms (PLEDGER). PLEDGER is a scalable, portable and energy-efficient solution to genomics targeting low-cost embedded platforms. It is a read mapping tool to reassemble a genome, which is a crucial prerequisite to genomics. Using bit-vectors and variable-level optimisations, we propose a low-memory-footprint, dynamic programming based filtration and verification kernel capable of accelerated parallel heterogeneous execution. We demonstrate, for the first time, mapping of real reads to the whole human genome on a memory-restricted embedded platform using novel memory-aware preprocessed data structures. We compare the performance and accuracy of PLEDGER with the state-of-the-art RazerS3, Hobbes3, CORAL and REPUTE on two systems: 1) Intel i7-8750H CPU + Nvidia GTX 1050 Ti; 2) Odroid N2 with 6 cores: 4×Cortex-A73 + 2×Cortex-A53 and Mali GPU. PLEDGER demonstrates persistent energy and accuracy advantages compared to state-of-the-art read mappers, producing up to 11× speedups and 5.9× energy savings compared to state-of-the-art hardware resources.

Keywords: OpenCL, Embedded Genomics, Read Mapping, Heterogeneous Computing, Low-Memory Footprint, Energy Efficient.


