2.7 Optimizing emerging applications for power-efficient computing


Date: Tuesday 10 March 2020
Time: 11:30 - 13:00
Location / Room: Berlioz

Chair:
Theocharis Theocharides, University of Cyprus, CY

Co-Chair:
Muhammad Shafique, TU Wien, AT

This session focuses on emerging applications for power-efficient computing, such as bioinformatics and few-shot learning. Methods such as hyperdimensional computing and computing-in-memory are applied to DNA pattern matching and few-shot learning to make them more power-efficient.

Time  Label  Presentation Title / Authors
11:30  2.7.1  GENIEHD: EFFICIENT DNA PATTERN MATCHING ACCELERATOR USING HYPERDIMENSIONAL COMPUTING
Speaker:
Mohsen Imani, University of California, San Diego, US
Authors:
Yeseong Kim, Mohsen Imani, Niema Moshiri and Tajana Rosing, University of California, San Diego, US
Abstract
DNA pattern matching is widely applied in many bioinformatics applications. The increasing volume of DNA data exacerbates the runtime and power consumption needed to discover DNA patterns. In this paper, we propose a hardware-software co-design, called GenieHD, which efficiently parallelizes the DNA pattern-matching task. We exploit brain-inspired hyperdimensional (HD) computing, which mimics pattern-based computations in human memory. We transform the inherently sequential processes of DNA pattern matching into highly parallelizable computation tasks using HD computing. We accordingly design an accelerator architecture targeting various parallel computing platforms to effectively parallelize the HD-based DNA pattern matching while significantly reducing memory accesses. We evaluate GenieHD on practical large-size DNA datasets such as the human and Escherichia coli genomes. Our evaluation shows that GenieHD significantly accelerates the DNA matching procedure, e.g., 44.4× speedup and 54.1× higher energy efficiency compared to a state-of-the-art FPGA-based design.
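
As an illustration of the idea only (not the authors' GenieHD design), the Python sketch below encodes DNA windows into random bipolar hypervectors and flags windows whose hypervector is near-identical to that of the query pattern. The dimensionality, base-vector scheme and similarity threshold are assumptions, and the brute-force window loop stands in for GenieHD's single-pass, parallelized encoding.

```python
import numpy as np

# Illustrative sketch of HD-style DNA pattern matching (not GenieHD itself).
# Each base gets a random bipolar hypervector; a window is encoded by binding
# (element-wise multiplication of) position-permuted base hypervectors.

D = 10000                                   # hypervector dimensionality (assumed)
rng = np.random.default_rng(0)
BASE = {b: rng.choice([-1, 1], size=D) for b in "ACGT"}

def encode(seq):
    """Encode a DNA string as a single bipolar hypervector."""
    hv = np.ones(D, dtype=np.int64)
    for i, b in enumerate(seq):
        hv = hv * np.roll(BASE[b], i)       # permute by position, then bind
    return hv

def matches(reference, pattern, threshold=0.9):
    """Start positions whose window hypervector is near-identical to the query's."""
    q = encode(pattern)
    hits = []
    for s in range(len(reference) - len(pattern) + 1):
        window = encode(reference[s:s + len(pattern)])
        if np.dot(q, window) / D >= threshold:   # cosine similarity of bipolar vectors
            hits.append(s)
    return hits

print(matches("ACGTTACGATTACA", "ACGATTA"))     # -> [5]
```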

12:00  2.7.2  REPUTE: AN OPENCL BASED READ MAPPING TOOL FOR EMBEDDED GENOMICS
Speaker:
Sidharth Maheshwari, Newcastle University, GB
Authors:
Sidharth Maheshwari1, Rishad Shafik1, Alex Yakovlev1, Ian Wilson1 and Amit Acharyya2
1Newcastle University, GB; 2IIT Hyderabad, IN
Abstract
Genomics is transforming medicine from reactive to personalized, predictive, preventive and participatory (P4). The massive amount of data produced by genomics is a major challenge, as it requires extensive computational capabilities and consumes large amounts of energy. A crucial prerequisite for computational genomics is genome assembly, but the existing mapping tools are predominantly software based and optimized for homogeneous high-performance systems. In this paper, we propose an OpenCL based REad maPper for heterogeneoUs sysTEms (REPUTE), which can use diverse and parallel compute and storage devices effectively. Core to this tool are dynamic-programming-based filtration and verification kernels that map the reads on multiple devices concurrently. We show hardware/software co-design and implementations of REPUTE across different platforms, and compare it with state-of-the-art mappers. We demonstrate the performance of the mappers on two systems: 1) an Intel CPU with two Nvidia GPUs; 2) a HiKey970 embedded SoC with ARM Cortex-A73/A53 cores. The results show that REPUTE outperforms other read mappers in most cases, producing up to 13x speedup with better or comparable accuracy. We also demonstrate that the embedded implementation can achieve up to 27x energy savings, enabling low-cost genomics.
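
To make the filter-then-verify structure concrete, here is a minimal Python sketch, not REPUTE's OpenCL kernels: candidate positions are filtered with an exact k-mer seed index, then each candidate is verified with a dynamic-programming edit-distance computation. The seed length, error budget and example data are assumed values.

```python
from collections import defaultdict

# Illustrative filter-then-verify read mapping (not REPUTE's OpenCL kernels).

def build_seed_index(reference, k=11):
    """Filtration stage: index every k-mer of the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def edit_distance(a, b):
    """Verification stage: classic dynamic-programming edit distance (one row)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,              # delete from read
                                     dp[j - 1] + 1,          # insert into read
                                     prev + (ca != cb))      # match / mismatch
    return dp[-1]

def map_read(read, reference, index, k=11, max_errors=2):
    """Return sorted (position, edit distance) pairs that pass verification."""
    hits = set()
    for s in range(0, len(read) - k + 1, k):                 # non-overlapping seeds
        for pos in index.get(read[s:s + k], []):
            start = max(pos - s, 0)                          # candidate alignment start
            d = edit_distance(read, reference[start:start + len(read)])
            if d <= max_errors:
                hits.add((start, d))
    return sorted(hits)

ref = "TTGACGCTAGCATCGGATCGATTACAGCATT"
idx = build_seed_index(ref, k=5)
print(map_read("GCTAGCATCGGAT", ref, idx, k=5, max_errors=1))   # -> [(5, 0)]
```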

12:30  2.7.3  A FAST AND ENERGY EFFICIENT COMPUTING-IN-MEMORY ARCHITECTURE FOR FEW-SHOT LEARNING APPLICATIONS
Speaker:
Dayane Reis, University of Notre Dame, US
Authors:
Dayane Reis, Ann Franchesca Laguna, Michael Niemier and X. Sharon Hu, University of Notre Dame, US
Abstract
Among few-shot learning methods, prototypical networks (PNs) are one of the most popular approaches due to their excellent classification accuracy and network simplicity. Test examples are classified based on their distances from class prototypes. Despite the application-level advantages of PNs, the latency of transferring data from memory to compute units is much higher than the PN computation time; thus, PN performance is limited by memory bandwidth. Computing-in-memory (CiM) addresses this bandwidth bottleneck by bringing a subset of compute units closer to memory. In this work, we propose a CiM-PN framework that enables the computation of distance metrics and prototypes inside the memory. CiM-PN replaces the computationally intensive Euclidean distance metric with the CiM-friendly Manhattan distance metric. Additionally, prototypes are computed using an in-memory mean operation realized by accumulation and division by powers of two, which enables few-shot learning implementations where the number of "shots" is a power of two. The CiM-PN hardware uses CMOS memory cells, as well as CMOS peripherals such as customized sense amplifiers, carry look-ahead adders, in-place copy buffers and a log bit-shifter. Compared with a GPU implementation, a CMOS-based CiM-PN achieves speedups of 2808x/111x and energy savings of 2372x/5170x at iso-accuracy for the prototype and nearest-neighbor computations, respectively, and over 2x end-to-end speedup and energy improvements. We also gain a 3-14% accuracy improvement compared to existing non-GPU hardware approaches due to the floating-point CiM operations.
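
As a purely functional illustration of the classification rule (the paper realizes these steps inside a CMOS memory array), the Python sketch below builds prototypes by accumulating integer support embeddings and dividing by a power-of-two shot count via a bit shift, then labels a query with the nearest prototype under the Manhattan distance. The embedding dimensionality and data values are invented.

```python
import numpy as np

# Functional sketch of the CiM-PN classification rule (no in-memory hardware here).

def build_prototypes(support, labels, shots):
    """Class prototypes as the mean of each class's support embeddings.

    Dividing the accumulated sum by the shot count is a right shift, which is
    why the scheme restricts the number of shots to powers of two."""
    assert shots & (shots - 1) == 0, "shots must be a power of two"
    shift = shots.bit_length() - 1              # shots == 2**shift
    return {c: support[labels == c].sum(axis=0) >> shift for c in np.unique(labels)}

def classify(query, prototypes):
    """Label of the nearest prototype under the Manhattan (L1) distance."""
    return min(prototypes, key=lambda c: int(np.abs(query - prototypes[c]).sum()))

# Tiny 2-way, 4-shot example with 8-dimensional integer embeddings (invented data).
rng = np.random.default_rng(1)
support = np.vstack([rng.integers(0, 16, size=(4, 8)),       # class 0 embeddings
                     rng.integers(48, 64, size=(4, 8))])      # class 1 embeddings
labels = np.array([0] * 4 + [1] * 4)
prototypes = build_prototypes(support, labels, shots=4)
print(classify(rng.integers(48, 64, size=8), prototypes))     # -> 1
```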

13:00  End of session