FPGA Acceleration of Protein Back-Translation and Alignment

Sahand Salamat1,a, Jaeyoung Kang1,b, Yeseong Kim1,c, Mohsen Imani2, Niema Moshiri1,d and Tajana Rosing1,e
1Department of Computer Science and Engineering, UC San Diego, CA 92093, USA
a sasalama@ucsd.edu
b j5kang@ucsd.edu
c yek048@ucsd.edu
d a1moshir@ucsd.edu
e tajana@ucsd.edu
2Department of Computer Science, University of California, Irvine, CA 92697, USA
m.imani@uci.edu

ABSTRACT


Identifying genome functionality changes our understanding of humans and helps in disease diagnosis as well as in drug, bio-material, and genetic engineering of plants and animals. When only sequence information is available, comparing the structure of a protein sequence against a database of sequences with known functionality helps identify the functionality of the unknown sequence. The process of predicting the possible RNA sequences from which a specific protein originated is called back-translation. Aligning the back-translated RNA sequence against the database locates the most similar sequences, which are then used to predict the functionality of the unknown protein sequence. Because FPGAs provide massive parallelism, they can accelerate bioinformatics applications substantially. In this paper, we propose FabP, an optimized FPGA-based accelerator for aligning a back-translated protein sequence against a database of DNA/RNA sequences. FabP is deeply optimized to fully utilize the FPGA resources and the DRAM bandwidth to maximize performance. On a mid-range FPGA, FabP provides 8.1% and 23.3× (24.8× and 266.8×) speedup and higher energy efficiency, respectively, compared to a GPU-based implementation on a high-end NVIDIA GPU (a state-of-the-art CPU implementation).
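To make the two steps named above concrete, the following is a minimal software sketch of back-translation followed by alignment. It is an illustration under simplifying assumptions, not the FabP hardware design: the function names `back_translate` and `best_hit` are hypothetical, the standard genetic code is assumed, each nucleotide position is reduced to the set of bases allowed by any synonymous codon (which slightly over-approximates amino acids such as leucine), and the alignment is an ungapped sliding-window match count.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ('*' = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def back_translate(protein):
    """Return one set of allowed RNA bases per nucleotide position.

    Assumption for this sketch: positions are treated independently, so the
    pattern may accept a few codons that do not encode the amino acid.
    """
    pattern = []
    for aa in protein.upper():
        codons = [codon for codon, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        for i in range(3):
            pattern.append({codon[i] for codon in codons})
    return pattern


def best_hit(pattern, reference):
    """Slide the pattern over the reference; return (position, score).

    The score is the number of positions whose base is in the allowed set.
    DNA input is accepted by mapping T to U. Returns (-1, -1) if the
    reference is shorter than the pattern.
    """
    ref = reference.upper().replace("T", "U")
    best_pos, best_score = -1, -1
    for pos in range(len(ref) - len(pattern) + 1):
        score = sum(ref[pos + i] in allowed for i, allowed in enumerate(pattern))
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score


if __name__ == "__main__":
    query = back_translate("MW")          # Met-Trp has a unique coding: AUGUGG
    print(best_hit(query, "CCATGTGGCC"))  # best window starts at offset 2
```

Every window position in `best_hit` is scored independently of the others, which is the kind of data parallelism that an accelerator can exploit by evaluating many positions at once.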


