A Framework for Efficient and Binary Clustering in High-Dimensional Space
Alejandro Hernández-Cano1, Yeseong Kim2 and Mohsen Imani3
1School of Science Universidad Nacional Autónoma de México
ale.hdz333@ciencias.unam.mx
2Daegu Gyeongbuk Institute of Science and Technology
yeseongkim@dgist.ac.kr
3Department of Computer Science University of California, Irvine
m.imani@uci.edu
ABSTRACT
Today’s applications generate a large amount of data where the majority of the data are not associated with any labels. Clustering methods are the most commonly used algorithms for data analysis, especially in healthcare. However, running clustering algorithms on embedded devices is significantly slow as the computation involves a large amount of complex pairwise similarity measurements. In this paper, we proposed FebHD, an adaptive framework for efficient and fully binary clustering in high-dimensional space. Instead of using complex similarity metrics, e.g., Euclidean distance, FebHD introduces a nonlinear encoder to map data points into sparse high-dimensional space. FebHD encoder simplifies the similarity search, the most costly and frequent clustering operation, to Hamming distance, which can be accelerated in today’s hardware. FebHD performs clustering by assigning each data point to a set of initialized centers. It then updates the centers adaptively based on: (i) data points assigned to each cluster, and (ii) the confidence of the model on the clustering prediction. This adaptive update enables FebHD to provide a high quality of clustering with very few learning iterations. We also propose an end-to-end hardware accelerator that parallelizes the entire FebHD computation by exploiting FPGA bit-level granularity. Our evaluation shows that FebHD provides comparable accuracy to state-of-the-art clustering algorithms, while providing 6.2× and 9.1× (4.7× and 5.8×) faster and higher energy efficiency when running on the same FPGA (GPU) platform.