10.6 Approximate computing and neural networks for novel communication and multimedia systems


Date: Thursday 30 March 2017
Time: 11:00 - 12:30
Location / Room: 5A

Chair:
Norbert Wehn, Technical University Kaiserslautern, DE

Co-Chair:
Georgios Keramidas, Think Silicon, GR

This session presents ideas related to approximate computing and neural networks that can be applied in novel communication and multimedia systems.

Time  Label  Presentation Title / Authors
11:00  10.6.1  EXPLOITING SPECIAL-PURPOSE FUNCTION APPROXIMATION FOR HARDWARE-EFFICIENT QR-DECOMPOSITION
Speaker:
Jochen Rust, University of Bremen, DE
Authors:
Jochen Rust1 and Steffen Paul2
1University of Bremen, DE; 2University of Bremen, DE
Abstract
Efficient signal processing plays a key role in application-specific circuit design. For instance, future mobile communication standards, e.g., high-performance industrial mobile communication, require high data rates, low latency and/or high energy efficiency. Hence, sophisticated algorithms and computing schemes must be explored to satisfy these challenging constraints. In this paper we leverage the paradigm of approximate computing to enable hardware-efficient QR decomposition for channel precoding. For an efficient computation of the Givens rotation, bivariate, non-linear numeric functions are taken into account. An effective design method is introduced that leads to highly adapted (special-purpose) functions. For evaluation, our work is tested with different configurations in a Tomlinson-Harashima precoding downlink environment. In addition, a corresponding HDL implementation is set up, and logic and physical CMOS synthesis are performed. The comparison with current reference designs proves our work to be a powerful approach for future mobile communication systems.
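As a point of reference for the rotation the paper approximates, the sketch below shows a plain Givens-rotation QR decomposition in NumPy. The bivariate, non-linear functions noted in the comments (c = a/sqrt(a^2+b^2), s = b/sqrt(a^2+b^2)) are the ones the paper replaces with special-purpose hardware approximations; the approximation itself is not reproduced here, and all names are illustrative.

```python
import numpy as np

def givens_qr(A):
    """Exact QR decomposition via Givens rotations (reference only,
    not the paper's approximate hardware scheme)."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.eye(m)
    R = A.copy()
    for j in range(n):
        for i in range(m - 1, j, -1):
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            # Bivariate, non-linear functions approximated in the paper:
            # c = a / sqrt(a^2 + b^2), s = b / sqrt(a^2 + b^2)
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])   # zeroes out R[i, j]
            R[i - 1:i + 1, :] = G @ R[i - 1:i + 1, :]
            Q[:, i - 1:i + 1] = Q[:, i - 1:i + 1] @ G.T
    return Q, R

# Usage: H = np.random.randn(4, 4); Q, R = givens_qr(H)
# np.allclose(Q @ R, H) holds up to floating-point tolerance.
```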

11:30  10.6.2  (Best Paper Award Candidate)
EMBRACING APPROXIMATE COMPUTING FOR ENERGY-EFFICIENT MOTION ESTIMATION IN HIGH EFFICIENCY VIDEO CODING
Speaker:
Muhammad Shafique, Vienna University of Technology (TU Wien), AT
Authors:
Walaa El-Harouni1, Semeen Rehman2, Bharath Srinivas Prabakaran2, Akash Kumar3, Rehan Hafiz4 and Muhammad Shafique5
1Private Researcher, DE; 2Technische Universität Dresden, DE; 3Technische Universität Dresden, DE; 4ITU, PK; 5Vienna University of Technology (TU Wien), AT
Abstract
Approximate computing is an emerging paradigm for developing highly energy-efficient computing systems. It leverages the inherent resilience of applications to trade output quality for energy efficiency. In this paper, we present a novel approximate architecture for energy-efficient motion estimation (ME) in High Efficiency Video Coding (HEVC). We synthesized our designs for both ASIC and FPGA design flows, using ModelSim gate-level simulations for functional and timing verification. We comprehensively analyze the impact of heterogeneous approximation modes on the power/energy-quality trade-offs for various video sequences. To facilitate reproducible results for comparisons and further research and development, the RTL and behavioral models of the approximate SAD architectures and their constituent approximate modules are made available at https://sourceforge.net/projects/lpaclib/.
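The sketch below illustrates the general idea of an approximate sum-of-absolute-differences (SAD) computation for block matching, using a generic lower-part-OR style approximate adder. It is only a behavioral illustration under assumed parameters (e.g., the number of approximated low-order bits k); it is not the paper's SAD architecture or its specific approximate modules, which are provided as RTL in the linked repository.

```python
import numpy as np

def approx_add(x, y, k=2):
    """Illustrative approximate adder: the k least-significant bits are
    OR-ed instead of added (carry-free low part), the high part is exact.
    Generic example, not the specific adders from the paper."""
    mask = (1 << k) - 1
    low = (x & mask) | (y & mask)
    high = ((x >> k) + (y >> k)) << k
    return high | low

def approx_sad(block, ref, k=2):
    """SAD between two equally sized pixel blocks using the approximate adder."""
    acc = 0
    for a, b in zip(block.flatten(), ref.flatten()):
        acc = approx_add(acc, abs(int(a) - int(b)), k)
    return acc

# Usage: compare an 8x8 current block against a candidate reference block
# and contrast with the exact SAD.
cur = np.random.randint(0, 256, (8, 8))
ref = np.random.randint(0, 256, (8, 8))
print(approx_sad(cur, ref), int(np.abs(cur - ref).sum()))
```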

12:00  10.6.3  HARDWARE ARCHITECTURE OF BIDIRECTIONAL LONG SHORT-TERM MEMORY NEURAL NETWORK FOR OPTICAL CHARACTER RECOGNITION
Speaker:
Vladimir Rybalkin, University of Kaiserslautern, DE
Authors:
Vladimir Rybalkin1, Mohammad Reza Yousefi2, Norbert Wehn1 and Didier Stricker3
1University of Kaiserslautern, DE; 2Augmented Vision Department, German Research Center for Artificial Intelligence (DFKI), DE; 3German Research Center for Artificial Intelligence (DFKI), DE
Abstract
Optical Character Recognition is the conversion of printed or handwritten text images into machine-encoded text. It is a building block of many processes such as machine translation, text-to-speech conversion and text mining. Bidirectional Long Short-Term Memory Neural Networks have shown superior performance in character recognition compared with other types of neural networks. In this paper, to the best of our knowledge, we propose the first hardware architecture of a Bidirectional Long Short-Term Memory Neural Network with Connectionist Temporal Classification for Optical Character Recognition. Based on the new architecture, we present an FPGA hardware accelerator that achieves 459 times higher throughput than the state of the art. Visual recognition is a typical task on mobile platforms, which usually follow one of two scenarios: either the task runs locally on an embedded processor, or it is offloaded to a cloud and runs on a high-performance machine. We show that this computationally intensive visual recognition task benefits from being migrated to our dedicated hardware accelerator, which outperforms a high-performance CPU in terms of runtime while consuming less energy than low-power systems, with negligible loss of recognition accuracy.
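For readers unfamiliar with the network being accelerated, the following is a minimal NumPy sketch of a bidirectional LSTM forward pass over a sequence of feature vectors (e.g., the image columns of a text line). It assumes a standard gate ordering, omits the softmax and Connectionist Temporal Classification stages, and is a software reference, not the paper's FPGA architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(X, W, R, b):
    """Plain LSTM over a sequence X of shape (T, n_in).
    W: (4*n_hid, n_in), R: (4*n_hid, n_hid), b: (4*n_hid,).
    Assumed gate order: input, forget, cell, output."""
    n_hid = R.shape[1]
    h = np.zeros(n_hid)
    c = np.zeros(n_hid)
    H = []
    for x in X:
        z = W @ x + R @ h + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)       # cell state update
        h = o * np.tanh(c)               # hidden state / output
        H.append(h)
    return np.stack(H)

def blstm_forward(X, params_fwd, params_bwd):
    """Bidirectional LSTM: run one LSTM left-to-right and one right-to-left
    over the columns, then concatenate the hidden states per time step.
    In an OCR pipeline these outputs would feed a softmax + CTC layer
    (not reproduced here)."""
    Hf = lstm_forward(X, *params_fwd)
    Hb = lstm_forward(X[::-1], *params_bwd)[::-1]
    return np.concatenate([Hf, Hb], axis=1)
```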

12:30  IP5-2, 436  EXTENDING MEMORY CAPACITY OF NEURAL ASSOCIATIVE MEMORY BASED ON RECURSIVE SYNAPTIC BIT REUSE
Speaker:
Tianchan Guan, Columbia University, US
Authors:
Tianchan Guan1, Xiaoyang Zeng1 and Mingoo Seok2
1Fudan University, CN; 2Columbia University, US
Abstract
Neural associative memory (AM) is one of the critical building blocks for cognitive workloads such as classification and recognition. It learns and retrieves memories as the human brain does, i.e., by changing the strengths of plastic synapses (weights) based on inputs and retrieving information by the information itself. One of the key challenges in designing AM is to extend memory capacity (i.e., the number of memories that a neural AM can learn) while minimizing power and hardware overhead. However, prior art shows that memory capacity scales slowly, often logarithmically or with the square root of the total number of synaptic bits. This makes it prohibitive in hardware and power to achieve large memory capacity for practical applications. In this paper, we propose a synaptic model called recursive synaptic bit reuse, which enables near-linear scaling of memory capacity with total synaptic bits. Our model also handles correlated input data more robustly than the conventional model. We evaluate the proposed model in Hopfield Neural Networks (HNNs) containing 5 kB to 327 kB of synaptic bits and find that it can increase the memory capacity by as much as 30X over conventional models. We also study hardware cost via VLSI implementation of HNNs in 65 nm CMOS, confirming that our proposed model can achieve up to 10X area savings at the same capacity over the conventional synaptic model.
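For context, the sketch below shows a conventional Hopfield associative memory with Hebbian (outer-product) learning and sign-rule retrieval; this is the baseline synaptic model the paper improves upon. The proposed recursive synaptic bit reuse scheme itself is not reproduced here, and the network size and random seed are illustrative.

```python
import numpy as np

def train_hopfield(patterns):
    """Conventional Hebbian (outer-product) learning for a Hopfield network.
    patterns: array of shape (P, N) with entries in {-1, +1}."""
    P, N = patterns.shape
    W = np.zeros((N, N))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)        # no self-connections
    return W / N

def recall(W, probe, steps=20):
    """Synchronous retrieval: iterate the sign rule starting from a noisy probe."""
    s = probe.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s

# Usage: store a few random bipolar patterns and retrieve one from a noisy cue.
rng = np.random.default_rng(0)
pats = rng.choice([-1, 1], size=(3, 64))
W = train_hopfield(pats)
noisy = pats[0].copy()
noisy[:5] *= -1                   # flip a few bits of the stored pattern
print(np.array_equal(recall(W, noisy), pats[0]))
```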

12:30  End of session
Lunch Break in Garden Foyer

Keynote Lecture session 11.0 in "Garden Foyer", 13:20 - 13:50

Lunch Break in the Garden Foyer
On all conference days (Tuesday to Thursday), a buffet lunch will be offered in the Garden Foyer, in front of the session rooms. Please note that the buffet is restricted to conference delegates holding a lunch voucher. When entering the lunch area, delegates will be asked to present the lunch voucher for the corresponding day. Once delegates leave the lunch area, re-entry is not permitted for that lunch break.