10.2 Neural Networks and Neurotechnology


Date: Thursday 22 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 6

Chair:
Jim Harkin, University of Ulster, GB

Co-Chair:
Robert Wille, University of Linz, AT

This session presents new approaches to energy efficiency in neural networks using approximate computing and non-volatile FeFET memory, a novel inductive-coupling interconnect approach for neurotechnology applications, and an optimisation strategy for efficiently mapping spiking neural networks onto neuromorphic hardware.

Time  Label  Presentation Title
Authors
11:00  10.2.1  DESIGN AND OPTIMIZATION OF FEFET-BASED CROSSBARS FOR BINARY CONVOLUTION NEURAL NETWORKS
Speaker:
Xiaoming Chen, Institute of Computing Technology, Chinese Academy of Sciences, CN
Authors:
Xiaoming Chen, Xunzhao Yin, Michael Niemier and Xiaobo Sharon Hu, University of Notre Dame, US
Abstract
Binary convolution neural networks (CNNs) have attracted much attention for embedded applications due to low hardware cost and acceptable accuracy. Nonvolatile, resistive random-access memories (RRAMs) have been adopted to build crossbar accelerators for binary CNNs. However, RRAMs still face fundamental challenges such as sneak paths, high write energy, etc. We exploit another emerging nonvolatile device, the ferroelectric field-effect transistor (FeFET), to build crossbars that improve the energy efficiency of binary CNNs. Due to its three-terminal transistor structure, an FeFET can function as both a nonvolatile storage element and a controllable switch, such that both write and read power can be reduced. Simulation results demonstrate that, compared with two RRAM-based crossbar structures, our FeFET-based design reduces write power by 5600X and 395X, and read power by 4.1X and 3.1X, respectively. We also tackle an important challenge in crossbar-based CNN accelerators: when a crossbar array is not large enough to hold the weights of one convolution layer, how do we partition the workload and map computations to the crossbar array? We introduce a hardware-software co-optimization solution for this problem that is applicable to any crossbar accelerator.
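As a rough illustration of the mapping problem the abstract refers to, the Python sketch below tiles a weight matrix that is too large for a single crossbar into fixed-size sub-arrays and accumulates the partial sums digitally. The tile size, the row-major partitioning, and the function name are illustrative assumptions, not the hardware-software co-optimization proposed in the paper.

```python
import numpy as np

def crossbar_matvec_tiled(weights, x, rows=64, cols=64):
    # Illustrative sketch: split a large matrix-vector product into
    # crossbar-sized (rows x cols) partial products. Each tile would be
    # programmed into one crossbar; partial sums are accumulated off-array.
    n_out, n_in = weights.shape
    y = np.zeros(n_out)
    for r in range(0, n_out, rows):
        for c in range(0, n_in, cols):
            tile = weights[r:r + rows, c:c + cols]
            y[r:r + rows] += tile @ x[c:c + cols]
    return y

# Sanity check with binarized weights and activations (hypothetical sizes).
w = np.random.default_rng(0).choice([-1, 1], size=(128, 200))
x = np.random.default_rng(1).choice([-1, 1], size=200)
assert np.allclose(crossbar_matvec_tiled(w, x), w @ x)
```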

11:30  10.2.2  LOW-POWER 3D INTEGRATION USING INDUCTIVE COUPLING LINKS FOR NEUROTECHNOLOGY APPLICATIONS
Speaker:
Benjamin Fletcher, University of Southampton, GB
Authors:
Benjamin Fletcher1, Shidhartha Das2, Chi-Sang Poon3 and Terrence Mak1
1University of Southampton, GB; 2ARM Ltd., GB; 3Massachusetts Institute of Technology, US
Abstract
Three-dimensional system integration offers the ability to stack multiple dies, fabricated in disparate technologies, within a single IC. For this reason, it is gaining popularity for use in sensor devices that perform concurrent analogue and digital processing, as both analogue and digital dies can be coupled together. One such class of devices is closed-loop neuromodulators: neurostimulators that perform real-time digital signal processing (DSP) to deliver bespoke treatment. Due to their implantable nature, these devices are governed by very strict volume constraints and power budgets, and must operate with high reliability. To address these challenges, this paper presents a low-power inductive coupling link (ICL) transceiver for 3D integration of digital CMOS and analogue BiCMOS dies for use in closed-loop neuromodulators. The use of an ICL, as opposed to through-silicon vias (TSVs), ensures high reliability and fabrication yield, in addition to circumventing the need for voltage level conversion between disparate dies, improving power efficiency. The proposed transceiver is evaluated in SPICE alongside nine traditional TSV-based baseline solutions. Results demonstrate that, whilst the achievable bandwidth of the TSV-based approaches is much higher, for the typical data rates demanded by neuromodulator applications (0.5-1 Gbps) the ICL design consumes on average 36.7% less power by avoiding the use of voltage level shifters.

12:00  10.2.3  MAPPING OF LOCAL AND GLOBAL SYNAPSES ON SPIKING NEUROMORPHIC HARDWARE
Speaker:
Francky Catthoor, IMEC Fellow, BE
Authors:
Anup Das1, Yuefeng Wu1, Khanh Huynh1, Francesco Dell Anna2, Francky Catthoor2 and Siebren Schaafsma1
1IMEC, NL; 2IMEC, BE
Abstract
Spiking neural networks (SNNs) are widely deployed to solve complex pattern recognition, function approximation and image classification tasks. With the growing size and complexity of these networks, hardware implementation becomes challenging because scaling up the size of a single array (crossbar) of fully connected neurons is no longer feasible under strict energy budgets. Modern neuromorphic hardware integrates small-sized crossbars with time-multiplexed interconnects. Partitioning SNNs therefore becomes essential in order to map them onto neuromorphic hardware, with the major aim of reducing global communication latency and energy overhead. To achieve this goal, we propose our instantiation of particle swarm optimization, which partitions SNNs into local synapses (mapped on crossbars) and global synapses (mapped on time-multiplexed interconnects), with the objective of reducing spike communication on the interconnect. This improves latency, power consumption and application performance by reducing inter-spike interval distortion and spike disorders. Our framework is implemented in Python, interfacing CARLsim, a GPU-accelerated application-level spiking neural network simulator, with an extended version of Noxim for simulating time-multiplexed interconnects. Experiments are conducted with realistic and synthetic SNN-based applications with different computation models, topologies and spike coding schemes. Using power numbers from in-house neuromorphic chips, we demonstrate significant reductions in energy consumption and spike latency over PACMAN, the widely used partitioning technique for SNNs on SpiNNaker.
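To give a feel for the optimisation objective, the sketch below scores a neuron-to-crossbar assignment by the number of spikes that must cross crossbar boundaries (global synapses) and improves it with a simple local search. The spike-count matrix, the absence of crossbar capacity constraints, and the search itself are simplifications; the paper uses a dedicated particle swarm optimisation instantiation rather than this stand-in.

```python
import numpy as np

def inter_cluster_spikes(assign, spike_counts):
    # Cost: total spikes on synapses whose source and destination neurons are
    # mapped to different crossbars, i.e. traffic on the shared interconnect.
    src, dst = np.nonzero(spike_counts)
    return spike_counts[src, dst][assign[src] != assign[dst]].sum()

def partition_snn(spike_counts, n_crossbars, iters=2000, seed=0):
    # Minimal random local search over neuron-to-crossbar assignments;
    # stands in for the paper's PSO instantiation (capacity limits omitted).
    rng = np.random.default_rng(seed)
    n = spike_counts.shape[0]
    best = rng.integers(n_crossbars, size=n)
    best_cost = inter_cluster_spikes(best, spike_counts)
    for _ in range(iters):
        cand = best.copy()
        cand[rng.integers(n)] = rng.integers(n_crossbars)  # move one neuron
        cost = inter_cluster_spikes(cand, spike_counts)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

# Hypothetical 50-neuron network with random spike counts, mapped to 4 crossbars.
counts = np.random.default_rng(1).integers(0, 10, size=(50, 50))
assignment, cost = partition_snn(counts, n_crossbars=4)
print(cost)
```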

12:15  10.2.4  ENERGY-EFFICIENT NEURAL NETWORKS USING APPROXIMATE COMPUTATION REUSE
Speaker:
Xun Jiao, University of California San Diego, US
Authors:
Xun Jiao1, Vahideh Akhlaghi1, Yu Jiang2 and Rajesh Gupta1
1University of California, San Diego, US; 2Tsinghua University, CN
Abstract
As a problem-solving method, neural networks have shown broad success for medical applications, speech recognition, and natural language processing. Current hardware implementations of neural networks exhibit high energy consumption due to their intensive computing workloads. This paper proposes a methodology to design an energy-efficient neural network that effectively exploits computation reuse opportunities. To do so, we use Bloom filters (BFs), tightly integrating them with the computation units. BFs store and recall frequently occurring input patterns to reuse computations. We expand the opportunities for computation reuse by storing frequent input patterns specific to a given layer and using approximate pattern matching with hashing at limited data precision. This reconfigurable matching is key to achieving a "controllable approximation" for neural networks. To lower the energy consumption of the BFs, we implement them using low-power memristor arrays. Our experimental results show that for convolutional neural networks, the BFs enable a 47.5% energy saving on multiplication operations while incurring only a 1% accuracy drop. While the actual savings will vary depending upon the extent of approximation and reuse, this paper presents a method for reducing computing workloads and improving energy efficiency.
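The sketch below conveys the reuse idea in software terms: inputs are quantized to a limited precision so that nearby patterns map to the same key, and a previously computed result is returned on a hit. A plain Python dictionary stands in for the paper's memristor-based Bloom filters (which approximate this lookup rather than storing exact values); the key structure and bit width are illustrative assumptions.

```python
import numpy as np

def quantize(x, bits=4):
    # Reduce input precision so nearby inputs map to the same key;
    # the bit width controls the degree of ("controllable") approximation.
    return tuple(int(round(v * (1 << bits))) for v in np.asarray(x, dtype=float))

reuse_table = {}  # stand-in for the memristor-based Bloom filters

def approx_dot(layer, row, w, x, bits=4):
    # Reuse a stored dot product when a (layer, weight row, quantized input)
    # pattern recurs; compute exactly only on a miss.
    key = (layer, row, quantize(x, bits))
    if key not in reuse_table:
        reuse_table[key] = float(np.dot(w, x))
    return reuse_table[key]

# The second call hits the table because its inputs quantize identically.
w = np.array([0.5, -0.25, 1.0])
print(approx_dot("conv1", 0, w, [0.30, 0.70, 0.10]))
print(approx_dot("conv1", 0, w, [0.31, 0.69, 0.11]))
```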

12:30  IP4-11, 428  AN ENERGY-EFFICIENT STOCHASTIC COMPUTATIONAL DEEP BELIEF NETWORK
Speaker:
Yidong Liu, University of Alberta, CA
Authors:
Yidong Liu1, Yanzhi Wang2, Fabrizio Lombardi3 and Jie Han1
1University of Alberta, CA; 2Syracuse University, US; 3Northeastern University, US
Abstract
Deep neural networks (DNNs) are effective machine learning models to solve a large class of recognition problems, including the classification of nonlinearly separable patterns. The applications of DNNs are, however, limited by the large size and high energy consumption of the networks. Recently, stochastic computation (SC) has been considered for implementing DNNs to reduce the hardware cost. However, it requires a large number of random number generators (RNGs) that lower the energy efficiency of the network. To overcome these limitations, we propose the design of an energy-efficient deep belief network (DBN) based on stochastic computation. An approximate SC activation unit (A-SCAU) is designed to implement different types of activation functions in the neurons. The A-SCAU is immune to signal correlations, so the RNGs can be shared among all neurons in the same layer with no accuracy loss. The area and energy of the proposed design are 5.27% and 3.31% (or 26.55% and 29.89%) of a 32-bit floating-point (or an 8-bit fixed-point) implementation. It is shown that the proposed SC-DBN design achieves a higher classification accuracy compared to the fixed-point implementation. The accuracy is only 0.12% lower than that of the floating-point design at a similar computation speed, but with significantly lower energy consumption.
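For context on why RNG sharing normally costs accuracy, the minimal sketch below shows unipolar stochastic multiplication: an AND of two uncorrelated bitstreams approximates the product, whereas reusing the same random sequence correlates the streams and the result degenerates towards the minimum of the two values. This illustrates only the basic SC arithmetic, not the A-SCAU design itself.

```python
import numpy as np

def to_bitstream(p, length, rng):
    # Unipolar stochastic encoding: each bit is 1 with probability p.
    return rng.random(length) < p

a, b, n = 0.8, 0.6, 4096
rng = np.random.default_rng(42)

# Independent bitstreams: AND approximates the product a * b.
x = to_bitstream(a, n, rng)
y = to_bitstream(b, n, rng)
print((x & y).mean())        # close to 0.48

# Same random sequence for both operands: streams are correlated and the
# AND collapses towards min(a, b), which is why naive RNG sharing hurts.
r = np.random.default_rng(7).random(n)
print(((r < a) & (r < b)).mean())   # close to 0.6
```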

12:31  IP4-12, 375  PUSHING THE NUMBER OF QUBITS BELOW THE "MINIMUM": REALIZING COMPACT BOOLEAN COMPONENTS FOR QUANTUM LOGIC
Speaker:
Alwin Zulehner, Johannes Kepler University Linz, AT
Authors:
Alwin Zulehner and Robert Wille, Johannes Kepler University Linz, AT
Abstract
Research on quantum computers has gained attention since they are able to solve certain tasks significantly faster than classical machines (in some cases, exponential speed-ups are possible). Since quantum computations typically contain large Boolean components, design automation techniques are required to realize the respective Boolean functions in quantum logic. These techniques usually introduce a significant number of additional qubits, a highly limited resource. In this work, we propose an alternative method for the realization of Boolean components for quantum logic. In contrast to the current state of the art, we directly address the main factors behind the additionally required qubits (namely the number of occurrences of the most frequent output pattern as well as the number of primary outputs of the function to be realized) and propose to manipulate the function so that both issues are addressed. The resulting methods make it possible to push the number of required qubits below what is currently considered the minimum.

12:30  End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the times listed below in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00