11.6 Video Architectures for Multimedia and Communications

Printer-friendly version PDF version

Date: Thursday 12 March 2015
Time: 14:00 - 15:30
Location / Room: Bayard

Chair:
Frederic Petro, TIMA, FR

Co-Chair:
Marcello Coppola, STMicroelectronics, FR

This session presents innovative work in video architectures and algorithms used in multimedia and communication systems.

TimeLabelPresentation Title
Authors
14:0011.6.1SAPPHIRE: AN ALWAYS-ON CONTEXT-AWARE COMPUTER VISION SYSTEM FOR PORTABLE DEVICES
Speakers:
Swagath Venkataramani1, Victor Bahl2, Xian-Sheng Hua2, Jie Liu2, Jin Li2, Matthai Phillipose2, Bodhi Priyantha2 and Mohammed Shoaib2
1Purdue University, US; 2Microsoft Research, US
Abstract
Being aware of objects in the ambient provides a new dimension of context awareness. Towards this goal, we present a system that exploits powerful computer vision algo- rithms in the cloud by collecting data through always-on cameras on portable devices. To reduce comunication-energy costs, our system allows client devices to continually analyze streams of video and distill out frames that contain objects of interest. Through a dedicated image-classification engine SAPPHIRE, we show that if an object is found in 5% of all frames, we end up selecting 30% of them to be able to detect the object 90% of the time: 70% data reduction on the client device at a cost of < 60mW of power (45nm ASIC). By doing so, we demonstrate system-level energy reductions of 2X. Thanks to multiple levels of pipelining and parallel vector-reduction stages, SAPPHIRE consumes only 3.0 mJ/frame and 38 pJ/OP - estimated to be lower by 11.4X than a 45 nm GPU - and a slightly higher level of peak performance (29 vs. 20 GFLOPS). Further, compared to a parallelized sofware implementation on a mobile CPU, it provides a processing speed up of up to 235X (1.81 s vs. 7.7 ms/frame), which is necessary to meet the real-time processing needs of an always-on context-aware system.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.6.2APPROXIMATE ASSOCIATIVE MEMRISTIVE MEMORY FOR ENERGY-EFFICIENT GPUS
Speakers:
Abbas Rahimi1, Amirali Ghofrani2, Kwang-Ting Cheng2, Luca Benini3 and Rajesh Gupta1
1UC San Diego, US; 2UC Santa Barbara, US; 3UniversitĂ  di Bologna / Swiss Federal Institute of Technology in Zurich (ETHZ), IT
Abstract
Multimedia applications running on thousands of deep and wide pipelines working concurrently in GPUs have been an important target for power minimization both at the architectural and algorithmic levels. At the hardware level, energy-efficiency techniques that employ voltage overscaling face a barrier so-called "path walls": reducing operating voltage beyond a certain point generates massive number of timing errors that are impractical to tolerate. We propose an architectural innovation, called A2M2 module (approximate associative memristive memory) that exhibits few tolerable timing errors suitable for GPU applications under voltage overscaling. A2M2 is integrated with every floating point unit (FPU), and performs partial functionality of the associated FPU by pre-storing high frequency patterns for computational reuse that avoids overhead due to re-execution. Voltage overscaled A2M2 is designed to match an input search pattern with any of the stored patterns within a Hamming distance range of 0-2. This matching behavior under voltage overscaling leads to a controllable approximate computing for multimedia applications. Our experimental results for the AMD Southern Islands GPU show that four image processing kernels tolerate the mismatches during pattern matching resulting in a PSNR > 30dB. The A2M2 module with 8-row enables 28% voltage overscaling in 45nm technology resulting in 32% average energy saving for the kernels, while delivering an acceptable quality of service.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.6.3PLATFORM-AWARE DYNAMIC CONFIGURATION SUPPORT FOR EFFICIENT TEXT PROCESSING ON HETEROGENEOUS SYSTEM
Speakers:
Mi Sun Park1, Omesh Tickoo2, Vijaykrishnan Narayanan1, Mary Jane Irwin1 and Ravi Iyer2
1The Pennsylvania State University, US; 2Intel Labs, US
Abstract
Significant efforts have been made in accelerating computer vision and machine learning algorithms by utilizing parallel processors such as multi-core CPUs and GPUs. Although the suitability of GPU is well-known for computer graphics and image processing applications which require massively parallel floating-point computations, recent research movement towards general purpose computing on-GPU (GPGPU) makes it possible to take advantage of parallel processors to accelerate text processing applications as well. However, how to fully leverage different types of parallel processor architectures to obtain optimal performance (especially with text) without making specific efforts to each platform still remains a great challenge. We applied performance and accuracy enhancements to Naive Bayes algorithm to develop a practically sound implementation of text classification. A platform-aware dynamic configuration support automation flow is also proposed to support the seamless execution of our work across platforms. Experiments on various (integrated graphics, dedicated multiple GPUs) platforms demonstrate that our proposed approach improves both accuracy and performance of text classification.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:1511.6.4A DEBLOCKING FILTER HARDWARE ARCHITECTURE FOR THE HIGH EFFICIENCY VIDEO CODING STANDARD
Speakers:
Cláudio Diniz1, Muhammad Shafique2, Felipe Dalcin1, Sergio Bampi1 and Joerg Henkel2
1Federal University of Rio Grande do Sul (UFRGS), BR; 2Karlsruhe Institute of Technology (KIT), DE
Abstract
The new deblocking filter (DF) tool of the next generation High Efficiency Video Coding (HEVC) standard is one of the most time consuming algorithms in video decoding. In order to achieve real-time performance at low-power consumption, we developed a hardware accelerator for this filter. This paper proposes a high throughput hardware architecture for HEVC deblocking filter employing hardware reuse to accelerate filtering decision units with a low area cost. Our architecture achieves either higher or equivalent throughput (4096x2048 @ 60 fps) with 5X-6X lower area compared to state-of-the-art deblocking filter architectures.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP5-15, 176FASTTREE: A HARDWARE KD-TREE CONSTRUCTION ACCELERATION ENGINE FOR REAL-TIME RAY TRACING
Speakers:
Xingyu Liu, Yangdong Deng, Yufei Ni and Zonghui Li, Institute of Microelectronics, Tsinghua University, CN
Abstract
The ray tracing algorithm is well-known for its ability to generate photo-realistic rendering effects. Recent years have witnessed a renewed momentum in pushing it to real-time for better user experience. Today the construction of acceleration structures, e.g., kd-tree, has become the bottleneck of ray tracing. A dedicated hardware architecture, FastTree, was proposed for kd-tree construction by adopting a fully parallel construction algorithm. FastTree was validated by an FPGA prototype and evaluated as an ASIC implementation. Experiment result shows FastTree outperforms existing hardware construction engines by a factor of nearly 4X at a similar area and power budget.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31IP5-16, 688REVERSE LONGSTAFF-SCHWARTZ AMERICAN OPTION PRICING ON HYBRID CPU/FPGA SYSTEMS
Speakers:
Christian Brugger, Javier Alejandro Varela, Norbert Wehn, Songyin Tang and Ralf Korn, University of Kaiserslautern, DE
Abstract
In today's markets, high-speed and energy-efficient computations are mandatory in the financial and insurance industry. At the same time, the gradual convergence of high-performance computing with embedded systems is having a huge impact on the design methodologies, where dedicated accelerators are implemented to increase performance and energy efficiency. This paper follows this trend and presents a novel way to price high-dimensional American options using techniques of the embedded community. The proposed architecture targets heterogeneous CPU/FPGA systems, and it exploits the FPGA reconfiguration to deliver high-throughput. With a bit-true algorithmic transformation based on recomputation, it is possible to eliminate the memory bottleneck and access costs. The result is a pricing system that is 16x faster and 268x more energy-efficient than an optimized Intel CPU implementation.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break

On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only).

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00