## An Energy-Conscious Exploration Methodology for Reconfigurable DSPs

Jan Rabaey and Marlene Wan

EECS Department of the University of California, Berkeley, CA, USA

## Abstract

As the "system-on-a-chip" concept is rapidly becoming a reality, time-to-market and product complexity push the reuse of complex macromodules. Circuits combining a variety of these macromodules (microprocessors, DSPs, programmable logic, and embedded memories) are being reported by a number of companies [2]. Most of these systems target the embedded market, where speed, area, and power requirements are paramount and a balance between hardware and software implementation is needed. Reconfigurable computing devices have recently emerged as one of the major alternative implementation approaches, addressing most of the requirements outlined above.

The design and reuse of this new generation of reconfigurable systems calls for a methodology that not only considers all of the PDA (power-delay-area) metrics simultaneously but also allows designers to evaluate different choices at early stages of design. Such a methodology needs to support trade-offs between architectures of different programming granularity (from microprocessors to fine-grain programmable logic) [6], which requires the capability to predict the impact of different implementation choices.

We introduce an energy-conscious methodology to guide algorithm partitioning and mapping of embedded DSP applications onto heterogeneous architecture components. The methodology supports both realistic algorithm-architecture co-design (simultaneous optimization at different abstraction levels) and hardware-software co-design (optimization over various architectural alternatives). Macromodel-based predictors are advocated in this methodology in order to provide early feedback on the impact of design selections and partitions.

While the methodology can be applied to a wide range of embedded systems, it is described here for a particular low-power reconfigurable DSP architecture, which is the original motivator for our tool development. This novel architecture, which is under development as the Pleiades Project [5] at UC Berkeley, is composed of a core processor with heterogeneous accelerators (ranging from reconfigurable dataflow to fine-grain FPGA). The architecture combines computing devices of varying granularity (Figure 2) to achieve flexibility over a range of applications, while still maintaining energy efficiency [1].

Figure 1: Low-power reconfigurable DSPs

Figure 2: Architectures of different granularity and possible suitable computations

Figure 3 presents the basic exploration flow used to guide the algorithm partitioning onto the heterogeneous reconfigurable DSPs. This exploration environment relies on macromodelling at different levels to provide efficient yet effective feedback on the impact of various implementations. The following list outlines the modeling techniques used for the different architecture components:

Figure 3: Basic flow of the proposed exploration methodology

- For the processor core, instruction-level power modeling, together with a dynamic instruction tracer, is used.
- Macromodelling techniques proposed in [4] or analytical models assuming white noise input are employed for parameterizable functional modules.
- Libraries of macromodules can be used for PGAs.
- Once the number and type of functional units are decided, the length and power consumption of the wires in the programmable interconnect can be predicted.

Our current exploration utilizes the hyper-spreadsheet approach proposed by PowerPlay [3]. A snapshot of our exploration environment is shown in Figure 4. An application is first mapped to the core, and the energy cost of each function is presented.
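
The core-first step can be illustrated with a minimal sketch of an instruction-level energy model driven by an instruction trace. The per-opcode energy values, the toy trace, the `control` function, and the accelerator cost used in the what-if step are all hypothetical placeholders, not figures from the Pleiades architecture or from PowerPlay; only the kernel name `dot_product` is taken from the example in this paper.

```python
from collections import defaultdict

# Hypothetical per-instruction energy costs in nanojoules (illustrative only).
ENERGY_NJ = {"load": 4.0, "store": 4.0, "mac": 3.0, "add": 1.0, "branch": 2.0}


def energy_by_function(trace):
    """Sum instruction-level energy per function from (function, opcode) pairs."""
    totals = defaultdict(float)
    for function, opcode in trace:
        totals[function] += ENERGY_NJ[opcode]
    return dict(totals)


def rank_kernels(totals):
    """Return (function, energy) pairs, most costly first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def remap(totals, function, accelerator_energy_nj):
    """Return a new cost table with one function moved to an accelerator."""
    updated = dict(totals)
    updated[function] = accelerator_energy_nj
    return updated


# Toy trace standing in for the output of a dynamic instruction tracer.
trace = (
    [("dot_product", "load")] * 8
    + [("dot_product", "mac")] * 4
    + [("control", "add")] * 3
    + [("control", "branch")] * 2
)

core_costs = energy_by_function(trace)
ranking = rank_kernels(core_costs)

# What-if: move the most costly kernel to an accelerator with an assumed cost.
what_if = remap(core_costs, ranking[0][0], accelerator_energy_nj=10.0)

print(ranking)
print(sum(core_costs.values()), sum(what_if.values()))
```

Keeping the cost table as a plain mapping from function name to energy mirrors a spreadsheet row per function: changing one entry and re-summing is enough to compare two partitions.
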
The most costly kernels (*dot\_product* is shown below as an example) are then mapped to an accelerator, and the new costs are re-evaluated.

Figure 4: A snapshot of the exploration

Future efforts of our research are devoted to the following challenges:

- Efficient memory allocation and address generation.
- Automatic kernel mapping and performance estimation.
- Automatic interface code generation for the reconfigurable architecture.

## References

- [1] A. Abnous and J. Rabaey, "Ultra-Low-Power Domain-Specific Multimedia Processors", *Proceedings of the IEEE VLSI Signal Processing Workshop*, San Francisco, October 1996.
- [2] J. Borel, "Technologies for Multimedia Systems on a Chip", 1997 IEEE International Solid-State Circuits Conference, pp. 18-21.
- [3] D. Lidsky and J. Rabaey, "Early Power Exploration -- a World Wide Web Application", Proc. Design Automation Conference, Las Vegas, NV, June 1996.
- [4] P. Landman and J. Rabaey, "Black Box Capacitance Models for Architectural Power Analysis", Proc. of the International Workshop on Low Power Design, pp. 165-170, April 1994.
- [5] The Pleiades project homepage (http://infopad.EECS.Berkeley.EDU/research/reconfigurable/)
- [6] J. M. Rabaey, "Reconfigurable Computing: the Solution to Low Power Programmable DSP", Proc. of the 1997 ICASSP Conference, Munich, April 1997.