DATE 2001 Abstracts

Sessions: [Keynote] [1A] [1B] [1C] [1E] [2A] [2B] [2C] [2E] [3A] [3B] [3C] [3E] [4A] [4B] [4C] [4E] [4F] [5A] [5B] [5C] [5E] [5F] [6A] [6B] [6C] [6E] [6F] [7A] [7B] [7C] [7E] [7F] [8A] [8B] [8C] [8E] [8F] [9A] [9B] [9C] [9E] [9F] [9L] [10A] [10B] [10C] [10E] [10F] [Posters]


Plenary -- Keynote Session

Moderator: A. Jerraya, TIMA, Grenoble, F

The Semiconductor Dynamic in the Information Age -- Driving New Technologies, Trends and Markets
U. Schumacher, CEO, Infineon, Munich, D


1A: Complementary Approaches to Designing Correct Circuits

Moderators: T. Kropf, Robert Bosch GmbH, D; H. Eveking, TU Darmstadt, D

Abstraction of Word-Level Linear Arithmetic Functions from Bit-Level Component Descriptions [p. 4]
P. Dasgupta, P. Chakrabarti, A. Nandi, S. Krishna, and A. Chakrabarti

RTL descriptions of word-level arithmetic components typically specify the architecture at the bit level of the registers. The problem studied in this paper is to abstract the word-level functionality of a component from its bit-level specification. This is particularly useful in simulation, since word-level descriptions can be simulated much faster than bit-level descriptions. Word-level abstractions are also useful for reducing the complexity of component matching, since the number of words is significantly smaller than the number of bits. This paper presents an algorithm for the abstraction of word-level linear functions from bit-level component descriptions. We also present complexity results for component matching which justify the advantage of performing abstraction prior to component matching.

Biasing Symbolic Search by Means of Dynamic Activity Profiles [p. 9]
G. Cabodi, P. Camurati, and S. Quer

We address BDD-based reachability analysis, which is the core technique of symbolic sequential verification and Model Checking. Within this framework, traversals that are not purely breadth-first, together with guided traversals, have shown their value in improving efficiency by reducing the memory consumption of the BDD representation. We propose a guided search strategy exploiting performance statistics. These activity figures are gathered through a continuous and dynamic learning process on a variable-by-variable basis. This technique is completely integrated with the reachability analysis routine, as it is fully compatible with dynamic reordering and allows multiple partial traversal phases. We thus move away from the static and manual schemes that are one of the main limitations of previous approaches. Experiments are given to demonstrate the efficiency and robustness of the approach.


1B: New Design Methods with SystemC

Moderators: W. Rosenstiel, FZI/Tuebingen U, D; E. Villar, Cantabria U, ES

A Methodology for Interfacing Open Source SystemC with a Third Party Software [p. 16]
L. Charest, M. Reid, E. Aboulhamid, and G. Bois

SystemC is a new open-source C++ library for developing cycle-accurate or more abstract models of software algorithms, hardware architectures and system-level designs. SystemC is meant to be an interoperable modeling platform allowing seamless tool integration. Our objective is to evaluate the feasibility of linking third-party software to SystemC without modifying the SystemC source. We chose the development of a GUI as such an application. This application is representative of a class of applications following the observer pattern, recently defined in software engineering. This class of applications can be loosely coupled to a platform designed following specific rules of software reuse.

Behavioral Synthesis with SystemC [p. 21]
G. Economakos, P. Oikonomakos, I. Panagopoulos, I. Poulakis, and G. Papakonstantinou

Having to cope with the continuously increasing complexity of modern digital systems, hardware designers are considering more and more seriously language based methodologies for parts of their designs. Last year, the introduction of a new language for hardware descriptions, the SystemC C++ class library, initiated a closer relationship between software and hardware descriptions and development tools. This paper presents a synthesis environment and the corresponding synthesis methodology, based on traditional compiler generation techniques, which incorporate SystemC, VHDL and Verilog to transform existing algorithmic software models into hardware system implementations. Following this approach, reusability of software components is introduced in the hardware world and time-to-market is decreased, as shown by experimental results.

SystemCSV -- An Extension of SystemC for Mixed Multi-Level Communication Modeling and Interface-Based System Design [p. 26]
R. Siegmund and D. Müller

An extension of SystemC for mixed multi-level communication modeling and interface-based system design is proposed in this paper. SystemCSV provides a new design unit, the interface, which enables specification, design and verification of system communication separately from system functionality, thus introducing a new quality of system design into SystemC. The concepts and computational model of SystemCSV interfaces are presented together with a design example, the digital part of a wireless SmartCard transponder-reader/writer system.


1C: Embedded Tutorial: TRP: Integrating Embedded Test and ATE

Organizer: Y. Zorian, LogicVision, USA
Moderator: P. Prinetto, Politecnico di Torino, IT
Speakers: J. Teixeira, IST/INESC, PT; I. Teixeira, IST/INESC, PT; C. Pereira, UFRGS, BR; O. Dias, IST/INESC, PT; J. Semiao, IST/INESC, PT; P. Muhmenthaler, Infineon, D; Y. Zorian, LogicVision, USA; W. Radermacher, Agilent, USA

Test Resource Partitioning: A Design and Test Issue [p. 34]

Product development economics and specifications drive the need for on-chip embedded test functionality. However, optimal partitioning of test functionality between a tester and a SOC is a non-trivial task, which must be solved during the system analysis phase. Hence, at system level, a trade-off analysis must be performed in order to evaluate the costs and benefits of different partitioning schemes. The purpose of this contribution is to present a methodology and tools, using the Object-Oriented (OO) paradigm and UML, and a set of architectural Quality Metrics (QMs), to analyze the impact of different TRP schemes on a system's architecture. A 4-core SOC case study is presented to guide the discussion.


1E: Embedded Tutorial: Current Trends in the Design of Automotive Electronic Systems

Organizer and Moderator: P. van Staa, Robert Bosch GmbH, D
Speaker: T. Beck, ETAS GmbH, D

Current Trends in the Design of Automotive Electronic Systems [p. 38]

Future developments in the automotive industry will be governed by a variety of different requirements. Our vision of a modern vehicle includes comprehensive safety, a high degree of comfort, low energy consumption, and minimal pollutant emission. These demands can only be met by employing interconnected intelligent electronic devices, capable of processing and sharing information about the car, the driver, the environment, and other sources of data. The implementation of such features will be critical for the manufacturer's success and puts high pressure on the development process itself and on the hardware and software tools used for every step in this process.


2A: Platforms and IP-Based Design

Moderators: G. Martin, Cadence, USA; R. Seepold, FZI, D

Component Selection and Matching for IP-Based Design [p. 40]
T. Zhang, L. Benini, and G. De Micheli

Intellectual Property (IP) reuse is one of the most promising techniques addressing the design complexity problem. IP reuse assumes that pre-designed components can be integrated into the design under development, thereby reducing design complexity and time. On the other hand, as the number of IP providers increases, the selection of the best IP block for a given design becomes more challenging and time-consuming. In this paper, we present an IP component matching system targeting automatic component searching and matching across the Internet. The system is based on an Extensible Markup Language (XML) specification both for IP libraries (a repository of pre-designed IP components indexed by their corresponding specifications) and for IP user queries (specifications with incomplete/uncertain attributes). An IP query is parsed into a document object model (DOM), and the DOM is transformed into an internal tree-structured model. Fuzzy-logic scoring and aggregation algorithms are applied to the internal tree structure to provide a set of candidate approximate matches, ranked by the proximity between the query and the IP specifications.
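
The fuzzy scoring and aggregation idea can be illustrated with a toy sketch (this is not the authors' algorithm, and every attribute name, weight and library entry below is hypothetical): each query attribute contributes a proximity score in [0, 1], and a weighted mean of the scores ranks the candidate IP blocks.

    # Toy fuzzy IP matching: rank library entries by weighted attribute proximity.
    def numeric_score(query, actual):
        """Proximity in [0, 1]: 1.0 for an exact match, falling off linearly."""
        if query == actual:
            return 1.0
        return max(0.0, 1.0 - abs(query - actual) / max(abs(query), abs(actual)))

    def match_score(query, ip, weights):
        """Weighted mean of per-attribute proximity scores."""
        total = 0.0
        for attr, weight in weights.items():
            if isinstance(query[attr], str):                  # exact attribute
                score = 1.0 if query[attr] == ip.get(attr) else 0.0
            else:                                             # numeric attribute
                score = numeric_score(query[attr], ip.get(attr, 0))
            total += weight * score
        return total / sum(weights.values())

    query = {"function": "fir_filter", "taps": 16, "clock_mhz": 100}
    weights = {"function": 3.0, "taps": 1.0, "clock_mhz": 1.0}
    library = [
        {"name": "ipA", "function": "fir_filter", "taps": 16, "clock_mhz": 80},
        {"name": "ipB", "function": "fir_filter", "taps": 32, "clock_mhz": 120},
        {"name": "ipC", "function": "iir_filter", "taps": 16, "clock_mhz": 100},
    ]
    for ip in sorted(library, key=lambda b: -match_score(query, b, weights)):
        print(ip["name"], round(match_score(query, ip, weights), 3))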

A Universal Communication Model for an Automotive System Integration Platform [p. 47]
T. Demmeler and P. Giusto

In this paper, we present a design methodology for distributed automotive systems based on a virtual integration platform. The platform, built within the 'Virtual Component Co-Design' tool (VCC), provides the ability to distribute a given system functionality over an architecture so as to validate different solutions in terms of cost, safety requirements, and real-time constraints. The virtual platform constitutes the foundation for design decisions early in the development phase, therefore enabling decisive and competitive advantages in the development process. This paper focuses on one of the key enablers of the methodology, the Universal Communication Model (UCM). The UCM is defined at a level of abstraction that allows accurate estimates of performance, including the latencies over the bus network, as well as good simulation performance. In addition, due to the high level of reusability and parameterization of its components, it can be used as a framework for modeling the different communication protocols common in the automotive domain.

An Efficient Architecture Model for Systematic Design of Application-Specific Multiprocessor SoC [p. 55]
A. Baghdadi, D. Lyonnard, N. Zergainoh, and A. Jerraya

In this paper, we present a novel approach for the design of application-specific multiprocessor systems-on-chip. Our approach is based on a generic architecture model which is used as a template throughout the design process. The key characteristics of this model are its great modularity, flexibility and scalability, which make it reusable for a large class of applications. In addition, it accelerates the design cycle. This paper focuses on the definition of the architecture model and the systematic design flow that can be automated. The feasibility and effectiveness of this approach are illustrated by two significant demonstration examples.


2B: Approaching Semantics of Design Languages

Moderators: N. Fristacky, Slovak TU, SLK; F. Rammig, C-LAB/Paderborn U, D

The Simulation Semantics of SystemC [p. 64]
J. Ruf, D. Hoffmann, J. Gerlach, T. Kropf, W. Rosenstiel, and W. Mueller

We present a rigorous but transparent semantics definition of SystemC that covers method, thread, and clocked thread behavior as well as their interaction with the simulation kernel process. The semantics includes watching statements, signal assignment, and wait statements as they are introduced in SystemC V1.0. We present our definition in the form of distributed Abstract State Machine (ASM) rules, reflecting the view given in the SystemC User's Manual and the reference implementation. We mainly see our formal semantics as a concise, unambiguous, high-level specification for SystemC-based implementations and for standardization. Additionally, it can be used as a sound basis to investigate SystemC interoperability with Verilog and VHDL.

MetaRTL: Raising the Abstraction Level of RTL Design [p. 71]
J. Zhu

The register transfer level (RTL) abstraction has been established as the industrial standard for ASIC design, soft IP exchange and the backend interface for chip design at higher levels. Unfortunately, the "synthesizable" VHDL/Verilog incarnation of the RTL abstraction has problems which prevent its more productive use: for example, the confusion resulting from using simulation semantics for synthesis purposes, the lack of facilities for component reuse at the "protocol" level, and the lack of memory abstraction. After a detailed discussion of these problems, this paper proposes a new RTL abstraction, called MetaRTL, which can be implemented by a modest extension to traditional imperative programming languages. The productivity gain is further demonstrated by the description of a synthesis tool, called MetaSyn, which provides the "added value". Experiments on the benchmark set show that MetaRTL is far more concise than the "synthesizable" HDL specification, and incurs no overhead in the synthesis result.

A Model for Describing Communication between Aggregate Objects in the Specification and Design of Embedded Systems [p. 77]
K. Svarstad, G. Nicolescu, and A. Jerraya

Raising the abstraction level of design descriptions is a well-accepted technique for handling the complexity and shortening the design time of modern embedded systems. We show that, for specification and system-level abstractions, communication abstractions are as important as behavioural ones, and we investigate an extension of a novel higher-level communication mechanism with features for describing the complex aggregate associations between objects found in specifications such as UML. The communication primitives have been implemented as extensions to SystemC, and a comprehensive example is included, taking a UML specification through a functional specification down to an executable SystemC description.


2C: BIST and Diagnosis

Moderators: P. Harrod, ARM, UK; B. Becker, Freiburg U, D

Circuit Partitioning for Efficient Logic BIST Synthesis [p. 86]
A. Irion, G. Kiefer, H. Vranken, and H. Wunderlich

A divide-and-conquer approach using circuit partitioning is presented, which can be used to accelerate logic BIST synthesis procedures. Many BIST synthesis algorithms contain steps with a time complexity which increases more than linearly with the circuit size. By extracting sub-circuits which are almost constant in size, BIST synthesis for very large designs may be possible within linear time. The partitioning approach does not require any physical modifications of the circuit under test. Experiments show that significant performance improvements can be obtained at the cost of a longer test application time or a slight increase in silicon area for the BIST hardware.
Keywords: circuit partitioning, deterministic BIST, divide-and-conquer

Deterministic Software-Based Self-Testing of Embedded Processor Cores [p. 92]
A. Paschalis, D. Gizopoulos, N. Kranitis, M. Psarakis, and Y. Zorian

A deterministic software-based self-testing methodology for processor cores is introduced that efficiently tests the processor datapath modules without any modification of the processor structure. It provides guaranteed high fault coverage without the repetitive fault simulation experiments which are necessary in pseudorandom software-based processor self-testing approaches. Test generation and output analysis are performed by utilizing processor functional modules like accumulators (the arithmetic part of the ALU) and shifters (if they exist) through processor instructions. No extra hardware is required and there is no performance degradation.

Memory Fault Diagnosis by Syndrome Compression [p. 97]
J. Li and C. Wu

In this paper we present a data compression technique that can be used to speed up the transmission of diagnosis data from the embedded RAM with built-in self-diagnosis (BISD) support. The proposed approach compresses the faulty-cell address and March syndrome to about 28% of the original size under the March-17N diagnostic test algorithm. The key component of the compressor is a novel syndrome-accumulation circuit, which can be realized by a content-addressable memory. Experimental results show that the area overhead is about 0.9% for a 1Mb SRAM with 164 faults. The proposed compression technique reduces the time for diagnostic test, as well as the tester storage capacity requirement.

Diagnosis for Scan-Based BIST: Reaching Deep into the Signatures [p. 102]
I. Bayraktaroglu and A. Orailoglu

For partitioning-based diagnosis in a scan-based BIST environment, an exact analysis scheme, capable of identifying all scan cells that receive incorrect data, is proposed. In contrast to previously suggested approaches, the scheme we propose identifies all failing scan cells with no ambiguity whatsoever. Not only do we resolve failing scan cells unambiguously, but we do so at the earliest possible instance through reexamination of already computed signatures. Intensive utilization of this highly precise diagnostic state information leads to prognostic information regarding the usefulness of running upcoming tests which in turn leads to reductions in diagnosis time in excess of 30% compared to previous approaches.


2E: Hot Topic: EUCAR Session

Organizer: P. van Staa, Robert Bosch GmbH, D
Moderator: S. Reiniger, DaimlerChrysler, D

Vehicle Electric/Electronic Architecture -- One of the Most Important Challenges for OEMs [p. 112]
G. Hettich and T. Thurner

One of the most important challenges for a vehicle manufacturer is the management of the increasing number of networked E/E systems and their complex functional dependencies. To master this challenge, sophisticated E/E architecture approaches will be presented which cover both the vertical functional orientation and the horizontal integration aspects of a vehicle manufacturer. We will present architectures and methods to support the development of future E/E systems, considering the typical requirements of a vehicle system integrator, such as composability, hardware and software independence, network-wide distribution of software components, and the ability to separate indication, operation and behavior. The paper describes the motivation, the system integration requirements, existing solutions, future technical challenges, and some detailed architecture approaches themselves. Furthermore, the impact of the architecture on the development process and on the OEM-supplier relationship will be highlighted.

AIL: Description of a Global Electronic Architecture at the Vehicle Scale
Arjun Panday, Damien Couderc, Simon Marichalar

This paper introduces the Architecture Implementation Language (AIL), a description language that provides an internal representation of the architecture and acts as a connection with tools to simplify the construction, planning, verification, capitalisation, and documentation of an architecture. The objective of AIL is to describe a vehicle architecture from the level of the desired services down to the level of physical implementation, rendered concrete in one or more resulting operational architectures. The proposed methodology introduces the concepts of high-level component-based architectures to the highly constrained automotive world.

Methods and Tools for Systems Engineering of Automotive Electronic Architectures
Jakob Axelsson

The latest generations of road vehicles have seen a tremendous development in on-board electronic systems, which control increasingly large parts of the functionality. In this paper, we discuss how the vehicle manufacturers need to adjust their methods and tools to handle the increasing complexity. The key issue is the system integration aspect, which calls for increasing systems engineering capabilities.


3A: SAT Based Verification Techniques

Moderators: W. Damm, Oldenburg U/OFFIS, D; C. Delgado Kloos, U Carlos III de Madrid, ES

Using SAT for Combinational Equivalence Checking [p. 114]
E. Goldberg, M. Prasad, and R. Brayton

This paper addresses the problem of combinational equivalence checking (CEC), which forms one of the key components of the current verification methodology for digital systems. A number of recently proposed BDD-based approaches have met with considerable success in this area. However, the growing gap between the capability of current solvers and the complexity of verification instances necessitates the exploration of alternative, better solutions. This paper revisits the application of Satisfiability (SAT) algorithms to the CEC problem. We argue that SAT is a more robust and flexible engine of Boolean reasoning for the CEC application than BDDs, which have traditionally been the method of choice. Preliminary results on a simple framework for SAT-based CEC show a speedup of up to two orders of magnitude compared to state-of-the-art SAT-based methods for CEC, and also demonstrate that even with this simple algorithm and untuned prototype implementation it is only moderately slower, and sometimes faster, than a state-of-the-art BDD-based mixed-engine commercial CEC tool. While SAT-based CEC methods need further research and tuning before they can surpass almost a decade of research in BDD-based CEC, the recent progress is very promising and merits continued research.
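
The underlying formulation can be made concrete with a minimal sketch of the classical miter construction: two combinational circuits are equivalent exactly when the XOR of their outputs (the miter) is unsatisfiable. The brute-force enumeration below stands in for the SAT search, and the two example circuits are invented:

    # Miter-based equivalence check (brute force stands in for the SAT engine).
    from itertools import product

    def circuit_a(a, b, c):
        return (a and b) or c                              # reference circuit

    def circuit_b(a, b, c):
        return not ((not (a and b)) and (not c))           # De Morgan rewrite

    def miter_counterexample(f, g, n_inputs):
        """Return an input where f and g differ, or None if equivalent."""
        for assignment in product([False, True], repeat=n_inputs):
            if f(*assignment) != g(*assignment):           # miter output is 1
                return assignment
        return None                                        # miter unsatisfiable

    print(miter_counterexample(circuit_a, circuit_b, 3))   # None: equivalent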

Combinational Equivalence Checking Using Boolean Satisfiability and Binary Decision Diagrams [p. 122]
S. Reda and A. Salem

Most recent combinational equivalence checking techniques are based on exploiting circuit similarity. In this paper, we focus on circuits with no internal equivalent nodes, or on circuits after internal equivalent nodes have been identified and merged. We present a new technique integrating Boolean Satisfiability and Binary Decision Diagrams. The proposed approach is capable of solving verification instances that neither technique alone could solve. The efficiency of the proposed approach is shown through its application to hard-to-prove industrial circuits and the ISCAS'85 benchmark circuits.

An Efficient Learning Procedure for Multiple Implication Checks [p. 127]
Y. Novikov and E. Goldberg

In the paper, we consider the problem of checking whether cubes from a set S are implicants of a DNF formula D, while minimizing the overall time taken by the checks. An obvious but inefficient way of solving the problem is to perform all the checks independently. We consider a different approach. The key idea is that when checking whether a cube C from S is an implicant of D, we can deduce (learn) implicants of D that are not implicants of C. These cubes can be used in the following checks for search pruning. Experiments on random DNF formulas, DIMACS benchmarks and DNF formulas describing circuits show that the proposed learning procedure reduces the overall time taken by the checks by up to two orders of magnitude.
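
A single implicant check is easy to state (the brute-force sketch below performs one independent check on hypothetical cubes; the paper's contribution, the learning shared between consecutive checks, is not modeled): a cube C is an implicant of D if every assignment consistent with C satisfies some cube of D.

    # One brute-force implicant check; cubes are maps from variable to 0/1.
    from itertools import product

    def satisfies(assignment, cube):
        return all(assignment[v] == val for v, val in cube.items())

    def is_implicant(c, dnf, variables):
        free = [v for v in variables if v not in c]
        for bits in product([0, 1], repeat=len(free)):
            a = dict(c, **dict(zip(free, bits)))
            if not any(satisfies(a, d) for d in dnf):
                return False                  # counterexample assignment found
        return True

    # D = x*y + y'*z + x'*z'; is the cube x*z an implicant of D?
    D = [{"x": 1, "y": 1}, {"y": 0, "z": 1}, {"x": 0, "z": 0}]
    print(is_implicant({"x": 1, "z": 1}, D, ["x", "y", "z"]))   # True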


3B: Panel Session: C/C++: Progress or Deadlock in SLD Specification?

Organizers: D. Gajski, UC Irvine, USA; E. Villar, Cantabria U, ES
Moderator: E. Villar, Cantabria U, ES
Panellists: W. Rosenstiel, FZI/Tuebingen U, D; V. Gerousis, Infineon, D; D. Barton, Averstar, USA; J. Plantin, Ericsson, SE; P. Cavalloro, Italtel, IT; D. Gajski, UC Irvine, USA; G. de Jong, Telelogic, B

C/C++: Progress or Deadlock in System-Level Specification [p. 136]

The lack of a general methodology and notation has been identified as one of the main obstacles bedeviling system-on-chip designers. Nevertheless, there is a lot of confusion about what SLD (System-Level Design) means and which SLDL (System-Level Design Language) is the most appropriate. With SOC demands, there has recently been high interest in system-level design, particularly HW/SW co-design. In order to accommodate SW, system companies as well as EDA vendors would like to use C as the language for system-level design. Many people are experimenting with subsets of C, and others with C++, introducing classes that correspond to HW (VHDL/Verilog) concepts. C/C++ syntax has become the most popular basis for defining new language extensions for system-level specification and design. A wide community of system designers and EDA suppliers believe that C/C++ is the most appropriate vehicle to use as a next-generation language. However, there are many challenges and open problems.


3C: Advances in SoC Testing

Moderators: P. Muhmenthaler, Infineon Technologies, D; E.J. Marinissen, Philips Research, NL

An Integrated System-On-Chip Test Framework [p. 138]
E. Larsson and Z. Peng

In this paper we propose a framework for the testing of system-on-chip (SOC), which includes a set of design algorithms to deal with test scheduling, test access mechanism design, test set selection, test parallelization, and test resource placement. The approach minimizes the test application time and the cost of the test access mechanism while considering constraints on tests, power consumption and test resources. The main feature of our approach is that it provides an integrated design environment to treat several different tasks at the same time, tasks which were traditionally dealt with as separate problems. Experimental results show the efficiency and usefulness of the proposed technique.

Efficient Test Data Compression and Decompression for System-on-a-Chip Using Internal Scan Chains and Golomb Coding [p. 145]
A. Chandra and K. Chakrabarty

We present a data compression method and decompression architecture for testing embedded cores in a system-on-a-chip (SOC). The proposed approach makes effective use of Golomb coding and the internal scan chains of the core under test, and provides significantly better results than a recent compression method that uses Golomb coding and a separate cyclical scan register (CSR). The use of the internal scan chain for decompression obviates the need for a CSR. In addition, the novel interleaving decompression architecture allows multiple cores in an SOC to be tested concurrently using a single ATE I/O channel. We demonstrate the effectiveness of the proposed approach by applying it to the ISCAS'89 benchmark circuits.
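
Golomb coding pays off here because scan test data is dominated by long runs of 0s; each run terminated by a 1 is encoded as a unary quotient plus a fixed-width remainder. The sketch below illustrates the generic code only (the group size m = 4 and the codeword layout are illustrative choices, not taken from the paper):

    # Golomb (Rice) coding of the 0-runs in a bit stream.
    def golomb_encode_runs(bits, m=4):
        assert m & (m - 1) == 0, "m is assumed to be a power of two"
        rbits = m.bit_length() - 1            # bits for the remainder field
        out, run = [], 0
        for b in bits:
            if b == 0:
                run += 1
            else:                             # a 1 terminates the run of 0s
                q, r = divmod(run, m)
                out.append("1" * q + "0" + format(r, "0%db" % rbits))
                run = 0
        return "".join(out)

    # Sparse test data (mostly 0s) compresses well:
    data = [0,0,0,0,0,0,1, 0,0,1, 0,0,0,0,0,0,0,0,0,1]
    code = golomb_encode_runs(data)
    print(len(data), "bits ->", len(code), "bits:", code)   # 20 -> 12 bits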

Testing TAPed Cores and Wrapped Cores with the Same Test Access Mechanism [p. 150]
M. Benabdenbi, W. Maroufi, and M. Marzouki

This paper describes a way of testing both wrapped cores and TAPed cores within a System-on-a-Chip (SoC) with the same Test Access Mechanism (TAM). The TAM architecture, which is dynamically reconfigurable, scalable and flexible, is named CAS-BUS and has a central controller. All the cores can be tested this way in the same session through a modified Boundary Scan Test Access Port.

On Applying the Set Covering Model to Reseeding [p. 156]
S. Chiusano, S. Di Carlo, P. Prinetto, and H. Wunderlich

Functional BIST is a rather new BIST technique based on exploiting embedded system functionality to generate deterministic test patterns during BIST. The approach takes advantage of two well-known testing techniques, the arithmetic BIST approach and the reseeding method. The main contribution of the present paper consists in formulating the problem of an optimal reseeding computation as an instance of the set covering problem. The proposed approach guarantees high flexibility, is applicable to different functional modules, and, in general, provides a more efficient test set encoding than previous techniques. In addition, the approach shortens the computation time, allows the trade-off between area overhead and global test length to be better exploited, and can deal with larger circuits.
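
The set covering view can be sketched as follows: each candidate seed detects (covers) a subset of the target faults, and a small set of seeds covering all faults is sought. The greedy approximation below only illustrates the formulation, on invented data; the paper computes the reseeding with its own method:

    # Greedy set covering: repeatedly pick the seed covering the most
    # still-uncovered faults.
    def greedy_cover(universe, subsets):
        uncovered, chosen = set(universe), []
        while uncovered:
            best = max(subsets, key=lambda s: len(subsets[s] & uncovered))
            if not subsets[best] & uncovered:
                raise ValueError("some faults are covered by no seed")
            chosen.append(best)
            uncovered -= subsets[best]
        return chosen

    faults = range(1, 9)                            # hypothetical fault list
    seeds = {"s1": {1, 2, 3, 4}, "s2": {3, 4, 5},   # faults each seed detects
             "s3": {5, 6, 7}, "s4": {7, 8}, "s5": {1, 8}}
    print(greedy_cover(faults, seeds))              # ['s1', 's3', 's4']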


3E: Panel Session: Data Management: Limiter or Accelerator for Electronic Design Creativity?

Organizer: P. van Staa, Robert Bosch GmbH, D
Moderator: H. Heidbrink, Descon GmbH, D
Panellists: B. Potock, Mentor Graphics Corp, USA; J. Mueller, Rosemann & Lauridsen GmbH, D; U. Ahle, Siemens Business Services, D; C. Basille, Aerospatiale Matra Missiles, F; W. Kisselmann, Infineon Technologies, D; W. Herden, Robert Bosch GmbH, D

Data Management -- Limiter or Accelerator for Electronic Design Creativity [p. 162]

Data Management is the key to introducing concurrent engineering, configuration management and work-in-progress control throughout the entire design process. This was recognized by MCAD and ERP/MRP software vendors years ago. Product Data Management (PDM) solutions are used and accepted for mechanical designs, but not in electronic design departments. The EDA industry has not been focusing on strategies to fill the gap between business processes and design activities. Therefore, variant handling and configuration management are today mostly handled by proprietary processes at the directory/file level. Standard database management solutions and Product Data Management applications have not been able to reach major market shares up to now.


4A: Analysis of Communication Systems

Moderators: H. Gräb, TU Munich, D; J. Eckmüller, Infineon Technologies, D

Efficient Bit-Error-Rate Estimation of Multicarrier Transceivers [p. 164]
G. Vandersteen, P. Wambacq, Y. Rolain, J. Schoukens, S. Donnay, M. Engels, I. Bolsens

Multicarrier modulation schemes are widely used in several digital telecommunication systems, such as Asymmetric Digital Subscriber Lines (ADSL) and Wireless Local Area Network (WLAN) based on Orthogonal Frequency Domain Multiplexing (OFDM). An estimate of the Bit-Error-Rate (BER) degradation due to non-idealities in the transceiver (e.g. nonlinear distortions in the analog front-ends, digital clipping,...) is much more complicated in a multicarrier system than in a single-carrier system due to the large number of carriers and the huge number of possible transmitted symbols. This paper proposes a method for estimating the BER of such OFDM modulation schemes in a CPU time that is two orders of magnitude smaller than a Monte-Carlo method, as confirmed by simulations on a 5 GHz IEEE 802.11 WLAN receiver front-end.

Efficient Time-Domain Simulation of Telecom Frontends Using a Complex Damped Exponential Signal Model [p. 169]
P. Vanassche, G. Gielen, and W. Sansen

This paper presents an efficient time-domain simulation approach for telecommunication frontends at the architectural level. It is based upon the use of complex damped exponential modeling functions. These make it possible to construct accurate signal models for digitally modulated telecom signals using only a few modeling functions. Since these models are valid over a long range of time, they allow a large timestep, which greatly speeds up time-domain simulation of the telecom frontends. Details of a simulation approach based upon this signal model are discussed. The approach is verified by experimental results.

Simulation Method to Extract Characteristics for Digital Wireless Communication Systems [p. 176]
L. Nguyen and V. Janicot

In all wireless standards involving digital modulation, new fundamental characteristics have to be extracted to quantify the linearity/distortion in RF designs. This paper describes a simulation technique, Modulated Steady State, and its use to extract these specifications. An example of its application to a typical RF transmitter with a π/4-DQPSK modulator is presented.


4B: Design of Low Power Systems I

Moderators: G. Stamoulis, Intel, USA; K. Roy, Purdue U, USA

Microprocessor Power Analysis by Labeled Simulation [p. 182]
C. Hsieh, L. Chen, and M. Pedram

In many applications, it is important to know how power is consumed while software is being executed on the target processor. Instruction-level power microanalysis, which is a cycle-accurate simulation technique based on instruction label generation and propagation, is aimed at answering this question for a superscalar, pipelined processor. This technique requires the micro-architectural details of the CPU and provides the power consumption of every module (or gate) for each active instruction in each cycle. To validate this approach, a Zilog digital signal processor core was designed using a 0.25 µm TSMC cell library, and the power consumption per instruction was collected using a Verilog simulator specially written for the DSP core.

Power Aware Microarchitecture Resource Scaling [p. 190]
A. Iyer and D. Marculescu

In this paper we present a strategy for run-time profiling to optimize the configuration of a microprocessor dynamically so as to save power with minimum performance penalty. The configuration of the processor changes according to the parallelism in the running program. Experiments on some benchmark programs show good savings in total energy consumption; we have observed a decrease of up to 23% in energy/cycle and up to 8% in energy per instruction. Our proposed approach can be used for energy-aware computing in either portable applications or in desktop environments where power density is becoming a concern. This approach can also be incorporated in larger power management strategies like ACPI.

Extending Lifetime of Portable Systems by Battery Scheduling [p. 197]
L. Benini, G. Castelli, A. Macii, E. Macii, M. Poncino, and R. Scarsi

Multi-battery power supplies are becoming popular in electronic appliances of the latest generations, due to economical and manufacturing constraints. Unfortunately, a partitioned battery subsystem is not able to deliver the same amount of charge as a monolithic battery with the same total capacity. In this paper, we define the concept of battery scheduling, we investigate policies for solving the problem of optimal charge delivery, and we study the relationship of such policies with different configurations of the battery subsystem. Results, obtained for different workloads, demonstrate that choosing the proper scheduling policy can bring system lifetime, in the best case, to within 1% of that guaranteed by a monolithic battery of equal capacity.
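
What a battery scheduling policy decides can be shown with a deliberately crude simulation: which cell supplies the load in each time slice. The toy model below (fixed capacities and a small charge recovery for resting cells; all numbers invented) is far simpler than real battery behavior, but it already separates two policies:

    # Toy battery-scheduling simulation; resting cells recover a little charge.
    def simulate(policy, n=2, capacity=50.0, load=1.0, recovery=0.1):
        charge, t = [capacity] * n, 0
        while True:
            alive = [i for i in range(n) if charge[i] > 0]
            if not alive:
                return t                     # system lifetime in time slices
            i = policy(t, alive)
            charge[i] -= load                # selected cell supplies the load
            for j in alive:
                if j != i:                   # idle cells recover slightly
                    charge[j] = min(capacity, charge[j] + recovery)
            t += 1

    sequential  = lambda t, alive: alive[0]               # drain one cell at a time
    round_robin = lambda t, alive: alive[t % len(alive)]  # alternate every slice

    print("sequential :", simulate(sequential))           # 100 slices
    print("round robin:", simulate(round_robin))          # 112 slices: longer life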


4C: Test Generation and Evaluation

Moderators: R. Galivanche, Intel, USA; B. Straube, FhG IIS/EAS Dresden, D

Efficient Spectral Techniques for Sequential ATPG [p. 204]
A. Giani, S. Sheng, M. Hsiao, and V. Agrawal

We present a new test generation procedure for sequential circuits using spectral techniques. Iterative processes of filtering via compaction and spectral analysis of the filtered test set are performed for each primary input, extracting inherent spectral information embedded within the test sequence. This information, when viewed in the frequency domain, reveals the characteristics of the input spectrum. The filtered and analyzed set of vectors is then used to predict and generate future vectors. We also developed a fault-dropping technique to speed up the process. We show that very high fault coverages and small vector sets are consistently obtained in short execution times for sequential benchmark circuits.

On the Test of Microprocessor IP Cores [p. 209]
F. Corno, M. Sonza Reorda, S. Squillero, and M. Violante

Testing is a crucial issue in the SOC development and production process. A popular solution for SOCs that include microprocessor cores is based on making them execute a test program, thus implementing a very attractive BIST solution. This paper describes a method for the generation of effective programs for the self-test of a processor. The method can be partially automated, and combines ideas from traditional functional approaches and from the ATPG field. We assess the feasibility and effectiveness of the method by applying it to an 8051 core.

Sequence Reordering to Improve the Levels of Compaction Achievable by Static Compaction Procedures [p. 214]
I. Pomeranz and S. Reddy

We describe a reordering procedure that changes the order of test vectors in a test sequence for a synchronous sequential circuit without reducing the fault coverage. We use this procedure to investigate the effects of reordering on the ability to compact the test sequence. Reordering is shown to have two effects on compaction. (1) The reordering process itself allows us to reduce the test sequence length. (2) Reordering can improve the effectiveness of an existing static compaction procedure. Reordering also provides an insight into the detection by test generation procedures of faults that are detected by relatively long subsequences.

SEU Effect Analysis in an Open-Source Router via a Distributed Fault Injection Environment [p. 219]
A. Benso, S. Di Carlo, G. Di Natale, and P. Prinetto

The paper presents a detailed error analysis and classification of the behavior of an open-source router, when affected by Single Event Upsets (SEUs). The experimental results have been gathered on a real communication network, resorting to an ad-hoc Fault Injection system. The injector has been designed to corrupt the router during its normal service and to analyze the SEU injection effects on the overall distributed system. The performed experiments allowed the authors to identify the most critical memory regions and to cluster the router variables according to their impact on system dependability.


4E: Panel Session: The Programmable Platform: Does One Size Fit All?

Organizer: A. Lock, Synopsys, USA
Moderator: R. Camposano, Synopsys, USA
Panellists: R. Camposano, Synopsys, USA; A. Cuomo, STMicroelectronics, IT; R. Subramanian, MorphICs, USA; H. Meyr, TU Aachen, D

The Programmable Platform: Does One Size Fit All? [p. 226]

This special panel session brings together several leading technologists representing organisations within the telecom and system-on-chip design communities. The panel will discuss the trend in platform-based design, where new products are increasingly based on re-programmability or re-configuration of more general-purpose devices. Particular emphasis will be placed on the need to meet the requirements of the Telecom market, where flexibility is a key concern, but with the shift towards third-generation wireless systems, so too is performance.


4F: Planning Support

Moderators: F. Johannes, TU Munich, D; R. Otten, TU Delft, NL

Slicing Tree is a Complete Floorplan Representation [p. 228]
M. Lai and D. Wong

Slicing tree has been an effective tool for VLSI floorplan design. Floorplanners using slicing tree representation take full advantage of shape and orientation flexibility of circuit modules to find highly compact slicing floorplans. However, slicing floorplans are commonly believed to suffer from poor utilization of space when all modules are hard. For this reason, a large body of literature has recently been devoted to various new representations of non-slicing floorplans to improve space utilization. In this paper, we prove that by using slicing tree representation and compaction, all maximally compact placements of modules can be generated. In conclusion, slicing tree is a complete floorplan representation for all non-slicing floorplans as well.
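
For readers unfamiliar with the representation: a slicing tree describes a floorplan as recursive horizontal ('H') and vertical ('V') cuts, and a placement is evaluated bottom-up by combining child bounding boxes. A minimal evaluation sketch with invented module sizes follows (the paper's compaction step, which yields the completeness result, is not shown):

    # Evaluate the bounding box of a slicing floorplan given as a nested tree.
    def bounding_box(node):
        if isinstance(node, tuple) and node[0] in ("H", "V"):
            op, left, right = node
            lw, lh = bounding_box(left)
            rw, rh = bounding_box(right)
            if op == "V":                        # vertical cut: side by side
                return lw + rw, max(lh, rh)
            return max(lw, rw), lh + rh          # 'H': stacked vertically
        return node                              # leaf module: (width, height)

    # (A beside B) stacked on top of C:
    tree = ("H", ("V", (2, 3), (4, 3)), (6, 2))
    w, h = bounding_box(tree)
    print("floorplan:", w, "x", h, "area", w * h)   # 6 x 5, area 30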

Further Improve Circuit Partitioning Using GBAW Logic Perturbation Techniques [p. 233]
C. Cheung, Y. Wu, and D. Cheng

Efficient circuit partitioning is gaining importance with the increasing size of modern circuits. Conventionally, circuit partitioning is solved by modeling a circuit as a hypergraph, for the ease of applying graph algorithms. However, there is room for further improvement even on optimum hypergraph partitioning results, if logic information can be applied for perturbation. In this paper, we present a multi-way partitioning framework which can couple any good hypergraph partitioner with a novel logic-perturbation-based (GBAW) technique to further improve already excellent partitioning results. Our approach can be integrated with any graph partitioner. We performed experiments on 2-, 3-, 4-, and 5-way partitionings for various circuits of different sizes from the MCNC benchmarks. We chose the state-of-the-art hMetis-Kway to obtain high-quality initial solutions for the experiments. Our experiments showed that this partitioning approach can achieve a further 15% reduction in cut size for 2-way partitioning with an area penalty of only 0.33%. These results demonstrate the effectiveness of the new partitioning technique.

Clustering Based Fast Clock Scheduling for Light Clock-Tree [p. 240]
M. Saitoh, M. Azuma, and A. Takahashi

We introduce a clock scheduling algorithm to obtain a clock schedule that achieves a shorter clock period and that can be realized by a light clock tree. A shorter clock period can be achieved by controlling the clock input timing of each register, but the required wire length and power consumption of the clock tree tend to be large if clock input timings are determined without considering the locations of the registers. To overcome this drawback, our algorithm constructs clusters consisting of registers with the same clock input timing located in a close area. In our algorithm, registers are first partitioned into clusters by their locations, and the clusters are then modified to improve the clock period while keeping the radius of each cluster small. In our experiments on an industrial design with 888 registers, the clock period achieved is 27% shorter than that achieved by a zero-skew clock tree, and 1% longer than the theoretical minimum. The computation time is about 24.9 seconds, and the wire length and power consumption of the clock tree are comparable to those of a zero-skew tree.


5A: Low-Power Channel Decoding and VLIW Architectures

Moderators: N. Wehn, Kaiserslautern U, D; M. Bolle, Systemonic, D

Power-Efficient Layered Turbo Decoder Processor [p. 246]
J. Dielissen, J. van Meerbergen, M. Bekooij, F. Harmsze, S. Sawitzki, J. Huisken, and A. van der Werf

Turbo decoding offers outstanding error-correcting capabilities and will be used in wireless applications like the Universal Mobile Telecommunications System (UMTS) [4]. However, the algorithm is very computationally intensive, and therefore an implementation on a general-purpose programmable DSP results in a power consumption which reduces the applicability of turbo decoding in hand-held applications. In this paper we present a solution based on a layered processing architecture. This architecture combines, in a hierarchical way, an application-specific Very Long Instruction Word (VLIW) processor, a data flow processor, and hardwired execution units. The power consumption of this solution is an order of magnitude better than that of an implementation on a current state-of-the-art, power-efficient general-purpose DSP.

Exploiting Data Forwarding to Reduce the Power Budget of VLIW Embedded Processors [p. 252]
M. Sami, D. Sciuto, C. Silvano, V. Zaccaria, and R. Zafalon

In this paper, a low-power approach to the design of embedded VLIW processor architectures is proposed. To resolve most data hazards in the pipeline, processors use forwarding (or bypassing) hardware to provide the required operands from the inter-stage pipeline registers directly to the inputs of the functional units; the operands are then stored in the Register File during the write-back pipeline stage. We propose a power optimization technique based on the exploitation of the forwarding paths in the processor to avoid the power cost of writing/reading short-lived variables to/from the Register File. In application-specific embedded systems, experimental evidence has shown that a significant number of variables are short-lived, that is, their liveness (from first definition to last use) spans only a few instructions. Values of short-lived variables can be accessed directly through the forwarding registers, avoiding write-back. Applying our solution to Register File accesses in a VLIW embedded core has shown a power saving of up to 35% with respect to the unoptimized approach on the given set of target benchmarks. The performance overhead is equal to one gate delay added on the processor critical path.
Keywords: Low-Power, Pipeline Processors, VLIW Embedded Architectures, Forwarding.

Design of Low-Power High-Speed Maximum a Posteriori Decoder Architectures [p. 258]
A. Worm, H. Lamm, and N. Wehn

Future applications demand high-speed maximum a posteriori (MAP) decoders. In this paper, we present an in-depth study of design alternatives for high-speed MAP architectures, with special emphasis on low power consumption. We exploit the inherent parallelism of the MAP algorithm to reduce power consumption at various abstraction levels. A fully parameterizable architecture is introduced, which allows the architecture to be optimally adapted to the application requirements and throughput. Intensive design space exploration has been carried out on a state-of-the-art 0.2 µm technology, including efficient parallelism techniques, a data flow transformation for reduced power consumption, and an optimized FIFO implementation.


5B: Design of Low-Power Systems II

Moderators: E. Macii, Politecnico di Torino, IT; D. Marculescu, Carnegie Mellon U, USA

Low Complexity FIR Filters Using Factorization of Perturbed Coefficients [p. 268]
C. Neau, K. Muhammad, and K. Roy

This paper presents a factorization-based technique to reduce the computational complexity of implementing Finite Impulse Response (FIR) digital filters. It is possible to design FIR filters in which all of the filter coefficients are products of the first seven prime numbers. For such filters, factorization of the filter coefficients allows the reuse of intermediate results among computations involving common factors. Since the coefficients are products of only small prime numbers, it is also possible to generate each of the partial products with a single shift-and-add operation. Compared to a traditional implementation, this results in a 35-50% reduction in computational complexity, which is shown to translate into lower power consumption.
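
A worked instance of the idea (generic shift-and-add arithmetic, not code from the paper): a coefficient such as 105 = 3 * 5 * 7 costs one shift-and-add or shift-and-subtract per prime factor instead of a general multiplication, and coefficients sharing factors can share the intermediate results.

    # Multiplication by a coefficient that factors into small primes.
    def times3(x): return (x << 1) + x       # 3x = 2x + x
    def times5(x): return (x << 2) + x       # 5x = 4x + x
    def times7(x): return (x << 3) - x       # 7x = 8x - x

    def times105(x):                         # 105 = 3 * 5 * 7
        return times7(times5(times3(x)))

    x = 12345
    assert times105(x) == 105 * x
    # Coefficients sharing a factor reuse intermediates: 15x = times5(times3(x))
    # can be computed once and feed both the 15x tap and the 105x tap.
    print(times105(x))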

An Adaptive Algorithm for Low-Power Streaming Multimedia Processing [p. 273]
A. Acquaviva, L. Benini, and B. Riccò

This paper addresses the problem of power consumption in multimedia system architectures and presents an algorithmic optimization technique to achieve power reduction in the context of real-time processing. The technique is based on a mixed speed-setting and shutdown policy. We address the problem from both a theoretical and a practical point of view, presenting as a case study a power-efficient implementation of an MPEG Layer-3 real-time decoder algorithm designed for wearable devices. The target system is Hewlett-Packard's SmartBadgeIII wearable-system prototype, based on the StrongARM1100 processor. Theoretical analysis as well as quantitative results of power measurements are provided to show the effectiveness of this technique. The experimental set-up is also described.

A Static Power Estimation Methodology for IP-Based Design [p. 280]
X. Liu and C. Papaefthymiou

This paper proposes a novel system-level power estimation methodology for electronic designs consisting of intellectual property (IP) components. Our methodology relies on analytical output and power macromodels of the IP blocks to estimate system dissipation without performing any simulation. We derive upper bounds on the estimation error of our methodology and demonstrate the relation of this error to the sensitivities of the macromodeling functions. For circuits without feedback, we give a sufficient condition for the worst-case power estimation error to increase only linearly with the length of the IP cascades. We also give a tighter sufficient condition that ensures error boundedness in IP systems of any topology. Experiments with signal processing and data encryption systems validate the accuracy and efficiency of our approach. For designs of up to 576 IP blocks, power estimates are obtained within 0.2 seconds. In comparison with switch-level simulation results, the average error of our power estimates is 7.3%.
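
The simulation-free estimation style described above can be sketched for a feedback-free cascade: each block's macromodel maps an input statistic (here, a single switching-activity number) to an output statistic and a power figure, so activity is propagated block to block and power is summed. All model functions and numbers below are invented placeholders:

    # Cascade power estimation from per-block analytical macromodels.
    blocks = [  # (name, output-activity model, power model)
        ("filter",  lambda a: 0.8 * a,       lambda a: 2.0 + 5.0 * a),
        ("encoder", lambda a: 0.5 + 0.3 * a, lambda a: 1.0 + 3.0 * a),
        ("cipher",  lambda a: 0.5,           lambda a: 4.0 + 2.0 * a),
    ]

    def estimate(cascade, input_activity):
        activity, total_power = input_activity, 0.0
        for name, out_model, power_model in cascade:
            total_power += power_model(activity)  # power driven by input stats
            activity = out_model(activity)        # propagate to the next block
        return total_power

    print("estimated power:", round(estimate(blocks, 0.4), 2))   # 11.15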


5C: On-Line Testing Techniques

Moderators: C. Metra, DEIS-Bologna U, IT; R. Leveugle, TIMA, Grenoble, F

Optimization of Error Detecting Codes for the Detection of Crosstalk Originated Errors [p. 290]
M. Favalli and C. Metra

This work applies weight-based codes [1] to the detection of crosstalk-originated errors. This kind of fault, whose importance grows with device scaling, may cause errors that are undetectable by the error-detecting codes most commonly used in VLSI ICs. Conversely, such errors can be easily detected by weight-based codes, which, however, have smaller encoding capabilities. In order to reduce the cost of these codes, a graph-theoretic optimization is used. Moreover, new applications of these codes are explored, regarding the synthesis of self-checking FSMs and the detection of errors related to the clock distribution network.

System Safety through Automatic High-Level Code Transformations: An Experimental Evaluation [p. 297]
P. Cheynet, B. Nicolescu, R. Velazco, M. Rebaudengo, M. Sonza Reorda, and M. Violante

This paper deals with a software modification strategy allowing the on-line detection of transient errors. Being based on a set of rules for introducing redundancy in the high-level code, the method can be completely automated, and is particularly suited for low-cost safety-critical microprocessor-based applications. Experimental results from software and hardware fault injection campaigns are presented and discussed, demonstrating the effectiveness of the approach in terms of fault detection capabilities.
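
One representative rule from this family of transformations (an illustration in the spirit of such approaches; the paper's exact rule set may differ) duplicates every variable, performs every computation on both copies, and compares the copies before each use, so that a transient bit-flip in either copy is detected:

    # Duplication-with-comparison, applied by hand to a two-line program.
    class TransientErrorDetected(Exception):
        pass

    def check(v0, v1):
        if v0 != v1:
            raise TransientErrorDetected("copies diverged")
        return v0

    # Original:            Transformed:
    #   a = b + c             a0 = b0 + c0; a1 = b1 + c1
    #   d = a * 2             check(a0, a1); d0 = a0 * 2; d1 = a1 * 2
    b0 = b1 = 3
    c0 = c1 = 4
    a0, a1 = b0 + c0, b1 + c1
    d0, d1 = check(a0, a1) * 2, check(a1, a0) * 2
    print(check(d0, d1))      # 14 unless a fault corrupted one copy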

From DFT to Systems Test -- A Model Based Cost Optimization Tool [p. 302]
M. Wahl, T. Ambler, C. Maaß and M. Rahman

Long-lasting systems like airplanes have a cost structure where the maintenance costs are larger than the purchasing costs. Testing is required both for preventive maintenance and for repair, and is a major source of cost. Previously we have analysed test and Design for Testability for digital systems, covering ASICs, boards and systems. Moreover, the continuous development of technology requires cost models that can grow dynamically and, because we will never have all information, can work with incomplete data sets. In this paper we present a tool that is well suited for a wide range of applications. Previously developed cost models can be incorporated, and new elements can be added to the model as needed. Due to the generic approach, the tool allows modelling of general systems. It is not bound to the digital domain, although it has a strong background there.

Efficient On-Line Testing Method for a Floating-Point Adder [p. 307]
A. Drozd and M. Lobachev

In this paper we present a residue method for on-line testing of a floating-point adder. This circuit contains an arithmetic shifter, which executes an abridged operation. The method solves the problem of checking the abridged operation with a reduced amount of hardware.
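
The classical residue-code idea behind such checkers can be shown on plain integer addition for simplicity (the paper's contribution is handling the abridged operation of the shifter, which this sketch does not model): the residue of a sum modulo a small check base must equal the sum of the operands' residues, modulo the same base.

    # Residue check of an adder; base 3 needs only two check bits.
    BASE = 3

    def checked_add(a, b):
        s = a + b                          # result of the (possibly faulty) adder
        if s % BASE != (a % BASE + b % BASE) % BASE:
            raise RuntimeError("residue mismatch: adder error detected")
        return s

    print(checked_add(1234, 5678))         # 6912, residues agree

    faulty = (1234 + 5678) ^ 0x4           # inject a single-bit flip in the sum
    print(faulty % BASE == (1234 % BASE + 5678 % BASE) % BASE)   # False: caught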


5E: Design Methodology for PicoRadio Networks

Organizer: J. Rabaey, UC Berkeley, USA
Moderator: M. Engels, IMEC, B

Design Methodology for PicoRadio Networks [p. 314]
J. da Silva Jr., J. Shamberger, M. Ammer, C. Guo, S. Li, R. Shah, T. Tuan, M. Sheets, J. Rabaey, B. Nikolic, A. Sangiovanni-Vincentelli, and P. Wright

One of the most compelling challenges of the next decade is the "last-meter" problem, extending the expanding data network into end-user data-collection and monitoring devices. PicoRadio supports the assembly of an ad hoc wireless network of self-contained mesoscale, low-cost, low-energy sensor and monitor nodes. While technology advances have made it conceivable to deploy wireless networks of heterogeneous nodes, the design of a low-power, low-cost, adaptive node in a reduced time to market is still a challenge. We present a design methodology for PicoRadio Networks, from system conception and optimization to silicon platform implementation. For each phase of the design, we demonstrate the applicability of our methodology through promising experimental results.


5F: EMC on Chip and High Density Package Level

Moderators: W. John, Fraunhofer Institute Berlin/Paderborn, D; F. Sabath, Armed Forces Institute for Protection Technologies, USA

High-Level Simulation of Substrate Noise Generation from Large Digital Circuits with Multiple Supplies [p. 326]
M. Badaroglu, M. van Heijningen, V. Gravot, S. Donnay, H. De Man, G. Gielen, M. Engels, and I. Bolsens

Substrate noise generated by large digital circuits degrades the performance of analog circuits sharing the same substrate. Existing approaches usually extract the model of the substrate from the layout information and then simulate the extracted transistor-level netlist with this substrate model using a transistor-level simulator. For large digital circuits, the substrate simulation is however not feasible with a transistor-level simulator. In our previous work, it has been demonstrated that efficient and accurate simulation of substrate noise generation at gate-level is feasible. In this paper several important extensions to our previous work are introduced: modeling of IO cells, modeling of input transition time and load dependency and the extraction methodology of an equivalent substrate model within multiple supply domains. Experimental results show an improved accuracy (6.3% error on RMS substrate voltage with respect to a full SPICE level simulation) with these extensions, while maintaining a large speedup with respect to SPICE simulations.

Crosstalk Noise in Future Digital CMOS Circuits [p. 331]
C. Werner, R. Göttsche, A. Wörner, and U. Ramacher

This paper presents simulation results for crosstalk noise in future CMOS generations down to 35 nm feature sizes. The noise voltage is calculated from circuit simulations with lumped RLC networks and static CMOS cells. A static noise margin is derived from the inverter characteristics of NAND and NOR gates, and a critical wire length is calculated by considering statistical variations in the chip manufacturing process. The model agrees well with measurements on a quarter-micron test chip and predicts a drastic drop of critical wire lengths to 50-60 µm after the 100 nm technology generation.

Modeling Electromagnetic Emission of Integrated Circuits for System Analysis [p. 336]
P. Kralicek, W. John, and H. Garbe

In this contribution a new methodology for modeling electromagnetic emission of integrated circuits in system analysis is shown. By using a physical model based on a multipole expansion, the emitted fields can be well approximated in the space outside a component. This allows a convenient representation with a low number of model parameters which can be determined by measurement or simulation. To show the applicability, the developed models are used in a system level printed circuit board simulator. The results are compared with reference calculations.

Analysis of EME Produced by a Microcontroller Operation [p. 341]
F. Fiori and F. Musolino

This paper deals with the characterization of the electromagnetic emissions of integrated circuits. The TEM cell method is employed in order to identify the primary emission sources of complex digital devices. An 8-bit microcontroller, realized in a 0.8 µm HCMOS process, is considered. It is composed of several building blocks, such as the central processing unit, the analog-to-digital converter and the EPROM memory. Emission measurements are performed by running a specific program code stored in the microcontroller memory, and the emissions due to each building block are identified.


6A: Design Methods for Analog and Mixed Signal Circuits

Moderators: A. Kaiser, IEMN-ISEN, F; P. Wambacq, IMEC, B

Top-Down Design of a xDSL 14-bit 4MS/s Sigma-Delta Modulator in Digital CMOS Technology [p. 348]
R. del Río, J. de la Rosa, F. Medeiro, B. Pérez-Verdú, and A. Rodríguez-Vázquez

This paper describes the design of a Sigma-Delta modulator aimed at A/D conversion in xDSL applications, featuring 14 bits at 4 Msample/s in a 0.35 µm mainstream digital CMOS technology. The architecture selection, modulator sizing and cell sizing tasks were supported by a CAD methodology, allowing us to obtain a power-efficient implementation in a short design cycle.

Analog Design for Reuse -- Case Study: Very Low-Voltage Sigma-Delta Modulator [p. 353]
M. Dessouky, A. Kaiser, M. Louërat, and A. Greiner

This paper presents the complete design methodology of a very low-voltage third-order Sigma-Delta modulator, from high-level specifications down to layout. Behavioral models taking into account cell non-idealities are developed and used to map performance specifications to lower levels. Emphasis has been placed on eventual design reuse through design plans and layout templates in a layout-oriented circuit design approach. The modulator has been designed for two different technologies, demonstrating the suitability of the methodology for very high performance mixed-signal circuits. Moreover, the same design knowledge has been successfully reused in another, fourth-order modulator.

A Design Strategy for Low-Voltage Low-Power Continuous-Time Sigma-Delta A/D Converters [p. 361]
F. Gerfers and Y. Manoli

This paper presents a design strategy for low-voltage low-power Sigma-Delta analog-to-digital (A/D) converters using a continuous-time (CT) lowpass loop filter. An improved method is used to find the Sigma-Delta modulator implementation that is optimal with respect to minimal power consumption on the one hand and supports a rapid prototyping approach on the other. The influence of the low supply voltage, as well as of circuit non-idealities, on the overall Sigma-Delta modulator is determined and verified by behavioral simulations. Transistor-level simulation results of a 1.5 V CT Sigma-Delta A/D converter show a 75 dB dynamic range in a bandwidth of 25 kHz.


6B: Issues in Synthesis and Power Optimization

Moderators: R. Murgai, Fujitsu Labs of America, USA; S. Minato, NTT, JP

Minimizing Stand-By Leakage Power in Static CMOS Circuits [p. 370]
S. Naidu and E. Jacobs

In this paper we address the problem of minimizing leakage power in static CMOS circuits consisting of AOI (and-or-invert) gates as they operate in a stand-by or idle mode, waiting for other circuits to complete their operation. It is known that the leakage power due to subthreshold leakage current in transistors in the OFF state depends on the input vector applied. Therefore, we compute an input vector that can be applied to the circuit in stand-by mode so that the power loss due to subthreshold leakage current is the minimum possible. We employ an integer linear programming (ILP) approach, first obtaining a good lower bound (estimate) on the minimum leakage power and then rounding the solution to obtain an input vector that causes low leakage. The chief advantage of this technique over others in the literature is that it invariably provides a good idea of the quality of the input vector found.
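
To make the input-vector dependence concrete, the following sketch (ours, not the authors' ILP formulation) exhaustively searches a toy two-gate circuit for its minimum-leakage stand-by vector; the leakage table and circuit are hypothetical placeholders:

    # Illustrative sketch: brute-force search for the minimum-leakage
    # stand-by input vector of a tiny gate-level circuit. Per-gate leakage
    # is looked up by input state, reflecting the fact that subthreshold
    # leakage depends on which transistors are OFF.
    from itertools import product

    # Hypothetical leakage values (arbitrary units) of a 2-input NAND gate
    # for each input combination.
    NAND_LEAKAGE = {(0, 0): 1.0, (0, 1): 2.3, (1, 0): 1.7, (1, 1): 4.1}

    def nand(a, b):
        return 1 - (a & b)

    # Toy circuit: y1 = NAND(x1, x2); y2 = NAND(y1, x3)
    def circuit_leakage(x1, x2, x3):
        y1 = nand(x1, x2)
        return NAND_LEAKAGE[(x1, x2)] + NAND_LEAKAGE[(y1, x3)]

    best = min(product([0, 1], repeat=3), key=lambda v: circuit_leakage(*v))
    print("minimum-leakage stand-by vector:", best)

For realistic circuits the 2^n enumeration above is infeasible, which is exactly why an ILP formulation with a good lower bound, as the paper proposes, is attractive.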

In-Place Delay Constrained Power Optimization Using Functional Symmetries [p. 377]
C. Chang, B. Hu, and M. Marek-Sadowska

In-Place Optimization (IPO) has become the backend methodology of choice to bridge the gap between logic synthesis and physical design, since the optimization can be guided by accurate physical information. To perform optimization without perturbing the placed netlist too much, only buffer insertion and gate sizing are commonly used in current design tools. In this paper, we address the problem of delay-constrained power optimization by introducing another degree of freedom: functional symmetry based rewiring. Theoretical results on the effect of using functional symmetry on transition density for power estimation are also derived. Experimental results show that, under the same delay constraint, our technique achieves much better power reduction than a technique based on discrete gate sizing alone.
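
The following toy sketch (our illustration, with made-up capacitances, densities and power model) shows the kind of freedom symmetry-based rewiring exploits: if a gate is functionally symmetric in two inputs, the nets driving them can be swapped without changing the logic, e.g. to place the net with the higher transition density on the lower-capacitance pin:

    # Illustrative sketch: exploit the input symmetry of a NAND gate to
    # reduce dynamic power. All numbers below are hypothetical.
    def pin_power(density, pin_cap, vdd=1.8, freq=1e8):
        # simple switching-power model: 0.5 * C * V^2 * activity * f
        return 0.5 * pin_cap * vdd ** 2 * density * freq

    pins = {"A": 2.0e-15, "B": 3.5e-15}          # symmetric NAND inputs
    nets = {"net_fast": 0.40, "net_slow": 0.05}  # transition densities

    def total_power(assign):
        return sum(pin_power(nets[n], pins[p]) for p, n in assign.items())

    naive = {"A": "net_slow", "B": "net_fast"}
    rewired = {"A": "net_fast", "B": "net_slow"}  # legal by symmetry
    print("naive  :", total_power(naive))
    print("rewired:", total_power(rewired))       # strictly lower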

High-Quality Sub-Function Construction in Functional Decomposition Based on Information Relationship Measures [p. 383]
L. Józwiak and A. Chojnacki

Functional decomposition seems to be the most effective circuit synthesis approach for look-up table (LUT) FPGAs, (C)PLDs and complex gates. In functional decomposition that targets LUT FPGAs, the circuit is constructed by recursively decomposing a given function and its sub-functions until each of the resulting sub-functions can be directly implemented with a LUT. The choice of sub-functions constructed in this process determines the quality of the resulting multi-level circuit in terms of logic block count and speed. In this paper, we propose a new effective and efficient method for sub-function construction, based on information relationship measures, and we consider its application in our circuit synthesis tool that targets LUT-based FPGAs. The experimental results demonstrate that the proposed approach leads to extremely fast and very small circuits.

Generalized Reasoning Scheme for Redundancy Addition and Removal Logic Optimization [p. 391]
J. Espejo, L. Entrena, E. San Millán, and E. Olías

In this work a generalization of the structural Redundancy Addition and Removal (RAR) logic optimization method is presented. New concepts based on the functional description of the nodes in the network are introduced to support this generalization. Necessary and sufficient conditions to identify all the possible structural expansions are given for the general case of multiple variable expansion. Basic nodes are no longer restricted to simple gates and can be any function of any size. With this generalization, an incremental mechanism to perform structural transformations involving any number of variables can be applied in a very efficient manner. Experimental results are presented that illustrate the efficiency of our scheme.


6C: High Level Validation

Moderators: J. Teixeira, IST/INESC, PT; M. Sonza Reorda, Politecnico di Torino, IT

LPSAT: A Unified Approach to RTL Satisfiability [p. 398]
Z. Zeng, P. Kalla, and M. Ciesielski

LPSAT is an LP-based comprehensive infrastructure designed to solve the satisfiability (SAT) problem for complex RTL designs containing both word-level arithmetic operators and bit-level Boolean logic. The presented technique uses a mixed integer linear program to model the constraints corresponding to both domains of the design. Our technique renders the constraint propagation between the two domains implicit to the MILP solver, thus enhancing the overall efficiency of the SAT framework. The experimental results are quite promising when compared with generic CNF-based and BDD-based SAT algorithms.
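
As a flavor of how bit-level logic can coexist with linear constraints in one MILP, the sketch below verifies the textbook linearization of an AND gate (our illustration; not necessarily the encoding LPSAT uses):

    # The Boolean constraint z = a AND b is captured exactly by the linear
    # constraints  z <= a,  z <= b,  z >= a + b - 1  with a, b, z in {0,1}.
    from itertools import product

    def satisfies_linear_and(a, b, z):
        return z <= a and z <= b and z >= a + b - 1

    for a, b, z in product([0, 1], repeat=3):
        assert satisfies_linear_and(a, b, z) == (z == (a & b))
    print("the linear constraints exactly capture z = a AND b")

Word-level operators such as adders and comparators become ordinary linear constraints over integer variables, which is why a single MILP can host both domains.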

Functional Test Generation for Behaviorally Sequential Models [p. 403]
F. Ferrandi, G. Ferrara, D. Sciuto, A. Fin, and F. Fummi

Functional testing of HDL specifications is one of the most promising approaches for verifying the functionality of a design before synthesis. The contribution of this work is the development of a test generation algorithm targeting a new coverage metric (called bit-coverage) that provides full statement coverage, branch coverage, condition coverage and partial path coverage for behaviorally sequential models. Behavioral test sequences can also be the only way to evaluate the testability of a VHDL model for which a gate-level representation is not available (e.g., third-party cores), since the behavioral error model is also characterized by a high correlation with the RT- and gate-level stuck-at fault models. Moreover, the preciseness of the proposed coverage metric makes the identified test sequences more effective in identifying design errors than test patterns developed by following standard coverage metrics.

High Quality Behavioral Verification Using Statistical Stopping Criteria [p. 411]
A. Hajjar, T. Chen, I. Munn, A. Andrews, and M. Bjorkman

In order to improve the efficiency of behavioral model verification, it is important to determine the point of diminishing returns for a given verification strategy. This paper compares existing stopping rules and presents a new stopping rule based on a static Bayesian technique. The new stopping rule was applied to verifying 14 complex VHDL models, and a figure of merit was used to compare the efficiency of the stopping rules. The new rule was shown to consistently outperform existing stopping rules in terms of coverage and verification time.
Keywords: Behavioral Model Verification, VHDL, Statistical Stopping Rules.


6E: Hot Topic: Network Processors: A Perspective on Market Requirements, Processor Architectures and Embedded S/W Tools

Organizers: P. Bromley, F. Karim, and P. Paulin, STMicroelectronics, F
Moderator: P. Paulin, STMicroelectronics, F

Network Processors: A Perspective on Market Requirements, Processor Architectures and Embedded S/W Tools [p. 420]
P. Paulin, F. Karim, and P. Bromley

With the projected explosion of low-cost bandwidth availability, the intensive processing tasks and service hosting will move close to consumers on the "intelligent edge" of the network, where a significant portion of the future storage, processing and network management will take place. We address the rationale for this change, the characteristics of the network processor architecture required to address it, and the software development tools needed in order to improve time-to-market without sacrificing embedded software performance.


6F: Interconnect Extraction and Modelling

Moderators: L. Silveira, IST/INESC, PT; H. Grabinski, Hannover U, D

Efficient Inductance Extraction via Windowing [p. 430]
M. Beattie and L. Pileggi

We propose a new, efficient and accurate localized inductance modeling technique via windowing, in a manner that is analogous to localized capacitance extraction. The stability and accuracy of this process is made possible by twice inverting the localized inductance models, exploiting in the process properties of the magnetostatic interactions as modeled via the susceptance (inverse inductance). Application of these localized double-inverse inductance models to actual IC bus examples demonstrates the significant improvement in simulation efficiency and overall accuracy as compared to alternative methods of approximation and simplification.

Efficient and Passive Modeling of Transmission Lines by Using Differential Quadrature Method [p. 437]
Q. Xu and P. Mazumder

This paper introduces a new transmission line modeling approach that employs an efficient numerical approximation technique called the Differential Quadrature Method (DQM). The transmission line is discretized and the approximation framework is constructed using the 5th-order differential quadrature method; consequently, an improved discrete equivalent-circuit model is developed in the paper. The DQM-based modeling requires far fewer intervening grid points for building an accurate discrete model of the transmission line than numerical methods such as finite differences (FD) require. It introduces far fewer state variables than FD-based models and therefore has higher efficiency. The DQM technique can be integrated into a circuit simulator since it preserves passivity.

Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC Trees with Lumped and Distributed Elements [p. 445]
Q. Yu and E. Kuh

In today's deep submicron technology, the coupling capacitances among individual on-chip RC trees have a significant effect on signal delay and crosstalk, and the interconnects should be modeled as coupled RC trees. We provide simple explicit formulas for the Elmore delay and higher-order voltage moments, and a linear-order recursive algorithm for the voltage moment computation of lumped and distributed coupled RC trees. By using these formulas and algorithms, the moment matching method can be efficiently implemented to deal with delay and crosstalk estimation, model order reduction and optimal design of interconnects.
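
As background for the first-order moment involved, here is a minimal sketch of the classic Elmore delay computation for a single (uncoupled) lumped RC tree; the paper's formulas extend such moment computations to coupled trees. The topology and element values below are made up:

    # Elmore delay of an RC tree: each edge contributes its resistance
    # times the total capacitance it drives. Caching downstream_cap would
    # make this linear-time.
    tree = {                      # node -> list of (child, edge_resistance)
        "root": [("a", 10.0), ("b", 20.0)],
        "a": [("c", 5.0)],
        "b": [],
        "c": [],
    }
    cap = {"root": 0.0, "a": 1e-3, "b": 2e-3, "c": 1e-3}  # node capacitances

    def downstream_cap(node):
        # post-order pass: total capacitance in the subtree rooted at node
        return cap[node] + sum(downstream_cap(ch) for ch, _ in tree[node])

    def elmore(node="root", delay=0.0, out=None):
        # pre-order pass: accumulate R_edge * C_downstream along each path
        out = {} if out is None else out
        out[node] = delay
        for child, r in tree[node]:
            elmore(child, delay + r * downstream_cap(child), out)
        return out

    print(elmore())   # Elmore delay from the root to every node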

On the Impact of On-Chip Inductance on Signal Nets under the Influence of Power Grid Noise [p. 451]
T. Chen

It has been well recognized that the impact of on-chip inductance on some critical nets, such as clock nets, is significant and cannot be ignored in delay modeling for these nets. However, the impact of on-chip inductance on signal nets in general is still not well understood. We present results of analyzing inductive effects on signal nets for ultra-deep submicron technologies. The analysis is based on an Al-based 0.18 um CMOS process and a Cu-based 0.13 um CMOS process. The impact of on-chip inductance is shown to be insignificant if we assume a perfect power supply network around the interconnect routes; otherwise, it can be significant. Furthermore, the results presented in this paper illustrate the impact of on-chip inductance one would expect from transitioning from an Al-based to a Cu-based interconnect technology.


7A: Timing and Parallel Simulation

Moderators: S. Yoo, TIMA, Grenoble, F; F. Wagner, UFRGS, BRZ

Timing Simulation of Digital Circuits with Binary Decision Diagrams [p. 460]
R. Ubar, A. Jutman, and Z. Peng

Meeting timing requirements is an important constraint imposed on highly integrated circuits, and the verification of timing of a circuit before manufacturing is one of the critical tasks to be solved by CAD tools. In this paper, a new approach and the implementation of several algorithms to speed up gate-level timing simulation are proposed where, instead of gate delays, path delays for tree-like subcircuits (macros) are used. Therefore timing waveforms are calculated not for all internal nodes of the gate-level circuit but only for outputs of macros. The macros are represented by structurally synthesized binary decision diagrams (SSBDD) which enable a fast computation of delays for macros. The new approach to speed up the timing simulation is supported by encouraging experimental results.

HALOTIS: High Accuracy LOgic TIming Simulator with Inertial and Degradation Delay Model [p. 467]
P. Vazquez, J. Juan-Chico, M. Bellido, A. Acosta, and M. Valencia

This communication presents HALOTIS, a novel high-accuracy logic timing simulation tool that incorporates a new simulation algorithm based on distinct concepts for transitions and events. This new simulation algorithm is designed to include the inertial and degradation delay models. Simulation results are very similar to those obtained by electrical simulators, and show a higher accuracy compared to the conventional delay models implemented in current logic simulators.

dlbSIM -- A Parallel Functional Logic Simulator Allowing Dynamic Load Balancing [p. 472]
K. Hering, J. Löser, and J. Markwardt

To meet the demanding time-to-market requirements in VLSI/ULSI design, the acceleration of verification processes is inevitable. The parallelization of cycle-based simulation at register-transfer and gate level is one facet in a series of efforts targeted at this objective. We introduce dlbSIM, a parallel compiled-code functional logic simulator that has been developed to run on loosely coupled systems. It has the ability to balance the application-specific load of cooperating simulator instances depending on the overall load situation of the processor nodes involved. The load of a simulator instance is expressed in terms of the set of circuit model parts to be simulated by the corresponding instance. The centralized load management runs simultaneously with a parallel simulation; both processes interact after a controllable number of simulated clock cycles to transmit load information and realize load modifications. dlbSIM is successfully used to simulate IBM S/390 processor models.

Architecture Driven Partitioning [p. 479]
J. Küter and E. Barke

In this paper, we present a new algorithm to partition netlists for logic emulation under consideration of the targeted emulator architecture. The proposed algorithm allows flexible use across a wide variety of applications because the description of the architecture is part of the input data. It combines a new approach to finding and improving an initial solution with existing algorithms to cluster the netlist and optimize the number of cut nets between blocks. As a result, the algorithm ensures that the cut nets between the created blocks can be connected within the emulation system, even without a full interconnect structure. Experiments on a number of designs and architectures demonstrate that the algorithm is competitive for architectures with full interconnect and that it is unique in handling architectures with limited interconnect resources.


7B: Embedded Tutorial: Low-Power Issues for SOCs

Moderator: C. Piguet, CSEM, Neuchatel, CH

Low-Power Systems on Chips (SOCs) [p. 488]
C. Piguet, M. Renaudin, and T. Omnès

For innovative portable products, Systems on Chips (SoCs) containing several processors, memories and specialised modules are obviously required. Performance, but also low power, is a main issue in the design of such SoCs. Are these low-power SoCs constructed solely from low-power processors, memories and logic blocks? While the latter are indispensable, many other issues are quite important for low-power SoCs, such as the way the communications between processors are synchronised, as well as test procedures, on-line testing, software design and development tools. This paper is a general framework for the design of low-power SoCs, starting from the system level down to the architecture level, assuming that the SoC is mainly based on the reuse of low-power processors, memories and logic peripherals.


7C: Defect Oriented Testing

Moderators: H. Kerkhoff, Twente U, NL; J. Pineda de Gyvez, Philips Research, NL

Static and Dynamic Behavior of Memory Cell Array Opens and Shorts in Embedded DRAMs [p. 496]
Z. Al-Ars and A. van de Goor

Fault analysis of memory devices using defect injection and simulation is becoming increasingly important as the complexity of memory faulty behavior increases. In this paper, this approach is used to study the effects of opens and shorts on the faulty behavior of embedded DRAM (eDRAM) devices produced by Infineon Technologies. The analysis shows the existence of previously defined memory fault models, and establishes new ones. The paper also investigates the concept of dynamic faulty behavior and establishes its importance for memory devices. Conditions to test the newly established fault models are also given.
Key words: Embedded DRAM, functional fault models, fault primitives, defect simulation, opens, shorts.

Definitions of the Numbers of Detections of Target Faults and their Effectiveness in Guiding Test Generation for High Defect Coverage [p. 504]
I. Pomeranz and S. Reddy

The number of times a fault f in a combinational circuit is detected by a given test set T was shown earlier to affect the defect coverage of the test set. The earlier definition counted each test in T that detects f as a distinct detection of f. This definition counts two tests as distinct detections even if they differ only in the values of inputs that do not affect the activation or propagation of the fault. In this work, we introduce a stricter definition that requires two counted tests to differ in the way they activate and/or propagate the fault. We describe procedures for constructing test sets based on the stricter definition, and compare them to test sets for the earlier, less strict definition. The results yield a simple criterion for deciding when it may be necessary to combine the two definitions in order to obtain a high-quality test set.

CMOS Open Defect Detection by Supply Current Test [p. 509]
M. Hashizume, M. Ichimiya, H. Yotsuyanagi, and T. Tamesada

In this paper, a new test method is proposed for detecting open defects in CMOS ICs. The method is based on the supply current generated by applying a time-variable electric field from outside the IC. The feasibility of the test is examined in several experiments. The empirical results show that, using the method, open defects in CMOS ICs can be detected by measuring the supply current that flows when a time-variable electric field is applied.

Full Chip False Timing Path Identification: Applications to the PowerPC(TM) Microprocessors [p. 514]
J. Zeng, M. Abadir, J. Bhadra, and J. Abraham

Static timing analysis sets the industry standard in the design methodology of high-speed/performance microprocessors to determine whether timing requirements have been met. Unfortunately, not all the paths identified using such analysis can be sensitized. This leads to a pessimistic estimation of the processor speed. Also, no amount of engineering effort spent on optimizing such paths can improve the timing performance of the chip. In the past, we demonstrated initial results of how ATPG techniques can be used to identify false paths efficiently [1]. Due to the gap between the physical design view on which the static timing analysis of the chip is based and the test view on which the ATPG techniques are applied, in many cases only sections of some of the full-chip paths were analyzed in our initial results. In this paper, we fully analyze all the timing paths using the ATPG techniques, thus bridging the gap between the testing and timing analysis techniques. This enables us to do false path identification at the full-chip level of the circuit. Results of applying our technique to the second-generation G4 PowerPC(TM) are presented.


7E: Embedded Tutorial: CAD for RF Integrated Circuits and Systems

Moderator: P. Wambacq, IMEC, B

CAD for RF Circuits [p. 520]
P. Wambacq, G. Vandersteen, J. Phillips, J. Roychowdhury, W. Eberle, B. Yang, D. Long, and A. Demir

Wireless transceivers for digital telecommunications are heterogeneous systems that combine digital hardware, software and analog circuitry. The pressure to miniaturization and lower power consumption for these transceivers imposes tight specifications on their analog RF parts. Many aspects of RF circuits cannot be simulated accurately and efficiently with a classical circuit-level SPICE approach. In this paper three important simulation problems for RF circuits are addressed:
1. high-level simulation of analog and RF blocks for the determination of the specifications of the circuits;
2. accurate circuit-level simulation of nonlinear circuits with widely differing time constants;
3. efficient and accurate computation of phase noise in RF oscillators.
For each of these problems, solutions are proposed. These solutions illustrate that accurate and efficient simulation of RF communication circuits needs a heterogeneous variety of advanced algorithms.


7F: Routing Enhancements

Moderators: J. Lienig, Robert Bosch GmbH, D; A. Takahashi, Tokyo IT, JP

Modeling Crosstalk Noise for Deep Submicron Verification Tools [p. 530]
P. Bazargan-Sabet and F. Ilponse

In deep submicron technologies, the verification task has to cover some new issues to certify the correctness of a design. The noise produced by crosstalk couplings is one of these emerging problems. In this paper, we propose a model to evaluate the peak value of the noise injected on a signal when its neighboring signals make their transitions. This model has been used in a prototype verification tool and has shown a satisfying performance-accuracy ratio.

A Graph Based Algorithm for Optimal Buffer Insertion under Accurate Delay Models [p. 535]
Y. Gao and D. Wong

Buffer insertion is an efficient technique in interconnect optimization. This paper presents a graph-based algorithm for optimal buffer insertion under accurate delay models. In our algorithm, a signal is accurately represented by a finite ramp characterized by two parameters, shift time and transition time. Any accurate delay model, such as delay models based on the transmission line model or SPICE simulations, can be incorporated into our algorithm. The algorithm determines the optimal number of buffers and their locations on a wire such that some optimization objective is satisfied. Two typical examples of such objectives, minimizing the 50% threshold delay and minimizing the transition time, can both be handled easily in our algorithm. We show that the buffer insertion problem can be reduced to a shortest path problem. The algorithm can be easily extended to simultaneous buffer insertion and wire-sizing, with complexity that remains polynomial, and to problems such as buffer insertion subject to transition time constraints at any position along the wire.
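
To illustrate the reduction, here is a toy dynamic program that treats candidate buffer sites on a single wire as graph nodes and buffered segments as edges, so the minimum-delay buffering is a shortest path; it uses a crude Elmore-style weight and made-up device/wire numbers, whereas the paper's edges carry accurate ramp waveforms:

    # Illustrative sketch: buffer insertion on one wire as a shortest-path
    # problem over candidate sites 0..N (0 = driver, N = sink).
    R_W, C_W = 0.1, 0.2             # wire R and C per unit length (made up)
    R_B, C_B, D_B = 1.0, 0.5, 0.7   # buffer output R, input C, intrinsic delay
    N = 10                          # number of wire segments between sites

    def seg_delay(length, load_c):
        # Elmore-style delay of one buffered segment driving load_c
        r, c = R_W * length, C_W * length
        return D_B + R_B * (c + load_c) + r * (c / 2.0 + load_c)

    INF = float("inf")
    dist, pred = [INF] * (N + 1), [None] * (N + 1)
    dist[0] = 0.0
    for i in range(N):                    # relax forward edges i -> j
        for j in range(i + 1, N + 1):
            load = C_B if j < N else 1.0  # next buffer's input cap, or sink
            d = dist[i] + seg_delay(j - i, load)
            if d < dist[j]:
                dist[j], pred[j] = d, i

    sites, node = [], pred[N]             # recover interior buffer sites
    while node not in (None, 0):
        sites.append(node)
        node = pred[node]
    print("delay:", dist[N], "buffer sites:", sorted(sites))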

Repeater Block Planning under Simultaneous Delay and Transition Time Constraints [p. 540]
P. Sarkar and C. Koh

We present a solution to the problem of repeater block planning under both delay and signal transition time constraints for a given floorplan. Previous approaches have considered only meeting the target delay of a net. However, it has been observed that the repeater planning for meeting the delay target can cause signals on long interconnects to have very slow transition rates. Experimental results show that our new approach satisfies both timing constraints for an average of 79% of all global nets for six MCNC benchmark floorplans studied (at 1GHz frequency), compared with an average of 22% for the repeater block planner in [11].


8A: Layout Generation

Moderators: V. Meyer zu Bexten, Atmel Germany GmbH, D; E. Barke, Hannover U, D

On-The-Fly Layout Generation for PTL Macrocells [p. 546]
L. Macchiarulo, L. Benini, and E. Macii

Pass transistor logic (PTL) has recently been proposed as an alternative to standard CMOS for aggressive circuit design. Even though PTL has been successful in a few handcrafted designs, its acceptance into mainstream digital design critically depends on the availability of tools for logic and physical synthesis and optimization. The automatic synthesis of pass transistor circuits starting from BDDs has been intensively studied in the past with promising results, but back-end tools for PTL cell generation are still missing. We describe an automatic layout generator designed for seamless integration in a library-free PTL design flow. The generator exploits the distinctive characteristics of pass transistor networks produced by synthesis to achieve quality of results comparable with state-of-the-art commercial cell generation tools in a fraction of the execution time.

Automatic Datapath Tile Placement and Routing [p. 552]
T. Serdar and C. Sechen

We report the first fully automatic datapath tile layout flow. We subdivided the placement process into two steps: a global placement step using simulated annealing, and a new detailed placement step based on extensive modifications we made to the O-tree algorithm. These modifications enable the extended O-tree algorithm to handle the rectilinearly shaped transistor chains and gates common in datapath tile layout. We show that datapath tiles can be placed and routed automatically at the transistor level or at the mixed transistor/gate level, achieving, for the first time, results competitive with those obtained manually by a skilled designer.

A Boolean Satisfiability-Based Incremental Rerouting Approach with Application to FPGAs [p. 560]
G. Nam, K. Sakallah, and R. Rutenbar

Incremental redesign is an increasingly essential step in any complex design. Late changes or corrections in functional specifications (so-called "engineering change orders" or ECOs) force us to search for a minimal perturbation that achieves the desired repair. In reconfigurable design scenarios, these incremental repairs may be in response to physical faults: the goal is to "design around" the fault. For FPGAs, incremental rerouting is an essential component of this repair problem. We develop a new incremental rerouting algorithm for FPGAs using techniques from Boolean Satisfiability (SAT). In this application, these techniques have the twin virtues that they (1) represent all possible routing (and rerouting) constraints simultaneously and exactly, and (2) search for rerouting solutions by perturbing all nets concurrently. Preliminary results are promising: for several FPGA benchmarks, we were able to reroute fault reconfigurations that perturb up to 5.74% of all nets for small fault sets (one to four faults), with only 1.55 tracks of overhead per channel on average and CPU times of 0.76 to 4.91 seconds per fault.
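
The essence of the Boolean encoding can be seen on a toy channel: each net must take exactly one track, and overlapping nets must take different tracks. The sketch below (ours; the paper of course uses a real SAT solver and FPGA routing graphs) brute-forces such constraints on a tiny hypothetical instance:

    # Toy illustration of routing constraints in Boolean/combinatorial
    # form: assign each net a track so that nets with overlapping spans
    # never share one.
    from itertools import product

    nets = {"n1": (0, 4), "n2": (2, 6), "n3": (5, 8)}   # hypothetical spans
    TRACKS = 2

    def overlap(a, b):
        return a[0] < b[1] and b[0] < a[1]

    names = list(nets)
    for assign in product(range(TRACKS), repeat=len(names)):
        track = dict(zip(names, assign))
        ok = all(track[a] != track[b]
                 for i, a in enumerate(names) for b in names[i + 1:]
                 if overlap(nets[a], nets[b]))
        if ok:
            print("feasible assignment:", track)
            break
    else:
        print("unroutable with", TRACKS, "tracks")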


8B: Modelling and Performance Analysis of Embedded Systems

Moderators: J. Plantin, Ericsson Radio Systems, SE; L. Lavagno, Udine U, IT

Dual Transitions Petri Net Based Modelling Technique for Embedded Systems Specification [p. 566]
M. Varea and B. Al-Hashimi

This paper presents a new modelling technique capable of capturing both control and data information in a single unified approach. This is achieved by modifying the classical Petri Net structure, allowing it to have two types of transitions and arcs. As a consequence, loops and conditional operations within complex specifications are easily identified. The system's dynamic behaviour is modelled using a new marking scheme of the net, consisting of a new element called a value for data representation in addition to the classical tokens used for control purposes. Structural definitions, behavioural rules and a graphical representation of the new modelling technique are given. One potential application of the proposed modelling technique is as the internal representation of embedded systems specifications. Two examples are included illustrating the applicability and efficiency of the proposed modelling technique.

Probabilistic Application Modeling for System-Level Performance Analysis [p. 572]
R. Marculescu and A. Nandi

The objective of this paper is to introduce the Stochastic Automata Networks (SANs) as an effective formalism for application modeling in system-level analysis. More precisely, we present a methodology for application modeling for system-level power/performance analysis that can help the designer to select the right platform and implement a set of target multimedia applications. We also show that, under various input traces, the steady-state behavior of the application itself is characterized by very different 'clusterings' of the probability distributions. Having this information available, not only helps to avoid lengthy profiling simulations for predicting power and performance figures, but also enables efficient mappings of the applications onto a chosen platform. We illustrate the benefits of our methodology using the MPEG-2 video decoder as the driver application.
Keywords: system-level design, performance analysis, application modeling, stochastic automata networks, embedded multimedia systems.

Reliable Estimation of Execution Time of Embedded Software [p. 580]
P. Giusto, G. Martin, and E. Harcourt

Estimates of the execution time of embedded software play an important role in function-architecture co-design. This paper describes a technique based upon a statistical approach that improves existing estimation techniques. Our approach provides a degree of reliability in the error of the estimated execution time. We illustrate the technique using both control-oriented and computation-dominated benchmark programs.


8C: Analog and Mixed Signal Testing

Moderators: M. Renovell, LIRMM, F; B. Kruseman, Philips Research, NL

Implementation of a Linear Histogram BIST for ADCs [p. 590]
F. Azaïs, S. Bernard, Y. Bertrand, and M. Renovell

This paper validates a linear histogram BIST scheme for ADC testing. The scheme uses a time decomposition technique in order to minimize the required hardware circuitry. A practical implementation is described, and the structure and operating mode of the different modules are detailed. Through this practical implementation, the performance and limitations of the proposed scheme are evaluated both in terms of additional circuitry and test time.

Test Generation Based Diagnosis of Device Parameters for Analog Circuits [p. 596]
S. Cherubal and A. Chatterjee

With the increasing complexity of manufacturing processes and the shrinking of device geometries, the performance metrics of integrated circuits (ICs) are becoming increasingly sensitive to random fluctuations in the manufacturing process. We propose a diagnosis methodology that can be used to infer the cause(s) of variations in performance of analog ICs. The methodology consists of (a) a device parameter computation technique which is used to compute the device parameters of an IC from measurements made on it and (b) a cause-effect analysis module that is used to compute the cause of the variation in performance metrics of a given set of ICs. Simulation results to demonstrate the effectiveness of the technique are presented.

Generation of Optimum Test Stimuli for Nonlinear Analog Circuits Using Nonlinear Programming and Time-Domain Sensitivities [p. 603]
B. Burdiek

In this paper, a novel approach to the generation of an optimum transient test stimulus for general analog circuits is proposed. The test stimulus is optimal with respect to the detection of a given fault set by means of a predefined fault detection criterion. The problem of finding an optimum test stimulus detecting all faults from the fault set is formulated as a nonlinear programming problem. A functional describing the differences between the good and all faulty test responses of the circuit serves as the merit functional for the programming problem. A parameter vector completely describing the test stimulus is used as the optimization vector. The gradient of the merit functional required for the optimization is computed using time-domain sensitivities. Since the evaluation of the fault detection criterion represented by the merit functional flows directly into the computation of the test stimulus, optimal test stimuli for hard-to-detect faults can be generated. If more than one input terminal is used for testing, several test stimuli can be generated simultaneously.


8E: Panel Session: Managing the SoC Design Challenge with 'Soft' Hardware

Organizer: D. Davis, Actel, USA
Moderator: R. Wilson, EETimes, USA
Panellists: T. Kambe, Sharp, JP; B. Gupta, STmicroelectronics, USA; C. Balough, Triscend, USA; Y. Tanurhan, Actel, USA

Managing the SoC Design Challenge with "Soft" Hardware [p. 610]
R. Wilson

Panel members will discuss, from their individual perspectives, why embedded reconfigurability has become critical to the future success of systems-on-a-chip and how they are attempting to implement solutions. The Opportunity: Implementing reconfigurable logic within SoCs will also help to expand and differentiate members of product families as well as extend product lifecycles and reduce design and test cycles, thus shortening product time to market. Having reconfigurability in system-on-a-chip silicon will increase design flexibility by allowing re-use of design elements to create differentiated products. Changing or revising logic elements on the fly via reconfigurability to meet changes in standards or features or to fix design errors will help avoid increasingly expensive NRE re-spins.


8F: Hardware-Software Architectures and Synthesis

Moderators: J. Henkel, NEC, USA; R. Leupers, Dortmund U, D

Integrated Hardware-Software Co-Synthesis and High-Level Synthesis for Design of Embedded Systems under Power and Latency Constraints [p. 612]
A. Doboli

This paper presents an integrated approach to hardware-software co-synthesis and high-level synthesis (HLS) for the design of low-power embedded systems. The main motivation for this work is that fine trade-offs between latency and power can be explored at the system level only with detailed knowledge of the hardware resources used. The integrated method is realized as a simulated annealing based solution-space exploration. Exploration is guided by Performance Models, which exactly capture the relationship between performance metrics (power consumption and latency) and design decisions (binding and scheduling). The proposed approach permits not only more accurate latency and power estimation but also the exposure of RTL-level design decisions at the system level. As a result, more effective power-latency trade-offs are possible during co-synthesis as compared to traditional task-level methods.

Allocation and Scheduling of Conditional Task Graph in Hardware/Software Co-Synthesis [p. 620]
Y. Xie and W. Wolf

This paper introduces an allocation and scheduling algorithm that efficiently handles conditional execution in multi-rate embedded systems. Control dependencies are introduced into the task graph model. We propose a mutual exclusion detection algorithm that helps the scheduling algorithm exploit resource sharing. Allocation and scheduling are performed simultaneously to take advantage of resource sharing among mutually exclusive tasks. The algorithm is fast and efficient, and is therefore suitable for use in the inner loop of our hardware/software co-synthesis framework, which must call the scheduling routine many times.

Code Placement in Hardware-Software Co-Synthesis to Improve Performance and Reduce Cost [p. 626]
S. Parameswaran

This paper introduces an algorithm for code placement in cache, together with a second algorithm that maps the placed code to memory. The target architecture is a multiprocessor system with first-level caches and a common main memory. These algorithms guarantee that as many instruction codewords as possible of the high-priority tasks remain in cache at all times, so that other tasks do not overwrite them. This method improves the overall performance and might result in cheaper systems if more powerful processors are then not needed. The memory increase necessary to facilitate this scheme is on the order of 13%. The average percentage of highest-priority tasks always resident can vary from 3% to 100%, depending upon how many tasks (and of what sizes) are allocated to each processor.

System-On-A-Chip Processor Synchronization Support in Hardware [p. 633]
B. Saglam and V. Mooney III

For scalable shared-memory multiprocessor System-on-a-Chip implementations, synchronization overhead may cause catastrophic stalls in the system. Efficiently improving synchronization overhead in terms of latency, memory bandwidth, delay and scalability calls for a solution in hardware rather than in software. This paper presents a novel, efficient, small and very simple hardware unit that brings significant improvements in all of the above criteria: in an example, we reduce the time spent on lock latency by a factor of 4.8 and the worst-case lock delay in a database application by a factor of more than 450. Furthermore, we developed a software architecture together with RTOS support to leverage our hardware mechanism. The worst-case simulation results of a client-server example on a four-processor system showed that our mechanism achieves an overall speedup of 27%.


9A: Reconfigurable Computing I

Moderators: K. Buchenrieder, Infineon Technologies, D; H. Grünbaecher, Carinthia Tech. Inst., Villach, A

A Decade of Reconfigurable Computing: A Visionary Retrospective [p. 642]
R. Hartenstein

The paper surveys a decade of R&D on coarse grain reconfigurable hardware and related CAD, points out why this emerging discipline is heading toward a dichotomy of computing science, and advocates the introduction of a new soft machine paradigm to replace CAD by compilation.

Hierarchical Memory Mapping during Synthesis in FPGA-Based Reconfigurable Computers [p. 650]
I. Ouaiss and R. Vemuri

One step in synthesis for FPGA-based Reconfigurable Computers (RCs) involves mapping the design data structures onto the physical memory banks available in the hardware. The advent of Xilinx Virtex-style FPGAs and of hierarchical memory schemes on reconfigurable boards has added complexity to this mapping. The new RC boards offer a wealth of memory banks, many of them on-chip (such as the BlockRAMs available in the Virtex architecture) and many of them offering a variable number of ports and several depth/width configurations. Along with the external RAMs, a hierarchy of memories with varying access performance is available in a reconfigurable computer, and it becomes critical to perform a good mapping to achieve optimal design performance. This paper presents an automatic memory mapping methodology that takes into account the number of words and word size of design data segments and physical memory banks, the number of ports on the banks, the access latency of the banks, the proximity of the banks to the processing unit, and a life-cycle analysis of data segments; it also incorporates configuration selection from the multiple configurations available in the BlockRAMs of Virtex-series FPGAs. In the case of multiple processing elements on board, the paper also provides a framework in which the task of memory mapping interacts with spatial partitioning to provide the best implementation.

Optimal FPGA Module Placement with Temporal Precedence Constraints [p. 658]
S. Fekete, E. Köhler, and J. Teich

We consider the optimal placement of hardware modules in space and time for FPGA architectures with reconfiguration capabilities, where modules are modeled as three-dimensional boxes in space and time. Using a graph-theoretic characterization of feasible packings, we are able to solve the following problems: (a) Find the minimal execution time of the given problem on an FPGA of fixed size, (b) Find the FPGA of minimal size to accomplish the tasks within a fixed time limit. Furthermore, our approach is perfectly suited for the treatment of precedence constraints for the sequence of tasks, which are present in virtually all practical instances. Additional mathematical structures are developed that lead to a powerful framework for computing optimal solutions. The usefulness is illustrated by computational results.


9B: Embedded Software

Moderators: P. Marwedel, Dortmund U, D; Z. Peng, Linkoping U, SE

Generation of Minimal Size Code for Schedule Graphs [p. 668]
C. Passerone, Y. Watanabe, and L. Lavagno

This paper proposes a procedure for minimizing the code size of sequential programs for reactive systems. It identifies repeated code segments (a generalization of basic blocks to directed rooted trees) and finds a minimal covering of the input control flow graphs with code segments. The segments are disjoint, i.e. no two segments have any code in common. The program is minimal in the sense that the number of code segments is minimum under the disjointness property for the given control flow specification. The procedure makes no assumption about the target processor architecture, and is meant to be used between task synthesis algorithms operating on a concurrent specification and a standard compiler for the target architecture. It is aimed at optimizing the size of very large, automatically generated flat code, and dramatically extends the scope of classical common sub-expression identification techniques. The potential effectiveness of the proposed approach is demonstrated through preliminary experiments.
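
One simple way to make repeated segments collide, shown purely as an illustration (the paper's covering procedure is more involved), is to hash rooted statement trees canonically:

    # Illustrative sketch: find repeated rooted code segments by canonical
    # tree hashing. A tree is (label, [children]).
    from collections import defaultdict

    def canon(tree):
        label, children = tree
        return "(" + label + "".join(canon(c) for c in children) + ")"

    def repeated_segments(trees):
        seen = defaultdict(int)
        def visit(t):
            seen[canon(t)] += 1
            for c in t[1]:
                visit(c)
        for t in trees:
            visit(t)
        return [k for k, n in seen.items() if n > 1]

    seg = ("if", [("x:=1", []), ("y:=2", [])])
    cfgs = [("seq", [seg, ("z:=3", [])]), ("seq", [seg, ("w:=4", [])])]
    print(repeated_segments(cfgs))   # the shared `if` segment (and its
                                     # statements) appear in both graphs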

Generating Production Quality Software Development Tools Using a Machine Description Language [p. 674]
A. Hoffmann, A. Nohl, S. Pees, G. Braun, and H. Meyr

This paper presents a methodology to automatically generate production-quality software development tools for programmable architectures using the machine description language LISA. Several architectures exhibiting diverse architectural features are presented, and the feasibility of automatically generating a simulator, assembler, linker and graphical debugger frontend is discussed. The presented approach is not limited to a fixed abstraction level -- case studies of the Texas Instruments C62x and C54x, the Analog Devices ADSP2101 as well as the ARM7 show the applicability of the methodology from cycle/phase-accurate to instruction-accurate models.

Automatic Generation and Targeting of Application Specific Operating Systems and Embedded Systems Software [p. 679]
L. Gauthier, S. Yoo, and A. Jerraya

We propose a method for the automatic generation of application-specific operating systems (OS's) and the automatic targeting of application software. OS generation starts from a very small yet flexible OS kernel. OS services, which are specific to the application and deduced from dependencies between services, are added to the kernel to construct the whole OS. Communication and synchronization functions in the application code are adapted to the generated OS. As a preliminary experiment, we applied the proposed method to a system example called the token ring system.

Cache Conscious Data Layout Organization for Embedded Multimedia Applications [p. 686]
C. Kulkarni, C. Ghez, M. Miranda, F. Catthoor, and H. De Man

Cache misses form a major bottleneck for real-time multimedia applications due to the off-chip accesses to the main memory. This results in both a major access bandwidth overhead (and related power consumption) and performance penalties. In this paper, we propose a new technique for organizing data in the main memory of data-dominated multimedia applications so as to remove the majority of the conflict cache misses. The focus of this paper is on the formal and heuristic algorithms we use to steer the data layout decisions and on the experimental results obtained using a prototype tool. Experiments on real-life demonstrators illustrate that we are able to remove up to 82% of the conflict misses for applications that are already aggressively transformed at the source level. At the same time, we also reduce the off-chip data accesses by up to 78%, and combined with address optimizations we are able to reduce the execution time. Our approach is thus complementary to the more conventional way of reducing misses by reorganizing the execution order.
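
A minimal sketch of the underlying idea, with made-up cache parameters and addresses: two arrays accessed in lockstep thrash a direct-mapped cache when their base addresses are a multiple of the cache size apart, and inter-array padding restores disjoint cache sets:

    # Illustrative sketch of conflict-driven data layout (not the paper's
    # algorithms): count cache-set collisions between two arrays.
    LINE, SETS = 32, 256           # a hypothetical 8 KB direct-mapped cache

    def set_of(addr):
        return (addr // LINE) % SETS

    def conflicts(base_a, base_b, n_bytes):
        sa = {set_of(base_a + i) for i in range(0, n_bytes, LINE)}
        sb = {set_of(base_b + i) for i in range(0, n_bytes, LINE)}
        return len(sa & sb)

    size = 4096                    # two 4 KB arrays used in lockstep
    bad_b = 8192                   # B lands one cache size after A:
                                   # a common power-of-two worst case
    print("conflicting sets, bad layout   :", conflicts(0, bad_b, size))
    padded_b = bad_b + size        # pad so the arrays use disjoint sets
    print("conflicting sets, padded layout:", conflicts(0, padded_b, size))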


9C: Panel Session: Design Challenges and Emerging EDA Solutions in Mixed-Signal IC Design

Organizer and Moderator: G. Gielen, KU Leuven, B
Panellists: B. Sorensen, Atrium Design Solutions; H. Casier, Alcatel Microelectronics, B; P. Magarshack, STMicroelectronics, F; J. Rodriguez, Anacad; J. Pollet, Dolphin, F

Design Challenges and Emerging EDA Solutions in Mixed-Signal IC Design [p. 694]

With increasing integration levels, more and more ICs and systems-on-chip turn into mixed-signal designs. Typical examples are telecom (Bluetooth, WLAN, xDSL...) and multimedia (digital video, MP3 audio...) systems. This hot topic session will explore the challenges that designers face with these mixed-signal designs, covering both technical and methodological challenges as well as engineering resource and skill shortage problems. On the technical side, basic challenges lie in incorporating analog design in a digital-oriented system design flow, signal integrity problems (supply and substrate noise, crosstalk...), trailing analog design productivity and test. In addition, the session will discuss the emerging progress in the methodology and EDA field, ranging from new software startups to analog and mixed-signal IP providers. The session will start with a brief tutorial overview of the problems and emerging solutions in the mixed-signal domain, for the audience to get an update on the current state of the art in mixed-signal design. This will be followed by a panel discussion, where the goal for the audience is to explore where the unaddressed problems are in mixed-signal design and which problems are today close to being solved commercially in this dynamically moving market. Issues addressed by the panel members include the integration of analog and mixed-signal IP, the emergence of mixed-signal CAD tools including behavioral modeling and simulation as well as analog synthesis, the challenge of rapid technology changes and analog design retargeting, the mixed-signal signal integrity nightmare, the rise of specialized mixed-signal design companies, single-chip versus single-package integration, the trimming of analog courses in many recently restructured EE curricula, and the shortage of analog designers.


9E: Hot Topic : Game Processors

Organizers/Moderators: W. Rosenstiel, FZI/Tübingen U, D; Y. Nakamura, Kyoto U, JP
Speakers: H. Tago, System LSI R&D Center, Toshiba Semiconductor Company;
A. Mandapati, ATI Research Inc (Subsidiary of Nintendo in the US);
S. Narita, Advanced Microcomputer Business Operation, System LSI Business Division, Hitachi Ltd.

CPU for PlayStation®2 [p. 696]
H. Tago, K. Hashimoto, N. Ikumi, M. Nagamatsu, M. Suzuoki, and Y. Yamamoto

Processors designed for computer entertainment must perform 3D graphics calculations, especially geometry and perspective transformations. In the PlayStation®2, we introduced the new idea of synthesizing emotion, called Emotion Synthesis, and devised a new processor architecture to support its graphics demands. The architecture is embodied in the PlayStation®2's "Emotion Engine" CPU, which uses vector units (VUs) as the key units for floating-point calculations. Emotion synthesis means the real-time synthesis of a computer graphics animation scene that projects a great deal of atmosphere. For example, when a female character walks into a video game scene, her motion must be determined by solving physical equations in response to interactive events instead of replaying prerecorded data. Moreover, differential equations with a large number of variables must be used to describe, for example, the waving motions of her hair in a breeze. For authenticity in emotion synthesis, the CPU must execute these calculations in real time. The "Emotion Engine" ("EE") is a system LSI including a 300 MHz 128-bit 2-way superscalar RISC core, two Vector Units ("VU"s), an Image Processing Unit ("IPU") for MPEG-2 stream decoding, a 10-channel memory access (DMA) controller, a two-channel Rambus® memory controller (RAC) and other peripheral modules. 13.5M transistors are integrated on a 15.02 mm x 15.04 mm die in a 0.25 um device technology with 0.18 um gate length. The design strategy, LSI design methodologies and CAD for the "Emotion Engine" LSI are presented with emphasis on practical aspects of verification and timing closure. A combination of simulation, emulation and formal verification ensured functional first silicon for system evaluation. In order to control wire delay at an early design stage, floorplan-based synthesis and wire load estimation were adopted for quick timing closure.

Implementation of the ATI Flipper Chip [p. 697]
A. Mandapati

The Nintendo GameCube(tm) video game console system is designed to outpace all other such systems when released. Formerly known by the codename Dolphin, this system includes an IBM PowerPC(tm) processor and specialized hardware from ATI. This specialized hardware is embodied in ATI's Flipper chip, the centerpiece of the Dolphin design. Flipper functions as the graphics processor, audio processor, host controller, memory controller, and I/O processor of the Dolphin system. Such a complex chip requires a very robust design flow to get to functioning silicon in as little time as possible. Here we describe that design flow, developed by ATI engineers to implement the Flipper design. The goal was to develop a flow to implement the best gaming hardware on a chip that needed to be as cost-effective as possible. The design offered many challenges, requiring optimal use of a small design team with a minimal budget to achieve aggressive schedules. The biggest challenge presented to the team was that of area. With high volumes, chips for consumer devices can benefit greatly from smaller die sizes, due in part to higher yields and in part to lower power and cheaper packages. Another daunting challenge was the use of embedded DRAM: the Dolphin architecture called for an embedded frame buffer and texture memory buffer for fast access.

SH-4 RISC Microprocessor for Multimedia, Game Machine [p. 699]
S. Narita

The SH-4 is a 2-issue superscalar 32-bit RISC microprocessor for SEGA's game machine, the Dreamcast. In order to boost floating-point performance, a graphics FPU and graphics-oriented instructions are provided. The performance is 360 VAX MIPS, 6.0M polygons/sec and 1.4 GFLOPS (peak, with the new instructions) at 200 MHz.


9F: Decision Diagrams

Moderators: A. Oliveira, IST/INESC, PT; E. Macii, Politecnico di Torino, IT

Streaming BDD Manipulation for Large-Scale Combinatorial Problems [p. 702]
S. Minato and S. Ishihara

We propose a new BDD manipulation method that never causes memory overflow or swap-out. In our method, BDD data are accessed through I/O stream ports. We can read BDD data streams of unlimited length using a limited amount of memory, and the resulting BDD data streams are produced concurrently. Our streaming method features the following: (1) it gives a continuous trade-off between memory usage and streaming data length, (2) a valid partial result can be obtained before the process completes, and (3) it is easily accelerated by pipelined multiprocessing. Experimental results show that our new method is especially useful in cases where conventional BDD packages are ineffective. For example, we succeeded in finding a number of solutions to a SAT problem using a commodity PC with 64 MB of memory, where the conventional method would require roughly 100 times as much memory. BDD manipulation has been considered an intensively memory-consuming procedure, but now hard disk and network resources can be utilized as well. Our method will lead to a new style of BDD applications.

Binary Decision Diagram with Minimum Expected Path Length [p. 708]
Y. Liu, K. Wang, T. Hwang, and C. Liu

We present methods to generate a Binary Decision Diagram (BDD) with minimum expected path length. A BDD is a generic data structure which is widely used in several fields; one important application is the representation of Boolean functions. A BDD representation enables us to evaluate a Boolean function by simply traversing the BDD from the root node to a terminal node and retrieving the value in the terminal node. For a BDD with minimum expected path length, the evaluation time for the corresponding Boolean function is therefore also minimized. Three efficient algorithms for constructing BDDs with minimum expected path length are proposed.
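
For uniformly random inputs, the expected path length is easy to evaluate once a BDD is built; the sketch below (our illustration of the metric, not of the construction algorithms) measures it for a small BDD:

    # Expected path length (EPL) of a BDD = expected number of internal
    # nodes visited per evaluation, with each input 0/1 equally likely.
    # Nodes: name -> (var, low_child, high_child); "0"/"1" are terminals.
    from functools import lru_cache

    bdd = {                       # BDD for f = x1 AND (x2 OR x3)
        "n1": ("x1", "0", "n2"),
        "n2": ("x2", "n3", "1"),
        "n3": ("x3", "0", "1"),
    }

    @lru_cache(maxsize=None)
    def epl(node):
        if node in ("0", "1"):
            return 0.0            # terminals end the traversal
        _, lo, hi = bdd[node]
        return 1.0 + 0.5 * epl(lo) + 0.5 * epl(hi)

    print("expected path length:", epl("n1"))   # 1.75 for this ordering

The construction algorithms in the paper choose the BDD, in particular its variable ordering, so that this quantity is minimized.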

Spectral Decision Diagrams Using Graph Transformations [p. 713]
M. Thornton and R. Drechsler

Spectral techniques are powerful methods for the synthesis and verification of digital circuits. The advances in decision diagram (DD) representations of discrete-valued functions in terms of computational efficiency can be exploited in calculating the spectra of Boolean functions. The classical approach of computing the spectrum of a function by taking advantage of factored transformation matrices, as used in the "Fast Fourier Transform", can be reformulated in terms of DD-based graph algorithms, resulting in a complete representation of the spectrum. The relationship between the DD-based interpretations and the linear algebra based definitions of spectral methods is described.
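
For reference, here is the classical array form of such a factored transform, the fast Walsh-Hadamard butterfly, whose DD-based reformulation the paper discusses:

    # In-place fast Walsh-Hadamard transform: the butterfly structure comes
    # from factoring the 2^n x 2^n transformation matrix.
    def fwht(vec):
        v, h = list(vec), 1
        while h < len(v):
            for i in range(0, len(v), 2 * h):
                for j in range(i, i + h):
                    v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
            h *= 2
        return v

    # Walsh spectrum of f(x1, x2) = x1 XOR x2 in the (+1, -1) encoding
    truth = [1, -1, -1, 1]        # f over inputs 00, 01, 10, 11
    print(fwht(truth))            # one nonzero coefficient: f is linear

A DD-based formulation yields the same spectrum while exploiting sharing in the function's graph representation rather than enumerating all 2^n truth-table entries.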


9L: Friday Keynote Session: Electronic System Design Methodology: Europe's Positioning

Moderator: A. Jerraya, TIMA, Grenoble, F
Speaker: G. Matheron, MEDEA Office Director, Paris, F

Electronic System Design Methodology: Europe's Positioning [p. 720]

The engine that drives all the ICT industries is microelectronics. By 2015, according to Mark Pinto of Bell Labs, the microelectronics industry "will be manufacturing 10 million silicon transistors per human being per day ... and the applications will exist to consume them". Microelectronics, through its dramatic increase in performance, is the enabler of this revolution. Soon entire products -- such as mobile telephones, computers and camcorders -- will be based on single silicon chips, reducing product cost and price, opening new markets and boosting manufacturing. Microelectronic chips, together with embedded software, drive the entire ICT industry, by doubling performance and halving cost every 18 months, allowing continuous innovation in products such as mobile phones and smart cards and in services like the Internet and e-commerce. The chips generate new products used by professionals and laymen: 60% of today's electronics applications have been made possible solely by the technical progress of microelectronics. Gérard Matheron will describe how the evolution of electronic system design is changing the world.


10A: Reconfigurable Computing II

Moderators: R. Lauwereins, KU Leuven, B; R. Hartenstein, Kaiserslautern U, D

Precision and Error Analysis of MATLAB Applications during Automated Hardware Synthesis for FPGAs [p. 722]
A. Nayak, M. Haldar, A. Choudhary, and P. Banerjee

We present a compiler that takes high-level signal and image processing algorithms described in MATLAB and generates optimized hardware for an FPGA with external memory. We propose a precision analysis algorithm to determine the minimum number of bits required by an integer variable, and a combined precision and error analysis algorithm to infer the minimum number of bits required by a floating-point variable. Our results show that, on average, our algorithms generate hardware requiring a factor of 5 fewer FPGA resources in terms of the Configurable Logic Blocks (CLBs) consumed, as compared to the hardware generated without these optimizations. We show that our analysis reduces the size of lookup tables for functions such as sin, cos, sqrt and exp. Our precision analysis also enables us to pack various array elements into a single memory location to reduce the number of external memory accesses. We show that such a technique improves the performance of the generated hardware by an average of 35%.
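
A tiny sketch of the range-propagation ingredient of such precision analysis (our simplification; the paper's algorithms also handle floating-point error):

    # Propagate value intervals through a dataflow and size each variable
    # to the minimum unsigned bit-width covering its range.
    from math import ceil, log2

    def bits_unsigned(lo, hi):
        assert lo >= 0
        return max(1, ceil(log2(hi + 1)))

    def add(a, b):                # interval addition
        return (a[0] + b[0], a[1] + b[1])

    def mul(a, b):                # interval product (non-negative ranges)
        return (a[0] * b[0], a[1] * b[1])

    x = (0, 255)                  # e.g. an 8-bit pixel
    y = (0, 255)
    s = add(x, y)                 # x + y needs only 9 bits
    p = mul(s, (0, 3))            # (x + y) * k stays tightly bounded
    for name, rng in [("x", x), ("y", y), ("x+y", s), ("(x+y)*k", p)]:
        print(name, rng, bits_unsigned(*rng), "bits")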

A HW/SW Partitioning Algorithm for Dynamically Reconfigurable Architectures [p. 729]
J. Noguera and R. Badia

"System-On-Chip" has become a reality, and recently new reconfigurable devices have appeared. However, few efforts have been carried out in order to define HW/SW codesign methodologies and algorithms which address the challenges presented by new reconfigurable devices. In this paper we address this open problem and present a novel HW/SW partitioning algorithm for dynamically reconfigurable architectures. The algorithm is a constructive algorithm, which obtains an initial solution and afterwards tries to optimize it. The HW/SW partitioning is done taking into account the features of the dynamically reconfigurable devices, and its final goal is minimize the reconfiguration latency. The partitioning algorithm has been implemented and integrated into our developed codesign environment, where several experiments have been carried out. The results obtained demonstrate the benefits of the algorithm.

Managing Dynamic Reconfiguration Overhead in Systems-On-A-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks [p. 735]
Z. Huang and S. Malik

This research examines the role of dynamically reconfigurable logic in systems-on-a-chip (SOC) design. Specifically, we study the overhead of storing and downloading the configuration code bits for different parts of an application in a dynamically reconfigurable coprocessor environment. For SOC designs the different configuration bit-streams will likely need to be stored on chip, so it becomes crucial to reduce the storage overhead. In addition, reducing the reconfiguration time overhead is crucial in realizing performance benefits. This study provides insight into the granularity of reconfigurable logic that is appropriate for the SOC context. Our initial study is in the domain of multimedia and communication systems. We first present profiling results for these domains using the MESCAL compiler infrastructure. These results are used to derive an architecture template that consists of dynamically reconfigurable datapaths using coarse-grain logic blocks and a reconfigurable interconnection network. We justify this template based on the constraints of SOC design. We then describe a design flow in which we start from an application, derive the kernel loops via profiling, and then map the application onto the dynamically reconfigurable datapath and the simplest interconnection network. As part of this flow we have developed a mapping algorithm that minimizes the size of the interconnection network, and thus the overhead of reconfiguration, which is key for systems-on-a-chip. We provide initial results that validate our approach.


10B: Co-Simulation and System Verification Techniques

Moderators: P. Schwarz, FhG IIS/EAS Dresden, D; M. Rencz, TU Budapest, H

Simulation-Guided Property Checking Based on Multi-Valued AR-Automata [p. 742]
J. Ruf, D. Hoffmann, T. Kropf, and W. Rosenstiel

The verification of digital designs, i.e., hardware or embedded hardware/software systems, is an important task in the design process. Often more than 70% of the development time is spent locating and correcting errors in the design. Therefore, many techniques have been proposed to support the debugging process. Recently, simulation and test methods have been accompanied by formal methods such as equivalence checking and property checking. However, their industrial applicability is currently restricted to small or medium-sized designs or to a specific phase in the design cycle. In this paper, we present a method for verifying temporal properties of systems described in an executable description language. Our method allows the user to specify properties of the system in finite linear time temporal logic (FLTL). These properties are translated to a special kind of finite state machine which is then efficiently checked on-the-fly during each simulation run. Properties may be placed anywhere in the system description, and violations are immediately indicated to the designer.
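
The flavour of such on-the-fly checking can be conveyed by a minimal monitor for a single bounded-response property. The property and its encoding below are our own toy example; the paper's AR-automata additionally carry multi-valued verdicts, which this sketch omits.

    # Minimal on-the-fly monitor for the (hypothetical) bounded-response
    # property G(req -> F[<=3] ack), stepped once per simulation cycle.

    class BoundedResponseMonitor:
        def __init__(self, bound):
            self.bound = bound
            self.pending = []          # cycles left for each open request

        def step(self, req, ack, cycle):
            if ack:
                self.pending.clear()   # every open request is served
            else:
                self.pending = [n - 1 for n in self.pending]
                if any(n < 0 for n in self.pending):
                    print(f"property violated at cycle {cycle}")
                    return False
            if req:
                self.pending.append(self.bound)
            return True

    mon = BoundedResponseMonitor(bound=3)
    trace = [(1, 0), (0, 0), (0, 0), (0, 0), (0, 0)]   # req never acknowledged
    for cycle, (req, ack) in enumerate(trace):
        if not mon.step(req, ack, cycle):
            break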

Performance Improvement of Multi-Processor Systems Cosimulation Based on SW Analysis [p. 749]
J. Jung, S. Yoo, and K. Choi

In this paper, we propose a method for improving the performance of multi-processor system cosimulation by reducing the synchronization overhead between multiple simulators. To reduce the amount of simulator synchronization, we predict synchronization time points based on a static analysis of the application software running on each processor. In experiments with real embedded systems, we obtained cosimulation speed-ups of up to several orders of magnitude.
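
The gain mechanism can be sketched as follows: if static analysis has already returned the cycles at which each processor touches shared state, the simulators only need to meet at those points rather than every cycle. Everything in this sketch (names, the event-queue scheme) is an illustrative assumption.

    # Toy illustration: each processor simulator runs freely until its next
    # predicted inter-processor access, found here by a (hypothetical)
    # static analysis that returned the access cycles in advance.

    import heapq

    def cosimulate(access_points):
        """access_points: {cpu_name: sorted list of cycles with shared accesses}."""
        events = [(cycles[0], cpu, 0)
                  for cpu, cycles in access_points.items() if cycles]
        heapq.heapify(events)
        syncs = 0
        while events:
            cycle, cpu, idx = heapq.heappop(events)
            syncs += 1                 # only here do the simulators meet
            print(f"sync at cycle {cycle}: {cpu} touches shared state")
            if idx + 1 < len(access_points[cpu]):
                heapq.heappush(events, (access_points[cpu][idx + 1], cpu, idx + 1))
        return syncs

    # Two processors, 10000 simulated cycles, but only five shared accesses:
    n = cosimulate({'cpu0': [120, 4800], 'cpu1': [50, 3100, 9900]})
    print(f"{n} synchronizations instead of 10000 lock-step handshakes")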

Mixed-Level Cosimulation for Fine Gradual Refinement of Communication in SoC Design [p. 754]
G. Nicolescu, S. Yoo, and A. Jerraya

In this paper, we propose a method of mixed-level cosimulation that enables gradual refinement of SoC communication from protocol-neutral communication to protocol-fixed communication. For fine granularity in refinement, the method enables the designer to perform channel refinement and module refinement. Thus, the designer can perform more extensive design space exploration in communication refinement. We show the effectiveness of the proposed method in a case study of communication refinement in an IS-95 CDMA cellular phone system design.

A Framework for Fast Hardware-Software Co-Simulation [p. 760]
A. Hoffmann, T. Kogel, and H. Meyr

We present a new hardware-software co-simulation framework enabling fast prototyping of system-on-chip designs. On the software side, the machine description language LISA allows the generation of bit-true models of programmable architectures at various levels -- from instruction-set to phase accuracy. Based on these models, a complete tool suite consisting of a fast compiled processor simulator, assembler, linker, HLL compiler, and co-simulation interface can be generated automatically. On the hardware side, the SystemC simulation class library is employed and enhanced with our generic co-simulation interface, which enables the coupling of hardware and software models specified at various levels of abstraction. In addition, we present a hardware modeling strategy using abstract macro-cycle-based C++ processes to increase hardware modeling efficiency and simulation speed.


10C: Embedded Tutorial: Analog Methods and Tools for SoC Integration

Moderators: J. Vital, IST, PT; A. Rueda, CNM, Seville U, ES; A. Vasquez, CNM, Seville U, ES

Analog/Mixed-Signal IP Modeling for Design Reuse [p. 766]
N. Madrid, E. Peralías, A. Acosta, and A. Rueda

The application of design reuse to analog and mixed-signal components for System-on-Chip (SoC) is an emerging and revolutionary field. This paper presents a methodological approach to this area illustrated with a mixed-signal case study.

A Skill™-Based Library for Retargetable Embedded Analog Cores [p. 768]
X. Jingnan, J. Vital, and N. Horta

This paper describes the automatic generation and re-usability of physical layouts of analog and mixed-signal blocks based on high-functionality pCells that are fully independent of technologies. The high-functionality pCell library presently contains over 42 pCells and is fully compliant with 7 different sets of technology design rules from 5 different foundries. Practical examples employed in industrial projects are illustrated.

Modelling SoC Devices for Virtual Test Using VHDL [p. 770]
M. Rona and G. Krampl

Virtual Test (VT) is a new technique to cut time-to-market, especially for SoC products that inherently contain complex mixed-signal blocks. VT allows test programs to be debugged in a simulation environment if a fast and sufficiently accurate IC model can be made available. VHDL behavioural models have turned out to be a very promising approach, covering both the needs of designers for sign-off simulation at chip level and those of test engineers for VT. The trade-offs between modelling effort, simulation performance, and accuracy of results are discussed for VT applications based on an industrial example.

Retargeting of Mixed-Signal Blocks for SoCs [p. 772]
R. Castro-López, F. Fernández, M. Delgado-Restituto, and A. Rodríguez-Vázquez

This paper introduces a very efficient methodology for retargeting embedded mixed-signal blocks for SoCs. The key parts of this methodology are: parameterised layout templates at different hierarchical levels, accurate behavioral modeling of mixed-signal blocks, and appropriate mechanisms for tuning sized circuits to new sets of specifications.


10E: Panel Session: Standard Bus vs. Bus Wrapper: What is the Best Solution for Future SoC Integration?

Organizer: C. Yeung, VSI Alliance, USA
Moderator: P. Clarke, Electronic Engineering Times, UK
Panellists: A. Haverinen, Nokia, FIN; G. Matthews, STMicroelectronics, F; J. Morris, ARM, UK; and J. Zaidi, Palmchip Corp., USA

Standard Bus vs. Bus Wrapper: What is the Best Solution for Future SoC Integration? [p. 776]

A number of companies have promoted their on-chip busses as potential standards for the SoC industry. VSIA's On-Chip Bus Development Working Group chose to develop a Standard Bus Wrapper (VCI) rather than endorse a single bus as the standard. Standard Bus advocates claim wrappers incur performance and area overhead. Bus Wrapper advocates claim no single on-chip bus will meet the needs of all SoCs. Will a single bus emerge, and if not, where should a standard wrapper be used? Which is the correct approach for future SoC integration? This panel brings together experts from both perspectives to discuss the pros and cons of their positions.


10F: Architectural Level Synthesis

Moderators: A. Brown, Southampton U, UK; P. Eles, Linkoping U, SE

Access Pattern Based Local Memory Customization for Low Power Embedded Systems [p. 778]
P. Grun, N. Dutt, and A. Nicolau

Memory accesses represent a major bottleneck in embedded system power and performance. Traditionally, the local memory relied on a large cache to store all the variables in the application. However, especially in large real-life applications, different types of data exhibit divergent types of locality and access patterns, with diverse locality and bandwidth needs. Traditional caches had to compromise between the different types of locality required by the access patterns and trade off performance against bandwidth requirements. Instead, our approach customizes the local memory architecture to match the diverse access patterns and locality types present in the application, reducing the main memory bandwidth requirement and significantly reducing power consumption without sacrificing performance. Our approach generated an average 30% memory power reduction without degrading performance on a set of large multimedia/general-purpose applications and scientific kernels, over the best traditional cache configuration of similar size, demonstrating the utility of our algorithm.

Static Memory Allocation by Pointer Analysis and Coloring [p. 785]
J. Zhu

Modern systems-on-chip often devote more silicon real estate to memory than to logic. The minimization of on-chip memory becomes increasingly important for reducing manufacturing cost. In this paper, we present a new technique that minimizes memory usage. Incorporated in a behavioral synthesis tool that synthesizes general-purpose C programs, this technique is fully automated and does not rely on users to explicitly specify dataflow information. Experimental results show that significant improvements can be achieved for the benchmark set.
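
One ingredient of such minimization can be illustrated by lifetime-based sharing: arrays whose lifetimes never overlap may occupy the same buffer. In this sketch the lifetimes are given directly; in the paper they would be derived automatically by pointer analysis, and the greedy coloring below is a generic stand-in for its allocation step.

    # Sketch of the coloring half of the idea: arrays with disjoint
    # lifetimes share one on-chip buffer (greedy interference coloring).

    def color_buffers(lifetimes):
        """lifetimes: {array: (first_use, last_use)}."""
        def overlap(a, b):
            (s1, e1), (s2, e2) = lifetimes[a], lifetimes[b]
            return s1 <= e2 and s2 <= e1
        assignment = {}
        for arr in sorted(lifetimes):                 # fixed visiting order
            used = {assignment[o] for o in assignment if overlap(arr, o)}
            color = 0
            while color in used:
                color += 1
            assignment[arr] = color
        return assignment

    lifetimes = {'in_buf': (0, 10), 'tmp': (5, 20), 'out_buf': (15, 30)}
    print(color_buffers(lifetimes))
    # in_buf and out_buf are never live together, so they share buffer 0.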

Heuristic Datapath Allocation for Multiple Wordlength Systems [p. 791]
G. Constantinides, P. Cheung, and W. Luk

This paper introduces a heuristic to solve the combined scheduling, resource binding, and wordlength selection problem for multiple wordlength systems. The algorithm involves an iterative refinement of operator wordlength information, leading to a scheduled and bound data-flow graph. Scheduling is performed with incomplete wordlength information during the intermediate stages of this refinement process. Results show significant area savings over known alternative approaches.
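
The refinement idea can be illustrated in isolation from scheduling and binding. The sketch below shrinks operator wordlengths under a noise budget; the uniform quantization-noise model and all parameters are textbook assumptions, not the paper's formulation.

    # Generic illustration of refining wordlengths under an error budget;
    # the scheduling/binding interaction from the paper is omitted.

    def refine_wordlengths(ops, noise_budget, w_max=32, w_min=4):
        """Shrink each operator's fraction width while total noise stays in budget."""
        widths = {op: w_max for op in ops}
        def total_noise():
            # Standard uniform quantization-noise model: variance ~ 2^-2b / 12.
            return sum(2.0 ** (-2 * b) / 12 for b in widths.values())
        progress = True
        while progress:
            progress = False
            for op in ops:
                if widths[op] > w_min:
                    widths[op] -= 1
                    if total_noise() <= noise_budget:
                        progress = True      # keep the cheaper width
                    else:
                        widths[op] += 1      # revert: budget exceeded
        return widths

    print(refine_wordlengths(['mul1', 'add1', 'add2'], noise_budget=1e-6))
    # -> {'mul1': 9, 'add1': 9, 'add2': 9}: the budget, not w_min, binds.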


Poster Session:

On the Verification of Synthesized Designs Using Automatically Generated Transformational Witnesses [p. 798]
E. Teica, R. Radhakrishnan, and R. Vemuri

This poster presents a new methodology for verifying synthesized designs and for debugging the software implementation of high-level synthesis algorithms. The methodology is based on a set of 7 RTL transformations which are able to emulate the effect of many scheduling and resource allocation algorithms.

Property-Specific Witness Graph Generation for Guided Simulation [p. 799]
A. Casavant, A. Gupta, S. Liu, A. Mukaiyama, K. Wakabayashi, and P. Ashar

A practical solution to the complexity of design validation is semi-formal verification, where the specification of correctness criteria is done formally, as in model checking, but checking is done using simulation, guided by directed vector sequences derived from knowledge of the design and/or the property being checked. Simulation vectors must be effective in targeting the types of bugs designers expect to find, rather than some generic coverage metrics. The focus of our work is to generate property-specific testbenches for guided simulation that are targeted either at proving the correctness of a full CTL property or at finding a bug. This is facilitated by the generation of a property-specific model, called a "Witness Graph", which captures interesting paths in the design. Starting from an initial abstract model of the design, symbolic model checking, pruning, and refinement steps are applied in an iterative manner until either a conclusive result is obtained or computing resources are exhausted. The witness graph is annotated with, e.g., state or transition priorities before testbench generation. The overall testbench generation flow and the iterative flow for witness graph generation are shown in Figures 1 and 2.

Two Approaches for Developing Generic Components in VHDL [p. 800]
V. Stuikys, G. Ziberkas, R. Damasevicius, and G. Majauskas

We consider the one- and two-language approaches (1LA and 2LA) for developing generic components (GCs) for VHDL generators. By 1LA and 2LA we mean generalization using "pure" VHDL, or using VHDL abstractions mixed with Open PROMOL, an external scripting language we have developed for building GCs and generators, respectively. We present an evaluation of both approaches.

Annotated Data Types for Addressed Token Passing Networks [p. 801]
G. Cichon and W. Bunnbauer

Annotated data types have proven to be a practical description form for interfaces of SoC components to randomly addressable buses (see [1]). The central idea behind this new approach is to define a component's interface to a randomly addressable bus in terms of a data structure it exposes to this bus. This data structure is modeled using a type system similar to that of programming languages such as C or VHDL, and is annotated with additional information relevant for hardware description purposes. In [1], the underlying terminology and framework, as well as a method for synthesizing the functional adaptor part of hardware components, have been described.

Testability Trade-Offs for BIST RTL Data Paths: The Case for Three Dimensional Design Space [p. 802]
N. Nicolici and B. Al-Hashimi

Power dissipation during test application is an emerging problem due to yield and reliability concerns. This paper focuses on BIST for RTL data paths and discusses testability trade-offs in terms of test application time, BIST area overhead and power dissipation.

Towards a Better Understanding of Failure Modes and Test Requirements of ADCs [p. 803]
A. Lechner, A. Richardson, and B. Hermes

It is now widely recognised that Built-In Self-Test (BIST) techniques and Design-for-Testability (DfT) will be mandatory to meet test and quality specifications in next generation mixed signal ICs [1]. For evaluating, verifying, and comparing testability improvements, a more detailed understanding of circuit specific failure modes is essential. This paper presents fault simulation results for a 6-bit ADC and identifies typical failure modes the converter is likely to exhibit and hence must be tested for.

Exact Fault Simulation for Systems on Silicon that Protects Each Core's Intellectual Property (IP) [p. 804]
M. Quasem and S. Gupta

We present a fault simulation approach for multi-core systems on silicon (SOC) that (a) provides exact fault coverage for the entire SOC, (b) does so without revealing any intellectual property (IP) of the core vendors, and (c) has a run time comparable to that of existing approaches that require all IP to be revealed. This fault simulator assumes a full scan SOC design and is the first in a suite of simulation, test generation, and DFT tools that are currently under development. The proposed approach allows flexibility in the selection of a test methodology for the SOC, reduces test application cost and area and performance overheads, and allows more comprehensive testing.

Using Mission Logic for Embedded Testing [p. 805]
R. Dorsch and H. Wunderlich

Testing the logic cores of a system-on-a-chip requires a high test data volume, which has to be stored on the external automatic test equipment (ATE), and a high bandwidth between the ATE and the chip under test, implying the need for high-speed ATE. This paper reduces these requirements by reusing embedded cores as embedded testers during test mode. Hard, firm, and soft cores may be reused, since only the functionality of the core in system mode is used.

A Regularity-Based Hierarchical Symbolic Analysis Method for Large-Scale Analog Networks [p. 806]
A. Doboli and R. Vemuri

The main challenge for any symbolic analysis method is the exponential size of the produced symbolic expressions [2] (10^11 terms for an op amp [1]). Current research considers two ways of handling this limitation: approximation of symbolic expressions and hierarchical methods. Approximation methods [2] retain only the significant terms of the symbolic expressions and eliminate the insignificant ones. The difficulty, however, lies in identifying which terms to eliminate and what the resulting approximation error could be. Hierarchical methods [1] tackle the symbolic analysis problem in a divide-and-conquer manner. They consider only one part of the global network at a time and then recombine partial expressions to find overall symbolic formulas. Existing hierarchical methods have a main limitation in that they are not feasible for networks built of tightly coupled blocks such as operational amplifiers [2].

An Improved Hierarchical Classification Algorithm for Structural Analysis of Integrated Circuits [p. 807]
M. Olbrich, A. Rein, and E. Barke

A new and efficient combination of signal tracing and block recognition techniques for circuit analysis is proposed. It utilizes the benefits of both approaches to solve problems such as signal flow or gate recognition. The analysis process is easily controlled by a user definable rule set where ports, nets and blocks are attributed with types. After structural investigation a hierarchical netlist is produced providing block information as subcircuits. As an important feature, the algorithm allows the handling of optional ports as well. Thus, this flexible approach is applicable to various circuit types and works on several abstraction levels.

Automatic Nonlinear Memory Power Modelling [p. 808]
E. Schmidt, G. Jochens, L. Kruse, F. Theeuwen, and W. Nebel

Power estimation and optimization is an increasingly important issue in IC design. The memory subsystem is a significant aspect, since memory power can dominate total system power. Estimation and optimization hence rely heavily on models for embedded memories. We present an effective black box modelling methodology for generating nonlinear memory models automatically. The resulting models are accurate, computationally modest, and in analytical form. They outperform linear models by far. Average absolute relative errors are below 6%.
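
As an illustration of black-box characterization (not the paper's model class), one can fit an analytical nonlinear model to a handful of measured configurations. The power-law form and the data below are invented for the example; the fit is ordinary least squares in log space, solved by hand to stay dependency-free.

    # Minimal black-box fit: E = a * words^b * width^c, linear in log space.

    import math

    def fit_power_law(samples):
        """samples: list of (words, width, energy)."""
        rows = [(1.0, math.log(w), math.log(bw), math.log(e))
                for w, bw, e in samples]
        # Normal equations A^T A x = A^T y for y = ln(a) + b*ln(words) + c*ln(width).
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        aty = [sum(r[i] * r[3] for r in rows) for i in range(3)]
        # Gaussian elimination, then back substitution.
        for i in range(3):
            for j in range(i + 1, 3):
                f = ata[j][i] / ata[i][i]
                ata[j] = [u - f * v for u, v in zip(ata[j], ata[i])]
                aty[j] -= f * aty[i]
        x = [0.0, 0.0, 0.0]
        for i in (2, 1, 0):
            x[i] = (aty[i] - sum(ata[i][j] * x[j] for j in range(i + 1, 3))) / ata[i][i]
        return math.exp(x[0]), x[1], x[2]

    # Fictitious characterization data (energy in pJ per read):
    data = [(256, 8, 3.1), (1024, 8, 5.9), (4096, 16, 14.8), (16384, 32, 41.0)]
    a, b, c = fit_power_law(data)
    print(f"E ~= {a:.3f} * words^{b:.2f} * width^{c:.2f}")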

An Operation Rearrangement Technique for Power Optimization in VLIW Instruction Fetch [p. 809]
D. Shin, J. Kim, and N. Chang

In VLIW machines where a single instruction contains multiple operations, the power consumption during instruction fetches varies significantly depending on how the operations are arranged within the instruction. In this paper, we describe a post-pass operation rearrangement method that reduces the power consumption from the instruction-fetch datapath. The proposed method modifies operation placement orders within VLIW instructions so that the switching activity between successive instruction fetches is minimized. Our experiment shows that the switching activity can be reduced by 34% on average for benchmark programs.
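
The core optimization is easy to state: per instruction, choose the slot permutation that minimizes the Hamming distance to the previously fetched instruction. The sketch below assumes freely interchangeable slots, which real VLIW encodings constrain; it illustrates the objective, not the authors' actual post-pass.

    # Choose, per instruction, the operation order minimizing bit flips
    # against the previous fetch.

    from itertools import permutations

    def hamming(a, b):
        return bin(a ^ b).count('1')

    def rearrange(program):
        """program: list of instructions, each a list of operation encodings."""
        out = [program[0]]
        for instr in program[1:]:
            prev = out[-1]
            best = min(permutations(instr),
                       key=lambda p: sum(hamming(x, y) for x, y in zip(prev, p)))
            out.append(list(best))
        return out

    # Two 3-slot instructions with 8-bit operation encodings:
    prog = [[0b10110100, 0b00001111, 0b11100010],
            [0b00001110, 0b11100000, 0b10110101]]
    for instr in rearrange(prog):
        print([f"{op:08b}" for op in instr])
    # The second instruction is reordered so each slot nearly repeats its
    # predecessor, cutting switching activity on the fetch path.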

A Pseudo Delay-Insensitive Timing Model for Synthesizing Low-Power Asynchronous Circuits [p. 810]
O. Garnica, J. Lanchares, and R. Hermida

The aim of this paper is to present a new approach to creating high-performance, low-power, and low-area asynchronous circuits using high-level design tools. To achieve this, we introduce the new timing model on which this approach is based. We then present results comparing our implementation with other implementations for a set of benchmarks.

A Register-Transfer-Level Fault Simulator for Permanent and Transient Faults in Embedded Processors [p. 811]
C. Rousselle, M. Pflanz, A. Behling, T. Mohaupt, and H. Vierhaus

HEARTLESS (Hierarchical Register-Transfer-Level Fault Simulator for Permanent & Transient Faults) was developed to simulate the behavior of complex sequential designs, such as processor cores, in the presence of transient and permanent faults. HEARTLESS can be extended with fault propagation across macros described as C++ functions, and a C interface provides access to internal signals during simulation.

Efficient Finite Field Digit-Serial Multiplier Architecture for Cryptography Applications [p. 812]
G. Bertoni, L. Breveglieri, and P. Fragneto

Cryptographic applications in embedded systems for smart cards require low-latency, low-complexity, and low-power dedicated hardware. In this work the GBB algorithm for finite field multiplication is optimised by recoding, and the related digit-serial VLSI multiplier architecture is designed and evaluated [6].

Task Concurrency Management Methodology Summary [p. 813]
C. Wong, P. Marchal, P. Yang, F. Catthoor, H. De Man, A. Prayati, N. Cossement, R. Lauwereins, and D. Verkest

This paper summarizes a new methodology for the design of concurrent dynamic real-time embedded systems. The framework of our methodology is depicted in Fig. 1. An embedded system can be specified at a grey-box abstraction level in a combined MTG-CDFG model [6]. The grey-box model differs both from the detailed white-box model [1], where all operations are considered during mapping and too much information is present to allow system-wide exploration, and from the black-box model [2, 3], where insufficient information is available to accurately steer even the most crucial cost trade-offs. In contrast, the grey-box specification represents the concepts of concurrency, timing constraints, and interaction at either an abstract or a more detailed level, depending on what is required to perform a thorough exploration of the decisions afterwards. We believe that task concurrency management can be implemented in four major steps [4]. First, the grey-box model is built, including the necessary concurrency extraction. Next, transformations are applied to the specified MTG-CDFG to increase the opportunities for concurrency exploration and cost minimization [5]. Then, static scheduling is applied to the design-time analyzable parts of the grey-box model, including processor assignment in the multi-processor context. Finally, a dynamic scheduler schedules the dynamic and coarse-grain constructs at run time on the given platform while making trade-offs based on Pareto curves. The main driver for our work is the MPEG-4 IM1 player. Experimental results confirm the validity of our assumptions and the usefulness of our approach [4, 5].

Susceptibility of Analog Cells to Substrate Interference [p. 814]
F. Fiori

This paper deals with the susceptibility of smart power integrated circuits to substrate interference. In particular, the propagation of RF interference through the substrate and its effects on analog cells are investigated. A new method, developed to identify a parasitic substrate-coupling network in VLSI devices, has been customized for a smart power technology process. The layout view of a specific circuit is processed in order to extract a netlist comprising the circuits on the die surface and the substrate parasitic network. Predictions are obtained by executing time-domain simulations. A simple test circuit composed of a power transistor and an OTA is considered. Investigations are carried out for various layouts of the same test circuit, and the effectiveness of shielding substrate contacts is evaluated.

Order Determination for Frequency Compensation of Negative-Feedback Systems [p. 815]
A. van Staveren and C. Verhoeven

To maximize the bandwidth of dedicated negative-feedback amplifiers by passive frequency compensation, the order of the amplifier needs to be known. Here a method is introduced to determine the order of a circuit with negative feedback. It is shown that the sum of the poles in the negative-feedback loop, i.e., the loop poles, can be used to determine the order of the amplifier. These loop poles can be found relatively easily from the circuit diagram, and thus the order of the circuit is also found relatively easily.
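
For background, the classical structured-design bookkeeping (which may differ in detail from the criterion proposed in this paper) estimates the achievable maximally-flat bandwidth of an n-pole dominant group from the loop-gain-poles product:

    \omega_h = \left( |L_0| \prod_{i=1}^{n} |p_i| \right)^{1/n}

where L_0 is the DC loop gain and p_1, ..., p_n are the loop poles. A group of n poles is then dominant, fixing the amplifier order at n, when \omega_h lies above the magnitudes of all poles inside the group and below those of all poles outside it.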

Minimizing the Number of Floating Bias Voltage Sources with Integer Linear Programming [p. 816]
E. Yildiz, A. van Staveren, and C. Verhoeven

Applying the non-heuristic biasing theory described in [1] results in circuits which are optimally biased. However, the resulting circuits will contain many floating voltage sources. This one-page paper describes the use of Integer Linear Programming to minimize the number of these sources.

CMOS Sizing Rule for High Performance Long Interconnects [p. 817]
G. Cappuccino and G. Cocorullo

Over the past fifteen years, interconnect has become the determining factor in the overall performance of VLSI circuits. In this work, the authors present a new transistor sizing rule for long interconnect buffers. It is shown how the transmission-line properties of long interconnects alter the behaviour of the CMOS buffer, forcing transistors to work mainly in the linear mode rather than in saturation, as is usually assumed. This unusual condition leads to a strong mismatch between predicted and actual driver output impedance if conventional sizing rules are used. The proposed sizing rule allows true line matching to be achieved, thus either minimizing delay or preserving signal integrity.

On Automatic Analysis of Geometrically Proximate Nets in VLSI Layout [p. 818]
S. Koranne and O. Gangwal

We address the problem of automatic analysis of geometrically proximate nets in VLSI layout by presenting a framework (named FASCL) which supports pairwise analysis of nets based on a geometric kernel. The exact form of the analysis function can be specified to the kernel, which assumes a coupling function based on pairwise interaction between geometrically proximate nets. The user can also attach these functions to conditions, and FASCL will automatically apply the function to all pairs of nets which satisfy a condition. Our method runs with sub-quadratic time complexity, O(N^(1+k)), where N is the number of nets, and we have proved that k < 1. We have successfully used the program to analyze circuits for bridging faults, coupling capacitance extraction, crosstalk analysis, signal integrity analysis, and delay fault testing.
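
The sub-quadratic behaviour rests on only ever pairing nets that are geometrically close. A toy version of that kernel idea, with our own names and a one-rectangle-per-net simplification that FASCL does not make, hashes bounding boxes into coarse grid bins:

    # Only nets sharing a grid bin are paired; a user-supplied coupling
    # function is then applied to each qualifying pair.

    from collections import defaultdict
    from itertools import combinations

    def proximate_pairs(nets, bin_size):
        """nets: {name: (x1, y1, x2, y2)} bounding boxes."""
        bins = defaultdict(set)
        for name, (x1, y1, x2, y2) in nets.items():
            for gx in range(int(x1 // bin_size), int(x2 // bin_size) + 1):
                for gy in range(int(y1 // bin_size), int(y2 // bin_size) + 1):
                    bins[(gx, gy)].add(name)
        pairs = set()
        for members in bins.values():
            pairs.update(combinations(sorted(members), 2))
        return pairs

    def analyze(nets, bin_size, coupling, condition=lambda a, b: True):
        return {(a, b): coupling(nets[a], nets[b])
                for a, b in proximate_pairs(nets, bin_size) if condition(a, b)}

    nets = {'clk': (0, 0, 100, 2), 'data0': (0, 3, 50, 5), 'rst': (0, 90, 100, 92)}
    overlap_len = lambda r1, r2: max(0, min(r1[2], r2[2]) - max(r1[0], r2[0]))
    print(analyze(nets, bin_size=10, coupling=overlap_len))
    # Only (clk, data0) is examined; rst is never close enough to pair.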

AnalogRouter: A New Approach of Current-Driven Routing for Analog Circuits [p. 819]
J. Lienig, G. Jerke, and T. Adler

We present a new automatic routing tool, named AnalogRouter, specifically developed to address the problems of current densities and electromigration in routing of multi-terminal, non-planar signal nets in analog circuits. The contributions of our work are:
- a new current characterization method based on current vectors attached to each terminal,
- current-driven Steiner tree generation which effectively determines all branch currents prior to detailed routing, and
- a run-time and memory efficient detailed routing strategy which addresses all features of current-driven routing for analog circuits, particularly varying wire widths.

A Hardware-Software Operating System for Heterogeneous Designs [p. 820]
J. Moya, F. Moya, and J. López

Current embedded systems are made of multiple interconnected heterogeneous devices. These devices exhibit great variation in functionality, performance, and interfaces, which makes it difficult to build applications for these platforms. In this paper we present some techniques for introducing component-based methodologies into hardware-software codesign. We place special emphasis on the use of simple, homogeneous interfaces to hide the inherent complexity of current designs. A key contribution is the definition of a HW-SW Operating System that makes system resources available to application developers in a clean, homogeneous way. This greatly simplifies the task of designing complex heterogeneous embedded systems.

PRMDL: A Machine Description Language for Clustered VLIW Architectures [p. 821]
A. Terechko, E. Pol, and J. van Eijndhoven

PRMDL is the format of the central machine description file, which contains the parameters of the whole retargetable compiler-simulator framework. The format features separate software and hardware views of the processor and defines a wide scope of framework retargetability, enabling platform-based processor design and vast design space exploration for clustered VLIW architectures.

Functional Units with Conditional Input/Output Behavior in VLIW Processors [p. 822]
M. Bekooij, L. Engels, A. van der Werf, and N. Busá

In this paper we extend the method for dealing with coarse-grain operations in statically scheduled VLIW processors introduced by Busá [1]. We allow functional units with a controller that does not traverse its states in a predefined way. This makes it possible to execute a function that contains a conditional construct, such as an if-statement, as a single operation on a functional unit, reducing the performance penalty otherwise caused by branch instructions. By adding valid signals to the inputs and outputs, we circumvent the problem that, for this type of functional unit, it is not known at compile time when and how many samples will be consumed or produced. We refer to these units as Conditional Input/Output Units (CIUs), and to the operations executed on them as Conditional Input/Output Operations (CIOs). The difference from guarded operations is that the production of a result of a CIO depends on the state of the CIU.

Adaptation of an Event-Driven Simulation Environment to Sequentially Propagated Concurrent Fault Simulation [p. 823]
M. Zolfy, S. Mirkhani, and Z. Navabi

A new fault simulation method is presented here. The method relies on simulation cycle timing of event-driven simulators (delta delays in VHDL). This timing is used for propagation of faulty values in faulty sections of a circuit. This method is based on concurrent fault simulation and is implemented in VHDL. VHDL gate models that are capable of propagating faults in fault queues perform this fault simulation. Gate models process their fault queues and propagate them in delta time units. In these models, gates with faulty input values are expanded in delta time to evaluate faulty output values and propagate them to other sections of the circuit. Using ISCAS benchmarks, a performance improvement of up to 500X over serial fault simulation has been obtained. This work is useful for fault simulation of post-synthesis VHDL outputs.
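
The concurrent bookkeeping can be shown in miniature: every signal carries its good value plus only those fault entries whose value diverges from it, and gate evaluation merges the fault lists of its inputs. The sketch below abstracts away the VHDL/delta-delay machinery that the method actually relies on.

    # Miniature concurrent fault simulation of a gate-level netlist.

    def eval_gate(fn, *inputs):
        """inputs: (good_value, {fault: value}) pairs."""
        good = fn(*(g for g, _ in inputs))
        faults = {}
        for f in set().union(*(d.keys() for _, d in inputs)):
            vals = [d.get(f, g) for g, d in inputs]
            v = fn(*vals)
            if v != good:
                faults[f] = v              # keep only divergent entries
        return good, faults

    AND = lambda a, b: a & b
    OR = lambda a, b: a | b

    # c = AND(a, b), d = OR(c, b); inject stuck-at-0 faults on a and b.
    a = (1, {'a/0': 0})
    b = (1, {'b/0': 0})
    c = eval_gate(AND, a, b)
    d = eval_gate(OR, c, b)
    print(c)   # (1, {'a/0': 0, 'b/0': 0})
    print(d)   # (1, {'b/0': 0}) -- a/0 is masked at d, b/0 is observable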

Constraint Satisfaction for Storage Files with Fifos or Stacks during Scheduling [p. 824]
C. Alba Pinto, B. Mesman, K. van Eijk, and J. Jess

This paper presents a method that, during the scheduling of DSP algorithms, handles constraints of storage files with fifos or stacks together with resource and timing constraints. Constraint analysis techniques and the characteristics of exact coloring of conflict graphs are used to identify values that are bottlenecks for storage assignment, with the aim of ordering their accesses. This is done for pairs of values until it can be guaranteed that all constraints will be satisfied.