DATE 2002 ABSTRACTS



Plenary -- Keynote Session

Moderator: J. da Franca, ChipIdea, PT

On Nanoscale Integration and Gigascale Complexity in the Post .Com World [p. 12]
Hugo De Man, Professor, KU Leuven, Senior Research Fellow, IMEC, BE

While process technologists are obsessed with following Moore's curve down to nanoscale dimensions, design technologists are confronted with gigascale complexity. At the same time, post-PC and post-dotcom products require zero-cost, zero-energy, yet software-programmable novel system architectures, to be sold in huge volumes and designed in exponentially decreasing time. How do we cope with these novel silicon architectures? What research challenges does this create? How do we create the necessary tools and skills, and how do we organize research and education in a world driven by shareholder value? Can you spare half an hour to reflect on these challenges to the design community?

Global Responsibilities in SOC Design [p. 12]
Taylor Scanlon, President & CEO, Virtual Silicon Technology, US

The technical complexities of advanced SoC design are compounded by changes in the economic structure of the worldwide semiconductor industry. This talk looks at some of the organizational and personal responsibilities that will be required to meet the challenges of SoC design in the future.


1A: Hot Topic -- How to Choose Semiconductor IP?

Organizer: Yervant Zorian, Virage Logic, US
Moderator: Nic Mokhoff, EE Times, US

How to Choose Semiconductor IP? -- Embedded Processors [p. 14]
I. Phillips

It is well recognised that the process of new product development, introduction and marketing is fraught with difficulty. Indeed, the probability of achieving planned timescales, costs and budget is so low that some degree of failure is inevitable. So while the primary role of the manager is to identify and minimise all major risks, and to make sure the ones remaining are adequately resourced, a secondary role is to make sure that what failure does occur does not damage his or her reputation! The Virtual Component appears in the context of risk minimisation. CPU or UART, the motive is the same: get the right product from concept to customer as quickly as possible. The make-or-buy decision is a risk/cost trade-off, and as the cost of failure is normally so high, risk emerges as the dominant factor: is it riskier to design, or to buy in?

Make Your SOC Design a Winner: Select the Right Memory IP [p. 15]
V. Ratford

The 2000 SIA roadmap shows over 50 % of the area in an SOC being occupied by embedded memory. The selection of the memory IP and supplier is critical to the success of the design and the ramp to volume. The Memory IP can determine yield, reliability, cost, speed and/or power. Mr. Ratford will help you navigate through the evaluation process by discussing key requirements and possible solutions when evaluating memory for your next SOC design.

How to Choose Semiconductor IP: Embedded Software [p. 16]
G. Martin

Embedded software Intellectual Property (IP) is becoming vital for today's complex Systems-on-Chip. We first define the notion of Hardware-dependent Software, and then review the multidimensional criteria for choosing ESW IP, including retargetability and portability, flexibility, optimisation, validation and certification.

IP Day: How to Choose Semiconductor IP? [p. 17]
P. Bricaud

The semiconductor industry has given the electronic design community and the EDA industry a tremendous challenge by making available a silicon capacity that exceeds by far what today's designer can utilise in a reasonable amount of time. Reasonable timeframes for System-on-a-Chip developments in the multimedia and communication markets are less than eighteen months, if not nine! I would like to give credit to Gary Smith, Chief Analyst at Dataquest, for raising a very pertinent media alert in his article, `The Revolution isn't Coming -- It's Already Here', in Virtual Chip Design, May 1997. It clearly stated that, in order to fill the design gap between the gates available on silicon and design methodology, the solution was system-level integration (SLI) using what were called at that time system-level macros (SLM). The electronic design community and EDA companies picked up the gauntlet and started what would become known as Virtual Component creation, through the industry organisation called the Virtual Socket Interface Alliance (VSIA). This was followed by Mentor Graphics and Synopsys, who signed a Design Reuse Partnership, which led to the publishing of the "Reuse Methodology Manual for SoC Designs". The last stage was to create an industry-accepted Virtual Component Quality Spreadsheet by merging the two efforts.


1B: Formal Verification of Complex Designs

Moderators: L. Fix, Intel, ISR; T. Kropf, Bosch, DE

Formal Verification of the Pentium 4 Floating-Point Multiplier [p. 20]
R. Kaivola and N. Narasimhan

We present the formal verification of the floating-point multiplier in the Intel IA-32 Pentium® 4 microprocessor. The verification is based on a combination of theorem-proving and BDD-based model-checking tasks performed in a unified hardware verification environment. The tasks are tightly integrated to accomplish complete verification of the multiplier hardware coupled with the rounder logic. The approach does not rely on specialized representations like Binary Moment Diagrams or their variants.

Using Rewriting Rules and Positive Equality to Formally Verify Wide-Issue Out-of-Order Microprocessors with a Reorder Buffer [p. 28]
M. Velev

Rewriting rules and Positive Equality [4] are combined in an automatic way in order to formally verify out-of-order processors that have a Reorder Buffer, and can issue/retire multiple instructions per clock cycle. Only register-register instructions are implemented, and can be executed out-of-order, as soon as their data operands can be either read from the Register File, or forwarded as results of instructions ahead in program order in the Reorder Buffer. The verification is based on the Burch and Dill correctness criterion [6]. Rewriting rules are used to prove the correct execution of instructions that are initially in the Reorder Buffer, and to remove them from the correctness formula. Positive Equality is then employed to prove the correct execution of newly fetched instructions. The rewriting rules resulted in up to 5 orders of magnitude speedup, compared to using Positive Equality alone. That made it possible to formally verify processors with up to 1,500 instructions in the Reorder Buffer, and issue/retire widths of up to 128 instructions per clock cycle.

Automatic Verification of In-Order Execution In Microprocessors with Fragmented Pipelines and Multicycle Functional Units [p. 36]
P. Mishra, N. Dutt, A. Nicolau, and H. Tomiyama

As embedded systems continue to face increasingly higher performance requirements, deeply pipelined processor architectures are being employed to meet desired system performance. System architects critically need modeling techniques that allow exploration, evaluation, customization and validation of different processor pipeline configurations, tuned for a specific application domain. We propose a novel Finite State Machine (FSM) based modeling of pipelined processors and define a set of properties that can be used to verify the correctness of in-order execution in the presence of fragmented pipelines and multicycle functional units. Our approach leverages the system architect's knowledge about the behavior of the pipelined processor, through Architecture Description Language (ADL) constructs, and thus allows a powerful top-down approach to pipeline verification. We applied this methodology to the DLX processor to demonstrate the usefulness of our approach.

A Case Study for the Verification of Complex Timed Circuits: IPCMOS [p. 44]
M. Peña, J. Cortadella, E. Pastor, and A. Smirnov

The verification of an n-stage pulse-driven IPCMOS pipeline, for any n > 0, is presented. The complexity of the system is 32n transistors, and delay information is provided at the transistor level. The correctness of the circuit depends heavily on the timed behavior of its components and the environment. To verify the system, three techniques have been combined: (1) relative-timing-based verification from absolute timing information [13], (2) assume-guarantee reasoning to verify untimed abstractions of timed components, and (3) mathematical induction to verify pipelines of any length. Even though the circuit can interact with pulse-driven environments, the internal behavior between stages follows a handshake protocol that enables the use of untimed abstractions. The verification not only reports a positive answer about the correctness of the system, but also gives a set of sufficient relative-timing constraints that determine delay slacks under which correctness can be maintained.


1C: Cooling Layout Arrangements

Moderators: R.H.J.M. Otten, TU Eindhoven, NL; M.D.F. Wong, Texas U, US

FPGA Placement by Thermodynamic Combinatorial Optimization [p. 54]
J. De Vicente, J. Lanchares, and R. Hermida

In this paper, the placement problem on FPGAs is addressed using Thermodynamic Combinatorial Optimization (TCO). TCO is a new combinatorial optimization method based on both Thermodynamics and Information Theory. In TCO, two kinds of processes are considered: microstate and macrostate transformations. Applying Shannon's definition of entropy to reversible microstate transformations, a probability of acceptance based on Fermi-Dirac statistics is derived. On the other hand, applying thermodynamic laws to reversible macrostate transformations, an efficient annealing schedule is provided. TCO has been compared with Simulated Annealing (SA) on a set of benchmark circuits for the FPGA placement problem. TCO achieved large time reductions with respect to SA, while providing interesting adaptive properties.
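As a rough illustration of a Fermi-Dirac acceptance rule of the kind the abstract mentions, the Python sketch below contrasts it with the classical Metropolis rule of Simulated Annealing. The function names, the overflow guards, and the exact formula are our assumptions for illustration; the paper's derivation and annealing schedule may differ.

```python
import math

def fermi_dirac_probability(delta_cost, temperature):
    """Acceptance probability 1 / (1 + exp(dC / T)), Fermi-Dirac style.

    Unlike the Metropolis rule min(1, exp(-dC / T)), it never saturates
    at 1: a cost-neutral move is accepted with probability exactly 0.5,
    and even improving moves are accepted with probability below 1.
    """
    if temperature <= 0:
        return 1.0 if delta_cost < 0 else 0.0  # greedy limit at T = 0
    x = delta_cost / temperature
    if x > 700:       # guard against math.exp overflow
        return 0.0
    if x < -700:
        return 1.0
    return 1.0 / (1.0 + math.exp(x))

def metropolis_probability(delta_cost, temperature):
    """Classical SA acceptance rule, for comparison."""
    if delta_cost <= 0:
        return 1.0
    if temperature <= 0:
        return 0.0
    return math.exp(-delta_cost / temperature)
```

A placement move with positive cost delta is thus accepted less often than under Metropolis at the same temperature, which is one way such a rule can trade exploration for speed.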

An Enhanced Q-Sequence Augmented with Empty-Room-Insertion and Parenthesis Trees [p. 61]
C. Zhuang, Y. Kajitani, K. Sakanushi, and L. Jin

After discussing the difference between floorplanning and packing in VLSI placement design, this paper adapts the Q-sequence-based floorplanner into a packing algorithm. For this purpose, some empty-room insertion is required to guarantee that the optimum packing is not missed. To improve packing performance, a new move that perturbs the floorplan is introduced in terms of the Parenthesis-Tree Pair. A Simulated Annealing based packing search algorithm was implemented, and experimental results showed the effect of empty-room insertion.

Arbitrary Convex and Concave Rectilinear Module Packing Using TCG [p. 69]
J. Lin, H. Chen, and Y. Chang

In this paper, we deal with arbitrary convex and concave rectilinear module packing using the Transitive Closure Graph (TCG) representation. The geometric meanings of modules are transparent to TCG and its induced operations, which makes TCG an ideal representation for floorplanning/placement with arbitrary rectilinear modules. We first partition a rectilinear module into a set of submodules and then derive necessary and sufficient conditions of feasible TCG for the submodules. Unlike most previous works that process each submodule individually and thus need post-processing to fix deformed rectilinear modules, our algorithm treats a set of submodules as a whole, and thus not only guarantees the feasibility of each perturbed solution but also eliminates the need for post-processing of deformed modules, implying better solution quality and running time. Experimental results show that our TCG-based algorithm is capable of handling very complex instances; further, it is very efficient and results in better area utilization than previous work.


1D: Defect Oriented Test

Moderators: J. Segura, Illes Balears U, ES; H. Manhaeve, Q-Star Test, BE

A Test Design Method for Floating Gate Defects (FGD) in Analog Integrated Circuits [p. 78]
M. Pronath, H. Graeb, and K. Antreich

A unified approach to fault simulation for FGDs is introduced. Instead of a direct fault simulation, the proposed approach calculates indirectly, from the simulator output, the sets of undetectable values of the trapped charge on the floating-gate transistor. It covers all potential gate charges of an FGD at one or more transistors and allows the application of conventional circuit simulators for simulating DC, AC and transient tests. Based on this fault simulation, a test design methodology is presented that can determine all test sets that detect all FGDs for all possible values of gate charge.

Exact Grading of Multiple Path Delay Faults [p. 84]
S. Padmanaban and S. Tragoudas

The problem of fault grading for multiple path delay faults is studied, and a method of obtaining the exact coverage is presented. The faults covered are represented and manipulated as sets by zero-suppressed binary decision diagrams (ZBDDs), which are shown to be able to store a very large number of path delay faults. For extreme cases where memory becomes a problem, a method to estimate the coverage of the test set is also presented. The problem of fault grading is solved with a polynomial number of BDD operations. Experimental results on the ISCAS'85 benchmarks include test sets from ATPG tools as well as specifically designed tests, in order to investigate the limitations and properties of the proposed method.

Modeling Techniques and Tests for Partial Faults in Memory Devices [p. 89]
Z. Al-Ars and A. van de Goor

It has always been assumed that fault models in memories are sufficiently precise for specifying the faulty behavior. This means that, given a fault model, it should be possible to construct a test that ensures detecting the modeled fault. This paper shows that some faults, called partial faults, are particularly difficult to detect. For these faults, more operations are required to complete their fault effect and to ensure detection. The paper also presents fault analysis results, based on defect injection and simulation, where partial faults have been observed. The impact of partial faults on testing is discussed and a test to detect these partial faults is given.
Key words: partial faults, DRAMs, fault models, defect simulation, memory testing, completing operations.

A New ATPG Algorithm to Limit Test Set Size and Achieve Multiple Detections of All Faults [p. 94]
S. Lee, B. Cobb, J. Dworak, M. Grimaila, and M. Mercer

Deterministic observation and random excitation of fault sites during the ATPG process dramatically reduce the overall defective part level. However, multiple observations of each fault site lead to increased test set size and require more tester memory. In this paper, we propose a new ATPG algorithm to find a near-minimal test pattern set that detects faults multiple times and achieves an excellent defective part level. This greedy approach uses 3-value fault simulation to estimate the potential value of each vector candidate at each stage of ATPG. The results show that generation of a close-to-minimal vector set is possible using only dynamic compaction techniques in most cases. Finally, a systematic method to trade off defective part level against test size is also presented.
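The flavor of such greedy, simulation-guided vector selection can be sketched as a set-multicover heuristic: at each step, pick the candidate vector that supplies the most still-needed fault detections. The toy below is our own illustration (data layout and names are assumptions, not the authors' algorithm), with fault simulation replaced by a precomputed vector-to-faults map.

```python
def greedy_multi_detect(detects, n_det):
    """Greedy set-multicover heuristic for n-detect test selection.

    detects: dict mapping each candidate vector to the set of faults
             it detects (stand-in for a fault simulator's output).
    n_det:   required number of detections per fault.
    Returns a small list of vectors so every fault is detected at
    least n_det times, when the candidates allow it.
    """
    need = {}                       # fault -> outstanding detections
    for faults in detects.values():
        for f in faults:
            need[f] = n_det
    chosen = []
    remaining = dict(detects)
    while any(v > 0 for v in need.values()) and remaining:
        # pick the vector that reduces the most outstanding detections
        best = max(remaining,
                   key=lambda v: sum(1 for f in remaining[v] if need[f] > 0))
        gain = sum(1 for f in remaining[best] if need[f] > 0)
        if gain == 0:
            break                   # some faults cannot reach n_det
        for f in remaining[best]:
            if need[f] > 0:
                need[f] -= 1
        chosen.append(best)
        del remaining[best]
    return chosen
```

With n_det = 1 this degenerates to ordinary greedy test compaction; raising n_det trades test set size for multiple detections, which is exactly the tension the abstract describes.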


1E: Power Analysis and Management in Networks and Processors

Moderators: E. Macii, Politecnico di Torino, IT; K. Roy, Purdue U, US

Low Power Error Resilient Encoding for On-Chip Data Buses [p. 102]
D. Bertozzi, L. Benini, and G. De Micheli

As technology scales toward deep submicron, on-chip interconnects are becoming more and more sensitive to noise sources such as power supply noise, crosstalk, radiation-induced effects, etc. Transient delay and logic faults are likely to reduce the reliability of data transfers across datapath bus lines. This paper investigates how to deal with these errors in an energy-efficient way. One can opt for error correction, which incurs a larger decoding overhead, or for retransmission of the incorrectly received data word. Provided the timing penalty associated with the latter technique can be tolerated, we show that retransmission strategies are more effective than correction from an energy viewpoint, owing both to their larger detection capability and to their lower decoding complexity. The analysis was performed by implementing several variants of a Hamming code in the VHDL model of a processor based on the Sparc V8 architecture, and exploiting the characteristics of AMBA bus slave response cycles to carry out retransmissions in a way fully compliant with this standard on-chip bus specification.
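To make the detection-versus-correction trade-off concrete, here is a minimal Hamming(7,4) sketch in which the receiver only computes the syndrome (cheap detection) and invokes a retransmit callback instead of locating and correcting the erroneous bit. This illustrates the general idea only; the bit layout and the `retransmit` hook are our assumptions, not the code variants evaluated in the paper.

```python
# Hamming(7,4): data bits d1..d4, even-parity bits p1, p2, p3.
def encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]   # codeword positions 1..7

def syndrome(c):
    """Zero syndrome means no detected error; a nonzero value would
    point at the erroneous position if we chose to correct."""
    p1, p2, d1, p3, d2, d3, d4 = c
    s1 = p1 ^ d1 ^ d2 ^ d4
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    return s3 * 4 + s2 * 2 + s1

def receive(codeword, retransmit):
    """Detection-plus-retransmission: instead of correcting in place,
    a nonzero syndrome triggers the (hypothetical) retransmit callback
    until a clean codeword arrives, then the data bits are extracted."""
    while syndrome(codeword):
        codeword = retransmit()
    p1, p2, d1, p3, d2, d3, d4 = codeword
    return [d1, d2, d3, d4]
```

The decoder here never needs the syndrome-to-position correction logic, which is the "minor decoding complexity" advantage the abstract attributes to retransmission, at the price of the retry latency.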

Managing Power Consumption in Networks on Chip [p. 110]
T. Simunic and S. Boyd

Systems on a chip (SOCs) are rapidly evolving into larger networks on a chip (NOCs). This work presents a new methodology for managing power consumption in NOCs. The power management problem is formulated using closed-loop control concepts, with an estimator tracking changes in the system parameters and recalculating the power management policy accordingly. Dynamic voltage scaling and local power management are formulated in a node-centric manner, where each core has a local power manager that determines its unit's power states. The local power manager's interaction with the other system cores regarding power and QoS needs enables network-centric power management. The new methodology is tested on a system consisting of four satellite units, each with a local power manager capable of both node- and network-centric power management. The results show large power savings with good QoS.

Competitive Analysis of Dynamic Power Management Strategies for Systems with Multiple Power Savings States [p. 117]
S. Irani, R. Gupta, and S. Shukla

We present strategies for "online" dynamic power management (DPM) based on the notion of the competitive ratio, which allows us to compare the effectiveness of algorithms against an optimal strategy. This paper makes two contributions: it provides a theoretical basis for the analysis of DPM strategies for systems with multiple power-down states, and it provides a competitive algorithm based on probabilistically generated inputs that improves the competitive ratio over deterministic strategies. Experimental results show that our probability-based DPM strategy improves the efficiency of power management over the deterministic DPM strategy by 25%, bringing the strategy to within 23% of the optimal offline DPM.
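The deterministic baseline such analyses improve on is the classical break-even ("ski-rental") strategy for a single sleep state: stay active until the idle period reaches the break-even time, then sleep. That strategy is 2-competitive against a clairvoyant optimum. The sketch below is our own toy model of that baseline, with the wake-up energy folded into the sleep decision; it is not the paper's multi-state algorithm.

```python
def online_energy(idle_time, p_active, p_sleep, e_wake):
    """Deterministic break-even strategy: remain active for t_be
    seconds of idleness, then sleep; classic 2-competitive bound."""
    t_be = e_wake / (p_active - p_sleep)   # break-even threshold
    if idle_time <= t_be:
        return p_active * idle_time        # never slept
    return (p_active * t_be
            + p_sleep * (idle_time - t_be)
            + e_wake)                      # slept, pay wake-up energy

def offline_energy(idle_time, p_active, p_sleep, e_wake):
    """Clairvoyant optimum: knowing the idle length in advance,
    sleep immediately iff sleeping (plus wake-up) is cheaper."""
    return min(p_active * idle_time,
               p_sleep * idle_time + e_wake)
```

For any idle-period length, the online cost is at most twice the offline optimum; the worst case is an idle period ending just after the break-even point. Randomizing the sleep threshold, as probability-based strategies do, is what pushes the expected ratio below 2.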

AccuPower: An Accurate Power Estimation Tool for Superscalar Microprocessors [p. 124]
D. Ponomarev, G. Kucuk, and K. Ghose

This paper describes the AccuPower toolset -- a set of simulation tools that accurately estimate the power dissipation within a superscalar microprocessor. AccuPower uses a true hardware-level and cycle-level microarchitectural simulator and energy dissipation coefficients gleaned from SPICE measurements of actual CMOS layouts of critical datapath components. Transition counts can be obtained at the level of bits within data and instruction streams, at the level of registers, or at the level of larger building blocks (such as caches, the issue queue, the reorder buffer, and function units). This allows for an accurate estimation of switching activity at any desired level of resolution. The toolsuite implements several variants of superscalar datapath designs in use today and permits the exploration of design choices at the microarchitecture level as well as the circuit level, including the use of voltage and frequency scaling. In particular, the AccuPower toolsuite includes detailed implementations of currently used and proposed techniques for energy/power conservation, including techniques for data encoding and compression, alternative circuit approaches, dynamic resource allocation and datapath reconfiguration. The microarchitectural simulation components of AccuPower can be used for accurate evaluation of datapath designs in a manner well beyond the scope of the widely used Simplescalar tools.


2A: Panel -- What is the Right IP Business Model?

Organizer: Y. Zorian, Virage Logic, US
Moderator: K. Bartleson, Synopsys, US
Panellists: J. Tully, Gartner Dataquest, US; G. Toomajanian, Dain Rauscher Wessels, US;
E. Desai, Desaisive Technology Research, US; M. Hosseini, WIT Soundview, US; V. Essi, AH&H, UA

IP is All About Implementation and Customer Satisfaction [p. 132]
V. Essi, Jr.

Intellectual property, or IP, takes on many different meanings depending upon the context within which it is utilized. Our IP discussion focuses on the rapidly evolving world of technology IP and, more specifically, semiconductor IP. Our core belief is that in order to be successful, semiconductor IP must be more than an idea or innovation. It must be implemented seamlessly, with little resistance from the customer, and have compelling value-add to the customer upon implementation and thereafter. The heart of the customer's purchase decision is where we believe semiconductor IP models need to be most focused. Is there a right model in every case? No. In fact, we would argue that the right model is the one that makes your customer's adoption the easiest. In some respects, we would compare most IP purchase decisions to the classic make-or-buy scenario. Customers are only willing to embrace third-party IP to save costs. Sure, we can get off track and discuss technology leads or other forms of "killer IP", but cost is at the root of almost every IP decision and, more precisely, a make-or-buy analysis.


2B: SAT and BDD Techniques

Moderators: T. Shiple, Synopsys, FR; R. Drechsler, Bremen U, DE

Using Problem Symmetry in Search Based Satisfiability Algorithms [p. 134]
E. Goldberg, M. Prasad, and R. Brayton

We introduce the notion of problem symmetry in search-based SAT algorithms. We develop a theory of essential points to formally characterize the potential search-space pruning that can be realized by exploiting problem symmetry. We unify several search-pruning techniques used in modern SAT solvers under a single framework, by showing them to be special cases of the general theory of essential points. We also propose a new pruning rule exploiting problem symmetry. Preliminary experimental results validate the efficacy of this rule in providing additional search-space pruning beyond that realized by techniques implemented in leading-edge SAT solvers.

BerkMin: A Fast and Robust Sat-Solver [p. 142]
E. Goldberg and Y. Novikov

We describe a SAT-solver, BerkMin, that inherits such features of GRASP, SATO, and Chaff as clause recording, fast BCP, restarts, and conflict clause "aging". At the same time BerkMin introduces a new decision making procedure and a new method of clause database management. We experimentally compare BerkMin with Chaff, the leader among SAT-solvers used in the EDA domain. Experiments show that our solver is more robust than Chaff. BerkMin solved all the instances we used in experiments including very large CNFs from a microprocessor verification benchmark suite. On the other hand, Chaff was not able to complete some instances even with the timeout limit of 16 hours.

Dynamic Scheduling and Clustering in Symbolic Image Computation [p. 150]
G. Cabodi, P. Camurati, and S. Quer

The core computation in BDD-based symbolic synthesis and verification is forming the image and pre-image of sets of states under the transition relation characterizing the sequential behavior of the design. Computing an image or a pre-image consists of ordering the latch transition relations, clustering them and eventually re-ordering the clusters. Existing algorithms are mainly limited by memory resources. To make them as efficient as possible, we address a set of heuristics with the main target of minimizing the memory used during image computation. They include a dynamic heuristic to order the latch relations, a dynamic framework to cluster them, and the application of conjunctive partitioning during image computation. We provide and integrate a set of algorithms and we report references and comparisons with recent work. Experimental results are given to demonstrate the efficiency and robustness of the approach.


2C: Technology and Interconnect Issues in Low Power Design

Moderators: S. Huss, TU Darmstadt, DE; D. Auvergne, LIRMM, F

Wire Placement for Crosstalk Energy Minimization in Address Buses [p. 158]
L. Macchiarulo, E. Macii, and M. Poncino

We propose a novel approach to bus energy minimization that targets crosstalk effects. Unlike previous approaches, we try to reduce energy through capacitance optimization, by adopting nonuniform spacing between wires. This allows a reduction of power while at the same time taking signal integrity into account; therefore, performance is not degraded. Results show that the method saves up to 30% of total bus energy at no cost in performance or design complexity (no encoding-decoding circuitry is needed), and at limited cost in area.
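A back-of-the-envelope model shows why nonuniform spacing can save crosstalk energy: if the coupling capacitance between two adjacent wires scales roughly as 1/s and each wire pair has a switching-activity weight a_i, then minimizing sum(a_i / s_i) under a fixed total spacing budget gives, by Lagrange multipliers, s_i proportional to sqrt(a_i). The sketch below illustrates this simplified model only; the paper's actual cost function and placement algorithm are more detailed.

```python
import math

def optimal_spacings(activities, total_space):
    """Minimize sum(a_i / s_i) subject to sum(s_i) = total_space.

    Setting d/ds_i [a_i/s_i + lam*s_i] = 0 gives s_i = sqrt(a_i/lam),
    i.e. spacings proportional to sqrt of the activity weights.
    """
    norm = sum(math.sqrt(a) for a in activities)
    return [total_space * math.sqrt(a) / norm for a in activities]

def crosstalk_energy(activities, spacings, k=1.0):
    """Toy energy model: coupling energy of each pair ~ k * a_i / s_i."""
    return sum(k * a / s for a, s in zip(activities, spacings))
```

Under this model, pairs of wires that toggle against each other more often simply get more whitespace, and the total crosstalk energy is never worse than with uniform spacing of the same total width.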

Dynamic VTH Scaling Scheme for Active Leakage Power Reduction [p. 163]
C. Kim and K. Roy

We present a Dynamic VTH Scaling (DVTS) scheme to save leakage power during the active mode of the circuit. The power saving strategy of DVTS is similar to that of the Dynamic VDD Scaling (DVS) scheme, which adaptively changes the supply voltage depending on the current workload of the system. Instead of adjusting the supply voltage, DVTS controls the threshold voltage by means of body bias control, in order to reduce the leakage power. The power saving potential of DVTS and its impact on dynamic and leakage power when applied to future technologies are discussed, and the pros and cons of the DVTS system are examined in detail. Finally, feedback-loop hardware for DVTS, which tracks the optimal VTH for a given clock frequency, is proposed. Simulation results show that 92% energy savings can be achieved with DVTS for 70nm circuits.
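The feedback idea can be sketched in software: if the critical path currently beats the clock period, there is timing slack, so the controller raises VTH (reverse body bias) to cut subthreshold leakage; if the path is too slow, it lowers VTH. The delay and leakage models below are crude illustrative assumptions (a power-law delay and an exponential leakage), not the paper's hardware or device models.

```python
import math

def simulate_dvts(target_delay, vth0=0.2, step=0.005, iters=200):
    """Toy DVTS feedback loop (illustrative model only).

    Nudges vth until the modeled critical-path delay just meets the
    clock period, thereby minimizing leakage within the timing budget.
    Returns (final_vth, final_delay, final_leakage).
    """
    def delay(vth):
        # higher threshold -> weaker drive -> slower gates
        return 1.0 / (1.0 - vth) ** 1.3

    def leakage(vth):
        # subthreshold leakage falls off exponentially with vth
        return math.exp(-vth / 0.035)

    vth = vth0
    for _ in range(iters):
        if delay(vth) < target_delay:
            vth += step   # timing slack: raise vth, save leakage
        else:
            vth -= step   # too slow: lower vth to meet the clock
    return vth, delay(vth), leakage(vth)
```

In steady state the loop dithers around the VTH at which the path delay equals the clock period, which is the "optimal VTH for a given clock frequency" the abstract refers to.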

Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints [p. 168]
A. Azevedo, I. Issenin, R. Cornea, R. Gupta, N. Dutt, A. Veidenbaum, and A. Nicolau

Dynamic voltage scaling (DVS) is a known effective mechanism for reducing CPU energy consumption without significant performance degradation. While a lot of work has been done on inter-task scheduling algorithms to implement DVS under operating system control, new research challenges exist in intra-task DVS techniques under software and compiler control. In this paper we introduce a novel intra-task DVS technique under compiler control using program checkpoints. Checkpoints are generated at compile time and indicate places in the code where the processor speed and voltage should be re-calculated. Checkpoints also carry user-defined time constraints. Our technique handles multiple intra-task performance deadlines and modulates power consumption according to a run-time power budget. We experimented with two heuristics for adjusting the clock frequency and voltage. For the particular benchmark studied, one heuristic yielded 63% more energy savings than the other. With the best of the heuristics we designed, our technique resulted in 82% energy savings over the execution of the program without employing DVS.
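The checkpoint mechanism can be sketched as follows: at each checkpoint the runtime knows the worst-case cycles remaining and the time left to the deadline, and sets the lowest frequency that still guarantees completion. The code below is our own minimal model (names, the f^2-per-cycle energy model, and the checkpoint data layout are assumptions), not the authors' two heuristics.

```python
def checkpoint_frequency(remaining_cycles, time_left, f_min, f_max):
    """Lowest clock frequency that still finishes the worst-case
    remaining work by the deadline, clamped to the feasible range."""
    if time_left <= 0:
        return f_max
    f = remaining_cycles / time_left
    return min(f_max, max(f_min, f))

def run_with_dvs(checkpoints, deadline, f_min, f_max):
    """checkpoints: list of (worst_case_cycles_remaining,
    actual_cycles_to_next_checkpoint) pairs.

    Returns (finish_time, energy), with energy modeled as f^2 per
    cycle (supply voltage assumed to scale with frequency)."""
    t = e = 0.0
    for wc_remaining, actual in checkpoints:
        f = checkpoint_frequency(wc_remaining, deadline - t, f_min, f_max)
        t += actual / f
        e += actual * f * f
    return t, e
```

Because frequency is re-evaluated at every checkpoint, slack earned when the actual path is shorter than the worst case is immediately converted into a lower frequency (and quadratically lower energy) for the remaining work.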

Sizing Power/Ground Meshes for Clocking and Computing Circuit Components [p. 176]
A. Mukherjee, K. Wang, L. Chen, and M. Marek-Sadowska

This paper presents a new formulation and an efficient solution of the power and ground mesh sizing problem. We use the key observations that (1) the drops in power and ground node potentials are due not only to currents drawn by the computing blocks, but also to those drawn by the clock buffers, and (2) changes of circuit component delays are linearly proportional to the power/ground IR-drops. This leads to a linear quantification of the timing relations between the clocking and computing components in terms of the power/ground IR-drops. Our method removes all IR-drop related timing violations that occur in about 2% of paths when grids are sized using the existing methods that satisfy the maximum IR-drop constraints. In addition, we achieve supply mesh area improvements of the order of 30% while simultaneously reducing the power dissipated in the circuits by about 6.6% compared to traditional grid sizing methods.


2D: Advanced Mixed Signal Test

Moderators: J. Huertas, CNM-IMSE, ES; B. Kaminska, Fluence Technology, US

A Signature Test Framework for Rapid Production Testing of RF Circuits [p. 186]
R. Voorakaranam, S. Cherubal, and A. Chatterjee

Production test costs for today's RF circuits are rapidly escalating. Two factors are responsible for this cost escalation: (a) the high cost of RF ATEs and (b) long test times required by elaborate performance tests. In this paper, we propose a framework for low-cost signature test of RF circuits using modulation of a baseband test signal and subsequent demodulation of the DUT response. The demodulated response of the DUT is used as a "signature" from which all the performance specifications are predicted. The applied test signal is optimized in such a way that the error between the measured DUT performances and the predicted DUT performances is minimized. The proposed low-cost solution can be easily built into a load board that can be interfaced to an inexpensive tester.

Analog IP Testing: Diagnosis and Optimization [p. 192]
C. Guardiani, P. McNamara, L. Daldoss, S. Saxena, S. Zanella, W. Xiang, and S. Liu

In this paper, we present an innovative methodology to estimate and improve the quality of analog and mixed-signal circuit testing. We first detect and reduce the redundancy in the electrical test measurements (e-tests), then we identify the e-test acceptability regions by considering performance specifications as well as process parameter distributions. Finally, we provide an effective metric for the accurate assessment of the parametric test coverage of embedded analog IP. Experimental results confirm the validity of the proposed methodology and its broad applicability to analog, mixed-signal and RF applications for different process technologies.

A New Design Flow and Testability Measure for the Generation of a Structural Test and BIST for Analogue and Mixed-Signal Circuits [p. 197]
C. Hoffmann

For the generation of defect-oriented tests, a system is developed that includes the synthesis of self-test structures. With the objective of generating a highly efficient analogue test, the fault simulation methods are greatly enhanced by: (1) a new testability measure, and (2) the ability to distinguish between not-to-detect and hard-to-detect faults with respect to the tolerances of the respective measurement system. By presenting a new design flow and using fault simulation at a very early design stage, a tool suite is developed. It allows control of the defect-robust layout and elimination of those faults that limit the efficiency of a measurement system, which makes economic self-test applications possible. It is demonstrated that the system finds the most efficient and least expensive test for a given fault set. With the presented results, it is possible to include the defect-oriented approach, from fault simulation to the automatic generation of layout rules and test synthesis, in an industrial design flow.

Built-In Dynamic Current Sensor for Hard-to-Detect Faults in Mixed-Signal ICs [p. 205]
Y. Lechuga, R. Mozuelos, M. Martínez, and S. Bracho

There are some types of faults in analogue and mixed-signal circuits which are very difficult to detect using either voltage- or current-based test methods. However, it is possible to detect these faults if we add to the conventional dynamic power supply current (IDDT) test methods an analysis of the changes in the slope of this dynamic power supply current. In this work, we present a Built-In Current Sensor (BICS) which is able to process the highest-frequency components in the dynamic power supply current of the circuit under test (CUT). The BICS adds to the resistive sensor an inductance, made from a gyrator and a capacitor, to carry out the current-to-voltage conversion. Moreover, the proposed test method improves fault coverage in continuous circuits and switched-current circuits as well.


2E: Collaborative Design -- Web-Services, Infrastructure, Applications

Moderators: A. Sauer, FhG EAS/IIS, DE; A. Pawlak, ITE Warsaw, PL

E-Design Based on the Reuse Paradigm [p. 214]
L. Ghanmi, A. Ghrab, M. Hamdoun, B. Missaoui, K. Skiba, and G. Saucier

This paper gives an overview of a virtual electronic component, or IP (Intellectual Property), exchange infrastructure whose main components are an XML "well structured IP e-catalog Builder TM" and an "XML IP profiler TM". While the first module handles e-publishing and exchange management, the second extracts the IP files from the design directories and triggers their transfer to the user site, possibly via an IP distribution server under the catalog's control. Direct design file extraction from commercial configuration management systems such as CVS and ClearCase is supported. If required, the architecture also supports a network of IP distribution servers, preventing a performance bottleneck when exchanging IPs. The two modules have been implemented as a Java servlet and as a Java client/server application, respectively.

Internet-Based Collaborative Test Generation with MOSCITO [p. 221]
A. Schneider, K. Diener, E. Ivask, J. Raik, R. Ubar, P. Miklos, T. Cibáková, and E. Gramatová

This paper presents an Internet-based environment for enhancing problem-specific design flows with test pattern generation and fault simulation capabilities. Automatic Test Pattern Generation (ATPG) and fault simulation tools at the structural and hierarchical levels, available at geographically distant sites and running under the MOSCITO virtual environment, are presented. These tools can be used separately, or combined in multiple applications, for test pattern generation for digital circuits. In order to link the tools with each other and with commercial design systems, a set of translators was developed. The functionality of the integrated design and test system was verified on several benchmark circuits.

A Two-Tier Distributed Electronic Design Framework [p. 227]
T. Kazmierski and N. Clayton

We present the concept of a distributed, web-based electronic design framework. The salient feature of our system is the extension of the client-server architecture to two tiers, with the web server serving client requests whilst acting as a client to the tool servers. In the sample application of the framework, developed in Java, any of the servers can run on Linux, MS Windows or a Sun SPARC server. The web server that has been used to demonstrate the framework, for on-line access to VAMS (a VHDL-AMS parser) and Avant! HSPICE, is currently available for Linux but has been developed with a truly platform-independent implementation in mind.

Embedded System Design Based On Webservices [p. 232]
A. Rettberg and W. Thronicke

The structure of Internet applications and scenarios is changing rapidly today. This offers new potential for established technologies and methods to expand their area of application, and new technologies encourage new methodologies for design processes and business-to-business applications. The application of such advancements should be extended into the domain of the electronic design automation (EDA) industry. In this paper we present an approach to using webservices in the field of embedded system design.


2F: Panel -- Who Owns the Platform?

Organizer/Moderator: W. Wolf, Princeton U, US
Panellists: M. Pinto, Agere, US; P. Paulin, STMicroelectronics, CA; C. Rowen, Tensilica, US;
O. Levia, Improv Systems, US; G. Saucier, Design-Reuse, FR; V. Gerousis, Infineon, DE

Who Owns the Platform? [p. 238]

As VLSI technology advances, it forces changes in the business organization of the industry. Traditional vertically integrated semiconductor manufacturers are concentrating less on manufacturing as foundries such as TSMC, UMC, and Chartered grow. These foundries supply capacity not only to fabless houses but also to large semiconductor manufacturers. As a result, these semiconductor houses are spending more time creating novel platforms for important applications. This puts them in competition with the systems houses that traditionally were their customers.
In the middle, fabless semiconductor companies try to create new and improved platforms as well, generally with fewer resources than are available to established semiconductor houses.
At the other end, IP companies provide platforms without themselves designing chips. They must rely on persuading customers to license IP rather than designing it internally.


3A: Embedded Tutorial -- The Need for Infrastructure IP in SoCs

Organizer: D. Gizopoulos, Piraeus U, GR
Moderator: G. Smith, Gartner Dataquest, US
Speakers: M. Milligan, HPL Technologies, US; Y. Zorian, Virage Logic, US; S. Pateras, LogicVision, US; M. Nicolaidis, iRoC Technologies, FR

IP for Embedded Robustness [p. 240]
M. Nicolaidis

The drastic device shrinking, power supply reduction, and increasing operating speeds that accompany the technological evolution to very deep submicron significantly reduce noise margins and affect the reliability of very deep submicron ICs. Timing faults escaping timing-closure analysis and/or manufacturing testing, as well as soft errors, are creating reliability issues in the field. In this context, single event upsets (SEUs) are becoming one of the major signal integrity problems, and atmospheric neutrons have become a major source of SEUs in modern VDSM technologies. An SEU is the consequence of a single event transient (SET) created on a sensitive node by a particle striking an integrated circuit. When an SET occurs on a memory-cell node and flips the state of the cell, it becomes a single event upset (SEU). An additional problem is that in today's technologies soft errors concern not only memories (as has been the case so far) but also logic: an SET occurring on a node of a logic network is transformed into an SEU when a latch captures it.

Embedded Diagnosis IP [p. 242]
S. Pateras

Today's market conditions are driving increasingly shorter time to market requirements for semiconductor devices. Effective techniques for achieving quick and accurate debug and fault diagnosis of increasingly complex SOC devices are therefore becoming indispensable. This presentation covers new embedded test based IP and related software tools that provide the desired level of debug and diagnosis.

Embedded Robustness IPs [p. 244]
E. Dupont, M. Nicolaidis, and P. Rohr

Due to the VDSM evolution and an electronic systems market starving for performance, the semiconductor industry is used to hitting big technology walls. Challenge after challenge, brand new domains of competence pop up, followed by fast and accurate tools. Synthesis, routing, verification, DFT, embedded systems, SoC, ... are well established as standard competencies for achieving high-quality, high-performance and high-yield chip production. In recent roadmaps (ITRS, Medea, D&T), signal integrity has been pointed out as a major challenge. More and more causes can affect signal integrity as geometries shrink. One of the growing effects is the so-called "transient error", due to temporary conditions of use and environment. Cross-coupling, ground bounce and external terrestrial radiation create more and more unpredictable transient and soft errors, which affect system reliability in unacceptable ways. In addition, reliability in devices like memories becomes a critical issue: as the MTBF (mean time between failures) decreases, the global system FIT (Failure In Time) rate approaches the critical borderline for the end user. Hence, for memories as well as for logic blocks using high-end process technologies, self-correcting intelligence embedded in the SoC is needed to enable electronic systems to react against unpredictable and insidious errors.


3B: Advances in Logic Synthesis

Moderators: M. Berkelaar, Magma Design Automation, NL; W. Kunz, Kaiserslautern U, DE

CHESMIN: A Heuristic for State Reduction in Incompletely Specified Finite State Machines [p. 248]
S. Gören and F. Ferguson

A heuristic is proposed for state reduction in incompletely specified finite state machines (ISFSMs). The algorithm is based on checking sequence generation and identification of sets of compatible states. We have obtained results as good as the best exact method in the literature but with significantly better run-times. In addition to finding a reduced FSM, our algorithm also generates an I/O sequence that can be used as test vectors to verify the FSM's implementation.

Generalized Early Evaluation in Self-Timed Circuits [p. 255]
M. Thornton, K. Fazel, R. Reese, and C. Traver

Phased logic has been proposed as a technique for realizing self-timed circuitry that is delay-insensitive and requires no global clock signals. Early evaluation techniques have been applied to asynchronous circuits in the past in order to achieve throughput increases. A general method for computing early evaluation functions is presented for this design style. Experimental results are given that show the increase in throughput of various benchmark circuits. The results show that as much as a 30% speedup can be achieved in some cases.

Dual Threshold Voltage Domino Logic Synthesis for High Performance with Noise and Power Constraint [p. 260]
S. Jung, K. Kim, and S. Kang

We introduce a new dual threshold voltage technique for domino logic. Since domino logic is much more sensitive to noise, noise margins have to be taken into account when applying dual threshold voltages to domino logic. To guarantee the signal integrity in domino logic, we carefully consider the effect of transistor sizing and threshold voltage selection. For optimal design, tradeoffs need to be made among noise margin, power, and performance. Based on the characteristics of each logic gate, we propose noise and power constrained domino logic synthesis for high performance. ISCAS85 benchmark results show that performance can be improved up to 18.62% with 2% active power increase, while maintaining noise margin.


3C: Novel Applications of Symbolic Techniques to Analogue and Digital Circuit Design

Moderators: F. Férnandez, IMSE-CNM, ES; A. Konczykowska, Alcatel R&I, FR

A Fitting Approach to Generate Symbolic Expressions for Linear and Nonlinear Analog Circuit Performance Characteristics [p. 268]
W. Daems, G. Gielen, and W. Sansen

This paper presents a novel method to automatically generate symbolic expressions for both linear and nonlinear circuit characteristics using a template-based fitting of numerical, simulated data. The aim of the method is to generate convex, interpretable expressions. The posynomiality of the generated expressions enables the use of efficient geometric programming techniques when using these expressions for circuit sizing and optimization. Attention is paid to estimating the relative `goodness-of-fit' of the generated expressions. Experimental results illustrate the capabilities of the approach.

Parameter Controlled Automatic Symbolic Analysis of Nonlinear Analog Circuits [p. 274]
R. Popp, J. Oehmen, L. Hedrich, and E. Barke

In this paper we introduce an approach for parameter controlled symbolic analysis of nonlinear analog circuits. Based on a state-of-the-art algorithm, it enables the removal of specific circuit parameters from a symbolic circuit description, given as a set of nonlinear differential algebraic equations (DAEs). During the removal, singularities are considered, which includes structural changes of the set of DAEs. The feasibility of our approach is shown by several circuit examples.

Constructing Symbolic Models for the Input/Output Behavior of Periodically Time-Varying Systems Using Harmonic Transfer Matrices [p. 279]
P. Vanassche, G. Gielen, and W. Sansen

A new technique is presented for generating symbolic expressions for the harmonic transfer functions of linear periodically time-varying (LPTV) systems, such as mixers and PLLs. The algorithm, which we call Symbolic HTM, is based on organising the harmonic transfer functions into a harmonic transfer matrix. This representation allows LPTV systems to be manipulated in a way similar to linear time-invariant (LTI) systems, making it possible to generate symbolic expressions which relate the overall harmonic transfer functions to the characteristics of the building blocks. These expressions can be used as design equations or as parametrized models for use in simulations. The algorithm is illustrated for a downconversion mixer.

Taylor Expansion Diagrams: A Compact, Canonical Representation with Applications to Symbolic Verification [p. 285]
M. Ciesielski, P. Kalla, Z. Zeng, and B. Rouzeyre

This paper presents a new, compact, canonical graph-based representation, called Taylor Expansion Diagrams (TEDs). It is based on a general non-binary decomposition principle using Taylor series expansion. It can be exploited to facilitate the verification of high-level (RTL) design descriptions. We present the theory behind TEDs, comment upon its canonicity property and demonstrate that the representation has linear space complexity. Its application to equivalence checking of high-level design descriptions is discussed.


3D: Hot Topic -- EDA Tools for RF: Myth or Reality?

Organizers: L. Guarnirei, Barcelona Design, US; E. Chen, Celestry Design Technologies, US
Moderator: C. Ajluni, Wireless Systems Design, US
Presenters: S. Savage, Cypress Semiconductors, US; M. Hershenson, Barcelona Design, US; X. Zhang, Celestry Design Technologies, US

EDA Tools for RF: Myth or Reality? [p. 292]

Designing circuits that operate at radio frequencies (above 1 GHz) is a challenge for many reasons. Nearly every aspect of producing chips is stressed at high frequency, including technology development, modeling, CAD, design, integration, and packaging. From a device modeling perspective, devices have shrunk to extreme dimensions to achieve the required high-frequency performance metrics, while exotic materials are being added to the process. This is straining the limits of industry-standard models, as newer, more capable device models struggle to reach the level of generic support necessary to achieve widespread adoption. Substrate currents and losses, device and substrate noise, and device mismatch all need to be accurately modeled as well in RF design. Electromagnetic effects (both desirable and parasitic) are also much more significant as operating frequencies rise. Lumped RC networks are no longer sufficient to represent interconnect parasitics. Inductive coupling is now significant on chip, while packages and boards are larger today (relative to the wavelength of operation) than ever before, requiring full-wave electromagnetic simulation. Integrated passives (on chips and packages) have significantly reduced integration costs, but require accurate high-frequency models that can be incorporated into analog simulators.
Finally, hierarchical, block-based, mixed-signal design methodologies are very complicated and not currently well integrated into EDA tools. The models for interaction between blocks are often too simplistic, and the coupling between analog and digital components on a chip is often ignored. The result can be resignation to designing in silicon, which keeps the design cycle time and the cost of advanced RF chips high.
This presentation will detail the issues mentioned above to help the audience understand the complexity and depth of the problems, and will serve as an invitation to the EDA industry to present solutions.


3E: Platform-Based Design and Virtual-Component Reuse

Moderators: W. Wolf, Princeton U, US; N. Mártinez Madrid, FZI Karlsruhe, DE

Dynamic Runtime Re-Scheduling Allowing Multiple Implementations of a Task for Platform-Based Designs [p. 296]
T. Lee, W. Wolf, and J. Henkel

This paper introduces an extension to the RMS scheduling technique that we call "Hot Swapping". Hot Swapping enables a system to choose between various selected implementations of one task on-the-fly and thus to optimize the system's cost (e.g. power savings). The on-the-fly swapping between those implementations requires extra time to save and/or transform states of a certain task implementation. Even if the two steady-state schedules before and after the swapping are feasible, the transient schedule with the additional swapping computation time may exceed the system's capacity. Our technique is an extension to Rate Monotonic Scheduling (RMS). While maintaining and meeting performance requirements, our technique shows an average reduction of 31% in power consumption compared to systems using a pure static scheduling approach (RMS) that cannot make use of task swapping. We have evaluated our algorithm through simulation of five real-world task sets and in addition by use of a large number of generated task sets.
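As a rough illustration of the schedulability reasoning such a swap involves, the following sketch uses the classical Liu-Layland utilization bound (a sufficient test, not the authors' exact analysis); all function and parameter names here are hypothetical:

```python
def rms_feasible(tasks):
    """Liu & Layland sufficient utilization test for Rate Monotonic
    Scheduling; tasks is a list of (computation_time, period) pairs."""
    n = len(tasks)
    utilization = sum(c / p for c, p in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)

def swap_feasible(tasks, task_index, new_c, swap_overhead):
    """Check that both steady states and the transient period (which
    pays the one-off state save/transform overhead) stay schedulable
    when task `task_index` switches to an implementation costing
    `new_c` per period."""
    c_old, p = tasks[task_index]
    after_tasks = list(tasks)
    after_tasks[task_index] = (new_c, p)
    # transient: the swapped task transiently costs its worst-case
    # execution plus the overhead of transferring state
    transient_tasks = list(after_tasks)
    transient_tasks[task_index] = (max(c_old, new_c) + swap_overhead, p)
    return (rms_feasible(tasks) and rms_feasible(after_tasks)
            and rms_feasible(transient_tasks))
```

A task set that is feasible before and after a swap can still be rejected here if the transient overhead pushes utilization over the bound, which is exactly the situation the abstract warns about.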

Techniques to Evolve a C++ Based System Design Language [p. 302]
R. Pasko, S. Vernalde, and P. Schaumont

Complex systems-on-chip present one of the most challenging design problems of today. To meet this challenge, new design languages capable of modeling such heterogeneous, dynamic systems are needed. For the implementation of such a language, the use of an object-oriented C++ class library has proven to be a promising approach, since new classes dealing with design- and platform-specific problems can be added in a conceptual and seamlessly reusable way. This paper shows the development of such an extension, aimed at providing a platform-independent high-level structured storage object through the hiding of low-level implementation details. The result is a completely virtualised, user-extendible component, suitable for use in heterogeneous systems.

A Mixed-Signal Design Reuse Methodology Based on Parametric Behavioural Models with Non-Ideal Effects [p. 310]
A. Ginés, E. Peralías, A. Rueda, N. Madrid, and R. Seepold

Current System-on-Chip (SoC) designs incorporate an increasing number of mixed-signal components. Design reuse techniques have proved successful for digital design, but these rules are difficult to transfer to mixed-signal design. A top-down methodology is missing, and the low level of abstraction in designs makes system integration and verification a very difficult, tedious and complex task. This paper presents a contribution to mixed-signal design reuse in which a design methodology is proposed based on modular and parametric behavioural components. They support a design process where non-ideal effects can be incorporated in an incremental way, allowing easy architectural selection and accurate simulations. A working example is used throughout the paper to highlight and validate the applicability of the methodology.


3F2: Analogue Circuit Characterisation and Simulation

Moderators: A. Ródriguez-Vázquez, IMSE-CNM, ES; D. Leenaerts, Philips, NL

Test Structure for IC(VBE) Parameter Determination of Low Voltage Applications [p. 316]
W. Rahajandraibe, C. Dufaza, D. Auvergne, B. Cialdella, B. Majoux, and V. Chowdhury

The temperature dependence of the IC(VBE) relationship can be characterised by two parameters: EG and XTI. The classical method for extracting these parameters consists in a "best fit" to measured VBE(T) values, using a least-squares algorithm at constant collector current. This method requires an accurate measurement of the VBE voltage and an accurate value of the operating temperature. We propose in this paper a configurable test structure dedicated to extracting the temperature dependence of the IC(VBE) characteristic for BJTs designed in bipolar or BiCMOS processes. It allows a direct measurement of the die temperature and consequently an accurate measurement of VBE(T). First, the classical extraction method is explained. Then, the implementation techniques of the new method are discussed and the improvement of the design is presented.

Global Optimization Applied to the Oscillator Problem [p. 322]
S. Lampe and S. Laur

The oscillator problem consists of determining good initial values for the node voltages and the frequency of oscillation, while avoiding the DC solution. Standard approaches for limit cycle calculations of autonomous circuits exhibit poor convergence behavior in practice. By introducing an additional periodic probe voltage source into the oscillator circuit, the system of autonomous differential algebraic equations (DAEs) can be reformulated as a system of non-autonomous DAEs with the constraint that the current through the source must be zero on the limit cycle. Using a two-stage approach leads to a greater range of convergence than the standard approach, but the success of the algorithm depends heavily on the initial amplitude of the probe source and on the frequency of oscillation. This paper presents a fast and reliable optimization-based initialization procedure which overcomes the initialization problem of the two-stage algorithm.


4A: Panel -- MEDEA+ and ITRS Roadmaps

Organizer: W. Rosenstiel, FZI/Tuebingen U, DE
Moderator: G. Mathéron, Director of MEDEA+ Office, FR
Panellists: J. Borel, STMicroelectronics, US; G. Matheron, MEDEA+ Office; A. Jerraya, TIMA, Grenoble, FR;
S. Resve, UC Berkeley, US; M. Rogers, Intel, US; W. Rosenstiel, FZI/Tuebingen U, DE;
I. Rugen-Herzig, Infineon Technologies, DE; F. Theeuwen, Philips Research, NL

MEDEA+ and ITRS Roadmaps [p. 328]

The recent revision of the ITRS Technology Roadmap has again shown an acceleration of very deep submicron process availability, with design capacities forecast at hundreds of millions of gates per square centimetre in 2010. This again raises the question of how to cope with such complexities and functionalities (A-D, HW-SW, MEMS ...) in EDA solutions. This panel will discuss the main priorities in EDA as seen through the application specificities in the USA (ITRS-2001 DESIGN ITWG) and in Europe (the MEDEA EDA Roadmap). The panelists will present the strategies in their respective fields of interest, resulting from the conclusions of their working groups. They will underline the breakthroughs and potential developments of solutions, and the milestones for reducing design times and increasing design quality. The focus will be on application-driven solutions, mostly in the SoC domain (covering hardware as well as embedded and application software).


4B: Asynchronous Circuits and Clock Scheduling

Moderators: M. Renaudin, TIMA, Grenoble, FR; L. Lavagno, Politecnico di Torino, IT

A Burst-Mode Oriented Back-End for the Balsa Synthesis System [p. 330]
T. Chelcea, S. Nowick, A. Bardsley, and D. Edwards

This paper introduces several new component clustering techniques for the optimization of asynchronous systems. In particular, novel "Burst-Mode aware" restrictions are imposed to limit the cluster sizes and to ensure synthesizability. A new control specification language, CH, is also introduced which facilitates the manipulation and optimization of handshake control components. The new method has been fully integrated into a comprehensive asynchronous synthesis package, Balsa. Experimental results on several substantial design examples, including a 32-bit microprocessor core, indicate significant performance improvements for the optimized circuits.

Detecting State Coding Conflicts in STGs Using Integer Programming [p. 338]
V. Khomenko, M. Koutny, and A. Yakovlev

The paper presents a new method for checking Unique and Complete State Coding, the crucial conditions in the synthesis of asynchronous control circuits from Signal Transition Graphs (STGs). The method detects state coding conflicts in an STG using its partial order semantics (unfolding prefix) and an integer programming technique. This leads to huge memory savings compared to methods based on reachability graphs, and also to significant speedups in many cases. In addition, the method produces execution paths leading to an encoding conflict. Finally, the approach is extended to checking the normalcy property of STGs, which is a necessary condition for their implementability using gates whose characteristic functions are monotonic.

Verifying Clock Schedules in the Presence of Cross Talk [p. 346]
S. Hassoun, E. Calvillo-Gámez, and C. Cromer

This paper addresses verifying the timing of circuits containing level-sensitive latches in the presence of cross talk. We show that three consecutive periodic occurrences of the aggressor's input switching window must be compared with the victim's input switching window. We propose a new phase shift operator to allow aligning the aggressor's three relevant switching windows with the victim's input signals. We solve the problem iteratively in polynomial time, and show an upper bound on the number of iterations equal to the number of capacitors in the circuit. Our experiments demonstrate that eliminating false coupling results in finding a smaller clock period at which a circuit will run.


4C: Analogue and Mixed-Signal Systems

Moderators: A. Kaiser, ISEN, FR; P. Wambacq, IMEC, BE

Analysis of Nonlinearities in RF Front-End Architectures Using a Modified Volterra Series Approach [p. 352]
M. Goffioul, P. Wambacq, G. Vandersteen, and S. Donnay

RF front-end architectures of today's wireless applications need to meet tough requirements on nonlinear distortion to minimize unwanted effects such as crosstalk. An analysis of the nonlinear behavior of analog communication circuits or architectures is not straightforward. This paper presents a modified Volterra series approach to the simulation of nonlinear systems described at the architectural level. The total computed response is decomposed into its nonlinear contributions, and the main nonlinearities can be identified. This yields a better insight into the system's nonlinear behavior and allows simplifications. The simplified system can then be simulated more efficiently. The implementation is based purely on vector calculations to minimize the computation time, and has been applied to a complete 5 GHz WLAN receiver front-end.

Systematic Design of a 200 MS/s 8-bit Interpolating A/D Converter [p. 357]
J. Vandenbussche, E. Lauwers, K. Uyttenhove, M. Steyaert, and G. Gielen

The systematic design of a high-speed, high-accuracy Nyquist A/D converter is proposed. The presented design methodology covers the complete flow and is supported by software tools. A generic behavioral model is used to explore the A/D converter's specifications during high-level design and exploration. The inputs are the specifications of the A/D converter and the technology process. The result is a generated layout and the corresponding extracted behavioral model. The approach has been applied to a real-life test case, in which a Nyquist-rate 8-bit 200 MS/s 4-2 interpolating A/D converter was developed for a WLAN application.

Bio-Inspired Analog VLSI Design Realizes Programmable Complex Spatio-Temporal Dynamics on a Single Chip [p. 362]
R. Carmona, F. Jiménez-Garrido, R. Domínguez-Castro, S. Espejo, and A. Rodríguez-Vázquez

A bio-inspired model for an analog parallel array processor (APAP), based on studies of the vertebrate retina, permits the realization of complex spatio-temporal dynamics in VLSI. This model mimics the way in which images are processed in the visual pathway, which renders it a feasible alternative for the implementation of early vision tasks in standard technologies. A prototype chip has been designed in CMOS. The design challenges, trade-offs and building blocks of such a high-complexity system (transistors, most of them operating in analog mode) are presented in this paper.


4D: BIST Diagnosis and DFT

Moderators: M. Flottes, LIRMM, FR; A. Benso, Politecnico di Torino, IT

An Incremental Algorithm for Test Generation in Illinois Scan Architecture Based Designs [p. 368]
A. Pandey and J. Patel

As the complexity of VLSI circuits increases due to the exponential rise in transistor count per chip, testing cost is becoming an important factor in the overall integrated circuit (IC) manufacturing cost. This paper addresses decreasing test cost by lowering the number of test data bits and the number of clock cycles required to test a chip. We propose a new incremental algorithm for generating tests for Illinois Scan Architecture (ILS) based designs and provide an analysis of test data and test time reduction. This algorithm is very efficient at generating tests for a number of ILS designs in order to find the optimal configuration.

Gate Level Fault Diagnosis in Scan-Based BIST [p. 376]
I. Bayraktaroglu and A. Orailoglu

A gate level, automated fault diagnosis scheme is proposed for scan-based BIST designs. The proposed scheme utilizes both fault capturing scan chain information and failing test vector information and enables location identification of single stuck-at faults to a neighborhood of a few gates through set operations on small pass/fail dictionaries. The proposed scheme is applicable to multiple stuck-at faults and bridging faults as well. The practical applicability of the suggested ideas is confirmed through numerous experimental runs on all three fault models.
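The single-fault location step, set operations on small pass/fail dictionaries, can be illustrated with a generic sketch (not the authors' exact procedure; the dictionary and fault names are hypothetical):

```python
def diagnose(fault_dict, failing, passing):
    """Locate candidate stuck-at faults by set operations: a candidate
    must be detectable by every failing vector and by no passing one.
    fault_dict maps each test vector to the set of faults it detects."""
    # start from the faults detectable by the first failing vector
    candidates = set(fault_dict[failing[0]])
    # intersect with the detectable sets of the other failing vectors
    for v in failing[1:]:
        candidates &= fault_dict[v]
    # discard faults that a passing vector would have exposed
    for v in passing:
        candidates -= fault_dict[v]
    return candidates
```

With a dictionary built from fault simulation, the surviving set typically narrows the defect down to a small neighbourhood of gates, as the abstract describes.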

An Interval-Based Diagnosis Scheme for Identifying Failing Vectors in a Scan-BIST Environment [p. 382]
C. Liu, K. Chakrabarty, and M. Goessel

We present a new scan-BIST approach for determining failing vectors for fault diagnosis. This approach is based on the application of overlapping intervals of test vectors to the circuit under test. Two MISRs are used in an interleaved fashion to generate intermediate signatures, thereby obviating the need for multiple test sessions. The knowledge of failing and non-failing intervals is used to obtain a set S of candidate failing vectors that includes all the actual (true) failing vectors. We present analytical results to determine an appropriate interval length and the degree of overlap, an upper bound on the size of S, and a lower bound on the number of true failing vectors; the latter depends only on the knowledge of failing and non-failing intervals. Finally, we describe two pruning procedures that allow us to reduce the size of S, while retaining most true failing vectors in S. We present experimental results for the ISCAS 89 benchmark circuits to demonstrate the effectiveness of the proposed scan-BIST diagnosis approach.

Reducing Test Application Time Through Test Data Mutation Encoding [p. 387]
S. Reda and A. Orailoglu

In this paper we propose a new compression algorithm geared to reduce the time needed to test scan-based designs. Our scheme compresses the test vector set by encoding the bits that need to be flipped in the current test data slice in order to obtain the mutated subsequent test data slice. Exploitation of the overlap in the encoded data by effective traversal search algorithms results in drastic overall compression. The technique we propose can be utilized as not only a stand-alone technique but also can be utilized on test data already compressed, extracting even further compression. The performance of the algorithm is mathematically analyzed and its merits experimentally confirmed on the larger examples of the ISCAS'89 benchmark circuits.
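The core idea, encoding only the bit positions that flip between consecutive test-data slices, can be shown with a toy sketch (illustrative only; the paper's actual encoder and traversal search are more involved):

```python
def mutation_encode(slices):
    """Encode equal-width test-data slices as the first slice plus,
    for each subsequent slice, the bit positions to flip in the
    previous slice to obtain it."""
    encoded = [slices[0]]
    for prev, curr in zip(slices, slices[1:]):
        flips = [i for i, (a, b) in enumerate(zip(prev, curr)) if a != b]
        encoded.append(flips)
    return encoded

def mutation_decode(encoded):
    """Reconstruct the slices from the first slice and the
    per-slice flip-position lists."""
    slices = [encoded[0]]
    for flips in encoded[1:]:
        bits = list(slices[-1])
        for i in flips:
            bits[i] = '1' if bits[i] == '0' else '0'
        slices.append(''.join(bits))
    return slices
```

Because consecutive scan slices tend to overlap heavily, most flip lists are short or empty, which is what makes the encoding compress well.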


4E: Code and Memory Optimization in Co-Design

Moderators: R. Leupers, TU Aachen, DE; R. Ernst, TU Braunschweig, DE

Hardware/Software Trade-Offs for Advanced 3G Channel Coding [p. 396]
H. Michel, A. Worm, N. Wehn, and M. Münch

Third-generation wireless communication systems comprise advanced signal processing algorithms that increase the computational requirements more than ten-fold over 2G systems. Numerous existing and emerging standards require flexible implementations ("software radio"). Thus, efficient implementations of performance-critical parts such as Turbo decoding on programmable architectures are of great interest. Besides high-performance DSPs, application-customized RISC cores offer the required performance while still maintaining the desired flexibility. This paper presents for the first time Turbo decoder implementations on customized RISC cores and compares the results with implementations on state-of-the-art VLIW DSPs. The results of our studies show that the Log-MAP performance is about 50% higher than on an ST120, a current VLIW architecture.

An Efficient Compiler Technique for Code Size Reduction Using Reduced Bit-Width ISAs [p. 402]
A. Halambi, A. Shrivastava, P. Biswas, N. Dutt, and A. Nicolau

For many embedded applications, program code size is a critical design factor. One promising approach for reducing code size is to employ a "dual instruction set", where processor architectures support a normal (usually 32 bit) Instruction Set, and a narrow, space-efficient (usually 16 bit) Instruction Set with a limited set of op-codes and access to a limited set of registers. This feature, however, requires compilers that can reduce code size by compiling for both Instruction Sets. Existing compiler techniques operate at the function-level granularity and are unable to make the trade-off between the increased register pressure (resulting in more spills) and decreased code size. We present a profitability based compiler heuristic that operates at the instruction-level granularity and is able to effectively take advantage of both Instruction Sets. We also demonstrate improved code size reduction, for the MIPS 32/16 bit ISA, using our technique. Our approach more than doubles the code size reduction achieved by existing compilers.

Assigning Program and Data Objects to Scratchpad for Energy Reduction [p. 409]
S. Steinke, L. Wehmeyer, B. Lee, and P. Marwedel

The number of embedded systems is increasing and a remarkable percentage is designed as mobile applications. For the latter, the energy consumption is a limiting factor because of today's battery capacities. Besides the processor, memory accesses consume a high amount of energy. The use of additional, less power-hungry memories like caches or scratchpads is thus common. Caches incorporate the hardware control logic for moving data in and out automatically. On the other hand, this logic requires chip area and energy. A scratchpad memory is much more energy efficient, but its content must be controlled by software. In this paper, an algorithm integrated into a compiler is presented which analyses the application and selects the program and data parts that are placed into the scratchpad. Comparisons against a cache solution show remarkable advantages between 12% and 43% in energy consumption for designs of the same memory size.
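Selecting which program and data objects to move into the scratchpad is essentially a 0/1 knapsack problem over per-object energy savings; a minimal sketch of that selection step (object names, sizes and savings are invented for illustration):

```python
def select_for_scratchpad(objects, capacity):
    """objects: list of (name, size_bytes, energy_saving) tuples.
    Pick the subset maximizing total energy saving within the scratchpad
    capacity (0/1 knapsack) -- a simplified stand-in for the paper's
    compiler analysis."""
    # dp[c] = (best_saving, chosen_names) achievable with capacity c
    dp = [(0, [])] * (capacity + 1)
    for name, size, save in objects:
        new = dp[:]
        for c in range(size, capacity + 1):
            s, names = dp[c - size]
            if s + save > new[c][0]:
                new[c] = (s + save, names + [name])
        dp = new
    return dp[capacity]
```

With a 7-byte scratchpad and three hypothetical objects, the two smaller objects together beat the single most profitable one.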


5A: Hot Topic -- Network on a Chip

Moderator/Organizer: G. De Micheli, Stanford U, US

Networks on Chip: A New Paradigm for Systems on Chip Design [p. 418]
G. De Micheli and L. Benini

This paper is meant to be a short introduction to a new paradigm for systems on chip (SoC) design. We refer the interested reader to an extended overview of this problem [1] and to some recent results in this area in industry [21, 10] and academia [4, 5]. The premises are that a component-based design methodology will prevail in the future, to support component re-use in a plug-and-play fashion. At the same time, SoCs will have to provide a functionally-correct, reliable operation of the interacting components. The physical interconnections on chip will be a limiting factor for performance and energy consumption. The international technology roadmap for semiconductors (ITRS) [23] projects that we will be designing multi-billion transistor chips by the end of this decade, with feature sizes around 50nm and clock frequencies around 10GHz. Delays on wires will dominate: global wires spanning a significant fraction of the chip size will carry signals whose propagation delay will exceed the clock period. Whereas relatively large delays can be managed with wire pipelining techniques, timing uncertainty will be more problematic for designers. Moreover, synchronization of chips with a single clock source and negligible skew will be extremely hard or impossible. The most likely synchronization paradigm for future chips is globally-asynchronous locally synchronous (GALS), with many different clocks. Global wires will span multiple clock domains, and synchronization failures in communicating between different clock domains will be rare but unavoidable events [7].

Communication Mechanisms for Parallel DSP Systems on a Chip [p. 420]
J. Williams, N. Heintze, and B. Ackland

We consider the implication of deep sub-micron VLSI technology on the design of communication frameworks for parallel DSP systems-on-chip. We assert that distributed data transfer and control mechanisms are necessary to manage many independent processing subsystems and software tasks. An example of a parallel DSP architecture is given and used to demonstrate these mechanisms at work. We show the similarity of these mechanisms and those used in large scale computing networks.

Networks on Silicon: Combining Best-Effort and Guaranteed Services [p. 423]
K. Goossens, P. Wielage, A. Peeters, and J. van Meerbergen

We advocate a network on silicon (NOS) as a hardware architecture to implement communication between IP cores in future technologies, and as a software model in the form of a protocol stack to structure the programming of NOSs. We claim that guaranteed services are essential. In the ÆTHEREAL NOS they pervade the design, as a requirement for hardware design and as a foundation for software programming.


5B: Low Power Architectures and Software

Moderators: W. Nebel, OFFIS, DE; M. Miranda, IMEC, BE

Data Reuse Exploration Techniques for Loop-Dominated Applications [p. 428]
T. Van Achteren, G. Deconinck, F. Catthoor, and R. Lauwereins

Efficient exploitation of temporal locality in the memory accesses on array signals can have a very large impact on the power consumption in embedded data-dominated applications. The effective use of an optimized custom memory hierarchy, or a customized software-controlled mapping on a predefined hierarchy, is crucial for this. Only recently have effective systematic techniques to deal with this specific design step begun to appear, and they are still limited in their exploration scope. In this paper we introduce an extended formalized methodology based on an analytical model of the data reuse of a signal. The cost parameters derived from this model define the search space to explore and allow us to exploit the maximum data reuse possible. The result is an automated design technique to find power efficient memory hierarchies and generate the corresponding optimized code.

EAC: A Compiler Framework for High-Level Energy Estimation and Optimization [p. 436]
I. Kadayif, M. Kandemir, N. Vijaykrishnan, M. Irwin, and A. Sivasubramaniam

This paper presents a novel Energy-Aware Compilation (EAC) framework that can estimate and optimize energy consumption of a given code taking as input the architectural and technological parameters, energy models, and energy/performance constraints. The framework has been validated using a cycle-accurate architectural-level energy simulator and found to be within 6% error margin while providing significant estimation speedup. The estimation speed of EAC is the key to the number of optimization alternatives that can be explored within a reasonable compilation time.

Power Savings in Embedded Processors through Decode Filter Cache [p. 443]
W. Tang, R. Gupta, and A. Nicolau

In embedded processors, instruction fetch and decode can consume more than 40% of processor power. An instruction filter cache can be placed between the CPU core and the instruction cache to service the instruction stream. Power savings in instruction fetch result from accesses to a small cache. In this paper, we introduce a decode filter cache to provide a decoded instruction stream. On a hit in the decode filter cache, fetching from the instruction cache and the subsequent decoding is eliminated, which results in power savings in both instruction fetch and instruction decode. We propose to classify instructions as cacheable or uncacheable depending on the decoded width. A sectored cache design is then used in the decode filter cache so that cacheable and uncacheable instructions can coexist in a decode filter cache sector. Finally, a prediction mechanism is presented to reduce the decode filter cache miss penalty. Experimental results show an average 34% processor power reduction and less than 1% performance degradation.

Hardware-Assisted Data Compression for Energy Minimization in Systems with Embedded Processors [p. 449]
L. Benini, D. Bruni, A. Macii, and E. Macii

In this paper, we suggest hardware-assisted data compression as a tool for reducing energy consumption of core-based embedded systems. We propose a novel and efficient architecture for on-the-fly data compression and decompression whose field of operation is the cache-to-memory path. Uncompressed cache lines are compressed before they are written back to main memory, and decompressed when cache refills take place. We explore two classes of compression methods, profile-driven and differential, since they are characterized by compact HW implementations, and we compare their performance to those provided by some state-of-the-art compression methods (e.g., we have considered a few variants of the Lempel-Ziv encoder). We present experimental results about memory traffic and energy consumption in the cache-to-memory path of a core-based system running standard benchmark programs. The achieved average energy savings range from 4.2% to 35.2%, depending on the selected compression algorithm.
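A differential cache-line encoder of the kind described can be sketched as follows (the 32-bit word width and the 8-bit small-delta threshold are illustrative assumptions, not the paper's exact encoder):

```python
def diff_compress(words):
    """Differential compression of a cache line given as a list of 32-bit
    words: each word is XORed with its predecessor, and small differences
    are stored in 8 bits plus a 1-bit flag. Illustrative sketch."""
    bits = 32                      # first word is stored verbatim
    out = [("raw", words[0])]
    prev = words[0]
    for w in words[1:]:
        delta = w ^ prev
        if delta < 256:            # small delta: flag + 8 bits
            out.append(("small", delta))
            bits += 1 + 8
        else:                      # large delta: flag + full 32 bits
            out.append(("big", delta))
            bits += 1 + 32
        prev = w
    return out, bits
```

A line of four words with two near-identical neighbours compresses from 128 bits to 83 bits in this toy model; real write-back traffic with pointer-like data shows similar locality.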


5C: Nitty Gritty Details of Layout Design

Moderators: E. Barke, Hannover U, DE; P. Groeneveld, Magma Design Automation, NL

Analysis of Noise Avoidance Techniques in DSM Interconnects Using a Complete Crosstalk Noise Model [p. 456]
M. Becer, V. Zolotov, D. Blaauw, R. Panda, and I. Hajj

Noise estimation and avoidance are becoming critical, "must have" capabilities in today's high performance IC design. An accurate yet efficient crosstalk noise model, which contains as many driver/interconnect parameters as possible, is necessary for any sensitivity-based noise avoidance approach. In this paper, we present a complete analytical crosstalk noise model which incorporates all physical properties including victim and aggressor drivers, distributed RC characteristics of interconnects and coupling locations in both victim and aggressor lines. We present closed-form analytical expressions for peak noise and noise width as well as sensitivities to all model parameters. We then use these model parameter sensitivities to analyze and evaluate various noise avoidance techniques such as driver sizing, wire sizing, wire spacing and layer assignment. Both our model and noise avoidance evaluations are verified using realistic circuits in 0.13µm technology. We also present the effectiveness of the discussed noise avoidance techniques on a high performance microprocessor core.

Hierarchical Current Density Verification for Electromigration Analysis in Arbitrary Shaped Metallization Patterns of Analog Circuits [p. 464]
G. Jerke and J. Lienig

Electromigration is caused by high current density stress in metallization patterns and is a major source of breakdown in electronic devices. It is therefore an important reliability issue to verify current densities within all stressed metallization patterns. In this paper we propose a new methodology for hierarchical verification of current densities in arbitrarily shaped analog circuit layouts, including a quasi-3D model to verify irregularities such as vias. Our approach incorporates thermal simulation data to account for the temperature dependency of electromigration. The described methodology, which can be integrated into any IC design flow as a design rule check (DRC), has been successfully tested and verified in commercial design flows.

A Polynomial Time Optimal Diode Insertion/Routing Algorithm for Fixing Antenna Problem [p. 470]
L. Huang, X. Tang, H. Xiang, D. Wong, and I. Liu

The antenna problem is a phenomenon of plasma-induced gate oxide degradation. It directly affects manufacturability of VLSI circuits, especially in deep-submicron technology using high density plasma. Diode insertion is a very effective way to solve this problem. Ideally, diodes are inserted directly under the wires that violate antenna rules. But in today's high-density VLSI layouts, there is simply not enough room for "under-the-wire" diode insertion for all wires. Thus it is necessary to insert many diodes at legal "off-wire" locations and extend the antenna-rule violating wires to connect to their respective diodes. Previously only simple heuristic algorithms were available for this diode insertion and routing problem. In this paper, we show that the diode insertion and routing problem for an arbitrary given number of routing layers can be optimally solved in polynomial time. Our algorithm is guaranteed to find a feasible diode insertion and routing solution whenever one exists. Moreover, we can guarantee to find a feasible solution minimizing a cost function of the form α·L + β·N, where L is the total length of extension wires and N is the total number of vias on the extension wires. Experimental results show that our algorithm is very efficient.


5D: SoC and System Test

Moderators: Y. Zorian, LogicVision, US; D. Gizopoulos, Piraeus U, GR

Test Planning and Design Space Exploration in a Core-Based Environment [p. 478]
E. Cota, L. Carro, M. Lubaszewski, and A. Orailoglu

This paper proposes a comprehensive model for test planning in a core-based environment. The main contribution of this work is the use of several types of TAMs and the consideration of different optimization factors (area, pins and test time) during the global TAM and test schedule definition. This expansion of concerns makes possible an efficient yet fine-grained search in the huge design space of a reuse-based environment. Experimental results clearly show the variety of trade-offs that can be explored using the proposed model, and its effectiveness on optimizing the system test design.

A Hierarchical Test Scheme for System-On-Chip Designs [p. 486]
J. Li, H. Huang, J. Chen, C. Su, C. Wu, C. Cheng, S. Chen, C. Hwang, and H. Lin

System-on-chip (SOC) design methodology is becoming the trend in the IC industry. Integrating reusable cores from multiple sources is essential in SOC design, and different design-for-testability methodologies are usually required for testing different cores. Another issue is test integration. The purpose of this paper is to present a hierarchical test scheme for SOC with heterogeneous core test and test access methods. A hierarchical test manager (HTM) is proposed to generate the control signals for these cores, taking into account the IEEE P1500 Standard proposal. A standard memory BIST interface is also presented, linking the HTM and the memory BIST circuit. It can control the BIST circuit with the serial or parallel test access mechanism. The hierarchical test control scheme has low area and pin overhead, and high flexibility. An industrial case using this scheme has been designed, showing an area overhead of only about 0.63%.

Efficient Wrapper/TAM Co-Optimization for Large SOCs [p. 491]
V. Iyengar, K. Chakrabarty, and E. Marinissen

Core test wrappers and test access mechanisms (TAMs) are important components of a system-on-chip (SOC) test architecture. Wrapper/TAM co-optimization is necessary to minimize the SOC testing time. Most prior research in wrapper/TAM design has addressed wrapper design and TAM optimization as separate problems, thereby leading to results that are sub-optimal. We present a fast heuristic technique for wrapper/TAM co-optimization, and demonstrate its scalability for several industrial SOCs. This extends recent work on exact methods for wrapper/TAM co-optimization based on integer linear programming and exhaustive enumeration. We show that the SOC testing times obtained using the new heuristic algorithm are comparable to the testing times obtained using exact methods. Moreover, more than two orders of magnitude reduction can be obtained in the CPU time compared to exact methods. Furthermore, we are now able to design efficient test access architectures with a larger number of TAMs.

Beyond UML to an End-of-Line Functional Test Engine [p. 499]
A. Baldini, A. Benso, P. Prinetto, S. Mo, and A. Taddei

In this paper, we analyze the use of UML as a starting point to go from design issues to end-of-production testing of complex embedded systems. The first point is the analysis of the big gap between system signals and UML messages; the paper then focuses on the additional information necessary to fill this gap. Different test types are considered, focusing on the application software test; finally, actuation and observation are both analyzed inside the test environment, with particular attention to the black-box requirement for behavioral testing. The emphasis of the work is on the resulting test engine definition, verified on a complex case study of a top-of-the-line automotive application; this application is a modern car console, grouping many controls of car-related devices, such as phone, navigation, radio, CD. The testing of the GSM capabilities of such a device is studied in particular.


5E: Modelling and Synthesis of Embedded Systems

Moderators: J. López, Castilla-La Mancha U, ES; F. Rousseau, TIMA, Grenoble, FR

Event Model Interfaces for Heterogeneous System Analysis [p. 506]
K. Richter and R. Ernst

Complex embedded systems consist of hardware and software components from different domains, such as control and signal processing, many of them supplied by different IP vendors. The embedded system designer faces the challenge to integrate, optimize and verify the resulting heterogeneous systems. While formal verification is available for some subproblems, the analysis of the whole system is currently limited to simulation or emulation. In this paper, we tackle the analysis of global resource sharing, scheduling, and buffer sizing in heterogeneous embedded systems. For many practically used preemptive and non-preemptive hardware and software scheduling algorithms of processors and busses, semi-formal analysis techniques are known. However, they cannot be used in system level analysis due to incompatibilities of their underlying event models. This paper presents a technique to couple the analysis of local scheduling strategies via an event interface model. We derive transformation rules between the most important event models and provide proofs where necessary. We use expressive examples to illustrate their application.

Energy-Efficient Mapping and Scheduling for DVS Enabled Distributed Embedded Systems [p. 514]
M. Schmitz, B. Al-Hashimi, and P. Eles

In this paper, we present an efficient two-step iterative synthesis approach for distributed embedded systems containing dynamic voltage scalable processing elements (DVS-PEs), based on genetic algorithms. The approach partitions, schedules, and voltage scales multi-rate specifications given as task graphs with multiple deadlines. A distinguishing feature of the proposed synthesis is the utilisation of a generalised DVS method. In contrast to previous techniques, which "simply" exploit available slack time, this generalised technique additionally considers the PE power profile during a refined voltage selection to further increase the energy savings. Extensive experiments are conducted to demonstrate the efficiency of the proposed approach. We report up to 43.2% higher energy reductions compared to previous DVS scheduling approaches based on constructive techniques and total energy savings of up to 82.9% for mapping and scheduling optimised DVS systems.
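The slack-exploiting part of DVS can be captured with a first-order model (assuming frequency scales linearly with supply voltage and dynamic energy quadratically; the voltage values are illustrative, and the paper's refined selection additionally uses the PE power profile):

```python
def dvs_energy_ratio(exec_time, deadline, v_max=1.8, v_min=0.9):
    """Energy relative to running at v_max, when a task with the given
    execution time (at v_max) is stretched over its deadline slack.
    First-order model: f proportional to V, E proportional to V^2."""
    # Voltage can drop by the utilisation factor, but not below v_min.
    scale = max(exec_time / deadline, v_min / v_max)
    v = v_max * scale
    return (v / v_max) ** 2
```

A task that needs only half its deadline can run at half the supply voltage, cutting its dynamic energy to a quarter in this model, which is why exploiting slack is so profitable.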

A Layered, Codesign Virtual Machine Approach to Modeling Computer Systems [p. 522]
J. Paul and D. Thomas

By using a macro/micro state model we show how assumptions on the resolution of logical and physical timing of computation in computer systems have resulted in design methodologies such as component-based decomposition, where they are completely coupled, and function/architecture separation, where they are completely independent. We discuss why these are inappropriate for emerging programmable, concurrent system design. By contrast, schedulers layered on hardware in concurrent systems already couple logical correctness with physical performance when they make effective resource sharing decisions. This paper lays a foundation for understanding how layered logical and physical sequencing will impact the design process, and provides insight into the problems that must be solved in such a design environment. Our layered approach is that of a virtual machine. We discuss our MESH research project in this context.

Automatic Evaluation of the Accuracy of Fixed-Point Algorithms [p. 529]
D. Menard and O. Sentieys

The minimization of cost, power consumption and time-to-market of DSP applications requires the development of methodologies for the automatic implementation of floating-point algorithms in fixed-point architectures. In this paper, a new methodology for evaluating the quality of an implementation through the automatic determination of the Signal to Quantization Noise Ratio (SQNR) is presented. The theoretical concepts and the different phases of the methodology are explained. Then, the ability of our approach to compute the SQNR efficiently and its beneficial contribution to the process of data word-length minimization are shown through some examples.
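As a point of reference, the SQNR that the methodology determines analytically can also be obtained by simulation, as in this sketch (quantization by rounding to a grid of step 2^-frac_bits; the test signal is an arbitrary choice):

```python
import math

def sqnr_db(signal, frac_bits):
    """Quantize `signal` to a fixed-point grid with `frac_bits` fractional
    bits (round to nearest) and return the Signal-to-Quantization-Noise
    Ratio in dB. Simulation-style reference, not the paper's analytical
    flow."""
    step = 2.0 ** -frac_bits
    quantized = [round(x / step) * step for x in signal]
    p_sig = sum(x * x for x in signal)
    p_noise = sum((x - q) ** 2 for x, q in zip(signal, quantized))
    return 10 * math.log10(p_sig / p_noise)
```

Each extra fractional bit buys roughly 6 dB of SQNR, which is the trade-off a word-length minimization loop explores.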


6A: Panel -- Power Crisis in SoC Design: Strategies for Constructing Low-Power, High-Performance SoC Designs

Organizer: K. Brock, Virtual Silicon Technology, US
Moderator: C. Edwards, Electronic Times, UK
Panellists: R. Lannoo, Alcatel, BE; U. Schlichtmann, Infineon Technologies, DE; A. Domic, Synopsys, US;
J. Benkoski, Monterey, US; D. Overhauser, Simplex, US; M. Kliment, Virtual Silicon, US

Power Crisis in SoC Design: Strategies for Constructing Low-Power, High-Performance SoC Designs [p. 538]

This special panel session brings together several leading technologists to discuss the challenges and solutions in constructing SoC designs that achieve their performance goals within a very tight power budget. These challenges are addressed from the often conflicting perspectives of semiconductor design teams and commercial solutions providers of EDA construction tools, EDA analysis tools and semiconductor IP (SIP).


6B: Reconfigurable Architectures

Moderators: R. Hartenstein, Kaiserslautern U, DE; U. Kebschull, Leipzig U, DE

A Video Compression Case Study on a Reconfigurable VLIW Architecture [p. 540]
D. Rizzo and O. Colavin

In this paper, we investigate the benefits of a flexible, application-specific instruction set by adding a run-time Reconfigurable Functional Unit (RFU) to a VLIW processor. Preliminary results on the motion estimation stage in an MPEG4 video encoder are presented. With the RFU modeled at functional level and under realistic assumptions on execution latency, technology scaling and reconfiguration penalty, we explore different RFU instructions at fine-grain (instruction-level) and coarse-grain (loop-level) granularity to speed up the application execution. The memory bandwidth bottleneck, typical for streaming applications, is alleviated through the combined adoption of custom prefetch pattern instructions and an extent of local memory. Performance evaluations indicate that up to an 8x improvement with loop-level optimizations is achieved under various architectural assumptions.

A Complete Data Scheduler for Multi-Context Reconfigurable Architectures [p. 547]
M. Sánchez-Élez, M. Férnandez, R. Maestre, R. Hermida, N. Bagherzadeh, and F. Kurdahi

A new technique is presented in this paper to improve the efficiency of data scheduling for multi-context reconfigurable architectures targeting multimedia and DSP applications. The main goal is to improve the execution time of applications by minimizing external memory transfers. Some amount of on-chip data storage is assumed to be available in the reconfigurable architecture. The Complete Data Scheduler therefore tries to optimally exploit this storage, saving data and result transfers between on-chip and external memories. In order to do this, specific algorithms for data placement and replacement have been designed. We also show that a suitable data scheduling can decrease the number of transfers required to implement the dynamic reconfiguration of the system.

Highly Scalable Dynamically Reconfigurable Systolic Ring-Architecture for DSP Applications [p. 553]
G. Sassatelli, L. Torres, P. Benoit, T. Gil, C. Diou, G. Cambon, and J. Galy

Microprocessors are becoming more and more inefficient for a growing range of applications. Their underlying principle -- the Von Neumann paradigm [3], based on the sequential execution of algorithms -- will no longer be able to cope with the highly computation-intensive applications of the multimedia world. Today's approaches to these limitations are the following:
- The first, and most natural, way to increase computing power is obviously to decrease the cycle execution time thanks to new silicon technology: the functional frequencies of the newest CPUs are now approaching 2 GHz.
- The second approach is co-design, where a general-purpose CPU entrusts the computation of the most time-demanding applications to a dedicated core. The best-known examples are PC graphics cards, which manage all the 2D and 3D display operations that even high-end CPUs are not able to handle efficiently.
Neither method is satisfactory: the first quickly finds its limits in functional frequency and power consumption, while the second requires the design of a new core for each intended algorithm. New machine paradigms based on parallel execution must be considered. Thanks to their high level of flexibility, structurally programmable architectures are potentially interesting candidates to overcome the limitations of classical CPUs.
Based on a parallel execution model, we present in this paper a new dynamically reconfigurable architecture dedicated to the acceleration of data-oriented applications. Principles, realizations and comparative results are presented for some classical applications, targeted on different architectures.

(Self-)reconfigurable Finite State Machines: Theory and Implementation [p. 559]
J. Teich and M. Köster

In this paper, we introduce the concept of (self-)reconfigurable finite state machines as a formal model to describe state-machines implemented in hardware that may be reconfigured during operation. By the advent of reconfigurable logic devices such as FPGAs, this model may become important to characterize and implement (self-)reconfigurable hardware. An FSM is called (self-)reconfigurable if reconfiguration of either output function or transition function is initiated by the FSM itself and not based on external reconfiguration events. We propose an efficient hardware realisation and give algorithmic solutions and bounds for the reconfiguration overhead of migrating a given FSM specification into a new target FSM.
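A software analogue of such a self-reconfigurable Mealy machine, one whose own output event swaps in a new transition/output table, can be sketched as follows (state and symbol names are invented; a hardware realisation would swap FPGA configurations instead of dictionaries):

```python
class SelfReconfFSM:
    """Toy self-reconfigurable FSM: transitions map (state, symbol) to
    (next_state, output); emitting the output "reconf" makes the machine
    itself swap its active table with the alternate one."""

    def __init__(self, table, alt_table, state):
        self.table, self.alt, self.state = table, alt_table, state

    def step(self, sym):
        nxt, out = self.table[(self.state, sym)]
        self.state = nxt
        if out == "reconf":  # reconfiguration initiated by the FSM itself
            self.table, self.alt = self.alt, self.table
        return out
```

After the self-triggered swap, the same input symbol produces the new configuration's output, which is exactly the behaviour the formal model has to capture.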


6C: Analogue Modelling, Layout and Sizing

Moderators: H. Graeb, TU Munich, DE; G. Gielen, KU Leuven, BE

A Linear-Centric Simulation Framework for Parametric Fluctuations [p. 568]
E. Acar, S. Nassif, and L. Pileggi

The relative tolerances for interconnect and device parameter variations have not scaled with feature sizes, which has brought about significant performance variability. As we scale toward 10nm technologies, this problem will only worsen. New circuit families and design methodologies will emerge to facilitate construction of reliable systems from unreliable nanometer scale components. Such methodologies require new models of performance which accurately capture the manufacturing realities. Recently, one step toward this goal was made via a new variational reduced order interconnect model that efficiently captures large scale fluctuations in global parameter values. Using variational calculus, the linear interconnect systems are represented by analytical models that include the global variational parameters explicitly. In this work we present a framework which extends the previous work to a linear-centric simulation methodology with accurate nonlinear device models and their fluctuations. The framework is applied to generate path delay distributions under nonlinear and linear parameter fluctuations.

Automatic Generation of Common-Centroid Capacitor Arrays with Arbitrary Capacitor Ratio [p. 576]
M. Dessouky and D. Sayed

The key performance of many analog circuits is directly related to accurate capacitor ratios. It is well known that capacitor ratio precision is greatly enhanced by paralleling identical size unit capacitors in a common-centroid geometry. In this paper, a general algorithm for fitting arbitrary capacitor ratios in a common-centroid unit-capacitor array is presented. The algorithm gives special care to both non-integer and identical ratios in order to minimize mismatch. A method for capacitance mismatch estimation based upon an oxide gradient model is also introduced. It enables the comparison of different unit-capacitor array assignments. Layout issues are discussed with emphasis on a generic routing model. Both the algorithm and the mismatch estimation method are implemented in an automatic capacitor array generation tool.
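The defining property such a generator must preserve, namely that every capacitor's unit cells share a single geometric centroid, is easy to check; a verification sketch over a unit-cell grid (the example layouts are illustrative, not the paper's output):

```python
def centroid(cells, label):
    """Mean (row, col) position of all unit cells carrying `label`."""
    pts = [(r, c) for r, row in enumerate(cells)
                  for c, x in enumerate(row) if x == label]
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

def is_common_centroid(cells):
    """True if all capacitors in the unit-cell grid share one centroid,
    the property a common-centroid generator must maintain."""
    labels = {x for row in cells for x in row}
    return len({centroid(cells, l) for l in labels}) == 1
```

A 2x4 grid realizing an A:B ratio of 1:3 with both centroids at the array center passes the check; a layout with A and B in separate rows does not.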

Analog Circuit Sizing Using Adaptive Worst-Case Parameter Sets [p. 581]
R. Schwencker, F. Schenkel, M. Pronath, and H. Graeb

In this paper, a method for nominal design of analog integrated circuits is presented that includes process variations and operating ranges by worst-case parameter sets. These sets are calculated adaptively during the sizing process based on sensitivity analyses. The method leads to robust designs with high parametric yield, while being much more efficient than design centering methods.

High-Frequency Nonlinear Amplifier Model for the Efficient Evaluation of Inband Distortion Under Nonlinear Load-Pull Conditions [p. 586]
G. Vandersteen, P. Wambacq, S. Donnay, and F. Verbeyst

Designing complex analog systems needs different abstraction levels to reduce the overall complexity. The required level of abstraction depends on the accuracy and the purpose of the model. High-frequency amplifier models can vary from simple transfer functions for efficient bit-error-rate analysis up to detailed transistor level descriptions for accurate load-pull prediction. This paper introduces a nonlinear black-box model for high-frequency amplifiers. It extends the linear S-parameter representation to enable both efficient system-level simulations and load-pull prediction. Both are demonstrated on the measurements of a high-frequency amplifier excited using WLAN-OFDM modulation.


6D: Test Resource Partitioning for Embedded Cores

Moderators: Z. Peng, Linköping U, SE; B. Rouzeyre, LIRMM, FR

Effective Software Self-Test Methodology for Processor Cores [p. 592]
N. Kranitis, A. Paschalis, D. Gizopoulos, and Y. Zorian

Software self-testing of embedded processor cores based on their instruction set is a topic of increasing interest, since it provides an excellent test resource partitioning technique for sharing the testing task of complex Systems-on-Chip (SoC) between slow, inexpensive testers and embedded code stored in memory cores of the SoC. We introduce an efficient methodology for processor core self-testing which requires knowledge of the instruction set and the Register Transfer (RT) level description. Compared with functional testing methodologies proposed in the past, our methodology is more efficient in terms of fault coverage, test code size and test application time. Compared with recent software-based structural testing methodologies for processor cores, our methodology is superior in terms of test development effort and has significantly smaller code size and memory requirements, while virtually the same fault coverage is achieved with an order of magnitude smaller test application time.

Test Resource Partitioning and Reduced Pin-Count Testing Based on Test Data Compression [p. 598]
A. Chandra and K. Chakrabarty

We present a new test resource partitioning (TRP) technique for reduced pin-count testing of system-on-a-chip (SOC). The proposed technique is based on test data compression and on-chip decompression. It makes effective use of frequency-directed run-length codes, internal scan chains, and boundary scan chains. The compression/ decompression scheme decreases test data volume and the amount of data that has to be transported from the tester to the SOC. We show via analysis as well as through experiments that the proposed TRP scheme reduces testing time and allows the use of a slower tester with fewer I/O channels. Finally, we show that an uncompacted test set applied to an embedded core after on-chip decompression is likely to increase defect coverage.

Improving Compression Ratio, Area Overhead, and Test Application Time for System-on-a-Chip Test Data Compression/Decompression [p. 604]
P. Gonciari, B. Al-Hashimi, and N. Nicolici

This paper proposes a new test data compression/decompression method for systems-on-a-chip. The method is based on analyzing the factors that influence the test parameters: compression ratio, area overhead and test application time. To improve the compression ratio, the new method is based on Variable-length Input Huffman Coding (VIHC), which fully exploits the type and length of the patterns, as well as on a novel mapping and reordering algorithm proposed in a pre-processing step. The new VIHC algorithm is combined with a novel parallel on-chip decoder that simultaneously leads to low test application time and low area overhead. It is shown that, unlike three previous approaches [2, 3, 10], which reduce some test parameters at the expense of the others, the proposed method is capable of improving all three parameters simultaneously. For example, the proposed method leads to a similar or better compression ratio when compared to frequency-directed run-length coding [2], however with lower area overhead and test application time. Similarly, there is comparable or lower area overhead and test application time with respect to Golomb coding [3], with improvements in compression ratio. Finally, there is similar or improved test application time when compared to selective coding [10], with reductions in compression ratio and significantly lower area overhead. An experimental comparison on benchmark circuits validates the proposed method.
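Huffman coding assigns the shortest codewords to the most frequent patterns; VIHC applies this to variable-length input patterns, but the core frequency-to-codeword idea can be sketched with an ordinary Huffman construction (an illustrative sketch only, not the paper's VIHC algorithm or its parallel decoder):

```python
import heapq
from collections import Counter

def huffman_code(patterns):
    """Build a Huffman code over a list of (already chopped) test-data
    patterns: more frequent patterns receive shorter codewords."""
    freq = Counter(patterns)
    # Heap entries: (count, tie-breaker, {pattern: partial codeword}).
    heap = [(n, i, {p: ""}) for i, (p, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codewords.
        merged = {p: "0" + w for p, w in c1.items()}
        merged.update({p: "1" + w for p, w in c2.items()})
        heapq.heappush(heap, (n1 + n2, tick, merged))
        tick += 1
    return heap[0][2]
```

Applied to patterns {"00", "00", "00", "01", "11"}, the dominant pattern "00" gets a 1-bit codeword and the rare ones 2-bit codewords, which is where the compression comes from.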

Problems Due to Open Faults in the Interconnections of Self-Checking Data-Paths [p. 612]
M. Favalli and C. Metra

In this work, the problem of open faults affecting the interconnections of SC circuits composed of a data-path and control is analyzed. It is shown that, in case opens affect control signals, some problems may arise even if both control and data-path signals are concurrently checked: in particular, wrong codewords may be generated at the outputs of multiplexers and registers. To address this problem, new registers and multiplexers are proposed which allow the design of data-paths that are TSC with respect to opens (and resistive opens). These components are also TSC with respect to stuck-at, transistor and gross delay faults, and present good testability with respect to resistive bridging faults.


6E: System Level Simulation and Modelling

Moderators: B. Al-Hashimi, Southampton U, UK; P. Schwarz, FhG IIS/EAS Dresden, DE

Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design [p. 620]
S. Yoo, G. Nicolescu, L. Gauthier, and A. Jerraya

To enable fast and accurate evaluation of HW/SW implementation choices for on-chip communication, we present a method to automatically generate timed OS simulation models. The method generates the OS simulation models with the simulation environment as a virtual processor. Since the generated OS simulation models use the final OS code, the presented method mitigates the OS code equivalence problem. The generated model also simulates different types of processor exceptions. This approach provides a simulation speedup of two orders of magnitude compared to using instruction set simulators for SW simulation.

Window-Based Susceptance Models for Large-Scale RLC Circuit Analyses [p. 628]
Z. Zheng, L. Pileggi, M. Beattie, and B. Krauter

Due to the increasing operating frequencies and the manner in which the corresponding integrated circuits and systems must be designed, the extraction, modeling and simulation of the magnetic couplings for final design verification can be a daunting task. In general, when modeling inductance and the associated return paths, one must consider the on-chip conductors as well as the system packaging. This can result in an RLC circuit size that is impractical for traditional simulators. In this paper we demonstrate a localized, window-based extraction and simulation methodology that employs the recently proposed susceptance (the inverse of inductance matrix) concept. We provide a qualitative explanation for the efficacy of this approach, and demonstrate how it facilitates pre-manufacturing simulations that would otherwise be intractable. A critical aspect of this simulation efficiency is owed to a susceptance-based circuit formulation that we prove to be symmetric positive definite. This property, along with the sparsity of the susceptance matrix, enables the use of some advanced sparse matrix solvers. We demonstrate this extraction and simulation methodology on some industrial examples.

A Linear-Centric Modeling Approach to Harmonic Balance Analysis [p. 634]
P. Li and L. Pileggi

In this paper we propose a new harmonic balance simulation methodology based on a linear-centric modeling approach. A linear circuit representation of the nonlinear devices and associated parasitics is used along with the corresponding time- and frequency-domain inputs to solve for the nonlinear steady-state response via successive chord (SC) iterations. For our circuit examples this approach is shown to be up to 60x more run-time efficient than traditional Newton-Raphson (N-R) based iterative methods, while providing the same level of accuracy. The SC-based approach converges as reliably as the N-R approaches, including for circuit problems which cause alternative relaxation-based harmonic balance approaches to fail [1][2]. The efficacy of this linear-centric methodology further improves with increasing model complexity, the inclusion of interconnect parasitics, and other analyses that are otherwise difficult with traditional nonlinear models.
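The successive chord idea replaces Newton's per-iteration Jacobian evaluation and factorization with one fixed linear operator, trading quadratic for linear convergence but making every iteration cheap. In one dimension the contrast is easy to see (a toy sketch with a hand-picked chord slope; the paper applies the idea to the full harmonic balance Jacobian):

```python
def chord_solve(f, x0, slope, tol=1e-10, max_iter=200):
    """Successive-chord iteration: like Newton's method, but reuse one
    fixed slope (the 'chord') instead of re-evaluating the derivative
    at every step.  Converges linearly when the chord is close enough
    to the true Jacobian near the solution."""
    x = x0
    for _ in range(max_iter):
        dx = f(x) / slope          # one cheap solve with the fixed operator
        x -= dx
        if abs(dx) < tol:
            return x
    raise RuntimeError("chord iteration did not converge")
```

Solving x^2 - 2 = 0 from x0 = 1 with the fixed slope 3 (rather than the true derivative 2x) still converges to sqrt(2); each step reuses the same "factorized" linear model, which is the source of the run-time saving in the circuit setting.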

An Energy Estimation Method for Asynchronous Circuits with Application to an Asynchronous Microprocessor [p. 640]
P. Pénzes and A. Martin

This paper presents a simulator operating on a logical representation of an asynchronous circuit that gives energy estimates within 10% of electrical (hspice) simulation. Our simulator is the first such tool in the literature specifically targeted to efficient energy estimation of QDI asynchronous circuits. As an application, we show how the simulator has been used to accurately estimate the energy consumption in different parts of an asynchronous MIPS R3000 microprocessor. This is the first energy breakdown of an asynchronous microprocessor in the literature.


6F: Hot Topic -- Deep Submicron Design and Timing Closure

Moderator/Organizer: R. Otten, TU Eindhoven, NL
Speakers: R. Camposano, Synopsys, US; P. Groeneveld, Magma Design Automation, US; R. Otten, TU Eindhoven, NL

Design Automation for Deepsubmicron: Present and Future [p. 650]

Advancing technology drives design technology and thus design automation (EDA). How to model interconnect and how to handle degradation of signal integrity and increasing power density are changing now, and have led to the integration of logic and layout synthesis. Aggressive gate sizing to control timing has become part of any modern back-end. From 0.13µ and down, chips will be more susceptible to breakdown during fabrication (antenna effect) or to wear-out over time (electromigration), and dealing with these issues will require careful planning. More integration of fast and accurate analysis with a complete design flow (chip planning, synthesis, placement and routing) will be needed, and still, advancing complexity will affect design and verification. Using hundreds of millions of devices effectively will be possible only by reusing pre-designed intellectual property (IP) effectively and by addressing system-level issues in EDA. In the long term only more radical changes will keep us on Moore's track: changes that ultimately will have us depart from the two-plus-dimensional confinement and lead to multiple active layers, and changes that will deeply affect the face of EDA altogether.


7A: Panel -- Reconfigurable SoC -- What Will it Look Like?

Organizer: D. Davis, Actel, US
Moderator: B. Lewis, Gartner/Dataquest, US
Panellists: I. Bolsens, Xilinx, US; B. Gupta, STMicroelectronics, US; R. Lauwereins, IMEC, BE; Y. Tanurhan, Actel Corporation, US; C. Wheddon, Quicksilver Technology, US

Reconfigurable SoC -- What Will it Look Like? [p. 660]

The argument against ASIC SoCs is that they have always taken too long and cost too much to design. As new process technologies come on line, the issue of inflexible, unyielding designs fixed in silicon becomes a serious concern. Without the flexibility of reconfigurable logic, will standard cell ASICs disappear and go the way of gate arrays? Will ASIC manufacturers lose their edge in providing intellectual value and become mere purveyors of square die area? The argument in favor of FPGAs is that they have always provided great design flexibility because they were configurable. The argument against FPGAs is that compared to ASICs they have always been larger, slower and more expensive. Will FPGAs ever become efficient enough to replace ASICs in volume production applications? ASSPs can be designed with partial reconfigurability. Will they become the norm? Or, will new reconfigurable logic cores change the SoC game completely? The answers to these questions will clearly impact system designers throughout the world and shape the future of the electronics industry. A panel of key industry executives each coming from a different area of the market with unique views will debate these highly controversial topics.


7B: Layout Aware Logic Synthesis

Congestion-Aware Logic Synthesis [p. 664]
D. Pandini, L. Pileggi, and A. Strojwas

In this era of Deep Sub-Micron (DSM) technologies, the impact of interconnects is becoming increasingly important as it relates to integrated circuit (IC) functionality and performance. In the traditional top-down IC design flow, interconnect effects are first taken into account during logic synthesis by way of wireload models. However, for technologies of 0.25µm and below, the wiring capacitance dominates the gate capacitance, and delay estimation based on fanout and design legacy statistics can be highly inaccurate. In addition, logic block size is no longer dictated solely by total cell area, and is often limited by wiring area resources. For these reasons, wiring congestion is an extremely important design factor, and should be taken into consideration at the earliest possible stages of the design flow. In this paper we propose a novel methodology to incorporate congestion minimization within logic synthesis, and present results for industrial circuits that validate our approach.

Layout Driven Decomposition with Congestion Consideration [p. 672]
T. Kutzschebauch and L. Stok

We present a novel algorithm that applies physical layout information during common subexpression extraction to improve wiring congestion and delay, resulting in improved design closure. As feature sizes decrease and chip sizes increase, the traditional separation of physical design and logic synthesis proves to be increasingly detrimental. Interconnect delay and wiring congestion, among the most critical objective functions to meet design closure, are not considered during logic synthesis. On the other hand, physical design is too deep in the design process to be able to significantly restructure the already technology mapped netlist. While this problem has been addressed previously, the existing solutions only apply simple synthesis transforms during physical design. Hence they are generally unable to reverse decisions made during logic restructuring which have a major negative impact on the circuit structure. In our novel approach, we propose a layout driven algorithm for the concurrent extraction of common subexpressions, one of the most important steps that affect the overall circuit structure, and consequently congestion and wire length during logic synthesis. In addition, we consider dependency relations between cube divisors to improve the extraction process. As a result, our layout driven decomposition algorithm combines logic synthesis and physical layout information to effectively decrease wire length and improve congestion for improved design closure.

Improving Placement under the Constant Delay Model [p. 677]
K. Sulimma, W. Kunz, I. Neumann, and L. van Ginneken

In this paper, we show that under the constant delay model the placement problem is equivalent to minimizing a weighted sum of wire lengths. The weights can be efficiently computed once in advance and still accurately reflect the circuit area throughout the placement process. The existence of an efficient and accurate cost function allows us to directly optimize circuit area. This leads to better results compared to heuristic edge weight estimates or optimization for secondary criteria such as wire length. We leverage this property to improve a recursive partitioning based tool flow. We achieve area savings of 27% for some circuits and 15% on average. The use of the constant delay model additionally enables timing closure without iterations.
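The claimed equivalence means the placer need only minimize a weighted sum of net lengths, with the weights fixed once in advance. A minimal sketch of such a cost function, using half-perimeter wirelength (the data layout here is hypothetical; the paper derives the per-net weights from the constant delay model rather than choosing them by hand):

```python
def weighted_wirelength(nets, weights, placement):
    """Weighted half-perimeter wirelength (HPWL).

    nets:      list of nets, each a list of cell names
    weights:   one precomputed weight per net (fixed during placement)
    placement: cell name -> (x, y) coordinates
    """
    total = 0.0
    for net, w in zip(nets, weights):
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        # Half-perimeter of the net's bounding box, scaled by its weight.
        total += w * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return total
```

Because the weights never change, this single cost function can drive the whole recursive-partitioning flow, which is what makes direct area optimization tractable.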

Crosstalk Alleviation for Dynamic PLAs [p. 683]
T. Tien, T. Tsai, and S. Chang

The dynamic PLA style has become popular in designing high-performance microprocessors because of its high speed and predictable routing delay. However, like all other dynamic circuits, dynamic PLAs suffer from the crosstalk noise problem. In this paper, we propose two techniques to alleviate crosstalk noise in dynamic PLAs. The first technique makes use of the fact that, depending on the ordering of product lines, some crosstalk does not cause errors at the outputs. A proper ordering can greatly reduce the number of lines affected by crosstalk noise. For those product lines which can still be affected by crosstalk, we attempt to reduce the parallel length by re-ordering the input and output lines. We have performed experiments on a large set of MCNC benchmark circuits. The results show that after re-ordering, 86.7% of product lines become crosstalk-immune and need not be considered for crosstalk prevention.


7C: Buffering and Tapering

Moderators: J. Lienig, Bosch, DE; F. Johannes, TU Munich, DE

Flip-Flop and Repeater Insertion for Early Interconnect Planning [p. 690]
R. Lu, G. Zhong, C. Koh, and K. Chao

We present a unified framework that considers flip-flop and repeater insertion and the placement of flip-flop/repeater blocks during RT or higher level design. We introduce the concept of independent feasible regions in which flip-flops and repeaters can be inserted in an interconnect to satisfy both delay and cycle time constraints. Experimental results show that, with flip-flop insertion, we greatly increase the ability of interconnects to meet timing constraints. Our results also show that it is necessary to perform interconnect optimization at early design steps as the optimization will have even greater impact on the chip layout as feature size continually scales down.

Congestion Estimation with Buffer Planning in Floorplan Design [p. 696]
W. Wong, C. Sham, and F. Young

In this paper, we study and implement a routability-driven floorplanner with buffer block planning. It evaluates the routability of a floorplan by computing the probability that a net will pass through each particular location of the floorplan, taking into account buffer locations and routing blockages. Experimental results show that our congestion model can better optimize the congestion and delay (by successful buffer insertions) of a circuit, with only a slight penalty in area.

Maze Routing with Buffer Insertion under Transition Time Constraints [p. 702]
L. Huang, M. Lai, D. Wong, and Y. Gao

In this paper, we address the problem of simultaneous routing and buffer insertion. Recently in [12, 22], the authors considered simultaneous maze routing and buffer insertion under the Elmore delay model. Their algorithms can take into account both routing obstacles and restrictions on buffer locations. It is well known, however, that Elmore delay is only a first-order approximation of signal delay and hence can be very inaccurate. Moreover, these algorithms cannot impose constraints on the transition times of the output signal waveform at the sink or at the buffers along the route. In this paper we extend the algorithm in [12] so that accurate delay models (e.g., transmission line model, delay look-up table from SPICE, etc.) can be used. We show that the problem of finding a minimum-delay buffered routing path can be formulated as a shortest path problem in a specially constructed weighted graph. By including only the vertices with qualifying transition times in the graph, we guarantee that all transition time constraints are satisfied. Our algorithm can be easily extended to handle buffer sizing and wire sizing. It can be applied iteratively to improve any given routing tree solution. Experimental results show that our algorithm performs well.
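The Elmore delay that these routing formulations move beyond is itself a one-line computation: each resistance is charged with all of the capacitance downstream of it. For an RC ladder (a textbook sketch with illustrative, normalized segment values, not the paper's delay models):

```python
def elmore_delay(res, caps):
    """Elmore delay of an RC ladder.

    res:  resistance of each wire segment, source to sink
    caps: capacitance hanging at the far node of each segment

    Delay = sum over segments of R_i * (total capacitance downstream
    of segment i, including its own node).
    """
    delay = 0.0
    downstream = sum(caps)
    for r, c in zip(res, caps):
        delay += r * downstream  # this R drives everything downstream
        downstream -= c          # the node's cap is now behind us
    return delay
```

A two-segment ladder with unit resistances and capacitances gives 1*2 + 1*1 = 3 time units; being a single weighted sum, it is cheap enough for inner routing loops, which is exactly why it is popular despite its first-order inaccuracy.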

Optimal Transistor Tapering for High-Speed CMOS Circuits [p. 708]
L. Ding and P. Mazumder

Transistor tapering is a widely used technique for optimizing the geometries of CMOS transistors in high-performance circuit design, with a view to minimizing the delay of a FET network. Currently, in a long series-connected FET chain, the dimensions of the transistors are decreased from the bottom transistor to the top transistor, with the widths tapered either linearly or exponentially. However, it has not been mathematically proved whether either of these tapering schemes yields optimal results in terms of minimizing the switching delay of the network. In this paper, we rigorously analyze MOS circuits consisting of long FET chains under the widely used Elmore delay model and derive the optimality of transistor tapering by employing variational calculus. Specifically, we demonstrate that neither linear nor exponential tapering alone minimizes the discharge time of the FET chain. Instead, a composition of exponential and constant tapering actually optimizes the delay of the network. We have also corroborated our analytical results by performing extensive simulation of FET networks and showing that the analytical and simulation results are always consistent.


7D: Automatic Design Debug and TPG

Moderators: P. Teixeira, INESC-IST, PT; B. Straube, FhG IIS/EAS Dresden, DE

Incremental Diagnosis and Correction of Multiple Faults and Errors [p. 716]
A. Veneris, J. Liu, M. Amiri, and M. Abadir

An incremental simulation-based approach to fault diagnosis and logic debugging is presented. During each iteration of the algorithm, a single suspicious location is identified and fault-modeled such that the functionality of the new design becomes "closer" to its specification. The method is based on a simple and, at first glance, counter-intuitive theoretical result along with a number of heuristics which help avoid the exponential complexity inherent to these problems. Experiments on multiple design errors and multiple stuck-at faults confirm its effectiveness and accuracy, which scale well with an increasing number of errors.

Test Enrichment for Path Delay Faults Using Multiple Sets of Target Faults [p. 722]
I. Pomeranz and S. Reddy

Test sets for path delay faults in circuits with large numbers of paths are typically generated for path delay faults associated with the longest circuit paths. We show that such test sets may not detect faults associated with the next-to-longest paths. This may lead to undetected failures since shorter paths may fail without any of the longest paths failing. In addition, paths that appear to be shorter may actually be longer than the longest paths if the procedure used for estimating path length is inaccurate. We propose a test enrichment procedure that increases significantly the number of faults associated with the next-to-longest paths that are detected by a (compact) test set. This is achieved by allowing the underlying test generation procedure the flexibility of detecting or not detecting the faults associated with the next-to-longest paths. Faults associated with next-to-longest paths are detected without increasing the number of tests beyond that required to detect the faults associated with the longest paths. The proposed procedure thus improves the quality of the test set without increasing its size.

FACTOR: A Hierarchical Methodology for Functional Test Generation and Testability Analysis [p. 730]
V. Vedula and J. Abraham

This paper develops an improved approach for hierarchical functional test generation for complex chips. In order to deal with the increasing complexity of functional test generation, hierarchical approaches have been suggested wherein functional constraints are extracted for each module under test (MUT) within a design. These constraints describe a simplified ATPG view for the MUT and thereby speed up the test generation process. This paper develops an improved approach which applies this technique at deeper levels of hierarchy, so that effective tests can be developed for large designs with complex submodules. A tool called FACTOR (FunctionAl ConsTraint extractOR), which implements this methodology is described in this work. Results on the ARM design prove the effectiveness of FACTOR-ising large designs for test generation and testability analysis.


7E: Object Oriented System Specification and Design

Moderators: W. Grass, Passau U, DE; E. Villar, Cantabria U, ES

An Environment for Dynamic Component Composition for Efficient Co-Design [p. 736]
F. Doucet, S. Shukla, R. Gupta, and M. Otsuka

This article describes the Balboa component integration environment, which is composed of three parts: a script language interpreter, compiled C++ components, and a set of Split-Level Interfaces to link the interpreted domain to the compiled domain. The environment applies the notion of split-level programming to relieve system engineers of software engineering concerns and to let them focus on system architecture. The script language is a Component Integration Language because it implements a component model with introspection and loose typing capabilities. Component wrappers use split-level interfaces that implement the composition rules, dynamic type determination and type inference algorithms. The split-level interfaces are generated automatically by an interface description language compiler. The contribution of this work is twofold: an active code generation technique, and a three-layer environment that keeps the C++ components intact for reuse. We present an overview of the environment, demonstrate our approach by building three simulation models for an adaptive memory controller, and comment on code generation ratios.

Functional Verification for SystemC Descriptions Using Constraint Solving [p. 744]
F. Ferrandi, M. Rendine, and D. Sciuto

This paper addresses the problem of test vector generation starting from a high-level description of the system under test, specified in SystemC. The verification method considered is based upon the simulation of input sequences. The system model adopted is the classical Finite State Machine model. According to different strategies, a set of sequences can then be obtained, where a sequence is an ordered set of transitions. For each of these sequences, a set of constraints is extracted. Test sequences can be obtained by generating and solving the constraints using a constraint solver (GProlog). A solution of the constraint solver yields the values of the input signals for which a sequence of transitions in the FSM is executed. If the constraints cannot be solved, the corresponding sequence cannot be executed by any test. The presented algorithm is not based on a specific fault model, but aims at reaching the highest possible path coverage.

The Modelling of Embedded Systems Using HASoC [p. 752]
M. Edwards and P. Green

We present a design method (HASoC) for the lifecycle modelling of embedded systems that are targeted primarily, but not necessarily, at SoC implementations. The object-oriented development technique is based on our experiences of using an existing modelling technique (MOOSE) and supports a lifecycle that explicitly separates the behaviour of a system from its hardware and software implementation technologies. The design process, which uses a UML-RT-based notation, begins with the incremental development and validation of an executable model of a system. This model is then partitioned into hardware and software to create a committed model, which is mapped onto a system platform. The methodology emphasises the reuse of preexisting hardware and software platforms to ease the development process. An example application is presented in order to illustrate the main concepts in HASoC.

A Functional Specification Notation for Co-Design of Mixed Analog-Digital Systems [p. 760]
A. Doboli and R. Vemuri

This paper discusses aBlox -- a specification notation for high-level synthesis of mixed-signal systems. aBlox addresses three important aspects of mixed-signal system specification: (1) description of functionality, (2) performance issues, and (3) expression of analog-digital interactions. The semantics of aBlox embeds the concepts and rules of a functional computational model, and uses a declarative style to denote performance elements. The paper shows some mixed-signal specifications that we developed in aBlox. Finally, we describe a high-level analog synthesis experiment that used aBlox specifications as inputs.


8A: Hot Topic -- UML: Using the Unified Modeling Language for Embedded System Specification

Moderator/Organizer: L. Lavagno, Politecnico di Torino, IT

The Real-Time UML Standard: Definition and Application [p. 770]
B. Selic

This very short paper describes the objectives, content, and usage of a real-time UML profile that has been standardized by the Object Management Group. This profile defines a common framework for describing the quantitative aspects of software systems. In addition, it provides specific facilities for analysing real-time systems for schedulability or performance.

UML for Embedded Systems Specification and Design: Motivation and Overview [p. 773]
G. Martin

The specification, design and implementation of embedded systems demand new approaches which go beyond traditional hardware-based notations such as HDLs. The growing dominance of software in embedded systems design requires a careful look at the latest methods for software specification and analysis. The development of the Unified Modeling Language (UML), and a number of extension proposals in the real-time domain, holds promise for the development of new design flows which move beyond static and traditional partitions of hardware and software. However, UML as currently defined lacks several key capabilities. In this paper, we survey the requirements for system-level design of embedded systems and give an overview of the required extensions to UML, which are dealt with in more detail in the related papers. In particular, we discuss how the notions of platform-based design intersect with a UML-based development approach.

A UML-Based Design Methodology for Real-Time and Embedded Systems [p. 776]
G. de Jong

The fast-growing complexity of today's real-time embedded systems necessitates new design methods and tools to face the problems of design, analysis, integration and validation of complex systems. We present a system-level design method for embedded real-time systems combining the informal strengths of UML with the formal strengths of SDL. We demonstrate our flow with the design example of a telecommunications application from the wireless or access domain, showing the applicability of the flow to both control- and data-dominated types of systems. Finally, we show how the application results and other end-user needs and requirements influenced the current UML 2.0 proposal with its support for real-time and embedded systems.


8B: Real-Time Embedded Systems

Moderators: Z. Peng, Linköping U, SE; J. Sifakis, VERIMAG, FR

Minimum Energy Fixed-Priority Scheduling for Variable Voltage Processor [p. 782]
G. Quan and X. Hu

To fully exploit the benefit of variable voltage processors, voltage schedules must be designed in the context of the workload requirement. In this paper, we present an approach to finding the least-energy voltage schedule for executing real-time jobs on such a processor according to a fixed-priority, preemptive policy. The significance of our approach is that it establishes the theoretical limit on energy saving for such systems, which can thus serve as the standard against which to evaluate the performance of various heuristic approaches. Two algorithms for deriving the optimal voltage schedule are provided. The first one explores fundamental properties of voltage schedules, while the second builds on the first to further reduce the computational cost. Experimental results compare the results of this paper with previous ones.

A Dynamic Voltage Scaling Algorithm for Dynamic-Priority Hard Real-Time Systems Using Slack Time Analysis [p. 788]
W. Kim, J. Kim, and S. Min

Dynamic voltage scaling (DVS), which adjusts the clock speed and supply voltage dynamically, is an effective technique for reducing the energy consumption of embedded real-time systems. The energy efficiency of a DVS algorithm largely depends on the performance of the slack estimation method used in it. In this paper, we propose a novel DVS algorithm for periodic hard real-time tasks based on an improved slack estimation algorithm. Unlike existing techniques, the proposed method takes full advantage of the periodic characteristics of real-time tasks under priority-driven scheduling such as EDF. Experimental results show that the proposed algorithm reduces the energy consumption by 20-40% over the existing DVS algorithm. The results also show that our algorithm, based on the improved slack estimation method, gives energy savings comparable to a DVS algorithm based on the theoretically optimal (but impractical) slack estimation method.
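The lever behind any DVS scheme is that dynamic energy scales roughly with V^2 (and V tracks clock frequency), so running "just fast enough" to consume the estimated slack saves energy. A minimal speed-setting rule might look like this (an illustrative sketch only, not the paper's slack estimation algorithm; the function name and the linear work-over-deadline rule are assumptions):

```python
def dvs_speed(remaining_cycles, time_to_deadline, f_max):
    """Pick the lowest normalized clock speed that still meets the deadline.

    remaining_cycles: estimated work left in the current job
    time_to_deadline: wall-clock time until its deadline (same time unit)
    f_max:            the processor's maximum speed (upper clamp)

    Slack estimation feeds this rule: the more slack is discovered,
    the larger time_to_deadline becomes and the slower (and more
    energy-efficient) the chosen speed.
    """
    f = remaining_cycles / time_to_deadline
    return min(max(f, 0.0), f_max)
```

With 50 cycles of work and 100 time units of slack the job runs at half speed; halving the speed (and, to first order, the voltage) cuts dynamic energy per cycle by roughly a factor of four, which is where the reported 20-40% savings come from.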

Extending Synchronous Languages for Generating Abstract Real-Time Models [p. 795]
G. Logothetis and K. Schneider

We present an extension of synchronous programming languages that can be used to declare program locations irrelevant for verification. An efficient algorithm is proposed to generate from the output of the usual compilation an abstract real-time model by ignoring the irrelevant states, while retaining the quantitative information. Our technique directly generates a single real-time transition system, thus overcoming the known problem of composing several real-time models. A major application of this approach is the verification of real-time properties by symbolic model checking.


8C: Interconnect Modelling

Moderators: J. Phillips, Cadence Berkeley Labs, US; L. Silveira, IST/INESC, PT

An Interconnect-Aware Methodology for Analog and Mixed Signal Design, Based on High Bandwidth (Over 40 GHz) On-Chip Transmission Line Approach [p. 804]
D. Goren, M. Zelikson, T. Galambos, R. Gordin, B. Livshitz, A. Amir, A. Sherman, and I. Wagner

This paper presents an on-chip, interconnect-aware methodology for high-speed analog and mixed signal (AMS) design which enables early incorporation of on-chip transmission line (T-line) components into AMS design flow. The proposed solution is based on a set of parameterized T-line structures, which include single and two coupled microstrip lines with optional side shielding, accompanied by compact true transient models. The models account for frequency dependent skin and proximity effects, while maintaining passivity requirements due to their pure RLC nature. The signal bandwidth supported by the models covers a range from DC to 100 GHz. The models are currently verified in terms of S-parameter data against hardware (up to 40 GHz) and against EM solver (up to 100 GHz). This methodology has already been used for several designs implemented in SiGe (Silicon-Germanium) BiCMOS technology.

Closed-Form Crosstalk Noise Metrics for Physical Design Applications [p. 812]
L. Chen and M. Marek-Sadowska

In this paper we present efficient closed-form formulas to estimate capacitive coupling-induced crosstalk noise for distributed RC coupling trees. The efficiency of our approach stems from the fact that only five basic operations are used in the expressions: addition (x + y), subtraction (x − y), multiplication (x × y), division (x/y) and square root (√x). The formulas require no exponent computation or numerical iteration. We have developed closed-form expressions for the peak crosstalk noise amplitude, the time at which the peak noise occurs, and the width of the noise waveform. Our approximations are conservative and yet achieve acceptable accuracy. The formulas are simple enough to be used in the inner loops of performance optimization algorithms or as cost functions to guide routers. They capture the influence of coupling direction (near-end and far-end coupling) and coupling location (near-driver and near-receiver).
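To give a flavor of closed-form noise metrics built from basic arithmetic only (this is a textbook lumped one-pole sketch, an assumption for illustration; the paper's distributed RC-tree formulas are not reproduced here):

```python
def xtalk_peak(vdd, cc, cg, r_victim, t_rise):
    """Two classic closed-form peak-noise estimates for a lumped victim.

    charge_sharing: fast-aggressor capacitive divider Cc / (Cc + Cg)
    rc_bound:       slow-ramp estimate v ~ R * Cc * (Vdd / t_rise)
    Only +, *, / are used, in the spirit of closed-form metrics.
    All parameters here are hypothetical lumped quantities.
    """
    charge_sharing = vdd * cc / (cc + cg)
    rc_bound = r_victim * cc * vdd / t_rise
    # Each estimate is conservative in its own regime, so taking
    # the smaller of the two stays conservative overall.
    return min(charge_sharing, rc_bound)
```

Such expressions are cheap enough to sit in the inner loop of a router's cost function, which is exactly the use case the abstract targets.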

Formulation of Low-Order Dominant Poles for Y-Matrix of Interconnects [p. 820]
Q. Xu and P. Mazumder

This paper presents an efficient approach to computing the dominant poles of the reduced-order admittance (Y-parameter) matrix of lossy interconnects. Using a global approximation technique, an efficient framework is constructed to transform the frequency-domain Telegrapher's equations into compact linear algebraic equations. The dominant poles and residues can be extracted by directly solving the linear equations, and closed-form formulas are derived to compute the low-order dominant poles. Owing to the high accuracy of the global approximation, the extracted poles accurately represent the exact admittance matrices over a wide frequency range. Using the recursive convolution technique, the pole-residue models can be represented by companion models whose cost is linear in computation time. The presented modeling approaches are shown to preserve passivity. Numerical experiments on transient simulation show that they achieve higher efficiency while maintaining comparable accuracy.

Library Compatible Ceff for Gate-Level Timing [p. 826]
B. Sheehan

Accurate gate-level static timing analysis in the presence of RC loads has become an important problem for modern deep-submicron designs. Loads that are not purely capacitive are usually analyzed using the concept of an effective capacitance, Ceff. Most published algorithms for Ceff, however, require special cell characterization or supplemental information that is not part of standard timing libraries. In this paper we present a novel Ceff algorithm that is strictly compatible with existing timing libraries. It is also fast, easily implemented, and quite accurate--within 3% of transistor-level simulation in our tests. The method is based on approximating a gate by a current source, estimating the delay difference when the gate drives the actual RC load versus a reference capacitor, and then converting the delay discrepancy into a Ceff value. Central to carrying out this program is the innovative concept of a delay correction transfer function.


8D: On-Line Testing and Fault Tolerance

Moderators: L. Bouzaida, STMicroelectronics, FR; A. Singh, Auburn U, US

Self-Checking Scheme for the On-Line Testing of Power Supply Noise [p. 832]
C. Metra, L. Schiano, B. Riccò, and M. Favalli

We propose a self-checking scheme for the on-line testing of power supply noise exceeding a tolerance bound chosen according to the system's constraints. Upon the occurrence of such noise, our scheme produces an output error message, which can be exploited for diagnosis or to recover from the detected noise (thus guaranteeing the system's correct operation). To the best of our knowledge, no on-line testing scheme for power supply noise has been proposed so far. Our scheme has a negligible impact on system performance, features self-checking ability with respect to a wide set of possible internal faults, and continues to reveal power supply noise on-line even when noise also affects the ground.
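At the behavioral level, the monitoring idea reduces to a threshold comparison against the chosen tolerance bound. A minimal sketch (hypothetical names; the actual scheme is a self-checking circuit, not software):

```python
def noise_flags(v_samples, v_nominal, bound):
    """Flag every supply-voltage sample whose deviation from the
    nominal value exceeds the tolerance bound chosen from the
    system's constraints. Returns one error flag per sample."""
    return [abs(v - v_nominal) > bound for v in v_samples]
```

The flags correspond to the scheme's output error message, which downstream logic can use for diagnosis or recovery.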

Automatic Modifications of High Level VHDL Descriptions for Fault Detection or Tolerance [p. 837]
R. Leveugle

The need for integrated mechanisms providing on-line error detection or fault tolerance is becoming a major concern due to the increasing sensitivity of the circuits to their environment. This paper reports on a tool automating the implementation of such mechanisms by modifying high-level VHDL descriptions. The modifications are compatible with industrial design flows based on commercial synthesis and simulation tools. The results demonstrate the feasibility and the efficiency of the approach.

Exploiting Idle Cycles for Algorithm Level Re-Computing [p. 842]
K. Wu and R. Karri

Although algorithm-level re-computing techniques can trade off the detection capability of Concurrent Error Detection (CED) against time overhead, they incur a 100% time overhead when the strongest CED capability is required. Using idle cycles in the datapath for the re-computation can reduce this overhead. However, dependencies between operations prevent the re-computation from fully utilizing the idle cycles. Deliberately breaking some of these data dependencies can further reduce the time overhead associated with algorithm-level re-computing.

New Techniques for Speeding-Up Fault-Injection Campaigns [p. 847]
L. Berrojo, I. Gónzález, F. Corno, M. Sonza Reorda, G. Squillero, L. Entrena, and C. López

Fault-tolerant circuits are currently required in several major application sectors, and a new generation of CAD tools is needed to automate the insertion and validation of fault-tolerance mechanisms. This paper outlines the characteristics of a new fault-injection platform and its evaluation in a real industrial environment. It also details the techniques devised and implemented within the platform to speed up fault-injection campaigns. Experimental results are provided, showing the effects of the different techniques and demonstrating that they reduce the total time required by fault-injection campaigns by at least one order of magnitude.


8E: Design Space Evaluation

Moderators: J. Teich, Paderborn U, DE; W. Kruijtzer, Philips Research, NL

System Design for Flexibility [p. 854]
C. Haubelt, J. Teich, K. Richter, and R. Ernst

With the term flexibility, we introduce a new design dimension of an embedded system that quantitatively characterizes its feasibility in implementing not just one, but possibly several alternative behaviors. This is important when designing systems that may adapt their behavior during operation, e.g., due to new environmental conditions, or when dimensioning a platform-based system that must implement a set of different behaviors. A hierarchical graph model is introduced that allows the flexibility and cost of a system to be modeled formally. Based on this model, an efficient exploration algorithm is proposed to find the optimal flexibility/cost trade-off curve of a system, using the design of a family of set-top boxes as an example.
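The core of any such trade-off exploration is keeping only the Pareto-optimal design points. A generic sketch (hypothetical function, not the paper's hierarchical exploration algorithm), where lower cost and higher flexibility are both better:

```python
def pareto_front(points):
    """Keep the (cost, flexibility) points not dominated by any other.

    A point is dominated if some other point has cost <= its cost and
    flexibility >= its flexibility, with at least one comparison strict.
    The survivors form the flexibility/cost trade-off curve.
    """
    front = []
    for c, f in points:
        dominated = any(c2 <= c and f2 >= f and (c2, f2) != (c, f)
                        for c2, f2 in points)
        if not dominated:
            front.append((c, f))
    return sorted(front)

# Four candidate designs: (3, 2) is dominated by (2, 3) and drops out.
curve = pareto_front([(1, 1), (2, 3), (3, 2), (4, 4)])
```

An efficient explorer avoids enumerating all points up front, but the dominance test that defines the curve is the same.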

Accurate Area and Delay Estimators for FPGAs [p. 862]
A. Nayak, M. Haldar, A. Choudhary, and P. Banerjee

We present an area and delay estimator in the context of a compiler that takes high-level signal and image processing applications described in MATLAB and performs automatic design space exploration to synthesize hardware for a Field Programmable Gate Array (FPGA) that meets the user's area and frequency specifications. The area estimator predicts the maximum number of Configurable Logic Blocks (CLBs) consumed by the hardware synthesized for the Xilinx XC4010 from the input MATLAB algorithm. The delay estimator determines both the delay through the logic elements on the critical path and the delay in the interconnects. The predicted number of CLBs is within 16% of the actual CLB consumption, and the estimated frequency is within 13% of the actual frequency after synthesis with the Synplify logic synthesis tools and placement and routing with the XACT tools from Xilinx. Since the proposed estimators are fast and accurate, they can be used in a high-level synthesis framework like ours to perform rapid design space exploration.

A Powerful System Design Methodology Combining OCAPI and Handel-C for Concept Engineering [p. 870]
K. Buchenrieder, A. Pyttel, and A. Sedlmeier

In this paper, we present an efficient methodology to validate high-performance algorithms and prototype them using reconfigurable hardware. We follow a strict top-down hardware/software codesign paradigm using stepwise refinement techniques. Starting from a performance evaluation at the data-flow level using the OCAPI system, we partition the simulated high-level data-flow description into hardware and software modules. The hardware parts, described in Handel-C, are compiled and mapped to Xilinx Virtex 2000E FPGAs, and the software is executed on a PC processor that hosts the Virtex boards. Hardware/software interfacing and communication between the processor and the FPGA is established via the PCI bus using shared-memory DMA transfers. This paper presents the methodology and illustrates it with the example of a channel coder.

Automated Concurrency Re-Assignment in High Level System Models for Efficient System-Level Simulation [p. 875]
N. Savoiu, S. Shukla, and R. Gupta

Simple and powerful modeling of concurrency and reactivity, together with their efficient implementation in the simulation kernel, is crucial to the overall usefulness of system-level models in C++-based modeling frameworks. In most of these frameworks, however, concurrency is naturally aligned with hardware units: the language constructs encourage system designers to provide threads for the modules or units of the model. Our experimental analysis shows that this concurrency model leads to inefficient simulation, whereas aligning concurrency along the dataflow gives much better simulation performance but changes the conceptual model of the hardware structures. We therefore propose an algorithmic transformation, provided as a compiler front-end, for designs written in these C++-based environments with module-aligned concurrency. The transformation re-assigns concurrency along the dataflow, as opposed to threading along concurrent hardware/software modules, while keeping the functionality of the model unchanged. Such a front-end strategy relieves hardware system designers from software engineering concerns such as threading architecture and simulation performance, allowing them to design in the most natural manner, while simulation performance improves by up to a factor of almost two in our experiments.


9A: Hot Topic -- From System Specification to Layout: Seamless Top-Down Design Methods for Analogue and Mixed Signal Applications

Moderators/Organizers: I. Rugen-Herzig, Infineon Technologies, DE; R. Sommer, Infineon Technologies, DE

From System Specification To Layout: Seamless Top-Down Design Methods for Analog and Mixed-Signal Applications [p. 884]
R. Sommer, I. Rugen-Herzig, E. Hennig, U. Gatti, P. Malcovati, F. Maloberti, K. Einwich, C. Clauss, P. Schwarz, and G. Noessing

Design automation for analog/mixed-signal (A/MS) circuits and systems still lags behind what has been achieved in the digital domain. As System-on-Chip (SoC) designs include analog components in most cases, these analog parts become even more of a bottleneck in the overall design process. The paper is dedicated to the latest R&D activities within the MEDEA+ project ANASTASIA+. The main focus is the development of seamless top-down design methods for integrated analog and mixed-signal systems, and the achievement of a high level of automation and reuse in the A/MS design process. These efforts are motivated by the urgent need to close the current gap in the industrial design flow between system specification and design on the one hand and block-level circuit design on the other. The paper focuses on three subtopics, starting with the top-down design flow with applications in circuit sizing, design centering, and automated behavioral modeling. The next part focuses on modeling and simulation of specific functionalities in sigma-delta design, while the last section is dedicated to a mixed-signal System-on-Chip design environment.


9B: Architectural Level Synthesis

Moderators: P. Eles, Linköping U, SE; B. Mesman, Philips/TU Eindhoven, NL

Memory System Connectivity Exploration [p. 894]
P. Grun, N. Dutt, and A. Nicolau

In programmable embedded systems, the memory subsystem represents a major cost, performance and power bottleneck. To optimize the system for such different goals, the designer would like to perform design space exploration, evaluating different memory modules from a memory IP library and selecting the most promising designs. However, while the memory modules are important, the rate at which the memory system can produce data for the CPU is significantly impacted by the connectivity architecture between the memory subsystem and the CPU. Thus, it is critical to consider the connectivity architecture early in the design flow, in conjunction with the memory architecture. We present a connectivity architecture exploration approach that evaluates connectivity architectures over a wide range of cost, performance, and energy characteristics. When coupled with our memory module exploration approach, it can significantly improve system behavior. We present experiments on a set of large real-life benchmarks, showing significant performance improvements for varied cost and power characteristics, allowing the designer to tailor the performance, cost and power of the programmable embedded system.

Performance-Area Trade-Off of Address Generators for Address Decoder-Decoupled Memory [p. 902]
S. Hettiaratchi, P. Cheung, and T. Clarke

Multimedia applications are characterized by a large number of data accesses and complex array index manipulations. The built-in address decoder in the RAM memory model commonly used by most memory synthesis tools unnecessarily restricts the freedom of address generator synthesis. Therefore, a memory model in which the address decoder is decoupled from the memory cell array is proposed. In order to demonstrate the benefits and limitations of this alternative memory model, synthesis results for a shift-register-based address generator that does not require address decoding are compared to those for a counter-based address generator that does. Results show that delay can be nearly halved at the expense of increased area.
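A minimal behavioral sketch of why a shift-register address generator needs no decoder (hypothetical code, not the paper's synthesized design): a circular one-hot register can drive memory word lines directly, selecting one word per clock, at the cost of one register bit per word.

```python
def one_hot_address_sequence(n_words, n_accesses):
    """Simulate a circular one-hot shift register driving word lines.

    Each clock, the single 1 moves to the next position, so a
    sequential access pattern needs no binary-to-word-line decoder.
    Returns the index of the word selected on each cycle.
    """
    state = [1] + [0] * (n_words - 1)   # one-hot: word 0 selected
    sequence = []
    for _ in range(n_accesses):
        sequence.append(state.index(1))  # word line currently high
        state = state[-1:] + state[:-1]  # circular shift by one
    return sequence
```

The counter-based alternative would instead hold a binary address and require a decoder stage between counter and cell array, which is the delay the paper's results attack.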

Multiple-Precision Circuits Allocation Independent of Data-Objects Length [p. 909]
M. Molina, J. Mendias, and R. Hermida

This paper presents a heuristic method to solve the combined resource selection and binding problems in the high-level synthesis of multiple-precision specifications. Traditionally, the number of functional (and storage) units in a datapath is determined by the maximum number of operations scheduled in the same cycle, with their respective widths set by the bit-widths of the widest operations. When these wider operations are not scheduled in such a "busy" cycle, this approach can waste considerable area. To overcome this problem, we propose selecting the set of resources according to the only truly relevant quantity: the maximum number of bits calculated and stored simultaneously in a cycle. The implementation obtained is a multiple-precision datapath in which the number and widths of the resources are independent of the specification operations and data objects.


9C: Advanced Linear Modelling Techniques

Moderators: P. Feldmann, Celight Inc, US; G. Vandersteen, IMEC, BE

Efficient Model Reduction of Linear Time-Varying Systems via Compressed Transient System Function [p. 916]
E. Gad and M. Nakhla

This paper presents a new approach to model-order reduction of linear time-varying systems based on expanding the time-varying system in the right half of the s-plane. The proposed algorithm is developed by introducing Krylov-subspace-based reduction to time-varying transfer functions. The algorithm does not require the solution of large systems of equations to construct a basis for the time-varying moments. Instead, it computes such a basis through time-domain integration of the corresponding linear time-varying differential-algebraic equations. Numerical experiments show that expanding in the right half-plane compresses the transient phase of the response of these equations by several orders of magnitude.

Passive Constrained Rational Approximation Algorithm Using Nevanlinna-Pick Interpolation [p. 923]
C. Coelho, L. Silveira, and J. Phillips

As system integration evolves and tighter design constraints must be met, it becomes necessary to account for the non-ideal behavior of all the elements in a system. For high-speed digital and microwave systems, it is increasingly important to model previously neglected frequency-domain effects. In this paper, results from Nevanlinna-Pick interpolation theory are used to develop a bounded-real matrix rational approximation algorithm. A method is presented that allows for the generation of guaranteed-passive rational function models of passive systems by approximating their scattering parameter matrices. Since the order of the models may in some cases be high, an incremental fitting strategy is also proposed that allows for the generation of smaller models while still meeting the passivity and accuracy requirements. Results of applying the proposed method to several real-world examples are also shown.

Model Reduction in the Time-Domain Using Laguerre Polynomials and Krylov Methods [p. 931]
Y. Chen, V. Balakrishnan, C. Koh, and K. Roy

We present a new passive model reduction algorithm based on the Laguerre expansion of the time response of interconnect networks. We derive expressions for the Laguerre coefficient matrices that minimize a weighted square of the approximation error, and show how these matrices can be computed efficiently using Krylov subspace methods. We discuss the connections between our method and other methods such as PRIMA [4]. Numerical simulations show that our method approximates the original model better than PRIMA.


9D: Memory Testing and ATPG Issues

Moderators: H. Obermeir, Infineon Technologies, DE; M. Sonza Reorda, Politecnico di Torino, IT

An Optimal Algorithm for the Automatic Generation of March Tests [p. 938]
A. Benso, S. Di Carlo, G. Di Natale, and P. Prinetto

This paper presents an innovative algorithm for the automatic generation of March Tests. The proposed approach is able to generate an optimal March Test for an unconstrained set of memory faults in very low computation time.

Minimal Test for Coupling Faults in Word-Oriented Memories [p. 944]
A. van de Goor, M. Abadir, and A. Carlin

Most industrial memories have an external word-width of more than one bit. However, most published memory test algorithms assume 1-bit memories; they will not detect coupling faults between the cells of a word. This paper improves upon the state of the art in testing word-oriented memories by presenting a new method for detecting state coupling faults between cells of the same word, based on the use of m-out-of-n codes. The result is a reduction in test time, which varies between 20% and 30%.
Keywords: State coupling faults, word-oriented memories, tests, data backgrounds, m-out-of-n codes.
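To illustrate the m-out-of-n idea in general terms (a generic sketch, not the paper's specific background sets): the data backgrounds are the n-bit words containing exactly m ones, and pairs of such backgrounds exercise differing cell states within a word.

```python
from itertools import combinations

def m_out_of_n_backgrounds(m, n):
    """All n-bit data backgrounds with exactly m ones (an m-out-of-n
    code), returned as bit strings. Writing these words into a
    word-oriented memory exercises differing states between cells
    of the same word."""
    words = []
    for ones in combinations(range(n), m):
        bits = ['0'] * n
        for i in ones:
            bits[i] = '1'
        words.append(''.join(bits))
    return words

# For an 8-bit word, the 2-out-of-8 code yields C(8, 2) = 28 backgrounds.
backgrounds = m_out_of_n_backgrounds(2, 8)
```

A practical test would of course use only a carefully chosen subset of these words; choosing that subset to minimize test time is what the paper contributes.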

Maximizing Impossibilities for Untestable Fault Identification [p. 949]
M. Hsiao

This paper presents a new fault-independent method for maximizing local conflicting value assignments for the purpose of untestable fault identification. The technique first computes a large number of logic implications across multiple time-frames and stores them in an implication graph. Then, by maximizing conflicting scenarios in the circuit, the algorithm identifies a large number of untestable faults that require such impossibilities. The proposed approach identifies impossible combinations locally around each Boolean gate in the circuit, so its complexity is linear in the number of nodes, resulting in short execution times. Experimental results for both combinational and sequential benchmark circuits show that many more untestable faults can be identified efficiently with this approach.

Automated Modeling of Custom Digital Circuits for Test [p. 954]
S. Bose

Models meant for logic verification and simulation are often used for ATPG. For custom digital circuits, these models contain many tristate devices, which leads to lower fault coverage. Unlike other research in the literature, the modeling algorithms presented in this paper analyze each channel-connected component in the context of its environment, thereby capturing the relationships among its input signals. This reduces the number of tristates and increases the modeling efficiency, as measured by fault coverage. Experimental results demonstrate the superiority of this approach.


9E: Embedded Software Performance Analysis and Optimization

Moderators: H. Hsieh, UC Riverside, US; R. Lauwereins, IMEC, BE

False Path Elimination in Quasi-Static Scheduling [p. 964]
G. Arrigoni, L. Duchini, C. Passerone, L. Lavagno, and Y. Watanabe

We have developed a technique to compute a Quasi Static Schedule of a concurrent specification for the software partition of an embedded system. Previous work did not take into account correlations among run-time values of variables, and therefore tried to find a schedule for all possible outcomes of conditional expressions. This is advantageous on one hand, because by abstracting data values one can find schedules in many cases for an originally undecidable problem. On the other hand it may lead to exploring false paths, i.e., paths that can never happen at run-time due to constraints on how the variables are updated. This affects the applicability of the approach, because it leads to an explosion in the running time and the memory requirements of the compile-time scheduler itself. Even worse, it also leads to an increase in the final code size of the generated software. In this paper, we propose a semi-automatic algorithm to solve the problem of false paths: the designer identifies and tags critical expressions, and synchronization channels are automatically added to the specification to drive the search of a schedule. As a proof of concept, the proposed technique has been applied to a subsystem of an MPEG-2 decoder, and allowed us to find a schedule that previous techniques could not identify.

A Data Analysis Method for Software Performance Prediction [p. 971]
G. Bontempi and W. Kruijtzer

This paper explores the role of data analysis methods to support system-level designers in characterising the performance of embedded applications. In particular, we address the performance modelling of software applications running on an embedded microprocessor. We propose a data analysis method, which, on the basis of a parameterisation of the software functionality and the hardware architecture, is able to predict the number of execution cycles on an embedded processor. Experiments with standard computational code (sorting, mathematical computation) and with MPEG variable length decoding are presented to support this claim.

A Code Transformation-Based Methodology for Improving I-Cache Performance of DSP Applications [p. 977]
N. Liveris, N. Zervas, D. Soudris, and C. Goutis

This paper focuses on enhancing I-cache behaviour through the application of high-level code transformations. Specifically, a flow for the iterative application of I-cache performance-optimizing transformations is proposed. The procedure of applying transformations is driven by a set of analytical equations, which take parameters related to the code and the I-cache structure and predict the number of I-cache misses. Experimental results from a real-life demonstration application show that order-of-magnitude reductions in the number of I-cache misses can be achieved by applying the proposed methodology.

A Compiler-Based Approach for Improving Intra-Iteration Data Reuse [p. 984]
M. Kandemir

Intra-iteration data reuse occurs when multiple array references exhibit data reuse in a single loop iteration. An optimizing compiler can exploit this reuse by clustering (in the loop body) array references with data reuse as much as possible. This reduces the number of intervening references between references to the same array and improves overall execution time and energy consumption. In this paper, we present a strategy where inter-statement and intra-statement optimizations are used in concert to optimize intra-iteration data reuse. The objective is to cluster (within the loop body) the array references with spatial or temporal reuse. Using four array-intensive applications from the image processing domain, we show that our approach improves the cache behavior of programs by 13.8% on average.
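A toy illustration of the clustering objective (hypothetical code, not from the paper): counting the intervening references between repeated accesses in a loop-body reference trace shows how reordering statements shrinks the gap the cache must bridge.

```python
def max_reuse_distance(trace):
    """Largest number of intervening references between two
    consecutive accesses to the same array element in a trace
    of the loop body's references (a simple proxy for how well
    reuse is clustered)."""
    last_seen, worst = {}, 0
    for pos, ref in enumerate(trace):
        if ref in last_seen:
            worst = max(worst, pos - last_seen[ref] - 1)
        last_seen[ref] = pos
    return worst

# Original loop body: a[i] is used first and last -> 2 intervening refs.
before = ['a[i]', 'b[i]', 'c[i]', 'a[i]']
# After clustering the two a[i] references together -> 0 intervening refs.
after = ['a[i]', 'a[i]', 'b[i]', 'c[i]']
```

Inter-statement optimization reorders whole statements toward this goal, while intra-statement optimization rearranges references within one statement; the paper combines both.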


9G: Technical Plenary -- 40 Years of EDA

Moderator: A. Jerraya, TIMA, Grenoble, FR

European CAD from the 60's to the New Millennium [p. 992]
Joseph Borel, J.B.-R&D Consulting, FR

CAD has always been poorly understood by the CEOs of companies because it obeys rules (if any) very different from those of process technology. A rich variety of CAD and TCAD solutions was developed in Europe in the early days of the CAD industry. These solutions introduced real innovations in the field, but because they were mostly internal to the companies, they never reached the engineering level that would have enabled their introduction to the market. A review of CAD activity in Europe will be presented in this Plenary Session, together with some prospects on how it could evolve in the coming years and move beyond its lackluster industrial visibility.


10A: Hot Topic -- Design Technology for Networked Reconfigurable FPGA Platforms

Organizer/Moderator: I. Bolsens, Xilinx, US
Speakers: D. Verkest, IMEC, BE; S. Guccione, Xilinx, US; S. Singh, Xilinx, US

Design Technology for Networked Reconfigurable FPGA Platforms [p. 994]
S. Guccione, D. Verkest, and I. Bolsens

Future networked appliances should be able to download new services or upgrades from the network and execute them locally. This flexibility is typically achieved by processors that can download new software over the network, using Java technology. This paper demonstrates that FPGAs are a realistic implementation platform for thin server or client applications. FPGAs can offer the same end-user experience as software-based systems, combined with more computational power and lower cost.


10B: High-Level Synthesis and Asynchronous Pipelines

Moderators: N. Dutt, UC Irvine, US; M. Renaudin, TIMA, Grenoble, FR

High-Speed Non-Linear Asynchronous Pipelines [p. 1000]
R. Ozdag, P. Beerel, M. Singh, and S. Nowick

Many approaches recently proposed for high-speed asynchronous pipelines are applicable only to linear datapaths. However, real systems typically have non-linearities in their datapaths, i.e. stages may have multiple inputs ("joins") or multiple outputs ("forks"). This paper presents several new pipeline templates that extend existing high-speed approaches for linear dynamic logic pipelines, by providing efficient control structures that can accommodate forks and joins. In addition, constructs for conditional computation are also introduced. Timing analysis and SPICE simulations show that the performance overhead of these extensions is fairly low (5% to 20%).

Single-Track Asynchronous Pipeline Templates Using 1-of-N Encoding [p. 1008]
M. Ferretti and P. Beerel

This paper presents a new, fast, templatized family of fine-grain asynchronous pipeline stages based on the single-track protocol. No explicit control wires are required outside of the datapath, and the data is 1-of-N encoded. With a forward latency of 2 transitions and a cycle time of 6 for most configurations, the new family can run at 1.6 GHz in the MOSIS TSMC 0.25 µm process. This is significantly faster than all known quasi-delay-insensitive templates, with fewer timing assumptions than the recently proposed ultra-high-speed GasP bundled-data circuits.
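For readers unfamiliar with 1-of-N data encoding, a behavioral sketch (generic illustration only, unrelated to the circuit templates themselves): each symbol raises exactly one of N wires, so data validity is implicit in the encoding and no separate request wire is needed.

```python
def encode_1_of_n(value, n):
    """1-of-N (one-hot) encode: raise exactly wire `value` of n wires."""
    assert 0 <= value < n
    return [1 if i == value else 0 for i in range(n)]

def decode_1_of_n(wires):
    """Recover the value; a valid symbol has exactly one wire high,
    and an all-low bus means 'no data yet'."""
    assert wires.count(1) == 1
    return wires.index(1)
```

N = 4 (one group carrying two bits of data) is a common choice, trading wire count against transition count.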

Power-Manageable Scheduling Technique for Control Dominated High-Level Synthesis [p. 1016]
C. Chen and M. Sarrafzadeh

Optimizing power consumption at a high level is a critical step towards power-efficient digital system designs. This paper addresses the power management problem by scheduling a given control-dominated data flow graph. We discuss delay and power issues in scheduling and propose an improvement algorithm that inserts so-called soft edges, which enable power optimization under timing constraints. Power savings obtained by our approach on tested circuits range between 15% and 30% of the initial power dissipation.

Practical Instruction Set Design and Compiler Retargetability Using Static Resource Models [p. 1021]
Q. Zhao, B. Mesman, and T. Basten

The design of application(-domain)-specific instruction-set processors (ASIPs) optimized for code size has traditionally been accompanied by the necessity to program in assembly, at least for the performance-critical parts of the application. The highly encoded instruction sets simply lack the orthogonal structure present in, e.g., VLIW processors that allows efficient compilation. This lack of efficient compilation tools has also severely hampered the design space exploration of code-size-efficient instruction sets and, correspondingly, their tuning to the application domain. In [13], a practical method is demonstrated to model a broad class of highly encoded instruction sets in terms of virtual resources easily interpreted by classic resource-constrained schedulers (such as the popular list-scheduling algorithm), thereby allowing efficient compilation with well-understood compilation tools. In this paper we demonstrate the suitability of this model to also enable instruction set design space exploration with a simple, well-understood and proven method long used in the High-Level Synthesis (HLS) of ASICs. A small case study proves the practical applicability of the method.
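The resource-constrained scheduler at the heart of this approach can be sketched in a few lines (a classic cycle-by-cycle list scheduler with hypothetical names, not the paper's virtual-resource model): each cycle, issue as many ready operations as the resource budget allows.

```python
def list_schedule(ops, deps, n_slots):
    """Cycle-by-cycle resource-constrained list scheduling.

    ops:     operation names
    deps:    op -> list of ops that must finish in an earlier cycle
    n_slots: issue slots (the resource constraint) per cycle
    Assumes an acyclic dependency graph. Returns op -> cycle.
    """
    done, cycle, schedule = set(), 0, {}
    while len(done) < len(ops):
        # Ready = not yet scheduled and all predecessors finished.
        ready = [o for o in ops
                 if o not in done and all(d in done for d in deps.get(o, []))]
        issued = ready[:n_slots]          # greedy: fill the slots in order
        for o in issued:
            schedule[o] = cycle
        done.update(issued)               # enable dependents next cycle
        cycle += 1
    return schedule
```

In the modeled ASIP setting, the "slots" become virtual resources encoding instruction-format conflicts, so the same scheduler both compiles code and scores candidate instruction sets during exploration.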


10C: Coupling and Switching Noise Modelling within Integrated Circuits

Moderators: E. Sicard, INSA, FR; G. Vandenbosch, KU Leuven, BE

Hierarchical Simulation of Substrate Coupling in Mixed-Signal ICs Considering the Power Supply Network [p. 1028]
T. Brandtner and R. Weigel

This paper presents a novel substrate coupling simulation tool that is well suited to floorplanning of large mixed-signal IC designs. The IC layout may consist of several subcircuits, so the hierarchical design flow usually used for IC circuit design and layout is supported. Coupling data modelling the substrate inside subcircuits is precalculated and subsequently used during floorplanning, leading to shorter simulation times. In addition, the impedance model of the power grid is considered as well, making it possible to provide estimates of substrate coupling quickly after only one simulation step. The approach is verified by experimental results in 0.13µm CMOS and 0.25µm BiCMOS technologies.

Fast Method to Include Parasitic Coupling in Circuit Simulations [p. 1033]
B. Van Thielen and G. Vandenbosch

S-parameter-based circuit simulators are widely used for the design of microwave circuits. Their accuracy is limited by the fact that they do not take into account the electromagnetic coupling between the components and transmission lines that compose a circuit. In this article we present a technique that takes this coupling into account without increasing the calculation time too much.

Accurate Estimating Simultaneous Switching Noises by Using Application Specific Device Modeling [p. 1038]
L. Ding and P. Mazumder

In this paper, we study the simultaneous switching noise problem by using an application-specific modeling method. A simple yet accurate MOSFET model is proposed in order to derive closed-form formulas for simultaneous switching noise voltage waveforms. We first derive a simple formula assuming that the inductances are the only parasitics. Through HSPICE simulation, we show that the new formula is more accurate than previous results based on the same assumption. We then study the effect of the parasitic capacitances of ground bonding wires and pads. We show that the maximum simultaneous switching noise should be calculated using four different formulas depending on the value of the parasitic capacitances and the slope of the input signal. The proposed formulas, modeling both parasitic inductances and capacitances, are within 3% of HSPICE simulation results.
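For orientation, the inductance-only starting point of such derivations is the textbook first-order estimate V = n · L · dI/dt for n drivers switching through a shared bond-wire inductance L. This is our illustration of why the problem matters, not the paper's formulas (which additionally model capacitances and input slope); the numbers below are assumed, typical values.

```python
# First-order SSN estimate: n simultaneously switching drivers sharing one
# ground bond wire of inductance l_bond, each ramping current at di_dt.
def ssn_peak(n_drivers, l_bond, di_dt):
    return n_drivers * l_bond * di_dt       # volts

# 8 drivers, 5 nH bond wire, 10 mA swing in 100 ps per driver:
v = ssn_peak(8, 5e-9, 10e-3 / 100e-12)      # -> 4.0 V of ground bounce
```

Even these rough numbers produce volts of ground bounce, which is why the refined four-formula model in the paper is needed for accurate design margins.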

Macromodeling of Digital I/O Ports for System EMC Assessment [p. 1044]
I. Stievano, F. Canavero, I. Maio, Z. Chen, D. Becker, and G. Katopis

This paper addresses the development of accurate and efficient behavioral models of digital integrated circuit input and output ports for EMC and signal integrity simulations. A practical modeling process is proposed and applied to some example devices. The modeling process is simple and efficient, and it yields models performing at a very high accuracy level.


10D: Panel -- Formal Verification Techniques: Industrial Status and Perspectives

Organizer: I. Moussa, TNI-Valiosys, FR
Moderator: R. Pacalet, ENST Paris, FR
Panellists: J. Blasquez, Texas Instruments, Villeneuve-Loubet, FR; M. van Hulst, Philips, Eindhoven, NL;
A. Fedeli, STMicroelectronics, Agrate, IT; J. Lambert, TNI-Valiosys, FR; D. Borrione, TIMA-UJF, FR;
C. Hanoch, Verisity, FR; P. Bricaud, Mentor Graphics, FR

Formal Verification Techniques: Industrial Status and Perspectives [p. 1050]

Research in applied formal verification has become a hot topic in circuit and system design due to rising circuit complexity. Design verification presents the biggest bottleneck in digital hardware design. Major hardware bugs found in ASIC design may cause expensive project delays when they are discovered during system test on the real silicon chip. The consequences are severe, from cost overruns to lost market opportunity. Simulation and emulation tools, which are traditionally used to find bugs in a design, often cannot find the corner cases or hard-to-find bugs that may occur only after hundreds of thousands of cycles and thus lie well beyond the reach of these technologies. Formal methods have emerged as an alternative approach to ensure the quality and correctness of hardware designs, overcoming some of the limitations of traditional validation techniques such as simulation and testing.
However, industrial use of formal methods is still quite limited, owing to the difficulty of using many of the formal methods available today and the lack of integration between them. In order to provide insight into the scope and limitations of currently available formal verification techniques, this panel will address questions such as the following:
ASICs have been designed for more than twenty years without formal methods. Are formal methods really necessary? How can the research community convince designers to use formal methods? Is it easy to integrate them into a traditional design flow?
Not all domains seem suitable for formal methods. Is it possible to isolate those application domains that are best suited for formal methods?
Formal verification requires specially trained people who understand how to apply the mathematical techniques to verify the design. Is there a re-education requirement for the design community in order to benefit from these tools?
The panel will also examine the use of formal verification in the design of SOCs. The questions here are: Can formal methods be very effective for finding errors at high levels of abstraction, before a large design time is invested in implementing a flawed system architecture? Are verification tools ready for System-on-Chip design verification? Are they mature enough to give IP credibility and robustness?


10E: Power Optimization for Embedded Processors

Moderators: W. Fornaciari, Politecnico di Milano, IT; L. Lavagno, Politecnico di Torino, IT

Low Power Embedded Software Optimization Using Symbolic Algebra [p. 1052]
A. Peymandoust, T. Simunic, and G. De Micheli

The market demand for portable multimedia applications has exploded in recent years. Unfortunately, for such applications current compilers and software optimization methods often require designers to do part of the optimization manually. Specifically, the high-level arithmetic optimizations and the use of complex instructions are left to the designers' ingenuity. In this paper, we present a tool flow, SymSoft, that automates the optimization of power-intensive algorithmic constructs using symbolic algebra techniques combined with energy profiling. SymSoft is used to optimize and tune the algorithmic-level description of an MPEG Layer III (MP3) audio decoder for the SmartBadge [2] portable embedded system. We show that our tool lowers the number of instructions and memory accesses and thus lowers the system power consumption. The optimized MP3 audio decoder software meets real-time constraints on the SmartBadge system with low energy consumption. Furthermore, the performance improves by a factor of 7.27 and the energy consumption decreases by a factor of 4.45 over the original executable specification.

An Adaptive Dictionary Encoding Scheme for SOC Data Buses [p. 1059]
T. Lv, W. Wolf, J. Henkel, and H. Lekatsas

As bus lengths on multi-hundred-million transistor SOCs (Systems-On-a-Chip) grow and as inter-wire capacitances of sub-0.10µm technologies increase, the resulting high switching capacitances of buses (and interconnects in general) have a non-negligible impact on the power consumption of a whole SOC. In this paper, we address this problem by introducing our bus encoding technique 'ADES' that minimizes the power consumption of data buses through a dictionary-based encoding technique. We show that our technique saves between 18% and 40% of bus energy compared to the non-encoded cases using a large set of (freely accessible) real-world applications. Furthermore, we compare our technique to the best-known data bus encoding techniques to date, and it exceeds all of them in energy savings for the same set of applications. The additional hardware effort for our bus encoder/decoder is very small.
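The core idea of dictionary-based bus encoding can be sketched briefly. The version below is our toy illustration, not the ADES scheme itself: on a dictionary hit the wide data lines are frozen and only a few dedicated index lines toggle; on a miss the full word is driven. All line widths and the dictionary policy are assumptions.

```python
def transitions(prev, cur):
    return bin(prev ^ cur).count("1")       # bus lines that toggle

def encode_stream(words, dict_size=8):
    """Return total line toggles for a word stream under the toy encoding."""
    dictionary, prev_word, prev_idx, toggles = [], 0, 0, 0
    for w in words:
        if w in dictionary:
            idx = dictionary.index(w) + 1   # hit: only index lines switch
            toggles += transitions(prev_idx, idx)
            prev_idx = idx                  # wide data lines stay frozen
        else:
            toggles += transitions(prev_word, w)   # miss: drive full word
            toggles += transitions(prev_idx, 0)    # idx = 0 signals a miss
            prev_word, prev_idx = w, 0
            dictionary = ([w] + dictionary)[:dict_size]
    return toggles

# An alternating pattern is the favourable case: after two misses, every
# access is a dictionary hit and only 1-2 index lines toggle per transfer.
a, b = 0x00FF, 0xFF00
saved = encode_stream([a, b] * 4)           # -> 35 toggles vs 120 unencoded
```

Real schemes must also keep encoder and decoder dictionaries in lockstep and amortise the index lines, which is where the reported 18%-40% savings come from.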

Power Efficient Embedded Processor IP's through Application-Specific Tag Compression in Data Caches [p. 1065]
P. Petrov and A. Orailoglu

In this paper, we present a methodology for power minimization by data cache tag compression. The set of tags being accessed by the major application loops is analyzed statically at compile time, and an efficient and optimal compression scheme is proposed. Only a very limited number of tag bits are stored in the tag array for cache conflict identification, thus achieving a significant reduction in the number of active bitlines, sense amps, and comparator cells. The underlying hardware support for dynamically compressing the tags consists of a highly cost- and power-efficient programmable encoder, which lies outside the cache access path, thus not affecting the processor cycle time. A detailed VLSI implementation has been performed, and a number of experimental results on a set of embedded applications and numerical kernels are reported. Energy dissipation decreases of up to 95% can be observed for the tag arrays, while significant energy reductions in the range of 10%-50% are observed when amortized across the overall cache subsystem.

Systematic Power-Performance Trade-Off in MPEG-4 by Means of Selective Function Inlining Steered by Address Optimization Opportunities [p. 1072]
M. Palkovic, M. Miranda, and F. Catthoor

The hierarchical structure of real-life data-dominated applications limits the exploration space for high-level optimisations. This limitation is often overcome by function inlining. However, it increases the basic block code size, which causes a significant growth of instruction cache misses and thus a performance slow-down. This effect has been confirmed in experiments with our applications. We have developed a novel methodology for selective function inlining steered by a cost/gain balance to trade off power and performance. Although this results in a speed-up, the increase of instruction cache misses is still present, i.e. the memory power consumption is higher. This implies the possibility of Pareto-optimal trade-offs between memory power and performance. Our methodology is demonstrated on an MPEG-4 video decoder.


Poster Sessions

An Approach to Model Checking for Nonlinear Analog Systems [p. 1080]
W. Hartong, L. Hedrich, and E. Barke

We present the first approach to model checking for nonlinear analog systems. Based on digital CTL model checking ideas, results in hybrid model checking and special needs in analog verification, a new model checking tool has been implemented. Published model checking tools for hybrid systems require discrete or partly linear system descriptions. Our focus is on nonlinear analog behavior, therefore a new approach is necessary. There are mainly two aspects to be considered. Firstly, a discrete model retaining the essential nonlinear analog behavior has to be developed. Secondly, model checking for analog systems requires extensions of the language to define analog system properties in a reasonable way.

Speeding up SAT for EDA [p. 1081]
S. Pilarski and G. Hu

This paper presents performance results for a new SAT solver designed specifically for EDA applications. The new solver significantly outperforms the most efficient SAT solvers -- Chaff [2], SATO [3], and GRASP [1] -- on a large set of benchmarks. Performance improvements for standard benchmark groups vary from 1.5x to 60x. They were achieved through a new decision-making strategy and more efficient Boolean constraint propagation (BCP).

Search-Based SAT Using Zero-Suppressed BDDs [p. 1082]
F. Aloul, M. Mneimneh, and K. Sakallah

We introduce a new approach to Boolean satisfiability (SAT) that combines backtrack search techniques and zero-suppressed binary decision diagrams (ZBDDs). This approach implicitly represents SAT instances using ZBDDs, and performs search using an efficient implementation of unit propagation on the ZBDD structure. The adaptation of backtrack search algorithms to such an implicit representation allows for a potential exponential increase in the size of problems that can be handled.

An Encoding Technique for Low Power CMOS Implementations of Controllers [p. 1083]
M. Martínez, M. Avedillo, J. Quintana, M. Koegst, S. Rülke, and H. Süße

Power consumption is becoming one of the most critical parameters in VLSI design. In this paper we describe a novel state assignment algorithm targeting low-power CMOS realizations of controllers. The main features of the new approach can be summarized as follows: 1) a flexible column encoding strategy which allows handling the area and register activity cost functions separately, and 2) a preliminary analysis of the FSM to control the relative weight of each cost function. Experimental results show that on average there is a 25% reduction in power consumption compared to a standard tool, without area penalty.

Composition Trees in Finding Best Variable Orderings for ROBDDs [p. 1084]
E. Dubrova

The algorithms for static reordering of Reduced Ordered Binary Decision Diagrams (ROBDDs) rely on dependable properties for grouping of variables. Two such properties have been studied so far: keeping symmetric variables adjacent [1] and minimizing the ROBDD's width [2]. However, counterexamples have been found for both cases [1], [3]. In this paper, we introduce a new condition for grouping of variables, suggesting to keep adjacent the variables from all bound sets of the function, which are explicitly given by its composition tree. A bound set is a proper subset Y of the variables X of a function f : {0,1}^|X| -> {0,1} resulting in a decomposition of the type f(X) = g(h(Y), Z), where Z = X − Y. The composition tree T(f) of f is a structure reflecting all its non-overlapping bound sets [4]-[6]. A bound-set-preserving ordering π(X) of the variables of a ROBDD for f(X) is a vector describing the variables of X in order from top to bottom of the ROBDD, in which the variables of any node of T(f) are adjacent in π(X). For example, if a function f(x1, x2, x3) has a single non-trivial bound set {x1, x2}, then the orderings (x1, x2, x3), (x3, x1, x2), (x3, x2, x1) are bound-set-preserving ones, while the orderings (x1, x3, x2) and (x2, x3, x1) are not. A composition tree T(f) is unique for f (up to isotopy), and therefore any Boolean function has a unique set of bound-set-preserving orderings. We prove that the intersection of the set of bound-set-preserving orderings and the set of best orderings is non-empty for any Boolean function.
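The adjacency condition in the definition above is easy to check mechanically. The following is our own illustration of that check (not the paper's algorithm), applied to the abstract's worked example of a single bound set {x1, x2}.

```python
# An ordering is bound-set-preserving iff, for every bound set, its
# variables occupy a contiguous run of positions in the ordering.
def is_bound_set_preserving(ordering, bound_sets):
    pos = {v: i for i, v in enumerate(ordering)}
    for bs in bound_sets:
        idx = sorted(pos[v] for v in bs)
        if idx[-1] - idx[0] != len(bs) - 1:   # variables not contiguous
            return False
    return True

# The example from the abstract: f(x1, x2, x3) with bound set {x1, x2}.
bs = [{"x1", "x2"}]
assert is_bound_set_preserving(["x1", "x2", "x3"], bs)
assert is_bound_set_preserving(["x3", "x1", "x2"], bs)
assert is_bound_set_preserving(["x3", "x2", "x1"], bs)
assert not is_bound_set_preserving(["x1", "x3", "x2"], bs)
assert not is_bound_set_preserving(["x2", "x3", "x1"], bs)
```

For nested bound sets from a full composition tree the same check applies to every tree node, which is what constrains a reordering heuristic to the bound-set-preserving family.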

A Direct Mapping System for Datapath Module and FSM Implementation into LUT-Based FPGAs [p. 1085]
J. Abke and E. Barke

Today's high capacity Field-Programmable Gate Arrays (FPGAs) and the upcoming trend to System-On-Programmable-Chip (SOPC) require novel implementation strategies. These have to overcome long implementation times of traditional synthesis approaches. In this poster, a unique approach for technology mapping of both datapath modules and controller descriptions into Look-Up Table (LUT)-based FPGAs is presented. The proposed method starts at Register-Transfer-Level (RTL) and follows the Library of Parameterized Modules (LPM) standard. The mapping environment includes an implicit state minimization algorithm for FSMs.

Concurrent and Selective Logic Extraction with Timing Consideration [p. 1086]
P. Rezvani and M. Pedram

We study the problem of concurrent and selective logic extraction in a Boolean circuit. We first model the problem using graph theory, prove it to be NP-hard, and subsequently formulate it as a Maximum-Weight Independent Set (MWIS) problem in a graph. We then use efficient heuristics for solving the MWIS problem. Concurrent logic extraction not only allows us to achieve larger literal savings and smaller area due to a more global view of the extraction space, but also provides us with a framework for reducing the circuit delay.
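As a concrete illustration of the MWIS formulation, here is the simplest greedy heuristic: repeatedly take the heaviest surviving node and discard its neighbours. This is one of many possible heuristics and is not claimed to be the paper's; node names, weights, and the conflict edges are invented for the example.

```python
# Greedy Maximum-Weight Independent Set heuristic. Nodes are extraction
# candidates (weight = literal saving); an edge marks two candidates that
# conflict, i.e. cannot both be extracted.
def greedy_mwis(weights, edges):
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    chosen, alive = set(), set(weights)
    while alive:
        best = max(alive, key=lambda v: weights[v])  # heaviest survivor
        chosen.add(best)
        alive -= adj[best] | {best}                  # drop it + neighbours
    return chosen

# Candidate b conflicts with both a and c; greedy picks the heaviest, b.
assert greedy_mwis({"a": 3, "b": 5, "c": 4},
                   [("a", "b"), ("b", "c")]) == {"b"}
```

Note the example also shows why plain greedy is only a heuristic: {a, c} has total weight 7, beating b's 5, so better MWIS heuristics trade off a node's weight against what it blocks.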

Improved Technology Mapping for PAL-Based Devices Using a New Approach to Multi-Output Boolean Functions [p. 1087]
D. Kania

An effective technology mapping method for PAL-based devices is presented in this paper. The aim of this method is to cover a multiple-output function with a minimal number of PAL-based logic blocks. The product terms included in a logic block can be shared by several functions. Experimental results are compared to the classical technology mapping method.

Efficient and Effective Redundancy Removal for Million-Gate Circuits [p. 1088]
M. Berkelaar and K. van Eijk

Redundancy removal of combinational circuits has been the subject of many papers over the last decades. Most of these papers work with the relatively small circuits available as benchmarks in the logic synthesis community. In Magma's BlastFusion and BlastChip software, very large blocks of logic (millions of gates) are handled flat (BlastFusion and BlastChip are registered trademarks of Magma Design Automation). We implemented redundancy removal in a way that allows it to run efficiently (fast, low memory usage) and robustly (no run time or memory explosion on any netlist) on industrial designs of up to several million gates. We achieve this without resorting to partitioning. Unlike most published approaches, we do not try to identify all redundancies in a circuit, as an exact solution to this NP-hard problem is infeasible for the large circuits we face. Instead we try to identify as many as possible in a reasonable run time. We use a carefully engineered combination of Fault Collapsing, Random Test Generation (RTG) and the good old D-algorithm. As the goal is finding redundancies, and not sets of test vectors, these algorithms need changes and adaptations for optimal efficiency and robustness. Fault Collapsing can be more aggressive than for test generation. RTG was implemented with a novel dynamic control of the bit-parallelism employed. The D-algorithm's effort control was implemented not with a traditional backtrack limit, but on a more fine-grained level, to increase robustness. For details, please refer to [1]. Results on 11 industrial netlists are shown in Table 1. All tests were run on a Sun Ultra-80 workstation. A comparison is shown to a state-of-the-art SAT-based approach. Our approach is clearly faster while identifying more redundancies.

Visualization of Partial Order Models in VLSI Design Flow [p. 1089]
A. Bystrov, M. Koutny, and A. Yakovlev

A new method, algorithms and tool for the visualisation of a finite complete prefix (FCP) of a Petri net (PN) or a signal transition graph are presented. A transformation is defined that converts such a prefix into a two-level model. At the top level, it has a finite state machine (FSM) describing modes of operation and transitions between them. At the low level, there are marked graphs, which can be drawn as waveforms, embedded into the top-level nodes. The models of both levels are abstractions traditionally used by electronics engineers. The resultant model is complete trace equivalent to the original prefix. Moreover, the branching structure of the latter is preserved as much as possible.

High-Level Modeling and Design of Asynchronous Arbiters for On-Chip Communication Systems [p. 1090]
J. Rigaud, L. Fesquet, M. Renaudin, and J. Quartana

This poster presents the design of complex arbitration modules, like those required in SoC communication systems. Clock-less, delay-insensitive arbiters are studied with the aim of making the design of future GALS or GALA SoCs easier and more practical. This work focuses on high-level modeling and delay-insensitive implementations of low-power and reliable fixed- and dynamic-priority arbiters.

Power-Efficient Trace Caches [p. 1091]
J. Hu, N. Vijaykrishnan, M. Kandemir, and M. Irwin

The paper examines the power wasted when accessing an instruction cache that stores only static sequences of instructions. Although the trace cache was first introduced to capture the dynamic characteristics of instructions in execution, a conventional trace cache (CTC) does increase the power consumption of the fetch unit. A Sequential Trace Cache (STC) is investigated in this paper for its power efficiency.

Reducing Cache Access Energy in Array-Intensive Applications [p. 1092]
M. Kandemir and I. Kolcu

Cache memories are known to consume a large percentage of on-chip energy in current microprocessors. For example, [1] reports that the on-chip cache in the DEC Alpha 21264 consumes approximately 25% of the on-chip energy. Both the sizes and complexities of state-of-the-art caches play a major role in their energy consumption. Direct-mapped caches are, in general, more energy efficient (from a per-access energy consumption viewpoint) since they are simpler than set-associative caches and require no complex line replacement mechanisms (i.e., there is no decision concerning which line has to be evicted when a new line is to be loaded).
While there exists a large body of compiler-based techniques to manipulate access pattern of a given code to improve its cache utilization, there are not many compiler techniques that try to improve cache energy consumption of a given code. Rather, in many cases, a reliance is placed upon the observation that optimizing cache locality also optimizes cache energy. This is true to some extent as optimizing locality (performance) of memory accesses reduces the activity between cache and off-chip memory, and consequently, decreases the number of writes into cache. Recent work (e.g., [2]) also shows that the classical performance-oriented compiler optimizations (e.g., loop-level transformations) can be very effective in reducing overall memory system energy.

The Use of Runtime Configuration Capabilities for Networked Embedded Systems [p. 1093]
C. Nitsch and U. Kebschull

Reconfiguration is a very helpful feature that can improve the design life cycle of an embedded system and its quality. Reconfiguration means that both software AND hardware parts may be updated in the field. Updating system hardware implies the use of FPGAs in a shipped system. Normally, the update is server controlled, which means that the active role is taken by an external instance. We present a new automatic reconfiguration approach that stores all system configuration data in XML format. The system itself searches a component broker for the related components and configures itself during start-up. A case study shows that, especially when dealing with permanently connected devices, we achieve promising results at a reasonable cost.

A SAT Solver Using Software and Reconfigurable Hardware [p. 1094]
I. Skliarova and A. Ferrari

In this paper we propose a novel approach for solving the Boolean satisfiability problem by combining software and reconfigurable hardware. The suggested technique avoids instance-specific hardware compilation and, as a result, achieves a higher performance than pure software approaches. Moreover, it permits problems that exceed the resources of the available reconfigurable hardware to be solved.

A New Time Model for the Specification, Design, Validation and Synthesis of Embedded Real-Time Systems [p. 1095]
R. Münzenberger, M. Dörfel, F. Slomka, and R. Hofmann

An essential characteristic of embedded systems is real-time behaviour, but the commonly used specification techniques generally do not consider temporal aspects such as the fulfillment of high-level timing requirements or dynamic reactions to timing violations. We present a new formal time model that fills this gap: timing requirements specify the timing behaviour of real-time systems, and different models allow the specification of clock properties and the relations between clocks. With this time model, timing requirements as well as the desired properties of the involved clocks can be specified within a formal description technique.

Improved Constraints for Multiprocessor System Scheduling [p. 1096]
M. Grajcar and W. Grass

MILP-based models are useful for finding optimal schedules and for proving their optimality. Because of the problem complexity, model improvements have to be investigated. We analyze the constraints necessary for precluding resource conflicts, present novel formulations, and evaluate them. The efficiency of the solution process can be improved significantly by selecting the proper formulation.

A Fast Johnson-Mobius Encoding Scheme for Fault Secure Binary Counters
K.S. Papadomanolakis, A.P. Kakarountas, N. Sklavos and C.E. Goutis

The major characteristic of a counting unit is its performance. The basic properties that a fast counter must have are: i) a high counting rate, preferably independent of the counter size, ii) a binary output read on-the-fly, iii) a sampling rate equal to the counting rate, and iv) a regular implementation suitable for VLSI. For safety-critical applications, the synchronous operation of a fault-secure binary counter makes reading the counter's value difficult and reduces the counting rate proportionally to the counter's size. In this paper an implementation of a fault-secure binary counter using the Johnson-Mobius encoding scheme is presented.
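The Johnson (Mobius) counter underlying the scheme is worth sketching: an n-bit shift register whose inverted last bit feeds back to the first, cycling through 2n states with exactly one bit flipping per step. That restricted, unit-distance code space is what makes single faults detectable. This is the standard counter, illustrated in software; the fault-secure wrapper in the paper is not reproduced here.

```python
# One step of an n-bit Johnson (Mobius) counter: shift left, feed the
# INVERTED most significant bit back into the least significant position.
def johnson_next(state, n):
    fed_back = 1 ^ (state >> (n - 1))            # inverted MSB
    return ((state << 1) | fed_back) & ((1 << n) - 1)

def johnson_sequence(n):
    """The full 2n-state cycle starting from the all-zero state."""
    seq, s = [], 0
    for _ in range(2 * n):
        seq.append(s)
        s = johnson_next(s, n)
    return seq

# A 3-bit counter walks through 6 states, one bit changing each step:
assert johnson_sequence(3) == [0b000, 0b001, 0b011, 0b111, 0b110, 0b100]
```

Because only 2n of the 2^n possible patterns are valid codewords, any single bit-flip lands outside the code and can be flagged, which is the property the fault-secure counter exploits.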

Maximizing Conditional Reuse by Pre-Synthesis Transformations [p. 1097]
O. Penalba, J. Mendias, and R. Hermida

The property called mutual exclusiveness, responsible for the degree of conditional reuse achievable after a high-level synthesis (HLS) process, is intrinsic to the system's behavior. However, it is sometimes only partially reflected in the actual description written by a designer. Our algorithm performs a transformation of the input description that exploits the maximum conditional reuse of the behavior, independently of description style, allowing HLS tools to obtain circuits with less area.

Control Circuit Templates for Asynchronous Bundled-Data Pipelines [p. 1098]
S. Tugsinavisut and P. Beerel

This paper proposes the use of templatized asynchronous control circuits with single-rail datapaths to create low-power bundled-data non-linear pipelines. First, we adapt an existing templatized control style for 1-of-N rail pipelines, the Pre-Charged Full Buffer (PCFB) [1], to bundled-data pipelines. Then, we present a novel true 4-phase template (T4PFB) that has lower control overhead. Simulation results indicate 12%-44% higher throughput for pipeline stages equivalent to 8 to 40 gates.

Transforming Arbitrary Structures into Topologically Equivalent Slicing Structures [p. 1099]
O. Peyran and W. Zhuang

Floorplanning is an important step of IC design. Traditionally, floorplan representation has been divided between slicing and non-slicing structures. We present a heuristic that translates any arbitrary structure into a slicing one, topologically equivalent to the initial one after a 1-D compaction.

A New Formulation for SOC Floorplan Area Minimization Problem [p. 1100]
C. Lee, Y. Lin, W. Fu, C. Chang, and T. Hsieh

In this poster, we present a new formulation based on the concept of block partition, such that the shape of modules can be automatically determined according to the goal of optimization. Experimental results from MCNC benchmarks indicate that zero-dead-space solutions can be obtained for most test cases under our formulation.

Non-Rectangular Shaping and Sizing of Soft Modules in Floorplan Design [p. 1101]
C. Chu and F. Young

In this paper, we study the problem of changing the shapes and dimensions of the flexible modules to fill up the unused area of a preliminary floorplan, while keeping the relative positions between the modules unchanged. The selection of modules and empty spaces is made by the users interactively. We formulate the problem as a mathematical program. We use the Lagrangian relaxation technique [1, 2] to solve the problem. The formulation is such that the dimensions of all the rectangular and non-rectangular modules can be computed efficiently by closed-form equations.

EZ Encoding: A Class of Irredundant Low Power Codes for Data Address and Multiplexed Address Buses [p. 1102]
Y. Aghaghiri, M. Pedram, and F. Fallah

In this paper, we introduce a class of irredundant low power encoding techniques for memory address buses. For a data address bus, the proposed encoding techniques make use of two working zones in the memory address space, whereas for a multiplexed data and instruction address bus, up to four working zones can be supported. The zones are dynamically updated to increase the saving in switching activity. Our techniques decrease the switching activity of data address and multiplexed address buses by an average of 55% and 77%, respectively, up from 25% and 64% achieved by previous methods.
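The working-zone idea can be sketched compactly. The code below is our simplification in the spirit of such encodings, not the paper's exact EZ codes: addresses falling near a tracked zone are transmitted as small offsets, so sequential accesses toggle only a few low-order bus lines. The window size and two-zone policy are assumptions for the example.

```python
def transitions(prev, cur):
    return bin(prev ^ cur).count("1")        # bus lines that toggle

def wz_encode(addresses, zone_window=64):
    """Total line toggles for an address stream under the toy encoding."""
    zones, prev_bus, toggles = [], 0, 0
    for a in addresses:
        hit = next((z for z in zones if 0 <= a - z < zone_window), None)
        if hit is not None:
            bus = a - hit                    # in-zone: send small offset
            zones[zones.index(hit)] = a      # zone register tracks access
        else:
            bus = a                          # miss: send the raw address
            zones = ([a] + zones)[:2]        # keep two working zones
        toggles += transitions(prev_bus, bus)
        prev_bus = bus
    return toggles

# Eight sequential accesses: after one miss, every transfer is a
# one-bit offset instead of a wide 32-bit address.
seq = [0x8000_0000 + i for i in range(8)]
enc = wz_encode(seq)                         # -> 3 toggles vs 12 unencoded
```

A real irredundant scheme must additionally distinguish hit and miss transfers without extra bus lines (e.g. by reserving code points), which is the crux of making the codes irredundant.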

Estimation of Power Consumption in Encoded Data Buses [p. 1103]
A. Garcia, L. Kabulepa, and M. Glesner

Because of the increasing importance of cross-coupled capacitances in deep-submicron technologies [1], it is of great interest to extend the existing high-level power estimation techniques by considering the spatial correlation between adjacent lines. This work addresses the modeling and estimation of power dissipation in on-chip buses based on the statistical properties of data sequences. Using the derived models, a power estimation technique is proposed and evaluated for various coding schemes. For different DSP applications, our results show less than 5% discrepancy with precise bit-level estimations.

Optimization Techniques for Design of General and Feedback Linear Analog Amplifier with Symbolic Analysis [p. 1104]
T. Hieu

The analysis of linear analog amplifiers at the beginning of the design process shows, in some cases, an unwanted resonance in the amplitude response or an unwanted overshoot in the time domain. It is important for the designer to know design methods for compensating this effect. A symbolic analysis approach that supports the representation of a signal-flow graph with feedback for an amplifier circuit is introduced. The method is based on node analysis and the mathematical handling of symbolic expressions. Using the proposed approach, the feedback, the open-loop gain and the loop gain can be analyzed and calculated. With a pole-zero analysis of the symbolic loop gain, parameters of the amplifier can be determined to compensate the amplitude response.

Critical Comparison among Some Analog Fault Diagnosis Procedures Based on Symbolic Techniques [p. 1105]
A. Luchetta, S. Manetti, and M. Piccirilli

Parametric fault diagnosis techniques play an important part in the field of analog fault diagnosis. Starting from a series of measurements carried out on a previously selected test point set, and given the circuit topology and the nominal values of the components, these techniques aim at determining the effective values of the circuit parameters by solving a set of equations that are nonlinear with respect to the component values. Here the role of symbolic techniques in the automation of parametric fault diagnosis of analog circuits is investigated. Since the actual component values are the unknown quantities, the symbolic approach is particularly suitable for the automation of parametric fault diagnosis techniques, as shown, for example, in [1]. Obviously, all this applies to linear analog circuits or to nonlinear circuits suitably linearized. On the other hand, the present trend is moving as much as possible towards design techniques that lead to linear analog circuits, so this is not a serious restriction [2].

The Selective Pull-Up (SP) Noise Immunity Scheme for Dynamic Circuits [p. 1106]
M. Stan and A. Panigrahi

Noise is an important consideration in the design of integrated circuits. Increased immunity to noise, however, typically comes at the expense of increased delay, so it is important to obtain adequate noise immunity with a minimum penalty in performance. "Global" noise immunity schemes can be used when the noise is approximately the same on all nodes in the circuit; but when a few nodes are noisier than others, much better results can be obtained with selective noise immunity schemes. The Selective Pull-up (SP) technique for dynamic circuits is a method for improving the noise immunity of inputs selectively, so that the least delay penalty is paid for inputs that intrinsically have higher noise immunity.

Substrate Parasitic Extraction for RF Integrated Circuits [p. 1107]
A. Cathelin, D. Saias, D. Belot, Y. Leclercq, and F. Clement

Accurately predicting the impact of substrate parasitics in Radio Frequency design with simulations is one of the major concerns in ensuring first-silicon success in a System-on-Chip approach. The practical design experience of a 2GHz RF front-end circuit (designed in a 0.35 µm SiGe BiCMOS technology), presented here, illustrates how measurement results can be accurately predicted using a substrate parasitic extractor.

A Complete Phase-Locked Loop Power Consumption Model [p. 1108]
D. Duarte, N. Vijaykrishnan, and M. Irwin

A PLL power model that accurately estimates the power consumption during both lock and acquisition states is presented. The model is within 5% of circuit level simulation (SPICE) values. No significant power overhead (+/- 5% of the power consumed at the final frequency) is incurred during the acquisition process.

Statistical Timing Driven Partitioning for VLSI Circuits [p. 1109]
C. Ababei and K. Bazargan

In this poster we present statistical-timing driven partitioning for performance optimization. We show that by using the concept of node criticality we can enhance the Fiduccia-Mattheyses (FM) partitioning algorithm to achieve, on average, around 20% improvements in terms of timing, among partitions with the same cut size. By incorporating mechanisms for timing optimization at the partitioning level, we facilitate wire-planning at high levels of the design process.
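
As an illustration of how net criticality can enter the FM inner loop, the sketch below computes a move gain in which each net's cut contribution is scaled by its criticality, so timing-critical nets are preferentially kept uncut. The weighting and all names here are ours for illustration, not the paper's exact formulation:

```python
def move_gain(node, side, nets, criticality, alpha=1.0):
    """FM-style gain of moving `node` to the other partition.

    side:        dict node -> 0/1 partition id
    nets:        list of (member_set, net_id) hyperedges
    criticality: dict net_id -> timing criticality in [0, 1]

    A net contributes +w if the move uncuts it (all other members are on
    the far side) and -w if the move newly cuts it (all members currently
    share the node's side), with w = 1 + alpha * criticality[net_id].
    """
    gain = 0.0
    for members, net_id in nets:
        if node not in members:
            continue
        others = [m for m in members if m != node]
        w = 1.0 + alpha * criticality[net_id]
        if all(side[m] != side[node] for m in others):
            gain += w   # move removes this net from the cut
        elif all(side[m] == side[node] for m in others):
            gain -= w   # move puts this net into the cut
    return gain
```

With alpha = 0 this degenerates to the classical cut-only FM gain.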

DAISY-CT: A High-Level Simulation Tool for Continuous-Time DeltaSigma Modulators [p. 1110]
K. Francken, M. Vogels, E. Martens, and G. Gielen

To reduce the long circuit-level simulation time of ΔΣ modulators, a variety of techniques and tools exist that use high-level models for discrete-time (DT) ΔΣ modulators. There is, however, no rigorous methodology implemented in a tool for the continuous-time (CT) counterpart. Therefore, we have developed a methodology for the high-level simulation of CT ΔΣ modulators and implemented this method in a user-friendly tool. Key features are the simulation speed, accuracy and extensibility. Non-idealities such as finite gain, finite GBW, output impedance and also the important effect of jitter are modelled. Finally, experiments were carried out using the tool, exploring important design trade-offs.

Automated Optimal Design of Switched-Capacitor Filters [p. 1111]
A. Hassibi and M. Hershenson

We present a method for the automated design of CMOS switched-capacitor filters (SCFs) from user-defined top-level specifications to component sizes and physical layout. In other words, we present a complete top-down design flow for SCFs. The method is based on careful analysis and modeling of the SCF using analog circuit design and system engineering expertise, formulating design constraints in a special convex form, and numerical optimization (geometric programming).

On-Chip Inductance Models: 3D or Not 3D? [p. 1112]
T. Lin, M. Beattie, and L. Pileggi

Full 3D lumped partial inductance models usually contain a tremendous amount of forward coupling terms. To reduce the complexity of simulation and analysis, a simplified model that excludes the forward coupling terms is often adopted in practice [3][4]. This paper addresses the question whether ignoring forward couplings is always an acceptable choice or if full 3D models are necessary in certain cases. We show that the significance of the forward coupling inductance depends on various aspects of the design.

Simple and Efficient Approach for Shunt Admittance Parameters Calculations of VLSI On-Chip Interconnects on Semiconducting Substrate [p. 1113]
H. Ymeri, B. Nauwelaers, K. Maex, D. De Roest, M. Stucchi, and S. Vandenbergheo

This paper presents a slight modification of a recently proposed series expansion method [1, 2], developed for the electrical modeling of lossy coupled multilayer interconnection lines, that does not involve iterations and yields solutions of sufficient accuracy for most practical interconnections as used in common VLSI chips. We use here a Fourier series restricted to cosine functions. The solution for the layered medium is found by matching the potential expressions in the different homogeneous layers with the help of boundary conditions. In the plane of the conductors, the boundary conditions are satisfied only at a finite, discrete set of points (a point-matching procedure).

Compact Macromodel for Lossy Coupled Transmission Lines [p. 1114]
R. Khazaka and M. Nakhla

This paper describes a systematic algorithm for obtaining passive time domain reduced order transmission line macromodels. The proposed algorithm makes use of a new order reduction technique that removes the redundant poles obtained using conventional order reduction methods. The reduced macromodel is passive by construction.

An EMC-Compliant Design Method of High-Density Integrated Circuits [p. 1115]
J. Levant and M. Ramdani

This paper deals with an innovative method of EMC-compliant design. This technique helps to optimize the emission level as early as the design phase, and provides noise-related solutions which can be evaluated and integrated into the silicon. The method makes it possible to model the activity of thousand-gate circuits with only two current generators, which represent the supply current consumption in the VDD and VSS rails. This allows EMC evaluation and optimization (conducted noise) for a packaged integrated circuit within its electrical environment.

Finding a Common Fault Response for Diagnosis during Silicon Debug [p. 1116]
I. Pomeranz, J. Rajski, and S. Reddy

When a design is manufactured for the first time, it may suffer from timing-related errors that result from inaccuracies in the timing analysis tool used during the design process. Such errors will appear as delay faults in all (or many) of the manufactured chips. In addition, variations that occur during the manufacturing process may cause delay defects that vary across chips. It is necessary to diagnose and correct failures of the first type (in the presence of failures of the second type) before the chip can be manufactured again. This may have to be repeated until design errors are eliminated.

IDDT Testing of Embedded CMOS SRAMs [p. 1117]
S. Kumar, R. Makki, and D. Binkley

This paper presents an iDDT test method for embedded CMOS SRAMs. A total of 192 faults were inserted and simulated using parameters from a 0.35 um process. The SRAM model includes realistic effects such as wire bonding inductance and resistance parameters as well as bypass capacitance. A sensor is introduced and incorporated into the SRAM cell array to detect abnormal iDDT switching. Figure 1 shows a 1-Mbit SRAM organized into 64 cell blocks of 128 x 128 cells, with an iDDT sensor monitoring each cell block. The SRAM model includes the following parameters:
• On-chip wire bonding inductance of 2 nH
• On-chip wire bond resistance of 0.01 Ohms
• On-chip bypass capacitance of 1 pF
• Bitline capacitance of 3 pF
• Power line capacitance of 40 pF
The results of the fault simulations comparing voltage, IDDQ and iDDT test methods are given in Table 1.

Fault Detection and Diagnosis Using Wavelet Based Transient Current Analysis [p. 1118]
S. Bhunia and K. Roy

We present a novel integrated method for fault detection and localization using wavelet transform of transient current (IDD) waveform. The time-frequency resolution property of wavelet helps us detect as well as localize faults in digital CMOS circuits. Experiments performed on an 8-bit ALU show promising results for both detection and localization.
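
The localization idea rests on the time-frequency property of the wavelet transform: a fault-induced glitch in the IDD waveform shows up as a large detail coefficient at a specific time position. A one-level Haar transform illustrates this (a minimal sketch of ours; the paper's choice of wavelet and decomposition depth may differ):

```python
import math

def haar_dwt(signal):
    """One level of the Haar wavelet transform of an even-length signal.

    Returns (approximation, detail) coefficient lists. Because each detail
    coefficient depends only on two neighbouring samples, a glitch in a
    transient-current waveform produces a large detail coefficient at the
    corresponding position -- the time localization the scheme relies on.
    """
    half = len(signal) // 2
    approx = [(signal[2*i] + signal[2*i + 1]) / math.sqrt(2) for i in range(half)]
    detail = [(signal[2*i] - signal[2*i + 1]) / math.sqrt(2) for i in range(half)]
    return approx, detail
```

A flat (fault-free) segment yields zero detail coefficients, while a spike flags both the presence and the position of an anomaly.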

An Efficient Test and Diagnosis Scheme for the Feedback Type of Analog Circuits with Minimal Added Circuits [p. 1119]
J. Lin, C. Lee, and J. Chen

This paper presents a test and diagnosis scheme for the feedback type of linear analog circuits with minimal added circuitry. For testing, the scheme transforms the circuit-under-test (CUT) into an oscillation circuit by (1) increasing the loop gain of the circuit, and/or (2) reconfiguring the circuit through selectively powering off operational amplifiers (OPs) of the circuit. This eliminates the need for added global paths as in the conventional oscillation test scheme. For diagnosis, the scheme transforms the circuit into a Schmitt-trigger type of circuit with a positive feedback. The output of the circuit under an applied triangular input gives signatures which are used to identify faults. The scheme has been applied to benchmark circuits, and results show that it is very effective for testing and diagnosing the feedback type of linear analog circuit.

On the Use of an Oscillation-Based Test Methodology for CMOS Micro-Electro-Mechanical Systems [p. 1120]
V. Beroulle, Y. Bertrand, L. Latorre, and P. Nouet

This paper introduces the use of the oscillation test technique for MEMS testing. This well-known test technique is here adapted to MEMS. Its efficiency is evaluated based on a case study: A CMOS electromechanical magnetometer.

Directed-Binary Search in Logic BIST Diagnostics [p. 1121]
R. Kapur, T. Williams, M. Mercer

Logic BIST is about to become a more mainstream test method for IC testing. In some flows, when a failure is encountered, the IC is diagnosed to determine the cause of the failure. Diagnosing failures in Logic BIST is significantly different from diagnosis in a stored-pattern test methodology. The first step is to determine the failing pattern or interval among the many patterns that were applied. Today this involves a binary search of the tests that were applied with Logic BIST. In this paper we improve on this binary search strategy to reduce the time taken to isolate the failing patterns by orders of magnitude.
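
The baseline the paper improves on can be pictured as a plain binary search over pattern intervals, as in the sketch below. We assume each interval can be applied from a reset state, that the interval passes exactly when its signature matches the expected one, and that a single failing pattern exists; the names are illustrative, not the paper's directed strategy:

```python
def find_failing_pattern(interval_passes, n_patterns):
    """Locate the single failing pattern by binary search.

    interval_passes(lo, hi) -> True if applying patterns lo..hi
    (inclusive) from a reset state yields the expected BIST signature.
    Each probe costs one BIST run, so about log2(n_patterns) runs
    isolate the failing pattern.
    """
    lo, hi = 0, n_patterns - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if interval_passes(lo, mid):
            lo = mid + 1   # first half is clean: fail lies in second half
        else:
            hi = mid       # fail lies in the first half
    return lo
```

The directed search described in the paper reduces the number of such re-runs further.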

An Evolutionary Approach to the Design of On-Chip Pseudorandom Test Pattern Generators [p. 1122]
M. Favalli and M. Dalpasso

Weighted pseudorandom test generation (WPRTG) uses test sequences characterized by non-uniform distributions of test vectors in order to increase the detection probability of random resistant faults. Such non-uniform distributions are characterized by the values of signal probability of the CUT inputs (weights). Since different faults may require different distributions, a (small) number of distributions is typically used [1]. The weights of such distributions are identified by analyzing the CUT. The corresponding pseudorandom sequences are typically obtained by inserting a combinational network between the TPG and the CUT.
Several different methodologies have been proposed to calculate the weights. Some approaches make use of deterministic test sequences [2]. Another class of heuristics, instead, makes use of numerical optimization strategies to determine the set(s) of weights [1]. More recently, genetic algorithms have been shown to provide a good solution to weight selection [3]. All such methods evaluate only the first-order coefficients of the distribution(s) and may suffer from a few problems. In particular, the detection of some random-resistant faults may strongly depend on signal correlations. Even if the effects of signal correlations can be reduced, some problems still remain. Consider, for instance, a fault that can be detected by a test vector and its complement. Any WPRTG method using signal probability evaluation would provide (when targeting such a fault) the same coefficients as a uniform distribution.
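
The basic mechanism, biasing each input's probability of being 1 according to a weight set, can be sketched in software as follows (a toy illustration of ours; in hardware the weights are realized by combinational logic between the TPG and the CUT):

```python
import random

def weighted_vectors(weights, n, seed=0):
    """Generate n pseudorandom test vectors in which input i is 1 with
    probability weights[i] -- one 'weight set' in WPRTG terms."""
    rng = random.Random(seed)
    return [[1 if rng.random() < w else 0 for w in weights]
            for _ in range(n)]
```

Note that this first-order biasing carries no correlation information, which is exactly the limitation discussed above: a fault detectable only by a vector and its complement drives every weight toward the uniform value 0.5.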

Fault Isolation Using Tests for Non-Isolated Blocks [p. 1123]
I. Pomeranz and Y. Zorian

Design methodologies for large designs produce circuits that consist of interconnections of functional blocks. If the blocks are large, as in core-based designs, they may be isolated for testing purposes (e.g., by test wrappers) such that different blocks can be tested independently. However, even if a test wrapper exists, it is advantageous to test functional paths that go through two or more blocks by using test vectors that propagate fault effects through several blocks. This contributes to the testing of defects that cannot be detected if each block is tested separately. One of the issues that arises when several blocks are tested by the same test is that of fault isolation. If a test that propagates fault effects through blocks C1 and C2 produces a faulty response on the outputs of C2, the goal of fault isolation is to identify which one of C1 and C2 is faulty. Fault isolation is perfect if every faulty response on the outputs of the circuit can be uniquely attributed to a single block. This happens when every pair of faults belonging to different blocks is distinguishable. If faults of different blocks remain indistinguishable, fault isolation is not possible when responses equal to the responses produced by these faults are produced by the circuit-under-test. It may appear that tests for several non-isolated blocks will not be able to isolate faults. In this work, we study this issue and demonstrate that perfect or close-to-perfect fault isolation is possible with tests that propagate fault effects through several blocks.

A Heuristic for Test Scheduling at System Level [p. 1124]
M. Flottes, J. Pouget, and B. Rouzeyre

This paper considers the test-scheduling problem of a SoC. The proposed approach is based on a "sessionless" test scheme. It minimizes the system test time while respecting a power dissipation limit and test resource sharing constraints. Experimental results show that our approach outperforms other related test scheduling solutions.

Formulation of SOC Test Scheduling as a Network Transportation Problem [p. 1125]
S. Koranne and V. Choudhary

Reusability of tests is crucial for reducing total design time. This raises the problems of test knowledge transfer, physical test application and test scheduling. We present a formulation of the embedded core-based system-on-chip (SOC) test scheduling problem (ECTSP) as a network transportation problem. The problem is NP-hard and we present an O(mn(m+2n)) 2-approximation algorithm using the result of the single source unsplittable flow problem. We describe the single source unsplittable flow problem (UFP) as given in [1]: let G = (V,E) be a capacitated directed graph with edge capacities c : E -> R+, a source s and k commodities with terminals ti and demands di in R+, 1 <= i <= k. A vertex may contain a number of terminals. For each i, we would like to route di units of commodity i along a single path from s to the corresponding terminal so that the total flow through an edge e is at most its capacity c(e).
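
As a toy illustration of the UFP capacity constraint, the helper below checks whether a proposed unsplittable routing (one path per commodity) respects the edge capacities; names and data layout are ours, not from the paper:

```python
def routing_feasible(capacity, routes, demands):
    """Check a single-source unsplittable routing.

    capacity: dict edge (u, v) -> capacity c(e)
    routes:   routes[i] is the single path of commodity i, as a list of edges
    demands:  demands[i] is the amount d_i routed along routes[i]

    Feasible iff the total flow through every edge stays within c(e).
    """
    load = {}
    for path, d in zip(routes, demands):
        for edge in path:
            load[edge] = load.get(edge, 0) + d
    return all(load[edge] <= capacity.get(edge, 0) for edge in load)
```

In the 2-approximation, demands may exceed capacities by a bounded factor; this check corresponds to the exact (unrelaxed) feasibility condition.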

A Novel Methodology for the Concurrent Test of Partial and Dynamically Reconfigurable SRAM-Based FPGAs [p. 1126]
M. Gericota, G. Alves, M. Silva, and J. Ferreira

This poster presents the first truly non-intrusive structural concurrent test approach, aimed at testing partial and dynamically reconfigurable SRAM-based FPGAs without disturbing their operation. This is accomplished by using a new methodology to carry out the replication of active Configurable Logic Blocks (CLBs), i.e. CLBs that are part of an implemented function actually being used by the system, releasing them to be tested in a way that is completely transparent to the system.

Efficient On-Line Testing Method for a Floating-Point Iterative Array Divider [p. 1127]
A. Drozd, M. Lobachev, and J. Drozd

This work is part of research directed at the development of checking methods for approximate calculations executed in the mantissa part of floating-point circuits. The problem of checking the residue of truncated non-restoring division is solved. This enables an efficient implementation of truncated division, reducing the hardware amount and time of an iterative array divider almost by half.

An Instruction-Level Methodology for Power Estimation and Optimization of Embedded VLIW Cores [p. 1128]
A. Bona, M. Sami, D. Sciuto, V. Zaccaria, C. Silvano, and R. Zafalon

The overall goal of this work is to define an instruction-level power macro-modeling and characterization methodology for VLIW embedded processor cores. The approach presented in this paper is a major extension of the work previously proposed in [1-3], targeting an instruction-level energy model to evaluate the energy consumption associated with a program execution on a pipelined VLIW core. Our first goal is the reduction of the complexity of the processor's energy model, without reducing the accuracy of the results. The second goal is to show how the energy model can be further simplified by introducing a methodology to automatically cluster the instructions of the whole Instruction Set with respect to their average energy cost, in order to converge to a highly effective design of experiments for the actual characterization task. The paper also describes the application of the proposed model to a real industrial VLIW core (the Lx architecture developed by HP Labs and STMicroelectronics), to validate the effectiveness and accuracy of the proposed methodology.

The Fraunhofer Knowledge Network (FKN) for Training in Critical Design Disciplines [p. 1129]
A. Sauer, G. Elst, L. Krahn, and W. John

For the application of new technologies with ever shorter lifecycles, the availability of the most recent knowledge is mandatory. The intervals within which acquired knowledge bases have to be updated therefore become shorter and shorter. It is well known that software development tools and systems are getting more and more sophisticated, and the learning expenditure for the personnel is growing accordingly. This tendency affects major parts of the electrical and electronics industry, where the demand for a qualified workforce already manifests itself in the `designer crisis'. The combined effects of the increased functionality of new tool generations, the change of application areas of relevant methods due to technological progress, and the improvement of information exchange facilities lead to increased requirements with respect to further professional training. The microelectronic industry and related business sectors are extremely innovative and knowledge-based. Students, engineers, scientists and others need to develop, transfer and share knowledge. The above-mentioned knowledge processes and the knowledge flow from researchers and universities to industry and vice versa need to be strengthened to ensure a leading-edge position for European companies and institutes in this market.

Comparative Analysis and Application of Data Repository Infrastructure for Collaboration-Enabled Distributed Design Environments [p. 1130]
L. Indrusiak, M. Glesner, and R. Reis

A collaborative design system depends heavily on the chosen collaboration methodology, as well as on its technological infrastructure. This paper presents three data repository technologies and discusses their pros and cons on the role of supporting a collaborative design system.

FlexBench: Reuse of Verification IP to Increase Productivity [p. 1131]
S. Stöhr, M. Simmons, and J. Geishauser

This paper presents FlexBench, which is a complete framework for SoC verification at the Module and SoC level, both with and without embedded processors. The focus is to increase the productivity of the verification engineer by providing a framework to reuse verification IP, which includes parts of the testbench and the test stimulus.

Mappability Estimation of Architecture and Algorithm [p. 1132]
J. Soininen, J. Kreku, and Y. Qu

A method for the selection of processor core and algorithm combinations for system-on-chip designs is presented. The method uses a mappability concept that is an addition to the performance and cost metrics used in codesign. The mappability estimation is based on the analysis of the correlations between algorithm and core characteristics. The method is demonstrated with an analysis tool, and the experimental results with DSP cores and algorithms match expectations.

Behavioural Modelling of Operational Amplifier Faults Using VHDL-AMS [p. 1133]
P. Wilson, J. Ross, M. Zwolinski, A. Brown, and Y. Kiliç

The use of behavioural modelling for operational amplifiers has been well known for many years and previous work has included modelling of specific fault conditions using a macro-model. In this paper, the models are implemented in a more abstract form using an Analogue Hardware Description Language (AHDL), VHDL-AMS, taking advantage of the ability to control the behaviour of the model using high-level fault condition states. The implementation method allows a range of fault conditions to be integrated without switching to a completely new model. The various transistor faults are categorised, and used to characterise the behaviour of the HDL models. Simulations compare the accuracy and speed of the transistor and behavioural level models under a set of representative fault conditions.

A Parallel LCC Simulation System [p. 1134]
K. Hering

Cycle-based simulation at RT and gate level realized by a Levelized Compiled Code (LCC) technique represents a well-established method for functional verification in processor design. We present a parallel LCC simulation system developed to run on loosely-coupled processor systems, allowing significant simulation acceleration. It comprises three parallel simulators and a complex model partitioning environment. A key idea of our approach is to evaluate circuit model partitions with respect to the expected parallel simulation run-time and to integrate corresponding cost functions into partitioning algorithms. Experimental results are given for IBM processor models of different sizes.

Error Simulation Based on the SystemC Design Description Language [p. 1135]
F. Bruschi, M. Chiamenti, F. Ferrandi, and D. Sciuto

The combined effects of devices' increased complexity and reduced design cycle time create a testing problem: an increasingly large portion of the design time is devoted to testing and verification. Today's EDA tools, moving towards higher levels of abstraction, promise greater designer productivity, resulting in increased design complexity and size. In order to reduce the testing and verification time, different high-level approaches have been proposed in the literature [2]. Most of these approaches are based on the definition of an error or fault model, applicable at a higher level of abstraction of the description of the system to be implemented. In this paper we concentrate our attention on the evaluation of error models, used in test generation and in functional verification. Evaluation of error models is also an important aspect when fault injection methodologies are used to evaluate the dependability of complex systems. The ideas proposed by this work try to solve this evaluation and analysis problem starting from the following requirements:
• the error simulation task should be based only on the original hardware description language primitives;
• the flow from the given specification to the fault simulation should be as automatic as possible;

Towards a Kernel Language for Heterogeneous Computing [p. 1136]
D. Björklund and J. Lilius

What is characteristic of modern embedded systems like mobile phones, multimedia terminals, etc. is that their design requires several different description techniques: The radio-frequency part of a mobile phone is designed using analog techniques, the signal processing part can be described using synchronous data-flow, while the protocol stack uses an extended finite state machine based description model. This heterogeneity poses a challenge to embedded system design methodologies, and has resulted in a search for a System Level Design Language (SLDL) for describing both software and hardware.
We believe that to obtain a good SLDL one needs to first understand what the combination of models of computation means. To this end we are developing a kernel language in which it is possible to use different models of computation. The main contributions of this work are: (1) a common set of concepts that form the basis of the kernel language, (2) a formally defined operational semantics, which also makes it possible to verify designs using e.g. model-checking, (3) the explicit use of atomicity and, (4) the introduction of the notion of execution policy.

Top-Down System Level Design Methodology Using SpecC, VCC and SystemC [p. 1137]
L. Cai, D. Gajski, P. Kritzinger, and M. Olivares

There appears to be an increasing trend towards the use of the C/C++ language as a basis for the next generation of modeling tools and platform methodologies to encompass design reuse. However, even with this convergence, industry suffers from the fact that no single tool or complete tool-flow methodology can implement a top-down design methodology from C to silicon. In this paper we suggest such a top-down methodology from C to silicon. In our methodology, we focus on methods to make the design flow smooth, efficient, and easy. The proposed methodology is a pure top-down methodology. We developed our design methodology by using SpecC [1], VCC [2], and SystemC [3]. We chose SpecC, VCC and SystemC because they are all C-related and each has strong support in at least one field of design. Our proposal for a methodology is based on our experiences of attempting to model the JPEG encoder with SpecC, SystemC and VCC, and on one internal project attempting to implement architecture exploration for MPEG encoding and decoding using VCC.

Automatic Topology-Based Identification of Instruction-Set Extensions for Embedded Processors [p. 1138]
L. Pozzi, M. Vuletic, and P. Ienne

The need for high performance in ASIC embedded processors, coupled with aggressive energy and area goals, is pushing researchers and designers toward processor specialisation for a given application-domain. In this paper, specialisation is addressed through introduction of Ad-hoc Functional Units--special arithmetic/logic units added to a traditional architecture to perform domain-specific complex operations.

Steady State Calculation of Oscillators Using Continuation Methods [p. 1139]
H. Brachtendorf, S. Lampe, R. Laur, R. Melville, and P. Feldmann

Shooting, finite difference or Harmonic Balance techniques in conjunction with Newton's method are widely employed for the numerical calculation of limit cycles of oscillators. The resulting set of nonlinear equations is normally solved by a damped Newton's method. In some cases, however, divergence occurs when the initial estimate of the solution is not close enough to the exact one. A two-dimensional homotopy method that overcomes this problem is presented in this paper. The resulting linear set of equations employing Newton's method is under-determined and is solved in a least-squares sense, for which a rigorous mathematical basis can be derived.
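
A one-dimensional caricature of the continuation idea: embed the hard problem f(x) = 0 in a homotopy H(x, lam) = lam*f(x) + (1 - lam)*(x - x0), sweep lam from 0 to 1, and start each Newton solve from the previous solution, so Newton never starts far from a root. This scalar sketch is ours and does not reproduce the paper's two-dimensional homotopy or its least-squares solve:

```python
def newton(f, df, x0, tol=1e-12, maxit=50):
    """Plain scalar Newton iteration for f(x) = 0."""
    x = x0
    for _ in range(maxit):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def continuation(f, df, x_start, steps=20):
    """Solve f(x) = 0 by sweeping the homotopy parameter lam from 0 to 1.

    At lam = 0 the solution is trivially x_start; each intermediate
    solution seeds the next Newton solve, avoiding the divergence that
    a poor initial estimate causes when attacking f directly.
    """
    x = x_start
    for k in range(1, steps + 1):
        lam = k / steps
        H = lambda x, lam=lam: lam * f(x) + (1 - lam) * (x - x_start)
        dH = lambda x, lam=lam: lam * df(x) + (1 - lam)
        x = newton(H, dH, x)
    return x
```

For well-behaved problems the path of roots is followed smoothly; turning points along the path, which the full method must handle, are not treated in this sketch.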