Moderator: I. Bolsens, IMEC, B
Speakers: Jerry Fiddler, Chairman and Co-founder of Wind River Systems, USA
Wim Roelandts, CEO Xilinx, USA
Media processors feature special instruction sets for fast execution of signal processing algorithms on different media data types. They provide SIMD instructions, capable of executing one operation on multiple data elements in parallel within a single instruction cycle. Unfortunately, their use by compilers has so far been very restricted and requires either assembly libraries or compiler intrinsics. This paper presents a novel code selection technique capable of exploiting SIMD instructions even when compiling plain C source code. It makes it possible to take advantage of SIMD instructions for multimedia applications while still using portable source code.
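For illustration (a hypothetical fragment of ours, not taken from the paper), a loop such as the following is the kind of plain C code that a SIMD-capable code selector can cover with packed operations, whereas a conventional compiler emits one scalar operation per element:

/* Hypothetical example: element-wise addition of two pixel rows.
   A SIMD-aware code selector can recognize that the loop body operates
   on adjacent 8-bit elements and cover several iterations with one
   packed-add instruction; a conventional compiler emits scalar code. */
void add_rows(unsigned char *dst, const unsigned char *a,
              const unsigned char *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (unsigned char)(a[i] + b[i]);
}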
Memory-intensive applications require considerable arithmetic for the computation and selection of the different memory access pointers. These memory address calculations often involve complex (non)linear arithmetic expressions which have to be evaluated during program execution under tight timing constraints, and thus become a crucial bottleneck in the overall system performance. This paper explores the applicability and effectiveness of source-level optimisations (as opposed to instruction-level ones) for address computations in the context of multimedia. We propose and evaluate two processor-target-independent source-level optimisation techniques, namely global-scope operation cost minimisation complemented with loop-invariant code hoisting, and non-linear operator strength reduction. The transformations aim at minimal code execution within loops and reduced operator strengths. The effectiveness of the transformations is demonstrated on two real-life multimedia application kernels by comparing the number of execution cycles before and after applying the systematic source-level optimisations, using state-of-the-art C compilers on several popular RISC platforms.
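As a hedged illustration of the two transformations applied to address arithmetic (our own sketch, not one of the paper's kernels):

/* Hypothetical kernel: copy one row segment of an image.
   Before: the address expression is recomputed with a multiplication
   in every iteration, and img + row*WIDTH + col is loop-invariant. */
void copy_before(unsigned char *out, const unsigned char *img,
                 int row, int col, int WIDTH, int N)
{
    for (int i = 0; i < N; i++)
        out[i] = img[row * WIDTH + col + i];
}

/* After (a possible result of the source-level transformations): the
   invariant sub-expression is hoisted out of the loop and the
   per-iteration index arithmetic is reduced to a pointer increment. */
void copy_after(unsigned char *out, const unsigned char *img,
                int row, int col, int WIDTH, int N)
{
    const unsigned char *p = img + row * WIDTH + col;  /* hoisted invariant */
    for (int i = 0; i < N; i++)
        out[i] = *p++;                                  /* strength-reduced access */
}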
Embedded systems make heavy use of software to perform real-time embedded control tasks. Embedded software is characterized by a relatively long lifetime and by tight cost, performance and safety constraints. Several super-optimization techniques for embedded software based on Multi-valued Decision Diagram (MDD) representations have been described in the literature, but they all share the same basic limitation: they are based on standard Ordered MDD (OMDD) packages, and hence require a fixed order of evaluation for the MDD variables on every execution path. Free MDDs (FMDDs) lift this limitation and hence open up more optimization opportunities. Finding the optimal variable ordering for FMDDs is a very difficult problem, so in this paper we describe a heuristic procedure that performs well in practice and is based on FMDD cost estimation applied to recursive cofactoring. Experimental results show that our new variable ordering method often obtains smaller embedded software than previous (sifting-based) methods.
Dynamic power management saves power by shutting down idle devices. Several management algorithms have been proposed and demonstrated to be effective in certain applications. We quantitatively compare the power saving and performance impact of these algorithms on the hard disks of a desktop and a notebook computer. This paper makes three contributions. First, we build a framework in Windows NT to implement power managers running realistic workloads and directly interacting with users. Second, we define performance degradation in a way that reflects user perception. Finally, we compare the power saving and performance of existing algorithms and analyze their differences.
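One commonly studied class of policies, and a natural baseline in such comparisons, is the fixed-timeout policy; the sketch below is a hypothetical illustration of that class, not one of the evaluated managers.

/* Hypothetical fixed-timeout power manager: spin the disk down after
   TIMEOUT seconds of idleness.  Called periodically with the time of
   the last disk request and the current time. */
#define TIMEOUT 30.0   /* assumed idle threshold in seconds */

typedef enum { DISK_ACTIVE, DISK_STANDBY } disk_state_t;

disk_state_t update_policy(disk_state_t state, double last_request, double now)
{
    if (state == DISK_ACTIVE && now - last_request >= TIMEOUT)
        return DISK_STANDBY;           /* idle long enough: shut down */
    return state;                      /* otherwise keep the current state */
}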
We present efficient power estimation techniques for HW/SW System-On-Chip (SOC) designs. Our techniques are based on concurrent and synchronized execution of multiple power estimators that analyze different parts of the SOC (we refer to this as co-estimation), driven by a system-level simulation master. We motivate the need for power co-estimation, and demonstrate that performing independent power estimation for the various system components can lead to significant errors in the power estimates, especially for control-intensive and reactive embedded systems. We observe that the computation time for performing power co-estimation is dominated by: (i) the requirement to analyze/simulate some parts of the system at lower levels of abstraction in order to obtain accurate estimates of timing and switching activity information, and (ii) the need to communicate between and synchronize the various simulators. Thus, a naive implementation of power co-estimation may be too inefficient to be used in an iterative design exploration framework. To address this issue, we present several acceleration (speedup) techniques for power co-estimation. The acceleration techniques are energy caching, software power macro-modeling, and statistical sampling. Our speedup techniques reduce the workload of the power estimators for the individual SOC components, as well as their communication/synchronization overhead. Experimental results indicate that the use of the proposed acceleration techniques results in significant (8X to 87X) speedups in SOC power estimation time, with minimal impact on accuracy. We also show the utility of our co-estimation tool to explore system-level power tradeoffs for a TCP/IP Network Interface Card sub-system and an automotive controller.
In this paper, we introduce a discrete-time model for the complete power supply sub-system that closely approximates the behavior of its circuit-level (i.e., HSpice), continuous-time counterpart. The model is abstract and efficient enough to enable event-driven simulation of digital systems that are described at a very high level of abstraction and that include the power supply among their components. Therefore, it can be successfully used for battery lifetime estimation during design optimization, as shown by the results we have collected on a meaningful case study. Experiments also show that the accuracy of our model is very close to that of the corresponding Spice-level model.
In this paper, a new method for analog circuit sizing with respect to manufacturing and operating tolerances is presented. Two types of robustness objectives are presented, i.e. parameter distances for the nominal design and worst-case distances for the design centering. Moreover, the generalized boundary curve is presented as a method to determine a parameter correction within an iterative trust region algorithm. Results show that a significant reduction in computational costs is achieved using the presented robustness objectives and generalized boundary curve.
This paper introduces a new hierarchical analysis methodology which incorporates approximation strategies during the analysis process. Consequently, the circuit sizes that can be analyzed increase dramatically, without suffering from the combinatorial explosion of expression complexity. Moreover, the interpretability and usability in practical applications is enabled by providing analytical models that keep complexity at a minimum with the prescribed accuracy.
This paper presents a methodology towards synthesis of high performance analog circuits. Layout parasitics are estimated and compensated during circuit sizing. Physical layout constraints are thus taken into consideration early in the design. This approach shortens the overall design time by avoiding laborious sizing-layout iterations. The approach has been implemented using two knowledge-based tools dedicated to analog circuit sizing and layout generation. An example of a high performance OTA is presented at the end to illustrate the effectiveness of the approach.
Rapid prototyping followed by technology retargeting provides a fast and cost-effective approach to analog system synthesis. Field-programmable analog arrays (FPAAs) enable rapid implementation of a function-compliant prototype, while technology retargeting converts the functional FPAA prototype to an ASIC. We first address the FPAA technology mapping problem. A novel structural approach based on hierarchical pattern matching and covering is employed to map the analog behavior onto the FPAA. We then address issues of technology retargeting and design reuse, and present our FPAA-ASIC retargeting strategy. We present experiments and a design example for FPAA technology mapping and retargeting.
This paper gives an overview of the different tools needed for accomplishing optimal IC manufacturability and rapid technology learning during the successive phases of process maturity. The paper then describes two specific DfM tools that are in use within Philips Semiconductors.
Keywords: DfM, yield improvement, yield prediction, wire spreading.
In this paper, we give an overview of the trade-offs involved in improving yield and optimizing silicon manufacturing cost. The specific technology focus is on large embedded memories in complex ASIC or system-on-chip designs. Embedded capabilities for test, redundancy analysis and repair are shown to be design-for-manufacturability features needed for large embedded memories in VDSM design.
Keywords: Yield improvement, DFM, BIST, silicon repair
With the introduction of 0.18 micron CMOS process technology a new phenomenon in circuit manufacturing can be observed: design-rule values as specified in the design-rule manual are no longer "hard" numbers. Whereas designers and EDA tool manufacturers used to treat rule values as strict limits when creating mask layouts, the rules have now turned into gray areas around the specified values.
This paper discusses the use of C++ for the design of digital systems. It distinguishes a number of different approaches to the use of programming languages for digital system design and discusses in more detail how C++ can be used for system modeling and refinement, for simulation, and for architecture design.
Today's system-on-a-chip designs consist of many cores. To enable cores to be easily integrated into different systems, many propose creating cores with their internal logic separated from their bus wrapper. This separation may introduce extra read latency. Pre-fetching register data into register copies in the bus wrapper can reduce or eliminate this extra latency. In this paper, we introduce a technique for automatically designing a pre-fetch unit that satisfies user-imposed register-access constraints. The technique benefits from mapping the pre-fetching problem to the well-known real-time process scheduling problem. We then extend the technique to allow user-specified register interdependencies, using a Petri Net model, resulting in even more efficient pre-fetch schedules.
Keywords: Cores, system-on-a-chip, interfacing, on-chip bus, intellectual property, design reuse, bus wrapper.
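The mapping of pre-fetching to real-time process scheduling mentioned above can be illustrated with a generic sufficient schedulability check (a sketch under assumed names, not the paper's exact constraint formulation): if register i must be re-fetched at least every period[i] bus cycles and one pre-fetch over the bus costs cost[i] cycles, the classical rate-monotonic utilization bound gives a quick feasibility test.

/* Hedged illustration of the real-time-scheduling view of pre-fetching.
   Returns nonzero if the pre-fetches are guaranteed schedulable under a
   rate-monotonic style policy (Liu & Layland sufficient bound). */
#include <math.h>

int prefetch_schedulable(const double *cost, const double *period, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += cost[i] / period[i];              /* bus utilization of register i */
    return u <= n * (pow(2.0, 1.0 / n) - 1.0); /* RM sufficient bound */
}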
In embedded data-dominated applications a global system-level data transfer and storage exploration phase is crucial for obtaining an efficient solution. We have developed a novel formalism to describe reusable blocks such that the essential part of the design exploration freedom is retained. This formalism is the basis for a system-level reuse methodology which allows large parts of the design to be reused as structural VHDL and describes the costly data-access-related constructs at higher levels in the code hierarchy. Compared to a reuse approach based on fixed blocks, considerable power and area savings can be obtained, as demonstrated on real-life video and modem applications.
Fault simulation and testability analysis are major concerns in design flows employing intellectual-property (IP) protected virtual components. In this paper we propose a paradigm for the fault simulation of IP-based designs that enables testability analysis without requiring IP disclosure, implemented within the JavaCAD framework for distributed design [1, 2]. As a proof of concept, stuck-at fault simulation has been performed for combinational circuits containing virtual components.
In [1], Murata et al. introduced an elegant representation of block placement called the sequence pair. All block placement algorithms based on sequence pairs use simulated annealing, where the generation and evaluation of a large number of sequence pairs is required. Therefore, a fast algorithm is needed to evaluate each generated sequence pair, i.e. to translate the sequence pair to its corresponding block placement. This paper presents a new approach to evaluating a sequence pair, based on computing the longest common subsequence of a pair of weighted sequences. We present a very simple and efficient O(n^2) algorithm to solve the sequence pair evaluation problem. We also show that, using a more sophisticated data structure, the algorithm can be implemented to run in O(n log n) time. Both implementations of our algorithm are significantly faster than the previous O(n^2) graph-based algorithm in [1]. For example, we achieve a 60X speedup over the previous algorithm for input size n = 128.
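For intuition, the x-coordinates implied by a sequence pair can be computed with a straightforward O(n^2) scan (a hedged sketch of the simple variant; the paper's O(n log n) version relies on a more sophisticated data structure):

#include <stdlib.h>

/* Hedged O(n^2) sketch of sequence-pair evaluation in the x direction.
   X[] and Y[] are the two sequences (permutations of block indices
   0..n-1), w[] the block widths and x[] the resulting left edges.
   Block a lies to the left of block b exactly when a precedes b in
   both X and Y. */
void evaluate_x(const int *X, const int *Y, const double *w,
                double *x, int n)
{
    int *posY = malloc(n * sizeof *posY);   /* position of each block in Y */
    if (posY == NULL)
        return;
    for (int j = 0; j < n; j++)
        posY[Y[j]] = j;

    for (int i = 0; i < n; i++) {           /* scan blocks in X order */
        int b = X[i];
        x[b] = 0.0;
        for (int k = 0; k < i; k++) {
            int a = X[k];                   /* a precedes b in X ...   */
            if (posY[a] < posY[b] && x[a] + w[a] > x[b])
                x[b] = x[a] + w[a];         /* ... and in Y: a is left of b */
        }
    }
    free(posY);
}

The y direction is handled symmetrically; viewing this recurrence as a weighted longest-common-subsequence computation is what enables the faster O(n log n) implementation.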
This paper describes a new multi-level partitioning algorithm (PART) that combines a blend of iterative improvement and clustering, biasing of node gains, and local uphill climbs. PART is competitive with recent state-of-the-art partitioning algorithms. PART was able to find new lower cuts for a number of benchmark circuits.
We consider the problem of placing a set of cells in a single row with a given horizontal ordering, minimizing the (weighted) bounding-box netlength. We analyze the running time of an algorithm of Kahng, Tucker and Zelikovsky which solves this problem optimally. By using different data structures we are able to improve the worst-case running time in the unweighted case as well as in the presence of net weights.
This paper presents a new compaction algorithm to improve the yield of IC layouts. The yield is improved by reducing the area where faults are most likely to happen, known as the critical area. Instead of assuming that critical area may be present anywhere in the layout, the algorithm first finds where this area can actually exist, and then attempts to minimize it. The algorithm benefits from a fast multi-layer critical area computation to extract the rectangles that compose it. Afterwards, the extracted rectangles are involved in the layer minimization process, the second phase of the compaction procedure, which minimizes their area. A new formulation of the layer minimization problem is used such that the critical area minimization adds neither extra variables nor extra constraints to the original compaction algorithm. The algorithm has been tested on actual layouts.
Higher levels of integration, the need for test re-use, and the mixed-signal nature of today's SOCs necessitate hierarchical test generation and system-level test composition to meet stringent market requirements. In this paper, a novel methodology for testing analog and digital components in a signal path is discussed. The resulting testability analysis can be utilized to reduce DFT requirements, while test translation provides highly effective low-cost test. The proposed approach seamlessly propagates test information across the analog/digital divide. Experimental results substantiate the effectiveness of the proposed mixed-signal test synthesis methodology.
In this paper, an analysis of the test time of the CBET (Combination of BIST and External Test) test approach is presented. The analysis shows that the CBET test approach can achieve shorter test time than both external test and BIST in many situations. An efficient test time minimization algorithm for CBET-based LSIs is also proposed. It uses several characteristics of the CBET test approach, derived from the analysis, to reduce the computation time needed to find the optimum test sets. The algorithm helps designers save precious design time.
This paper describes CAS-BUS, a P1500 compatible Test Access Mechanism for Systems on a Chip. The TAM architecture is made up of a Core Access Switch (CAS) and a test bus. The TAM characteristics are its flexibility, scalability and reconfigurability. A CAS generator has been developed, and some results are provided in the paper.
This paper describes a new approach to the high-level design and test of transport-triggered architectures (TTAs), a special type of application-specific instruction processor (ASIP). The proposed method introduces test as an additional constraint, besides throughput and circuit area. The method, which calculates the testability of the system, helps the designer assess the obtained architectures with respect to test, area and throughput in the early phases of the design and select the most suitable one. The "MOVE" framework has been used to create the templated TTAs. The approach is validated on the "Crypt" Unix application.
The composite signal flow model of computation targets systems with significant control and data processing parts. It builds on the data flow and synchronous data flow models and extends them with three signal types: non-periodic signals, sampled signals, and vectorized sampled signals. Vectorized sampled signals are used to represent vectors and computations on vectors. Several conversion processes are introduced to facilitate synchronization and communication with these signals. We discuss the severe implications that these processes have on the causal behaviour of the system. We illustrate the model and its usefulness with three applications: a co-modelling and co-simulation environment combining Matlab and SDL; a high-level timing analysis as a consequence of the operations on vectors; and conditions for a parallel, distributed simulation.
We integrate data and control flow at the system specification level, using the two specialized and well-established languages Matlab and SDL. For this we provide a modeling technique which integrates the timing concepts and allows synchronization of vector-based computation with event-based state transitions. The technique is supported by a library of wrappers and communication functions, which has been implemented to make cosimulation easy to use and almost transparent to the user. A methodology formulates the rules for using the modeling technique, partitioning the system, and selecting communication modes. A complex industrial example illustrates the modeling technique and the methodology, and shows the efficiency of Matlab-SDL cosimulation.
Delay-insensitive interfacing was first demonstrated on the macromodules project in the 1960's, but globally synchronous (clocked) schemes have so far dominated the VLSI era. In deep sub-micron technologies, problems of clock skew, including excessive size and power consumption of clock buffers, and heterogeneity of systems on a chip are rekindling an interest in global asynchrony. DI-Algebra is presented here as a language for the specification of modules with delay-insensitive interfaces. Such modules can be implemented either in synchronous or in asynchronous logic. A design flow is also illustrated in which specifications are automatically translated into Petri nets, validated, and synthesised into asynchronous logic.
Very low bit error rates have become an important constraint in high-performance communication systems that operate at very low signal-to-noise ratios: due to their impressive coding gains, turbo codes have been proposed for several applications, although they suffer from a large decoding delay. This paper presents the design of a high-throughput turbo decoder implemented using the TSPC (True Single Phase Clocking) logic family. In order to achieve the best compromise between cost (in terms of area) and throughput, several architectural solutions have been analyzed. The whole system, and in particular its core, the SISO module, has been verified through VHDL simulations. HSPICE simulations show that the system can operate with a 1 GHz clock and thus reach a throughput of 50 Mbit/s.
This paper presents a single-chip implementation of a space-time algorithm for co-channel interference (CCI) and intersymbol interference (ISI) reduction in GSM/DCS systems. The temporal channel for the Viterbi receiver and the beamformer weights for CCI rejection are estimated jointly by optimizing a suitable cost function for separable space-time channels. Taking into account the integration capabilities provided by today's FPGAs (Field Programmable Gate Arrays), the feasibility of a single-chip JSTE solution based on a three-processor architecture for carrier beamforming, equalization and demodulation is demonstrated.
The paper describes the concept and implementation of a telecom emulator that features both reconfigurability and high-speed processing. The emulator can easily be transformed into any telecom system as a real node. It has two innovative system design concepts. The first is to divide the specification into simplified processes based on the open system interconnection (OSI) reference model. The second is the use of a sophisticated hardmacro and its software-callable driver. We implemented a prototype system called ATTRACTOR and applied it to some telecom applications. The applications could be implemented in a short design time and were operated in real computer network environments.
Novel methodology and algorithms that seamlessly integrate logic synthesis and physical placement through a transformational approach are presented. Contrary to most placement algorithms, which minimize a global cost function based on an abstract representation of the design, we decompose the placement function into a set of transforms and couple them directly with incremental timing, noise, and/or power analyzers. This coupling results in direct and more accurate feedback on optimizations for placement actions. These placement transforms are then integrated with traditional logic synthesis transforms, leading to a converging set of optimizations based on the concurrent manipulation of Boolean, electrical, and physical data. Experimental results indicate that the proposed approach creates an efficient converging design flow that eliminates placement-synthesis iteration. It results in timing improvements and maintains other global placement measures such as wire congestion and wire length. The flexibility of the transformational approach allows us to easily add, extend and support more sophisticated algorithms that involve critical as well as non-critical regions and target a variety of metrics including noise, yield and manufacturability.
Traditional FPGA design flows have treated logic synthesis and physical design as separate steps. With the recent advances in technology, the lack of information on the physical implementation during logic synthesis has caused mismatches between the final circuit characteristics (delay, power and area) and those predicted by logic synthesis. In this paper, we present a technique that tightly links the logic and physical domains -- we combine logic and placement optimization in a single step. The combined algorithm is based on simulated annealing and is hence very amenable to new optimization goals or constraints. Two types of moves, directed towards global reduction in the cost function (linear congestion), are accepted by the simulated annealing algorithm: (1) logic optimization steps consisting of removing or replacing redundant wires in a circuit using functional flexibilities derived from SPFDs [12], and (2) placement optimization steps consisting of swapping a pair of blocks in the FPGA. Feedback from placement is very valuable in making an informed choice of a target wire during logic optimization moves. Experimental results demonstrate the efficacy of our approach over the placement-independent approach.
In this paper a constructive library-aware multilevel logic synthesis approach using symmetries is described. It integrates the technology-independent and technology-dependent stages of synthesis, and is premised on the goal of relating the functional structure of a logic specification closer to the ultimate topological and physical structures. We show that symmetries interpreted as structural attributes of functions can be effectively used to induce a favorable structural implementation. These symmetries are used in bridging 1) the structural properties of the functions being synthesized, 2) the structural attributes of the implementation network, and 3) the functional content of the target library. Experimental results show that the quality of circuits synthesized using this approach is generally superior to those synthesized by traditional approaches, and that the improvement correlates with the symmetry measure in a function.
In this paper, we present a BIST scheme for testing on-chip AD and DA converters. We discuss on-chip generation of linear ramps as test stimuli, and propose techniques for measuring the DNL and INL of the converters. We validate the scheme with software simulation -- 5% LSB (least significant bit) test accuracy can be achieved in the presence of reasonable analog imperfection.
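For context, the standard histogram-based way of obtaining DNL and INL from a slow linear ramp (a generic sketch; the paper's on-chip measurement circuitry may differ) is:

/* Generic histogram-based DNL/INL computation for an ADC driven by a
   slow linear ramp.  count[i] is the number of samples that produced
   output code i; the ideal count per code is the total divided by the
   number of codes exercised.  Results are in LSB. */
void dnl_inl(const unsigned long *count, int ncodes,
             double *dnl, double *inl)
{
    unsigned long total = 0;
    for (int i = 0; i < ncodes; i++)
        total += count[i];
    double ideal = (double)total / ncodes;   /* expected hits per code */

    double acc = 0.0;
    for (int i = 0; i < ncodes; i++) {
        dnl[i] = count[i] / ideal - 1.0;     /* deviation of code width from 1 LSB */
        acc += dnl[i];
        inl[i] = acc;                        /* cumulative deviation of transitions */
    }
}

In practice the first and last codes are usually excluded from the histogram, since the ramp saturates at the ends of the conversion range.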
In this paper, a new built-in self-test structure to test the static specifications of analog to digital converters (ADCs) is presented. A ramp signal generated by an integrator serves as a test input signal. A specific range of this signal is divided into 2^(n+1) segments, with each segment corresponding to one output combination of an (n+1)-bit counter, where n is the number of bits of the ADCs under test. The testing process is done with digital data processing by comparing the outputs of the ADCs under test with the outputs of the (n+1)-bit counter. Simple structure, low area overhead, and high speed are the advantages of the proposed test structure.
The objective of this paper is to discuss the possibility of reusing the hardware already present in an analog application to implement test functions, yielding a completely autonomous self-testable solution. In this first approach, an 8th-order analog linear filter is used as an application example. The required modifications to the circuit are presented, along with results in terms of area overhead and fault coverage.
Binary Decision Diagrams have been widely used to solve the Boolean Satisfiability (SAT) problem. The individual constraints can be represented using BDDs and the conjunction of all constraints provides all satisfying solutions. However, BDD-related SAT techniques suffer from size explosion problems. This paper presents two BDD-based algorithms to solve the SAT problem that attempt to contain the growth of BDD-size while identifying solutions quickly. The first algorithm, called BSAT, is a recursive, backtracking algorithm that uses an exhaustive search to find a SAT solution. The well known unate recursive paradigm is exploited to solve the SAT problem. The second algorithm, called INCOMPLETE-SEARCH-USAT (abbreviated IS-USAT), incorporates an incomplete search to find a solution. The search is incomplete inasmuch as it is restricted to only those regions that have a high likelihood of containing the solution, discarding the rest. Using our techniques we were able to find SAT solutions not only for all MCNC & ISCAS benchmarks, but also for a variety of industry standard designs.
Previous researchers have suggested the use of "lighthouses" to act as guides in directed state space search. The drawback of using lighthouses is that the user has to derive them manually, through a potentially laborious examination of the design. Additionally, specifying a large number of lighthouses results in wasted effort during the search. We present approaches to automatically generate high-quality lighthouses for hard-to-cover targets.
Temporal logic model checking is a technique for the automatic verification of systems against specifications. Besides the correctness of safety and liveness properties it is often important to determine critical answer and delay times of systems, especially if they are embedded in a real-time environment. In this paper we present an approach which allows the verification as well as the timing analysis of real-time systems. The systems are described as networks of communicating time-extended finite state machines (I/O-interval structures). We use a compact symbolic representation to obtain efficient analysis algorithms.
This paper presents an architectural study of a scalable system-level interconnection template. We explain why the shared bus, which is today's dominant template, will not meet the performance requirements of tomorrow's systems. We present an alternative interconnection in the form of switching networks. This technology originates in parallel computing, but is also well suited for heterogeneous communication between embedded processors and addresses many of the deep submicron integration issues. We discuss the necessity of, and ways to provide, high-level services on top of the bare network packet protocol, such as dataflow and address-space communication services. Finally, we present our first results on the cost/performance assessment of an integrated switching network.
With the ongoing advances in VLSI technology, the performance of an embedded system is determined to a large extent by the communication of data and instructions. This has resulted in new methods for on- and off-chip communication and caching schemes. In this paper, we use an arbitration scheme that exploits the characteristics of continuous "media" streams while minimizing the latency of random (e.g. CPU) memory accesses to background memory. We also introduce a novel caching scheme for a stream-based multiprocessor architecture, to limit as much as possible the amount of on-chip buffering required to guarantee the throughput of the continuous streams. With these two schemes we can build an architecture for media processing with optimal flexibility at run-time, while performance guarantees can be determined at compile-time.
The design process for an engine management system is presented. The functional specification of the system has been captured using C and C++ as specification languages. The validation of the specification has been carried out using functional simulation. An architecture for the implementation of the functional specification is then selected from a set of three possible alternatives, all based on the same micro-controller but characterized by different hardware-software trade-offs. The choice is motivated by a fast performance estimation that can also be used to identify the parts of the design that could be moved across the hardware-software partition to obtain better cost or better performance. The case study has been performed in the Felix VCC framework.
In this paper we address the problem of designing very high throughput finite state machines (FSMs). The presence of loops in sequential circuits prevents a straightforward, generalized application of pipelining techniques, which work so well for combinational circuits, to increase FSM performance. We observe that appropriate extensions of the "wave steering" technique [17,18] make it possible to partially overcome the problem. Additionally, we use FSM decomposition theory to decouple state variable dependencies. Application of these two techniques to MCNC benchmarks resulted in an average throughput increase of a factor of 3 compared to a standard cell implementation, at the expense of a factor of 3.7 in area and less than a factor of 2 in latency.
This paper presents a new delay minimization and technology mapping algorithm for two-level structures (TLS) implemented using clock-delayed (CD) domino logic. We take advantage of CD domino's high-speed, large fan-in NOR and OR gates to increase the speed of the circuit by partial collapsing. The algorithm is delay-driven and the delays are obtained from a characterized CD domino library. The results on eight combinational MCNC benchmark circuits show an average speed improvement of 89% for CD domino with TLS, compared to static CMOS implementations generated by Synopsys. CD domino with TLS using our tools produced on average 44% faster circuits than CD domino benchmarks minimized and mapped using Synopsys. Finally, the delay results for CD domino with TLS were on average 22% better than for standard domino.
This paper is about gate sizing under a statistical delay model. It shows that we can solve the gate sizing problem exactly for a given statistical delay model. The formulation used allows many different forms of objective function, which could, for example, directly optimize the delay uncertainty at the circuit outputs. We formulate the gate sizing problem as a nonlinear programming problem, and show that if we do this carefully, we can solve these problems exactly for circuits of up to a few thousand gates using the publicly available large-scale nonlinear programming solver LANCELOT.
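As a purely illustrative sketch (with assumed symbols, not the paper's formulation), such a nonlinear program might take the form: minimize F(x) = mu_D(x) + lambda * sigma_D(x) subject to x_min <= x_i <= x_max for every gate i, where x is the vector of gate sizes, mu_D and sigma_D are the mean and standard deviation of the circuit delay under the statistical delay model, and lambda trades the nominal delay against the delay uncertainty at the outputs.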
Functional BIST is a promising solution for self-testing complex digital systems at reduced cost in terms of area and performance degradation. The present paper addresses the computation of optimal seeds for an arbitrary sequential module to be used as a hardware test pattern generator. Up to now, only linear feedback shift registers and accumulator-based structures have been used for deterministic test pattern generation by reseeding. In this paper, a method is proposed which can be applied to general finite state machines. Although the method is completely general, for the sake of comparison with previous approaches an accumulator-based unit is assumed as the pattern generator module in this paper. Experiments prove the effectiveness of the approach, which outperforms previous results for accumulators in terms of test size and test time without sacrificing fault detection capability.
We describe a method for on-chip generation of weighted test sequences for synchronous sequential circuits. For combinational circuits, three weights, 0, 0.5 and 1, are sufficient to achieve coverage of stuck-at faults, since these weights are sufficient to reproduce any specific test pattern. For sequential circuits, the weights we use are defined based on subsequences of a deterministic test sequence. Such weights allow us to reproduce parts of the test sequence, and help ensure that complete fault coverage is obtained by the weighted test sequences generated.
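To make the role of the three weights concrete, a minimal software sketch (not the authors' on-chip hardware) of drawing one pattern from per-bit weights could look like this:

/* Hedged sketch: generate one n-bit pattern from per-bit weights.
   weight 0.0 -> bit forced to 0, weight 1.0 -> bit forced to 1,
   weight 0.5 -> bit drawn at random.  With only these three weights,
   any fully specified pattern can be reproduced exactly by setting
   every weight to 0 or 1.  (srand() seeding is omitted here.) */
#include <stdlib.h>

void weighted_pattern(const double *weight, int *bits, int n)
{
    for (int i = 0; i < n; i++) {
        if (weight[i] == 0.0)
            bits[i] = 0;
        else if (weight[i] == 1.0)
            bits[i] = 1;
        else
            bits[i] = rand() & 1;   /* weight 0.5: unbiased random bit */
    }
}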
The increasing use of large embedded memories in Systems-on-Chips requires automatic memory reconfiguration to avoid the need for external accessibility. In this work, effective diagnostic memory tests of linear order O(N) are proposed that enable memory reconfiguration, and their diagnostic capabilities are analyzed. In particular, these tests allow single-cell faults to be distinguished from multiple-cell faults, such as coupling faults. In contrast to conventional O(N) tests, all cells involved in a fault are detected and localized, which allows complete reconfiguration using minimal-area BIST hardware that compares favorably with other BIST designs.
One of the greatest challenges in a C/C++-based design methodology is to efficiently map C/C++ models into hardware. Many of the networking and multimedia applications implemented in hardware or mixed hardware/software systems make use of complex data structures stored in one or multiple memories. As a result, many of the C/C++ features originally designed for software applications are now making their way into hardware. Such features include dynamic memory allocation and the pointers used to manage data. We present a solution for efficiently mapping arbitrary C code with pointers and malloc/free into hardware. Our solution fits current memory management methodologies. It consists of instantiating a hardware allocator tailored to an application and a memory architecture. Our work also supports the resolution of pointers without restrictions on the data structures. An implementation using the SUIF framework is presented, followed by some case studies such as the realization of a video filter.
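As a hypothetical illustration of the kind of input such a flow targets (our own fragment, not an example from the paper), the following C code allocates and releases a line buffer dynamically; a hardware realization needs an on-chip allocator for the memory holding the buffer and must resolve the pointer to locations in that memory.

#include <stdlib.h>

/* Hypothetical input for a C-to-hardware flow: a working buffer is
   allocated, filled and released at run time. */
void process_line(const unsigned char *in, unsigned char *out, int n)
{
    unsigned char *buf = malloc(n);        /* dynamic allocation */
    if (buf == NULL)
        return;
    for (int i = 0; i < n; i++)
        buf[i] = in[i] >> 1;               /* simple filtering step */
    for (int i = 0; i < n; i++)
        out[i] = buf[i];
    free(buf);                             /* dynamic deallocation */
}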
Partially reconfigurable processors provide the unique ability for part of the device to be reconfigured while the remaining part is still operational. In this paper, we present a novel partitioning methodology that temporally partitions a design for such a partially reconfigurable processor and improves design latency by minimizing reconfiguration overhead. This is achieved by overlapping the execution of one temporal partition with the reconfiguration of another, using the processor's partial reconfiguration capability. We have incorporated block-processing in the partitioning framework to reduce the reconfiguration overhead of partitioned designs. A highlight of our partitioner is its ability to handle loops and conditional constructs in the input specification. The proposed methodology was tested on several examples on the Xilinx 6200 FPGA. The results show significant reductions in design latency, leading to a considerable speed-up due to partial reconfiguration.
This paper presents a new approach to combined high-level synthesis and partitioning for FPGA-based multi-chip emulation systems. The goal is to synthesize a prototype with maximal performance under the given area and interconnection constraints of the target architecture. Interconnection resources are handled similarly to functional resources, enabling the scheduling and sharing of interchip connections according to their delay. Moreover, data transfer serialization is performed completely or partially, depending on the mobility of the data transfers, in order to satisfy the given interconnection constraints. In contrast to conventional partitioning approaches, the constraints of the target architecture are fulfilled by construction.
We present a technique for fast estimation of the power consumed by the cache and bus sub-system of a parameterized system-on-a-chip design for a given application. The technique uses a two-step approach of first collecting intermediate data about an application using simulation, and then using equations to rapidly predict the performance and power consumption for each of thousands of possible configurations of system parameters, such as cache size and associativity and bus size and encoding. The estimations display good absolute as well as relative accuracy for various examples, and are obtained in dramatically less time than other techniques, making possible the future use of powerful search heuristics.
Keywords: System-on-a-chip, low power, estimation, intellectual property, cache, on-chip bus.
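The second, equation-based step can be imagined roughly as follows (a hypothetical sketch with assumed parameter names, not the paper's actual equations): the simulation step supplies access and miss counts per configuration, and closed-form expressions then turn them into energy estimates.

/* Hypothetical second-step estimator: combine access statistics gathered
   once by simulation with assumed per-event energy parameters to predict
   the energy of one cache/bus configuration.  All names are illustrative,
   not taken from the paper. */
typedef struct {
    unsigned long accesses;     /* total cache accesses for the application */
    unsigned long misses;       /* misses for this cache configuration      */
} stats_t;

double estimate_energy(stats_t s, double e_hit, double e_miss, double e_bus,
                       int words_per_line)
{
    double hits = (double)(s.accesses - s.misses);
    return hits * e_hit                                   /* cache hit energy   */
         + s.misses * (e_miss + words_per_line * e_bus);  /* miss + bus traffic */
}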
Clock and data recovery circuits are essential components in communication systems. They directly influence the bit-error-rate performance of communication links. It is desirable to predict the rate of occasional detection errors and the loss of synchronization due to the non-ideal operation of such circuits. In high-speed data networks, the bit-error-rate specification on the system can be very stringent, i.e., 10^-14. It is not feasible to predict such error rates with straightforward, simulation-based approaches. This work introduces a stochastic model and an efficient, analysis-based, non-Monte-Carlo method for performance evaluation of digital data and clock recovery circuits. The analyzed circuit is modeled as a finite state machine with inputs described as functions on a Markov state-space. System performance measures, such as the probability of bit errors and the rate of synchronization loss, can then be evaluated through the analysis of a larger resulting Markov system. A multi-grid method is used to solve the very large associated systems. The method is illustrated on a real industrial recovery circuit design.
A new method for computation of timing jitter in a PLL is proposed. The computational method is based on the representation of the circuit as a linear time-varying system with modulated stationary noise models, spectral decomposition of the stochastic process, and decomposition of the noise into orthogonal components, i.e., phase and amplitude noise. The method is illustrated by examples of jitter computation in PLLs.
The design of analog front-ends of digital telecommunication transceivers requires simulations at the architectural level. The nonlinear nature of the analog front-end blocks is a complication for their modeling at the architectural level, especially when the nonlinear behavior is frequency dependent. This paper describes a method to derive a bottom-up model of nonlinear analog continuous-time circuits used in communication systems. The models take into account the frequency dependence of the nonlinear behavior, making them suitable for wideband applications. Such a model consists of a block diagram that corresponds to the most important contributions to the second- and third-order Volterra kernels of the output quantity (voltage or current) of a circuit. The examples in the paper, a high-level model of a CMOS low-noise amplifier and an active lowpass filter, demonstrate that the generated models can be efficiently evaluated in high-level dataflow-type simulations of mixed-signal front-ends and that they yield insight into the nonlinear behavior of the analog front-end blocks.
Covering problems are widely used as a modeling tool in Electronic Design Automation (EDA). Recent years have seen dramatic improvements in algorithms for the Unate/Binate Covering Problem (UCP/BCP). Despite these improvements, BCP is a well-known computationally hard problem, with many existing real-world instances that currently are hard or even impossible to solve. In this paper we apply search pruning techniques from the Boolean Satisfiability (SAT) domain to BCP. Furthermore, we generalize these techniques, in particular the ability to backtrack non-chronologically, to exploit the actual formulation of covering problems. Experimental results, obtained on representative instances of the Unate and Binate Covering Problems, indicate that the proposed techniques provide significant performance gains for different classes of instances.
The classical solving approach for two-level logic minimisation reduces the problem to a special case of unate covering and attacks the latter with a (possibly limited) branch-and-bound algorithm. We adopt this approach, but we propose a constructive heuristic algorithm that combines the use of Binary Decision Diagrams with Lagrangian relaxation. This technique permits an effective choice of the elements to include in the solution, as well as cost-related reductions of the problem and a good lower bound on the optimum. The results support the effectiveness of this approach: on a wide set of benchmark problems, the algorithm nearly always hits the optimum, and in most cases proves it to be such. On the problems whose optimum is actually unknown, the best known results are strongly improved.
Pass Transistor Logic (PTL) has attracted more and more interest in recent years, since it has proved to be an attractive alternative to static CMOS designs with respect to area, performance and power consumption. Existing automatic PTL synthesis tools use a direct mapping of (decomposed) BDDs to pass transistors. Thereby, structural properties of BDDs, like the ordering restriction and the fact that the select signals of the multiplexers (corresponding to BDD nodes) directly depend on input variables, are imposed on PTL circuits although they are not necessary for PTL synthesis. General multiplexer circuits can be used instead and should provide a much higher potential for optimization compared to a pure BDD approach. Nevertheless -- to the best of our knowledge -- an optimization of general Multiplexer Circuits (MCs) for PTL synthesis has not been attempted so far, due to a lack of suitable optimization approaches. In this paper we present such an algorithm, which is based on efficient BDD optimization techniques. Our experiments prove that there is indeed a high optimization potential in the use of general MCs -- both concerning area and depth of the resulting PTL networks.
The Boolean satisfiability problem (SAT) has various applications in electronic design automation (EDA) fields such as testing, timing analysis and logic verification. SAT has typically been applied to EDA as follows: 1) formulation of the given problem as a SAT instance, and 2) solution of the SAT instance. In this paper, we present a method to simultaneously solve several closely related SAT instances using incremental satisfiability (ISAT). In ISAT, the decision sequence made for a "prefix" function is used to solve another set of functions which have a number of new constraints (extensions) added to the prefix function. Our experiments show that we can achieve significant gains in total runtime when we use this methodology, as opposed to resetting the decision sequences and solving each instance from scratch. An application of ISAT to delay fault testing is presented by formulating incremental path sensitization as an ISAT problem. Non-robust tests for the combinational portion of ISCAS 89 circuits are generated using this method.
In current microprocessors and systems, an increasingly large portion of the silicon is derived through automatic synthesis, with designers working exclusively at the RT level, and design productivity is greatly enhanced. However, in the new design flow, validation still remains a challenge: while new technologies based on formal verification are only marginally accepted, standard techniques based on simulation are beginning to fall behind the increased circuit complexity. This paper proposes a new approach to simulation-based validation, in which a Genetic Algorithm helps the designer generate useful input sequences to be included in the test bench. The technique has been applied to an industrial circuit, showing that the quality of the validation process is increased.
This paper describes an efficient error simulator able to analyze functional VHDL descriptions. The proposed simulation environment can be based on commercial VHDL simulators. All components of the simulation environment are automatically built starting from the VHDL specification of the description under test. The effectiveness of the simulator has been measured by using a random functional test generator. Functional test patterns produce, on some benchmarks, a higher gate-level fault coverage than the fault coverage achieved by a very efficient gate-level test pattern generator. Moreover, functional test generation requires a fraction of the time necessary to generate test at the gate level. This is due to the possibility of effectively exploring the test patterns space since error simulation is directly performed at the VHDL level.
We study the effectiveness of functional tests for full scan circuits. Functional tests are important for design validation, and they potentially have a high defect coverage independent of the circuit implementation. The functional fault model we consider consists of single state-transition faults. The test generation procedure we describe uses one of two approaches at any given time in order to minimize both the number of tests and the test application time. (1) It may use scan to set the state of the circuit, and observe fault effects propagated to the next-state variables. (2) It may use transfer sequences to set the circuit state, or unique input-output sequences to propagate fault effects to the primary outputs. We present experimental results to demonstrate the effectiveness of scan-based functional tests.
There has been a proliferation of block-diagram environments for specifying and prototyping DSP systems. These include tools from academia like Ptolemy [3] and GRAPE [7], and commercial tools like SPW from Cadence Design Systems, Cossap from Synopsys, and the HP ADS tool from HP. The block-diagram languages used in these environments are usually based on dataflow semantics, because various subsets of dataflow have proven to be good matches for expressing and modeling signal processing systems. In particular, synchronous dataflow (SDF) [8] has been found to be a particularly good match for expressing multirate signal processing systems. One of the key problems that arises during synthesis from an SDF specification is scheduling. Past work on scheduling from SDF [1] has focused on optimization of program memory and buffer memory. However, in [1], no attempt was made at overlaying or sharing buffers. In this paper, we formally tackle the problem of generating optimally compact schedules for SDF graphs that also attempt to minimize buffering memory under the assumption that buffers will be shared. This results in schedules whose data memory usage is drastically lower (by up to 83%) than previous methods have achieved.
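To make the buffer-memory issue concrete, consider a classical two-actor example (ours, not taken from the paper): actor A produces 2 tokens per firing and actor B consumes 3, so the repetitions vector is (3, 2). The single-appearance schedule (3A)(2B) lets the buffer between A and B grow to 6 tokens, whereas the interleaved schedule A A B A B never holds more than 4; scheduling decisions of this kind, combined with the buffer overlaying and sharing considered here, are what drive the reported reductions in data memory.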
This paper describes how optimization techniques can be applied to efficiently solve the constrained co-design problem. This is done by formulating different cost functions which drive the hardware-software partitioning process. The use of complex cost functions allows us to capture more aspects of the design. Moreover, the appropriate formulation of this kind of function has a great impact on the results that can be obtained, regarding both quality and algorithm convergence rate. A strong point of the proposed formulation is its generality: it does not depend on the particular problem and can easily be extended to consider new design constraints.
Recently a number of heuristic based system-level synthesis algorithms have been proposed. Though these algorithms quickly generate good solutions, how close these solutions are to optimal is a question that is difficult to answer. While current exact techniques produce optimal results, they fail to produce them in reasonable time. This paper presents a synthesis algorithm that produces solutions of guaranteed quality (optimal in most cases or within a known bound) with practical synthesis times (few seconds to minutes). It takes a unified look (the lack of which is one of the main sources of sub-optimality in the heuristic techniques) at different aspects of system synthesis such as pipelining, selection, allocation, scheduling and FPGA reconfiguration. Our technique can handle both time constrained as well as resource constrained synthesis problems. We present results of our algorithm implemented as part of the Match project [1] at Northwestern University.
Current processor architectures, both programmable and custom, are becoming more and more dominated by the data access bottlenecks in the cache, system bus and main memory subsystems. In order to provide sufficiently high data throughput in the emerging era of highly parallel processors, where many arithmetic resources can work concurrently, novel solutions for memory access and data transfer will have to be introduced. The crucial question we want to address in this hot topic session is what these novel solutions can be expected to rely on: will they be mainly innovative processor architecture ideas, novel approaches in application compiler/synthesis technology, or a mix of both?
We address the problem of inserting repeaters, selected from a library, at feasible locations in a placed and routed network to meet user-specified delay constraints. We minimize repeater area by taking advantage of the slacks available in the network. Specifically, we transform the problem into an unconstrained optimization problem and solve it by iterative local refinement. We show that the optimal repeater locations and sizes that locally minimize the objective function in the unconstrained problem can be computed efficiently. We have implemented our algorithm and tested it on a set of benchmarks; the experimental results are promising.
As CMOS technology scales down, the horizontal coupling capacitance between adjacent wires plays a dominant part in the wire load, and crosstalk interference becomes a serious problem for VLSI design. We focus on the delay increase caused by crosstalk. On-chip bus delay is maximized by the crosstalk effect when adjacent wires switch simultaneously in opposite signal transition directions. This paper proposes a bus delay reduction technique based on intentionally skewing the signal transition timing of adjacent wires. An approximate equation for bus delay shows that our delay reduction technique is effective for repeater-inserted buses. SPICE simulation results show that a total bus delay reduction of 5% to 20% can be achieved.
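For background (a standard approximation, not the paper's derived bus-delay equation), the switching activity of a neighbouring wire is commonly folded into an effective load C_eff = C_g + k * C_c, where C_g is the wire-to-ground capacitance, C_c the coupling capacitance to the neighbour, and k is roughly 0 when the neighbour switches in the same direction, 1 when it is quiet, and 2 when it switches in the opposite direction; skewing the transition timing of adjacent wires avoids the simultaneous opposite transitions that produce the worst-case factor.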
We present the single layer router CDR (Current Driven Router) capable of routing analog multiterminal signal nets with current driven wire widths. The widths used during routing are determined by current properties per terminal gained by simulation or manually specified by circuit designers. The algorithm presented computes a Steiner tree layout satisfying all specified current constraints while obeying the maximum allowed current densities on all connections. CDR calculates the Steiner tree topology, computes the unknown currents of wires connecting two Steiner points and generates the final Steiner tree layout in a single step thus eliminating the need for a separate layout post-processing step common to power and ground routing algorithms. CDR uses a connection graph for layout representation and applies an advanced minimum detour algorithm in combination with a modified "three-point steinerization" heuristic for Steiner tree based layout construction.
Capacitance coupling can have a significant impact on gate delay in today's deep submicron circuits. In this paper we present a static timing analysis tool that calculates the longest path of synchronous circuits taking the impact of crosstalk on gate delays into account. We show that passive modeling of the coupling capacitance can significantly underestimate the delay and that an assumption of permanent worst-case coupling unnecessarily overestimates it. Our method is validated by comparison to Spice simulations.
Delay defects on I/O pads, interconnections of a board, or interconnections among embedded cores cannot be tested with the current IEEE 1149.1 boundary scan design. This paper introduces a simple design technique which slightly modifies the TAP controller to test delay defects by postponing UpdateDR during the EXTEST instruction. Furthermore, 2log(N+2) interconnect test patterns are proposed for both static and delay testing.
The IEEE 1149.4 infrastructure has been aimed primarily at printed circuit board (PCB) interconnect test, parametric test of discrete components and functional test of IC cores. Methods to perform these tests have been published, and experimental results using evaluation samples of IEEE 1149.4 ICs have been reported. So far, most attention has been paid to test and measurement techniques for the first two issues. Proposed methods typically employ the IEEE 1149.4 infrastructure in the role of a built-in test probe that enables external test and measurement equipment to access internal PCB points via the analog test bus. This paper describes an alternative approach based on a functional transformation of the tested board by means of the existing IEEE 1149.4 resources. In this way, an efficient go/no-go functional test can be performed. Case studies are given to illustrate the proposed approach.
An objective of DSP testing should be to ensure that any errors due to missed faults are infrequent compared to a circuit's intrinsic errors, such as overflow. A method is proposed for quantifying test quality for digital filters by measuring the risk associated with any untested faults. Techniques for finding upper bounds on fault activation rates under worst-case operating conditions are described. These techniques enable test designers to objectively discriminate significant missed faults from near-redundant faults, which are unlikely to be activated in normal operation of the device. This complements fault coverage as a measure of test quality, providing a means of locating high-risk missed faults even in very high coverage test regimes.
Efficient built-in and external test strategies are becoming essential in MicroElectroMechanical Systems (MEMS), especially for high reliability and safety critical applications. To be realistic however, internal and external test must be properly validated in terms of fault coverage. Fault simulation is hence likely to become a critical utility within the design flow. This paper will discuss methods for achieving test support based on the extension of tools and techniques currently being introduced into the mixed signal ASIC market.
We present abstraction techniques for systems containing counters, which allow their state spaces to be significantly reduced for efficient verification. In contrast to previous approaches, our abstraction technique lifts the entire verification problem, i.e., also the specification, to the abstract level. As an application, we consider the reduction of real-time systems by replacing the discrete clocks of timed automata with abstract counters. The presented method allows the reduction of such systems to very small state spaces. As benchmark examples, we consider the generalized railroad crossing and Fischer's mutual exclusion protocol.
Recently, a methodology for worst-case analysis of discrete systems has been proposed [1, 2]. The methodology relies on a user-provided abstraction of system components. In this paper we propose a procedure to automatically generate such abstractions for system components with Boolean transition functions. We use a binary decision diagram (BDD) of the transition function to generate a formula in Presburger arithmetic representing the desired abstraction. Our experiments indicate that the approach can be applied to control-dominated embedded systems.
A paradigm for automatic approximation/refinement in conservative CTL model checking is presented. The approximations are used to verify a given formula conservatively by computing upper and lower bounds on the set of satisfying states at each sub-formula. These approximations attempt to perform conservative verification with the least possible number of BDD variables and BDD nodes. We present new forms of operational graphs that avoid limitations associated with previously used operational graphs. Three new techniques for efficient automatic refinement of the approximate system are presented. These methods make it easier to exploit locality. We also present a new type of don't cares (Approximate Satisfying Don't Cares) that can make model checking more efficient in time and space. On average, an order of magnitude speedup was achieved.
In this paper, we consider continuous wire-sizing optimization for delay minimization and ringing control. The optimization is based on a fast and accurate delay estimation method under a finite ramp input, where an analytical expression is also derived to estimate the overshoot/undershoot voltage. We specify the wire shape to be of the form f(x) = alpha·e^(-bx), since previous studies under the Elmore delay model suggest that an exponential wire shape is effective for delay minimization. The relevant transmission line equations are solved using the Picard-Carson method. The transient response in the time domain is derived as a function of alpha and b. The coefficients alpha and b are then determined such that either the actual delay (50% delay) is minimized, or the wiring area is minimized subject to a delay bound. At the same time, the overshoot/undershoot voltage is bounded to prevent false switching. Our method for delay estimation is very efficient. In all the experiments we performed, it is far more accurate than the Elmore delay model and the estimated delay values are very close to SPICE results. We also find that in determining the optimal shape which minimizes delay, the Elmore delay model performs as well as our method in terms of the minimum actual delay it achieves, i.e., the Elmore delay model has high fidelity. However, in determining the optimal shape which minimizes area subject to a delay bound, the Elmore delay model performs much worse than our method. We also find that the constraint for overshoot/undershoot control does affect the optimization results for both delay and area minimization objectives.
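As a rough illustration of how such a tapered wire can be evaluated numerically, the sketch below discretizes the wire into an RC ladder and sums the Elmore delay segment by segment; alpha and b could then be swept to minimize this estimate. This uses the Elmore model only, not the Picard-Carson transmission-line solution of the paper, and all electrical parameters are invented defaults.

```python
import math

def elmore_delay(alpha, b, length, r_sq=0.08, c_area=2e-16, c_fringe=8e-17,
                 r_driver=100.0, c_load=5e-14, segments=1000):
    """Elmore delay of a wire whose width tapers as f(x) = alpha*exp(-b*x),
    discretized into an RC ladder (L-model per segment). Parameter values are
    hypothetical, chosen only to make the sketch runnable."""
    dx = length / segments
    delay, r_to_node = 0.0, r_driver
    for i in range(segments):
        w = alpha * math.exp(-b * (i + 0.5) * dx)   # local wire width
        r_seg = r_sq * dx / w                        # sheet resistance / width
        c_seg = (c_area * w + c_fringe) * dx         # area plus fringe capacitance
        r_to_node += r_seg
        delay += r_to_node * c_seg                   # Elmore sum over downstream caps
    return delay + r_to_node * c_load

print(elmore_delay(alpha=1.0e-6, b=500.0, length=2e-3))
```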
A novel method, which can be regarded as the noise counterpart of the celebrated Elmore delay formula -- both being based on the first two moments of the network's transfer function -- efficiently and accurately predicts the maximum noise between two capacitively coupled RC networks, without simulation. The method applies to general topologies (with significant simplification for coupled trees), accurately models how coupling varies with driver transition time, and quantifies the uncertainty in the calculated noise values. Efficient enough for large circuits, the new method can serve as a key ingredient in CAD methodologies to ensure that a layout is free of noise problems.
In this paper, we present an efficient yet accurate inductance extraction methodology and also apply it to clocktree RLC extraction. We first show that without loss of accuracy, the inductance extraction problem of n traces with or without ground planes can be reduced to a number of one-trace and two-trace subproblems. We then solve one-trace and two-trace subproblems via a table-based approach. We finally validate the linear cascading assumption that enables us to apply our inductance extraction approach to clocktree RLC extraction and optimization.
This paper proposes an all-digital on-chip bus delay and crosstalk measurement methodology. A diagnosis procedure is derived to distinguish delay faults in drivers, receivers, and wires. The crosstalk profile is plotted by monitoring the changes in delay in the presence of crosstalk. The distinguishing features include an all-digital design and low hardware overhead. SPICE simulation results demonstrate the feasibility of the methodology.
This paper proposes a methodology for designing sampled-data mixed-signal circuits using VHDL-based behavioural descriptions. The goal is to use a VHDL description of both the analog and the digital parts to simulate and verify the entire mixed-signal system, as well as to facilitate the synthesis and fault simulation of the digital part. As an example of the proposed methodology, a digitally corrected/calibrated pipeline A/D converter (ADC) has been designed. Among other aspects of general interest, we show how analog dynamic effects are incorporated in order to obtain accurate high-level simulations. Results from simulations carried out using QuickHDL from Mentor Graphics prove the feasibility of the approach and are in agreement with those obtained experimentally from a silicon prototype.
Passive components integrated into a high-density substrate can be a viable way to overcome the size and manufacturing limits of SMD passives mounted onto the system board. Still, this technology is perceived as being "too risky" and not cost effective. In this paper we propose a "passives optimized" solution combining the advantages of both SMD and integrated technology while avoiding the respective drawbacks. Using a GPS receiver front end as an example, we present a methodology to assess the possible benefits of the mixed technology.
This paper presents the development of some front-end analog circuits for mixed-signal systems. The paper proposes the use of externally linear, internally non-linear analog circuits. Using this approach, analog area is greatly reduced, and circuits can be built on top of fully digital technologies. Experimental results in the analog and digital domains support the proposed approach to mixed-circuit design.
This paper examines several techniques for static timing analysis. In detail, the first part of the paper analyzes the connection between prediction accuracy (worst-case execution time) and the applicability of a methodology for modeling and analysis of instruction as well as data cache behavior. The second part of the paper proposes a timing analysis technique for superscalar processors. The objects of our studies are two processors of the PowerPC family, in particular the PPC403 and the MPC750.
In a multi-FPGA synthesis system, the designer ideally has only an abstract view of the board architecture. This abstract modeling of the underlying reconfigurable computer poses complex challenges to the synthesis and partitioning tools. Since the design specification is not constrained by the number of memory segments on the board or the number of pins between FPGAs, it is difficult for the CAD tools to transform the design into one that maps onto the multi-FPGA board. This paper describes an arbitration mechanism that bridges the abstraction gap between the input design and the reconfigurable architecture. Since this mechanism allows such architecture abstraction between the design and the board, it becomes easier to port a design from one target architecture to another. This arbitration mechanism introduces very little overhead in terms of area and delay. It has been used in data-dominated applications; in this paper, the Fast Fourier Transform (FFT) is shown as an illustrative example.
We present an approach to bus access optimization and schedulability analysis for the synthesis of hard real-time distributed embedded systems. The communication model is based on a time-triggered protocol. We have developed an analysis of the communication delays, proposing four different message scheduling policies over a time-triggered communication channel. Optimization strategies for the bus access scheme are developed, and the four approaches to message scheduling are compared using extensive experiments.
We address the issue of standards development for the system-level design space. System-level design IP re-use standards are key to the future of the VSIA. However, the concept of system-level standards has its share of sceptics: what role can standards play in this developing market segment? In response we present an overview of three standards in the system-level VC integration space, and describe two distinct industrial case studies to support their practicality.
The widespread adoption of embedded microprocessor-based systems for safety-critical applications mandates the use of co-design tools able to evaluate system dependability at every step of the design cycle. In this paper, we describe how fault injection techniques have been integrated in an existing co-design tool and which advantages come from the availability of such an enhanced tool. The effectiveness of the proposed tool is assessed on a simple case study.
IC technologies are approaching the ultimate limits of silicon in terms of channel width, power supply and speed. By approaching these limits, circuits are becoming increasingly sensitive to noise, which will result in unacceptable rates of soft errors. Furthermore, defect behavior is becoming increasingly complex, resulting in an increasing number of timing faults that can escape detection by fabrication testing. Thus, fault-tolerant techniques will become necessary even for commodity applications. This work considers the implementation and improvement of a new soft-error and timing-error detection technique based on time redundancy. Arithmetic circuits were used as a test vehicle to validate the approach. Simulations and performance evaluations of the proposed detection technique were made using timing and logic simulators. The obtained results show that detection of such temporal faults can be achieved at a reasonable hardware and performance cost.
We present an integrable solution for the detection of defective sensor elements using sigma-delta (SD) modulation and a matched filter. The sensor element is stimulated using a pseudo-random binary sequence (PRBS). The sensor signal is read out and the analog output is digitized using an SD modulator. The binary pulse density stream of the SD modulator is the output of the sensor system and thus should ideally contain the PRBS. A matched filter has the task of detecting the pseudo-random sequence in the pulse density stream, and its sampled output is compared to a threshold, making it possible to judge the functionality of the sensor element. By evaluating the magnitude of the matched filter output it is also possible to measure the sensor sensitivity. We present a discrete solution of this method, but an integrated chip using a standard 1.2 µm CMOS process has been designed and is being fabricated.
The problem of power management for an embedded system is to reduce system-level power dissipation by shutting off parts of the system when they are not being used and turning them back on when they are required. Algorithms for this problem are online in nature: the algorithm must operate without access to the complete input data set or its characteristics. In this paper, we present online algorithms to manage power for embedded systems and provide experimental analysis to back up the theoretical results. Specifically, this paper makes four contributions. We propose an optimal online algorithm for power management. We present an analysis of algorithmic efficiency using a technique called competitive analysis, which is particularly suitable for online algorithms. Using this analysis technique, we develop a lower bound for the non-adaptive version of the power management problem and relate it to algorithms that shut down the system based on historical data. We provide a lower bound for any algorithm that uses adaptive methods to manage power. We also propose an algorithm that is independent of the input data distribution, practical, and usable in both hardware and software systems, with guaranteed performance. Finally, we compare these algorithms with previously proposed heuristics both theoretically and experimentally. For the experiments, we model the disk drive of a laptop computer as an embedded system. The results show that the proposed algorithms perform well in practice with guaranteed bounds on their performance. Further, this paper conclusively demonstrates that to implement aggressive power management techniques for power-critical subsystems, designers will have to commit greater resources such as dedicated registers and ALU units.
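For readers unfamiliar with competitive analysis in this setting, the sketch below shows the textbook break-even timeout policy and its 2-competitive guarantee. The power and energy numbers are invented, and this baseline policy is an illustration of the analysis style, not one of the algorithms proposed in the paper.

```python
# Classical break-even timeout policy for shutdown-based power management.
P_ACTIVE = 2.5      # W, device idle but powered up (hypothetical value)
P_SLEEP = 0.1       # W, device shut down (hypothetical value)
E_WAKE = 10.0       # J, energy cost of waking the device up (hypothetical value)

BREAK_EVEN = E_WAKE / (P_ACTIVE - P_SLEEP)   # idle time at which shutting down pays off

def energy(idle_period, timeout):
    """Energy spent during one idle period under a fixed-timeout policy."""
    if idle_period <= timeout:
        return P_ACTIVE * idle_period                                   # never shut down
    return P_ACTIVE * timeout + P_SLEEP * (idle_period - timeout) + E_WAKE

def oracle(idle_period):
    """Offline optimum: shut down immediately iff the idle period is long enough."""
    return min(P_ACTIVE * idle_period, P_SLEEP * idle_period + E_WAKE)

# With timeout = BREAK_EVEN, energy(t) <= 2 * oracle(t) for every idle period t,
# i.e. the policy is 2-competitive.
for t in (1.0, BREAK_EVEN, 10 * BREAK_EVEN):
    print(f"idle {t:6.2f} s  ratio {energy(t, BREAK_EVEN) / oracle(t):.3f}")
```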
A split-bus architecture is proposed to improve the power dissipation for global data exchange among a set of modules. The resulting bus splitting problem is formulated and solved combinatorially. Experimental results show that the power saving of the split-bus architecture compared to the monolithic-bus architecture varies from 16% to 50%, depending on the characteristics of the data transfer among the modules and the configuration of the split bus. The proposed split-bus architecture can be extended to multi-way split-bus when a large number of modules are to be connected.
In this paper, a power reduction technique which merges frequently executed sequences of object code into single instructions is proposed. The merged sequence of object code is restored by an instruction decompressor before the object code is decoded. The decompressor is implemented as a ROM. In many programs, only a few sequences of object code are executed frequently. Therefore, merging these frequently executed sequences into single instructions leads to a significant energy reduction. Our experiments with actual read-only memory (ROM) modules and several benchmark programs demonstrate significant energy reductions of more than 65% in the best case over an instruction memory without the object code merging.
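A toy version of the idea might look as follows: the most frequently executed object-code sequences are gathered into a small dictionary (standing in for the decompressor ROM), and each occurrence in the instruction stream is replaced by a single merged opcode. The trace, sequence length, and ROM size are invented for illustration.

```python
from collections import Counter

def build_dictionary(trace, seq_len=2, rom_entries=4):
    """Pick the most frequently executed instruction sequences (sketch)."""
    windows = [tuple(trace[i:i + seq_len]) for i in range(len(trace) - seq_len + 1)]
    return [seq for seq, _ in Counter(windows).most_common(rom_entries)]

def compress(trace, dictionary, seq_len=2):
    """Replace each dictionary hit in the stream by one merged opcode."""
    out, i = [], 0
    while i < len(trace):
        window = tuple(trace[i:i + seq_len])
        if window in dictionary:
            out.append(("MERGED", dictionary.index(window)))  # one fetched word
            i += seq_len
        else:
            out.append(trace[i])
            i += 1
    return out

trace = ["ld", "add", "st", "ld", "add", "st", "br", "ld", "add"]
rom = build_dictionary(trace)
print(compress(trace, rom), "ROM:", rom)
```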
Designs which do not fully utilize their arithmetic datapath components typically exhibit a significant overhead in power consumption. Whenever a module performs an operation whose result is not used in the downstream circuit, power is consumed for an otherwise redundant computation. Operand isolation [3] is a technique to minimize the power overhead incurred by redundant operations by selectively blocking the propagation of switching activity through the circuit. This paper discusses how redundant operations can be identified concurrently with normal circuit operation, and presents a model to estimate the power savings that can be obtained by isolation of selected modules at the register-transfer (RT) level. Based on this model, an algorithm is presented to iteratively isolate modules while minimizing the cost incurred by RTL operand isolation. Experimental results with power reductions of up to 30% demonstrate the effectiveness of the approach.
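A much-simplified version of the selection trade-off might look like the sketch below: a module is isolated only when the power spent on its redundant computations exceeds the cost of the inserted gating logic. The module data and the flat overhead figure are invented; the paper's RT-level model is considerably more detailed.

```python
# Toy operand-isolation selection: isolate a module only if the power wasted
# on redundant operations outweighs the gating overhead (illustrative numbers).
modules = {                 # name: (average power in mW, fraction of redundant cycles)
    "mult32": (4.0, 0.45),
    "add32":  (0.8, 0.30),
    "shift":  (0.3, 0.60),
}
GATING_OVERHEAD_MW = 0.15   # cost of the latches/AND gates inserted on the operands

isolated = {name for name, (p, f) in modules.items() if p * f > GATING_OVERHEAD_MW}
saved = sum(p * f - GATING_OVERHEAD_MW for name, (p, f) in modules.items()
            if name in isolated)
print(isolated, f"estimated saving: {saved:.2f} mW")
```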
Modern deep submicron CMOS processes cost $2B or more to develop, qualify and deploy. Yet the incremental impact of each technology generation has been steadily decreasing due to a variety of phenomena such as increasing wire delay, power dissipation and reliability limits, and increasing process tolerances. This increase is portrayed in Figure 1 which shows the SIA Roadmap[1] predictions of variability for five technologies in the 250 to 70nm gate length regime. These observations lead to the conclusion that we need to make better use of existing and future manufacturing processes in order to recoup our investment.
Accounting for the clustering effect is fundamental to increasing the accuracy of Defect Level (DL) modeling. This result has long been known in yield modeling but, as far as we know, only one DL model directly accounts for it. In this paper, we improve this model, reducing its number of parameters from three to two by noticing that multiple faults caused by a single defect can also be modeled as additional clustering. Our result is supported by test data from a real production line.
Keywords: defect clustering, defect level, fault clustering, fault coverage, reject ratio.
This work presents a new IDDQ-based test criterion supported by the characteristics of a set of experimental test measurements performed on different samples of industrial ICs and by the definition of a corresponding simulation model. When comparing the current consumptions of a specific circuit, a significant correlation between measurements can be observed. The current behaviour can be divided into two parts: (1) a circuit-dependent part, which makes the major contribution and affects all the devices in a given die equally, and (2) a smaller die-dependent fraction due to variations, defective and non-defective, of each of the devices of a specific die. In this paper, a current model is defined that introduces the effects of manufacturing variations into the basic equations of the sub-threshold current to explain this twofold behaviour. The results show how it is possible to obtain a wealth of information from IDDQ measurements and how other test selection criteria can be applied to increase IDDQ testing sensitivity and quality.
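For reference, the textbook form of the sub-threshold current that such a model presumably starts from is (in LaTeX notation)

```latex
I_{sub} \approx I_0 \,\frac{W}{L}\, e^{(V_{GS}-V_{T})/(n\,v_{T})} \left(1 - e^{-V_{DS}/v_{T}}\right)
```

where die-to-die manufacturing variation enters mainly through the threshold voltage V_T; the paper's model adds variation terms to equations of this kind to separate the circuit-dependent and die-dependent contributions.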
Process variation has long been the major cause of failure in analog circuits, where small deviations in component values cause large deviations in the measured output parameters. This paper presents a new approach for parametric fault simulation and test vector generation. The proposed approach utilizes the process information and the sensitivity of the circuit's principal components in order to generate statistical models of the fault-free and the faulty circuit. The obtained information is then used as a measure to quantify the testability of the circuit. This approach, extended by hard fault testing, has been implemented as an automated tool set for IC testing called FaultMaxx and TestMaxx.
This paper presents a methodology for parallel and distributed simulation of VHDL using the PDES (parallel discrete-event simulation) paradigm. To achieve better features and performance, some PDES protocols assume that simultaneous events may be processed in arbitrary order. We describe a solution for applying these algorithms while preserving a correct simulation of the distributed VHDL cycle, including the delta cycle. The solution is based on tie-breaking the simultaneous events using Lamport's logical clocks to order them causally according to the VHDL simulation cycle, and on defining the VHDL virtual time as a pair of simulation physical time and cycle/phase logical time. The paper also shows how to use this method with a PDES protocol that relaxes the simulation of simultaneous events to arbitrary order, allowing the LPs to self-adapt to optimistic or conservative mode without the lookahead requirement. The lookahead is application-dependent and for some systems may be zero or unknown. The parallel simulation of VHDL designs ranging from 5531 to 14704 LPs using these methods obtained a promising, almost linear speedup.
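A minimal sketch of the ordering idea, with illustrative field names rather than the paper's actual data structures: VHDL virtual time is treated as a (physical time, delta/phase) pair, and a Lamport clock breaks ties between simultaneous events from different logical processes (LPs) so that every LP processes them in the same causal order.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    phys_time: int        # VHDL simulation time
    delta: int            # delta-cycle / phase counter
    lamport: int          # Lamport logical clock stamped by the sending LP
    sender: int           # final deterministic tie-breaker
    signal: str = field(compare=False, default="")
    value: int = field(compare=False, default=0)

queue = []
heapq.heappush(queue, Event(10, 0, 7, 1, "clk", 1))
heapq.heappush(queue, Event(10, 0, 3, 2, "rst", 0))   # same (time, delta): Lamport clock decides
heapq.heappush(queue, Event(10, 1, 1, 0, "q", 1))     # later delta cycle, processed last
while queue:
    print(heapq.heappop(queue).signal)                # rst, clk, q
```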
To achieve fast verification of the software part of an embedded system, we propose to run the target processor optimistically, which effectively reduces the synchronization overhead with other simulators. For optimistic processor execution, we present a processor execution platform and state saving/restoration methods. We performed optimistic execution of an ARM710A processor in the coverification of an IS-95 CDMA cellular phone system and obtained up to orders of magnitude higher performance compared with the case where the processor runs conservatively.
This paper presents a methodology to retarget the technique of compiled simulation for Digital Signal Processors (DSPs) using the modeling language LISA. In the past, the principle of compiled simulation as a means of speeding up simulators has only been implemented for specific DSP architectures. The new approach presented here discusses methods of integrating compiled simulation techniques into retargetable simulation tools. The principle and the implementation are discussed in this paper, and results for the TI TMS320C6201 DSP are presented.
This paper shows how to simulate a circuit as an interlocked collection of state machines. Separate state machines are used to represent nets and gates. The technique permits intermixing of logic models, direct simulation of higher-level functions, and optimization techniques for fanout-free circuits. These techniques are an extension of techniques that have been used to achieve high-performance event-driven simulations. New, more efficient state-machine implementations are presented, and experimental data is presented that shows the efficiency of the new techniques.
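The following toy fragment, built around an invented two-input NAND example and far simpler than the encodings in the paper, illustrates the basic view: each net and each gate is a small state machine, and a transition of a net machine activates only the gate machines on its fanout.

```python
from collections import deque

class Net:
    """A net state machine: its state is the current logic value."""
    def __init__(self, name):
        self.name, self.value, self.fanout = name, 0, []
    def set(self, value, queue):
        if value != self.value:          # state transition of the net machine
            self.value = value
            queue.extend(self.fanout)    # wake up dependent gate machines only

class Nand:
    """A gate state machine: one evaluation step reads inputs, drives output."""
    def __init__(self, a, b, y):
        self.a, self.b, self.y = a, b, y
    def evaluate(self, queue):
        self.y.set(0 if (self.a.value and self.b.value) else 1, queue)

a, b, y = Net("a"), Net("b"), Net("y")
g = Nand(a, b, y)
a.fanout.append(g); b.fanout.append(g)

queue = deque([g])                       # settle the initial state
while queue:
    queue.popleft().evaluate(queue)
a.set(1, queue); b.set(1, queue)         # stimulus: both inputs rise
while queue:
    queue.popleft().evaluate(queue)
print(y.value)                           # NAND(1, 1) = 0
```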
Simulation is still one of the most important subtasks when designing a VLSI circuit. However, the growing number of elements on a chip increases simulation runtimes. Especially at the transistor level with highly accurate element modelling, long simulation runtimes of typically several hours delay the design process. One possibility to reduce these runtimes is to divide the circuit into several partitions and to simulate the partitions in parallel. But the success of such a parallel simulation depends heavily on the quality of the partitioning. This paper presents a new approach for partitioning VLSI circuits at the transistor level and gives runtimes of parallel simulations of large industrial circuits. The resulting runtimes show considerable improvement compared to a known partitioning method, the Node Tearing method [10].
In this paper we describe a methodology and accompanying tool support for the development of parallel and distributed embedded real-time system software. The presented approach comprises the complete design flow from the modeling of a distributed controller system by means of a high-level graphical language down to the synthesis of executable code for a given target hardware, whereby the implementation is verified to meet hard real-time constraints. The methodology is mainly based upon the tools SEA (System Engineering and Animation) and CHaRy (The C-LAB Hard Real-Time System).
We present a novel method for developing reconfigurable systems targeted at embedded system applications. We show how an existing object oriented design method (MOOSE) has been adapted and enhanced to include reconfigurable hardware (FPGAs). Our work represents a significant advance over current embedded system design methods in that it integrates the use of reconfigurable hardware components with a systematic design method for complete systems. The objective is to produce an object oriented design methodology where system objects can be seamlessly implemented in either software or reconfigurable hardware.
This paper presents the system synthesis techniques available in S3E2S, a CAD environment for the specification, simulation, and synthesis of embedded electronic systems that can be modeled as a combination of analog parts, digital hardware, and software. S3E2S is based on a distributed, object-oriented system model, where objects are initially modeled by their abstract behavior and may be later refined into digital or analog hardware and software. System synthesis is targeted to a multiprocessor platform. Each processor, either a custom-designed one or an off-the-shelf component, can have a specialized behavior, like signal processing or control processing. The environment selects processors that best match the desired application by analyzing and comparing processor and application characteristics. The paper illustrates the architecture selection process with concrete examples.
Microcontrollers have been playing an important role in the embedded market. However, the designer of microcontroller-based systems must deal with different languages and tools in hardware and software development, due to their distinct design processes. This paper presents a new design strategy to implement embedded applications described entirely in Java, while maintaining software compatibility throughout the design process. Moreover, the target hardware is a single-chip FPGA, benefiting from its low cost and easy reconfiguration to customize the microcontroller. This paper presents the environment and some results of system synthesis.
We present cost and benefit models and analyze the economic effects of built-in self-test (BIST) for logic and memory cores. In our cost and benefit models for BIST, we take into consideration the design verification time and test development time associated with testability. Experimental results for logic BIST and memory BIST examples show that a threshold volume exists up to which BIST is profitable for the logic core under consideration -- it is not recommended for higher volumes. However, BIST is a good choice for memory cores in general.
Power dissipated during test application is substantially higher than power dissipated during functional operation [22], which can decrease reliability and lead to yield loss. This paper presents a new technique for power minimization during test application in full-scan sequential circuits. The technique is based on classifying scan latches into compatible, incompatible and independent scan latches. Based on this classification, scan latches are partitioned into multiple scan chains. A new test application strategy, which applies an extra test vector to the primary inputs while shifting out the test responses for each scan chain, minimizes power dissipation by eliminating the spurious transitions which occur in the combinational part of the circuit. Unlike previous approaches [9], which are test vector and scan latch order dependent and hence not able to handle large circuits due to the complexity of the design space, this paper shows that with low test area and test data overhead, substantial savings in power dissipation during test application are achieved in very low computational time. For example, in the case of benchmark circuit s15850 it takes 3600s of computational time and 1% of test area and test data overhead to achieve 80% savings in power dissipation.
In systems consisting of interacting datapaths and controllers, the datapaths and controllers are traditionally tested separately by isolating each component from the environment of the system during test. This is not possible when the controller-datapath pair is an embedded system designed as a hard core. This work facilitates the testing of controller-datapath pairs in a truly integrated fashion. The key to the approach is a careful examination of the types of gate level stuck-at faults that can occur within the controller. A class of faults that are undetectable in an integrated test by traditional means is identified. These faults create faulty but functional circuits. The effect of these faults on power consumption is explored, and a method based on power analysis is given for detecting these faults. Analysis is given for three example systems.
This paper presents a method for redundancy identification (RID) using multi-node logic implications. The algorithm discovers a large number of direct and indirect implications by extending single-node implications [7] to multiple nodes. The large number of implications found by the multi-node implication method enables a new redundancy identification technique. Our approach uses an effective node-pair selection method, which is O(n) in the number of nodes, to reduce execution time, and it can be used as an efficient preprocessing phase for test generation. Application of these multi-node static logic implications uncovered more redundancies in the ISCAS85 combinational circuits than previous single-node methods, without excessive computational effort.
Optimal power management policies for laptop hard disks are obtained with a system model that can handle non-exponential interarrival times in the idle and sleep states. Measurement results on a Sony Vaio laptop show that our policy consumes 1.7 times less power than the default Windows timeout policy while still maintaining high performance.
The problem of estimating lower bounds on the power consumption in scheduled data flow graphs with a fixed number of allocated resources, prior to binding, is addressed. The estimated bound takes into account the effects of resource sharing. It is shown that by introducing Lagrangian multipliers and relaxing the low-power binding problem to the Assignment Problem, which can be solved in polynomial time, a tight and quickly computable bound is achievable. Experimental results show the good quality of the bound. In most cases, deviations smaller than 5% from the optimal binding were observed. The proposed technique can, for example, be applied in branch-and-bound high-level synthesis algorithms for efficient pruning of the design space.
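The relaxation step lends itself to a compact illustration: once binding is relaxed to an assignment problem, an off-the-shelf solver yields the bound. The cost matrix below is invented and omits the Lagrangian multiplier terms that the paper folds into the costs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i][j]: hypothetical power cost of binding operation i to resource j
cost = np.array([
    [4.0, 2.5, 3.0],
    [3.5, 1.0, 2.0],
    [2.0, 3.0, 1.5],
])
rows, cols = linear_sum_assignment(cost)     # optimal assignment, polynomial time
lower_bound = cost[rows, cols].sum()
print(list(zip(rows.tolist(), cols.tolist())), "lower bound:", lower_bound)
```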
A new, fully analytical method is presented to optimize active device area in complex, device mismatch sensitive analog circuits. It represents an efficient alternative to time consuming Monte-Carlo simulations and numerical iteration procedures for design centering.
This paper presents a user-friendly tool which allows automated sizing of IC cells. It comprises an open optimization-based sizing program, a database which allows knowledge re-use and also easy addition of new knowledge, and a powerful graphical user interface.
In this paper, a novel technique is presented for the verification of board-level connections on PCBs. The time domain reflectometry (TDR) method is used to identify whether a pin connection is faulty or not. The test pulse and evaluation circuitry are part of the chip. Although the chip size increases slightly, the method is highly efficient. No Automatic Test Equipment (ATE) is necessary to carry out the test, and since only the physical behaviour of the connection from the internal driver via the pin to the board is examined, no test vectors are needed. The test time and the test preparation time are lower compared with conventional test methods.
For structural interconnect testing, a graph is generated from the physical layout of the interconnects. The vertices are then colored, and the number of colors determines the number of different serial test patterns needed. Based on real PCB layout data, we give experimental results that show how the choice of the graph generation method and of the coloring algorithm influences the number of colors.
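A small end-to-end sketch of this flow, with a hand-made adjacency graph standing in for one derived from a real layout, and networkx's greedy coloring standing in for the algorithms compared in the paper:

```python
import networkx as nx

# Nets that are adjacent in the layout must receive different colors, i.e.
# different serial test patterns; nets with the same color can share a pattern.
g = nx.Graph()
g.add_edges_from([("n1", "n2"), ("n2", "n3"), ("n1", "n3"), ("n3", "n4")])

coloring = nx.coloring.greedy_color(g, strategy="largest_first")
n_patterns = max(coloring.values()) + 1
print(coloring, "distinct serial patterns needed:", n_patterns)
```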
With the rising complexity of electronic systems, which contain ever more hardware and software parts, it becomes necessary to simulate hardware and software parts simultaneously at any abstraction level. These simulation techniques, called co-simulation, require fast and flexible simulators. In this paper, we introduce the elaboration of a microcontroller simulator for accurate hardware/software co-simulation at the clock-cycle level. Our goal is to have a simulator which is fast enough to simulate a few minutes of real-time execution within a reasonable amount of time. To be more precise, we deal here with the realization of a simulator for the ST10 microcontroller and its integration into a co-simulation environment.
This paper addresses the problem of speeding up functional cycle-based simulation of digital systems. The system is represented as a network of interconnected Decision Diagrams (DDs). Three innovative simulation algorithms are introduced to implement the idea of driving simulation execution by the activity of the system variables: a forward event-driven algorithm and two versions of back-tracing algorithms. Experiments are presented to show the simulation efficiency improvement offered by these algorithms.
Reducing the area overhead required by BIST structures can be achieved by reconfiguring existing hardware to perform test related control and processing functions. This work shows how the resources required for these operations can be implemented in-circuit, taking advantage of programmable logic available in the system. Structural and functional tests are performed using correlation to obtain iDD and vOUT cross-correlation signatures, and to measure gain, phase, and total harmonic distortion.
Interest in propositional satisfiability (SAT) has been on the rise lately, spurred in part by the recent availability of powerful solvers that are sufficiently efficient and robust to deal with the large-scale SAT problems that typically arise in electronic design automation applications. A frequent question that CAD tool developers and users ask is which of these various solvers is "best"; the quick answer is, of course, "it depends." In this paper we attempt to gain some insight into, rather than definitively answer, this question.
A memory architecture with four address configurations is proposed for video signal processing. The implemented 8-word x 64-bit 8-port SRAM provides 256-bit simultaneous data accessibility through horizontal and vertical address configurations and offers a high bandwidth of 25.6 Gbit/s.
In [2] the concept of a very long instruction word (VLIW) processor based system to emulate synthesized RT-level descriptions has been presented. As described in [2] the RAVE System (RT-Architecture-VLIW-Emulator) overcomes many of the problems common to FPGA based emulation and prototyping systems. Particularly, these are area problems in conjunction with large data paths, long turnaround times and low emulation clock frequencies. This abstract briefly describes the hardware of the RAVE System.
Design Space Exploration (DSE) of programmable systems-on-chip (SOC) incorporating parameterizable processor cores is difficult due to the complex and intrinsically non-structured interactions between different architectural features of the processor (such as wide parallelism, and deep pipelines), the compiler and the application. Changing different processor features implies generating detailed operation conflict information -- represented as Reservation Tables (RTs). If done manually, it can be a very tedious and error prone task, especially for deep pipelines, with complex resource sharing and large non-structured instruction sets. In this paper we use RTGEN[2], an approach for automatic generation of RTs, to drive rapid architectural exploration of a large number of designs. We present exploration experiments on a large set of VLIW-like EPIC 1 architectures, for varying port sharing, number of functional units, multicycling units, and with varied latency configurations. Our experiments uncovered several non-intuitive architecture design points, giving the system-level designer further flexibility in exploration of programmable SOC architectures.
The most compelling reason for High-Level Synthesis (HLS) to be accepted in the state-of-the-art CAD flow is its ability to perform design space exploration. Design space exploration requires efficient scheduling techniques that have a low complexity and yet produce good-quality schedules. The Time-Constrained Scheduling (TCS) problem minimizes the number of functional units required to schedule a particular Data Flow Graph (DFG) within a specified number of time steps. Over the past few years a number of techniques [1, 2] have been proposed to solve the TCS problem. Heuristic list scheduling algorithms have been widely used for their low complexity and good performance. The complexity of a dynamic-list scheduling algorithm, such as Force-Directed Scheduling (FDS), is O(T * N^2), where T is the time constraint and N is the number of operations. Static-list scheduling [1, 2] algorithms are the least complex among the known class of scheduling techniques, with a linear time complexity of O(T * N). Typically, static-list algorithms, in order to maintain low complexity, do not perform any look-ahead like that of FDS. The drawback is that static-list scheduling algorithms may not generate high-quality schedules.
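A deliberately tiny static-list sketch (single operation type, invented DFG, ALAP time as the one-shot priority function) illustrates the one-pass nature of such schedulers: priorities are computed once, each operation is placed in a single sweep with no FDS-style look-ahead, and the functional-unit count is read off the per-step usage.

```python
from collections import defaultdict

dfg = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["b"]}   # op: predecessors
T = 3                                                               # time constraint

# ALAP times: latest start step such that all successors still fit within T.
succs = defaultdict(list)
for op, preds in dfg.items():
    for p in preds:
        succs[p].append(op)

alap = {}
def compute_alap(op):
    if op not in alap:
        alap[op] = T if not succs[op] else min(compute_alap(s) for s in succs[op]) - 1
    return alap[op]
for op in dfg:
    compute_alap(op)

schedule, usage = {}, defaultdict(int)
for op in sorted(dfg, key=lambda o: alap[o]):        # static priority list
    earliest = 1 + max((schedule[p] for p in dfg[op]), default=0)
    schedule[op] = earliest                          # earliest precedence-feasible step
    usage[earliest] += 1
print(schedule, "functional units needed:", max(usage.values()))
```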
High noise immunity and the level-restoring capability of static CMOS gates, combined with the small area and low power of PTL cells, make a mixed CMOS/PTL design style an ideal alternative to all-CMOS technology. However, the synthesis of mixed CMOS/PTL circuits poses a great challenge to existing synthesis methodology. Neither traditional techniques based on algebraic factorization nor methods based on direct BDD mapping [1] [2] [3] are applicable to this new circuit style. We have recently proposed a new BDD-based logic optimization method for static CMOS [4]. It is based on iterative BDD decomposition using various dominators which correspond to decomposable BDD structures leading to AND, OR, XOR and MUX decompositions. Synthesis results show that the method is very efficient for both AND/OR- and XOR-intensive functions. Since PTL structures can be easily identified on a BDD, our method can be readily extended to perform logic decomposition leading to a mixed CMOS/PTL logic implementation. In contrast to other PTL synthesis techniques based on direct BDD mapping, our method is not limited to decomposition onto PTL only; its logic decomposition and optimization are driven by the capabilities of both static CMOS and PTL logic. Our BDD decomposition method can also account for various parameters associated with circuit performance, thus avoiding drawbacks of direct BDD-mapping-based synthesis, such as large fanouts and long transistor chains. The bulk of our BDD decomposition theory has been published in [4]. Table I summarizes the different types of BDD decompositions available; it can be seen that all types of atomic decompositions and their corresponding BDD structures can be easily identified.
In this paper we present a heuristic algorithm TOP (Three-level Optimization of PLDs), targeting a three-level logic expression of the type g1 o g2, where g1 and g2 are sum-of-products and "o" is a binary operation. Such an expression can be implemented by a three-level Programmable Logic Device (PLD) consisting of PLA1 and PLA2, implementing the first two levels of logic, and a set of two-input logic expanders, implementing the third level. Each logic expander can be programmed to realize any function of two variables. A PLD of this type seems to give a good trade-off between the speed of a flat PLA and the density of a multi-level network of PLAs. TOP chooses the functionality of the logic expanders so that the area of the PLAs is minimized. To the best of our knowledge, this is the first work addressing this problem for an arbitrary operation "o" and attempting to choose the operation which results in the smallest total number of product terms. Several algorithms for specific cases of "o" have been presented in the past (see [2] for an overview). An algorithm constructing an expansion of this type for an arbitrary "o", with Xg, Xh ⊂ X and Xg ∪ Xh = X, is described in [3]. However, this algorithm does not target the minimal number of products and does not consider the case when Xg or Xh equals X, which is allowed in our case.
Arithmetic coprocessors (ACs) are quite complex circuits, and testing them is an important and difficult problem that is hardly covered in the literature. Analyzing diagnostic software for IBM PCs, we found that the testing procedures for ACs are limited to simple basic checks. Hence we decided to develop efficient test procedures in a systematic way. They are executed on the main processor, generate appropriate stimuli for the AC functional blocks (e.g. instruction sequencer, data path units), and verify the test responses. An important contribution of this paper is the integration of various approaches to testing and the increased observability of test results, assured by on-chip event monitors and system exceptions.
In this poster, we present a new specification technique for complex hardware-software systems, based on standard high-level programming languages, such as C, C++, Java, Scheme, or Ada, without extensions or semantic changes. Unlike previous approaches, the designer may choose the model of computation and the specification language that best suits her needs, while still being able to formally verify the correctness of the specification. The details of the available hardware and software resources, and the implementation of the different models of computation are encapsulated in libraries to maximize reuse in system specifications.
Conceptual design, the preliminary phase of design in which both well-defined problem specifications and high level design solutions are developed, is becoming increasingly important as design complexity increases. In spite of the importance of this activity, few tools exist to support this phase of design. In this paper we present a systematic and flexible model of conceptual design and describe how this model has been employed to realize a prototype conceptual design process management environment, called Clio II.
Users need to access design data for a variety of reasons. Designers may be interested in accessing repositories of IP blocks for possible inclusion in their own designs. Alternatively, EDA tool developers and purchasers need a representative set of designs to evaluate or benchmark software. This poster presents a web-based system used both for profiling designs and for searching for designs with specific characteristics. The STEED system summarised here is based on external information models that tailor it to user requirements.
A new Built-In Self Test (BIST) scheme is presented that can be used for both off-line production or periodic testing of delay faults as well as for concurrent detection of faults causing signal delays in the field. The scheme is based on the IDDT monitoring of the outputs of the circuit under test (CUT). The proposed scheme has minimal impact on the performance and silicon area of the design since the same response verifier circuit is used for both off-line and concurrent detection of errors in the field.
Power in processing cores (microprocessors, DSPs) is primarily consumed in the datapath part. Among the datapath functional modules, multipliers consume the largest amount of power due to their size and complexity. We propose a low-power BIST scheme for datapaths built around accumulator pairs. The target is low average power dissipation between successive test vectors. This is achieved by taking advantage of the regularity of multiplier modules and achieving very high fault coverage with a small test set whose input switching activity is as low as possible. The proposed BIST scheme is more efficient than pseudorandom BIST for the same high fault coverage target. Power savings of up to 77.25% are achieved in the set of experimental results provided in the paper.
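The low-switching-activity intuition can be illustrated with a generic accumulator-based pattern generator. This is a sketch only: the width, increment, and Hamming-distance metric are arbitrary choices, and the paper's generator built around accumulator pairs is more elaborate. Consecutive vectors produced by repeatedly adding a small constant differ in few bits on average, so the activity between successive test vectors stays low compared with pseudorandom patterns.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def accumulator_patterns(seed, increment, count):
    """Generate test vectors by repeated accumulation (generic sketch)."""
    value = seed
    for _ in range(count):
        yield value
        value = (value + increment) & MASK

def switching_activity(patterns):
    """Total Hamming distance between consecutive vectors."""
    total, prev = 0, None
    for p in patterns:
        if prev is not None:
            total += bin(p ^ prev).count("1")
        prev = p
    return total

pats = list(accumulator_patterns(seed=0x35, increment=1, count=64))
print(switching_activity(pats), "bit flips over", len(pats), "vectors")
```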
Boolean equivalence checking has turned out to be a powerful method for verifying combinational circuits and is already an integrated part of the design cycle. If equivalence checking fails, Design Error Diagnosis and Correction (DEDC) is performed. DEDC tries to locate and correct design errors fully automatically and can therefore considerably speed up the whole design cycle. The methods can roughly be divided into three classes: ATPG-based approaches (e.g. [4]), structure-based approaches [3], and logic-based (symbolic) approaches (e.g. [1]). Most approaches rely on the "single error assumption" and cannot be applied if multiple errors occur in a circuit. This is a hard restriction for practical applications, as the average number of design errors is usually greater than one. However, multiple error rectification is a challenging task since the search space grows exponentially with the number of design errors. Our method is a symbolic method for multiple error rectification of combinational circuits and a further development of [1], which can correct only single errors.
One of the big challenges in circuit design is formal verification at the clocked algorithmic or register-transfer level. To overcome the limits of BDD-based approaches we apply an abstraction of the datapath by uninterpreted functions [2]. A function f is uninterpreted if all properties except the congruence property, (∀i: si = ti) ⇒ f(s1,...,sn) = f(t1,...,tn), are dropped. In the past, symbolic execution and theorem proving were used to check the equivalence of two sequential circuits that are abstracted by uninterpreted functions. Symbolic execution is an enumeration of the states reachable from the initial state [2]. Because of the uninterpreted functions there is no general termination condition for such procedures. In the theorem-prover-based approach [4] the proof is usually carried out using the induction principle. Often lemmas, which are frequently invariants, are needed to prove the equivalence; these lemmas are also proven by induction. The proof of the induction step is automated by decision procedures. In our approach symbolic execution is used to generate potential invariants. Then the equivalence is proven by automatic induction proofs of the lemmas. A more detailed description of the procedure can be found in [3].
A Single Phase Latch (SPL) suitable for GaAs domino logic gates and compatible with DCFL is presented. Two versions of the SPL are reported in this work: Single Ended SPL used in pure domino logic and Differential SPL used in dynamic Cascode Voltage Switch Logic. SPL is compared with other common GaAs dynamic circuits and latches. The results demonstrate that SPL is superior in terms of device count, area, clock rate and power consumption.
The fast growing complexity of today's real time embedded systems necessitates new design methods and tools to face the problems of integration and validation of complex systems. We have combined a number of different hardware and software methods into one system level design method. The proposed flow is based on UML concepts, executable specifications and platform based design.
A heuristic design-for-checkability method based on observation point insertion in the Circuit Under Check (CUC) is proposed to increase the error detection ability of Concurrent Checkers (CC). In particular, at least 99% error detection is obtained for parity checkers on almost all ISCAS'85 benchmark circuits by inserting 2-5 groups of observation points compacted by parity trees.
This paper presents a self-checking, on-line testing and diagnosis scheme for bus lines affected by intermediate voltage values possibly due to bridging faults, or to different kinds of faults affecting the bus connected units.
In this paper we present a method for on-line testing of multipliers. The method is based on natural time and information redundancy and provides the design of a simple self-checking checker for hard failure detection in 8-bit and 16-bit array multipliers.
The application of the Linear Error Mechanism Modeling Algorithm (LEMMA [1]) to various DAC and ADC architectures has raised the issue of including hard-fault coverage as an integral part of the algorithm. In this work, we combine defect-oriented functionality tests and specification-oriented linearity tests of a mixed-signal IC to save test time. The key development is a novel test point selection strategy which not only optimizes the INL-prediction variance of the model, but also satisfies hard-fault-coverage constraints.