MARCO GSRC Calibrating Achievable Design Theme

GSRC Technology Extrapolation (GTX)

The GTX Framework

revision 1.0, June 2, 2000

Andrew Caldwell, Andrew B. Kahng, Igor Markov, Mike Oliver and Dirk Stroobandt

Contents

I.  Introduction
II.  Related Work and GTX Goals
II.a. VLSI Technology Extrapolation
II.b. Related Artificial Intelligence Works
II.c. GTX Goals in Perspective
III.  The Structure of GTX
III.a. Parameters, Rules and Rule Chains
III.b. Engine Structure and Operation
III.c. Graphical User Interface


Abstract

 Technology extrapolation - the calibration and prediction of achievable design in future technology generations - drives the evolution of VLSI system architectures, design methodologies, and design tools. Via roadmapping efforts such as the International Technology Roadmap for Semiconductors (ITRS), technology extrapolation also influences levels of investment in various areas of academic research, private-sector entrepreneurial activity, and other facets of VLSI design automation.

 This page describes the general ideas behind GTX, the MARCO GSRC Technology Extrapolation system, as well as its main structure. GTX provides a robust, portable framework for interactive specification and comparison of modeling choices, e.g., for predicting system cycle time, die size and power dissipation. Unlike previous ``hard-coded'' systems, GTX adopts a paradigm wherein parameters and rules allow users to flexibly capture attributes and relationships germane to VLSI technology and design. Serialized user-defined rules can be composed in numerous ways to define rule chains, which are then executed by a derivation engine to perform studies. Supporting grammars, parameter naming conventions, extension mechanisms, etc. enable GTX to incorporate - and serve as a repository for - literally unlimited forms of domain knowledge.


I. Introduction

 Leading-edge VLSI system design aggressively exploits new process technologies, circuit techniques, design methodologies and design tools. It is thus difficult to predict the envelope of achievable design - e.g., with respect to power, speed, area, manufacturing cost, etc. - for a given behavior or function, in a given (future) process technology. On the other hand, such technology extrapolation activity directly influences the evolution of future VLSI system architectures, design methodologies, and design tools. Via roadmapping efforts such as the International Technology Roadmap for Semiconductors (ITRS), technology extrapolation also influences levels of investment in academic research, career choices for faculty and graduate students, as well as private-sector entrepreneurial activity.

 Highly influential technology extrapolation systems, developed 5-10 years ago, are due to Bakoglu and Meindl (SUSPENS), Sai-Halasz, and Hewlett-Packard Laboratories (AIM). More recent ``second-generation'' systems include GENESYS, RIPE and BACPAC, along with Roadmap-related efforts (ITRS, Fisher) and innumerable internal projects throughout industry and academia. Typically, each system provides a plausible ``cycle-time model'' and estimates of die size and power dissipation, based on a small set of descriptors spanning device/interconnect technology through system architecture. In Section II.c, we make several observations about the limitations of these systems.

These observations motivate efforts toward an entirely new level of technology extrapolation capability. Our GSRC Technology Extrapolation (GTX) system has been developed with the goals of flexibility, quality and prevention of redundant effort in mind.

 The GTX system addresses these goals by providing an open, portable framework for specification and comparison of alternative modeling choices. The GTX paradigm is based on the concepts of ``parameters'' and ``rules''. Parameters represent the values used to describe technology, circuit and design attributes, while rules encapsulate methods of deriving unknown parameter values from known values. Rules include executable algorithm implementations and closed-form models or table-lookups. Serialized user-defined rules can be composed in numerous ways to define rule chains, which are then executed by a derivation engine to perform studies.

 A fundamental design decision in GTX is to separate model specifications from the derivation engine. This separation is achieved by a human-readable ASCII grammar. As domain-specific knowledge is represented independently of the derivation engine, it can be created and shared by multiple users. Additional extension mechanisms allow specialized prediction methods, technology data sets, and even optimization engines to be encapsulated and shared within GTX; this further reduces the amount of effort that is diverted from actual creation of best-possible prediction models.

 Section II reviews relevant previous work in VLSI technology extrapolation and puts the GTX goals in perspective. Section III describes the architecture and implementation of GTX. 


II. Related Work and GTX Goals

II.a VLSI Technology Extrapolation

 A number of previous systems attempt to forecast and estimate the performance of microprocessors. Given that GTX can flexibly accommodate the addition of new rules and inference chains, a baseline GTX implementation is intended to encompass these previous models. Four systems - SUSPENS, GENESYS, RIPE and BACPAC - are especially noteworthy.

 SUSPENS is the forerunner of most technology extrapolation systems. It predicts clock frequency, chip area and power dissipation. It ignores on-chip cache and memory structure, as well as details of multi-layer interconnect structure and clock distribution. SUSPENS is also oblivious to such DSM effects as scaling and noise.

 The entire system is characterized by eleven equations at the chip level. Interconnect, gates and technology are described by as few parameters as possible. In the chip-level calculations, there are 3 parameters at the chip architecture level (Rent's parameter, logic gates and activity factors of the gates), 9 parameters for chip technology (defining devices, interconnects, and efficiencies), and 1 parameter at the chip integration level (number of gates).

 GENESYS offers both a GUI for MS Windows (95, 98, NT) and a command-line interface (it has no Web interface).

 Inputs to GENESYS fall into five main classifications:

The GENESYS output file is divided into four main sections: device/material, circuit, interconnect, and system. The device section contains information concerning device parameter calculations such as device capacitance and drain currents. The circuit section is broken into four parts: area, capacitance, delay and energy. The interconnect portion provides information on the interconnect structure of each wiring tier, as well as results of certain repeater insertion optimizations. System-level outputs include throughput, maximum clock frequency, CPI, and delay times for random logic and interconnects.

 RIPE explores the effect of interconnect design and technology tradeoffs on IC performance. Default input data are extracted from the NTRS roadmap. Memory and the multilayer interconnect structure are taken into account. No estimations of noise or reliability are available; other limitations are in the modeling of electromigration, non-ideal scaling, etc. The RIPE executable, available via Web interface, can be used in two basic modes:

The user can choose between the two modes, but cannot add new parameters and rules. In mode (i), the global wire parameters include 30 system-level parameters (including cache parameters, gates scaling, logic depth, total chip area, and activity factor), 16 technology parameters (including minimum feature size, cell areas, clock style and clock skew, and device resistance and capacitance), and 4 interconnect parameters (including resistance, capacitance, and pitch). Mode (ii) adds further parameters such as the interconnect description for the lowest level and the target clock frequency; RIPE then estimates the number of wiring levels, wire pitches, and wiring efficiency of the chip.

 BACPAC is based on a system-level performance model that consists of smaller-scale analytical models. The innovations of BACPAC compared to earlier models include attention to power dissipation, on-chip memory, process variation, and other effects. BACPAC is applicable to both ASICs and microprocessors. It attempts to enhance the accessibility of technology extrapolation via a Web-based interface; users can enter parameter values and receive relevant technology predictions. However, the derivation flow is mostly fixed, and users cannot add new parameters and rules. BACPAC does not capture architectural attributes or system reliability.

 BACPAC inputs include 20 parameters to specify interconnect dimensions for up to 9 wiring layers, 15 device parameters, 4 parameters to capture design characteristics (e.g., number of transistors and Rent's exponent), and 9 parameters to capture cycle time and critical-path customization. BACPAC can perform a number of analyses:

II.b Related Artificial Intelligence Works

 Design Sheet aims to facilitate conceptual system design in an arbitrary domain. The system accepts tabular data and algebraic (in)equalities, and allows the user to compose these equations in near-arbitrary fashion to

TkSolver is used for modeling, knowledge management and design optimization. As with Design Sheet, it uses a relationship graph to propagate relationships among variables.

 UniCalc solves systems of algebraic relations. Valid relations include equations, inequalities and logical expressions. UniCalc is based on constraint programming and handles over- and under-determined systems as well as coefficients and variables with uncertain (interval) values. It is implemented as an integrated environment supporting direct user input, editing, calculations, results browsing, accuracy specifications and file I/O. User input can also be specified in a ``source language'' that handles commonly used mathematics. UniCalc is used in a number of domain-specific AI systems for combinatorics, domain-specific natural language processing, knowledge-based scheduling and financial planning. More importantly, it is used in advanced constraint programming environments such as NeMo, which allow rapid construction of full-fledged domain-specific constraint-based applications.

 Although these systems are very powerful, their generality may impose unnecessary overheads for VLSI technology extrapolation.

II.c GTX Goals in Perspective

 With respect to the previous systems for technology extrapolation, we make the following observations.

  1. Different systems may predict the same ``parameter'' (e.g., microprocessor clock frequency), yet be incomparable due to differing sets of inputs and assumptions, as well as lack of documentation and visibility into internal calculations.
  2. Each system typically offers exactly one ``inference chain'' for any given output of interest (e.g., cycle time). Furthermore, this inference chain can involve a large spectrum of modeling choices in, for example, the formula for optimal repeater insertion and sizing in global interconnects, or the form of the wirelength distribution induced from the system Rent parameter.

  3. The quality of such modeling choices cannot be assessed since the system is ``hard-coded'', and no exploration of modeling sensitivity or robustness is possible. (For example, it is difficult to determine areas of vulnerability or fragility in the 1999 ITRS using, say, BACPAC.)
  4. The hard-coded nature of previous systems also means that they are inflexible: the user cannot define studies of other system parameters, and interaction with the system is limited.
  5. Finally, development of previous systems has entailed near-total duplication of effort - since each system attempts to bound the same envelope of achievable design - in gathering, interpreting, and systematizing data and models. Redundant efforts are made even though no single entity - EDA vendor, system house, or academic group - can achieve ``best-possible modeling'' of all aspects of technology and design. (For example, recent systems attempt non-trivial models of process scaling, manufacturing yield and cost, microprocessor architecture and implementation (e.g., area and power implications of cache tags), etc.)
These observations motivate three key goals as we seek a new level of technology extrapolation capability.

  1. Flexibility.

     To experimentally determine model sensitivity and robustness, users must have the ability to

     GTX inherits the flexibility of AI constraint-programming and design support systems, while retaining VLSI domain-specificity and avoiding unreasonable implementation complexity. Support for interaction (GUI, session management, etc.) is an implicit requirement.

  2. Quality.

     GTX seeks adoptability in the sense of having an easy learning curve and providing much ``value'' in the form of high-quality embedded data, embedded models, and user interface. We aim for a system that can be continuously improved to have ``best-possible models'' across the entire scope of technology extrapolation. Since no single group can achieve this alone, we require an open-source mechanism that is conducive to distributed ownership and maintenance.

  3. Prevention of redundant effort.

     To avoid redundant effort, GTX is meant as a ``permanent repository of first choice'' for rules and data (calibration points) related to technology extrapolation. Beyond the open distribution mechanism noted above, adoptability (by academics open to collaboration, or by companies with proprietary data and firewalls) and maintainability become key concerns. A lower bound for adoptability is a platform-independent implementation that subsumes the functionality of all previous ``hard-coded'' systems. This recognizes the proprietary nature of user data and offers usability behind firewalls, with frequent releases to update the state of model/data collection. GTX also applies to any domain of semiconductors, VLSI or VLSI CAD, and is extensible to models of arbitrary complexity.

III. The Structure of GTX

 GTX establishes a clear separation between knowledge and implementation (see Figure). Knowledge is represented independently of its implementation in a serializable public-domain format. It comprises data (parameters), models (rules) that operate on the data, and studies (rule chains), each of which is a collection of rules used to obtain a particular result. The implementation then consists only of a derivation engine and a graphical user interface (GUI). The engine can load modules of parameters, rules and rule chains and operate on them automatically, producing new data. Known studies are supplied as pre-packaged rule chains; additional modules can be written and shared by users.

III.a Parameters, Rules and Rule Chains

 As previously mentioned, the values of interest are encapsulated in parameters, and potential inferences between them in rules. Each rule accepts as inputs a fixed collection of parameters, and its evaluation computes a single output parameter. The collection of available rules and parameters is naturally viewed as a bipartite digraph in which an edge extends from a rule to a parameter if the parameter is the output of the rule, or from a parameter to a rule if the parameter is an input to the rule.
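 The bipartite view can be illustrated with a minimal Python sketch. This is not the GTX implementation: the `Rule` class and the `bipartite_edges` helper are hypothetical names introduced here, and the single rule shown borrows the chip edge length example used later in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a GTX rule: fixed inputs, one output."""
    name: str
    inputs: tuple
    output: str


def bipartite_edges(rules):
    """Return directed edges: parameter -> rule for each input,
    and rule -> parameter for the rule's single output."""
    edges = []
    for rule in rules:
        for param in rule.inputs:
            edges.append((param, rule.name))
        edges.append((rule.name, rule.output))
    return edges


rules = [Rule("BACPAC::dl_chip", ("dA_chip",), "dl_chip")]
edges = bipartite_edges(rules)
```

 Every edge joins a parameter to a rule or a rule to a parameter, never two nodes of the same kind, which is what makes the graph bipartite.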

 Two or more rules may compute the same output (i.e., alternative models of the same value), and the above digraph may contain cycles. However, any particular calculation must avoid such irregularities to prevent value conflicts and infinite loops. This is supported through the notion of a rule chain - an acyclic subgraph of the graph of available rules and parameters such that no two rules compute the same output. (The one exception is the constraint, a special kind of rule for calculations with constraints on the input parameter values; see the user manual.)
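 The two conditions on a rule chain can be checked mechanically. The following Python sketch is an illustration under assumptions, not the engine's actual code: the function name `is_valid_chain`, the dictionary layout and the rule names `a` and `b` are hypothetical. It treats the special output CONSTRAINT as exempt from the uniqueness requirement.

```python
def is_valid_chain(rules):
    """rules: dict mapping rule name -> (input parameter names, output name).
    A chain is valid if no two rules share an output (CONSTRAINT excepted)
    and the rule dependency graph has no cycle."""
    outputs = [out for _, out in rules.values() if out != "CONSTRAINT"]
    if len(outputs) != len(set(outputs)):
        return False

    producer = {out: name for name, (_, out) in rules.items()}
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {name: WHITE for name in rules}

    def visit(name):
        color[name] = GRAY
        for param in rules[name][0]:
            dep = producer.get(param)
            if dep is None:
                continue  # primary input: no rule in the chain computes it
            if color[dep] == GRAY:
                return False  # back edge: cycle found
            if color[dep] == WHITE and not visit(dep):
                return False
        color[name] = BLACK
        return True

    return all(color[n] != WHITE or visit(n) for n in rules)


acyclic = {"a": (("p",), "q"), "b": (("q",), "r")}
cyclic = {"a": (("y",), "x"), "b": (("x",), "y")}
conflicting = {"a": (("p",), "q"), "b": (("r",), "q")}
```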

 Parameters

 Parameters are the common base on which rules of different types operate. The main attributes of a parameter are its name, its data type and its units. To achieve high reusability of rules and parameters, parameter names must be chosen carefully so that they are easy to understand. We must also ensure that no physical attribute receives two different names in GTX and that no GTX parameter name is used for two different physical attributes. Therefore, we have devised strict rules for parameter names.

 The grammar for parameters is specified in the grammar document. The following is a very simple example representing the chip edge length.

#parameter dl_chip
#type double
#units {m}
#default
    1e-2
#description
    chip edge length
#endparameter
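
 Because the format is plain ASCII, such a block is easy to read programmatically. The Python sketch below is only an illustration that handles the directives visible in the example above; the function name `parse_parameter` is hypothetical, and the actual grammar (see the grammar document) may allow constructs this sketch does not cover.

```python
def parse_parameter(text):
    """Parse one '#parameter ... #endparameter' block into a dict.
    Assumes each directive's value sits on the same line or on the
    indented line(s) that follow it, as in the example above."""
    param = {}
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            directive, _, rest = line[1:].partition(" ")
            if directive == "endparameter":
                break
            key = "name" if directive == "parameter" else directive
            param[key] = rest.strip()
        elif key is not None:
            # Continuation line: value given below its directive.
            param[key] = (param[key] + " " + line).strip()
    return param


example = """\
#parameter dl_chip
#type double
#units {m}
#default
    1e-2
#description
    chip edge length
#endparameter
"""
parsed = parse_parameter(example)
```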

 Rules

 GTX supports the following types of rules:

These types provide reasonable expressive power and make it easy to update GTX with new models. The following is an example of an ASCII rule computing the chip edge length from the chip area. The #output and #inputs sections declare the types and units of the output and input parameters. The formula in the #body section specifies how the rule is evaluated.

 
#namespace BACPAC

#rule dl_chip
#description
    rule from BACPAC for the chip edge length
#output
    double {m} dl_chip;     // chip edge length
#inputs
    double {m^2} dA_chip;     // chip area
#body
    sqrt(dA_chip)
#reference
    BACPAC
#endrule
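
 To make the evaluation concrete, here is a Python mirror of the #body formula. It is a sketch for illustration only: the function name `rule_dl_chip` is hypothetical, and the input area of 1e-4 m^2 (1 cm^2) is an assumed value, chosen because it reproduces the default edge length of 1e-2 m given in the parameter example.

```python
import math


def rule_dl_chip(dA_chip):
    """Chip edge length (m) from chip area (m^2), as in the #body formula."""
    if dA_chip < 0:
        raise ValueError("chip area must be non-negative")
    return math.sqrt(dA_chip)


# Assumed illustrative input: a chip area of 1e-4 m^2, i.e. 1 cm^2.
dl_chip = rule_dl_chip(1e-4)
```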

 Rule chain

 The GTX user indicates to the engine which of the currently available rules should be evaluated by providing a simple list of those rules. The order in which rules are executed forms the rule chain, and is decided by the engine based on the relations between the rule inputs and outputs. If we had a rule ``BACPAC::dA_chip'' that computes the chip area, e.g., as a function of the number and size of the gates, then the chip edge length could be computed by executing the following rule chain:

BACPAC::dA_chip.
BACPAC::dl_chip.
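
 One way an engine could derive an execution order from such a list is a topological sort over the rules' inputs and outputs. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); it is not taken from the GTX engine, and the input names `n_gates` and `A_gate` of the area rule are hypothetical placeholders.

```python
from graphlib import TopologicalSorter


def execution_order(rules):
    """rules: dict mapping rule name -> (input parameter names, output name).
    Returns rule names so that each rule follows the rules it depends on."""
    producer = {out: name for name, (_, out) in rules.items()}
    deps = {
        name: {producer[p] for p in inputs if p in producer}
        for name, (inputs, _) in rules.items()
    }
    return list(TopologicalSorter(deps).static_order())


# The user's list may be in any order; the sort recovers a valid one.
chain = {
    "BACPAC::dl_chip": (("dA_chip",), "dl_chip"),
    "BACPAC::dA_chip": (("n_gates", "A_gate"), "dA_chip"),  # hypothetical inputs
}
order = execution_order(chain)
```

 Here the area rule is placed first because its output is an input of the edge length rule. `TopologicalSorter` raises `graphlib.CycleError` if the listed rules depend on each other cyclically.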

 Additional features

 A constraint is a special kind of rule that limits engine calculations to those input value combinations that meet given conditions. Each input may have multiple values. Based upon its input values, the constraint rule decides whether a given calculation is to continue. If the constraint fails, the outputs for that set of values are not considered and the engine goes on to a different set of inputs.

 The output of a constraint is always the special boolean parameter CONSTRAINT, which (unlike any other parameter) is permitted to be calculated by more than one rule in the same rule chain (this permits multiple constraints).

III.b Engine Structure and Operation

 For each parameter, the engine maintains zero, one or more values. Values can be set by default, loaded from files, entered by the user or computed. Multiple values can be computed by sweeping, i.e., evaluating rules over multiple combinations of input parameters. When instructed to evaluate a rule chain, the engine clears values that can be computed by rules of the chain. For each combination of values of the primary inputs of the chain, the engine evaluates rules in topological order and adds their output values to the respective collections of values, unless some constraint fails. A faster algorithm could produce all derivable sets of values, but with our simple algorithm the inputs that led to any particular value can be recovered (e.g., for minimization along a rule chain).
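
 The sweep described above can be sketched in a few lines of Python. This is a simplified illustration, not the GTX engine: the `sweep` function and its data layout are hypothetical, the rules are assumed to be already in topological order, clearing of previously computed values is not modeled, and the area threshold in the example constraint is an invented value.

```python
import math
from itertools import product


def sweep(primary_inputs, ordered_rules):
    """primary_inputs: dict mapping parameter name -> list of values.
    ordered_rules: list of (output name, function of the values dict),
    already in topological order. A rule whose output is "CONSTRAINT"
    returns a bool; False abandons the current input combination."""
    names = list(primary_inputs)
    results = []
    for combo in product(*(primary_inputs[n] for n in names)):
        values = dict(zip(names, combo))
        for output, fn in ordered_rules:
            result = fn(values)
            if output == "CONSTRAINT":
                if not result:
                    break  # constraint failed: skip this combination
            else:
                values[output] = result
        else:
            # No constraint failed; keep inputs and outputs together so
            # the inputs behind each computed value remain recoverable.
            results.append(values)
    return results


primary = {"dA_chip": [1e-4, 4e-4, 9e-4]}
rules = [
    ("CONSTRAINT", lambda v: v["dA_chip"] <= 5e-4),  # invented threshold
    ("dl_chip", lambda v: math.sqrt(v["dA_chip"])),
]
results = sweep(primary, rules)
```

 Storing each surviving combination as one record is what allows, for example, the inputs that minimize an output to be read off after the sweep.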

III.c Graphical User Interface

 The GUI is implemented with the cross-platform toolkit wxWindows; we have run it successfully on Windows 95/98, Windows NT, Solaris and Linux.

 At any given time, the user may view

  1. current parameters
  2. current rules
  3. current rule chain
  4. values of parameters in the current chain
  5. the graph of relationships between parameters and rules, as a tree
When a particular parameter or rule is selected, its details are shown and can be edited. The chain view shows all rules in the chain and helps the user to add new rules to the chain. The values view shows both the inputs to and the outputs of the current chain; the inputs may be edited. This view permits invoking the chain and observing the output, sweeping over multiple input values, observing the trace of such a sweep (including optimization) and plotting (see Figure). In addition to these views, the GUI handles extensive file I/O and interactive addition of new parameters and rules.

 More information on the use of the GUI to perform studies in GTX can be found in the user manual.