Copyright © 1996, 1998 Donnamaie E. White
Last Edit July 1998
Chapter 2 Structured Design Methodology

Introduction

The Structured Design Methodology, as developed here for the design of Bipolar, CMOS or BiCMOS logic arrays, applies to any array design effort regardless of technology or vendor. A designer who follows this methodology moves smoothly from milestone to milestone, which improves the odds of a successful design the first time, called "First-Time Success". The driving intent is to produce a circuit that is successfully buildable in silicon the first time. Wafers are too expensive to allow for trial and error. Successfully buildable means that the chip works: no bugs, no errors.

The design flow is presented in this chapter at the introductory level. Following chapters will detail specific areas such as timing analysis, simulation and power computation.

Design Sequence - Pre-Capture

The Structured Design Methodology stresses a certain design flow sequence of events, developed for use by the beginning array (basedie) designer, the beginning user of an Engineering Workstation (EWS) and Design System software, or the designer experienced in both. Each step will be discussed in more detail after the design flow is fully outlined.

Although this book was originally written when engineering workstations featured schematic capture, today's designs are more likely to be in a Verilog or VHDL netlist format. Schematics can be displayed, but they are automatically generated. Design capture via a workstation screen has gone the way of the drafting table. An example is the viewable schematic generated by the Design Analyzer stage of the Synopsys Design Compiler design synthesis tool. Vendor model libraries would need to include models for display purposes. The SIARC (Silicon Architects of Synopsys) CBA (cell-based array) libraries contain such models. Over 50 processes from almost every major array vendor are represented by CBA Libraries as more and more library development is out-sourced from the array vendors themselves.
Circuit functional specification

The circuit functional specification is the target specification; it describes what is to be implemented on one or more arrays. This includes: a block diagram of the system or circuit, overall performance requirements, I/O interface, testability, environmental and packaging requirements. (See Table 2-1.) In blocking out or partitioning a design, the first stages of what is now called floorplanning have been defined.

Once the functional specification identifies the need for more than one array, the overall circuit modules must be partitioned to ensure proper boundary conditions between design blocks and between arrays, and then the functional specifications of the individual array circuits must be created. The specifications must be defined to be independent of each other to allow parallel circuit development. Note that there is no constraint at this point as to the product to be used beyond operating specifications.

An integral part of the specification will be to identify potential instances for design reuse. If a block of the target design matches a block that has been used before, that previously-designed block can be identified and targeted for re-use without redesign. To save time and design effort (and, therefore, cost), the block does not need to be a 100% match to the design requirements. If it incorporates the functionality required, that is sufficient. Such blocks have been labeled as "IP" blocks, for intellectual property.

The technology of the array is defined by the performance requirements. As a basic guideline, high speed requires ECL bipolar, slower speeds and low power require CMOS, and moderate speeds and bipolar drive capability without the price of bipolar power dissipation require BiCMOS. Where the boundaries lie between these various technologies is subjective and subject to continual evolution and change.
As CMOS has increased in speed and as the need to reduce dissipated power has grown, many circuits are now on CMOS with speeds that are comparable to early ECL. Speeds for CMOS were usually under 100MHz. This is not, however, the upper limit and design speeds will continue to increase as DSM (deep sub-micron) designs move to dominate the design starts. Table 2-1 Components Of The Target Specification
Circuit hardware specification

The circuit hardware specification is the planned hardware approach to satisfying the target functional specification. For multiple array designs, this may involve another level of specification, one specification for each circuit intended for a different array. This implies that project partitioning has been completed, and defines all required I/O and throughput performance. (See Table 2-2.)

A hardware architecture specification equates to PDL (program description language) for software. It identifies modules and closely defines how the modules will work together. HDL (hardware description language) and VHDL have been developed to formalize this specification.

From this level of specification it is possible to estimate I/O signal requirements and internal core requirements. At this point, the estimates are very rough and will only serve to allow a first cut at reducing the number of array families that need to be considered. Some compromises or engineering tradeoffs may have been made, refining the functional specification.

Review of the available arrays

The arrays available at the time of a design evaluation need to be reviewed using the outline in Table 2-3 as an initial basis of comparison. In today's environment, this comparison is made simpler by the existence of the CBA/CBAII library. The core library is the same on all 50 processes from 11+ vendors. The functionality of the library when CBA/CBAII has been chosen is process-independent. There is still, however, variability in the functionality of the I/Os, since some vendors support their own I/O library and others use the CBA/CBAII I/O library.

A new factor introduced into the design equation is the availability of IP blocks. These blocks can make a library unique to one vendor. The long range intent is to allow these blocks to be licensed from their designer for design use. The existence of IP is either a new complication, or a step to simplification.
Core blocks designed on the CBA/CBAII architecture can be transported from one process to another within certain restrictions. SIARC offers the Block Transport software to assist in this operation. Figure 2-1 indicates the interdependencies between functional specification, hardware specification and the available arrays and processes. Table 2-2 Components Of The Hardware Specification
Figure 2-1 The Array Selection Process - Fixed Array Family

The Array Selection Process - CBA Supported Process

The use of CBA Architecture changes this flow. It is now possible, within the limits of package availability, for a designer to specify the basedie itself based on estimated circuit size. The CBA Design System software, CBA Frame, accepts an ASCII input description that specifies the number of rows and columns of CBA coreunits and the number of I/O and uses descriptions of those units ported to a particular process. The software also allows the insertion of all-layer apertures for the inclusion of all-layer hard IP blocks. Given this flexibility, the designer using CBA must choose a process rather than an array family.

Summary

The hardware review must compare the circuit specifications with the available processes and produce a list of the available array processes that could be used to support those specifications. As the number of potential choices is reduced, preliminary implementations of some of the critical paths for the circuit, constructed from the macro libraries under consideration, should be evaluated. Synthesis programs support modular design efforts, such as non-I/O modules, that allow just such evaluations to be made in minutes. Critical path analysis and other initial-stage evaluations are possible with the synthesis program itself, and most other design software will allow incomplete designs to be examined.

Table 2-3 Array Checklist - Initial Review
Initial Sizing of the Circuit

Before an array or array series has been chosen, estimate the size of the circuit or circuits to be placed on the array. Estimate the number of I/O connections, the types of I/O connections and the I/O cell count. The I/O cell count and the pad count may both be required. Identify unusual I/O structures. Identify IP (intellectual property) blocks that will be imported to the array (embedded block). Estimate the internal cell count. (See Table 2-4.)

Table 2-4 Sizing Review
For standard functions, equivalent gate counts may exist that can be used in place of internal cell count to estimate the size of the internal array area that will be required. Internal cell counts are more useful than equivalent gate counts where the cells are more complex than one or two gates. For CBA, internal cell count is split between number of compute cells and number of drive cells. For CBA/CBAII-based libraries, the macros specify compute and drive site usage by macro drive option. The individual circuit modules can be sized once a basic netlist exists.

IP blocks are pre-designed circuit modules, somewhat like a giant macrocell, and their footprint is known. CBA Block Import specifies the architectural code required to include such a block into the basedie description and the block size is part of this code. Each CBA SRAM designed with CBA Logical Memory Architect comes with a datasheet that provides the "footprint" of the RAM, convertible to coreUnits. This also breaks down into number of computes and number of drives.

The metalization (2-layer, 3-layer, 4-layer, 5-layer) of the process will determine what the potential maximum internal utilization may be. It is not guaranteed, and the designer would do well to allow for "elbow room" in a design to prevent serious routing issues later. Utilization of 90-95% is possible with the 4- and 5-layer processes. For CBA in a 3-layer process, 75% is a wildly optimistic goal. The routability of a design can be estimated from initial pins/net and pins/section computations produced by CBA Advisor, part of the CBA Design System software.

For any design, the use of RAMs and embedded blocks (and their spacing requirements) must be factored into the design size. CBA SRAMs "float" on the CBA fabric or core architecture. Embedded blocks require an aperture larger than they are to allow for power and ground rings to surround the block.
In estimating die size, I/O to core spacing, I/O site or cell size and scribe lines must also be added. Compare these sizing estimates to the review of the array processes still under consideration and their I/O resources, their internal density and their maximum frequency of operation. Note that, at this stage in the design, the sizing estimates for the circuit may be off by a considerable margin. Historically, device size at the estimate stage of a design is 20-30% below the final value. Hence the suggestion about "elbow room".

Internal cell utilization or Compute/Drive Site Utilization

The first population checks can be made before the circuit is designed. Internal cell utilization is one of these checks. Internal cell utilization is the number of cells required by a circuit divided by the number of cells available.

Internal cell utilization = (number of internal cells used) / (number of internal cells available)

For CBA arrays, the utilization is computed as the number of compute sites used vs. the number available and the number of drive sites used vs. the number available.

Compute site utilization = (number of compute sites used) / (number of compute sites available)

Drive site utilization = (number of drive sites used) / (number of drive sites available)

Macros that are suitable can be listed and a rough estimate of internal cell/site utilization computed. This step includes a review of the available macros in the various libraries with emphasis on the requirements of the specific circuit application. Where several vendors who use CBA are involved, this comparison is simplified since the CBA core library is guaranteed to be identical across all of the processes which support CBA. Also, different CBA macro drive options use different amounts of compute and drive sites. The actual macro lists are no longer prepared by hand.
The Verilog netlist can be input to Design Compiler along with the chosen library database and Design Compiler will synthesize the circuit, selecting the macros as it goes. The success of the synthesis in the use of the more complex multi-input macros vs. the simpler macros is a function of the number of design optimization constraints placed on the Compiler at its time of execution. The designer can synthesize parts of a design and optimize each and then combine the results or can synthesize the entire circuit at once. Modular design, where several people can work on several modules in parallel, is faster. It may not, however, be the ultimate in design efficiency from the point of view of the synthesizer.

All other things being equal, the convenience of the macro library can be a decisive factor in the final array selection. Do the macros available support the circuit modules? Large macros may include adders, carry-look-ahead, comparators, up and down counters, universal registers, large multiplexors and decoders. The CBA library includes simple adders, flip-flops and DFT (design for test) elements. Large units may be pre-constructed and added to the library as a repeat-cell structure or as an IP block.

Internal cell utilization should be 60-70% at the initial stages of sizing estimates to allow for expansion due to buffers, fan-out load distribution, path balancing or specification changes. The internal cell utilization limit for a completed design is array-specific. (See Table 2-5.) AMCC ECL arrays had an upper limit of 95-100%. CBA devices can reach 95% utilization with 4-5 layers of metal and the right design.

Table 2-5 Internal Cell Utilization Limit
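The utilization ratios and the 60-70% initial target described above can be checked with a few lines of Python. This is a minimal sketch: the function names and the site counts in the example are hypothetical and are not taken from any vendor library.

```python
def utilization(used: int, available: int) -> float:
    """Return used / available as a fraction of the array resource."""
    if available <= 0:
        raise ValueError("available must be positive")
    if used < 0:
        raise ValueError("used must not be negative")
    return used / available


def within_initial_target(fraction: float, low: float = 0.60, high: float = 0.70) -> bool:
    """True when an early estimate sits inside the 60-70% band suggested in the text."""
    return low <= fraction <= high


# Hypothetical CBA estimate; these site counts are invented for illustration.
compute = utilization(used=13_000, available=20_000)
drive = utilization(used=3_400, available=5_000)
print(f"compute site utilization: {compute:.0%}")
print(f"drive site utilization:   {drive:.0%}")
print("compute estimate within initial target:", within_initial_target(compute))
```

An estimate above the band at this stage suggests that a larger basedie or a denser process should be considered before the design grows.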
Interface cell utilization

The I/O requirements to the outside world are the second size determination. The array for a circuit must provide sufficient I/O capability to handle all signals, all other interface-placed circuit support such as three-state enable drivers, test enable controls and added power and ground pads to support simultaneously switching outputs (SSO) and high-speed inputs.

Interface cell utilization = (number of interface cells used) / (number of interface cells available)

As with internal cell utilization, only an estimate of final interface cell utilization can be made. For fixed-size arrays, the array should not use 100% of the I/O or the design will become I/O bound. Pad utilization, for cases where the I/O cells and pads are not one for one, must also be kept under 100%. CBA arrays are designed specifically for the design at hand. CBA Frame produces a core-limited or IO-limited base die based on input parameters. Design synthesis will produce a count of I/O required. Placement requirements and restrictions will dictate the number of added power and grounds that are required.

A check on array symmetry should be made. The Q20000 Series arrays do not provide the same number of I/O cells in each array quadrant. This may affect placement and added power and ground usage. The Q24008 is not square and has variable power and ground bonding. Check for these and other variations that might affect allowable utilization of the I/O pads and cells. A CBA array's symmetry is under the control of the base-die designer and the allowable bonding configurations.

Selection of the Process

Integrate the hardware specification, the available array processes and the initial sizing estimates to select the target array process and/or array series. The final choice is usually based on the performance - cost - availability - support matrix. In cases of equivalence between two or more array series, the final choice may be subjective.
Package availability should be considered in the early decisions since customized packages, especially for large arrays, take months to develop. The specified performance and requirements for on-chip memory will assist in reducing the number of options. Only a limited number of fixed-size arrays support on-chip memory, such as the QM1600T. CBA arrays support the CBA SRAMs. Other RAMs must be handled as custom embedded all-layer blocks. CMOS and BiCMOS do not yet support designs operating at 300MHz (although individual macros can toggle at these speeds). High-speed bipolar arrays support paths operating over 1.4GHz and climbing.

Combine all of the information gathered to date

Placement Considerations Checklist

Combine internal cell utilization, internal pin count and complexity of placement to evaluate circuit routability.
Compute the path propagation delay for the most critical (time sensitive) paths in the circuit. Make adjustments to the schematic in terms of macro options for speed where needed. Does the estimated performance satisfy the specification?
Path Delay = Sum of Macro Intrinsic Delays + Sum of Macro Extrinsic Loading Delays
For the arrays that use typical specifications, be certain to use the correct multiplication factor (WCM) for this worst-case analysis by properly specifying the operating conditions. Review the assumptions made in establishing the multiplication factors and adjust them if these assumptions are not expected to be met (i.e., derate the performance by a higher factor). Some vendors call these multiplication factors "adjustment factors". Be clear as to what is being adjusted and why.
There may be different multipliers for the different product grades, Commercial and Military, and for different power supplies within the product grade. The multiplier may depend on the macro type.
Many arrays are specified without worst-case timing multipliers. They are specified with min/max ranges for each macro propagation delay. Maximum path delay is found using the MAX data although the conditions for a maximum propagation delay for an individual macro will vary. Minimum delays are found using the MIN data.
Be certain that the proper fan-out loading and performance specifications are selected when doing this computation. Because of the high degree of variation in the way a library is documented between vendors and between array series from the same vendor, be certain that the rules regarding the methods of specifying timing delays for the macros for the array series selected are clearly understood.
Internal extrinsic loading delays are composed of metal load (Lnet), electrical fan-out load, the sum of all loads driven (Lfo), wire-OR electrical loading if the array allows wire-ORs and if one was used in the net (Lwo) and the k-factors for each. The k-factors, expressed in ns/LU, convert the load units into time units. Table 2-7 shows the extrinsic load equations for internal nets as they are used by AMCC and other vendors. K-factors may be specified as tables, graphs, or broken down into parts for temperature, voltage and processing. Check with the specific vendor.
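As an illustration of how intrinsic delays, load units and k-factors combine into a path delay, the sketch below sums them over the stages of a path. It assumes the simplest possible form, in which each load is multiplied by its own k-factor and the results are added, and a single worst-case multiplier (WCM) is applied to the total. The actual equations are vendor-specific (Table 2-7), and every name and number here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One macro on a path plus the net it drives (all values hypothetical)."""
    intrinsic_ns: float   # macro intrinsic delay, ns
    l_net: float          # metal load, in load units (LU)
    l_fo: float           # sum of all fan-out loads driven, in LU
    l_wo: float = 0.0     # wire-OR load, in LU (0 when no wire-OR is used)


def extrinsic_ns(stage: Stage, k_net: float, k_fo: float, k_wo: float) -> float:
    """Assumed linear form: each k-factor (ns/LU) converts its load into time."""
    return k_net * stage.l_net + k_fo * stage.l_fo + k_wo * stage.l_wo


def path_delay_ns(stages, k_net: float, k_fo: float, k_wo: float, wcm: float = 1.0) -> float:
    """Sum intrinsic and extrinsic delays over the path, then apply the multiplier."""
    typical = sum(s.intrinsic_ns + extrinsic_ns(s, k_net, k_fo, k_wo) for s in stages)
    return wcm * typical


# Hypothetical two-stage path with invented k-factors and multiplier.
path = [Stage(0.5, l_net=2, l_fo=3), Stage(0.4, l_net=1, l_fo=2, l_wo=1)]
print(f"worst-case path delay: {path_delay_ns(path, 0.1, 0.1, 0.1, wcm=1.5):.2f} ns")
```

Where a library specifies min/max delays per macro instead of a multiplier, the same summation can be run twice, once with MIN data and once with MAX data, leaving `wcm` at 1.0.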
Critical path identification and computation of its delay can be performed by many tools. The Design Compiler synthesis tool can perform early estimates before any placement has been performed. The library must support fast, typical and slow values and the correct operating condition must be selected. Design Compiler is to be enhanced to allow min/max analysis.
To annotate the simulation netlist, programs such as CBA Annotate and SGFvgen produce files for use with simulation and timing analysis (static and dynamic timing validation). These programs will compute Front-Annotation values, or, if a floorplanner has been used, Intermediate (post-place) Annotation values for the intrinsic loading.
Table 2-7 Components Of Path Delay - Internal Loading
External extrinsic loading delays are composed of the system load capacitance and the package pin capacitance (Lcap) and the k-factor. The k-factor, expressed in ns/pF, converts the load capacitance into time units. Sample equations for this delay are listed in Table 2-8.
Table 2-8 Components Of Path Delay - External Loading
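The external loading term can be sketched the same way. The example assumes the simplest linear form, in which the system load capacitance and the package pin capacitance are added to give Lcap and then multiplied by a k-factor in ns/pF. The function name and the numbers are hypothetical, and the vendor's own equations take precedence.

```python
def external_load_delay_ns(k_ns_per_pf: float, system_pf: float, pin_pf: float) -> float:
    """Delay contributed by the external load, assuming delay = k * Lcap."""
    if min(k_ns_per_pf, system_pf, pin_pf) < 0:
        raise ValueError("k-factor and capacitances must not be negative")
    l_cap = system_pf + pin_pf  # total external load capacitance, pF
    return k_ns_per_pf * l_cap


# Hypothetical output: 15 pF system load, 5 pF package pin, k = 0.02 ns/pF.
print(f"external loading delay: {external_load_delay_ns(0.02, 15.0, 5.0):.2f} ns")
```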
Use the macro occurrence list compiled for cell utilization to compute power. Determine the worst-case current multipliers used by the array and what voltage variations will be used by the circuit for DC power computations. Review the AC power equation if AC power must be computed. ECL output macros use a termination current and that power element must be included with the DC power computation.
Different technologies use different methods to compute power as seen by the examples in Table 2-9.
Table 2-9 Example Technology Approaches To Power Computation - AMCC Arrays (1984-1994)
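A DC power estimate driven by the macro occurrence list might be sketched as follows. This is a generic illustration and does not reproduce any of the vendor methods in Table 2-9. It assumes that each macro has a typical supply current, that a single worst-case current multiplier applies to the total, and that any ECL termination power is supplied as a separate figure. All names and values are hypothetical.

```python
def dc_power_mw(macros, v_supply: float, current_multiplier: float = 1.0,
                termination_mw: float = 0.0) -> float:
    """Estimate DC power in mW.

    macros: iterable of (occurrence_count, typical_current_mA) pairs.
    v_supply: worst-case supply voltage in volts (the sign is ignored, so a
              negative ECL supply can be passed directly).
    """
    typical_ma = sum(count * i_ma for count, i_ma in macros)
    worst_case_ma = typical_ma * current_multiplier
    return worst_case_ma * abs(v_supply) + termination_mw


# Hypothetical occurrence list: 100 macros at 0.5 mA and 20 macros at 2.0 mA.
occurrences = [(100, 0.5), (20, 2.0)]
power = dc_power_mw(occurrences, v_supply=-5.2, current_multiplier=1.25,
                    termination_mw=140.0)
print(f"estimated DC power: {power:.0f} mW")
```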
Design software has been slowly expanding to provide power estimates and final data at different stages of the design flow. CBA Logical Memory Architect, which is used to design the CBA SRAMs, reports on maximum AC and DC Idd current, which allows power computations. Pre-designed blocks have known power requirements. The remaining logic may be optimized during synthesis to reduce power as one of the design objectives. Third-party tools can compute circuit power dissipation automatically.
A maximum internal current may be specified for bipolar arrays. It is possible for the total core current to be computed and compared to array limits. It does not guarantee that the design will later pass layout row current limits. If the circuit internal core current is high and the cell utilization is also high, and other placement constraints are required, then the placement process will be difficult and may be unsuccessful. These restrictions apply more to high-frequency bipolar designs than to current CMOS designs.
Before placement, a global check is used, verifying that the core as a whole can handle the current required by the macros. A more detailed bus-check, or row, half-row, and quadrant current check, can be made after placement for those arrays which require this type of checking.
BiCMOS and CMOS arrays typically have no internal current limit. The development of three-layer metal arrays reduced the concern for this check for bipolar arrays as well, leaving the final control of the power used in the design to be a function of the ability to keep the junction temperature of the packaged part within limits.
Make the final package selection based on the array chosen and the estimated power. Refer to the Packaging Brochure from the chosen vendor.
For packages with internal power and ground planes, the package selected will control the placement of added power and grounds if the use of package signal pins is to be avoided. A package must accommodate all signal pins required for the circuit plus any signal pins required by added power and grounds not placed to connect to the internal power/ground planes of the package.
When a package has no internal bonding planes, the selected package signal pins must be sufficient to include all circuit signals and all added power and grounds.
Review the array for any other pads that need package signal pins before making the package selection. The Q20000 Series arrays have four fixed pads, two for the thermal diode anode and cathode and two for the AC speed monitor. These array pads must reach external package signal pins, decreasing what is available for the circuit proper. CBA/CBAII basedies use the corner cells to place 2-6 power/ground pads per corner. These are bonded out to package pins. Each CBA/CBAII library/process database contains 1-2 packages as a starting point to accommodate basic designs. Specific packages must be supplied by the array vendor.
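The package pin bookkeeping described above reduces to a simple count. The sketch below assumes that added power and ground pads bonded to internal package planes consume no signal pins, while all other added power and grounds and any fixed array pads do. The function names and the pin counts are hypothetical.

```python
def signal_pins_required(circuit_signals: int, added_power_ground: int,
                         on_internal_planes: int = 0, fixed_pads: int = 0) -> int:
    """Count the package signal pins a design needs.

    on_internal_planes: how many of the added power/ground pads bond to the
    package's internal planes and therefore need no signal pin.
    """
    if not 0 <= on_internal_planes <= added_power_ground:
        raise ValueError("on_internal_planes must be between 0 and added_power_ground")
    return circuit_signals + (added_power_ground - on_internal_planes) + fixed_pads


def package_fits(required: int, package_signal_pins: int) -> bool:
    """True when the package offers at least as many signal pins as required."""
    return required <= package_signal_pins


# Hypothetical design: 180 signals, 24 added power/grounds (16 on planes), 4 fixed pads.
needed = signal_pins_required(180, 24, on_internal_planes=16, fixed_pads=4)
print("signal pins required:", needed)
print("fits a 196-signal-pin package:", package_fits(needed, 196))
```

For a package with no internal bonding planes, `on_internal_planes` stays at zero and every added power and ground counts against the signal pins.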
Compute the estimated junction temperature based on the power dissipation, the packages available that meet specifications and the operating environment, including any heat sinking and air flow as specified in the functional specifications. If possible, several options should be evaluated.
The allowed packages for an array should also have their thermal coefficients for junction-case (Qjc) and junction-ambient (Qja) specified. Tables or some other means of computing the coefficient for case-ambient (Qca) as a function of the heatsink, the array, the package and airflow should also be provided. For most Military applications, Tc can be maintained at 125°C. For most Commercial applications, Ta can be maintained at 70°C.
Read "Theta" for Q:
Military: Tj = Pd * Qjc + Tc
Commercial: Tj = Pd * Qja + Ta
with Qja = Qjc + Qca
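The junction temperature equations can be evaluated directly. In this sketch the thermal coefficients are in °C/W, power is in watts, and the junction-ambient coefficient is taken as the sum of the junction-case and case-ambient coefficients. The function names and the example values are hypothetical and do not come from any package datasheet.

```python
def theta_ja(theta_jc: float, theta_ca: float) -> float:
    """Junction-ambient coefficient as junction-case plus case-ambient, in °C/W."""
    return theta_jc + theta_ca


def tj_from_case(pd_w: float, theta_jc: float, tc_c: float) -> float:
    """Military form: Tj = Pd * theta_jc + Tc."""
    return pd_w * theta_jc + tc_c


def tj_from_ambient(pd_w: float, theta_ja_c_per_w: float, ta_c: float) -> float:
    """Commercial form: Tj = Pd * theta_ja + Ta."""
    return pd_w * theta_ja_c_per_w + ta_c


# Hypothetical part: 2 W dissipation, theta_jc = 5 °C/W, theta_ca = 10 °C/W.
pd = 2.0
print(f"Military   Tj: {tj_from_case(pd, 5.0, tc_c=125.0):.0f} °C")
print(f"Commercial Tj: {tj_from_ambient(pd, theta_ja(5.0, 10.0), ta_c=70.0):.0f} °C")
```

Running the calculation for each candidate package and heat sink option gives a quick comparison against the junction temperature limit in the specification.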
With the completion of both timing and power analysis, changes in macro options, or optional functions within the circuit can be evaluated and the speed-power curve managed before full schematic capture and simulation have been performed.
As an option, a bonding diagram (pin out) request can be submitted to the vendor for approval.
Both pin out requests and placement requests can be initiated by the designer and both must be approved by the vendor after layout and Back-Annotation evaluation.
Review the requirements for the process design submission as specified by the vendor.