ASIC-System on Chip-VLSI Design: October 2007

3-D chip design strategy

It is clear from previous discussions that, with ever-increasing chip complexity and functionality, interconnect delay problems will get worse in very deep submicron technologies. 3-D IC (or interconnect) technologies such as wire bonding, micro-bumps, through-vias and contactless interconnects are a promising solution to interconnect delay concerns. They also enable effective large-scale integration of different systems on a single IC.

A 3-D IC architecture consists of a number of blocks partitioned from one or more 2-D chips. Silicon layers, known as "tiers", are stacked one above the other, and different blocks are placed on different tiers. Multiple layers of interconnect can be built in each Si tier, and these are linked by vertical interconnects. By routing the vertical interconnects appropriately, long wires can be shortened. Multiple active layers increase the options for placing critical-path components close to each other, thereby decreasing RC delay and significantly improving the performance of the design. Global interconnects are made common to all layers. Long-wire inter-block communication delay can be largely eliminated by placing these blocks on different layers and connecting them with vertical interconnects. System design therefore becomes more flexible with a 3-D IC architecture. Three-dimensional integration can reduce wire length and hence capacitance, power dissipation and chip area, and therefore improve the overall performance of the chip.
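A first-order sketch of why stacking shortens wires: if a design of fixed total area is split evenly across n tiers, the side of each tier shrinks by a factor of √n, and so does the longest planar (Manhattan) wire on a tier. The 100 mm² die, the even split and the neglect of vertical-interconnect length are illustrative assumptions, not figures from the references.

```python
import math

def tier_side_mm(total_area_mm2: float, tiers: int) -> float:
    """Side length of a square tier when the total area is split evenly over `tiers`."""
    if tiers < 1:
        raise ValueError("need at least one tier")
    return math.sqrt(total_area_mm2 / tiers)

def longest_planar_wire_mm(total_area_mm2: float, tiers: int) -> float:
    """Corner-to-corner Manhattan distance on one tier (vertical via length ignored)."""
    return 2.0 * tier_side_mm(total_area_mm2, tiers)

if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "tier(s):", round(longest_planar_wire_mm(100.0, n), 2), "mm")
```

Under these assumptions, going from one tier to four halves the worst-case planar wire length; in a real 3-D IC the gain depends on how well the partitioning keeps communicating blocks vertically adjacent.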

The advantages of the 3-D chip design methodology can be exploited to build Systems on Chip (SoCs). Circuits with different voltage and performance requirements, such as the digital and analog components of a mixed-signal system, can be placed on different layers, as shown in Figure (1).

Figure (1) Block representation of 3-D IC [5]

Blocks placed on different layers experience less electromagnetic interference noise, which can improve the noise performance of the design. High-performance SoCs require synchronous clock distribution; optical interconnects and I/Os on the topmost layer of the 3-D IC can be employed to achieve this.

Reference

[1] Kaustav Banerjee, Shukri J. Souri, Pawan Kapur and Krishna C. Saraswat, "3-D ICs: A Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and Systems-on-Chip Integration," Proceedings of the IEEE, Vol. 89, No. 5, pp. 602-633, May 2001.

[3] Jason Cong, "An Interconnect-Centric Design Flow for Nanometer Technologies," Proceedings of the IEEE, Vol. 89, No. 4, pp. 505-528, April 2001.

[4] Sungjun Im, Navin Srivastava, Kaustav Banerjee and Kenneth E. Goodson, "Thermal Scaling Analysis of Multilevel Cu/Low-k Interconnect Structures in Deep Nanometer Scale Technologies," Proceedings of the 22nd International VLSI Multilevel Interconnect Conference (VMIC), Fremont, CA, Oct. 3-6, 2005, pp. 525-530.

[5] "Demystifying 3D ICs: The pros and cons of going vertical," http://www.ece.ncsu.edu/muse/papers/dtoc2005.pdf, 9/5/2007.

Limits of Cu/low-k interconnects

At the 250 nm node, copper with low-k dielectric was introduced to reduce the effect of increasing interconnect delay. Below the 130 nm node, however, interconnect delay increases further despite the low-k dielectric. As scaling continues, new physical and technological effects, such as increasing resistivity and barrier thickness, start to dominate and interconnect delay increases. Inserting repeaters to break up long interconnects increases total area, and the vias connecting the repeaters to the global layers can cause blockages in the lower metal layers. Thus, as technology advances, material limitations will become the dominant factor in interconnect delay. Widening the metal lines is not a solution either: it increases the number of metallization layers needed, which raises complexity and cost and degrades reliability.

Cu/low-k interconnects are fabricated using a special process known as the damascene process. Cu adheres poorly to dielectric materials, and under electric bias Cu readily drifts into the dielectric and can cause shorts between metal lines. To avoid this problem, a barrier layer is deposited between the dielectric and the Cu in the trench. Although the barrier reduces the effective cross-section of the interconnect compared with the drawn dimensions, it improves reliability. In deep submicron technologies the barrier thickness becomes significant relative to the wire dimensions, and the effective resistance of the interconnect rises further. In addition, increased electron scattering and self-heating caused by current flow in the interconnects, together with the accompanying rise in on-chip temperature, also increase interconnect resistance.
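A minimal sketch of the barrier effect, assuming the barrier lines both sidewalls and the trench bottom and carries no current, and using bulk Cu resistivity (size-effect scattering is ignored, so real penalties are larger). The wire dimensions and the 10 nm barrier are hypothetical values chosen only to show the trend: the same barrier costs proportionally more resistance in a narrower wire.

```python
RHO_CU_OHM_UM = 0.017  # bulk Cu resistivity: 1.7 uOhm*cm expressed in Ohm*um

def wire_resistance_ohm(length_um, width_um, height_um, barrier_um, rho=RHO_CU_OHM_UM):
    """Resistance of a Cu line whose sidewalls and bottom are lined with a barrier.

    The barrier is treated as non-conducting, so only the Cu core carries current.
    """
    w_cu = width_um - 2 * barrier_um  # barrier on both sidewalls
    h_cu = height_um - barrier_um     # barrier on the trench bottom
    if w_cu <= 0 or h_cu <= 0:
        raise ValueError("barrier consumes the whole trench")
    return rho * length_um / (w_cu * h_cu)

def barrier_penalty(width_um, height_um, barrier_um):
    """Ratio of actual resistance to the resistance implied by the drawn dimensions."""
    return (wire_resistance_ohm(1.0, width_um, height_um, barrier_um)
            / wire_resistance_ohm(1.0, width_um, height_um, 0.0))

if __name__ == "__main__":
    # Same 10 nm barrier, two hypothetical wire sizes with aspect ratio 2
    print("0.2 x 0.4 um wire:", round(barrier_penalty(0.2, 0.4, 0.01), 3))
    print("0.1 x 0.2 um wire:", round(barrier_penalty(0.1, 0.2, 0.01), 3))
```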

In a scan chain, if some flip-flops are positive-edge triggered and the rest are negative-edge triggered, how does the chain behave?

Answer:

For designs with both positive- and negative-edge clocked flops, the scan insertion tool routes the scan chain so that the negative-edge flops come before the positive-edge flops in the chain. This avoids the need for a lockup latch.

Within a single clock domain, a negedge flop captures, on the falling edge, whatever its predecessor holds at that moment. If a posedge flop fed a negedge flop, the negedge flop would capture the data that the posedge flop had just captured on the rising edge of the same clock pulse, so data would advance two stages in one shift cycle. With the negedge flops first, every flop advances exactly once per cycle.

With multiple clock domains, the behavior depends on how the clock trees are balanced. If the clock domains are completely asynchronous, ATPG has to mask the receiving flops.
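The ordering rule can be checked with a small behavioral simulation. This sketch assumes a single clock with no skew, ideal flops, and a shift pulse that is a rising edge followed by a falling edge; the function names are hypothetical. It compares a negedge-first chain and a posedge-first chain against a correct shift register.

```python
from typing import List

def shift_pulse(edges: List[str], state: List[int], scan_in: int) -> List[int]:
    """Apply one scan-shift clock pulse: rising edge first, then falling edge.

    edges[i] is "pos" or "neg"; flop i's scan input is flop i-1 (scan_in for i = 0).
    """
    for active in ("pos", "neg"):
        # All flops clocked on this edge sample their inputs simultaneously.
        d = [scan_in] + state[:-1]
        state = [d[i] if edges[i] == active else state[i] for i in range(len(state))]
    return state

def shift_in(edges: List[str], bits: List[int]) -> List[int]:
    state = [0] * len(edges)
    for b in bits:
        state = shift_pulse(edges, state, b)
    return state

def ideal_shift(n: int, bits: List[int]) -> List[int]:
    """Reference: a correct n-bit shift register (newest bit in flop 0)."""
    state = [0] * n
    for b in bits:
        state = [b] + state[:-1]
    return state

if __name__ == "__main__":
    bits = [1, 0, 1, 1]
    print("neg->pos:", shift_in(["neg", "neg", "pos", "pos"], bits))
    print("pos->neg:", shift_in(["pos", "pos", "neg", "neg"], bits))
    print("ideal   :", ideal_shift(4, bits))
```

The negedge-first chain matches the ideal register, while the posedge-first chain does not, because the first negedge flop picks up the bit its posedge neighbour received half a cycle earlier.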

What is the difference between a normal buffer and a clock buffer?

Answer:

The clock net is one of the high-fanout nets (HFNs). Clock buffers are designed with special properties such as high drive strength and low delay. They also have equal rise and fall times, which prevents the duty cycle of the clock signal from changing as it passes through a chain of clock buffers.

Normal buffers are sized (W/L ratio) so that the sum of the rise time and fall time is minimized. They too are designed for high drive strength.
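A short sketch of why balanced clock buffers matter for duty cycle. It models "equal rise and fall" as equal low-to-high and high-to-low propagation delays (an assumption) and assumes non-inverting buffers: each stage delays the rising edge by t_plh and the falling edge by t_phl, so the high phase changes by t_phl - t_plh per stage. The 4.85 ns period is the SAMM clock period used elsewhere in these notes; the buffer delays are hypothetical.

```python
def duty_after_chain(period_ns, duty_in, t_plh_ns, t_phl_ns, stages):
    """Duty cycle after `stages` non-inverting buffers.

    Each buffer delays the rising clock edge by t_plh and the falling edge by
    t_phl, so the high phase grows (or shrinks) by t_phl - t_plh per stage.
    """
    high_ns = period_ns * duty_in + stages * (t_phl_ns - t_plh_ns)
    high_ns = min(max(high_ns, 0.0), period_ns)  # pulse width is bounded by 0 and the period
    return high_ns / period_ns

if __name__ == "__main__":
    T = 4.85  # ns
    print("balanced  :", round(duty_after_chain(T, 0.5, 0.10, 0.10, 10), 4))
    print("unbalanced:", round(duty_after_chain(T, 0.5, 0.10, 0.12, 10), 4))
```

With only a 20 ps imbalance per buffer, ten stages already stretch the high phase by 0.2 ns, moving the duty cycle from 50% to roughly 54%.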

What is the difference between HFN synthesis and CTS?

Answer:

HFNs are also synthesized in the front end, but at that stage no placement information for the standard cells is available. Hence the back-end tool collapses the synthesized HFNs and resynthesizes them based on placement information, inserting buffers appropriately. The target of this synthesis is to meet delay requirements, i.e. setup and hold.

For the clock, no synthesis is carried out in the front end. Why? Because there is no placement information for the flip-flops, so synthesis could not meet true skew targets. In the back end, clock tree synthesis tries to meet "skew" targets and inserts clock buffers (which, unlike normal buffers, have equal rise and fall times). HFN synthesis, in contrast, has no skew information or skew targets.

Is it possible to have zero skew in a design?

Answer:

Theoretically it is possible.

Practically it is impossible.

In practice, no delay can be reduced to zero: the clock always takes some time to travel from the source to each flop. Skew, however, is the difference between these delays, so we try to make the delays to all flops "equal" rather than "zero". After this optimization, all flops receive the clock edge with the same delay relative to each other, so we can say they have virtually "zero skew", or that the skew is "balanced".

What do you mean by scan chain reordering?

Answer1:

The placement tool places standard cells optimally based on timing and congestion. While doing so, if the scan chains are detached, it can break the chain ordering (originally created by a scan insertion tool such as Synopsys DFT Compiler) and reorder the chain to optimize it. It maintains the number of flops in each chain.

Answer2:

During placement, the optimization may make the scan chain difficult to route due to congestion. Hence the tool will re-order the chain to reduce congestion.

Reordering sometimes introduces hold-time problems in the chain. To overcome these, buffers may have to be inserted into the scan path. The tool may not be able to maintain the scan chain length exactly, and it cannot swap cells from different clock domains.

Because of scan chain reordering, patterns generated earlier are no longer valid. This is not a problem, since ATPG can be rerun by reading the new netlist.
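To illustrate what reordering buys, here is a simple nearest-neighbour heuristic that reorders the flops of one clock domain to shorten the Manhattan scan wire length. This is an illustrative heuristic under stated assumptions, not the algorithm of any particular tool: it ignores congestion and hold fixing, and the flop names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def chain_length(scan_in: Point, order: List[str], loc: Dict[str, Point]) -> float:
    """Total Manhattan wire length from the scan-in port through the chain."""
    total, prev = 0.0, scan_in
    for name in order:
        total += manhattan(prev, loc[name])
        prev = loc[name]
    return total

def greedy_reorder(scan_in: Point, loc: Dict[str, Point]) -> List[str]:
    """Nearest-neighbour reordering of the flops of ONE clock domain.

    Every flop appears exactly once, so the number of flops in the chain is preserved.
    """
    remaining = set(loc)
    order, prev = [], scan_in
    while remaining:
        # Ties are broken by name so the result is deterministic.
        nxt = min(remaining, key=lambda n: (manhattan(prev, loc[n]), n))
        order.append(nxt)
        remaining.remove(nxt)
        prev = loc[nxt]
    return order

if __name__ == "__main__":
    loc = {"A": (10, 0), "B": (1, 0), "C": (5, 0), "D": (2, 1)}  # placed flops (um)
    original = ["A", "B", "C", "D"]
    new = greedy_reorder((0, 0), loc)
    print(original, chain_length((0, 0), original, loc))
    print(new, chain_length((0, 0), new, loc))
```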

On what basis do we decide the clock frequency of a design?

Answer:

There are several factors. The important ones are:

1) Input and output data rate: for example, if you are designing an encryptor or decryptor, you may need at least 100 MHz, depending on the required throughput.

2) Power: the higher the frequency, the higher the power consumption.

3) Required accuracy: if high clock accuracy is not needed, an RC oscillator can be used, which saves area and keeps everything compact, but an RC oscillator cannot produce high frequencies.

4) Technology: the smaller the node, the higher the achievable speed (but also, often, the higher the power; again a trade-off). How fast do we need to be?

5) Target platform: is it an FPGA or a custom ASIC? An ASIC can naturally reach a higher clock frequency, whereas the operating frequency of an FPGA is limited by several other factors, such as the delay of its programmable routing.
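For the data-rate factor, a quick estimate is the required throughput divided by the number of bits the datapath handles per clock cycle, plus some headroom. The 1.6 Gbit/s rate and 16-bit datapath below are hypothetical numbers, chosen only to show how a figure like 100 MHz can arise; the function name is also hypothetical.

```python
def min_clock_mhz(throughput_mbps: float, bits_per_cycle: int, margin: float = 0.0) -> float:
    """Minimum clock frequency (MHz) needed to sustain a data rate.

    throughput_mbps : required data rate in Mbit/s
    bits_per_cycle  : bits the datapath consumes or produces per clock cycle
    margin          : extra fraction on top (e.g. 0.2 for 20% headroom)
    """
    if bits_per_cycle <= 0:
        raise ValueError("bits_per_cycle must be positive")
    return throughput_mbps / bits_per_cycle * (1.0 + margin)

if __name__ == "__main__":
    print(min_clock_mhz(1600, 16))        # no headroom
    print(min_clock_mhz(1600, 16, 0.2))   # with 20% headroom
```

Widening the datapath lowers the required frequency (and helps power) at the cost of area, which is the same trade-off the list above describes.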

What is JTAG?

Answer1:

JTAG is an acronym for "Joint Test Action Group". The name is also used for the IEEE 1149.1 standard, "Standard Test Access Port and Boundary-Scan Architecture". It is used as one of the DFT techniques.

Answer2:

JTAG (Joint Test Action Group) boundary scan is a method of testing ICs and their interconnections. It uses a shift register built into the chip so that inputs can be shifted in and the resulting outputs shifted out. JTAG requires four I/O pins: clock (TCK), input data (TDI), output data (TDO) and state machine mode control (TMS).

The use of JTAG has expanded to debugging software on embedded microcontrollers, which eliminates the need for more costly in-circuit emulators. JTAG is also used to download configuration bitstreams to FPGAs.

JTAG cells, also known as boundary scan cells, are small circuits placed just inside the I/O cells. Their purpose is to allow data to be driven to, and captured from, the I/Os through the boundary scan chain. The interface to these scan chains is called the TAP (Test Access Port), and the operation of the chains and the TAP is controlled by a JTAG (TAP) controller inside the chip.
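The TAP controller is a 16-state machine advanced by TMS on each rising edge of TCK. The table below is written from the IEEE 1149.1 state diagram; the Python names are hypothetical, and the sketch models only state transitions, not the instruction or data registers.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def tap_walk(state: str, tms_bits) -> str:
    """Apply one TMS value per rising edge of TCK and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state

if __name__ == "__main__":
    # Five TCK cycles with TMS=1 reach Test-Logic-Reset from any state.
    assert all(tap_walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT)
    print(tap_walk("Run-Test/Idle", [1, 0, 0]))     # Shift-DR
    print(tap_walk("Run-Test/Idle", [1, 1, 0, 0]))  # Shift-IR
```

The "five ones" property is why a debugger can always resynchronize with a TAP without knowing its current state.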

For more info:

http://www.xess.com/faq/M0000297.HTM
http://www.cadreng.com/open_source/jtag/jtag_tutorial.php
http://www.ee.ic.ac.uk/pcheung/teaching/ee3_DSD/ti_jtag_seminar.pdf

Limitations of the existing interconnect technologies

The performance of deep submicron ICs is limited by the increasing effect of interconnect loading. Long global clock networks account for a large part of the power consumption of a chip. Traditional CAD design methodologies are strongly affected by interconnect scaling. The resistance and capacitance of interconnects have increased because of smaller wire cross-sections, smaller wire pitch and longer lengths, resulting in increased RC delay. As interconnect scaling continues, this increased RC delay is becoming a major bottleneck in improving the performance of advanced ICs.
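A minimal sketch of the RC delay of an unrepeated wire, using the classic distributed-RC approximation of about 0.38 x R_total x C_total for the 50% delay (driver resistance and load capacitance ignored). The per-micron resistance and capacitance values are hypothetical. The key point is that delay grows with the square of length, which is why long wires need repeaters.

```python
def distributed_rc_delay_ps(length_um: float, r_ohm_per_um: float, c_ff_per_um: float) -> float:
    """50% delay of an unrepeated, distributed RC line: ~0.38 * R_total * C_total.

    Ohm * fF = 1e-15 s, so the product is converted from fs to ps.
    """
    r_total = r_ohm_per_um * length_um  # Ohm
    c_total = c_ff_per_um * length_um   # fF
    return 0.38 * r_total * c_total / 1000.0  # fs -> ps

if __name__ == "__main__":
    r, c = 0.1, 0.2  # hypothetical global-wire values: Ohm/um and fF/um
    for length in (1000, 2000, 10000):
        print(f"{length:>6} um: {distributed_rc_delay_ps(length, r, c):8.1f} ps")
    # Halving the cross-section roughly doubles r, and the delay doubles with it.
    print("2x r  :", distributed_rc_delay_ps(1000, 2 * r, c), "ps")
```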

Figure (1) Gate and interconnect delays vs. technology nodes [1]

This problem is illustrated in Figure (1).

Here the gate delay and the interconnect delay are shown as functions of technology nodes ranging from 180 nm to 60 nm. The interconnect delays shown assume a line with optimally inserted repeaters and include the delay of the repeaters. The graph shows that as technology shrinks, gate delay decreases while interconnect delay increases.

Reference

[3] Jason Cong, "An Interconnect-Centric Design Flow for Nanometer Technologies," Proceedings of the IEEE, Vol. 89, No. 4, pp. 505-528, April 2001.

[5] "Demystifying 3D ICs: The pros and cons of going vertical," http://www.ece.ncsu.edu/muse/papers/dtoc2005.pdf, 9/5/2007.

Introduction to Interconnect Technologies

Interconnects dominate deep submicron VLSI design because of decreasing wire pitch. With increasing circuit complexity, die sizes are also increasing. Planar 2-D ICs may not meet the requirements of heterogeneous integration of different technologies in a single IC. As technology advanced below 250 nm, interconnect delays started to dominate heavily. The introduction of Cu/low-k interconnects provided lower resistivity and capacitance and hence reduced delay. But in the deep submicron domain (below 130 nm), even these interconnects add considerable delay because of increased resistivity arising from several material limitations [1-4]. Interconnect-related problems such as increased delay and power consumption are tackled by 3-D ICs, which shorten the total interconnect length [1][5]. This part of the article series discusses these issues mainly based on reference [1].

Reference

[2] Kaustav Banerjee, Shukri J. Souri, Pawan Kapur and Krishna C. Saraswat, "3-D Heterogeneous ICs: A Technology for the Next Decade and Beyond," 5th IEEE Workshop on Signal Propagation on Interconnects, Venice, Italy, May 13-16, 2001.
[3] Jason Cong, "An Interconnect-Centric Design Flow for Nanometer Technologies," Proceedings of the IEEE, Vol. 89, No. 4, pp. 505-528, April 2001.
[4] Sungjun Im, Navin Srivastava, Kaustav Banerjee and Kenneth E. Goodson, "Thermal Scaling Analysis of Multilevel Cu/Low-k Interconnect Structures in Deep Nanometer Scale Technologies," Proceedings of the 22nd International VLSI Multilevel Interconnect Conference (VMIC), Fremont, CA, Oct. 3-6, 2005, pp. 525-530.
[5] "Demystifying 3D ICs: The pros and cons of going vertical," http://www.ece.ncsu.edu/muse/papers/dtoc2005.pdf, 9/5/2007.

ASIC Design Check List

Silicon Process and Library Characteristics

What exact process are you using?
How many layers can be used for this design?
Are the crosstalk noise constraints, crosstalk (Xtalk) analysis configuration, and cell EM and wire EM data available?

Design Characteristics

What is the design application?
Number of cells (placeable objects)?
Is the design Verilog or VHDL?
Is the netlist flat or hierarchical?
Is there RTL available?
Is there any datapath logic using special datapath tools?
Does DFT need to be considered?
Can scan chains be reordered?
Are memory BIST and boundary scan used in this design?
Are static timing analysis constraints available in SDC format?

Clock Characteristics

How many clock domains are in the design?
What are the clock frequencies?
Is there a target clock skew, latency or other clock requirements?
Does the design have a PLL?
If so, is it used to remove clock latency?
Is there any I/O cell in the feedback path?
Is the PLL used for frequency multipliers?
Are there derived clocks or complex clock generation circuitry?
Are there any gated clocks?
If yes, do they use simple gating elements?
Is the gated clock used for timing or for power?
For gated clocks, can the gating elements be sized for timing?
Are you muxing in a test clock or using a JTAG clock?
Available cells for clock tree?
Are there any special clock repeaters in the library?
Are there any EM, slew or capacitance limits on these repeaters?
How many drive strengths are available in the standard buffers and inverters?
Do any of the buffers have balanced rise and fall delays?
Are there any special requirements for clock distribution?
Will the clock tree be shielded? If so, what are the shielding requirements?

Floorplan and Package Characteristics

Target die area?
Does the area estimate include power/signal routing?
What gate density (gates/mm²) has been assumed?
Number of routing layers?
Any special power routing requirements?
Number of digital I/O pins/pads?
Number of analog signal pins/pads?
Number of power/ground pins/pads?
Total number of pins/pads and Location?
Will this chip use a wire bond package?
Will this chip use a flip-chip package?
If yes, what is the I/O bump pitch? How many rows of bumps? What is the bump allocation? Is there a bump pad layout guide?
Have you already done floorplanning for this design?
If yes, is conformance to the existing floorplan required?
What is the target die size?
What is the expected utilization?
Please draw the overall floorplan.
Is there an existing floorplan available in DEF?
What are the number and type of macros (memory, PLL, etc.)?
Are there any analog blocks in the design?
What kind of packaging is used? Flipchip?
Are the I/Os periphery I/O or area I/O?
How many I/Os?
Is the design pad limited?
What are the power planning and power analysis requirements for this design?
Are layout databases available for hard macros?
What are the timing analysis and correlation requirements?
What are the physical verification requirements?

Data Input

Library information for new library
.lib for timing information
GDSII or LEF for library cells including any RAMs
RTL in Verilog/VHDL format
Number of logical blocks in the RTL
Constraints for the block in SDC
Floorplan information in DEF
I/O pin location
Macro locations

Routing

The routing flow is shown in Figure (1).

Figure (1) Routing flow [1]

Routing is the process of creating physical connections based on logical connectivity. Signal pins are connected by routing metal interconnects. Routed metal paths must meet timing, clock skew, max trans/cap requirements and also physical DRC requirements.

In a grid-based routing system, each metal layer has its own tracks and preferred routing direction, which are defined in a unified cell in the standard cell library.

There are four steps of routing operations:

1. Global routing
2. Track assignment
3. Detail routing
4. Search and repair

Global Route assigns nets to specific metal layers and global routing cells. Global route tries to avoid congested global cells while minimizing detours. Global route also avoids pre-routed P/G, placement blockages and routing blockages.
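A toy illustration of congestion-aware global routing: each two-pin net is routed over a grid of global routing cells (gcells) using one of its two L-shaped paths, choosing the path that crosses fewer gcells already at capacity. This is a simple pattern-routing sketch under stated assumptions, not Astro's algorithm; real global routers model per-layer, per-edge capacities, P/G pre-routes and blockages. The grid, capacity and nets are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

def _span(a: int, b: int) -> List[int]:
    """Integers from a to b inclusive, in either direction."""
    step = 1 if b >= a else -1
    return list(range(a, b + step, step))

def l_routes(a: Cell, b: Cell) -> List[List[Cell]]:
    """The two L-shaped gcell paths between a and b."""
    (x1, y1), (x2, y2) = a, b
    h_then_v = [(x, y1) for x in _span(x1, x2)] + [(x2, y) for y in _span(y1, y2)[1:]]
    v_then_h = [(x1, y) for y in _span(y1, y2)] + [(x, y2) for x in _span(x1, x2)[1:]]
    return [h_then_v, v_then_h]

def global_route(nets, capacity: int):
    """Route two-pin nets one by one, picking the L that crosses fewer full gcells."""
    usage: Dict[Cell, int] = defaultdict(int)
    routes = []
    for a, b in nets:
        options = l_routes(a, b)
        # Cost = number of gcells on the path that are already at capacity.
        costs = [sum(usage[c] >= capacity for c in path) for path in options]
        best = options[costs.index(min(costs))]  # the first option wins ties
        for c in best:
            usage[c] += 1
        routes.append(best)
    overflow = sum(max(0, u - capacity) for u in usage.values())
    return routes, overflow

if __name__ == "__main__":
    nets = [((2, 0), (2, 1)), ((0, 0), (3, 2))]
    routes, overflow = global_route(nets, capacity=1)
    for r in routes:
        print(r)
    print("overflow:", overflow)
```

The second net detours vertically first because the horizontal-first path would pass through gcell (2, 0), which the first net has already filled.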

Track Assignment (TA) assigns each net to a specific track and lays down the actual metal traces. It tries to make long, straight traces to reduce the number of vias. DRC rules are not enforced at the TA stage. TA operates on the entire design at once.

Detail Routing tries to fix all DRC violations left after track assignment, working on a small fixed-size area known as an “SBox”. Detail route traverses the whole design box by box until the entire routing pass is complete.

Search and Repair fixes the remaining DRC violations through multiple iterative loops using progressively larger SBox sizes.

Reference

[1] Astro User Guide, Version X-2005.09, September 2005

Clock Tree Synthesis (CTS)

The goal of CTS is to minimize skew and insertion delay. Before CTS the clock is not propagated (it is treated as ideal), as shown in Figure (1).

Figure (1) Ideal clock before CTS

After CTS, hold slack should improve. The clock tree begins at the clock source defined in the .sdc file and ends at the stop pins of the flops. There are two types of stop pins, known as ignore pins and sync pins. ‘Don’t touch’ circuits and pins in the front end (logic synthesis) are treated as ‘ignore’ circuits or pins in the back end (physical synthesis). ‘Ignore’ pins are ignored for timing analysis. If the clock is divided, a separate skew analysis is necessary.

Global skew optimization targets zero skew between any two synchronous pins, without considering their logical relationship.

Local skew optimization targets zero skew between two synchronous pins while considering their logical relationship, i.e. only between flops that form the launch and capture ends of a timing path.
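A small sketch of the difference between the two skew measures. Global skew looks at every clock sink; local skew looks only at launch/capture pairs of real timing paths. The latencies of ff_a and ff_b are the longest and shortest clock delays from the Astro report for this design; ff_c, ff_d and the timing paths are hypothetical.

```python
from typing import Dict, List, Tuple

def global_skew(latency: Dict[str, float]) -> float:
    """Largest clock-latency difference between ANY two clock sinks."""
    return max(latency.values()) - min(latency.values())

def local_skew(latency: Dict[str, float], paths: List[Tuple[str, str]]) -> float:
    """Largest latency difference between the launch and capture flops of timing paths."""
    return max(abs(latency[launch] - latency[capture]) for launch, capture in paths)

if __name__ == "__main__":
    latency = {"ff_a": 4.206, "ff_b": 1.322, "ff_c": 4.0, "ff_d": 1.5}  # ns
    paths = [("ff_a", "ff_c"), ("ff_b", "ff_d")]  # only these pairs talk to each other
    print("global skew:", round(global_skew(latency), 3), "ns")
    print("local skew :", round(local_skew(latency, paths), 3), "ns")
```

A large global skew can therefore coexist with a small local skew when the far-apart sinks never exchange data.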

If clock is skewed intentionally to improve setup slack then it is known as useful skew.
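The effect of useful skew can be seen from the basic setup and hold slack equations. In this sketch, max_data_delay and min_data_delay include the launching flop's clock-to-Q delay, and all numbers except the 4.85 ns SAMM period are hypothetical. Delaying the capture clock adds directly to setup slack and subtracts directly from hold slack; note that the clock period appears only in the setup equation.

```python
def setup_slack(period, launch_lat, capture_lat, max_data_delay, t_setup):
    """Data launched at launch_lat must arrive t_setup before the NEXT capture edge."""
    return period + capture_lat - (launch_lat + max_data_delay) - t_setup

def hold_slack(launch_lat, capture_lat, min_data_delay, t_hold):
    """Data must stay stable until t_hold after the SAME capture edge."""
    return (launch_lat + min_data_delay) - (capture_lat + t_hold)

if __name__ == "__main__":
    T = 4.85  # ns
    for capture in (1.0, 1.3):  # balanced clock vs capture clock skewed by +0.3 ns
        s = setup_slack(T, 1.0, capture, max_data_delay=4.9, t_setup=0.1)
        h = hold_slack(1.0, capture, min_data_delay=0.4, t_hold=0.05)
        print(f"capture latency {capture}: setup {s:+.2f} ns, hold {h:+.2f} ns")
```

Here a 0.3 ns useful skew turns a -0.15 ns setup violation into +0.15 ns of slack, while hold slack drops from 0.35 ns to 0.05 ns, so the skew must be limited by the hold margin of the same flops.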

Rigidity is a term coined in Astro to indicate how far the constraints may be relaxed: the higher the rigidity, the tighter the constraints.

In Clock Tree Optimization (CTO), the clock can be shielded so that noise is not coupled to other signals, but shielding increases area by 12 to 15%. Since the clock signal is global in nature, the same metal layer used for power routing is also used for the clock. CTO is achieved by buffer sizing, gate sizing, buffer relocation, level adjustment and HFN synthesis. Setup slack is improved in the pre-placement, in-placement and post-placement (before CTS) optimization stages while hold slack is neglected; hold slack is improved in post-placement optimization after CTS. CTS adds a lot of buffers: typically around 650 buffers for a 100k-gate design.

The global skew report for the design is shown below.

**********************************************************************
*
* Clock Tree Skew Reports
*
* Tool : Astro
* Version : V-2004.06 for IA.32 -- Jul 12, 2004
* Design : sam_cts
* Date : Sat May 19 16:09:20 2007
*
**********************************************************************

======== Clock Global Skew Report =============================

Clock: clock
Pin: clock
Net: clock

Operating Condition = worst
The clock global skew = 2.884
The longest path delay = 4.206
The shortest path delay = 1.322

The longest path delay end pin: \mac21\/mult1\/mult_out_reg[2]/CP
The shortest path delay end pin: \mac22\/adder1\/add_out_reg[3]/CP

The Longest Path:
====================================================================
Pin Cap Fanout Trans Incr Arri Master/Net
--------------------------------------------------------------------
clock 0.275 1 0.000 0.000 r clock
U1118/CCLK 0.000 0.000 0.000 r pc3c01
U1118/CP 3.536 467 1.503 1.124 1.124 r n174
\mac21\/mult1\/mult_out_reg[2]/CP
4.585 3.082 4.206 r sdnrq1
[clock delay] 4.206
====================================================================

The Shortest Path:
====================================================================
Pin Cap Fanout Trans Incr Arri Master/Net
--------------------------------------------------------------------
clock 0.275 1 0.000 0.000 r clock
U1118/CCLK 0.000 0.000 0.000 r pc3c01
U1118/CP 3.536 467 1.503 1.124 1.124 r n174
\mac22\/adder1\/add_out_reg[3]/CP
1.701 0.198 1.322 r sdnrq1
[clock delay] 1.322
====================================================================

Figure (2) Clock after CTS and CTO

Placement

The complete placement flow is illustrated in Figure (1).

Figure (1) Placement flow [1]

Before placement optimization starts, all wire load models (WLMs) are removed. Placement uses RC values from the virtual route (VR) to calculate timing. The VR is the shortest Manhattan distance between two pins. VR RCs are more accurate than WLM RCs.
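A sketch of why VR RCs beat WLM RCs once cells are placed: a wire load model estimates net length from fanout alone, so every net with the same fanout gets the same capacitance, while the virtual route uses the actual Manhattan distance between the placed pins. The WLM table, the per-micron capacitance and the pin coordinates are all hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]

# Hypothetical wire load model: fanout -> estimated net length (um).
WLM_LENGTH_UM = {1: 20.0, 2: 35.0, 3: 50.0}
C_FF_PER_UM = 0.2  # hypothetical wire capacitance per um

def wlm_cap_ff(fanout: int) -> float:
    # Fanouts beyond the table reuse the last entry (a simplification).
    length = WLM_LENGTH_UM.get(fanout, WLM_LENGTH_UM[max(WLM_LENGTH_UM)])
    return length * C_FF_PER_UM

def vr_cap_ff(pin_a: Point, pin_b: Point) -> float:
    # Virtual route: shortest Manhattan distance between the two placed pins.
    length = abs(pin_a[0] - pin_b[0]) + abs(pin_a[1] - pin_b[1])
    return length * C_FF_PER_UM

if __name__ == "__main__":
    # Two fanout-1 nets: one between neighbouring cells, one across the block.
    print("WLM:", wlm_cap_ff(1), "fF for both nets")
    print("VR :", vr_cap_ff((0, 0), (5, 5)), "fF and", vr_cap_ff((0, 0), (300, 200)), "fF")
```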

Placement is performed in four optimization phases:

1. Pre-placement optimization
2. In-placement optimization
3. Post-placement optimization (PPO) before clock tree synthesis (CTS)
4. PPO after CTS

Pre-placement optimization optimizes the netlist before placement; HFNs are collapsed, and cells can also be downsized.

In-placement optimization re-optimizes the logic based on VR. It can perform cell sizing, cell moving, cell bypassing, net splitting, gate duplication, buffer insertion and area recovery. The optimization iterates between setup fixing, incremental timing analysis and congestion-driven placement.

Post-placement optimization before CTS performs netlist optimization with ideal clocks. It can fix setup, hold and max trans/cap violations, can perform placement optimization based on global routing, and redoes HFN synthesis.

Post placement optimization after CTS optimizes timing with propagated clock. It tries to preserve clock skew.

Reference

[1] Astro User Guide, Version X-2005.09, September 2005

Timing Analysis in Physical Design

Timing analysis in the back end requires knowledge of all the clock-related constraints provided by the front end. When the .sdc file is given to a physical design tool (such as Astro), its first step is to remove all wire load models (WLMs), which are used only for front-end timing analysis; there are no wire load models in the back end. Actual delays are calculated from the RC values of the metal layers. All RC values, such as sidewall, junction and fringe capacitances, are stored in Table Look Up (TLU) format in the technology file.

In back-end design, hold violations have higher priority than setup violations. A hold check depends only on the data path and the clock skew, not on the clock period, so a chip with a hold violation fails at any frequency. A setup violation, by contrast, can be eliminated by slowing down the clock.

The goal of placement and routing is always to meet the timing constraints provided by the .sdc file. If clock latency and uncertainty are not set in the front end, Clock Tree Synthesis (CTS) cannot be performed in the back end.

Cell delays and net delays are stored as lookup tables.

Cell delay is characterized by input transition, timing arcs and load capacitance, while net delay is determined by RC only. Cell delays are available in the libraries, and net delays are specified through the technology files (in the front end, through the WLM). Cell delay tables are fixed, whereas net delays are not: they depend on interconnect length and width. The net delay parameters Rnet and Cnet are available as Table Look Up (TLU) data provided by the vendor.
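A sketch of how a cell delay is read from a two-dimensional lookup table indexed by input transition and output load, using bilinear interpolation between the four surrounding table points (points outside the table are extrapolated from the edge segment). This follows the general non-linear delay model approach; the table values are hypothetical and individual timing tools may interpolate or extrapolate differently.

```python
from bisect import bisect_right
from typing import List, Sequence

def _segment(index: Sequence[float], x: float) -> int:
    """Index i of the table segment [index[i], index[i+1]] used for x."""
    return min(max(bisect_right(index, x) - 1, 0), len(index) - 2)

def nldm_lookup(index_1: Sequence[float], index_2: Sequence[float],
                values: List[List[float]], x: float, y: float) -> float:
    """Bilinear interpolation in a 2-D delay table.

    index_1: input transition points (ns), index_2: output load points (pF),
    values[i][j]: delay at (index_1[i], index_2[j]).
    """
    i, j = _segment(index_1, x), _segment(index_2, y)
    x0, x1 = index_1[i], index_1[i + 1]
    y0, y1 = index_2[j], index_2[j + 1]
    tx, ty = (x - x0) / (x1 - x0), (y - y0) / (y1 - y0)
    return ((1 - tx) * (1 - ty) * values[i][j] + (1 - tx) * ty * values[i][j + 1]
            + tx * (1 - ty) * values[i + 1][j] + tx * ty * values[i + 1][j + 1])

if __name__ == "__main__":
    # Hypothetical cell_rise table: rows = input transition, columns = output load
    trans = [0.05, 0.2, 0.5]
    load = [0.01, 0.05, 0.1]
    table = [[0.10, 0.20, 0.35],
             [0.13, 0.24, 0.40],
             [0.20, 0.32, 0.48]]
    print(nldm_lookup(trans, load, table, 0.2, 0.05))    # on a grid point
    print(nldm_lookup(trans, load, table, 0.125, 0.03))  # between grid points
```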

There is one more set of files, TLU+, which accounts for ultra-deep-submicron (UDSM) effects that are not included in the TLU file. A mapping file maps the layer names of the technology file to those of the TLU+ file. UDSM techniques such as Optical Proximity Correction (OPC), Resolution Enhancement Technology (RET) and Litho Compliance Check (LCC) are not handled by Astro. For the placement stage, virtual RC (based on Manhattan distance) in Layout Parasitic Extraction (LPE) mode is used. For CTS, real R and virtual C are used, and for routing, real RC is used.

The clock definition for SAMM, generated by Design Compiler as an .sdc file in the front-end design flow, is given below. It specifies the clock period, clock transition (rise and fall), setup and hold uncertainty, and insertion delay (latency).

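#####################################################
# Created by Design Compiler write_sdc on Fri May 11 18:35:45 2007
#####################################################
# Clock "clock" on port clock: 4.85 ns period (about 206 MHz),
# rising edge at 0 ns and falling edge at 2.425 ns (50% duty cycle)
create_clock -period 4.85 -waveform {0 2.425} [get_ports {clock}]
# Ideal clock transition of 0.04 ns for rise and fall, used until the tree is built
set_clock_transition -rise 0.04 [get_clocks {clock}]
set_clock_transition -fall 0.04 [get_clocks {clock}]
# Uncertainty (skew/jitter margin): 0.485 ns (10% of the period) for setup, 0.27 ns for hold
set_clock_uncertainty 0.485 -setup [get_clocks {clock}]
set_clock_uncertainty 0.27 -hold [get_clocks {clock}]
# Network latency: insertion delay from the clock port to the flop clock pins
set_clock_latency 0.45 [get_clocks {clock}]
# Source latency: delay from the clock origin to the clock port
set_clock_latency -source 0.45 [get_clocks {clock}]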
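A quick way to see what these constraints mean numerically is to derive the clock frequency, duty cycle and setup budget from them. This sketch parses only the simple single-line forms used in this file with regular expressions; real SDC is Tcl and should be read by the timing tool itself, so treat `clock_budget` as a hypothetical helper.

```python
import re

SDC = """create_clock -period 4.85 -waveform {0 2.425} [get_ports {clock}]
set_clock_uncertainty 0.485 -setup [get_clocks {clock}]
set_clock_uncertainty 0.27 -hold [get_clocks {clock}]"""

def clock_budget(sdc: str) -> dict:
    """Derive frequency, duty cycle and setup budget from simple SDC lines."""
    period = float(re.search(r"create_clock\s+-period\s+([\d.]+)", sdc).group(1))
    rise, fall = map(float, re.search(r"-waveform\s+\{([\d.]+)\s+([\d.]+)\}", sdc).groups())
    setup_unc = float(re.search(r"set_clock_uncertainty\s+([\d.]+)\s+-setup", sdc).group(1))
    return {
        "freq_mhz": 1e3 / period,                  # period in ns -> MHz
        "duty": (fall - rise) / period,            # fraction of the period spent high
        "setup_budget_ns": period - setup_unc,     # time left for the data path
    }

if __name__ == "__main__":
    for key, value in clock_budget(SDC).items():
        print(f"{key}: {value:.3f}")
```

For SAMM this gives roughly 206 MHz, a 50% duty cycle and about 4.37 ns of the cycle available to the data path after setup uncertainty.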

Digital design Interview Questions

If inverted output of D flip-flop is connected to its input how the flip-flop behaves?
Design a circuit to divide input frequency by 2?
Design a divide by two counter using D-Latch.
Design a divide-by-3 sequential circuit with 50% duty cycle.
What are the different types of adder implementation?
Draw a Transmission Gate-based D-Latch?
Give the truth table for a Half Adder. Give a gate level implementation of the same.
Design an OR gate from 2:1 MUX.
What is the difference between a LATCH and a FLIP-FLOP?
Design a D Flip-Flop from two latches.
Design a 2 bit counter using D Flip-Flop.
What are the two types of delays in any digital system?
Design a Transparent Latch using a 2:1 Mux.
Design a 4:1 Mux using 2:1 Muxes.
What is metastable state? How does it occur?
What is metastability?
Design a 3:8 decoder
Design a FSM to detect sequence "101" in input sequence
Convert NAND gate into Inverter in two different ways.
Design a D and T flip flop using 2:1 mux only.
Design D Latch from SR flip-flop.
Define Clock Skew, Negative Clock Skew, Positive Clock Skew?
What is race condition? How it occurs? How to avoid it?
Design a 4 bit Gray Counter?
Design 4-bit synchronous counter, asynchronous counter?
Design a 16 byte asynchronous FIFO?
What is the difference between an EEPROM and FLASH?
What is the difference between a NAND-based Flash and NOR-based Flash?
Which one is good: asynchronous reset or synchronous reset? Why?
Design a simple circuit based on combinational logic to double the output frequency.
What is the difference between flip-flop and latch?
Implement a comparator using combinational logic that compares two 2-bit numbers A and B. The comparator should have 3 outputs: A > B, A < B and A = B.
Give two ways of converting a two input NAND gate to an inverter?
What is the difference between mealy and moore state-machines?
What is the difference between latch based design and flip-flop based design?
What is metastability and how to prevent it?
Design a four-input NAND gate using only two-input NAND gates.
Why are most interrupts active low?
How do you detect if two 8-bit signals are same?
7 bit ring counter's initial state is 0100010. After how many clock cycles will it return to the initial state?
Design all the basic gates NOT, AND, OR, NAND, NOR, XOR, XNOR using 2:1 Multiplexer.
How will you implement a full subtractor from a full adder?
In a 3-bit Johnson's counter what are the unused states?
What is difference between RAM and FIFO?
What is an LFSR? List a few of its industry applications.
Implement the following circuits:
(a) 3 input NAND gate using minimum number of 2 input NAND gates
(b) 3 input NOR gate using minimum number of 2 input NOR gates
(c) 3 input XNOR gate using minimum number of 2 input XNOR gates assuming 3 inputs A,B,C?
Design a D-latch using (a) using 2:1 Mux (b) from S-R Latch?
How to implement a Master Slave flip flop using a 2 to 1 mux?
How many 2-input XORs are needed to implement a 16-input parity generator?
Convert xor gate to buffer and inverter.
Difference between onehot and binary encoding?
What are different ways to synchronize between two clock domains?
How to calculate maximum operating frequency?
How to find out longest path?
How to achieve 180 degree exact phase shift?
What is significance of ras and cas in SDRAM?
Tell some of applications of buffer?
Implement an AND gate using mux?
What will happen if the contents of a register are shifted left or right?
What is the basic difference between analog and digital design?
What advantages do synchronous counters have over asynchronous counters?
What types of flip-flops can be used to implement the memory elements of a counter?
What are the advantages of using a microprocessor to implement a counter rather than the conventional method (flip-flop and logic gates)?
What is the principal advantage of Gray Code over straight (conventional) binary?
What does Pipelining do?
Design divide by 2, divide by 3 circuit with equal duty cycle.
How many 4:1 mux do you need to design a 8:1 mux?
What is D-Word, Q-word?
Define Moore, Mealy state machines. Which one is good for timing?
Design a FSM to detect 10110. What is the minimum number of flops required?
Design a 2-bit up/down counter with clear using gates. (No Verilog or VHDL)
Design a finite state machine to give a modulo 3 counter when x=0 and modulo 4 counter when x=1.
Minimize: S= A' + AB
What is the function of a D-flipflop, whose inverted outputs are connected to its input?
How to synchronize control signals and data between two different clock domains?
Describe a finite state machine that will detect three consecutive coin tosses (of one coin) that results in heads.
In what cases do you need to double clock a signal before presenting it to a synchronous state machine?
How many bit combinations are there in a byte?
What are the different Adder circuits you studied?
Convert 65(Hex) to Binary
Convert a number to its two's complement and back.
What is the 1's and 2's complement of the decimal number 25.
If A?B=C and C?A=B then what is the boolean operator ?

Power Planning

There are two types of power planning and management: core cell power management and I/O cell power management. In the former, VDD and VSS power rings are formed around the core and the macros, and straps and trunks are created for the macros according to their power requirements. In the latter, power rings are formed for the I/O cells and trunks are constructed between the core power ring and the power pads. A top-down approach is used for power analysis of a flattened design, while a bottom-up approach is suitable for macros.

Power information can be obtained from the front-end design. The synthesis tool reports static power. Dynamic power can be calculated from a Value Change Dump (VCD) or Switching Activity Interchange Format (SAIF) file obtained by simulating the RTL description with a test bench. Exhaustive test coverage is required for an accurate calculation of peak power. This methodology is depicted in Figure (1).
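A minimal sketch of how switching activity (such as the toggle counts a SAIF file records per net) turns into dynamic power: a full 0→1→0 cycle draws C·VDD² from the supply, so each toggle costs on average ½·C·VDD², and power is total energy over the simulated time. The net capacitances, toggle counts and 1 µs window are hypothetical, the 1.08 V supply is the SAMM core voltage used below, and internal (short-circuit) and leakage power are not included.

```python
def dynamic_power_w(nets, vdd, duration_s):
    """Switching power from toggle counts: P = sum(0.5 * C * VDD^2 * TC) / T.

    nets: iterable of (capacitance_farads, toggle_count) pairs.
    """
    energy_j = sum(0.5 * c * vdd ** 2 * tc for c, tc in nets)
    return energy_j / duration_s

if __name__ == "__main__":
    nets = [(10e-15, 1000), (20e-15, 500)]  # (C, toggles) for two hypothetical nets
    p = dynamic_power_w(nets, vdd=1.08, duration_s=1e-6)
    print(f"{p * 1e6:.3f} uW")
```

Because the result depends directly on toggle counts, a test bench that exercises only part of the design underestimates the peak, which is why exhaustive coverage matters.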

For a hierarchical design, power budgeting has to be carried out in the front end, and power is calculated for each block of the design. Astro works on a flattened netlist, so the top-down approach is used there; JupiterXT can work on hierarchical designs, so the bottom-up approach to power analysis can be used with JupiterXT. IR drop is not analyzed at the floorplanning stage. In the placement stage, the standard-cell rails get connected to the power rings, straps and trunks. IR drop now comes into the picture, and an improperly designed power network can lead to large IR drops, so that the core may not get sufficient supply voltage.

Figure (1) Power Planning methodology

The calculations below are for the flattened design of SAMM. Only the static power reported by the synthesis tool (Design Compiler) is used, in place of the dynamic power.

The number of core power pads required on each side of the chip:

= total core power / [number of sides * core voltage * maximum allowable current per I/O pad]

= 236.2068 mW / [4 * 1.08 V * 24 mA] (for the SAMM design)

= 2.278

~ 2

Therefore, 2 sets of power pads (2 VDD and 2 VSS pads) are added on each side of the chip.
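The pad-count formula can be checked with a few lines of Python. This is a minimal sketch; the function name is hypothetical and the numbers are the SAMM values quoted above. The text rounds 2.278 to the nearest integer (2); taking the ceiling (3) is the more conservative choice, because each pad then carries no more than its 24 mA rating.

```python
import math

def power_pads_per_side(core_power_mw, sides, core_voltage_v, pad_max_current_ma):
    """Pads per side = P_core / (sides * V_core * I_pad_max); mW / (V * mA) is dimensionless."""
    return core_power_mw / (sides * core_voltage_v * pad_max_current_ma)

def current_per_pad_ma(core_power_mw, sides, core_voltage_v, pads_per_side):
    """Current each VDD pad carries if the core current is shared equally."""
    return core_power_mw / core_voltage_v / (sides * pads_per_side)

exact = power_pads_per_side(236.2068, sides=4, core_voltage_v=1.08, pad_max_current_ma=24.0)
print(f"exact = {exact:.3f}")            # exact = 2.278
print("nearest integer:", round(exact))  # nearest integer: 2
print("ceiling:", math.ceil(exact))      # ceiling: 3

for n in (round(exact), math.ceil(exact)):
    print(n, "pads/side ->", round(current_per_pad_ma(236.2068, 4, 1.08, n), 2), "mA per pad")
```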

Total dynamic core current (mA)

= total dynamic core power / core voltage

= 236.2068 mW / 1.08 V

= 218.71 mA

Core PG ring width (µm)

= total dynamic core current / (number of sides * J_max), where J_max is the maximum current density of the metal layer used for the PG ring

= 218.71 mA / (4 * 49.5 mA/µm)

= 1.1046 µm (~1.1 µm), rounded up to 2 µm

Pad-to-core trunk width (µm)

= total dynamic core current / (number of sides * J_max), where J_max is the maximum current density of the metal layer used

= 218.71 mA / [4 * 49.5 mA/µm]

= 1.104596 µm

Hence the pad-to-core trunk width is kept at 2 µm.
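The core current, ring width and trunk width above can be reproduced in a short Python sketch. The function names are hypothetical, and the rule of rounding up to the next whole micrometre is an assumption chosen because it reproduces the 2 µm used in the text; an actual flow would round to whatever width grid the technology requires.

```python
import math

def core_current_ma(core_power_mw, core_voltage_v):
    # I = P / V; mW / V gives mA
    return core_power_mw / core_voltage_v

def min_metal_width_um(current_ma, sides, jmax_ma_per_um):
    # Current is shared equally by the sides; each side's metal must stay under J_max
    return current_ma / (sides * jmax_ma_per_um)

def round_up(width_um, step_um=1.0):
    # Hypothetical rounding rule: next whole multiple of step_um
    return math.ceil(width_um / step_um) * step_um

i_core = core_current_ma(236.2068, 1.08)
w_min = min_metal_width_um(i_core, sides=4, jmax_ma_per_um=49.5)
w_used = round_up(w_min)
print(f"I_core = {i_core:.2f} mA, W_min = {w_min:.4f} um, W_used = {w_used:.0f} um")
# I_core = 218.71 mA, W_min = 1.1046 um, W_used = 2 um
```

The same function serves for both the PG ring and the pad-to-core trunk, since the text sizes both with the same current and J_max.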

The following equations are used to calculate the vertical and horizontal strap widths and the required number of straps for each macro.

Block current:

I_block= P_block / V_ddcore

Current supply from each side of the block:

I_top=I_bottom= { I_block *[W_block / (W_block +H_block)] }/2

I_left=I_right= { I_block *[H_block / (W_block +H_block)] }/2

Power strap width based on EM:

W_{strap_vertical} =I_top / J_metal

W_{strap_horizontal} =I_left / J_metal

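Power strap width based on IR (allowing a drop of 10% of VDD along the strap, where R_oe is the sheet resistance of the strap metal in Ω/square):

W_{strap_vertical} >= (I_top * R_oe * H_block) / (0.1 * VDD)

W_{strap_horizontal} >= (I_left * R_oe * W_block) / (0.1 * VDD)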

Refresh width:

W_{refresh_vertical} =3 * routing pitch +minimum width of metal (M4)

W_{refresh_horizontal} =3 * routing pitch +minimum width of metal (M3)

Refresh number (max() takes the larger of the EM-based and IR-based strap widths):

N_{refresh_vertical} = max(W_{strap_vertical}) / W_{refresh_vertical}

N_{refresh_horizontal} = max(W_{strap_horizontal}) / W_{refresh_horizontal}

Refresh spacing:

S_{refresh_vertical} = W_block / N_{refresh_vertical}

S_{refresh_horizontal} = H_block / N_{refresh_horizontal}

Figure (2) Showing core power ring, Straps and Trunks
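The strap equations can be chained into one sizing routine. The Python sketch below follows the equations term by term, with two assumptions of its own: the refresh number is rounded up to a whole number of straps (at least one), and the IR budget is expressed in mV so that mA × Ω gives mV. All macro and technology numbers in the example (block power, size, J_metal, sheet resistance, routing pitch, minimum width) are hypothetical, and the same pitch is assumed for M3 and M4.

```python
import math
from dataclasses import dataclass

@dataclass
class StrapPlan:
    w_vertical_um: float    # required vertical strap width (max of EM and IR)
    w_horizontal_um: float  # required horizontal strap width (max of EM and IR)
    n_vertical: int         # number of vertical straps
    n_horizontal: int       # number of horizontal straps
    s_vertical_um: float    # spacing between vertical straps
    s_horizontal_um: float  # spacing between horizontal straps

def refresh_width_um(routing_pitch_um, min_metal_width_um):
    # W_refresh = 3 * routing pitch + minimum metal width
    return 3 * routing_pitch_um + min_metal_width_um

def plan_straps(p_block_mw, vdd_v, w_block_um, h_block_um, j_metal_ma_per_um,
                r_sheet_ohm_sq, w_refresh_v_um, w_refresh_h_um, ir_fraction=0.1):
    i_block = p_block_mw / vdd_v                # mA
    half_perimeter = w_block_um + h_block_um
    i_top = i_block * (w_block_um / half_perimeter) / 2   # = I_bottom
    i_left = i_block * (h_block_um / half_perimeter) / 2  # = I_right

    # Width based on EM
    w_v_em = i_top / j_metal_ma_per_um
    w_h_em = i_left / j_metal_ma_per_um

    # Width based on IR: W >= I * R_sheet * L / (ir_fraction * VDD), budget in mV
    dv_budget_mv = ir_fraction * vdd_v * 1000.0
    w_v_ir = i_top * r_sheet_ohm_sq * h_block_um / dv_budget_mv
    w_h_ir = i_left * r_sheet_ohm_sq * w_block_um / dv_budget_mv

    w_v = max(w_v_em, w_v_ir)
    w_h = max(w_h_em, w_h_ir)

    # Assumption: round the strap count up to a whole number, at least one
    n_v = max(1, math.ceil(w_v / w_refresh_v_um))
    n_h = max(1, math.ceil(w_h / w_refresh_h_um))
    return StrapPlan(w_v, w_h, n_v, n_h, w_block_um / n_v, h_block_um / n_h)

# Hypothetical macro and technology numbers, for illustration only
w_ref = refresh_width_um(routing_pitch_um=0.56, min_metal_width_um=0.28)
plan = plan_straps(p_block_mw=20.0, vdd_v=1.08, w_block_um=400.0, h_block_um=200.0,
                   j_metal_ma_per_um=1.0, r_sheet_ohm_sq=0.07,
                   w_refresh_v_um=w_ref, w_refresh_h_um=w_ref)
print(plan)
```

With these numbers the EM limit dominates, giving 4 vertical and 2 horizontal straps, each set spaced 100 µm apart.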


Floor Planning

The floor plan determines the size of the design cell (or die), creates the boundary and core area, and creates wire tracks for placement of standard cells [1]. Floor planning is also the process of positioning blocks or macros on the die.

Floor-planning control parameters such as aspect ratio and core utilization are defined as follows:

Aspect Ratio= Horizontal Routing Resources / Vertical Routing Resources

Core Utilization= Standard Cell Area / (Row Area + Channel Area)

A total of 4 metal layers are available for routing in the version of Astro used. M0 and M3 are horizontal layers, and M2 and M4 are vertical layers; hence the aspect ratio for SAMM is 1. The total number of cells is 1645, the total number of nets is 1837, and the number of ports (excluding the 16 power pads) is 60. The figure depicting the floor-plan die size (µm) of SAMM is shown beside.
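The two control parameters can be written directly as functions. This Python sketch assumes, as the SAMM reasoning does, that every routing layer contributes an equal share of routing resources, so the aspect ratio reduces to a ratio of layer counts; in practice differing pitches per layer would change this. The areas in the utilization example are hypothetical.

```python
def aspect_ratio(horizontal_layers, vertical_layers):
    """Horizontal / vertical routing resources.

    Assumption: each layer offers the same routing resource,
    so resources are proportional to the number of layers.
    """
    return len(horizontal_layers) / len(vertical_layers)

def core_utilization(std_cell_area, row_area, channel_area):
    """Standard cell area / (row area + channel area)."""
    return std_cell_area / (row_area + channel_area)

# Layer directions quoted for SAMM in the text
print(aspect_ratio(["M0", "M3"], ["M2", "M4"]))    # 1.0

# Hypothetical areas in um^2, channel-less rows (channel area = 0)
print(core_utilization(65_000.0, 100_000.0, 0.0))  # 0.65
```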

Top Design Format (TDF) files provide Astro with special instructions for planning, placing, and routing the design. TDF files generally include pin and port information, and Astro uses the I/O definitions from the TDF file in the starting phase of the design flow [1]. Corner cells are simply dummy cells that contain ground and power layers. The TDF file used for SAMM is given below. The SAMM IC has a total of 80 I/O pads, of which 4 are dummy pads. Each side of the chip has 20 pads, including 2 sets of power pads; the number of power pads required for SAMM is calculated in the Power Planning section. The design is pad limited (the pad area is larger than the cell area), and inline bonding (the same I/O pad height) is used.

define _cell (geGetEditCell)

;create power pads

;Core power supply instantiation for left side

dbCreateCellInst (geGetEditCell) "" "pv0i.FRAM" "vss1left" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pvdi.FRAM" "vdd1left" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pv0i.FRAM" "vss2left" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pvdi.FRAM" "vdd2left" "0" "NO" '(0.0 0.0) "sam3"

;Core power supply instantiation for top side

dbCreateCellInst (geGetEditCell) "" "pv0i.FRAM" "vss1top" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pvdi.FRAM" "vdd1top" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pv0i.FRAM" "vss2top" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pvdi.FRAM" "vdd2top" "0" "NO" '(0.0 0.0) "sam3"

;Core power supply instantiation for right side

dbCreateCellInst (geGetEditCell) "" "pv0i.FRAM" "vss1right" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pvdi.FRAM" "vdd1right" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pv0i.FRAM" "vss2right" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pvdi.FRAM" "vdd2right" "0" "NO" '(0.0 0.0) "sam3"

;Core power supply instantiation for bottom side

dbCreateCellInst (geGetEditCell) "" "pv0i.FRAM" "vss1bottom" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pvdi.FRAM" "vdd1bottom" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pv0i.FRAM" "vss2bottom" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pvdi.FRAM" "vdd2bottom" "0" "NO" '(0.0 0.0) "sam3"

;dummy cell instantiation

dbCreateCellInst (geGetEditCell) "" "pc3t02.FRAM" "dummy1" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pc3t02.FRAM" "dummy2" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pc3t02.FRAM" "dummy3" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pc3t02.FRAM" "dummy4" "0" "NO" '(0.0 0.0) "sam3"

dbCreateCellInst (geGetEditCell) "" "pc3t02.FRAM" "dummy5" "0" "NO" '(0.0 0.0) "sam3"

;corner cell instantiation

dbCreateCellInst (geGetEditCell) "" "pfrelr.FRAM" "cornerll" "270" "NO" '(10 10) "sam3"

dbCreateCellInst (geGetEditCell) "" "pfrelr.FRAM" "cornerlr" "0" "NO" '(10 10) "sam3"

dbCreateCellInst (geGetEditCell) "" "pfrelr.FRAM" "cornerul" "180" "NO" '(10 10) "sam3"

dbCreateCellInst (geGetEditCell) "" "pfrelr.FRAM" "cornerur" "90" "NO" '(10 10) "sam3"

tdfPurgePadConstr

;==================================

;pad placement for corner cells

;==================================

pad "cornerll" "bottom"

pad "cornerur" "top"

pad "cornerlr" "right"

pad "cornerul" "left"

;==================================

;pad(I/O) placement for left side

;==================================

pad "U1065" "left" 1 ;a_row0[0]

pad "U1064" "left" 2 ;a_row0[1]

pad "U1063" "left" 3 ;a_row0[2]

pad "U1062" "left" 4 ;a_row0[3]

pad "U1069" "left" 5 ;a_row1[0]

pad "U1068" "left" 6 ;a_row1[1]

pad "U1067" "left" 7 ;a_row1[2]

pad "vdd1left" "left" 8

pad "vss1left" "left" 9

pad "vdd2left" "left" 10

pad "vss2left" "left" 11

pad "U1066" "left" 12 ;a_row1[3]

pad "U1073" "left" 13 ;a_row2[0]

pad "U1072" "left" 14 ;a_row2[1]

pad "U1071" "left" 15 ;a_row2[2]

pad "U1070" "left" 16 ;a_row2[3]

pad "U1118" "left" 17 ;clock

pad "U1116" "left" 18 ;chip enable

pad "dummy1" "left" 19

pad "dummy2" "left" 20

;==================================

;pad(I/O) placement for top side

;==================================

pad "U1077" "top" 1 ;b_col0[0]

pad "U1076" "top" 2 ;b_col0[1]

pad "U1075" "top" 3 ;b_col0[2]

pad "U1074" "top" 4 ;b_col0[3]

pad "U1081" "top" 5 ;b_col1[0]

pad "U1080" "top" 6 ;b_col1[1]

pad "U1079" "top" 7 ;b_col1[2]

pad "vdd1top" "top" 8

pad "vss1top" "top" 9

pad "vdd2top" "top" 10

pad "vss2top" "top" 11

pad "U1078" "top" 12 ;b_col1[3]

pad "U1085" "top" 13 ;b_col2[0]

pad "U1084" "top" 14 ;b_col2[1]

pad "U1083" "top" 15 ;b_col2[2]

pad "U1082" "top" 16 ;b_col2[3]

pad "U1117" "top" 17 ;reset

pad "U1119" "top" 18 ;mult_over

pad "dummy3" "top" 19

pad "dummy4" "top" 20

;==================================

;pad(I/O) placement for right side

;==================================

pad "U1100" "right" 1 ;c_row1[5]

pad "U1101" "right" 2 ;c_row1[4]

pad "U1102" "right" 3 ;c_row1[3]

pad "U1103" "right" 4 ;c_row1[2]

pad "U1104" "right" 5 ;c_row1[1]

pad "U1105" "right" 6 ;c_row1[0]

pad "U1086" "right" 7 ;c_row0[9]

pad "vdd1right" "right" 8

pad "vss1right" "right" 9

pad "vdd2right" "right" 10

pad "vss2right" "right" 11

pad "U1087" "right" 12 ;c_row0[8]

pad "U1088" "right" 13 ;c_row0[7]

pad "U1089" "right" 14 ;c_row0[6]

pad "U1090" "right" 15 ;c_row0[5]

pad "U1091" "right" 16 ;c_row0[4]

pad "U1092" "right" 17 ;c_row0[3]

pad "U1093" "right" 18 ;c_row0[2]

pad "U1094" "right" 19 ;c_row0[1]

pad "U1095" "right" 20 ;c_row0[0]

;==================================

;pad(I/O) placement for bottom side

;==================================

pad "dummy5" "bottom" 1

pad "U1121" "bottom" 2 ;test_se

pad "U1106" "bottom" 3 ;c_row2[9]

pad "U1107" "bottom" 4 ;c_row2[8]

pad "U1108" "bottom" 5 ;c_row2[7]

pad "U1109" "bottom" 6 ;c_row2[6]

pad "U1110" "bottom" 7 ;c_row2[5]

pad "vdd1bottom" "bottom" 8

pad "vss1bottom" "bottom" 9

pad "vdd2bottom" "bottom" 10

pad "vss2bottom" "bottom" 11

pad "U1111" "bottom" 12 ;c_row2[4]

pad "U1112" "bottom" 13 ;c_row2[3]

pad "U1113" "bottom" 14 ;c_row2[2]

pad "U1114" "bottom" 15 ;c_row2[1]

pad "U1115" "bottom" 16 ;c_row2[0]

pad "U1096" "bottom" 17 ;c_row1[9]

pad "U1097" "bottom" 18 ;c_row1[8]

pad "U1098" "bottom" 19 ;c_row1[7]

pad "U1099" "bottom" 20 ;c_row1[6]

;=======================================

If the TDF file is free of syntax errors and the pads on each side are numbered consecutively, the TDF is read successfully and a message is displayed in the Scheme window.
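The consecutive-numbering condition can be checked before the file is loaded into Astro. The Python sketch below is a hypothetical helper, not an Astro utility: it only understands the `pad "name" "side" number` form used in the TDF above, ignores unnumbered corner-pad lines, strips `;` comments, and does not check Scheme syntax or whether the pad instances exist.

```python
import re
from collections import defaultdict

# Matches numbered pad lines such as: pad "U1065" "left" 1
PAD_RE = re.compile(r'^\s*pad\s+"[^"]+"\s+"(left|right|top|bottom)"\s+(\d+)\s*$')

def check_pad_numbering(tdf_text):
    """Return a list of problems; an empty list means each side is numbered 1..n."""
    numbers_by_side = defaultdict(list)
    for line in tdf_text.splitlines():
        code = line.split(";", 1)[0]  # drop Scheme comments (everything after ';')
        match = PAD_RE.match(code)
        if match:  # corner pads without a number do not match
            numbers_by_side[match.group(1)].append(int(match.group(2)))
    problems = []
    for side in sorted(numbers_by_side):
        nums = sorted(numbers_by_side[side])
        if nums != list(range(1, len(nums) + 1)):
            problems.append(f"{side}: got {nums}")
    return problems

sample = '''
pad "cornerll" "bottom"
pad "U1065" "left" 1 ;a_row0[0]
pad "U1064" "left" 2 ;a_row0[1]
pad "U1063" "left" 4 ;a_row0[2]
pad "U1077" "top" 1 ;b_col0[0]
'''
print(check_pad_numbering(sample))  # ['left: got [1, 2, 4]']
```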

A core utilization of 0.65 is set, which means that 65% of the core area is used for standard cells and the remaining 35% is left for routing. Since a channel-less row arrangement is desired for area optimization, the row-to-core ratio can be kept at 1. Rows are arranged horizontally; alternate rows are flipped and abutted, so the double-back arrangement should be enabled.

Floor planned cell

The floor-planned cell is shown above, and its die size is given in the figure shown earlier. All dimensions are in µm. The total die size is approximately 1.9 mm².

Reference

[1] Astro User Guide, Version X-2005.09, September 2005.
