# 13.7  Static Timing Analysis

We return to the comparator/MUX example to see how timing analysis is applied to sequential logic. We shall use the same input code (comp_mux.v in Section 13.2), but this time we shall target the design to an Actel FPGA.

Before routing we obtain the following static timing analysis:

```
Instance name   in pin-->out pin    tr  total   incr  cell
--------------------------------------------------------------------
END_OF_PATH
outp_2_                             R   27.26
OUT1          : D--->PAD            R   27.26   7.55  OUTBUF
I_1_CM8       : S11--->Y            R   19.71   4.40  CM8
I_2_CM8       : S11--->Y            R   15.31   5.20  CM8
I_3_CM8       : S11--->Y            R   10.11   4.80  CM8
IN1           : PAD--->Y            R    5.32   5.32  INBUF
a_2_                                R    0.00   0.00
BEGIN_OF_PATH
```
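The report lists the path from END_OF_PATH at the top to BEGIN_OF_PATH at the bottom, and each total is the running sum of the incr column. The short Python sketch below, an illustration rather than any feature of the Actel tools, re-reads the rows bottom-up and checks that the increments add up to the reported totals. The 0.02 ns tolerance is an assumption chosen to absorb the rounding of every printed value to 0.01 ns.

```python
# Rows of the prelayout report, reordered from BEGIN_OF_PATH to END_OF_PATH:
# (instance, pin arc, reported total in ns, incr in ns, cell).
ROWS = [
    ("IN1", "PAD--->Y", 5.32, 5.32, "INBUF"),
    ("I_3_CM8", "S11--->Y", 10.11, 4.80, "CM8"),
    ("I_2_CM8", "S11--->Y", 15.31, 5.20, "CM8"),
    ("I_1_CM8", "S11--->Y", 19.71, 4.40, "CM8"),
    ("OUT1", "D--->PAD", 27.26, 7.55, "OUTBUF"),
]


def check_running_totals(rows, tol_ns=0.02):
    """Accumulate the incr column and compare it with each reported total.

    tol_ns allows for the report rounding every value to 0.01 ns.
    Returns the accumulated path delay in ns.
    """
    total = 0.0
    for inst, arc, reported, incr, _cell in rows:
        total += incr
        if abs(total - reported) > tol_ns:
            raise ValueError(f"{inst} {arc}: sum {total:.2f} ns, reported {reported:.2f} ns")
    return total


print(f"critical path: {check_running_totals(ROWS):.2f} ns")
```

Summing the rounded increments gives 27.27 ns, which differs from the reported 27.26 ns only by rounding.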

The estimated prelayout critical path delay is 27.26 ns (nearly 30 ns) including the I/O-cell delays (ACT 3, worst-case, standard speed grade). This limits the operating frequency to roughly 33 MHz, assuming we can get the signals to and from the chip pins with no further delays (highly unlikely). The operating frequency can be increased by pipelining the design, adding three register stages (at the inputs, at the outputs, and between the comparison and the select functions), as follows:

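```verilog
// comp_mux_rrr.v
module comp_mux_rrr(a, b, clock, outp);
  input [2:0] a, b; output [2:0] outp; input clock;
  reg [2:0] a_r, a_rr, b_r, b_rr, outp; reg sel_r;
  // Comparison on the first-stage registers ('<=' here means less than or equal).
  wire sel = ( a_r <= b_r ) ? 0 : 1;
  // Stage 1: register the inputs.
  always @ ( posedge clock) begin a_r <= a; b_r <= b; end
  // Stage 2: delay the data one clock so it lines up with sel_r.
  always @ ( posedge clock) begin a_rr <= a_r; b_rr <= b_r; end
  // Stage 3: register the selected output.
  always @ ( posedge clock) outp <= sel_r ? b_rr : a_rr;
  // Register the comparison result between the compare and select functions.
  always @ ( posedge clock) sel_r <= sel;
endmodule
```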
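To see where the added latency comes from, here is a cycle-level Python model of comp_mux_rrr. It is a hand-written sketch, not output from any tool: every register takes its new value from the values held before the clock edge, as the nonblocking assignments do, and the registers start at 0 (an assumption; the Verilog model has no reset, so a simulator would start them unknown). Following the code, the design passes the smaller of a and b to outp. The class and method names are hypothetical.

```python
class CompMuxRRR:
    """Cycle-level model of comp_mux_rrr (illustrative, hand-written)."""

    def __init__(self):
        # ASSUMPTION: registers start at 0; the Verilog has no reset.
        self.a_r = self.b_r = self.a_rr = self.b_rr = 0
        self.sel_r = 0
        self.outp = 0

    def posedge(self, a, b):
        """Apply one rising clock edge with inputs a, b; return outp."""
        sel = 0 if self.a_r <= self.b_r else 1  # wire sel
        # Compute every next value from the current values first
        # (nonblocking semantics), then update all registers together.
        nxt = {
            "a_r": a & 0b111, "b_r": b & 0b111,              # stage 1
            "a_rr": self.a_r, "b_rr": self.b_r,              # stage 2
            "sel_r": sel,                                    # registered select
            "outp": self.b_rr if self.sel_r else self.a_rr,  # stage 3
        }
        for name, value in nxt.items():
            setattr(self, name, value)
        return self.outp


dut = CompMuxRRR()
stimulus = [(5, 3), (2, 6), (7, 7), (1, 0)] + [(0, 0)] * 3
outputs = [dut.posedge(a, b) for a, b in stimulus]
print(outputs)  # [0, 0, 3, 2, 7, 0, 0]
```

The pair sampled at clock edge n appears on outp just after edge n + 2, so three clock edges (one per register stage) separate an input pair from its result; the two leading zeros are simply the assumed initial register contents.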

Following synthesis we optimize module comp_mux_rrr for maximum speed. Static timing analysis gives the following preroute critical paths:

```
Rise delay, Worst case

Instance name   in pin-->out pin    tr  total   incr  cell
--------------------------------------------------------------------
END_OF_PATH
D.a_r_ff_b2                         R    4.52   0.00  DF1
INBUF_24      : PAD--->Y            R    4.52   4.52  INBUF
a_2_                                R    0.00   0.00
BEGIN_OF_PATH

---------------------CLOCK to SETUP longest path---------------------
Rise delay, Worst case

Instance name   in pin-->out pin    tr  total   incr  cell
--------------------------------------------------------------------
END_OF_PATH
D.sel_r_ff                          R    9.99   0.00  DF1
I_1_CM8       : S10--->Y            R    9.99   0.00  CM8
I_3_CM8       : S00--->Y            R    9.99   4.40  CM8
a_r_ff_b1     : CLK--->Q            R    5.60   5.60  DF1
BEGIN_OF_PATH

Rise delay, Worst case

Instance name   in pin-->out pin    tr  total   incr  cell
--------------------------------------------------------------------
END_OF_PATH
outp_2_                             R   11.95
OUTBUF_31     : D--->PAD            R   11.95   7.55  OUTBUF
outp_ff_b2    : CLK--->Q            R    4.40   4.40  DF1
BEGIN_OF_PATH
```

The timing analyzer has examined the following:

1. Paths that start at an input pad and end on the data input of a sequential logic cell (the D input to a D flip-flop, for example). We might call this an entry path (or input-to-D path) to a pipelined design. The longest entry delay (or input-to-setup delay) is 4.52 ns.
2. Paths that start at a clock input to a sequential logic cell and end at the data input of a sequential logic cell. This is a stage path (register-to-register path or clock-to-D path) in a pipeline stage. The longest stage delay (clock-to-D delay) is 9.99 ns.
3. Paths that start at a sequential logic cell output and end at an output pad. This is an exit path (clock-to-output path) from the pipeline. The longest exit delay (clock-to-output delay) is 11.95 ns.
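As a rough check on these figures, the sketch below treats the worst entry, stage, and exit delays as candidate clock periods and compares the resulting frequency bound with that of the unpipelined 27.26 ns path. Letting an entry or exit delay consume a whole clock period assumes, as the text does, that signals reach and leave the chip pins with no further external delay. The function name `fmax_mhz` is hypothetical.

```python
def fmax_mhz(delay_ns):
    """Upper bound on clock frequency (MHz) if delay_ns must fit in one period."""
    return 1e3 / delay_ns


# Preroute worst-case delays quoted in this section (ns).
unpipelined = 27.26  # comp_mux: input pad to output pad
pipelined = {"entry": 4.52, "stage": 9.99, "exit": 11.95}  # comp_mux_rrr

# The slowest path class sets the minimum clock period.
limiting = max(pipelined, key=pipelined.get)
print(f"comp_mux:     {fmax_mhz(unpipelined):.1f} MHz")
print(f"comp_mux_rrr: {fmax_mhz(pipelined[limiting]):.1f} MHz ({limiting} path)")
```

The exit path limits the pipelined design to about 84 MHz, compared with about 37 MHz for the unpipelined 27.26 ns path.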

By pipelining the design we added three clock periods of latency, but we increased the estimated operating speed. The longest prelayout critical path is now an exit delay of approximately 12 ns, which more than doubles the maximum operating frequency.

Next, we route the registered version of the design. The Actel software informs us that the postroute maximum stage delay is 11.3 ns (close to the preroute estimate of 9.99 ns). To check this figure we can perform another timing analysis, this time measuring the stage delays: the start points are all clock pins, and the end points are all inputs to sequential cells (in our case the D input to a D flip-flop). We need to define the sets of nodes at which to start and end the timing analysis (similar to the path clusters we used to specify timing constraints in logic synthesis). In the Actel timing analyzer we can use the predefined sets 'clock' (flip-flop clock pins) and 'gated' (flip-flop inputs) as follows:

```
timer> startset clock
timer> endset gated
timer> longest
1st longest path to all endpins
Rank  Total  Start pin       First Net   End Net        End pin
   0   11.3  a_r_ff_b2:CLK   a_r_2_      block_0_OUT1   sel_r_ff:D
   1    6.6  sel_r_ff:CLK    sel_r       DEF_NET_50     outp_ff_b0:D
... 8 similar lines omitted ...
```

We could try to reduce the long stage delay (11.3 ns), but we have already seen from the preroute timing estimates that an exit delay may be the critical path. Next, we check some other important timing parameters.
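A static timing analyzer typically answers queries such as `longest` and `shortest` by propagating arrival times through a directed acyclic graph of pins in topological order, taking the maximum over the fan-in edges for the longest path and the minimum for the shortest. The Python sketch below illustrates that general technique; it is not the Actel timer's implementation. The first four edge delays are the increments from the preroute a_r_ff_b1 to sel_r_ff path listed earlier. The 1.00 ns side-input edge is hypothetical, added only so that the longest and shortest arrivals differ. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def arrival_times(edges, start_pins):
    """Propagate latest and earliest arrival times through a timing graph.

    edges maps (from_pin, to_pin) -> delay in ns and must form a DAG.
    Pins in start_pins launch at t = 0. The latest arrival at an end pin
    is its longest-path delay; the earliest is its shortest-path delay.
    """
    fanin = {}
    for (src, dst), delay in edges.items():
        fanin.setdefault(dst, []).append((src, delay))
        fanin.setdefault(src, [])
    order = TopologicalSorter(
        {pin: [src for src, _ in ins] for pin, ins in fanin.items()}
    ).static_order()

    latest, earliest = {}, {}
    for pin in order:  # every fan-in pin is visited before its fan-out
        if pin in start_pins:
            latest[pin] = earliest[pin] = 0.0
            continue
        reached = [(src, d) for src, d in fanin[pin] if src in latest]
        if reached:
            latest[pin] = max(latest[src] + d for src, d in reached)
            earliest[pin] = min(earliest[src] + d for src, d in reached)
    return latest, earliest


EDGES = {
    ("a_r_ff_b1:CLK", "a_r_ff_b1:Q"): 5.60,  # DF1 clock-to-Q (report)
    ("a_r_ff_b1:Q", "I_3_CM8:Y"): 4.40,      # CM8 (report)
    ("I_3_CM8:Y", "I_1_CM8:Y"): 0.00,        # CM8 (report)
    ("I_1_CM8:Y", "sel_r_ff:D"): 0.00,       # to D pin (report)
    ("a_r_ff_b1:Q", "I_1_CM8:Y"): 1.00,      # HYPOTHETICAL side input
}
latest, earliest = arrival_times(EDGES, {"a_r_ff_b1:CLK"})
print(f"sel_r_ff:D longest {latest['sel_r_ff:D']:.2f} ns, "
      f"shortest {earliest['sel_r_ff:D']:.2f} ns")
```

The longest arrival, 10.00 ns, matches the reported 9.99 ns up to the rounding of the individual increments.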

## 13.7.1  Hold Time

Hold-time problems can occur if there is clock skew between adjacent flip-flops, for example. We first need to check for the shortest stage delays, using the same start and end sets ('clock' and 'gated') that we used to check the longest stage delays:

```
timer> shortest
1st shortest path to all endpins
Rank  Total  Start pin       First Net   End Net        End pin
   0    4.0  b_rr_ff_b1:CLK  b_rr_1_     DEF_NET_48     outp_ff_b1:D
   1    4.1  a_rr_ff_b2:CLK  a_rr_2_     DEF_NET_46     outp_ff_b2:D
... 8 similar lines omitted ...
```

The shortest path delay, 4 ns, is between the clock input of a D flip-flop with instance name b_rr_ff_b1 (call this X) and the D input of flip-flop instance name outp_ff_b1 (Y). Due to clock skew, the clock signal may not arrive at both flip-flops simultaneously. Suppose the clock arrives at flip-flop Y 3 ns later than at flip-flop X. The new data launched by X then reaches the D input of Y (4 – 3) = 1 ns after the clock edge at Y, so the D input to flip-flop Y is only stable for 1 ns after its clock edge. To check for hold-time violations we thus need to find the clock skew corresponding to each clock-to-D path. This is tedious, and timing-analysis tools normally check hold-time requirements automatically, but we shall show the steps to illustrate the process.
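The hold check just described comes down to comparing two times at the capturing flip-flop Y. The Python sketch below computes that hold slack. The function name is hypothetical, and because this section does not give the DF1 hold time, `t_hold_ns` is a parameter that would come from the cell data; the 1.5 ns used in the last two lines is an assumed value, not Actel data.

```python
def hold_slack_ns(min_clk_to_d_ns, launch_clk_ns, capture_clk_ns, t_hold_ns):
    """Hold slack (ns) at capturing flip-flop Y for a path launched by X.

    New data launched by X's clock edge reaches Y's D pin at
    launch_clk_ns + min_clk_to_d_ns. Y needs its old D value held until
    capture_clk_ns + t_hold_ns. A negative slack is a hold-time violation.
    """
    data_changes_at = launch_clk_ns + min_clk_to_d_ns
    must_hold_until = capture_clk_ns + t_hold_ns
    return data_changes_at - must_hold_until


# Example from the text: 4 ns shortest clock-to-D path, clock at Y 3 ns late.
window = hold_slack_ns(4.0, launch_clk_ns=0.0, capture_clk_ns=3.0, t_hold_ns=0.0)
print(f"D at Y stable for {window:.1f} ns after Y's clock edge")

# With an ASSUMED 1.5 ns hold time the same skew is a violation,
# while zero skew leaves 2.5 ns of slack.
print(hold_slack_ns(4.0, 0.0, 3.0, 1.5))  # negative: violation
print(hold_slack_ns(4.0, 0.0, 0.0, 1.5))  # positive: no violation
```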

## 13.7.2  Entry Delay

Before we can measure clock skew, we need to analyze the entry delays, including the clock tree. The synthesis tools automatically add I/O pads and clock cells, so the netlist contains extra nodes with automatically generated names, and the EDIF conversion tools may then modify these names. Before we can analyze the entry delays and the clock-network delay, we therefore need to find the input node names. By looking for the EDIF 'rename' construct in the EDIF netlist we can match the input and output node names in the behavioral Verilog model, comp_mux_rrr, to the EDIF names:

```
piron% grep rename comp_mux_rrr_o.edn
(port (rename a_2_ "a[2]") (direction INPUT))
... 8 similar lines renaming ports omitted ...
(net (rename a_rr_0_ "a_rr[0]") (joined
... 9 similar lines renaming nets omitted ...
piron%
```

Thus, for example, the EDIF conversion program has renamed input port a[2] to a_2_ because the design tools do not like the Verilog bus notation using square brackets. Next we find the connections between the ports and the added I/O cells by looking for 'PAD' in the Actel format netlist, which indicates a connection to a pad and the pins of the chip, as follows:

piron%

This tells us, for example, that the node we called clock in our behavioral model has been joined to a node (with automatically generated name) called CLKBUF_30:PAD , using a net (connection) named DEF_NET_145 (again automatically generated). This net is the connection between the node clock that is dangling in the behavioral model and the clock-buffer pad cell that the synthesis tools automatically added.
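The `grep rename` step above can be scripted. The Python sketch below builds a dictionary from EDIF names back to the names used in the Verilog model. It assumes only that each construct has the form `(rename edif_name "original_name")` and that the original name contains no double quotes; `rename_map` is a hypothetical helper, not part of the Actel tools.

```python
import re

# Matches the EDIF construct (rename edif_name "original_name").
RENAME = re.compile(r'\(rename\s+(\S+)\s+"([^"]*)"\)')


def rename_map(edif_text):
    """Map EDIF identifiers back to the names in the behavioral model."""
    return {edif: orig for edif, orig in RENAME.findall(edif_text)}


# The two rename lines shown in the listing above.
sample = '''
(port (rename a_2_ "a[2]") (direction INPUT))
(net (rename a_rr_0_ "a_rr[0]") (joined
'''
names = rename_map(sample)
print(names)  # {'a_2_': 'a[2]', 'a_rr_0_': 'a_rr[0]'}
```

For the whole netlist one would pass the file contents instead, for example the text read from comp_mux_rrr_o.edn.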

## 13.7.3 Exit Delay

We now know that the clock-pad input is CLKBUF_30:PAD, so we can find the exit delays (the longest paths between the clock-pad input and an output) as follows, using the clock-pad input as the start set:

```
Working startset 'clockpad' contains 0 pins.
Working startset 'clockpad' contains 2 pins.
```

I shall explain presently why this set contains two pins rather than one. Next, we define the end set and trace the longest exit paths as follows:

```
Working endset 'outpad' contains 3 pins.
timer> longest
1st longest path to all endpins
Rank  Total  Start pin       First Net   End Net        End pin
3 pins
```

This tells us we have three paths from the clock-pad input to the three output pins (outp[0], outp[1], and outp[2]). We can examine the longest exit delay in more detail as follows:

```
timer> expand 0
1st longest path to OUTBUF_33:PAD (rising) (Rank: 0)
Total  Delay  Typ  Load  Macro     Start pin         Net name
16.1   3.7    Tpd  0     OUTBUF    OUTBUF_33:D       DEF_NET_154
12.4   4.5    Tpd  1     DF1       outp_ff_b0:CLK    DEF_NET_1530
 7.9   7.9    Tpd  16    CLKEXT_0  CLKBUF_30/U0:PAD  DEF_NET_144
```

The input-to-clock delay, $t_{IC}$, due to the clock-buffer cell (or macro) CLKEXT_0, instance name CLKBUF_30/U0, is 7.9 ns. The clock-to-Q delay, $t_{CQ}$, of flip-flop cell DF1, instance name outp_ff_b0, is 4.5 ns. The delay, $t_{QO}$, due to the output buffer cell OUTBUF, instance name OUTBUF_33, is 3.7 ns. The longest path between the clock-pad input and the output, $t_{CO}$, is thus

$$t_{CO} = t_{IC} + t_{CQ} + t_{QO} = 7.9 + 4.5 + 3.7 = 16.1\ \text{ns} \tag{13.23}$$

This is the critical path, and it limits the operating frequency to $1/(16.1\ \text{ns}) \approx 62$ MHz.

When we created a start set using CLKBUF_30:PAD , the timing analyzer told us that this set consisted of two pins. We can list the names of the two pins as follows:

```
Pin name  Net name  Macro name
2 pins
```

The clock-buffer instance name, CLKBUF_30/U0, is hierarchical (with a '/' hierarchy separator). This indicates that there is more than one instance inside the clock-buffer cell, CLKBUF_30. Instance CLKBUF_30/U0 is the input driver; instance CLKBUF_30/U1 is the output driver (which is disabled and unused in this case).

## 13.7.4 External Setup Time

Each of the six chip data inputs must satisfy the following setup equation:

$$t_{SU}(\text{external}) > t_{SU}(\text{internal}) - (\text{clock delay}) + (\text{data delay}) \tag{13.24}$$

(where both clock and data delays end at the same flip-flop instance). We find the clock delays in Eq. 13.24 using the clock input pin as the start set and 'clock' as the end set. The timing analyzer tells us that all 16 clock path delays in our design are the same, 7.9 ns, so the clock skew is zero. Actel’s clock distribution system minimizes clock skew, but clock skew will not always be zero. From the discussion in Section 13.7.1, we see there is no possibility of internal hold-time violations with a clock skew of zero.

Next, we find the data delays in Eq. 13.24 using a start set of all input pads and an end set of 'gated':

```
timer> longest
... lines omitted ...
1st longest path to all endpins
Rank  Total  Start pin       First Net     End Net       End pin
  10   10.0  INBUF_26:PAD    DEF_NET_1320  DEF_NET_1320  a_r_ff_b0:D
  11    9.7  INBUF_28:PAD    DEF_NET_1380  DEF_NET_1380  b_r_ff_b1:D
  12    9.4  INBUF_25:PAD    DEF_NET_1290  DEF_NET_1290  a_r_ff_b1:D
  13    9.3  INBUF_27:PAD    DEF_NET_1350  DEF_NET_1350  b_r_ff_b2:D
  14    9.2  INBUF_29:PAD    DEF_NET_1410  DEF_NET_1410  b_r_ff_b0:D
  15    9.1  INBUF_24:PAD    DEF_NET_1260  DEF_NET_1260  a_r_ff_b2:D
16 pins
```

We are only interested in the last six paths of this analysis (ranks 10–15), which describe the delays from each data input pad (a[0], a[1], a[2], b[0], b[1], b[2]) to the D input of a flip-flop. The maximum data delay, 10 ns, occurs on input buffer instance name INBUF_26 (pad 26); pin INBUF_26:PAD is node a_0_ in the EDIF file, or input a[0] in our behavioral model. The six $t_{SU}(\text{external})$ equations corresponding to Eq. 13.24 may be reduced to the following worst-case relation:

$$t_{SU}(\text{external})_{\max} > t_{SU}(\text{internal}) - 7.9\ \text{ns} + \max(9.1\ \text{ns}, \ldots, 10.0\ \text{ns}) = t_{SU}(\text{internal}) + 2.1\ \text{ns} \tag{13.25}$$

We calculated the clock and data delay terms in Eq. 13.24 separately, but timing analyzers can normally perform a single analysis as follows:

$$t_{SU}(\text{external})_{\max} > t_{SU}(\text{internal}) - (\text{clock delay} - \text{data delay})_{\min} \tag{13.26}$$

Finally, we check that there is no external hold-time requirement. That is to say, we must check that $t_{SU}(\text{external})$ is never negative, or

$$t_{SU}(\text{external})_{\min} > t_{SU}(\text{internal}) - (\text{clock delay} - \text{data delay})_{\max} = t_{SU}(\text{internal}) + 1.2\ \text{ns} > 0 \tag{13.27}$$
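The arithmetic behind Eqs. 13.25 and 13.27 can be sketched in Python as follows. The delays are the ones reported in this section (7.9 ns for every clock path and the six pad-to-D delays at ranks 10–15), keyed by the first-stage flip-flop each data input drives. $t_{SU}(\text{internal})$ stays symbolic because the section does not give its value; the names in the code are hypothetical.

```python
# Delays reported in this section (ns).
CLOCK_DELAY_NS = 7.9  # clock pad to each of the 16 flip-flop clock pins

# Input pad to D-pin delays (ranks 10-15), keyed by the first-stage
# flip-flop that each data input drives.
DATA_DELAY_NS = {
    "a_r_ff_b0": 10.0, "b_r_ff_b1": 9.7, "a_r_ff_b1": 9.4,
    "b_r_ff_b2": 9.3, "b_r_ff_b0": 9.2, "a_r_ff_b2": 9.1,
}


def setup_offsets(clock_delay_ns, data_delays_ns):
    """Per input, the bound on t_SU(external) - t_SU(internal) from Eq. 13.24,
    i.e. data delay - clock delay, rounded to 0.01 ns."""
    return {ff: round(d - clock_delay_ns, 2) for ff, d in data_delays_ns.items()}


offsets = setup_offsets(CLOCK_DELAY_NS, DATA_DELAY_NS)
worst = max(offsets.values())  # Eq. 13.25: largest external setup requirement
best = min(offsets.values())   # Eq. 13.27: smallest external setup requirement
print(f"t_SU(external) max > t_SU(internal) + {worst:.1f} ns")
print(f"t_SU(external) min > t_SU(internal) + {best:.1f} ns")
```

Because both offsets are positive, any non-negative internal setup time keeps $t_{SU}(\text{external})$ positive for every input.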

Since $t_{SU}(\text{internal})$ is always positive on Actel FPGAs, $t_{SU}(\text{external})_{\min}$ is always positive for this design. In large ASICs, with large clock delays, it is possible to have external hold-time requirements on inputs. This is the reason that some FPGAs (Xilinx, for example) have programmable delay elements that deliberately increase the data delay and eliminate irksome external hold-time requirements.