Mantra VLSI : Time borrowing and Time stealing

Time Borrowing

Certain designs (particularly processor architectures) have some stages that are time hogs (like a multiplier). The succeeding stage (say, store) needs hardly any time compared to the multiply stage. One cannot penalize the entire pipeline for the sake of a single time-hogging stage.

So, if the “multiply” stage can borrow the extra time it needs from the succeeding, less time-consuming “store” stage, we get a pipeline that is efficient in terms of TIMING.

What is Time Borrowing?

Time Borrowing, also known as cycle stealing, occurs at a LATCH.

By definition, Time Borrowing permits the logic to automatically borrow time from the next cycle, thereby reducing the time available for data to arrive in the following cycle, OR permits the logic to use slack from the previous cycle in the current cycle (explained in FIG # 2).

The slack used from the previous cycle ripples through the pipeline automatically. Time Borrowing (Cycle Stealing) applies ONLY to LATCH based designs, while Time Stealing applies to flop based designs.

In FIG # 1 below, time hogging PATH # 1 causes a setup violation at FF1. With a clock period of 5 ns and PATH # 1 consuming 7 ns, timing cannot be met UNLESS the clock period is increased from 5 ns to at least 7 ns (Tpd, setup and hold of the FF are assumed to be 0). Increasing the clock period degrades the performance of the pipeline.

The above timing issue is resolved with the “SAME” clock period of 5 ns, using the “TIME BORROWING” principle as shown in FIG # 2. In FIG # 2, FF1 is replaced with LATCH1, which is POSITIVE LEVEL sensitive. LATCH1 OPENS at 0 ns, the same rising edge at which FF1 would have captured, but unlike FF1 it stays open until 2.5 ns (the negative edge of CLK1).

So, PATH # 1 has an extra 2.5 ns to borrow from the next cycle (as LATCH1 closes at 2.5 ns). Time borrowed by PATH#1 = 2 ns (PATH # 1 delay (7 ns) – CLK period (5 ns)). PATH#1 could use the entire 2.5 ns, but uses only 2 ns, leaving a positive slack of 0.5 ns.

Since LATCH1 closes at 2.5 ns, there is NO TIMING VIOLATION, as data from PATH # 1 -> LATCH1 arrived 0.5 ns before LATCH1 closed. The output of LATCH1 is immediately available to combinatorial PATH#2. PATH#2 starts right where PATH #1 left off, as shown in the fig. (this is important to remember, as PrimeTime uses this principle when reporting Time Borrowing). PATH#2 adds 1 ns of delay from where PATH#1 left off (@ 2 ns). That point, not pin G of LATCH1, is treated as the start point of the path to FF2.

PATH#2 could have used up to 3 ns (0.5 ns slack from the previous stage + 2.5 ns of half-clk-period of the current cycle), but uses only 1 ns. Valid data is available for capture at FF2 at 3 ns. Since the rising edge of capture FF2 happens at 5 ns, FF2 has a positive slack of 2 ns (5 ns – 3 ns).
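
The arithmetic of FIG # 2 can be replayed in a few lines of Python. This is only a sketch under the same idealized assumptions as the text (zero latch delay, zero setup and hold); the function and variable names are made up for illustration, and t = 0 is taken as the edge at which LATCH1 opens, with PATH#1 launched one clock period earlier.

```python
CLK_PERIOD = 5.0                 # ns, unchanged clock
HALF_PERIOD = CLK_PERIOD / 2     # LATCH1 is transparent for 2.5 ns
PATH1_DELAY = 7.0                # ns, the time-hogging path
PATH2_DELAY = 1.0                # ns, the short path that follows


def latch_stage(path_delay, period, open_width):
    """Return (time_borrowed, slack_at_closing_edge) for a path launched
    one period before the capturing latch opens (hypothetical helper)."""
    arrival = path_delay - period      # measured from the opening edge, t = 0
    borrowed = max(0.0, arrival)       # time taken from the next cycle
    slack = open_width - arrival       # margin before the latch closes
    return borrowed, slack


borrowed, latch_slack = latch_stage(PATH1_DELAY, CLK_PERIOD, HALF_PERIOD)

# The latch is transparent, so PATH#2 starts where PATH#1 left off.
path2_arrival = borrowed + PATH2_DELAY
ff2_slack = CLK_PERIOD - path2_arrival   # FF2 captures at the next rising edge

print(f"borrowed by PATH#1 : {borrowed} ns")
print(f"slack at LATCH1    : {latch_slack} ns")
print(f"arrival at FF2     : {path2_arrival} ns")
print(f"slack at FF2       : {ff2_slack} ns")
```

The printed values match the numbers quoted above: 2 ns borrowed, 0.5 ns of slack at LATCH1, and 2 ns of slack at FF2.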

This should clear the concept of borrowing time from next cycle & using the slack from previous cycle.

Timing is met with NO changes to the clock, just by replacing FF1 with LATCH1. Time borrowing stops once the pipeline hits a flip-flop. Ideally, to exploit the Time Borrowing principle fully, the pipeline should employ only LATCHES.

Why NOT a negative edge triggered flip-flop instead of a latch?

If we replace LATCH1 with a negative edge triggered flip-flop FFN1 as in FIG#2B, PATH#1 will still have that extra 2.5 ns (half clock period) to borrow from the next clock cycle, just as in the case of the LATCH. So why NOT use FLIP-FLOPs, which are preferred over LATCHES in any design methodology?

On the input side, a negative edge flop behaves just the same way as a latch. The value a latch adds is on the output side. The transparent nature of the LATCH lets the succeeding stage use the positive slack (leftover, if any) of the current stage OR passes the negative slack of the current stage on to the succeeding stage, so that slack ripples through the pipe until it hits a section of the pipe with positive slack.

By comparing the FIG#2 (above) and FIG#2B (below) waveforms, one can understand the usefulness of the LATCH over the negative edge FLOP.

In FIG#2B, in the case of the negative edge flop, the data appears at the input of PATH#2 (output of FFN1) at time = 2.5 ns. With PATH#2 consuming 1 ns, data arrives at the output of PATH#2 at 3.5 ns (2.5 ns + 1 ns), and we are lucky to have a positive slack of 1.5 ns (5 ns – 3.5 ns).

Now, consider a situation where PATH#2 requires 2.7 ns instead of 1 ns. Available time = 2.5 ns. Time required by PATH#2 = 2.7 ns. We have a timing violation at the clock edge at t = 5 ns, with a negative slack of 0.2 ns.

In the case of the positive level sensitive latch, data appears at the input of PATH#2 (output of LATCH1) at time = 2 ns, because of the transparent nature of the latch (with the negative edge flop, data appeared at the input of PATH#2 at t = 2.5 ns). Assuming PATH#2 requires 2.7 ns, the data arrival time ahead of the second positive clock edge is 2 ns + 2.7 ns = 4.7 ns. The second positive clock edge occurs at t = 5 ns. We have a positive slack of 0.3 ns (5 ns – 4.7 ns).

As seen from the waveform comparison in FIG#2B, with a negative edge triggered flop in place of the positive level sensitive latch, the positive slack of 0.5 ns (between t = 2 ns and t = 2.5 ns) available in PATH#1 is wasted by the edge based behavior of the flop (look at the GREEN colored PATH#2 in Fig 2B). The level sensitive nature of the latch makes use of the prior cycle’s positive slack of 0.5 ns in the current cycle.
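
The latch versus negative edge flop comparison can be checked numerically. The sketch below assumes the same ideal elements as the text (no clock-to-Q delay, no setup time); `ff2_slack` and its arguments are illustrative names, not part of any tool.

```python
HALF_PERIOD = 2.5     # ns, falling edge of CLK1 (FFN1 launches here)
CAPTURE_EDGE = 5.0    # ns, rising edge at which FF2 captures
PATH1_ARRIVAL = 2.0   # ns, when PATH#1 data reaches LATCH1 / FFN1


def ff2_slack(path2_delay, use_latch):
    """Setup slack at FF2 for a given PATH#2 delay (hypothetical helper).

    A transparent latch passes the data on as soon as it arrives (but not
    before it opens at t = 0); a negative edge flop holds the data until
    the falling edge at 2.5 ns.
    """
    launch = max(0.0, PATH1_ARRIVAL) if use_latch else HALF_PERIOD
    return CAPTURE_EDGE - (launch + path2_delay)


for delay in (1.0, 2.7):
    latch = round(ff2_slack(delay, use_latch=True), 3)
    flop = round(ff2_slack(delay, use_latch=False), 3)
    print(f"PATH#2 = {delay} ns: latch slack = {latch} ns, "
          f"neg-edge flop slack = {flop} ns")
```

With a 2.7 ns PATH#2 the latch version still meets timing with 0.3 ns to spare, while the flop version fails by 0.2 ns. The 0.5 ns difference is exactly the slack the flop discards by waiting for its edge.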

The above concept is the basis for the Time Borrowing principle using latches.

Understanding “Time Borrowing” in real designs

In FIG # 3, there are 4 positive level sensitive latches. LATCH #1 and LATCH #3 are controlled by CLK1, LATCH #2 and LATCH #4 are controlled by CLK2. Relationship between CLK1 and CLK2 is as shown in fig.

PATH #1, PATH#2, PATH#3 and PATH #4 represent combinatorial clouds with the delays indicated in the fig. For simplicity, all 4 latches are assumed to have 0 ns propagation delay and 0 ns setup and hold time.

A] SCENARIO 1:

In SCENARIO 1, PATH # 1 delay = 6 ns; PATH # 2 delay = 1 ns; PATH # 3 delay = 8 ns; PATH # 4 delay = 1 ns.

LATCH#1 opens at point (1) of CLK1. Data from LATCH#1 through PATH#1 is available 6 ns after launch point (1). This implies that valid data is available 1 ns (6 ns – 5 ns) after LATCH#2 opens at point (2) of CLK2. Since PATH#2 has enough slack, PATH#1 was able to borrow 1 ns from PATH #2.

Similarly, PATH#3, with a delay of 8 ns, borrowed 3 ns from the succeeding stage. In both cases, the available slack (between pts (2) and (3), and between pts (4) and (5)) is NOT used fully, as PATH #2 and PATH # 4 have delays less than half a clk period.
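
SCENARIO 1 can be traced with a small simulation. This is a sketch, not a timing tool: it assumes a 10 ns clock with 50% duty cycle, latches that open alternately every 5 ns at points (1), (2), (3), ... starting at t = 0, ideal latches, and that the element capturing PATH#4 also opens at point (5). The function name `simulate` is invented for the example.

```python
HALF_PERIOD = 5.0   # ns, each latch is transparent for half of the clock


def simulate(delays, half_period=HALF_PERIOD):
    """Walk data through a chain of alternating-phase latches.

    Returns a list of (stage, arrival_time, time_borrowed) tuples, where
    stage n ends at the latch that opens at n * half_period.
    """
    results = []
    depart = 0.0                        # LATCH#1 opens at point (1), t = 0
    for stage, delay in enumerate(delays, start=1):
        arrival = depart + delay
        opens = stage * half_period
        closes = opens + half_period
        if arrival > closes:
            raise ValueError(f"stage {stage}: data arrives after latch closes")
        borrowed = max(0.0, arrival - opens)
        results.append((stage, arrival, borrowed))
        # Early data waits for the latch to open; late data passes straight through.
        depart = max(arrival, opens)
    return results


for stage, arrival, borrowed in simulate([6.0, 1.0, 8.0, 1.0]):
    print(f"PATH#{stage}: arrives at {arrival} ns, borrows {borrowed} ns")
```

The run shows PATH#1 borrowing 1 ns and PATH#3 borrowing 3 ns, while PATH#2 and PATH#4 finish before their capturing latch opens and borrow nothing.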

B] SCENARIO 2:

In SCENARIO 2, PATH # 1 delay = 6 ns; PATH # 2 delay = 1 ns; PATH # 3 delay = 2 ns; PATH # 4 delay = 1 ns.

In SCENARIO 2, PATH#3 has a delay of 2 ns, which is less than the half-cycle period of CLK (5 ns). This implies NO BORROWING is required from the succeeding stage. It also means that a FF in place of LATCH#4 would not have made a difference.

NOTE: A FF in place of LATCH#4 stops the time borrowing we have seen from LATCH#1 -> LATCH#3; borrowing ends at the launch edge of the FF.

C] SCENARIO 3:

In SCENARIO 3, PATH # 1 delay = 6 ns; PATH # 2 delay = 7 ns; PATH # 3 delay = 5 ns; PATH # 4 delay = 3 ns.

In SCENARIO 3, there is 100% time borrowing. It is effectively one huge combinatorial block between the first LATCH and the last LATCH, with each stage “AUTOMATICALLY” borrowing time and the data rippling through to the final outputs. The LATCHES in between act as just transparent delay elements.

In this scenario, if the circuit above had 4 FFs instead of 4 LATCHES, the 4 stage FF based pipeline would have consumed 28 ns (4 × 7 ns, since the slowest stage sets the clock period). Using the Time Borrowing principle, the pipe delay has been reduced to 20 ns.
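
The flop-based figure and the per-latch borrowing in SCENARIO 3 can be reproduced as follows. This sketch assumes 5 ns half-cycles with latch n+1 opening at n × 5 ns, and that a flop pipeline must run at the period of its slowest stage. Only the three latch-to-latch paths are traced, because the text does not say what captures PATH#4.

```python
from itertools import accumulate

HALF_PERIOD = 5              # ns
delays = [6, 7, 5, 3]        # PATH#1 .. PATH#4 in SCENARIO 3

# Flop-based pipeline: every stage gets the period of the slowest stage.
ff_clock_period = max(delays)
ff_pipeline_time = len(delays) * ff_clock_period

# Latch-based pipeline: every path here ends after its capturing latch has
# opened, so data never waits and arrival times are just running sums.
arrivals = list(accumulate(delays[:3]))
borrows = []
for n, arrival in enumerate(arrivals, start=1):
    opens = n * HALF_PERIOD
    closes = opens + HALF_PERIOD
    assert opens <= arrival <= closes, "data must land in the open window"
    borrows.append(arrival - opens)

print(f"flop pipeline: {ff_pipeline_time} ns at a {ff_clock_period} ns clock")
print(f"arrivals at LATCH#2..#4: {arrivals} ns")
print(f"time borrowed at LATCH#2..#4: {borrows} ns")
```

Every arrival falls inside the transparent window of its capturing latch, which is why the latches behave as plain delay elements in this scenario.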

If the principle of Time Borrowing is understood, it’s easy to figure out the max permissible “BORROW TIME”.
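
As a rough guide, the maximum borrow time at a latch is the width of its transparent window minus its setup time. The sketch below encodes that rule; it is an illustration under the ideal assumptions used in this post, and the helper names are hypothetical rather than taken from any STA tool.

```python
def max_borrow_time(open_width, setup=0.0):
    """Largest amount of time a path may borrow at a latch:
    the transparent window minus the latch setup time."""
    return open_width - setup


def check_borrow(path_delay, launch_to_open, open_width, setup=0.0):
    """Return (time_needed, limit, ok) for one latch-terminated path.

    launch_to_open is the time from the launch edge to the edge that
    opens the capturing latch.
    """
    needed = max(0.0, path_delay - launch_to_open)
    limit = max_borrow_time(open_width, setup)
    return needed, limit, needed <= limit


# FIG # 2: 7 ns path, 5 ns from launch to opening edge, 2.5 ns window.
print(check_borrow(7.0, 5.0, 2.5))
# FIG # 3, SCENARIO 1, PATH#3: 8 ns path, 5 ns half-cycle, 5 ns window.
print(check_borrow(8.0, 5.0, 5.0))
```

With a non-zero setup time the limit shrinks accordingly, so a real latch allows slightly less borrowing than the half-period used in the figures.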

What is Time Stealing?

Time Stealing can be deployed when a specific logic partition needs additional time. The additional time required must be known (deterministic) at design time. One can then adjust the clock phase of the capture FF (FF2) so that the data arriving at FF2 does not violate setup at its capture edge.

In FIG # 4, PATH # 1 stole 4 ns (the CLK2 offset, not the time borrowed by PATH # 1) from PATH #2’s available time of 10 ns, leaving PATH # 2 with 6 ns. Since PATH # 2 needs only 1 ns, there is enough time for FF3 to capture data at 20 ns.

Pipeline stages with delays between 10 ns and 14 ns can have their FFs’ CLK pins hooked to CLK2.

Difference between Time Stealing and Time Borrowing:

Time Stealing will not AUTOMATICALLY use the leftover slack from the previous stage. It is forced to steal from the succeeding stage, leaving that stage less time. It is the designer’s responsibility to make sure the succeeding stage delay is < CLK_PERIOD – PHASE_SHIFT.

In Time Borrowing, latch transparency lets the slack left over from the previous cycle ripple through the pipe automatically, without interfering with clock phases.
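
The designer's check for Time Stealing can be written down directly. The sketch below uses the FIG # 4 numbers (10 ns clock, 4 ns offset on CLK2, 1 ns PATH#2) and ideal flops. The 13 ns PATH#1 delay is a hypothetical value picked from the 10 ns to 14 ns range, since the text does not give it, and the function name is invented.

```python
def time_stealing_slacks(clk_period, phase_shift, path1_delay, path2_delay):
    """Setup slacks for the stage that steals time and the stage it steals from.

    The capture flop of stage 1 is clocked phase_shift later, so stage 1
    gains phase_shift and stage 2 loses the same amount.
    """
    stage1_budget = clk_period + phase_shift
    stage2_budget = clk_period - phase_shift
    return stage1_budget - path1_delay, stage2_budget - path2_delay


CLK_PERIOD = 10    # ns
PHASE_SHIFT = 4    # ns, CLK2 offset from FIG # 4
PATH1_DELAY = 13   # ns, hypothetical value (not stated in the text)
PATH2_DELAY = 1    # ns

slack1, slack2 = time_stealing_slacks(CLK_PERIOD, PHASE_SHIFT,
                                      PATH1_DELAY, PATH2_DELAY)
print(f"PATH#1 slack at FF2: {slack1} ns")
print(f"PATH#2 slack at FF3: {slack2} ns")
```

Both slacks must be non-negative for the scheme to work. Unlike a latch, nothing here adapts at run time: if PATH#2 grew beyond 6 ns, the stolen 4 ns would have to be handed back by changing the clock offset.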

6 comments:

Unknown, 1 March 2020 at 05:36
Very good explanation.. But image quality is too poor.. Please upload the image once again please..

Anonymous, 2 July 2020 at 02:09
Really helpful, Thanks!

Ankit Berde, 15 July 2020 at 22:33
Hi, your explanation is correct. Data launched at N posedge of ff1 should be checked for setup after 1.5T, at a negedge of latch, though the tool by default considers the immediate negedge of latch (after 0.5T) for setup check. And even practically, data launched at N+1 posedge of flop will disturb the previous Nth data at the negedge of latch. How to deal with this?

Mike, 8 March 2021 at 13:40
This article is copied from http://ohotspot.blogspot.com/2012/09/time-borrowing-and-time-stealing.html. This is a Copyright Infringement. No doubt picture quality is so poor. FOR CLEAR PICTURES PLEASE GO TO http://ohotspot.blogspot.com/2012/09/time-borrowing-and-time-stealing.html

Mike, 8 March 2021 at 13:41
Original article is published in 2012 by http://ohotspot.blogspot.com/2012/09/time-borrowing-and-time-stealing.html. These folks shamelessly copied

Thursday, 3 July 2014
