1.1 ASYNCHRONOUS INTERFACE – CDC GUIDELINE

1.1.1 INTRODUCTION

ASIC design is becoming more complex as more and more IP is integrated on a chip, and data is frequently transferred from one clock domain to another. Clock domain crossing (CDC) is therefore an increasingly important factor in building a multi-clock chip that works reliably.

This document covers the following topics:
a. Where CDC occurs;
b. What problems CDC causes;
c. How to design CDC logic correctly.

1.1.2 APPLICATION AREA

In a multi-clock design, clock domain crossing occurs whenever data is transferred from a flop driven by one clock to a flop driven by another clock, as shown in Figure 1-1.

Figure 1-1 Clock domain crossing

*Note: definition of terminology:
Source clock: Clock A in Figure 1-1 is defined as the source clock;
Destination clock: Clock B in Figure 1-1 is defined as the destination clock;
Source clock domain: all logic whose reference clock is Clock A, like flip-flop FA in Figure 1-1;
Destination clock domain: all logic whose reference clock is Clock B, like flip-flop FB in Figure 1-1.

1.1.3 PROBLEM DEFINITION

Meta-stability, glitches, multi-fanout and re-convergence may occur in an asynchronous design. They may drive the design into an unanticipated state and cause functional errors.

1.1.3.1 Meta-stability

A signal that propagates across asynchronous clock domains may cause meta-stability if it violates the setup or hold time of the destination flip-flop, as shown in Figure 1-2.

Figure 1-2 Meta-stable issue

1.1.3.2 Glitch

Logic in the synchronization path can produce glitches due to propagation delays. These glitches may get latched and result in false pulses at the synchronizer output, as shown in Figure 1-3.

*Note: synchronization path: the following paths are defined as synchronization paths:
1. Path from the source clock domain to the destination clock domain, such as the path from Q of DA1/DA2 to the D of DB1 in Figure 1-3;
2. 
Path from Q to D of two flip-flops in the destination clock domain, such as the path from Q of DB1 to D of DB2 in Figure 1-3.

Figure 1-3 Glitch issue

1.1.3.3 Multi-fanout

Multi-fanout on the synchronization path may produce different values at the synchronizer outputs because of different propagation delays, as shown in Figure 1-4.

Figure 1-4 Multi-fanout issue

1.1.3.4 Re-convergence (signal re-convergence)

Signals that re-converge after synchronization may cause functional errors, as shown in Figure 1-5.

Figure 1-5 Re-convergence issue

Re-convergence is a special CDC issue that deserves extra attention from the logic designer. All other CDC issues, such as meta-stability, multi-fanout and glitches, can be detected by a CDC checking tool (for example Cadence's CONFORMAL). However, some re-convergence cases are too complex for tools to check. Deep re-convergence, shown in Figure 1-6, is one example.

Figure 1-6 Deep re-convergence issue

Figure 1-7 Timing diagram of deep re-convergence (signals shown: CLK_A, CLK_B, D0_A to D0_D, D1_A to D1_D, D2_A to D2_D, D3_E_syn, D3_F_syn, D3_G_syn)

Figure 1-6 shows that the circuit has two levels of re-convergence. The first-level re-convergence gates are the flip-flops marked in blue, and the second-level re-convergence gates are the flip-flops marked in red.
The Cadence CDC tool identifies and reports first-level re-convergence, but it cannot identify second-level re-convergence. This is a tool limitation. Re-convergence at either level may cause a functional error, as shown in Figure 1-7.

1.1.4 IMPLEMENTATION

This chapter introduces several logic schemes for designing clock domain crossing logic correctly. These schemes keep data transfer stable between different clock domains. Nearly all clock domain crossing issues can be avoided if the designer follows the schemes introduced in this chapter.

1.1.4.1 Two synchronizer scheme

For a one-bit signal crossing between clock domains, a general solution is to pass it through two flip-flops in the destination clock domain, which takes two destination clock cycles. The precondition is that the signal from the source clock domain holds long enough for the destination clock to sample it. In other words, the frequency of clock A should be lower than that of clock B, so that every value on the signal lasts longer than one period of clock B. The circuit is shown in Figure 1-8:

Figure 1-8 two flip-flop sync

When using the two synchronizer scheme, the designer should keep the CDC path (*note) free of combinational cells other than inverters and buffers. Otherwise, the glitch issue shown in Figure 1-3 and the multi-fanout issue shown in Figure 1-4 may occur.
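As an illustration of the two synchronizer scheme described above, a minimal Verilog sketch follows. The module name sync_2ff, the port names and the active-low asynchronous reset are assumptions made for this example and do not come from the original design. The register fb1 is named after FB1 in Figure 1-8, and fb2 is an assumed name for the second flip-flop.

```verilog
// Two flip-flop synchronizer for a single-bit signal (illustrative sketch).
// All names here are hypothetical; only the two-stage structure is taken
// from the scheme described in the text.
module sync_2ff (
    input  wire clk_b,    // destination clock (clock B)
    input  wire rst_b_n,  // asynchronous reset, active low (assumed)
    input  wire d_a,      // signal driven by a flip-flop in the clock A domain
    output wire q_b       // d_a synchronized to clock B
);
    reg fb1;  // first stage: may go meta-stable when d_a changes near a clk_b edge
    reg fb2;  // second stage: samples fb1 one clk_b period later

    always @(posedge clk_b or negedge rst_b_n) begin
        if (!rst_b_n) begin
            fb1 <= 1'b0;
            fb2 <= 1'b0;
        end else begin
            fb1 <= d_a;
            fb2 <= fb1;
        end
    end

    assign q_b = fb2;
endmodule
```

In line with the rule above, d_a would be connected directly to the Q output of a source-domain flip-flop, with no combinational logic other than buffers or inverters in between.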
Note: CDC path: the path from the Q of a flip-flop in the source clock domain to the D of a flip-flop in the destination clock domain, for example the path from FA/Q to FB1/D in Figure 1-8.

As examples of applying the two synchronizer scheme, Figures 1-9 and 1-10 show how to design the logic for the glitch and multi-fanout cases.

Figure 1-9 solution of glitch issue

Figure 1-10 solution of multi-fanout issue

1.1.4.2 MUX structure sync scheme

For multi-bit signals crossing between clock domains, the MUX scheme (shown in Figure 1-11) can be used to cross clock domains correctly.

The MUX scheme suits designs in which a group of data is transferred from one clock domain to another together with a marker signal that, when asserted, indicates that the data is stable, like Sel in Figure 1-11. Data from the source clock domain stays unchanged until its marker signal has been synchronized through two flip-flops and detected in the destination clock domain. See Figure 1-12 for details.

Figure 1-11 MUX structure scheme

Figure 1-12 MUX scheme timing diagram (signals shown: Clock A, Clock B, Data, Sel, Sel_a, Sel_b, Dmux_o, Dout)

1.1.4.3 Handshake scheme

Handshake is a design based on the following protocol: the source clock domain asserts a request to the destination clock domain and keeps it asserted until it receives the grant from the destination clock domain. The destination clock domain receives the request and keeps the grant asserted until it finds that the request from the source clock domain has been de-asserted. A handshake design can be a simple feedback synchronizer, a full handshake or a half handshake.

1.1.4.3.1 Feedback synchronizer

Figure 1-13 shows a logic implementation of the feedback synchronizer.
This circuit places no restriction on the relationship between the frequencies of clock A and clock B: clock A may be slower or faster than clock B. However, the circuit applies only when signal A consists of pulses one clock A cycle wide (1T), and the time from the de-assertion of one pulse to the assertion of the next is more than two clock A periods plus two clock B periods.

Figure 1-13 full feedback synchronizer logic

The timing diagram in Figure 1-14 shows how the full feedback synchronizer circuit works.

Figure 1-14 full feedback synchronizer timing (signals shown: CLKA, A, FA0/Q, FA1/D, FA1/Q, FA2/Q, FA3/Q, CLKB, FB1/Q, FB2/Q, FB3/Q, B)

Figure 1-15 shows a simple feedback synchronizer that is commonly used in logic design, and Figure 1-16 is its timing diagram.

Figure 1-15 Simple feedback synchronizer logic

Figure 1-16 Simple timing diagram of feedback synchronizer

1.1.4.3.2 REQ-GNT scheme

Figure 1-17 shows the principle of the handshake implementation. The request and grant signals passed between Tx and Rx each need double (two flip-flop) synchronization, and the data must be held until Tx detects that the grant is de-asserted.

Figure 1-17 handshake implementation

Full handshake flow (see Figure 1-18):
1. Tx asserts the request signal (REQ) to Rx;
2. When Rx receives the request (REQ_B1), it asserts the grant signal (GNT) to Tx;
3. When Tx detects the grant signal (GNT_A1), it de-asserts its request;
4. When Rx detects that the request is de-asserted, it de-asserts its grant;
5. 
When Tx detects that the grant is de-asserted, the transfer is finished.

Figure 1-18 Full handshake timing diagram

A full handshake occupies five clock cycles in the Tx clock domain and six clock cycles in the Rx clock domain. Full handshake is the safest way to transfer data, because Tx and Rx always know each other's status.

Half handshake is the same as full handshake, except that Tx/Rx de-assert their request/grant signals before receiving the response.

1.1.4.4 Asynchronous FIFO synchronizer scheme

When data is transferred in bursts between two clock domains, a FIFO synchronizer is a common solution.

Figure 1-19 shows the implementation diagram of a FIFO synchronizer. It includes a dual-port RAM, write/read control modules and two flip-flop synchronizers.

In a FIFO synchronizer implementation, the key issue is generating the FIFO status indicator signals: full, half full, empty and half empty. Generally, the write/read pointers use Gray code. Because only one bit of a Gray-coded pointer changes per increment, the pointers can be synchronized directly with two flip-flop synchronizers.

Figure 1-19 Dual-clock asynchronous FIFO (ports shown: wclk, wfull, winc, wdata, rclk, rempty, rinc, rdata)

The 2n-Queue FIFO is a typical example of an asynchronous FIFO. It is a small FIFO whose depth is 2n, where n can be set to any value the design needs.
Theoretically, a 2n-Queue FIFO can transfer signals stably in logic whose source clock is n times as fast as the destination clock.

A detailed Verilog implementation of the 2n-Queue FIFO is given in Appendix A (2n-Queue FIFO implementation code).

1.1.5 LAYOUT GUIDE

None

1.1.6 LIMITATION

None

1.1.7 APPENDIX A

2n-Queue FIFO implementation code:

module USBD_ASFF2(RST_, ICLK, OCLK, DIN, FULL, OUTE, OUTD);
input  RST_;
input  ICLK;   // Write domain clock
input  OCLK;   // Read domain clock
input  DIN;    // write active indicator
output FULL;
output OUTE, OUTD;

parameter PTR_1BIT = 1;

reg  [PTR_1BIT:0] WR_PTR, RD_PTR;  // read/write address, gray code
reg               FULL;
wire [PTR_1BIT:0] Wr_Ptr_Nxt;
wire [PTR_1BIT:0] Rd_Ptr_Nxt;
wire [PTR_1BIT:0] Rd_Ptr_Wclk;     // read pointer synchronized to the write clock
wire [PTR_1BIT:0] Wr_Ptr_Rclk;     // write pointer synchronized to the read clock
reg               EMPTY;

/************************
 * define gray count
 ************************/
parameter gray0 = 2'b00,
          gray1 = 2'b01,
          gray2 = 2'b11,
          gray3 = 2'b10;

/***********************************************
 * translation of signals ==> RD connects to the EMPTY for
 ***********************************************/
wire RD = !EMPTY;

/***********************************************
 * Write point
 * - when DIN is active, go to next gray count
 ***********************************************/
// always@(WR_PTR or DIN) Wr_Ptr_Nxt <= #1 gray_inc(WR_PTR) ; // chad;
assign Wr_Ptr_Nxt = gray_inc(WR_PTR);

always @(posedge ICLK or negedge RST_) begin
    if (!RST_) begin
        WR_PTR <= 2'b0;
    end else if (DIN) begin
        WR_PTR <= Wr_Ptr_Nxt;
    end
end

/***********************************************
 * Read point
 * - when RD is active, go to next gray count
 ***********************************************/
// always@(RD_PTR or RD ) Rd_Ptr_Nxt <= #1 gray_inc(RD_PTR) ; // chad;
assign Rd_Ptr_Nxt = gray_inc(RD_PTR);

always @(posedge OCLK or negedge RST_) begin
    if (!RST_) begin
        RD_PTR <= 2'b0;
    end else if (RD) begin
        RD_PTR <= Rd_Ptr_Nxt;
    end
end

/******************************************
 * Full condition
 * - sync read pointer to write clock
 *   domain
 ******************************************/
USBD_CDCS DNT_FL1(.Q(Rd_Ptr_Wclk[1]), .D(RD_PTR[1]), .CK(ICLK), .R(RST_));
USBD_CDCS DNT_FL0(.Q(Rd_Ptr_Wclk[0]), .D(RD_PTR[0]), .CK(ICLK), .R(RST_));

wire NFull = DIN & ((Wr_Ptr_Nxt[0] == (~Rd_Ptr_Wclk[0])) & (Wr_Ptr_Nxt[1] == (~Rd_Ptr_Wclk[1])));
wire FullX = NFull | ((WR_PTR[0] == (~Rd_Ptr_Wclk[0])) & (WR_PTR[1] == (~Rd_Ptr_Wclk[1])));

always @(posedge ICLK or negedge RST_)
    if (!RST_) FULL <= 1'b0;
    else       FULL <= FullX;

/***********************************************
 * EMPTY
 * - sync write pointer to read clock domain
 ***********************************************/
USBD_CDCS DNT_EP1(.Q(Wr_Ptr_Rclk[1]), .D(WR_PTR[1]), .CK(OCLK), .R(RST_));
USBD_CDCS DNT_EP0(.Q(Wr_Ptr_Rclk[0]), .D(WR_PTR[0]), .CK(OCLK), .R(RST_));

wire NEMPTY = (Rd_Ptr_Nxt == Wr_Ptr_Rclk) & RD;
wire EMPTYX = (RD_PTR == Wr_Ptr_Rclk);
wire EMPTYD = NEMPTY | EMPTYX;

always @(posedge OCLK or negedge RST_) begin
    if (!RST_) EMPTY <= 1'b1;
    else       EMPTY <= EMPTYD;
end

wire OUTD = !EMPTYD;
wire OUTE = !EMPTY;

// Step a 2-bit gray code pointer to the next value in the sequence
function [PTR_1BIT:0] gray_inc;
    input [PTR_1BIT:0] gray;
    begin
        case (gray) // synopsys parallel_case full_case
            gray0   : gray_inc = gray1;
            gray1   : gray_inc = gray2;
            gray2   : gray_inc = gray3;
            gray3   : gray_inc = gray0;
            default : gray_inc = 2'bxx;
        endcase
    end
endfunction

endmodule

// Two-stage synchronizer cell used for the pointer bits
module USBD_CDCS(R, CK, D, Q);
input  R;
input  CK;
input  D;
output Q;
reg    Q;
reg    QX;

// First stage: samples D on the falling edge of CK
always @(negedge R or negedge CK) begin
    if (~R) QX <= 1'b0;
    else    QX <= D;
end

// Second stage: re-samples QX on the rising edge of CK
always @(negedge R or posedge CK) begin
    if (~R) Q <= 1'b0;
    else    Q <= QX;
end

endmodule
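
The gray_inc function above enumerates the four 2-bit Gray codes explicitly. For wider pointers, a commonly used alternative is to keep a binary counter alongside the pointer and derive the Gray value as bin ^ (bin >> 1). The sketch below illustrates that idea only. The module name gray_counter, the WIDTH parameter and all port names are hypothetical and are not part of the original code.

```verilog
// Gray-code pointer of arbitrary width (illustrative sketch, hypothetical names).
// With WIDTH = 2 the gray output steps through 00, 01, 11, 10,
// the same sequence as gray0..gray3 in USBD_ASFF2.
module gray_counter #(
    parameter WIDTH = 3
) (
    input  wire             clk,
    input  wire             rst_n,  // asynchronous reset, active low (assumed)
    input  wire             inc,    // advance the pointer by one when high
    output reg  [WIDTH-1:0] gray    // registered Gray-coded pointer
);
    reg  [WIDTH-1:0] bin;  // binary shadow of the pointer

    // Next binary value, then binary-to-Gray conversion
    wire [WIDTH-1:0] bin_nxt  = inc ? (bin + 1'b1) : bin;
    wire [WIDTH-1:0] gray_nxt = bin_nxt ^ (bin_nxt >> 1);

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            bin  <= {WIDTH{1'b0}};
            gray <= {WIDTH{1'b0}};
        end else begin
            bin  <= bin_nxt;
            gray <= gray_nxt;
        end
    end
endmodule
```

Because gray comes straight from a register, each bit could be passed to a synchronizer cell such as USBD_CDCS without combinational logic on the CDC path.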