DesignCon 2011

Signal and Power Integrity for a 1600 Mbps DDR3 PHY in Wirebond Package

June Feng, Rambus Inc. [Email: jfeng@]
Ralf Schmitt, Rambus Inc.
Hai Lan, Rambus Inc.
Yi Lu, Rambus Inc.

Abstract

A DDR3 interface for a data rate of 1600 Mbps using a wirebond package and a low-cost system environment typical for consumer electronics products was implemented. In this environment crosstalk and supply noise are serious challenges and have to be carefully optimized to meet the data rate target. We present the signal and power integrity analysis used to optimize the interface design and guarantee reliable system operation at the performance target under high-volume manufacturing conditions. The resulting DDR3 PHY was implemented in a test chip and achieves reliable memory operation at 1600 Mbps and beyond.

Authors Biography

June Feng received her MS from University of California at Davis, and BS from Beijing University in China. From 1998 to 2000, she was with Amkor Technology, Chandler, AZ, where she was responsible for BGA package substrate modeling and design and PCB characterization. In 2000, she joined Rambus Inc and is currently a senior member of technical staff. She is in charge of performing detailed analysis, modeling, design and characterization in a variety of areas including high-speed, low-cost PCB layout and device packaging. Her interests include high-speed interconnect modeling, channel VT budget simulation, power delivery network modeling and high-frequency measurements.

Ralf Schmitt received his Ph.D. in Electrical Engineering from the Technical University of Berlin, Germany. Since 2002, he has been with Rambus Inc, Los Altos, California, where he is a Senior Manager leading the SI/PI group, responsible for designing, modeling, and implementing Rambus multi-gigahertz signaling technologies.
His professional interests include signal integrity, power integrity, clock distribution, and high-speed signaling technologies.

Hai Lan is a Senior Member of Technical Staff at Rambus Inc., where he has been working on on-chip power integrity and jitter analysis for multi-gigabit interfaces. He received his Ph.D. in Electrical Engineering from Stanford University, M.S. in Electrical and Computer Engineering from Oregon State University, and B.S. in Electronic Engineering from Tsinghua University in 2006, 2001, and 1999, respectively. His professional interests include design, modeling, and simulation for mixed-signal integrated circuits, substrate noise coupling, power and signal integrity, and high-speed interconnects.

Yi Lu is a senior systems engineer at Rambus Inc. He received the B.S. degree in electrical engineering and computer science from U.C. Berkeley in 2002 with honors. In 2004, he received the M.S. degree in electrical engineering from UCLA, where he designed and fabricated a 3D MEMS microdisk optical switch. Since joining Rambus in 2006, he has been a systems engineer designing various memory interfaces including XDR1/2 and DDR2/3.

Introduction

The memory bandwidth requirement of multimedia consumer electronic products like HDTV systems is constantly increasing, driven by the adoption of advanced features like frame rate up-scaling and 3D projection. At the same time, consumer electronic products remain very cost sensitive, targeting low-cost package and system environments to reduce the overall bill of materials. This creates the need for high-speed memory interfaces implemented in low-cost system environments. To address this need we have designed a 1600 Mbps DDR3 memory interface PHY in a wirebond package targeting a low-cost system environment with a 4-layer PCB stack-up typical for cost-optimized consumer electronic products.

Designing a DDR3 memory interface for such a high data rate is not trivial.
The memory device itself requires more than 40% of the bit time for internal timing, leaving little more than half of the bit time for all channel and PHY timing errors. Meeting these requirements in a wirebond package, using a 4-layer PCB stackup, is a serious challenge. Bond wire coupling in the package and coupling in the PCB routing, which in a 4-layer stackup is limited to microstrip routing, lead to increased crosstalk in the interface system. Additionally, the bond wire inductance leads to higher supply noise, causing power supply induced jitter (PSIJ) as well as simultaneous switching output (SSO) noise in the interface system. The interface design therefore requires a careful optimization of signal and power integrity in the entire system, from the controller PHY to the DRAM component pin.

In this paper we present the signal and power integrity analysis used to optimize the interface design and to assure reliable operation at the target data rate in a low-cost system environment. First we present the analysis of power supply induced jitter. For this, we analyzed the supply noise spectrum generated in the system and the sensitivity of the system to this noise. With this analysis we were able to predict and optimize the jitter in the PHY, making sure the PHY will meet the tight jitter requirements of the DRAM device.

Next we analyze the channel margin loss due to ISI, crosstalk, and SSO. Special emphasis is given to crosstalk and SSO noise, since these are the major contributors to margin loss in the channel timing. A careful optimization of PHY floor plan, package design, and PCB routing was implemented to minimize the margin loss due to crosstalk and SSO.

Finally, the system timing was verified for the full range of process variations of the controller PHY and channel parameter variations for low-cost system environments typical for cost-sensitive consumer electronics products.
This analysis provides the confidence that the final system will meet the target performance with high yield under high-volume manufacturing (HVM) conditions.

The DDR3 PHY was implemented on a test chip and achieves reliable memory operation for a data rate of 1866 Mbps using DDR3 memories of a 1600 Mbps speed grade.

I. System Environment and SI/PI Challenges

The design target of the DDR3 PHY described in this paper is high-performance consumer electronic systems with a bandwidth requirement of up to 6.4GB/s in a low-cost system environment. This bandwidth is achieved using a x32 PHY running at a data rate of 1600Mbps (32 bits x 1600Mbps / 8 = 6.4GB/s).

In order to support a low system cost implementation, the PHY was designed in a 4-layer wirebond package. Such a package is significantly less expensive than flip-chip packages; however, it poses challenges for signal and power integrity, especially at higher data rates. Additionally, the PHY is designed for a PCB with only 4 layers, further reducing the final system cost. Finally, silicon area, pad count, and decoupling requirements are carefully optimized to minimize total system cost for the memory subsystem.

The goal for this design effort was to achieve a PHY implementation that will reliably achieve the targeted data rate under high-volume manufacturing (HVM) conditions using any DDR3 DRAM meeting the JEDEC spec at the targeted data rate. In order to meet this goal it was not enough to design and analyze the PHY alone. Instead, the PHY had to be analyzed in the targeted system environment, optimizing system and PHY implementation concurrently. Closing the voltage and timing budget on the system level resulted in the PHY design requirements necessary to achieve the targeted interface system performance in the final implementation using a low-cost system environment.

Creating a cost-efficient high-speed memory interface requires careful analysis of power and system integrity [1].
The bond wires in a wirebond package contribute significantly to the inductance of the power distribution network (PDN) of the interface. Inductance in the supply path causes supply noise whenever the current drawn by the PHY changes. This supply noise causes voltage distortions and timing variations, generating timing jitter on interface signals as well as internal PHY signals. The contribution of bond wires to the supply inductance can be reduced by adding supply pads to the design, but this would increase the PHY width and ultimately the PHY cost, and is therefore not advisable. Instead, the number and placement of supply bond wires has to be carefully optimized to achieve the necessary supply noise targets in the design.

Bond wires also lead to crosstalk between different signal lines. This is a severe signal integrity challenge, especially at the higher data rates targeted for this design. Routing the signaling channel on a 4-layer PCB only allows for microstrips instead of striplines, which adds further crosstalk to the signaling channel. As a result, crosstalk is a major signal integrity challenge for the implementation of a high-speed DDR3 interface in this low-cost system environment, and the PHY design has to meet tight timing and voltage requirements to allow for the distortions added in the package and the PCB routing channel.

II. Power Integrity Analysis

II.1. Power Integrity Challenges

Power integrity is an important design consideration for high-speed interfaces. Supply noise in the PHY causes waveform distortions and delay variations, resulting in jitter on interface signals and on internal signals inside the PHY. Designing a high-speed interface system in a low-cost 4-layer wirebond package is particularly challenging, since the supply inductance of such packages is comparatively high and the bond wires allow rail-to-rail coupling between noisy digital supplies and very noise-sensitive analog supplies.
In order to achieve a high data rate it is therefore necessary to carefully analyze supply noise and its impact on the PHY circuits and interface characteristics.

In general, there are two power integrity challenges in the design of high-speed interface systems that are best analyzed separately.

The first power integrity challenge in high-speed interfaces is the distortion of signal quality and timing of the interface signals during Simultaneous Switching Outputs (SSO) events. SSO noise is a common problem in interface systems using single-ended signaling like DDR3 and is discussed in detail in previous works [5]. Since the impact of SSO is strongly influenced by the interaction of the interface PHY with the external channel implementation in package and PCB, we will discuss SSO impact as part of the channel analysis. It is analyzed using a signal and power integrity co-simulation model described in a later chapter.

The second challenge is the supply noise inside the PHY caused by the circuit activity of the PHY itself. This activity generates noise on all supply rails inside the PHY, including sensitive analog supplies, due to self-induced current changes or noise coupling from other system elements. This supply noise affects the performance of circuits inside the PHY, and in particular, it creates jitter in the timing circuits controlling the internal and channel timing of the interface system. The impact of power supply induced jitter (PSIJ) on the system margin of the interface has to be carefully analyzed and optimized to ensure reliable operation at the target data rate.

In a DDR3 interface the timing on the DQ data bus is most critical, since these signals are transmitted at the full (double) data rate. The critical timing parameters on this bus are defined relative to the data strobe signals, DQS. As a result, jitter that is shared between the DQ data signals and the DQS strobe signal does not affect the system timing margin.
This is particularly helpful during WRITE access, when both the DQ and DQS signals are generated in the PHY. Only PSIJ components due to DQ and DQS mismatches have to be taken into consideration for this operation. This mismatch can be minimized with a careful design of the timing paths inside the PHY.

The clock signal, CK, generated by the PHY acts as a timing reference source for the internal DRAM timing and has to meet various jitter requirements defined in the DRAM specification. It also acts as timing reference for the control and address signals on the CA bus, but timing requirements on this bus are less critical since the CA bus only operates at half the data rate of the DQ bus. Meeting the jitter requirements of the DRAM specification, however, is not necessarily sufficient to operate the DDR3 interface at high data rates. Jitter on the CK signal degrades the output hold time parameter tQH of the DRAM device, reducing system margin during READ operations. It is therefore advisable to keep jitter on the CK signal very low, if possible even lower than required by the DRAM specification, to gain system margin during READ access. In the following chapter we present a detailed PSIJ analysis for the CK signal path inside the PHY.

II.2. Power Supply Model and Simulation Results

The prediction of power supply noise plays a vitally important role in defining the voltage and timing budget for this low-cost wirebond DDR3 PHY design targeting 1600Mbps up to 1866Mbps. Besides the common concern about the dynamic range of the supply noise, it is also crucial to understand the supply noise impact on the system timing jitter, that is, the power supply noise induced jitter (PSIJ). Previously, a systematic approach for predicting PSIJ by combining the supply noise spectrum and the clocking circuit jitter sensitivity has been developed [2]. The methodology flow is shown in Figure 1.
In order to estimate the supply noise impact on jitter, this method seeks to obtain the jitter spectrum, J(f), which is obtained by multiplying the supply noise spectrum, V(f), by the jitter sensitivity profile, S(f), all in the frequency domain. The jitter sensitivity profile is solely determined by the circuit realization and is independent of the circuit activity. The supply noise spectrum, on the other hand, is determined by both the power delivery network and the current profile, which varies with circuit activity and data pattern. The following sections first describe the supply noise analysis to obtain V(f) and then discuss the jitter sensitivity results for S(f), so that the final prediction of PSIJ in the DDR3 system can be evaluated.

Four supplies are used in the implemented DDR3 test system: VDDP, VDDA, VDDIO, and VDDR. The PLL is supplied by the dedicated VDDP supply. The clock distribution circuits operate on VDDA. The I/O circuits use VDDIO. The rest of the circuits, mainly the digital logic circuits for the data path, operate on VDDR. Since all of the clocking circuits are on VDDP, VDDA, and VDDR, it is expected that the main jitter contribution comes from the noise on these three supplies. The following discussion focuses on these three supplies, which are highly jitter sensitive.

Figure 1. Methodology for predicting supply noise impact on jitter (PSIJ) [2].

[Figure 2 block labels: Off-Chip PDN, On-Chip PDN, Current Profile]

Figure 2. Power supply model for pre-layout supply noise simulation.

Figure 2 shows the power supply model topology used for the supply noise analysis. As shown in the figure, three components are required: the off-chip PDN, the on-chip PDN, and the supply current profile. The off-chip PDN is modeled by passive RLC components resulting from voltage regulator, PCB, and package parasitics as well as low and medium frequency decoupling capacitors.
The on-chip PDN represents the physical power grids from the die pads to the rest of the chip; it typically includes RC parasitics and, very importantly, on-chip decaps. The third component is the current profile, which is extracted from the circuit simulation and applied as the stimulus to the PDN. In order to evaluate both the worst-case switching noise and the steady state supply noise, the current profile should be extracted under the DDR3 PHY operating condition for bus turn-around. Figure 3 shows the data waveform under a WRITE-NOP-READ bus turn-around condition as well as the corresponding current profiles for the VDDR, VDDA, and VDDP supplies. As can be seen from the figure, the VDDA and VDDP current profiles are independent of the operation mode, while the average VDDR current shifts significantly between the active WRITE/READ mode and the NOP mode.

[Figure 3 annotations (phases WRITE, NOP, READ; each active burst is 80 back-to-back PRBS accesses, BL=4, ~300ns): one current profile stays at avg=21mA, peak=32mA in all three phases; one stays at avg=102mA with peak=596mA to 599mA in all three phases; the third varies, with avg=163mA peak=638mA, avg=165mA peak=655mA, and avg=115mA peak=622mA.]

Figure 3: Supply current profiles for bus turn-around, representing 300ns of continuous WRITE and 300ns of continuous READ with 150ns NOP in between.

[Figure 4 annotations (phases WRITE, NOP, READ): vddr 10.2 mVpp and 10.4 mVpp, with a ~20mV DC shift due to the standby/active power mode transition; vdda 18.8 mVpp and 18.7 mVpp; vddp 4.4 mVpp and 4.2 mVpp.]

Figure 4: Overview of VDDR, VDDA, and VDDP supply noise for the DDR test system.

The power supply noise analysis is performed by applying the above current profiles to the power supply model shown in Figure 2. An overview of the supply noise simulation results is given in Figure 4. Since VDDP is the dedicated supply of the PLL alone, the VDDP noise is independent of the activity mode and is around 5mVpp. Compared to the VDDP noise, the VDDA noise is significantly higher at around 19mVpp due to the strong switching activity generated by the clock buffers in the clock distribution circuits.
The VDDA noise is also independent of activity mode and it remains stable as long as the clock distribution stays on. The VDDR noise exhibits strong dependence on the mode of operation. The VDDR supply experiences a significant DC IR shift between the active WRITE/READ mode and the non-active NOP mode. What matters most is the switching noise during the transitions between the active and non-active operation modes and the steady state noise during the normal active modes for continuous WRITE or READ operation. The former usually leads to the worst-case supply voltage collapse and the latter determines how much net jitter impact it has on the timing budget of the system. As shown by the figure, the bus turn-around switching noise is as high as 25mVpp and the steady state noise is around 10mVpp. As will be discussed shortly, VDDP has the highest jitter sensitivity, followed by VDDA and VDDR, while its supply noise is relatively small. The net jitter contributions from the supply noise on each of these domains are discussed in the following sections.

Figures 5-7 show the details of the simulated supply noise in time domain and frequency domain under WRITE and READ conditions. The VDDR simulation results are shown in Figure 5, where the time-domain results indicate that the peak-to-peak noise is around 10mV. The noise spectrum results show that the major component is at the 1066MHz data rate, with a sub-harmonic at 533MHz, and its higher-order harmonics. The obtained noise spectrum will be used to compute the PSIJ impact. Similarly, the VDDA simulation results in Figure 6 show that the swing is around 19mV, with major frequency components at 533MHz and 1066MHz. The VDDP noise simulation results are shown in Figure 7.
The peak-to-peak noise is around 5mV and the frequency components are the PLL reference clock, its output clock and their higher-order harmonics.

[Figure 5 annotations: time-domain panels 10.2 mVpp, 8.2 mVpp, 10.4 mVpp; spectrum labels: Data rate @1066MHz, TX/RX CLK @533MHz, LF/MF noise.]

Figure 5: Simulation results of VDDR supply noise. (a) Supply noise during WRITE, (b) Supply noise during READ, (c) Spectrum of supply noise during WRITE, and (d) Spectrum of supply noise during READ.

[Figure 6 annotations: time-domain panels 18.8 mVpp, 18.7 mVpp; spectrum labels: TX/RX CLK @533MHz, Data rate @1066MHz, LF/MF noise, HF.]

Figure 6. Simulation results of VDDA supply noise. (a) Supply noise during WRITE, (b) Supply noise during READ, (c) Spectrum of supply noise during WRITE, and (d) Spectrum of supply noise during READ.

[Figure 7 annotations: time-domain panels 4.4 mVpp, 4.2 mVpp; spectrum labels: REFCLK @133MHz, half CK freq @266MHz, CK @533MHz, VCO freq harmonics, LF/MF noise.]

Figure 7. Simulation results of VDDP supply noise. (a) Supply noise during WRITE, (b) Supply noise during READ, (c) Spectrum of supply noise during WRITE, and (d) Spectrum of supply noise during READ.

II.3. Jitter Sensitivity and Jitter Spectrum

PSIJ sensitivity is defined in the frequency domain as the system jitter response to sinusoidal supply noise. Its magnitude profile represents how much jitter is induced by supply noise with one unit of swing. Its phase profile represents the phase difference between the supply noise and its induced jitter sequence in the steady state. The PSIJ sensitivity is solely determined by the circuit implementation and is independent of circuit activity. Therefore, it is a system transfer function for characterizing the jitter impact induced by the supply noise. It serves as a key linking parameter between the supply noise as the stimulus and the jitter impact as the output response. The PSIJ sensitivity extraction methodology has been previously reported in [2].
It is applied here to extract the CK PSIJ sensitivity profiles for the DDR3 test system. The PSIJ sensitivity results for VDDR, VDDA, and VDDP are shown in Figure 8(a)-(c), respectively. As seen in the figure, the PSIJ sensitivities of VDDR and VDDA are lower than that of VDDP. This is expected since the most sensitive block in the entire clocking path is the PLL circuit, which is solely supplied by VDDP. The VDDP sensitivity, as shown in Figure 8(c), exhibits a band-pass behavior with its peak of about 1ps/mV at around 10MHz, which roughly corresponds to the PLL loop bandwidth. The VDDA sensitivity, as shown in Figure 8(b), exhibits a low-pass behavior. This is also expected because VDDA supplies the entire clock distribution circuitry, where the major jitter sensitivity characteristic is due to the clock buffer delay change caused by the supply voltage variation, up to the circuit bandwidth.

Figure 8: Simulated DDR3 PSIJ sensitivity profiles for (a) VDDR, (b) VDDA, and (c) VDDP.

The final PSIJ is derived by combining the PSIJ sensitivity, S(f), and the supply noise spectrum, V(f). Each of these two required components has been addressed above. One can compute the jitter spectrum J(f) as follows:

$$J(f) = S(f) \cdot V(f) \qquad \text{(Eq. 1)}$$

The above jitter spectrum is a comprehensive characterization of the supply noise impact on jitter. It reveals the magnitude and location of all the jitter components and relates their sources to the supply noise frequency components. It quantifies which frequency components of the supply noise make the most significant contribution to the final jitter impact. Moreover, the jitter spectrum serves as the basis to derive many important aspects of the jitter. For example, the time-domain jitter sequence is computed as follows:

$$j(t) = \mathrm{ifft}[J(f)] = \mathrm{ifft}[S(f) \cdot V(f)] \qquad \text{(Eq. 2)}$$

By applying the above procedure, the jitter induced by the supply noise is derived to estimate the PSIJ contribution to the total jitter in the test DDR3 system.
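The procedure of multiplying the supply noise spectrum by the sensitivity profile and transforming back to the time domain can be sketched in a few lines of Python with NumPy. This is only a minimal illustration under stated assumptions, not the tool flow used for the results in this paper: the function names, the flat sensitivity of 0.1ps/mV, and the single 533MHz noise tone are hypothetical placeholders, whereas a real analysis would use the simulated noise waveforms and the extracted complex sensitivity profiles.

```python
import numpy as np

def psij_from_noise(v_t, fs, sensitivity):
    """Return (freqs, J, j_t) for a sampled supply-noise waveform.

    v_t         : supply voltage samples in volts (DC is removed here)
    fs          : sample rate in Hz
    sensitivity : callable f -> complex S(f) in seconds per volt
    """
    v_ac = np.asarray(v_t, dtype=float)
    v_ac = v_ac - v_ac.mean()                  # keep only the noise component
    freqs = np.fft.rfftfreq(v_ac.size, d=1.0 / fs)
    V = np.fft.rfft(v_ac)                      # supply noise spectrum V(f)
    J = sensitivity(freqs) * V                 # jitter spectrum, J = S * V
    j_t = np.fft.irfft(J, n=v_ac.size)         # time-domain jitter sequence
    return freqs, J, j_t

def flat_sensitivity(seconds_per_volt):
    """Hypothetical frequency-independent sensitivity, for illustration only."""
    return lambda f: np.full(f.shape, seconds_per_volt, dtype=complex)

# Assumed example: 10 mVpp tone at 533 MHz on a 1.0 V rail,
# with an assumed sensitivity of 0.1 ps/mV (= 1e-10 s/V).
fs = 53.3e9                                    # 100 samples per 533 MHz cycle
t = np.arange(1000) / fs                       # exactly 10 cycles
v = 1.0 + 5e-3 * np.sin(2 * np.pi * 533e6 * t)
freqs, J, j_t = psij_from_noise(v, fs, flat_sensitivity(1e-10))
jitter_pp = j_t.max() - j_t.min()
print(f"peak-to-peak PSIJ: {jitter_pp * 1e12:.2f} ps")
```

With a measured or simulated S(f), the `sensitivity` callable would typically interpolate the extracted magnitude and phase onto the FFT frequency grid. A histogram of `j_t` then gives the statistical view of the jitter sequence.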
The results are summarized in Figures 9-11. Figure 9 shows the VDDR PSIJ prediction results for continuous WRITE and READ modes. Figure 9(a) is the simulated jitter spectrum due to the VDDR noise during WRITE, showing that the major jitter components are at the CK frequency and the data rate. The corresponding time-domain jitter sequence is computed using Eq. 2 and is shown in Figure 9(c). From the figure, the peak-to-peak jitter is found to be around 3.3ps. Figure 9(e) further shows the histogram of the jitter sequence, which reveals the statistical properties of the PSIJ, e.g., distribution form, peak-to-peak value, and deviation. Similarly, the VDDR PSIJ results under READ condition in terms of jitter frequency-domain spectrum, time-domain sequence, and statistical histogram are shown in Figure 9(b), (d), and (f). The peak-to-peak jitter for READ is found to be around 2.9ps, which is slightly less than that in WRITE.

Figure 10(a)-(f) show the VDDA PSIJ results. Although the results are expected to be independent of WRITE or READ, the results under both conditions are shown in the figure as a sanity check. As seen from the figure, the major jitter components are at the CK frequency of 533MHz and its 2nd harmonic at 1066MHz. The peak-to-peak jitter is found to be around 2.4ps for WRITE and 2.2ps for READ.

Figure 11(a)-(f) show the VDDP PSIJ results. Although VDDP has the highest jitter sensitivity, the noise in its supply domain is not as large as that in VDDR or VDDA. As a result, the peak-to-peak jitter due to VDDP noise is found to be about 1.8ps for WRITE and 2.0ps for READ. The major jitter contribution comes from the PLL reference clock at 133MHz as well as its 2nd and 3rd harmonics.

Figure 9: Simulated VDDR PSIJ results for continuous WRITE and READ.
(a) Spectrum of jitter induced by VDDR noise during WRITE, (b) Spectrum of jitter induced by VDDR noise during READ, (c) VDDR PSIJ jitter sequence during WRITE, (d) VDDR PSIJ sequence during READ, (e) VDDR PSIJ histogram during WRITE, and (f) VDDR PSIJ histogram during READ.

Figure 10: Simulated VDDA PSIJ results for continuous WRITE and READ. (a) Spectrum of jitter induced by VDDA noise during WRITE, (b) Spectrum of jitter induced by VDDA noise during READ, (c) VDDA PSIJ jitter sequence during WRITE, (d) VDDA PSIJ sequence during READ, (e) VDDA PSIJ histogram during WRITE, and (f) VDDA PSIJ histogram during READ.

Figure 11: Simulated VDDP PSIJ results for continuous WRITE and READ. (a) Spectrum of jitter induced by VDDP noise during WRITE, (b) Spectrum of jitter induced by VDDP noise during READ, (c) VDDP PSIJ jitter sequence during WRITE, (d) VDDP PSIJ sequence during READ, (e) VDDP PSIJ histogram during WRITE, and (f) VDDP PSIJ histogram during READ.

Although the above PSIJ results represent the steady state activity mode for continuous WRITE or READ, it is also important to estimate the worst-case pathological jitter impact. Since neither VDDA nor VDDP noise should depend on the activity mode, the major variable noise source is the VDDR noise. However, the basis for constructing such cases is not the supply noise spectrum itself. Instead, the determining factor is the frequency at which the jitter sensitivity peaks. As suggested by the VDDR PSIJ sensitivity shown in Figure 8(a), the peaking occurs at 5~10MHz with about 0.5ps/mV. Therefore, the worst-case VDDR PSIJ should occur when the VDDR supply noise has major components at 5~10MHz. Such a case can be emulated by stitching the active mode current profile with the non-active mode current profile at a repetition rate of 5MHz. Figure 12 shows the PSIJ results under such conditions. Figure 12(a) and (b) show the VDDR supply noise waveforms for the pathological WRITE-NOP and READ-NOP cases.
The DC shift is about 20mV between the active and non-active mode and the peak-to-peak noise is about 25mV. The corresponding PSIJ jitter spectra are plotted in Figure 12(c) and (d), showing jitter components as high as 10ps in magnitude at 5MHz. The resulting jitter sequences are plotted in Figure 12(e) and (f), where the significant 5MHz jitter component as well as its 10MHz harmonic can be clearly seen. Recall that the peak-to-peak supply noise for active mode is about 19mV during normal continuous WRITE or READ and the resulting PSIJ is about 3.3ps. In the pathological case, the peak-to-peak supply noise is about 25mV, which is about 1.3x that in the normal active mode, but the resulting jitter is about 34ps, which is about 10x that in the normal active mode. The constructed pathological case is thus useful for estimating the worst-case or upper bound of the PSIJ impact in the system.
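The numbers quoted for the pathological case can be cross-checked with simple arithmetic. The short Python sketch below is only a back-of-the-envelope check: it assumes, as a first-order simplification not taken from the paper, that the roughly 20mV DC shift acts as the noise amplitude at the 5MHz repetition rate and is scaled by the peak VDDR sensitivity of about 0.5ps/mV. The helper name `tone_jitter_ps` is hypothetical.

```python
def tone_jitter_ps(sens_ps_per_mv, noise_mv):
    """First-order jitter estimate for a single noise tone (assumption)."""
    return sens_ps_per_mv * noise_mv

# Values quoted in the text above.
peak_sens_ps_per_mv = 0.5     # VDDR sensitivity peak at 5~10MHz
dc_shift_mv = 20.0            # active vs. non-active DC shift
noise_normal_mv = 19.0        # peak-to-peak noise, normal active mode
noise_patho_mv = 25.0         # peak-to-peak noise, pathological case
psij_normal_ps = 3.3          # PSIJ, normal active mode
psij_patho_ps = 34.0          # PSIJ, pathological case

est_5mhz_ps = tone_jitter_ps(peak_sens_ps_per_mv, dc_shift_mv)
noise_ratio = noise_patho_mv / noise_normal_mv
jitter_ratio = psij_patho_ps / psij_normal_ps

print(f"estimated 5MHz jitter component: {est_5mhz_ps:.1f} ps")
print(f"noise ratio:  {noise_ratio:.2f}x")
print(f"jitter ratio: {jitter_ratio:.1f}x")
```

The estimate is in line with the roughly 10ps component reported at 5MHz, and the two ratios illustrate the point of the pathological case: the jitter grows far faster than the noise amplitude because the noise energy has moved to the frequency where the sensitivity peaks.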