当前位置:文档之家› 时序分析之3优化策略

时序分析之3优化策略

Quartus II Software Design Series : Optimization Optimization Techniques –Timing Optimization© 2009 Altera CorporationTiming Optimizationn General Recommendationsn Analyzing Timing Failuresn Solving Typical Timing Failures© 2009 Altera CorporationGeneral Recommendationsn Clocksn I/On Asynchronous Control Signalsn Many of these suggestions are found in Timing Optimization Advisor & Quartus II Handbook© 2009 Altera CorporationClocksn Optimize for Speed-Apply globally-Apply hierarchically-Apply to specific clock domainn Enable netlist optimizations n Enable physical synthesis© 2009 Altera CorporationGlobal Speed Optimizationn Select speed-Default is balanced-Area-optimized designs may also show speed improvements n May result in increased logic resource usage© 2009 Altera CorporationIndividual Optimizationn Optimization Technique logic option -Use Assignment Editor or Tcl to apply to hierarchical block n Speed Optimization Technique for Clock Domains logic option-Use Assignment Editor or Tcl to apply to clock domain or between clock domains© 2009 Altera CorporationSynthesis Netlist OptimizationsnFurther optimize netlists during synthesis n Types-WYSIWYG primitive resynthesis-Gate-level register retimingCreated/modified nodesnoted in Compilation Report© 2009 Altera CorporationWYSIWYG Primitive Resynthesis n Unmaps 3rd-party atomnetlist back to gates &then remaps to Alteraprimitives-Not intended for use withintegrated synthesisn Considerations-Node names may change-3rd-party synthesis attributesmay be lostl Preserve/keep-Some registers may besynthesized away© 2009 Altera CorporationGate-Level Register Retimingn Moves registers across combinatorial logic to balance timingn Trades between critical & non-critical pathsn Makes changes at gate level© 2009 Altera CorporationPhysical Synthesisn Re-synthesisbased on fitteroutput-Makesincrementalchanges thatimprove resultsfor a givenplacement-Compensates forrouting delaysfrom fitter© 2009 Altera CorporationPhysical Synthesisn Types-Targeting performance:l Combinational logicl Asynchronous signal pipeliningl Register duplicationl Register retiming-Targeting fittingl Physical synthesis for combinatorial logicl Logic to memory mappingn Effort-Trades performance vs. compile time-Normal, extra or fastn New or modified nodes appear in Compilation Report© 2009 Altera CorporationCombinational LogicnSwaps look-up table (LUT) ports within LEs to reduce critical path LEsf gab -criticalc d ef ga e c d b© 2009 Altera CorporationAsynchronous Control Signalsn Improve Recovery & Removal Timingn Make control signal non-global-Project-widel Assignments ÞSettings ÞFitter Settings ÞMore Settings -Individuallyl Set Global Signal logic option to Offn Enable “Automatic asynchronous signal pipelining” option (physical synthesis)© 2009 Altera CorporationAsynchronous Signal PipeliningnAdds pipeline registers to asynchronous clear or load signals in very fast clock domainsaclraclrD Q aclraclr D QDQD QaclraclraclrAdded pipeline stageGlobal clock delay© 2009 Altera CorporationDuplicationn High fan-out registers or combinatorial logic duplicated & placed to reduce delayN© 2009 Altera CorporationRegister Retimingn Uses fewer registers than pipelining-Trade off the delay between timing-critical and non-critical paths -Reduce switching-Does not change logic functionalityD Q10 ns D QD Q 5 nsD Q7 ns D QD Q8 ns© 2009 Altera CorporationTiming Optimizationn General Recommendationsn Analyzing Timing Failuresn Solving Typical Timing Failures© 2009 Altera CorporationAnalyzing Timing FailuresnTypical synchronous path-Registers can be internal or external to FPGAREG1REG2Input Failure External Internal Output Failure Internal External Failure within Clock DomainInternalInternal© 2009 Altera CorporationSlack Equations Setup Slack Equation:(latch edge + T clk2 –T su ) –(launch edge + T clk1+ T co + T data )Hold Slack Equation:(launch edge + T clk1+ T co + T data ) –(latch edge + T clk2 + T h )Data ArrivalData RequiredT su , T h , T co are usually fixed values; Function of siliconData ArrivalData Required© 2009 Altera CorporationSlack Equations (cont.)Setup Slack Equation:(latch edge + T clk2 –T su ) –(launch edge + T clk1+ T co + T data )Hold Slack Equation:(launch edge + T clk1+ T co + T data ) –(latch edge + T clk2 + T h )Data ArrivalData RequiredData ArrivalData RequiredTiming issues show up here© 2009 Altera CorporationTypical Timing Errors n Clock delays (T clk1 or T clk2)-Ripple/gated clocks-Non-global routingn Data path delay (T data ) -Fan-out-Too many logic levels-Poor placement-Physical limitations© 2009 Altera CorporationExploring Failures in Quartus II Softwaren Technology Map Viewer-Graphically shows number of logic levelsn Chip Planner-Graphically shows placementn TimeQuest path analysis-Highlights clock/path delays-Highlights fan-out-Highlights number of logic levels-And just about everything else© 2009 Altera CorporationTechnology Map ViewernAccessing Technology Map Viewer -Right-click in TimeQuest report and choose Locate Path or Locate Endpoints nView number of logic levels in failing paths© 2009 Altera CorporationChip PlannerChoose Chip Plannerand the Chip Plannerdisplays the placement,routing & timinginformation for that pathn Accessing Chip Planner-Right-click in TimeQuest report and choose Locate Path or Locate Endpoints n View placement of nodes in timing path as well as chosen routing© 2009 Altera CorporationTimeQuest Path AnalysisLogic Delay Path DelaysInterconnect Delay ClockDelayn Provides ALL detailed information pertaining to timing path© 2009 Altera CorporationFurther Path Analysisn Always start with worst slack path(s)-Fixing worst path(s) may give Fitter freedom to fix other failing pathsn In TimeQuest reports, list top 50-100 failing paths and look for common source, intermediate or destination nodes-Sometimes start or end nodes are bits of same bus-Sometimes paths with different source or destination nodes have common intermediate nodes© 2009 Altera CorporationTiming Optimizationn General Recommendationsn Analyzing Timing Failuresn Solving Typical Timing Failures© 2009 Altera CorporationSolving Typical Timing FailuresWe’ll look at some cases of timing failures, how to identify them and possible solutions. It is possible for you to have several at once.1)Too many logic levels2)Fan-out signals3)Conflicting physical constraints4)Conflicting timing assignments5)Tight timing requirements© 2009 Altera CorporationCase 1) Too Many Logic Levelsn Increases T data, thus increasing data arrival time n How to verify-Technology Map Viewer on failing path-TimeQuest detailed path analysis© 2009 Altera CorporationCase 1) Technology Map ViewerRight-click on failing path and select Locate Endpoints or PathThis path has 8 levels of logic© 2009 Altera CorporationCase 1) TimeQuestNote number oflevels of logic in data arrival path© 2009 Altera CorporationCase 1) Possible Solutionsn Add multi-cycle assignments if design allows n Add pipeline registers-Reduces logic levels-Adds latencyn Enable register retiming (physical synthesis) -Redistributes logic around registers reducing number of levels-Increases compile timen Recode logic to be more efficient-Reduces logic levels-May need to focus on implementation Changes Launch & Latch EdgesChanges Tdata Changes T dataChanges Tdata© 2009 Altera CorporationCase 1) Pipeline Registersn Add pipeline registers to reduce Tdata© 2009 Altera CorporationCase 1) Focus on Implementation n HDL coding decisions will greatly impact resulting synthesis-May need to code with resulting synthesis in mindn See Quartus II handbook chapter, “Recommended HDL Coding Styles”n Great material on HDL coding© 2009 Altera CorporationTip #1 -Reduce Embedded IFs n Don’t embed IF statements-Use CASE statements insteadVHDLVerilog© 2009 Altera CorporationTip #1 -Reduce Embedded IFs (cont.)nResulting hardware interpretation© 2009 Altera CorporationTip #2 -Use System Verilog Unique Casen Verilog CASE implies one-to-many relationshipn Verilog CASE statement is implemented as a priority encoderi.e. embedded IF statementsn System Verilog is a superset of Verilogn Use “unique” qualifier to prevent priority encoder© 2009 Altera CorporationUnique and Priorityn unique and priority keywords apply to case statements or if/else chainsn unique implies non-overlapping case items or conditional expressionsn priority implies just the oppositeunique case(state)S0:S1:S2:endcaseNo more parallel_case!© 2009 Altera CorporationEnabling SystemVerilog-2005 n GUIn Source-level control (for IP etc) n Per-file basis Verilog-2001remains the default// synthesis VERILOG_INPUT_VERSION SYSTEMVERILOG_2005module(input byte a, b, output logic);set_global_assignment –name VERILOG_FILE –rev SYSTEMVERILOG_2005© 2009 Altera CorporationTip #3: CASE synthesis directivesn Don’t use synthesis directives -parallel_case-full_casen Great paper discusses the perils of CASE synthesis directives-"full_case parallel_case",the Evil Twins of Verilog Synthesisl(http://www.sunburst-/papers/CummingsSNUG1999Boston_FullParallelCase.pdf)© 2009 Altera CorporationCase 2) Fan-Out Signalsn Timing failures from fan-out are more often a matter of where than of how many-High fan-out in itself can force nodes to spread out or can result in slow routingl Increases routing delay and thus T datal Proximity is key in FPGAs & newer CPLDsn Typical problem cases:-Memory control signals-Clock enables© 2009 Altera CorporationCase 2) Fan-Out Signals (cont.)n How to verify-Locate high fan-out signals as possible causesl TimeQuest path analysisl Non-Global High Fan-Out Signals table in Compilation Report (Fitterfolder ÞResource section)-Use Chip Planner to verify locations of nodes© 2009 Altera CorporationCase 2) TimequestFanout of 4108 with interconnect delay of 4.429 ns InterconnectDelay© 2009 Altera CorporationCase 2) Possible Solutionsn Add multi-cycle assignments if design allowsn Put high fan-out signals on globals -Reduces delays-Subject to resource availability-Global insertion delay may make thisoption not validn Turn on physical synthesis-Duplicates logic to reduce fan-out-Longer compilation time & higher utilization Changes Launch & Latch EdgesChanges TdataChanges Tdata© 2009 Altera CorporationCase 2) Possible Solutions (cont.) n Use max_fanout constraints -Simple to do-Trial & error process, multiplecompilesn Manual duplication of logic-Reduces fan-out-Allows user to intelligently controlhow each copy is used in design-May be a time intensive process depending on how signal is distributed Changes Tdata Changes Tdata© 2009 Altera Corporationn Examine Fitter report for global & non-global signalsn Fixed number of global signals in a given devicen Fitter algorithms may auto-promote high fan-out signals (see fitter messages)© 2009 Altera Corporationn Manually promote signals with global assignmentn Thru TCL interfaceset_instance_assignment -name GLOBAL_SIGNAL ON -to inst1nThru GUI© 2009 Altera CorporationCase 2) Physical Synthesisn Options to try-Combinational physical synthesisl Performs duplication for combinatorial nodes -Register duplicationn See Quartus II handbook chapter “Netlist Optimizations & Physical Synthesis”-Explains features in detail-Lists caveats and exceptions© 2009 Altera CorporationCase 2) MAX_FANOUT Constraintn Controls the number of destinations so the fan-out count does not exceed the value specifiedn Thru TCL interfaceset_instance_assignment -name MAX_FANOUT <integer>-to <instance>nThru GUI© 2009 Altera CorporationCase 2) Manual Duplication n Two methods:1.Manual duplication in source code 2.Manual Logic Duplication assignmentnManual Logic Duplication Assignment -Duplicates the source node, and uses the new duplicated node to fanout to the destination node。

相关主题