These timing optimization techniques apply to RTL-based designs across Xilinx FPGA platforms, including the XC7Z007S-1CLG225C device on the MiniZed board.
Timing closure consists of the design meeting all timing requirements. It is easier to reach timing closure if you have the right HDL and constraints for synthesis. In addition, it is important to iterate through the synthesis stages with improved HDL, constraints, and synthesis options, as shown in the following figure.
To successfully close timing, follow these general guidelines:
When initially not meeting timing, evaluate timing throughout the flow.
Focus on worst negative slack (WNS) of each clock as the main way to improve total negative slack (TNS).
Review large worst hold slack (WHS) violations (<-1 ns) to identify missing or inappropriate constraints.
Revisit the trade-offs between design choices, constraints, and target architecture.
Know how to use the tool options and Xilinx® design constraints (XDC).
Be aware that the tools do not try to further improve timing (additional margin) after timing is met.
The following sections provide recommendations for reviewing the completeness and correctness of the timing constraints using methodology design rule checks (DRCs) and baselining, identifying the timing violation root causes, and addressing the violations using common techniques.
Checking for Valid Constraints:
Review the Check Timing section of the Timing Summary report to quickly assess the timing constraints coverage, including the following:
All active clock pins are reached by a clock definition.
All active path endpoints have a requirement with respect to a defined clock (setup/hold/recovery/removal).
All active input ports have an input delay constraint.
All active output ports have an output delay constraint.
Timing exceptions are correctly specified.
In addition to check_timing, the Methodology report (TIMING and XDC checks) flags timing constraints that can lead to inaccurate timing analysis and possible hardware malfunction. You must carefully review and address all reported issues.
Checking for Positive Timing Slacks:
The following timing metrics reflect the design timing score. Numbers must be positive to meet timing.
The Timing Summary report provides high-level information on the timing characteristics of the design compared to the constraints provided. Review the timing summary numbers during signoff:
Total Negative Slack (TNS)
The sum of the setup/recovery violations for each endpoint in the entire design or for a particular clock domain. The worst setup/recovery slack is the worst negative slack (WNS).
Total Hold Slack (THS)
The sum of the hold/removal violations for each endpoint in the entire design or for a particular clock domain. The worst hold/removal slack is the worst hold slack (WHS).
Total Pulse Width Slack (TPWS)
The sum of the violations for each clock pin in the entire design or a particular clock domain for the following checks:
Minimum low pulse width
Minimum high pulse width
Minimum period
Maximum period
Maximum skew (between two clock pins of a same leaf cell)
Worst Pulse Width Slack (WPWS)
The worst slack for all pulse width, period, or skew checks on any given clock pin.
The Total Slack (TNS, THS, or TPWS) reflects only the violations in the design. When all timing checks are met, the Total Slack is zero.
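The relationship between the worst and total slack metrics can be illustrated with a minimal Python sketch. The function name and the slack values are hypothetical and serve only as an illustration. The sketch assumes per-endpoint slacks are already available, for example transcribed from a timing report. The worst slack is the minimum over all endpoints, while the total sums only the negative slacks.

```python
def summarize_slacks(endpoint_slacks_ns):
    """Return (worst_slack, total_negative_slack) for endpoint slacks in ns."""
    if not endpoint_slacks_ns:
        raise ValueError("at least one endpoint slack is required")
    # Worst slack: the smallest slack over all endpoints (WNS or WHS).
    worst = min(endpoint_slacks_ns)
    # Total slack: only violating (negative) endpoints contribute (TNS or THS).
    total = sum(s for s in endpoint_slacks_ns if s < 0)
    return worst, total


# Hypothetical setup slacks (ns) for four endpoints in one clock domain.
setup_slacks = [0.5, -0.25, 1.0, -0.75]
wns, tns = summarize_slacks(setup_slacks)
print(f"WNS = {wns} ns, TNS = {tns} ns")
```

When every endpoint slack is positive, the same function returns a positive worst slack and a total of zero, which matches the statement that the Total Slack only reflects violations.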
The timing path report provides detailed information on how the slack is computed on any logical path for any timing check. In a fully constrained design, each path has one or several requirements that must all be met in order for the associated logic to function reliably.
The main checks covered by WNS, TNS, WHS, and THS are derived from the sequential cell functional requirements:
Setup time
The minimum time for which the new data must be stable before the next active clock edge in order to be safely captured.
Hold requirement
The amount of time the data must remain stable after an active clock edge to avoid capturing an undesired value.
Recovery time
The minimum time required between the time the asynchronous reset signal has toggled to its inactive state and the next active clock edge.
Removal time
The minimum time after an active clock edge before the asynchronous reset signal can be safely toggled to its inactive state.
A simple example is a path between two flip-flops that are connected to the same clock net.
After a timing clock is defined on the clock net, the timing analysis performs both setup and hold checks at the data pin of the destination flip-flop under the most pessimistic, but reasonable, operating conditions. The data transfer from the source flip-flop to the destination flip-flop occurs safely when both setup and hold slacks are positive.
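The setup and hold checks for such a flip-flop-to-flip-flop path can be sketched with a simplified slack model. This is an illustrative approximation, not the exact computation performed by the timing engine: it ignores clock uncertainty and clock pessimism removal, and all names and delay values below are hypothetical. Delays are integers in picoseconds.

```python
from dataclasses import dataclass


@dataclass
class FlopToFlopPath:
    """Hypothetical delays (ps) for a path between two flip-flops on one clock."""
    period: int             # clock period
    launch_clk_delay: int   # clock arrival at the source flip-flop
    capture_clk_delay: int  # clock arrival at the destination flip-flop
    clk_to_q: int           # source flip-flop clock-to-output delay
    data_delay_max: int     # slowest datapath delay (used for setup)
    data_delay_min: int     # fastest datapath delay (used for hold)
    t_setup: int            # destination flip-flop setup time
    t_hold: int             # destination flip-flop hold requirement


def setup_slack_ps(p: FlopToFlopPath) -> int:
    # Data must arrive before the next capture edge minus the setup time.
    required = p.period + p.capture_clk_delay - p.t_setup
    arrival = p.launch_clk_delay + p.clk_to_q + p.data_delay_max
    return required - arrival


def hold_slack_ps(p: FlopToFlopPath) -> int:
    # Data must not change until the hold time after the same capture edge.
    arrival = p.launch_clk_delay + p.clk_to_q + p.data_delay_min
    required = p.capture_clk_delay + p.t_hold
    return arrival - required


path = FlopToFlopPath(period=10000, launch_clk_delay=2000,
                      capture_clk_delay=2100, clk_to_q=400,
                      data_delay_max=6000, data_delay_min=300,
                      t_setup=100, t_hold=150)
print(setup_slack_ps(path), hold_slack_ps(path))  # 3600 450
```

Both values are positive in this example, so the transfer is safe under the assumed delays.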
Checking That Your Design is Properly Constrained
Before looking at the timing results to see if there are any violations, be sure that every synchronous endpoint in your design is properly constrained.
Run check_timing to identify unconstrained paths. You can run this command as a standalone command, but it is also part of report_timing_summary. In addition, report_timing_summary includes an Unconstrained Paths section where N logical paths without timing requirements are listed by the already defined source or destination timing clock. N is controlled by the -max_path option.
After the design is fully constrained, run the report_methodology command and review the TIMING and XDC checks to identify non-optimal constraints, which will likely make timing analysis not fully accurate and lead to timing margin variations in hardware. To identify and correct unrealistic target clock frequencies or setup path requirement, use the report_qor_assessment command.
Fixing Issues Flagged by check_timing
The check_timing Tcl command reports that something is missing or wrong in the timing definition. When reviewing and fixing the issues flagged by check_timing, focus on the most important checks first. Following are the checks listed from most important to least important.
No Clock and Unconstrained Internal Endpoints
These checks allow you to determine whether the internal paths in the design are completely constrained. You must ensure that the number of unconstrained internal endpoints is zero as part of the Static Timing Analysis signoff quality review.
Zero unconstrained internal endpoints indicate that all internal paths are constrained for timing analysis. However, the correct value of the constraints is not yet guaranteed.
Generated Clocks
Generated clocks are a normal part of a design. However, if a generated clock is derived from a master clock that is not part of the same clock tree, this can cause a serious problem. The timing engine cannot properly calculate the generated clock tree delay. This results in erroneous slack computation. In the worst case situation, the design meets timing according to the reports but does not work in hardware.
Loops and Latch Loops
A good design does not have any combinational loops, because timing loops are broken by the timing engine. The broken paths are not reported during timing analysis or evaluated during implementation. This can lead to incorrect behavior in hardware, even if the overall timing requirements are met.
No Input/Output Delays and Partial Input/Output Delays
All I/O ports must be properly constrained.
Multiple Clocks
Multiple clocks are usually acceptable. AMD recommends that you ensure that these clocks are expected to propagate on the same clock tree. You must also verify that the paths requirement between these clocks does not introduce tighter requirements than needed for the design to be functional in hardware.
If this is the case, you must use set_clock_groups or set_false_path between these clocks on these paths. Any time that you use timing exceptions, you must ensure that they affect only the intended paths.
Fixing Issues Flagged by report_methodology
The report_methodology command reports additional constraints and timing analysis issues, which you must carefully review before and after running the place and route tools. This section describes the main XDC and TIMING categories of checks, along with their relative impact on timing closure and hardware stability. You must focus on resolving the checks that impact timing closure first.
Methodology DRCs with Impact on Timing Closure
The DRCs shown in the following table flag design and timing constraint combinations that increase the stress on implementation tools, leading to impossible or inconsistent timing closure. These DRCs usually point to missing clock domain crossing (CDC) constraints, inappropriate clock trees, or inconsistent timing exception coverage due to logic replication. They must be addressed with highest priority.
Baselining the Design for Timing Closure:
Reviewing Clock Relationships:
You can view the relationship between clocks using the report_clock_interaction Tcl command. The report shows a matrix of source clocks and destination clocks. The color in each cell indicates the type of interaction between clocks, including any existing constraints between them. The following figure shows a sample clock interaction report.
Analyzing and Resolving Timing Violations
Reducing Net Delay Caused by Congestion
Device congestion can make timing closure difficult if critical paths are placed inside or next to a congested area, or if device utilization is high and the placed design is difficult to route. In many cases, congestion significantly increases router runtime. If a path shows routed delays that are longer than expected, analyze the congestion of the design and identify the best congestion alleviation technique.
Congestion Area and Level Definition
AMD device routing architecture comprises interconnect resources of various lengths in each direction: North, South, East, and West. A congested area is reported as the smallest square that covers adjacent interconnect tiles (INT_XnYm) or CLB tiles (CLE_M_XnYm) where interconnect resource utilization in a specific direction is close to or over 100%. The congestion level is the positive integer which corresponds to the side length of the square. The following figure shows the relative size of congestion areas on an AMD device versus clock regions.
Interconnect Congestion Level in the Device Window
The Interconnect Congestion Level metric highlights the largest contiguous area in which routing resources are overused. By default, this metric is based on estimation, which is similar to the congestion level after initial routing. Actual routing can also be displayed if routing exists. After placement or after routing, you can display this congestion metric by right-clicking in the Device window and selecting Metric > Interconnect Congestion Level.
The Interconnect Congestion Level metric provides a quick visual overview of any congestion hotspots in the device. The following figure shows a placed design with several congested areas. This metric is based on the current interconnect demand and availability with a threshold of 0.9 (that is, 90% routing usage). The range is 0.1 to 0.9.
Example of Congestion per CLB in the Device Window
Reducing Clock Skew:
To meet requirements such as high fanout clocks, short propagation delays, and low clock skew, AMD devices use dedicated routing resources to support the most common clocking schemes. Clock skew can severely reduce timing budget on high frequency clocks. Clock skew can also add excessive stress on implementation tools to meet both setup and hold when the device utilization is high.
The clock skew is typically less than 300 ps for intra-clock timing paths and less than 500 ps for timing paths between balanced synchronous clocks. When crossing resource columns, clock skew shows more variation, which is reflected in the timing slack and optimized by the implementation tools. For timing paths between unbalanced clock trees or with no common node, clock skew can be several nanoseconds, making timing closure almost impossible.
To reduce clock skew:
Review all clock relationships to ensure that only synchronous clock paths are timed and optimized.
Review the clock tree topologies and placement of timing paths impacted by higher clock skew than expected, as described in the following sections.
Identify the possible clock skew reduction techniques, as described in the following sections.
Using Intra-Clock Timing Paths:
Timing paths with the same source and destination clocks that are driven by the same clock buffer typically exhibit very low skew. This is because the common node is located on the dedicated clock network, close to the leaf clock pins, as shown in the following figure.
Limiting Synchronous Clock Domain Crossing Paths:
Timing paths between synchronous clocks driven by separate clock buffers exhibit higher skew, because the common node is located before the clock buffers. That is, the common node is farther from the leaf clock pins, resulting in higher pessimism in the timing analysis. The clock skew is even worse for timing paths between unbalanced clock trees due to the delay difference between the source and destination clock paths. Although positive skew helps with meeting setup time, it hurts hold time closure, and vice versa.
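The opposite effect of skew on setup and hold can be shown with a small worked example. The sketch below uses a simplified slack model that ignores clock uncertainty and pessimism removal, and it assumes skew is defined as the destination clock delay minus the source clock delay. The function name and all delay values (in picoseconds) are hypothetical.

```python
def slacks_for_skew(skew_ps, period=5000, clk_to_q=400,
                    data_delay_max=4300, data_delay_min=200,
                    t_setup=100, t_hold=150):
    """Return (setup_slack, hold_slack) in ps for a given clock skew.

    skew_ps is assumed to be destination clock delay minus source clock delay.
    """
    # Positive skew delays the capture edge, giving the data more time (setup).
    setup_slack = period + skew_ps - t_setup - (clk_to_q + data_delay_max)
    # The same delay lets fast data race the capture edge (hold).
    hold_slack = clk_to_q + data_delay_min - t_hold - skew_ps
    return setup_slack, hold_slack


for skew in (-300, 0, 500):
    setup, hold = slacks_for_skew(skew)
    print(f"skew={skew:5d} ps  setup slack={setup:5d} ps  hold slack={hold:5d} ps")
```

With these assumed numbers, a skew of -300 ps produces a setup violation (-100 ps) while hold passes comfortably, and a skew of +500 ps fixes setup but produces a hold violation (-50 ps).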
In the following figure, three clocks have several intra-clock and inter-clock paths. The common node of the two clocks driven by the MMCM is located at the output of the MMCM (red markers). The common node of the paths between the MMCM input clock and MMCM output clocks is located on the net before the MMCM (blue marker). For the paths between the MMCM input clock and MMCM output clocks, the clock skew can be especially high depending on the clkin_buf BUFGCE location and the MMCM compensation mode.