Sunday, 18 February 2018

ASIC timing constraints via SDC: How to correctly specify a multiplexed clock?


Introduction


I have found multiple sources on the internet and in training classes about how to write timing constraints correctly in SDC format, and they are sometimes conflicting or incomplete. So I'd like to ask the EE community for help with some general clock-generating structures I have encountered.


I know that there are differences in how one would implement a given function on an ASIC or an FPGA (I have worked with both). However, I think there should be a general, correct way to constrain the timing of a given structure, independent of the underlying technology. Please let me know if I'm wrong about that.


There are also differences between the implementation and timing-analysis tools of different vendors, even though Synopsys offers the source code of an SDC parser. I hope, though, that these are mainly syntax issues that can be looked up in the documentation.


Question


This is about the following clock multiplexer structure. It is part of the clkgen module, which in turn is part of a larger design:

[Figure: Clock multiplexer schematic]



The ext_clk input is assumed to be generated outside the design and enters through an input pin. The clk0 and clk4 signals, by contrast, are generated and also used by the clkgen module itself (see my related ripple clock question for details), and they have associated clock constraints named baseclk and div4clk, respectively.


The question is how to specify the constraints such that the timing analyser



  1. Treats cpu_clk as a multiplexed clock that can be any one of the source clocks (fast_clk, slow_clk or ext_clk), taking the delays through the different AND and OR gates into account, and

  2. At the same time does not cut the paths between the source clocks, which are still used elsewhere in the design.


While the simplest case of an on-chip clock multiplexer seems to require just the set_clock_groups SDC statement:


set_clock_groups -logically_exclusive -group {baseclk} -group {div4clk} -group {ext_clk}

...in the given structure this is complicated by the fact that clk0 (via the fast_clk output) and clk4 (via slow_clk) are still used in the design, even when cpu_clk is configured to be ext_clk (that is, when only use_ext is asserted).



As described here, the set_clock_groups command as above would cause the following:



This command is equivalent to calling set_false_path from each clock in every group to each clock in every other group and vice versa



...which would be incorrect, since the other clocks are still used elsewhere.
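To make the problem concrete, here is a sketch of what the single set_clock_groups line amounts to for the three clocks in this design, following the quoted description. It expands into pairwise false paths in both directions. This is an illustration of that description and not output from any particular tool.

```tcl
# Expansion of:
#   set_clock_groups -logically_exclusive -group {baseclk} -group {div4clk} -group {ext_clk}
# into false paths from each clock to every clock in the other groups, both directions.
set_false_path -from [get_clocks baseclk] -to [get_clocks div4clk]
set_false_path -from [get_clocks div4clk] -to [get_clocks baseclk]
set_false_path -from [get_clocks baseclk] -to [get_clocks ext_clk]
set_false_path -from [get_clocks ext_clk] -to [get_clocks baseclk]
set_false_path -from [get_clocks div4clk] -to [get_clocks ext_clk]
set_false_path -from [get_clocks ext_clk] -to [get_clocks div4clk]
```

These exceptions name the master clocks themselves. As a result, they also remove checks between, for example, a register clocked by clk0 and a register clocked by clk4 elsewhere in the design, which is exactly the path cutting the question wants to avoid.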


Additional Information


The use_clk0, use_clk4 and use_ext inputs are generated in such a way that at most one of them is high at any given time. This structure could also be used to stop all clocks by driving all use_* inputs low, but the focus of this question is on its clock multiplexing property.
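One way to exploit the one-hot select property is to time each multiplexer setting as a separate mode with set_case_analysis. This is a different technique from the single merged analysis discussed in this post, and it is shown here only as a sketch. It assumes that use_clk0, use_clk4 and use_ext are ports of the block being constrained, and it covers the mode in which ext_clk is selected.

```tcl
# Hypothetical mode "ext": only use_ext is asserted (one-hot select).
# A constant 0 on the enables of the clk0/clk4 AND gates blocks those clocks
# from reaching cpu_clk, while baseclk and div4clk are still timed on every
# other path where they are used.
set_case_analysis 1 [get_ports use_ext]
set_case_analysis 0 [get_ports use_clk0]
set_case_analysis 0 [get_ports use_clk4]
```

The drawback is that each select setting needs its own constraint set and analysis run. A single constraint set that covers all three settings avoids this.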


The X2 instance (a simple buffer) in the schematic is just a placeholder. It highlights that automatic place-and-route tools are usually free to insert buffers anywhere, for example between the and_cpu_1/z and or_cpu1/in2 pins. Ideally, the timing constraints should be unaffected by such insertions.



Answer



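Define divide-by-1 generated clocks on the outputs of the and_* gates and declare them physically exclusive. Cadence RTL Compiler handles this situation correctly by generating three timing paths for registers clocked by cpu_clk, one per source clock. Registers driven directly by clk0, clk4 and ext_clk keep their own timing arcs. This is because only the generated clocks are placed in exclusive groups, while the master clocks are not, so paths between the masters are still timed. Because the clocks are defined at the AND-gate outputs, a buffer inserted downstream of them (such as X2) simply propagates them.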



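# Divide-by-1 generated clocks on the AND-gate outputs, one per mux input.
# Assumes clk0, clk4 and ext_clk are ports of the block being constrained and
# that the AND output pins are named .../Y in the netlist (the question refers
# to this pin as and_cpu_1/z); adjust the names to match the actual netlist.
create_generated_clock -source [get_ports clk0] \
    -divide_by 1 -name and_clk0 [get_pins and_cpu_1/Y]

create_generated_clock -source [get_ports clk4] \
    -divide_by 1 -name and_clk4 [get_pins and_cpu_2/Y]

create_generated_clock -source [get_ports ext_clk] \
    -divide_by 1 -name and_clk_ext [get_pins and_cpu_ext1/Y]

# Only the generated clocks are made exclusive; the master clocks are left out,
# so paths between them elsewhere in the design are still timed.
# Note: a blank line must not appear inside a backslash-continued command.
set_clock_groups -physically_exclusive \
    -group [get_clocks and_clk0] \
    -group [get_clocks and_clk4] \
    -group [get_clocks and_clk_ext]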
