How can a processing delay be explicitly declared in VHDL?

Thursday, 30 November 2017

Taking as an example the following simple multiplication:

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity multiplier is

  port(
    clk : in std_logic;
    a   : in std_logic_vector(15 downto 0);
    b   : in std_logic_vector(15 downto 0);
    p   : out std_logic_vector(31 downto 0)
    -- ready : out std_logic;
  );
end multiplier;

architecture IMP of multiplier is


begin
  process (clk)
  begin
    if clk'event and clk = '1' then
      p <= a * b;
    -- ready <= '1';
    -- else
    -- ready <= '0';
    end if;

  end process;
end IMP;

At the rising edge of the input clock, 'a' and 'b' are assumed to be valid, and the product 'p' is computed. For more complex and slower processes, what are the options in VHDL for generating an 'operation finished' output?

I've added the commented-out ready signal only to illustrate what I'm trying to achieve. I'm assuming that the process could be complicated enough to take longer than one clock period.

Answer

Short answer: you can't.

First off, remove the std_logic_unsigned and std_logic_arith packages. They're non-standard (credit: https://electronics.stackexchange.com/a/188677/148777). Use numeric_std instead.

I'm not even sure multiplication (*) is defined for std_logic_vector. If it is, it shouldn't be. It should be defined only for unsigned or signed types, or similar.
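As a rough sketch of what that change looks like, here is the question's entity rewritten with numeric_std. It assumes the operands are unsigned (the question doesn't say); for two's-complement inputs, signed() would replace unsigned(). The architecture name rtl is just a placeholder.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity multiplier is
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(15 downto 0);
    b   : in  std_logic_vector(15 downto 0);
    p   : out std_logic_vector(31 downto 0)
  );
end entity multiplier;

architecture rtl of multiplier is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Assumption: a and b hold unsigned values.
      -- numeric_std "*" on two 16-bit unsigned operands yields 32 bits.
      p <= std_logic_vector(unsigned(a) * unsigned(b));
    end if;
  end process;
end architecture rtl;
```

The casts make the intended number interpretation explicit, which is the main point of moving to numeric_std.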

In the end, VHDL has no idea how long a * b will take. Maybe the synthesis tool knows your hardware has a multiplication unit and uses that. Maybe it's implemented as a big messy combinational logic block.

In a synchronous design (like your clocked process here), you don't care exactly how long an operation takes, as long as it's finished in time for the next register stage to read it. That is, the propagation delay in the source register, plus the propagation delay in the combinational logic (the multiplication), plus the wire routing delay, plus the setup time of the destination register, must be less than one clock period.

You ensure this constraint is met by running static timing analysis (STA). You tell the tool that your clock period is (e.g.) 8 ns, and it checks that your multiplication implementation (plus the delays mentioned above) is shorter than that. If it passes, good! If not, you have issues.

You can tell the timing tool that you know your multiplication will take "n" clock cycles (a multicycle path constraint). But if you don't know how it's synthesized, you can't know what "n" should be. You could try synthesizing, see how long the operation takes, and count at least that many clock cycles before reading the output.
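The "count that many clock cycles" idea could look something like the sketch below. Everything here is hypothetical: the entity name wait_cycles, the start and done ports, and the generic N_CYCLES (which you would set from your own synthesis and timing results). It also assumes the operands are held stable while the counter runs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity wait_cycles is
  generic (
    -- Hypothetical cycle budget; must be at least 2 in this sketch.
    N_CYCLES : integer range 2 to 64 := 3
  );
  port (
    clk   : in  std_logic;
    start : in  std_logic;  -- pulse for one cycle when the operands are applied
    done  : out std_logic   -- one-cycle pulse once N_CYCLES have elapsed
  );
end entity wait_cycles;

architecture rtl of wait_cycles is
  signal count : integer range 0 to N_CYCLES := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';                 -- default: no pulse
      if start = '1' then
        count <= N_CYCLES - 1;     -- (re)start the countdown
      elsif count > 0 then
        count <= count - 1;
        if count = 1 then
          done <= '1';             -- readable N_CYCLES edges after start was sampled
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

Note that this only counts cycles. It does not prove the logic has settled, so the matching constraint still has to be given to the STA tool.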

Or, you can write the multiplication logic yourself. When doing that, you can add pipeline stages to ensure each sub-operation is done in one clock cycle, and you know that it will take one clock cycle times the number of pipeline stages to complete. As a bonus, if you design the logic yourself you can add an "output valid" signal. (Hint: this is a good approach.)
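A minimal sketch of that approach, assuming unsigned 16-bit operands and a two-stage pipeline: stage 1 computes four 8x8 partial products, stage 2 shifts and adds them. The entity name and the in_valid/out_valid ports are illustrative, not taken from the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pipelined_multiplier is
  port (
    clk       : in  std_logic;
    in_valid  : in  std_logic;              -- high when a and b are valid
    a, b      : in  unsigned(15 downto 0);
    p         : out unsigned(31 downto 0);
    out_valid : out std_logic               -- high when p is valid
  );
end entity pipelined_multiplier;

architecture rtl of pipelined_multiplier is
  -- Partial products: l = low byte, h = high byte (first letter is a, second is b).
  signal ll, lh, hl, hh : unsigned(15 downto 0) := (others => '0');
  signal valid_s1       : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: four 8x8 multiplications.
      ll <= a(7 downto 0)  * b(7 downto 0);
      lh <= a(7 downto 0)  * b(15 downto 8);
      hl <= a(15 downto 8) * b(7 downto 0);
      hh <= a(15 downto 8) * b(15 downto 8);
      valid_s1 <= in_valid;

      -- Stage 2: a*b = hh*2^16 + (lh + hl)*2^8 + ll.
      p <= shift_left(resize(hh, 32), 16)
         + shift_left(resize(lh, 32), 8)
         + shift_left(resize(hl, 32), 8)
         + resize(ll, 32);
      out_valid <= valid_s1;
    end if;
  end process;
end architecture rtl;
```

The valid bit travels through the same number of registers as the data, so out_valid lines up with p at a fixed latency of two clock cycles regardless of how the tool implements each stage. Whether two stages are enough for a given clock still has to be confirmed with STA.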

Or, you can brute-force the approach and slow down your clock.

In summary, there's no simple solution. If you rely on the synthesis tool's implementation of a multiplication operation, you can't know how long it takes until synthesis is done. And if you don't run STA, or don't tell the STA tool that you're relying on the operation taking "n" clock cycles, you won't notice when a subsequent synthesis run gives you different results.
