fpga - When is it neater to use VECTOR representations vs INTEGERs?

Wednesday, 23 November 2016

In the comment thread on an answer to the question "Wrong outputs in VHDL entity", it was stated:

"With integers you don't have control or access to the internal logic representation in the FPGA, while SLV lets you do tricks like utilizing the carry chain efficiently"

So, in what circumstances have you found it neater to code using a vector-of-bits representation than using integers, in order to access the internal representation? And what advantages did you measure (in terms of chip area, clock frequency, delay, or otherwise)?

Answer

I've written the code suggested by two other posters in both vector and integer form, taking care to have both versions operate in as similar a way as possible.

I compared the results in simulation and then synthesised using Synplify Pro targeting a Xilinx Spartan 6. The code samples below are pasted from working code, so you should be able to use them with your favourite synthesiser and see if it behaves the same.

Firstly, the downcounter, as suggested by David Kessner:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;


entity downcounter is
    generic (top : integer);
    port (clk, reset, enable : in  std_logic; 
         tick   : out std_logic);
end entity downcounter;

Vector architecture:

architecture vec of downcounter is
begin

    count: process (clk) is
        variable c : unsigned(32 downto 0);  -- make sure enough bits are allocated here, e.g. if "integer" becomes 64 bits wide
    begin  -- process count
        if rising_edge(clk) then  
            tick <= '0';
            if reset = '1' then
                c := to_unsigned(top-1, c'length);
            elsif enable = '1' then
                if c(c'high) = '1' then
                    tick <= '1';
                    c := to_unsigned(top-1, c'length);
                else
                    c := c - 1;
                end if;
            end if;
        end if;
    end process count;
end architecture vec;

Integer architecture

architecture int of downcounter is
begin
    count: process (clk) is
        variable c : integer;
    begin  -- process count
        if rising_edge(clk) then  
            tick <= '0';
            if reset = '1' then
                c := top-1;
            elsif enable = '1' then
                if c < 0 then
                    tick <= '1';
                    c := top-1;
                else
                    c := c - 1;
                end if;
            end if;
        end if;
    end process count;
end architecture int;

Results

Code-wise, the integer one seems preferable to me as it avoids the to_unsigned() calls. Otherwise, there is not much to choose between them.

Running it through Synplify Pro with top := 16#7fff_fffe# produces 66 LUTs for the vector version and 64 LUTs for the integer version. Both versions make heavy use of the carry chain, and both report clock speeds in excess of 280 MHz. The synthesiser is quite capable of making good use of the carry chain on its own: I verified visually with the RTL viewer that similar logic is produced for both. Obviously an up-counter with a comparator will be bigger, but that would again be the same with both integers and vectors.
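For anyone wanting to repeat the simulation comparison, a minimal self-checking testbench along these lines could be used. It is only a sketch, not the testbench used for the results above: the names downcounter_tb, top_c and done, the small value of top and the 10 ns clock period are all assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity downcounter_tb is
end entity downcounter_tb;

architecture sim of downcounter_tb is
    constant top_c : integer := 4;  -- assumed small value so ticks appear quickly
    signal clk      : std_logic := '0';
    signal reset    : std_logic := '1';
    signal enable   : std_logic := '0';
    signal tick_vec : std_logic;
    signal tick_int : std_logic;
    signal done     : boolean := false;
begin
    -- Free-running clock that stops once the stimulus has finished.
    clk <= not clk after 5 ns when not done;

    dut_vec : entity work.downcounter(vec)
        generic map (top => top_c)
        port map (clk => clk, reset => reset, enable => enable, tick => tick_vec);

    dut_int : entity work.downcounter(int)
        generic map (top => top_c)
        port map (clk => clk, reset => reset, enable => enable, tick => tick_int);

    stimulus : process is
        variable ticks : natural := 0;
    begin
        -- Hold reset for two clock edges so both counters start from top-1.
        wait until rising_edge(clk);
        wait until rising_edge(clk);
        reset  <= '0';
        enable <= '1';
        for i in 1 to 40 loop
            wait until rising_edge(clk);
            assert tick_vec = tick_int
                report "tick mismatch between vec and int architectures"
                severity error;
            if tick_vec = '1' then
                ticks := ticks + 1;
            end if;
        end loop;
        assert ticks > 0
            report "no tick was seen during the run"
            severity error;
        done <= true;
        wait;
    end process stimulus;
end architecture sim;
```

Both counters are reset before being enabled because the uninitialised variables start out differently in simulation (all 'U' for the vector, the lowest integer value for the integer), so they are only comparable after a reset.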

Secondly, the clock divider, as suggested by ajs410:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity clkdiv is
    port (clk, reset : in     std_logic;
        clk_2, clk_4, clk_8, clk_16  : buffer std_logic);
end entity clkdiv;

Vector architecture

architecture vec of clkdiv is
begin  -- architecture vec
    process (clk) is
        variable count : unsigned(4 downto 0);
    begin  -- process
        if rising_edge(clk) then  
            if reset = '1' then
                count  := (others => '0');
            else
                count := count + 1;
            end if;
        end if;

        clk_2 <= count(0);
        clk_4 <= count(1);
        clk_8 <= count(2);
        clk_16 <= count(3);
    end process;

end architecture vec;

Integer architecture

You have to jump through some hoops to avoid just using to_unsigned and then picking bits off, which would clearly produce the same effect as above:

architecture int of clkdiv is
begin
    process (clk) is
        variable count : integer := 0;
    begin  -- process
        if rising_edge(clk) then  
            if reset = '1' then
                count  := 0;
                clk_2  <= '0';
                clk_4  <= '0';
                clk_8  <= '0';
                clk_16 <= '0';
            else
                if count < 15 then
                    count := count + 1;
                else
                    count := 0;
                end if;
                clk_2 <= not clk_2;
                for c4 in 0 to 7 loop
                    if count = 2*c4+1 then
                        clk_4 <= not clk_4;
                    end if;
                end loop; 
                for c8 in 0 to 3 loop
                    if count = 4*c8+1 then
                        clk_8 <= not clk_8;
                    end if;
                end loop; 
                for c16 in 0 to 1 loop
                    if count = 8*c16+1 then
                        clk_16 <= not clk_16;
                    end if;
                end loop; 
            end if;
        end if;
    end process;
end architecture int;

Results

Code-wise, in this case, the vector version is clearly better!

In terms of synthesis results, for this small example, the integer version (as ajs410 predicted) does produce 3 extra LUTs as part of the comparators. I was too optimistic about the synthesiser, although it is working with an awfully obfuscated piece of code!
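For comparison, the route that the hoops were avoiding (keep an integer counter, convert it with to_unsigned and pick the bits off) might look roughly like the sketch below. The architecture name int_conv and the variable bits are hypothetical, and this version was not part of the synthesis runs, so no LUT count is claimed for it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical third architecture for the clkdiv entity shown earlier.
architecture int_conv of clkdiv is
begin
    process (clk) is
        variable count : integer range 0 to 15 := 0;
        variable bits  : unsigned(3 downto 0);
    begin
        if rising_edge(clk) then
            if reset = '1' then
                count := 0;
            else
                -- The wrap has to be explicit for an integer.
                count := (count + 1) mod 16;
            end if;
            -- Convert once, then pick the bits off as in the vector version.
            bits   := to_unsigned(count, bits'length);
            clk_2  <= bits(0);
            clk_4  <= bits(1);
            clk_8  <= bits(2);
            clk_16 <= bits(3);
        end if;
    end process;
end architecture int_conv;
```

The conversion is exactly what makes this a vector design in disguise, which is why it was not a useful point of comparison.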

Vectors are a clear win when you want arithmetic to wrap around (a counter can even be written as a single line):

vec <= vec + 1 when rising_edge(clk);

versus the integer equivalent:

if int < int'high then 
   int := int + 1;
else
   int := 0;
end if;

although at least it's clear from that code that the author intended a wrap-around.
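A self-contained way to try both wrap-around styles side by side in simulation is sketched below. The entity name wrap_demo, the subtype count_t and the 4-bit width are assumptions chosen to keep the run short. The integer limit is written as count_t'high, taken from a named subtype, rather than from the object itself.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity wrap_demo is
end entity wrap_demo;

architecture sim of wrap_demo is
    subtype count_t is integer range 0 to 15;  -- assumed 4-bit range
    signal clk  : std_logic := '0';
    signal vec  : unsigned(3 downto 0) := (others => '0');
    signal int  : count_t := 0;
    signal done : boolean := false;
begin
    clk <= not clk after 5 ns when not done;

    -- Vector form: 15 + 1 wraps to 0 without any extra code.
    vec <= vec + 1 when rising_edge(clk);

    -- Integer form: the wrap has to be written out explicitly.
    int_count : process (clk) is
    begin
        if rising_edge(clk) then
            if int < count_t'high then
                int <= int + 1;
            else
                int <= 0;
            end if;
        end if;
    end process int_count;

    -- Compare the two counters for a couple of full wrap-arounds.
    check : process is
    begin
        for i in 1 to 40 loop
            wait until falling_edge(clk);
            assert to_integer(vec) = int
                report "vector and integer counters disagree"
                severity error;
        end loop;
        done <= true;
        wait;
    end process check;
end architecture sim;
```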

Something I've not used in real code, but have pondered:

The "naturally-wrapping" feature can also be utilised for "computing through overflows". When you know that the output of a chain of additions, subtractions and multiplications is bounded, you don't have to store the high bits of the intermediate calculations, as (in two's complement) it will all come out "in the wash" by the time you get to the output. I'm told that H.L. Garner's paper "Theory of Computer Addition and Overflows" contains a proof of this, but it looked a bit too dense for me to make a quick assessment!

Using integers in this situation would cause simulation errors when they wrapped, even though we know they'll unwrap in the end.
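A small worked case of computing through an overflow, as a sketch only: the entity name overflow_demo and the operand values are made up for illustration. With 8-bit signed values, 100 + 100 overflows to -56, but adding -90 afterwards still gives the correct bounded result of 110.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity overflow_demo is
end entity overflow_demo;

architecture sim of overflow_demo is
begin
    check : process is
        variable a, b, c : signed(7 downto 0);
        variable partial : signed(7 downto 0);
        variable result  : signed(7 downto 0);
    begin
        a := to_signed(100, 8);
        b := to_signed(100, 8);
        c := to_signed(-90, 8);

        -- 100 + 100 = 200 does not fit in 8 signed bits, so it wraps to -56.
        partial := a + b;
        assert partial = to_signed(-56, 8)
            report "intermediate value did not wrap as expected"
            severity error;

        -- The final answer (110) does fit, and the wrap cancels out.
        result := partial + c;
        assert to_integer(result) = 110
            report "final result is wrong"
            severity error;
        wait;
    end process check;
end architecture sim;
```

The same sequence written with an integer constrained to the range -128 to 127 would stop at the intermediate sum with a range error in simulation.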

And as Philippe pointed out, when you need a number bigger than 2**31 you have no choice but to use vectors.
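As a rough illustration of that last point, the sketch below steps a 40-bit unsigned value past the largest value an integer is guaranteed to hold (2**31 - 1). The entity name wide_demo and the 40-bit width are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity wide_demo is
end entity wide_demo;

architecture sim of wide_demo is
begin
    check : process is
        variable big : unsigned(39 downto 0);
    begin
        -- Start at 2**31 - 1, the largest value a 32-bit integer can hold.
        big := to_unsigned(2147483647, big'length);

        -- One more step gives 2**31, which only the vector can represent.
        big := big + 1;
        assert big(31) = '1' and big(30 downto 0) = 0 and big(39 downto 32) = 0
            report "expected exactly 2**31"
            severity error;

        -- Shifting left by 8 gives 2**39, still within the 40-bit vector.
        big := shift_left(big, 8);
        assert big(39) = '1' and big(38 downto 0) = 0
            report "expected exactly 2**39"
            severity error;
        wait;
    end process check;
end architecture sim;
```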
