fpga - VHDL: Using '*' operator when implementing multipliers in design

Thursday, 5 March 2015

Present-day FPGAs have built-in DSP blocks, and the latest FPGAs even have built-in IEEE-754 compliant floating-point units.

It is possible to create a DSP entity/module in a GUI by selecting the required parameters, and then instantiate it in the design.

When do we need to micromanage a design by instantiating actual DSP blocks, and when can we just write the '*' operator in the code and let the synthesis tool handle the low-level details? Which is better?

There are many different types of binary multiplication algorithms. Now that we have built-in DSP blocks on silicon, and even built-in floating-point multipliers, does this mean that all those algorithms have effectively become obsolete?

Answer

I've done this a few times myself.

Generally, the design tools will choose between a fabric implementation and a DSP slice based on the synthesis settings.

For instance, in Xilinx ISE, under the synthesis process settings, HDL Options, there is a setting "-use_dsp48" with the options: Auto, AutoMax, Yes, No. As you can imagine, this controls how hard the tools try to use DSP slices. I once had a problem where I multiplied an integer by 3, which inferred a DSP slice - except I was already manually instantiating every DSP slice in the chip, so synthesis failed! I changed the setting to No, and the multiply went into fabric instead.
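As a rough sketch of the multiply-by-3 case, the same control can also be applied to a single signal rather than the whole project. The entity name, port names and widths below are made up for illustration, and the use_dsp48 attribute is the XST (ISE) constraint as I understand it; other synthesis tools use different names, so check your tool's constraints guide before relying on it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: multiplies a 16-bit value by the constant 3.
entity times_three is
  port (
    x : in  unsigned(15 downto 0);
    y : out unsigned(17 downto 0)
  );
end entity times_three;

architecture rtl of times_three is
  signal prod : unsigned(17 downto 0);

  -- Assumption: XST-specific synthesis attribute. "no" asks the tool to
  -- build this multiplier in fabric instead of a DSP48 slice.
  attribute use_dsp48 : string;
  attribute use_dsp48 of prod : signal is "no";
begin
  -- 16-bit operand times a 2-bit constant gives an 18-bit result.
  prod <= x * to_unsigned(3, 2);
  y    <= prod;
end architecture rtl;
```

Scoping the attribute to one signal leaves the project-wide setting on Auto, so other multipliers can still be inferred into DSP slices if any are free.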

This is probably a good rule of thumb (which I just made up): if your design is clocked at less than 50 MHz, and you expect to use less than 50% of the DSP slices in the chip, then just use the *, +, and - operators. This will infer DSP slices with no pipeline registers, which really limits the top speed. (I have no idea what happens when you use division.)
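For the slow, low-utilisation case, a minimal sketch of what "just use the operator" looks like is below. The entity name and the 18-bit widths are my own choices for illustration (18 x 18 matches the multiplier in a Spartan 6 DSP48A1); whether the tool actually maps it to a DSP slice depends on the synthesis settings described above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: purely combinational signed multiplier.
entity mult_inferred is
  port (
    a : in  signed(17 downto 0);
    b : in  signed(17 downto 0);
    p : out signed(35 downto 0)
  );
end entity mult_inferred;

architecture rtl of mult_inferred is
begin
  -- numeric_std "*" returns a'length + b'length bits, so no resize is needed.
  -- There are no registers here, so the whole multiply sits in one clock period.
  p <= a * b;
end architecture rtl;
```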

However, if it looks like you're going to run the slices closer to the max speed of the DSP slice (333 MHz for Spartan 6 normal speed grade), or you're going to use all of the slices, you should instantiate them manually.
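Before taking full manual control, it may be worth trying a registered version of the inferred multiplier. The sketch below is an assumption about what helps, not a guarantee: the entity name, widths and the choice of three register stages are mine, and whether the synthesis tool absorbs these registers into the DSP slice's internal pipeline stages depends on the tool, the part and the settings. Check the synthesis report to see what was actually built.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: signed multiplier with input, product and
-- output registers (three clock cycles of latency).
entity mult_pipelined is
  port (
    clk : in  std_logic;
    a   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    p   : out signed(35 downto 0)
  );
end entity mult_pipelined;

architecture rtl of mult_pipelined is
  signal a_reg : signed(17 downto 0) := (others => '0');
  signal b_reg : signed(17 downto 0) := (others => '0');
  signal m_reg : signed(35 downto 0) := (others => '0');
  signal p_reg : signed(35 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: register the operands.
      a_reg <= a;
      b_reg <= b;
      -- Stage 2: register the product.
      m_reg <= a_reg * b_reg;
      -- Stage 3: register the output.
      p_reg <= m_reg;
    end if;
  end process;

  p <= p_reg;
end architecture rtl;
```

The registers have no reset on purpose: as far as I know, plain registers without extra control logic are the easiest for a tool to fold into a hard block.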

If you do take manual control of the slices, you have two options.

Option 1: manually use the raw DSP instantiation template.

Option 2: use an IP block from Xilinx Core Generator. (I would use this option. Along the way you will learn all about Core Generator, which will help in the future.)

Before you do either of these, read the first couple of pages of the DSP slice user guide. In the case of the Spartan 6 (DSP48A1), that would be Xilinx doc UG389: http://www.xilinx.com/support/documentation/user_guides/ug389.pdf

Consider the Core Generator option first. I usually create a test project in Core Generator for the part I'm working with, where I create any number of IP blocks just to learn the system. Then, when I'm ready to add one to my design in ISE, I right-click in the Design Hierarchy, click New Source, and select "IP (CORE Generator & Architecture Wizard)" so that I can edit and regenerate the block directly from my project.

In Core Generator, take a look at the different IP blocks you can choose from - there are a few dozen, most of which are pretty cool.

The Multiplier core is what you should look at first. Check out every page of its configuration dialog, and click the datasheet button. The important parts are the integer bit widths, the pipeline stages (latency) and any control signals. Setting only what you need produces the simplest possible block, because it takes away all the ports you don't use.

When I was building a 5 by 3 order IIR filter last year, I had to use the manual instantiation template since I was building a very custom implementation, with 2 DSP slices clocked 4x faster than the sample rate. It was a total pain.
