The Art of FPGA Design - Post 15

16 Oct 2018

Counters, Adders and Accumulators

One of the most common operations encountered in digital hardware design, especially in digital signal processing applications, is addition. It covers a large group of fundamental building blocks, such as up/down binary counters, adders/subtractors, comparators and accumulators. The operand types can be IEEE.numeric_std SIGNED/UNSIGNED for integers, the user-defined SFIXED introduced earlier or the standard VHDL-2008 type of the same name (from ieee.fixed_pkg) for fixed point, or even floating point numbers. These are not limited to single or double precision formats and can even be arbitrary user-defined formats. As long as you do not add or subtract STD_LOGIC_VECTORs, which have no numeric interpretation of their own, anything goes.

Floating point is actually very rarely used in FPGA hardware design, whereas it is pretty common in software design. It is perfectly possible to build floating point operators from fabric resources, and there is very good, free IP available in the Vivado IP Catalog that you can use. However, floating point adders are about one order of magnitude less efficient in area and speed than equivalent fixed point implementations. It is almost always better to use higher precision fixed point representations, together with techniques like scaling and block floating point to handle high dynamic range and overflows. This way you can always achieve numerical performance similar to a floating point implementation while using fewer resources and running faster. This situation might change in the near future. For example, the just announced Xilinx Versal ACAP family includes thousands of hard primitives called DSP58 that can do floating point adds and multiplies. The AI Core sub-family also adds hundreds of AI Engines, each able to perform 8 single precision floating point multiply-accumulate operations at GHz speeds. If you are interested in the new Xilinx Versal ACAP family and its AI Engines, you can find more information here:

https://www.xilinx.com/products/silicon-devices/acap/versal.html

For now we will focus on fixed point adders; I will discuss floating point operators in a separate blog post.

It actually doesn't matter whether we are operating on integer or fixed point numbers, or whether they are signed or unsigned. The underlying fabric resource is the same in all cases: the CARRY8 primitive (CARRY4 in the older 7-series FPGA families), which we have already encountered in an earlier post in the context of wide AND, OR and XOR logic gates. A CARRY8 is simply an 8-bit version of the CARRY4, and it can even be split into two independent CARRY4s. Now we will look at the CARRY8 primitive in its main intended role: building counters, adders/subtractors, comparators and accumulators.

As a general rule, the expected utilization for an N-bit counter, adder/subtractor, comparator or accumulator is N LUT6s and N/8 CARRY8s (N/4 CARRY4s on 7-series), rounded up. Add to that N FFs, because counters and accumulators are sequential blocks by definition and it is always good design practice to pipeline adders and comparators as much as possible too. From a speed point of view, if the inputs are driven by other fabric FFs and the outputs are registered, all these blocks count as a single logic level (through the LUT6), because the CARRY8 carry chain is very fast. Unless you have very long carry chains, typically over 32 bits, the timing penalty of the cascaded carry logic is usually negligible. If it does become an issue, it is always possible to insert a FF in the middle of the carry chain and pipeline it. Speeds over 500MHz are always possible with these types of designs, and even 800MHz is not out of the question if you design carefully.
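To make the last point concrete, here is a minimal VHDL sketch of a carry chain split by a pipeline FF. It assumes an unsigned adder of generic width W (W >= 2). The lower half is added in the first stage and its carry out is registered. That registered carry then serves as the carry in of the upper half in the second stage, so the latency becomes two clock cycles. The entity and signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical W-bit unsigned adder whose carry chain is cut in half by a
-- pipeline register. Latency: 2 clock cycles. W is assumed to be >= 2.
entity split_carry_adder is
  generic (W : positive := 64);
  port (
    clk : in  std_logic;
    a   : in  unsigned(W-1 downto 0);
    b   : in  unsigned(W-1 downto 0);
    s   : out unsigned(W-1 downto 0)   -- (a + b) mod 2**W, two cycles later
  );
end entity;

architecture rtl of split_carry_adder is
  constant H : positive := W/2;               -- width of the lower half
  signal lo_sum : unsigned(H downto 0);       -- lower sum, carry out in bit H
  signal a_hi   : unsigned(W-H-1 downto 0);   -- upper operands, delayed one cycle
  signal b_hi   : unsigned(W-H-1 downto 0);
begin
  process(clk)
    variable cin : unsigned(0 downto 0);      -- registered carry as a 1-bit operand
  begin
    if rising_edge(clk) then
      -- Stage 1: lower half of the carry chain; its carry out lands in a FF
      lo_sum <= resize(a(H-1 downto 0), H+1) + resize(b(H-1 downto 0), H+1);
      a_hi   <= a(W-1 downto H);
      b_hi   <= b(W-1 downto H);
      -- Stage 2: upper half, using the registered carry as its carry in
      cin(0) := lo_sum(H);
      s <= (a_hi + b_hi + cin) & lo_sum(H-1 downto 0);
    end if;
  end process;
end architecture;
```

The upper operand halves are delayed by one cycle so that they meet their registered carry in the second stage. With W = 64, each stage has a 32-bit carry chain.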

At first the operation of the CARRY8 primitive may seem obscure, but it is easy to understand once you see the logic equations of a 1-bit full adder, which has three inputs, A, B and CI, and two outputs, O and CO:

  O  <= A xor B xor CI;                         -- a 3-input XOR
  CO <= (A and B) or (B and CI) or (CI and A);  -- a 3-input majority voter

You can build an N-bit adder with N such 1-bit full adders by connecting the CO output of one to the CI input of the next. The CI of the first 1-bit adder is the carry in input for the N-bit adder and the CO output of the last 1-bit adder is the carry out of the N-bit adder. Of course, adders of virtually any length can be created by cascading multiple CARRY8s or CARRY4s.
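As an illustration of this cascading, the sketch below writes an N-bit ripple-carry adder directly from the two full adder equations. A generate loop chains each CO into the next CI. It is purely illustrative and the entity name ripple_adder is made up. In practice a behavioral "+" is inferred onto the same carry chain.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative N-bit ripple-carry adder built from N copies of the
-- 1-bit full adder equations (3-input XOR and 3-input majority).
entity ripple_adder is
  generic (N : positive := 8);
  port (
    a, b : in  std_logic_vector(N-1 downto 0);
    ci   : in  std_logic;                       -- carry in of bit 0
    o    : out std_logic_vector(N-1 downto 0);  -- sum bits
    co   : out std_logic                        -- carry out of bit N-1
  );
end entity;

architecture structural of ripple_adder is
  signal c : std_logic_vector(N downto 0);      -- c(i) is the carry into bit i
begin
  c(0) <= ci;
  bits : for i in 0 to N-1 generate
    o(i)   <= a(i) xor b(i) xor c(i);                                 -- 3-input XOR
    c(i+1) <= (a(i) and b(i)) or (b(i) and c(i)) or (c(i) and a(i));  -- majority
  end generate;
  co <= c(N);
end architecture;
```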

In the CARRY8 diagram shown above, the XOR gates inside the CARRY8 primitive, together with an A xor B function implemented on the O6 output of the LUT6, create the 3-input XOR. The MUXCY implements the 3-input majority voter, provided A is connected to the MUXCY first input (the one selected when A xor B is 0), either directly or through the LUT6 via its O5 output. This is not immediately obvious, but it follows from how the MUXCY works: it passes CI when A xor B is 1 and passes A when A xor B is 0. If A equals B, the majority of A, B and CI is A itself. If A differs from B, one of them is 1 and the other is 0, so CI decides the majority. In both cases the MUXCY output is the majority function.
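The following behavioral VHDL model writes the carry logic in this MUXCY form. It is a hypothetical entity, not the Xilinx CARRY8 primitive. The LUT computes s = a xor b, the XOR gate forms the sum, and the mux selects ci when s = '1' and a otherwise. Checking it exhaustively against a numeric_std "+" is a quick way to convince yourself that the mux really is a majority voter.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Behavioral model of a MUXCY-style carry chain (not the CARRY8 primitive):
-- per bit, a LUT produces s = a xor b, an XOR gate forms the sum and a
-- 2:1 mux picks the next carry: ci when s = '1', otherwise a.
entity mux_carry_chain is
  generic (N : positive := 8);
  port (
    a, b : in  std_logic_vector(N-1 downto 0);
    ci   : in  std_logic;
    o    : out std_logic_vector(N-1 downto 0);
    co   : out std_logic
  );
end entity;

architecture behavioral of mux_carry_chain is
  signal s : std_logic_vector(N-1 downto 0);  -- LUT O6 outputs (a xor b)
  signal c : std_logic_vector(N downto 0);    -- carry chain nets
begin
  c(0) <= ci;
  bits : for i in 0 to N-1 generate
    s(i)   <= a(i) xor b(i);                    -- LUT: 2-input XOR
    o(i)   <= s(i) xor c(i);                    -- XOR gate in the carry logic
    -- MUXCY: if a = b the majority is a; if a /= b the carry in decides
    c(i+1) <= c(i) when s(i) = '1' else a(i);
  end generate;
  co <= c(N);
end architecture;
```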

For an up counter, adder or accumulator, the carry input is normally tied to zero. For a down counter, subtractor or comparator, we need the 2's complement of the B operand, which is easily obtained by inverting every bit of B and setting the carry input to one. The B inverters can be absorbed into the LUT6s, so subtraction costs the same as addition in both speed and area. Dynamic selection between add and subtract with an extra input ADDSUB (0 for add, 1 for subtract) is also free. Make O6 <= A xor B xor ADDSUB and connect ADDSUB to the carry input of the CARRY8 chain, which is exactly what a 2's complement operation does when ADDSUB is 1. For counters and accumulators, one of the two operands is the registered output of the adder itself. A comparator is essentially a subtractor for which we only look at the carry output or the sign of the result. The LUT6_2 still has a few inputs left, which can be used to implement a load function, a mux between two possible operands and so on. The FFs' clock enable and synchronous reset inputs are also available to enable and reset the counter or accumulator at no extra cost.
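Here is a behavioral sketch of the ADDSUB trick. It assumes unsigned operands of generic width N and a registered output, and all names are hypothetical. B is XORed with ADDSUB, ADDSUB is added as the carry in, and an extra result bit captures the carry out. When subtracting, that carry out is '1' exactly when a >= b, so the same block doubles as an unsigned comparator.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical registered N-bit unsigned adder/subtractor.
-- addsub = '0': r = a + b
-- addsub = '1': r = a - b, computed as a + (not b) + 1 (2's complement)
-- co is the carry out; when subtracting, co = '1' means a >= b.
entity add_sub is
  generic (N : positive := 16);
  port (
    clk    : in  std_logic;
    addsub : in  std_logic;
    a, b   : in  unsigned(N-1 downto 0);
    r      : out unsigned(N-1 downto 0);
    co     : out std_logic
  );
end entity;

architecture rtl of add_sub is
begin
  process(clk)
    variable mask : unsigned(N-1 downto 0);  -- every bit equal to addsub
    variable cin  : unsigned(0 downto 0);    -- carry in of the chain = addsub
    variable sum  : unsigned(N downto 0);    -- extra MSB holds the carry out
  begin
    if rising_edge(clk) then
      mask   := (others => addsub);
      cin(0) := addsub;
      -- b xor mask inverts b only when subtracting (O6 = a xor b xor addsub)
      sum := resize(a, N+1) + resize(b xor mask, N+1) + cin;
      r   <= sum(N-1 downto 0);
      co  <= sum(N);
    end if;
  end process;
end architecture;
```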

Rather than providing coding examples here for the countless variations of the blocks we have discussed so far, I want to point out a very valuable and often ignored resource: the Language Templates, found in the Tools menu of the Vivado IDE. For example, this is how a behavioral accumulator would be coded in VHDL:
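The template itself is not reproduced here. What follows is an independent minimal sketch along the same lines, not the verbatim Vivado template. It assumes a signed DW-bit input and an AW-bit accumulator, with AW > DW leaving headroom against overflow. All generic and port names are illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Minimal behavioral signed accumulator (illustrative, not the Vivado
-- template). rst has priority over ce, like the fabric FF control pins.
entity accumulator is
  generic (
    DW : positive := 16;   -- input width
    AW : positive := 24    -- accumulator width; AW > DW leaves headroom
  );
  port (
    clk : in  std_logic;
    rst : in  std_logic;   -- synchronous reset, active high
    ce  : in  std_logic;   -- clock enable
    d   : in  signed(DW-1 downto 0);
    q   : out signed(AW-1 downto 0)
  );
end entity;

architecture rtl of accumulator is
  signal acc : signed(AW-1 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        acc <= (others => '0');
      elsif ce = '1' then
        acc <= acc + resize(d, AW);   -- wraps modulo 2**AW, no overflow flag
      end if;
    end if;
  end process;
  q <= acc;
end architecture;
```

Giving the reset priority over the clock enable mirrors the fabric FFs' synchronous reset and clock enable. That is what should let both map onto the FFs at no extra cost, as noted above.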

There are of course many more such coding examples, for both VHDL and Verilog, covering not just behavioral inference but also device primitive instantiations, essential HDL syntax and so on. You can simply cut and paste coding templates directly into your design. This is an essential resource for beginners as well as advanced HDL designers: I have been doing FPGA design for almost 30 years now and I still use the Language Templates on a daily basis.

The main conclusion to take from this post is that the best way to build counters, adders, comparators and accumulators is to infer them from behavioral code, using the coding examples found in the Vivado IDE Language Templates, rather than through structural primitive instantiations or IP Catalog items. There is only one notable exception where primitive instantiations might be required. It will be the subject of the next blog post, where we will see how to get two adders for the price of one.
