The Art of FPGA Design - Post 10

8 Sep 2018

Instantiating LUT6 Primitives

In the previous post we saw how to instantiate FPGA primitives, SRL16s in that case. The role of the synthesis tool is to take behavioral HDL code and translate it into a netlist of fundamental FPGA building blocks called primitives. This is very much like the software design flow, where a C compiler takes C code and produces machine code that a processor can execute. In the same way that a C compiler lets you embed assembly code in your C program, the HDL synthesis tool lets you use FPGA primitives in your VHDL code. But as in the software case, just because you can does not mean that you always should. There are significant drawbacks to lower-level design flows, and there should be compelling reasons for a designer to go down that path. However, even if you do not end up using such low-level design flows every day, understanding what instructions your microprocessor executes, or what the basic building blocks of your FPGA are, can lead to much better programs and designs, respectively. Designing for the underlying FPGA architecture is the key to achieving better results as a hardware designer.

At its most fundamental level, every hardware design is made out of combinatorial and sequential elements. The basic sequential element is the D FF (flip-flop), a clocked element that can store one bit of information. A modern FPGA has tens to hundreds of thousands of them. The synthesis tool has no problem inferring them from behavioral code, and although there are a number of primitives called FDCE, FDPE, FDRE and FDSE, with clock enables and various kinds of synchronous/asynchronous sets/resets, there is very little need to instantiate them - it is as easy, if not easier, to describe them behaviorally, and you always get the synthesis result you would normally expect.
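As a minimal sketch of that behavioral style, assuming a hypothetical entity FF_EXAMPLES with illustrative port names: the first process describes a flip-flop with synchronous reset and clock enable, which synthesis would normally map to FDRE, and the second one has an asynchronous clear and would normally map to FDCE.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical example entity, names are illustrative only
entity FF_EXAMPLES is
  port(CLK,CE,RST,CLR,D:in STD_LOGIC;
       Q_SYNC,Q_ASYNC:out STD_LOGIC);
end entity;

architecture BEHAVIORAL of FF_EXAMPLES is
begin
  -- Synchronous reset with clock enable, typically inferred as FDRE
  process(CLK)
  begin
    if rising_edge(CLK) then
      if RST='1' then
        Q_SYNC<='0';
      elsif CE='1' then
        Q_SYNC<=D;
      end if;
    end if;
  end process;

  -- Asynchronous clear with clock enable, typically inferred as FDCE
  process(CLK,CLR)
  begin
    if CLR='1' then
      Q_ASYNC<='0';
    elsif rising_edge(CLK) then
      if CE='1' then
        Q_ASYNC<=D;
      end if;
    end if;
  end process;
end architecture;
```

Replacing the '0' reset values with '1' would give the set/preset variants, FDSE and FDPE respectively.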

On the other hand, combinatorial logic can in theory be as complex as you want: virtually any boolean logic function of any number of boolean input arguments. Xilinx FPGAs, however, have only one type of primitive for combinatorial logic, a 6-input LUT (look-up table), which comes in two flavors, LUT6 and LUT6_2. Virtually any logic function of any number of inputs can be obtained by connecting two or more LUT6s in a tree-like structure using programmable routing resources. And this is what an FPGA is in a nutshell.

Hierarchical structures in an HDL design are a fundamental idea - you can decompose a more complex thing into smaller pieces, then express these as sets of even smaller and simpler pieces recursively, as deep as you want. Primitive instantiations are no different, except that they are leaf nodes in the tree-like design hierarchy: there is no need to further describe what they are made of or what their functionality is. In that sense the LUT6 primitive is such a simple building block, a combinatorial function of up to 6 inputs. You use it in a design by instantiating it, which means specifying the signals connected to its 6 inputs and its output, and then choosing which one of the 2⁶⁴ possible logic functions of 6 inputs you want through a 64-bit generic parameter called INIT. The component definition of a LUT6 looks like this:

  component LUT6
    generic(INIT:BIT_VECTOR(63 downto 0));
    port(I0,I1,I2,I3,I4,I5:in STD_LOGIC;
         O:out STD_LOGIC); 
  end component;

And here is where a real problem occurs: it is not trivial by any means to come up with the INIT value that corresponds to the desired logic function. For example, a 6-input XOR function has an INIT value of 64x"6996966996696996" and a 6-input AND is 64x"8000000000000000", but this is as far as I can go creating INIT values by hand from the boolean logic equation. Anything more complicated than this is very tedious and error-prone. This is actually the reverse of the Karnaugh map problem of creating a boolean logic equation out of a binary vector or matrix, which is not a trivial one either:

https://en.wikipedia.org/wiki/Karnaugh_map
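To make the meaning of INIT concrete, here is a minimal sketch that builds the 6-input XOR value one bit at a time. The package and function names (LUT_INIT_PKG, XOR6_INIT) are hypothetical and used only for illustration. Bit ADDR of INIT is the LUT output when the inputs I5 down to I0, read as a binary number, equal ADDR.

```vhdl
package LUT_INIT_PKG is
  -- Hypothetical helper: INIT of a 6-input XOR, built bit by bit
  function XOR6_INIT return BIT_VECTOR;
end package;

package body LUT_INIT_PKG is
  function XOR6_INIT return BIT_VECTOR is
    variable INIT:BIT_VECTOR(63 downto 0);
    variable ONES:NATURAL;
  begin
    for ADDR in 0 to 63 loop          -- ADDR is the binary value of I5..I0
      ONES:=0;
      for B in 0 to 5 loop            -- count the inputs that are '1'
        ONES:=ONES+(ADDR/2**B) mod 2;
      end loop;
      if ONES mod 2=1 then            -- XOR is '1' for an odd number of ones
        INIT(ADDR):='1';
      else
        INIT(ADDR):='0';
      end if;
    end loop;
    return INIT;
  end function;
end package body;
```

The result should match the hand-written X"6996966996696996" quoted above. It works, but writing a loop like this for every new function is exactly the tedium we would like to avoid.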

The good news is that there is a very neat coding trick that solves this problem in a very simple and elegant way, which is the real reason for this post. Let's say, as an example, that we want to use a LUT6 to implement an AOI logic gate: three 2-input ANDs, followed by a 3-input OR, and then inverted. The trick is to define 6 "magic" constants and then let the synthesis tool build the INIT value for us, starting from the logic equation that we need. We could have named the six constants any other way, but by using the same names as the LUT6 input ports the implementation becomes really obvious:

  constant I0:BIT_VECTOR(63 downto 0):=X"AAAAAAAAAAAAAAAA";
  constant I1:BIT_VECTOR(63 downto 0):=X"CCCCCCCCCCCCCCCC";
  constant I2:BIT_VECTOR(63 downto 0):=X"F0F0F0F0F0F0F0F0";
  constant I3:BIT_VECTOR(63 downto 0):=X"FF00FF00FF00FF00";
  constant I4:BIT_VECTOR(63 downto 0):=X"FFFF0000FFFF0000";
  constant I5:BIT_VECTOR(63 downto 0):=X"FFFFFFFF00000000"; 
  signal I:STD_LOGIC_VECTOR(5 downto 0);
  signal O:STD_LOGIC; 
begin 
  aoi:LUT6 generic map(INIT=>not((I0 and I1) or (I2 and I3) or (I4 and I5)))
           port map(I0=>I(0),I1=>I(1),I2=>I(2),I3=>I(3),I4=>I(4),I5=>I(5),O=>O);
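Why this works can be sketched with a small helper function; the names LUT_MASK_PKG and LUT_INPUT_MASK are hypothetical and not part of any Xilinx library. Each magic constant is simply the column of input N in the 64-row truth table, so applying bitwise operators to these columns evaluates the logic equation for all 64 input combinations at once.

```vhdl
package LUT_MASK_PKG is
  -- Hypothetical helper: truth-table column of LUT input N (0 to 5)
  function LUT_INPUT_MASK(N:NATURAL) return BIT_VECTOR;
end package;

package body LUT_MASK_PKG is
  function LUT_INPUT_MASK(N:NATURAL) return BIT_VECTOR is
    variable MASK:BIT_VECTOR(63 downto 0);
  begin
    for ADDR in 0 to 63 loop
      -- Bit ADDR of the mask is bit N of ADDR, that is, the value of
      -- input N in row ADDR of the truth table
      if (ADDR/2**N) mod 2=1 then
        MASK(ADDR):='1';
      else
        MASK(ADDR):='0';
      end if;
    end loop;
    return MASK;
  end function;
end package body;
```

With this helper, LUT_INPUT_MASK(0) evaluates to X"AAAAAAAAAAAAAAAA" and LUT_INPUT_MASK(5) to X"FFFFFFFF00000000", the same values as the hand-written constants I0 and I5.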

The AOI gate is of course just a silly example; after all, you can achieve exactly the same thing in VHDL in much simpler behavioral ways, like this:

  O<='0' when I(1 downto 0)="11" or I(3 downto 2)="11" or I(5 downto 4)="11" else '1';

or even:

  O<=not((I(0) and I(1)) or (I(2) and I(3)) or (I(4) and I(5)));

The point here is not to do the LUT mapping by hand through LUT6 primitive instantiations; that is actually the synthesis tool's job. But when, for some reason, synthesis does not produce what we expect or want, it is good to know how to achieve that. Another advantage of LUT primitive instantiation is that you can manually place such a primitive at a particular XmYn slice location using the LOC or RLOC attributes, and even inside a slice using the BEL attribute.
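A hedged sketch of what such manual placement could look like for the AOI gate, wrapped in a hypothetical entity AOI_PLACED. The site "SLICE_X0Y0" and the BEL name "A6LUT" are placeholders; valid values depend on the target device and should be checked against the tool documentation.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.VCOMPONENTS.all;   -- Xilinx primitive component declarations

-- Hypothetical example entity, names are illustrative only
entity AOI_PLACED is
  port(I:in STD_LOGIC_VECTOR(5 downto 0);
       O:out STD_LOGIC);
end entity;

architecture STRUCTURAL of AOI_PLACED is
  constant I0:BIT_VECTOR(63 downto 0):=X"AAAAAAAAAAAAAAAA";
  constant I1:BIT_VECTOR(63 downto 0):=X"CCCCCCCCCCCCCCCC";
  constant I2:BIT_VECTOR(63 downto 0):=X"F0F0F0F0F0F0F0F0";
  constant I3:BIT_VECTOR(63 downto 0):=X"FF00FF00FF00FF00";
  constant I4:BIT_VECTOR(63 downto 0):=X"FFFF0000FFFF0000";
  constant I5:BIT_VECTOR(63 downto 0):=X"FFFFFFFF00000000";
  attribute LOC:STRING;
  attribute BEL:STRING;
  -- Placeholder values, assumed for illustration only
  attribute LOC of aoi:label is "SLICE_X0Y0";
  attribute BEL of aoi:label is "A6LUT";
begin
  aoi:LUT6 generic map(INIT=>not((I0 and I1) or (I2 and I3) or (I4 and I5)))
           port map(I0=>I(0),I1=>I(1),I2=>I(2),I3=>I(3),I4=>I(4),I5=>I(5),O=>O);
end architecture;
```

The attributes are attached to the instance label, which is only possible because the LUT is instantiated explicitly rather than inferred.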

Finally, what is the LUT6_2 primitive? The only difference is that it has two outputs, O5 and O6. You can think of it as two LUT5s with common inputs I0 to I4, one holding the lower half of INIT and the other the upper half. The output of the lower LUT5 is O5, and a mux between the two LUT5 outputs, using I5 as the select input, becomes O6:

  component LUT6_2
    generic(INIT:BIT_VECTOR(63 downto 0));
    port(I0,I1,I2,I3,I4,I5:in STD_LOGIC;
         O5,O6:out STD_LOGIC);
  end component;

LUT6_2.O6 is the same thing as LUT6.O, and the INIT generic is also identical. LUT6 is actually a subset of LUT6_2, with the O5 output not used. Smaller versions called LUT1, LUT2, LUT3, LUT4 and LUT5 also exist, but these are just particular cases of a LUT6 with some inputs not used. So LUT6_2 is the basic FPGA LUT primitive, and all the others are just particular cases of it.
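The same magic constants also work for LUT6_2. As a sketch, assuming a hypothetical entity FULL_ADDER_LUT, a 1-bit full adder can fit in a single LUT6_2: with I5 tied to '1', O6 reads the upper half of INIT and O5 the lower half, so the constant I5 acts as a selector between the two 5-input functions.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.VCOMPONENTS.all;   -- Xilinx primitive component declarations

-- Hypothetical example entity, names are illustrative only
entity FULL_ADDER_LUT is
  port(A,B,CI:in STD_LOGIC;
       S,CO:out STD_LOGIC);
end entity;

architecture STRUCTURAL of FULL_ADDER_LUT is
  constant I0:BIT_VECTOR(63 downto 0):=X"AAAAAAAAAAAAAAAA";
  constant I1:BIT_VECTOR(63 downto 0):=X"CCCCCCCCCCCCCCCC";
  constant I2:BIT_VECTOR(63 downto 0):=X"F0F0F0F0F0F0F0F0";
  constant I5:BIT_VECTOR(63 downto 0):=X"FFFFFFFF00000000";
  -- The two logic equations, written with the magic constants
  constant SUM:BIT_VECTOR(63 downto 0):=I0 xor I1 xor I2;
  constant CARRY:BIT_VECTOR(63 downto 0):=(I0 and I1) or (I0 and I2) or (I1 and I2);
begin
  -- Upper half of INIT (I5='1') drives O6, lower half drives O5
  fa:LUT6_2 generic map(INIT=>(I5 and SUM) or (not I5 and CARRY))
            port map(I0=>A,I1=>B,I2=>CI,I3=>'0',I4=>'0',I5=>'1',O5=>CO,O6=>S);
end architecture;
```

The unused inputs I3 and I4 are tied to '0' here; since neither equation depends on them, any constant value would do.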

And this is all one needs to know about LUT primitive instantiations in Xilinx FPGAs. In the next post I will present a more complex example where LUT6 primitive instantiations actually make a big difference compared with synthesis inference from behavioral code.


Top Comments

DAB over 7 years ago +1

Nice post. DAB
