element14 Community
element14 Community
    Register Log In
  • Site
  • Search
  • Log In Register
  • Community Hub
    Community Hub
    • What's New on element14
    • Feedback and Support
    • Benefits of Membership
    • Personal Blogs
    • Members Area
    • Achievement Levels
  • Learn
    Learn
    • Ask an Expert
    • eBooks
    • element14 presents
    • Learning Center
    • Tech Spotlight
    • STEM Academy
    • Webinars, Training and Events
    • Learning Groups
  • Technologies
    Technologies
    • 3D Printing
    • FPGA
    • Industrial Automation
    • Internet of Things
    • Power & Energy
    • Sensors
    • Technology Groups
  • Challenges & Projects
    Challenges & Projects
    • Design Challenges
    • element14 presents Projects
    • Project14
    • Arduino Projects
    • Raspberry Pi Projects
    • Project Groups
  • Products
    Products
    • Arduino
    • Avnet Boards Community
    • Dev Tools
    • Manufacturers
    • Multicomp Pro
    • Product Groups
    • Raspberry Pi
    • RoadTests & Reviews
  • Store
    Store
    • Visit Your Store
    • Choose another store...
      • Europe
      •  Austria (German)
      •  Belgium (Dutch, French)
      •  Bulgaria (Bulgarian)
      •  Czech Republic (Czech)
      •  Denmark (Danish)
      •  Estonia (Estonian)
      •  Finland (Finnish)
      •  France (French)
      •  Germany (German)
      •  Hungary (Hungarian)
      •  Ireland
      •  Israel
      •  Italy (Italian)
      •  Latvia (Latvian)
      •  
      •  Lithuania (Lithuanian)
      •  Netherlands (Dutch)
      •  Norway (Norwegian)
      •  Poland (Polish)
      •  Portugal (Portuguese)
      •  Romania (Romanian)
      •  Russia (Russian)
      •  Slovakia (Slovak)
      •  Slovenia (Slovenian)
      •  Spain (Spanish)
      •  Sweden (Swedish)
      •  Switzerland(German, French)
      •  Turkey (Turkish)
      •  United Kingdom
      • Asia Pacific
      •  Australia
      •  China
      •  Hong Kong
      •  India
      •  Korea (Korean)
      •  Malaysia
      •  New Zealand
      •  Philippines
      •  Singapore
      •  Taiwan
      •  Thailand (Thai)
      • Americas
      •  Brazil (Portuguese)
      •  Canada
      •  Mexico (Spanish)
      •  United States
      Can't find the country/region you're looking for? Visit our export site or find a local distributor.
  • Translate
  • Profile
  • Settings
FPGA
  • Technologies
  • More
FPGA
Blog The Art of FPGA Design - Post 19
  • Blog
  • Forum
  • Documents
  • Quiz
  • Events
  • Polls
  • Files
  • Members
  • Mentions
  • Sub-Groups
  • Tags
  • More
  • Cancel
  • New
Join FPGA to participate - click to join for free!
  • Share
  • More
  • Cancel
Group Actions
  • Group RSS
  • More
  • Cancel
Engagement
  • Author Author: fpgaguru
  • Date Created: 13 Nov 2018 3:57 PM Date Created
  • Views 2111 views
  • Likes 3 likes
  • Comments 1 comment
  • xilinx
  • fpgafeatured
  • vhdl
  • guest writer
Related
Recommended

The Art of FPGA Design - Post 19

fpgaguru
fpgaguru
13 Nov 2018

Using the Carry-Save Adder, The Constant Coefficient Multiplier

 

Multiplications in Xilinx FPGAs are done using DSP48s, which are primitives that consist of a 25x18 signed multiplier, a 25-bit preadder and a 48-bit postadder/accumulator. In UltraScale/UltraScale+ FPGA families the signed multiplier is 27x18 and the post adder has three inputs instead of just two. Depending on the FPGA size and family there are hundreds to thousands of such DSP48 primitives, that are able to do one multiply and two adds every clock at frequencies from 500 MHz to over 800 MHz.

 

I will need a few posts to talk about the DSP48s but before we do that I want to talk today about a particular type of multiplier where one of the two operands is always a constant - the Constant Coefficient Multiplier or KCM. While you can always use a DSP48 for that, tying one of the two factors to a constant value, there is a more efficient carry save adder fabric based implementation.

 

Let's assume that we want to do a signed 27x18 multiplication similar to the DSP48 one, but where the 18-bit factor is a constant. We will see that we can do this with the equivalent of only four adders, even less in some particular cases. A KCM multiplication can always be implemented with the classic shift and add algorithm but where we only add the shifted versions of the multiplicand for which the multiplier has non-zero bits. This means that we can use an adder tree of the kind we saw in the previous post and in theory we need to add up to N terms, where N is the bit size of the constant coefficient, 18 in our case. If we use simple ripple carry 2-input adders we will need up to 17 of them, when the coefficient is 0x3FFFF for example. But we have seen that using 3-input carry save adders that number can be reduced to only 9, an improvement of 2x in the size of the design. Another disadvantage of this solution is that the number of terms to be added depends heavily on the coefficient value - the more non-zero bits there are the more adders we will need.

 

However, with a clever math trick we can further reduce the number of adders, to a point where four 3-input adders are enough to implement a KCM for any 18-bit constant coefficient value. The trick is called Canonical Signed Digit (or bit in the binary case) representation. Rather than going into too much math detail, here is a link to a Wikipedia page that explains the basic idea:

 

https://en.wikipedia.org/wiki/Canonical_signed_digit

 

If you remember some earlier posts, subtraction is as easy to do as addition, including in the 3-input CSA version. Rather than using 0 and 1 to represent the constant multiplier we can use -1, 0 and +1. The number representation is no longer unique and among the large number of possible encodings there is always one where at least half of the bits are always 0. So we can always find a CSD representation for any constant coefficient where we need to add or subtract at most 9 terms of shifted multiplicands. We already know that a ternary adder tree of 9 terms has only four nodes - we just reduced the size of the design from the naïve 17 adders solution to one that requires only 4, a more than 4x improvement.

 

As an exercise let's build a 27x18 constant multiplier where the coefficient is sqrt(0.5)=0.70710678... A good fractional approximation is sqrt(0.5)≈46341/65536=0.7071075439453125, which is just 1E-6 bigger than the exact value.

The hexadecimal representation of 46341 is 0xB505, which has 7 non-zero bits so we do not even need the CSD representation with positive or negative bits, we can however chose how we create groups of three bits and we can take advantage of a common 3-input partial sum, which lets us implement the KCM for sqrt(0.5) with only 2 carry save adders. To do this we start with the binary representation of the constant coefficient:

 

sqrt(0.5)≈46341/65536=0xB505/0x10000

 

            1111111

10.1234567890123456

00.1011010100000101

 

The 18-bit value can be written as the addition of three partial sums:

 

            1111111

10.1234567890123456

00.1000010000000100

00.0010000100000001

00.0001000000000000

 

The first partial term is a 3-input sum of the multiplicand right shifted by 1, 6 respectively 14 bits. The second term is just the first term right shifted by 2 bits, while the third partial term is simply the multiplicand right shifted by 4 bits.

If I is the variable input and O is the output, we can compute the KCM O<=sqrt(0.5)*I with only two 3-input adders using an intermediate result X:

 

X<=SHIFT_RIGHT(I,1)+SHIFT_RIGHT(I,6)+SHIFT_RIGHT(I,14)

O<=X+SHIFT_RIGHT(X,2)+SHIFT_RIGHT(I,4)

 

We can reuse now the CSA3 module created earlier and write the KCM in VHDL very easily:

 

use IEEE.NUMERIC_STD.all;  
use work.TYPES_PKG.all; 

entity KCM is                      -- O=Sqrt(0.5)*I
 
generic(PIPELINE:BOOLEAN:=TRUE); -- LATENCY will be 2 if PIPELINE else 0
 
port(CLK:in STD_LOGIC:='0';
       I:
in SFIXED;
       O:
out SFIXED);
end KCM; 

architecture FAST of KCM is
 
signal X:SFIXED(I'high downto I'low-14);
 
signal Y:SFIXED(I'range);
begin
-- X=SHIFT_RIGHT(I,1)+SHIFT_RIGHT(I,6)+SHIFT_RIGHT(I,14)
  a1:entity work.CSA3 generic map(PIPELINE=>PIPELINE)
                     
port map(CLK=>CLK,
                               A=>SHIFT_RIGHT(I,1),
                               B=>SHIFT_RIGHT(I,6),
                               C=>SHIFT_RIGHT(I,14),
                               P=>X);
-- Y is I delayed by one clock if PIPELINE=TRUE
  sd:entity work.SDELAY generic map(SIZE=>BOOLEAN'pos(PIPELINE))
                       
port map(CLK=>CLK,
                                 I=>I,
                                 O=>Y);
-- O=X+SHIFT_RIGHT(X,2)+SHIFT_RIGHT(Y,4)
  a2:entity work.CSA3 generic map(PIPELINE=>PIPELINE)
                     
port map(CLK=>CLK,
                               A=>X,
                               B=>SHIFT_RIGHT(X,2),
                               C=>SHIFT_RIGHT(Y,4),
                               P=>O);
end FAST;

 

Note again the use of the arbitrary fixed point precision unconstrained SFIXED type, this module will multiply a signed fixed point input I of any range by 46341/65536 and then truncate the result to the arbitrary range specified by O. With the PIPELINE generic we can have either a purely combinatorial implementation or one with a latency of 2 clocks which will run easily at speeds in the 800MHz range.  Here is the synthesis result when the I and O ports have both a SFIXED(1 downto -16) range and PIPELINED is TRUE:

 

image

 

This concludes the series of posts dedicated to the efficient 3-input carry save adder and also the quick look at various CLB primitives like LUT6, SRL16 and CARRY8. We will continue next week by starting to discuss the DSP48 multiply/accumulate primitive.

 

Back to the top: The Art of FPGA Design

  • Sign in to reply
  • DAB
    DAB over 6 years ago

    Very good series on the use of FPGA.

     

    DAB

    • Cancel
    • Vote Up 0 Vote Down
    • Sign in to reply
    • More
    • Cancel
element14 Community

element14 is the first online community specifically for engineers. Connect with your peers and get expert answers to your questions.

  • Members
  • Learn
  • Technologies
  • Challenges & Projects
  • Products
  • Store
  • About Us
  • Feedback & Support
  • FAQs
  • Terms of Use
  • Privacy Policy
  • Legal and Copyright Notices
  • Sitemap
  • Cookies

An Avnet Company © 2025 Premier Farnell Limited. All Rights Reserved.

Premier Farnell Ltd, registered in England and Wales (no 00876412), registered office: Farnell House, Forge Lane, Leeds LS12 2NE.

ICP 备案号 10220084.

Follow element14

  • X
  • Facebook
  • linkedin
  • YouTube