The Art of FPGA Design - Post 22

fpgaguru
4 Dec 2018

The DSP48 Primitive - Behavioral Symmetric FIR Inference

 

The DSP48 primitive has an optional pre-adder, which can be used to compute expressions such as PCOUT=PCIN+(A+D)*B. When used to implement symmetric or anti-symmetric FIRs, it cuts the number of multipliers in half.
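To see why the pre-adder halves the multiplier count, here is a minimal Python sketch (not the hardware structure itself). It assumes integer samples and zero initial state; the function names `direct_fir` and `folded_symmetric_fir` are made up for this illustration. Each pair of taps that shares a coefficient is added first and multiplied once, which is the (A+D)*B operation.

```python
def direct_fir(taps, x):
    """Direct-form FIR: one multiply per tap per output sample."""
    out = []
    for n in range(len(x)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        out.append(acc)
    return out


def folded_symmetric_fir(half, x, antisymmetric=False):
    """2*N-tap (anti)symmetric FIR using only N multiplies per output.

    half holds the first N coefficients; the remaining N are their mirror
    image, negated in the anti-symmetric case.
    """
    last = 2 * len(half) - 1          # index of the last tap
    sign = -1 if antisymmetric else 1

    def sample(i):
        return x[i] if i >= 0 else 0  # zero initial state

    out = []
    for n in range(len(x)):
        acc = 0
        for k, h in enumerate(half):
            # pre-add (or pre-subtract) the two samples sharing coefficient h
            acc += h * (sample(n - k) + sign * sample(n - (last - k)))
        out.append(acc)
    return out


half = [1, -2, 3, 4]                               # N = 4, so 8 taps
x = [5, -1, 0, 7, 2, -3, 8, 1, 0, 0, 4, -6]
symmetric_taps = half + half[::-1]
antisymmetric_taps = half + [-h for h in reversed(half)]
print(folded_symmetric_fir(half, x) == direct_fir(symmetric_taps, x))
print(folded_symmetric_fir(half, x, True) == direct_fir(antisymmetric_taps, x))
```

Both comparisons hold because the folded loop only regroups the terms of the direct sum; the inner loop runs N times instead of 2*N.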

 

The following diagram shows how such a symmetric FIR is built using the case N=4, a symmetric FIR with 8 taps as an example:

image

The forward data delay line is identical to the one for the non-symmetric FIR, but now we have a second, backward data delay line. While the pipelining of the P adder cascade chain forced us to increase the number of forward delays per tap from one to two, in the backward delay line the number of delays per tap must be decreased from one to zero: the D inputs of all the DSP48s are driven by a single version of the input data delayed by 2*N. This is not scalable, in the sense that as the number of taps increases this net will become a critical timing bottleneck, but it is always possible to replicate the single 2*N delay line in a way that still makes timing closure at full speed possible.

Both symmetric and anti-symmetric FIRs are easy to implement since the pre-adder can also do subtraction. Another interesting variation is even or odd symmetric FIRs. The odd-symmetric case with 2*N-1 taps can be easily reduced to the even 2*N case by increasing the forward delay in the first DSP48 in the chain from one to two clocks and dividing the last coefficient (the middle tap of the filter) by 2. The two cases are therefore not fundamentally different and can both be implemented with the same code, which would look like this:


library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

use work.types_pkg.all; -- VHDL93 version of package providing SFIXED type support

entity SYMMETRIC_SYSTOLIC_FIR is
  generic(N:INTEGER;
          ODD:BOOLEAN:=FALSE; -- if ODD the FIR order is 2*N-1 else it is 2*N
          ANTISYMMETRIC:BOOLEAN:=FALSE;
          BEHAVIORAL:BOOLEAN:=TRUE);
  port(CLK:in STD_LOGIC;
       CI:in SFIXED_VECTOR; -- set of N symmetric coefficients; filter order is 2*N if even or 2*N-1 if odd, in which case set the middle coefficient to half the desired value
       I:in SFIXED;         -- forward data input
       O:out SFIXED);       -- filter output
end SYMMETRIC_SYSTOLIC_FIR;

architecture TEST of SYMMETRIC_SYSTOLIC_FIR is
  signal ID:SFIXED(I'range);
begin
  assert I'length<28 report "Input Data width must be 27 bits or less" severity warning;
  assert CI'length/N<19 report "Coefficient width must be 18 bits or less" severity warning;

  sd:entity work.SDELAY generic map(SIZE=>2*N-1)
                        port map(CLK=>CLK,
                                 I=>I,
                                 O=>ID);

  ib:if BEHAVIORAL generate
       type TAC is array(0 to N) of SFIXED(I'range);
       signal AC:TAC;
       type TPC is array(0 to N) of SFIXED(I'high+(CI'high+1)/N+LOG2(N) downto I'low+CI'low/N);
       signal PC:TPC;
     begin
       AC(AC'low)<=I;
       PC(PC'low)<=(others=>'0');
       lk:for K in 0 to N-1 generate
            signal A1,A2,D,AD:SFIXED(I'range):=(others=>'0');
            signal B:SFIXED((CI'high+1)/N-1 downto CI'low/N):=(others=>'0');
            signal M:SFIXED(A2'high+B'high+1 downto A2'low+B'low):=(others=>'0');
            signal P:SFIXED(PC(K+1)'range):=(others=>'0');
          begin
            process(CLK)
            begin
              if rising_edge(CLK) then
                D<=ID;
                if not ODD and K=0 then -- remove one A delay for the first tap if filter is even symmetric
                  A2<=AC(K);
                else
                  A1<=AC(K);
                  A2<=A1;
                end if;
                if ANTISYMMETRIC then
                  AD<=RESIZE(D-A2,AD);
                else
                  AD<=RESIZE(D+A2,AD);
                end if;
                B<=ELEMENT(CI,K,N); -- register for the coefficient inputs
                M<=B*AD;            -- multiplier internal register
                P<=RESIZE(M+PC(K),PC(K+1)); -- post-adder output register
              end if;
            end process;
            AC(K+1)<=A2; -- A cascade output
            PC(K+1)<=P;  -- P cascade output
          end generate;
       O<=RESIZE(PC(PC'high),O'high,O'low); -- truncate the final sum to match the O output port range
     end generate;
end TEST;
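The signal ranges in the listing above follow simple worst-case bit-growth rules, which the following Python sketch spells out. It rests on assumptions: LOG2 from types_pkg is taken to be a ceiling log2, the helper names are invented for this example, and the 27/18/48-bit limits are the DSP48E2 pre-adder/multiplier and accumulator widths. Note that the sum of two full-scale inputs needs one bit more than the inputs, while AD in the listing is declared with the input range; how RESIZE treats that extra bit is not shown in this post.

```python
def signed_range(bits):
    """Smallest and largest value of a two's complement number."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits(value, bits):
    lo, hi = signed_range(bits)
    return lo <= value <= hi


def ceil_log2(n):
    """Ceiling of log2(n) for n >= 1 (assumed behaviour of LOG2)."""
    return (n - 1).bit_length()


def tap_widths(data_bits, coeff_bits, n):
    """Bit widths mirroring the ranges of AD, M and P in the listing."""
    return {
        "preadd_full": data_bits + 1,                      # lossless D+A2
        "product": data_bits + coeff_bits,                 # M, with AD kept at data width
        "accum": data_bits + coeff_bits + ceil_log2(n),    # P after N additions
    }


def fits_dsp48e2(data_bits, coeff_bits, n):
    """Check the widths against the 27x18 multiplier and 48-bit accumulator."""
    w = tap_widths(data_bits, coeff_bits, n)
    return data_bits <= 27 and coeff_bits <= 18 and w["accum"] <= 48


w = tap_widths(8, 6, 4)
d_lo, d_hi = signed_range(8)
c_lo, c_hi = signed_range(6)
products = [d * c for d in (d_lo, d_hi) for c in (c_lo, c_hi)]
print(w)
print(all(fits(p, w["product"]) for p in products))     # products never overflow M
print(all(fits(4 * p, w["accum"]) for p in products))   # nor does a sum of 4 of them
print(fits(d_hi + d_hi, 8), fits(d_hi + d_hi, w["preadd_full"]))
print(fits_dsp48e2(27, 18, 4))
```

With 27-bit data, 18-bit coefficients and N=4 the accumulator needs 47 bits, which is consistent with the two assert statements in the listing.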

 

The top level instantiation module is identical to the one we used for the non-symmetric FIR case in the previous post; we just have two more generics to select between even/odd and symmetric/anti-symmetric FIR structures.
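The effect of the ODD and ANTISYMMETRIC generics can be checked with a clock-by-clock software model of the register chain. The Python sketch below is only an approximation under stated assumptions: SDELAY is taken to be a pure delay of SIZE clocks, arithmetic is done on unbounded integers (so the RESIZE truncations are ignored), and all function names are invented for this example. Because the code computes D-A2, the model's anti-symmetric mode negates the forward half of the taps.

```python
def direct_fir(taps, x):
    """Reference direct-form FIR with zero initial state."""
    return [sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]


def systolic_fir(half, x, odd=False, antisymmetric=False):
    """Integer model of the N-stage A1/A2/D/AD/B/M/P register chain."""
    n = len(half)
    a1, a2, d, ad, b, m, p = ([0] * n for _ in range(7))
    sdelay = [0] * (2 * n - 1)      # assumed: SDELAY is a pure 2*N-1 clock delay
    out = []
    for sample in x:
        out.append(p[-1])           # O as seen during this clock cycle
        ac = [sample] + a2[:-1]     # A cascade inputs AC(K)
        pc = [0] + p[:-1]           # P cascade inputs PC(K)
        # Next-state values use only the current state, like signal
        # assignments inside a clocked process.
        nxt_p = [m[k] + pc[k] for k in range(n)]
        nxt_m = [b[k] * ad[k] for k in range(n)]
        nxt_ad = [d[k] - a2[k] if antisymmetric else d[k] + a2[k]
                  for k in range(n)]
        nxt_d = [sdelay[-1]] * n
        nxt_a1, nxt_a2 = list(a1), list(a2)
        for k in range(n):
            if not odd and k == 0:
                nxt_a2[k] = ac[k]   # even case: single A delay in the first tap
            else:
                nxt_a1[k], nxt_a2[k] = ac[k], a1[k]
        a1, a2, d, ad, m, p = nxt_a1, nxt_a2, nxt_d, nxt_ad, nxt_m, nxt_p
        b = list(half)              # coefficient registers
        sdelay = [sample] + sdelay[:-1]
    return out


def equivalent_taps(half, odd=False, antisymmetric=False):
    """Full tap list the folded structure is expected to realise."""
    s = -1 if antisymmetric else 1
    if odd:
        # the middle tap sees the same sample on both pre-adder inputs
        fwd, mid = half[:-1], [(1 + s) * half[-1]]
    else:
        fwd, mid = half, []
    return [s * h for h in fwd] + mid + list(reversed(fwd))


half = [3, -1, 4, 2]                                   # N = 4
x = [5, -2, 7, 1, 0, -3, 8, 6, -4, 2] + [0] * 12
for odd in (False, True):
    for anti in (False, True):
        latency = len(half) + (4 if odd else 3)        # latency of this model
        got = systolic_fir(half, x, odd, anti)
        ref = direct_fir(equivalent_taps(half, odd, anti), x)
        print(odd, anti, got[latency:] == ref[:len(x) - latency])
```

In this model the odd-symmetric case doubles the last coefficient, which is why the middle coefficient has to be supplied at half its desired value.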

Unfortunately, this symmetric FIR example is also where we reach the limits of behavioral inference. While the synthesis result is functionally correct, it is far from optimal. The symmetric FIR filter implementation should use just 4 DSP48s and a few LUTRAMs for the SDELAY, but the pre-adders are not mapped into the DSP48s and use fabric carry chains instead. The ideal synthesis result should consist only of the five blocks highlighted, four DSP48E2s and the SDELAY module:

image

DSP48 inference has its limits, and the same rule we used before applies. If we get the expected results, behavioral inference is by far the best coding style: it is the most compact, the easiest to understand and maintain, and to a certain extent even portable. If we do not get the result that we want, then primitive instantiations are the way to go. There are of course some drawbacks to that design flow, and the DSP48 primitive with its 50 generics and 50 ports is an extreme example of them.

 

In the next post I will introduce a generic wrapper for the DSP48E2 primitive in an attempt to make instantiation easier and combine the advantages of the two design flows.

 

Back to the top: The Art of FPGA Design


Top Comments

  • DAB
    DAB over 6 years ago +1
    Nice post. It would be really nice if you would run the filter on a waveform and walk us through how the filter works and clarify why this instantiation is better than an analog alternative. DAB
  • fpgaguru
    fpgaguru over 6 years ago in reply to DAB

    Why DSP48 primitive instantiation is better, and when it should be used, will be the subject of the next few blog posts, so keep reading ;-).

     

    As for general Digital Signal Processing theory and how these filters work, that could be the subject of a whole separate blog. Maybe I will do that after I complete this one, but I plan on at least 50 weekly posts on The Art of FPGA Design, so it will be a while. In the meantime, here is a list of free DSP books that could be used to get familiar with the field:

     

    Entry level, DSP with Python:

    http://greenteapress.com/thinkdsp/thinkdsp.pdf

     

    More advanced:

    http://ptgmedia.pearsoncmg.com/images/9780137027415/samplepages/0137027419.pdf

     

    Big, engineering oriented:

    https://users.dimi.uniud.it/~antonio.dangelo/MMS/materials/Guide_to_Digital_Signal_Process.pdf

     

    Rick Lyons, the author of the Understanding Digital Signal Processing books https://www.amazon.com/Richard-G.-Lyons/e/B00IZ4NJGK/ref=dp_byline_cont_book_1  maintains a list of free DSP resources here (it's 10 years old but still relevant):

    https://www.dsprelated.com/showarticle/56.php
