The Art of FPGA Design - Post 20

fpgaguru
20 Nov 2018

The DSP48 Primitive

This post starts a longer series dedicated to the DSP48 primitive, a MAC (multiply/accumulate) block that is the workhorse of any signal processing design requiring many mathematical operations beyond simple additions and subtractions. Those simpler operations are handled well by fabric-based implementations that use the dedicated carry chain primitives.

The DSP48, which comes in multiple flavors, one for each Xilinx FPGA family, started as a signed 18x18 multiplier in the earliest Virtex devices, about 20 years ago. Over time the multiplier has grown to 25x18 and then 27x18, and a 48-bit post adder and a 25/27-bit preadder have been added.

Simplifying things a bit, we can say that a DSP48 computes expressions like P=(A+D)*B+C, where A and D are 25 or 27 bits wide, B is 18 bits, and P and C are 48 bits, all signed numbers. By the way, the variable names used in this expression match the DSP48 input and output port names, which is of course good coding practice.
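
To make the expression concrete, here is a minimal behavioral sketch in Python of P=(A+D)*B+C with the bit widths quoted above. It is not a model of the actual primitive: the helper names wrap_signed and dsp48_model are made up for this illustration, and wrapping the preadder sum to the A/D width is a modeling assumption.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Wrap an integer into the two's complement signed range of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    # If the sign bit is set, reinterpret the pattern as a negative number.
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def dsp48_model(a: int, d: int, b: int, c: int, ad_bits: int = 27) -> int:
    """Behavioral sketch of P = (A + D) * B + C on signed operands.

    ad_bits is 25 for the 25x18 flavor and 27 for the 27x18 flavor.
    Wrapping the preadder result to ad_bits is an assumption of this sketch.
    """
    a = wrap_signed(a, ad_bits)
    d = wrap_signed(d, ad_bits)
    b = wrap_signed(b, 18)
    c = wrap_signed(c, 48)
    ad = wrap_signed(a + d, ad_bits)       # preadder
    product = ad * b                       # signed multiplier
    return wrap_signed(product + c, 48)    # 48-bit post adder


print(dsp48_model(a=1000, d=-250, b=-3, c=10))  # (1000 - 250) * -3 + 10 = -2240
```

Passing ad_bits=25 gives the narrower 25x18 case. The sketch ignores all timing and only shows the arithmetic.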

 

Leaving the historical families behind, we will focus on 7-Series (Spartan-7, Artix-7, Kintex-7, Virtex-7 and Zynq-7000), which contain a primitive called DSP48E1, and UltraScale/UltraScale+ (Kintex, Virtex and Zynq MPSoC), which have the newer 27x18 flavor called DSP48E2. Xilinx FPGAs contain from as few as 10 DSP48s in the smallest Spartan-7 device, the XC7S6, to as many as 12,288 in some of the largest Virtex UltraScale+ devices, the VU13P and VU29P. Similarly, the datasheet maximum clock speed ranges from 464 MHz in the slowest speed grade Spartan-7 and Artix-7 to 891 MHz in the fastest speed grade UltraScale+. This means that the peak performance of the DSP48s in the fastest speed grade VU13P device is almost 11 TMACs (11 thousand billion 27x18 multiplications and 48-bit additions every second).
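
The peak figure is simply the DSP48 count multiplied by the clock rate, with one multiply/accumulate per DSP48 per clock. A quick check of the numbers quoted above, where the function name is made up for this illustration and pairing the XC7S6 with the 464 MHz figure is an assumption for the low end of the range:

```python
def peak_macs_per_second(dsp_count: int, fmax_hz: int) -> int:
    """Peak rate assuming every DSP48 completes one MAC on every clock cycle."""
    return dsp_count * fmax_hz


# Largest case from the text: 12,288 DSP48s at 891 MHz.
largest = peak_macs_per_second(12_288, 891_000_000)
# Smallest case: 10 DSP48s, assumed here to run at 464 MHz.
smallest = peak_macs_per_second(10, 464_000_000)

print(f"{largest / 1e12:.2f} TMAC/s")   # 10.95 TMAC/s
print(f"{smallest / 1e9:.2f} GMAC/s")   # 4.64 GMAC/s
```

The ratio between the two ends of the range is more than three orders of magnitude.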

 

While datasheet numbers are generally values that cannot be easily achieved in normal designs, this is not the case with the DSP48. As a general rule of thumb, whatever the datasheet DSP48 fMAX value is for a particular device family and speed grade, that level of performance can be achieved relatively easily if proper design rules are followed.

Even more importantly, these multiply and accumulate operations are not independent of each other. In typical designs the vast majority of them are sums of products, in some cases of many such terms. FIR filters, complex multiplications, FFTs, linear algebra matrix operations and convolutional neural networks are just a few examples.

All DSP48 primitives are organized in vertical columns spanning the entire height of a device, with dedicated cascade connections between them going up along the column. These dedicated cascade chains do not use normal fabric routing, so they do not add to routing congestion and their speed is not affected by unrelated logic. You can chain all the DSP48s in a column and compute a huge sum of products at full speed, without impacting or being affected by the rest of the design in the fabric. The DSP48s not only implement the multiplications; the additions required to calculate the sum of products also come for free, provided by the post adders and the dedicated column cascade routing.
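
The cascade idea can be sketched in a few lines of Python: one DSP48 per term, each adding its own product to the partial sum arriving from the DSP48 below it. This is only an arithmetic illustration that ignores pipelining and the 48-bit width limit. The function name is invented for the example, and pcin/pcout are used simply as labels for the cascade input and output of each stage.

```python
from typing import Sequence


def cascaded_sum_of_products(a: Sequence[int], b: Sequence[int]) -> int:
    """Walk up a column: each stage adds its product to the incoming partial sum."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of terms")
    pcin = 0  # the bottom DSP48 in the chain has nothing cascaded into it
    for a_k, b_k in zip(a, b):
        pcout = a_k * b_k + pcin  # multiplier and post adder of one stage
        pcin = pcout              # dedicated cascade link to the next stage up
    return pcin


# A 4-term sum of products, for example one output sample of a 4-tap FIR filter.
coefficients = [1, 2, 3, 4]
samples = [5, 6, 7, 8]
print(cascaded_sum_of_products(coefficients, samples))  # 5 + 12 + 21 + 32 = 70
```

Note that no separate adder appears anywhere in the loop: every addition happens inside the same stage that does the multiplication, which is the point made above about the additions being free.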

 

Obviously, the devil hides in the details, as it always does. While you can indeed achieve maximum DSP48 fMAX, this requires pipelining. There are multiple optional registers inside the DSP48, and all of them have to be used to reach that speed. The latency of a properly pipelined DSP48 is 4 clocks if the A+D preadder is used and 3 clocks if it is bypassed. Lower latencies can be achieved at the cost of a reduced clock rate, but this is generally not a good design choice, since it leads to a less efficient design. The columnar nature of the DSP48s makes it easy to compute the sum of products, but transferring the operands and the result to and from the fabric or between columns could become a placement problem.
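
The 4 versus 3 clock latency can be reproduced with a toy clocked model in Python. The stage breakdown used here (an input register, an optional preadder register, a multiplier register and an output register) is a simplifying assumption made for this sketch, not a description of the actual register set, and the class and function names are invented. The C input is left out to keep the example short.

```python
class PipelinedDsp:
    """Toy register pipeline: input -> (preadder) -> multiplier -> output."""

    def __init__(self, use_preadder: bool = True):
        self.use_preadder = use_preadder
        self.in_reg = None   # registered (a, d, b)
        self.ad_reg = None   # registered (a + d, b), only used with the preadder
        self.m_reg = None    # registered product
        self.p_reg = None    # registered output

    def clock(self, a: int, d: int, b: int):
        """Apply one clock edge and return the output register value."""
        # Update from the output backwards so that each stage captures the
        # value its predecessor held before this clock edge.
        self.p_reg = self.m_reg
        if self.use_preadder:
            self.m_reg = None if self.ad_reg is None else self.ad_reg[0] * self.ad_reg[1]
            self.ad_reg = None if self.in_reg is None else (
                self.in_reg[0] + self.in_reg[1], self.in_reg[2])
        else:
            # Preadder bypassed: d is ignored and a goes straight to the multiplier.
            self.m_reg = None if self.in_reg is None else self.in_reg[0] * self.in_reg[2]
        self.in_reg = (a, d, b)
        return self.p_reg


def measure_latency(use_preadder: bool) -> int:
    """Count clock edges until the first valid result appears at the output."""
    dsp = PipelinedDsp(use_preadder)
    clocks = 0
    result = None
    while result is None:
        clocks += 1
        result = dsp.clock(a=3, d=4, b=5)
    return clocks


print(measure_latency(use_preadder=True))   # 4
print(measure_latency(use_preadder=False))  # 3
```

Once the pipeline is full, the model still delivers one result per clock, which is why the extra latency costs nothing in throughput.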

 

Finally, as with the other coding examples we have seen earlier, synthesis tool performance is mixed. When it works, it works well, and in most cases it is perfectly possible to infer DSP48s from behavioral code and still achieve optimum device utilization and speed. But most cases is not all cases, and it is not uncommon to have to instantiate DSP48 primitives to achieve the desired level of performance. While you can infer a DSP48 without detailed knowledge of how the primitive works, you cannot instantiate one without that knowledge.

We will explore both coding styles in the next posts dedicated to the DSP48 primitive.

 

Back to the top: The Art of FPGA Design

    RAJAT99 over 2 years ago

    Avnet boards come with speed grade -1, -2 etc. What does it mean?
