The Art of FPGA Design Season 2 Post 21

5 Aug 2025

You cannot always trust AIs

I asked four major AI chat bots the same question:

"What is the most efficient way to implement IIR filters? Reply in 40 words or less."

Here are the responses:

Perplexity AI:

"Use a cascade of second-order (biquad) sections in Direct Form II for IIR filter implementation. This approach ensures computational efficiency, numerical stability, and is suitable for both software and hardware, especially when paired with careful scaling and quantization."

ChatGPT 4:

"The most efficient way to implement IIR filters is using Direct Form II Transposed structure, which minimizes memory usage and improves numerical stability by reducing the number of delay elements and limiting quantization noise propagation."

Claude Sonnet 4:

"Use Direct Form II (transposed) structure for best numerical stability and minimal memory. Implement with fixed-point arithmetic on embedded systems or floating-point for general use. Cascade second-order sections for higher-order filters to reduce quantization errors."

Grok 4:

"The most efficient way to implement IIR filters is using direct form II transposed structure, minimizing memory and computational complexity with biquad sections, optimized coefficients, and fixed-point arithmetic for real-time processing on DSPs or microcontrollers."

All of them recommend Direct Form II, in most cases as a biquad cascade, and all of them are completely wrong. By a factor of 2.5.

An IIR filter of order N requires 2.5N multiplications per output sample when implemented as a biquad cascade, since every biquad has order 2 and uses 5 multiplications. This architecture also suffers from all the drawbacks mentioned in an earlier post: potential instability, sensitivity to coefficient quantization, limit cycles, great difficulty in pipelining, and so on.

A much better architecture is the Lattice Wave Digital Filter (LWDF) structure, a 2-phase polyphase decomposition in which the two branches are implemented as cascades of second order all-pass sections. I will describe it in greater detail in a future post. An LWDF IIR filter of order N needs only N multiplications, and there is even a bireciprocal version, very similar to FIR half-band filters, where half of the coefficients are zero, so it needs only 0.5N multiplications per output sample.
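To make the comparison concrete, here is a small Python sketch that simply tabulates the per-order multiplication counts quoted above. The function name and dictionary keys are my own labels for illustration, not part of any library.

```python
def multiplies_per_sample(order):
    """Multiplications per output sample for an order-N IIR filter,
    using the per-order figures quoted in the text."""
    return {
        "biquad_cascade": 2.5 * order,     # 5 multiplications per order-2 biquad
        "lwdf": 1.0 * order,               # one multiplication per filter order
        "bireciprocal_lwdf": 0.5 * order,  # half of the coefficients are zero
    }

counts = multiplies_per_sample(8)
for name, mults in counts.items():
    print(f"{name:18s} {mults:5.1f}")
```

For any order, the ratio between the biquad cascade and the plain LWDF count is the factor of 2.5 mentioned earlier.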

The most common topology for all pass sections is called Richards, coming from microwave transmission line theory. There are two equivalent discrete implementations, Type 1 and Type 2. They are functionally equivalent and differ only in the numerical rounding noise propagation as a function of the coefficient value.

Type 1-B and 2-B can be used as first order all pass sections and cascading A and B Types creates a second order all pass section. Cascading second order sections and optionally one first order section if N is odd can create an all pass IIR filter of any order N. So the first and second order all-pass sections using this Richards topology look like this:

The problem with these all pass building blocks is that they minimize the number of delays at the cost of extra adders. A second order Richards all pass section has two multiplications, two delays and six additions. For an FPGA implementation this is suboptimal: we would like to use two DSP primitives and no fabric adders, which means two multipliers and four adders. Fortunately, there is an alternative all pass section implementation that uses only two multipliers and four adders at the cost of more delays, which is not a problem in an FPGA. Here is what the first and second order sections look like:

and these are their H(z) transfer functions and the conditions for stability:

In general we also want to consider a packing factor n, where we replace every delay with n delays and z^-1 becomes z^-n in all formulas above.
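The effect of the packing factor can be checked numerically with a short Python sketch. Note the assumptions: it uses the generic textbook all pass forms, with a reversed-coefficient numerator over the denominator, and the coefficient names `a` and `b` are placeholders of mine. The parameterization may well differ from the one in the formulas above, so treat it as an illustration rather than the exact sections described here.

```python
import cmath

def allpass1(a, w, n=1):
    """Response of (a + z^-n) / (1 + a*z^-n) at z = exp(j*w).
    Textbook form (assumed); poles are inside the unit circle for |a| < 1."""
    zn = cmath.exp(-1j * n * w)  # z^-n evaluated on the unit circle
    return (a + zn) / (1 + a * zn)

def allpass2(a, b, w, n=1):
    """Response of (b + a*z^-n + z^-2n) / (1 + a*z^-n + b*z^-2n).
    Textbook form (assumed); stable for |b| < 1 and |a| < 1 + b."""
    zn = cmath.exp(-1j * n * w)
    return (b + a * zn + zn * zn) / (1 + a * zn + b * zn * zn)

w = 0.4  # normalized angular frequency, radians per sample
for n in (1, 2, 3):
    h1 = allpass1(0.5, w, n)
    h2 = allpass2(-0.3, 0.6, w, n)
    # Both magnitudes stay at 1 (up to rounding) for every packing factor.
    print(n, abs(h1), abs(h2))
```

Replacing z^-1 with z^-n only compresses the frequency axis by a factor of n: the response with packing factor n at frequency w equals the unpacked response at frequency n*w.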

The magnitude of the transfer function is obviously always 1.0, that's what all pass filters are all about. Calculating the phase of the transfer function is relatively easy but I drew the line at manually computing the derivative for the group delays. So I used the free Mathematica Online tool to calculate that for both first and second order all-pass sections with a packing factor n:

We can now use these formulas to compute both the phase and the group delay at any frequency f. Since a cascade of all pass sections is still all pass, both its phase and its group delay are the sums of the individual sections' phases and group delays, and we can use that to analyze any IIR filter built with first and second order sections.
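The additivity of group delay is easy to verify numerically. The Python sketch below is only an illustration under stated assumptions: it uses the generic textbook first order all pass form, which may not match the parameterization used here, it estimates the group delay with a central difference instead of a closed-form expression, and all function names and coefficient values are hypothetical.

```python
import cmath

def allpass1(a, w, n=1):
    """Textbook first order all pass (assumed form): (a + z^-n) / (1 + a*z^-n)."""
    zn = cmath.exp(-1j * n * w)
    return (a + zn) / (1 + a * zn)

def group_delay(h, w, dw=1e-6):
    """Numerical group delay -d(phase)/dw from a central difference.
    Taking the phase of a ratio avoids phase unwrapping."""
    return -cmath.phase(h(w + dw) / h(w - dw)) / (2 * dw)

def cascade(sections):
    """Frequency response of sections connected in series (product of responses)."""
    def h(w):
        out = 1 + 0j
        for section in sections:
            out *= section(w)
        return out
    return h

# Two example sections with arbitrary coefficients and packing factors.
sections = [
    lambda w: allpass1(0.5, w, n=1),
    lambda w: allpass1(-0.3, w, n=2),
]
total = cascade(sections)

w = 0.7  # normalized angular frequency, radians per sample
sum_of_delays = sum(group_delay(s, w) for s in sections)
cascade_delay = group_delay(total, w)
print(abs(total(w)), sum_of_delays, cascade_delay)
```

The cascade keeps unit magnitude, and its group delay matches the sum of the per-section delays to within the accuracy of the finite difference.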

In the next posts, I will show first how to pipeline the first and second order all pass blocks so that they fit efficiently into DSP primitives and run at the maximum possible clock frequency. Next we will see how these efficient all pass sections can be used to implement many types of IIR filters, including almost linear phase ones.

In conclusion, there is no reason to use biquad cascades; the LWDF architecture is superior in every way.

To close on the whole AI subject and for full disclosure, I used Perplexity.AI to come up with the correct Mathematica syntax.


Top Comments

DAB 3 months ago

There is no short cut to understanding how to properly implement filters with FPGA technology.
michaelkellett 3 months ago in reply to BigG

AI isn't my thing but AMD/Xilinx do make the Versal AI range with built in AI cores:

https://www.xilinx.com/content/dam/xilinx/publications/solution-briefs/xilinx-versal-ai-compute-solution-brief.pdf

These don't look low power, or cheap or suitable for "edge" computing.

Altera/Intel doesn't seem to have an equivalent (although they talk about "AI-infused FPGA fabric" I couldn't easily find a part !)

It may not be a coincidence that AMD were established with GPUs before they bought Xilinx.

MK
BigG 3 months ago in reply to fpgaguru

Thanks for the feedback (yes examples weren't the best - hence saying random).

I should've qualified the question as I left it a little too open. Anyway, you picked up on a possible niche area by saying that FPGA might be of use for edge (embedded) devices. I'd agree here as other design facets come into play here, such as energy consumption. Still, it sounds like a long way off, i.e. still mainly an academic possibility, due to complexities involved.

I thought, maybe FPGA has value once an optimum neural network has been found for a particular AI application, e.g. say for noise signature detection, then you design the decision tree hardware to suit. But it sounds like I'm barking up the wrong tree here (still lots to learn).
fpgaguru 3 months ago in reply to BigG

The first two links are AI slop, not necessarily incorrect, but providing little to no useful information. The third link is interesting but there is very little detail on the FPGA side of things. I skimmed through the github repository but I could not find anything FPGA related there.

About the general FPGA and neural networks/AI question, I do not think they are a very good match. For training, FPGAs cannot compete with GPUs and especially with custom ASIC engines, there is simply no way. For inference, especially at the edge, maybe. FPGAs make sense for applications where the volume is low and there is a need for flexibility, short time to market or prototyping. It will always be a niche application and in danger of being replaced with an ASIC as soon as it becomes established and the volumes rise.
BigG 3 months ago in reply to BigG

Some random sources on FPGA and neural networks/AI:

https://milvus.io/ai-quick-reference/is-it-possible-to-implement-a-neural-network-on-an-fpga

https://hardwarebee.com/fpga-for-ai-artificial-intelligence-ultimate-guide/

https://arxiv.org/html/2506.11760v1

Your FPGA blogs have been fantastic and would love to understand this application better, if you ever find the time.