Crossing the gap between mathematics and physics
This entire second season of The Art of FPGA Design will be about Digital Signal Processing. It is a vast subject that cannot be covered in one book, let alone a few blog posts. It is also a strange field, one that requires mastering both mathematics, to express the algorithms, and electrical engineering, to implement them. We can talk about algorithms without worrying about their implementation, and that's math; we can do digital hardware design with no signal processing in it, and that's fine too. But doing signal processing with digital hardware requires at least some level of expertise in both fields, and that's a tall order.
The reason is that while mathematics is the language of science and engineering, mathematicians are rarely interested in the physical implementation details. Engineers, on the other hand, need math as a language to specify the problem and the solution, but tend to lose interest in the finer theoretical details because they have to worry about more mundane issues like timing closure, power dissipation and cost, things that mean little to a mathematician. For this reason there is a significant barrier between the two sides of DSP design, and overcoming it is not a trivial task.
This blog will focus on crossing the gap between math and physics, not on the algorithms themselves or on implementation details like which language or tool flow to use, or what coding style to adopt. The reason is that there are already excellent resources, many of them free, covering both areas, so this blog will not try to be an introduction to either. While I will assume at least some familiarity with both sides, you do not have to be an expert in either to follow along.
If you are a hardware designer with little exposure to the math behind every single type of filter, being able to read an algorithm in its mathematical form is good enough as a starting point. If you are a system architect with little or no experience in digital hardware design, who does not know the difference between an FPGA and an ASIC, I hope you will come to appreciate that coming up with the algorithm or mathematical formula that solves a design problem is not the end of the story; in fact, it is just the beginning.
The two sides of this coin are like two different countries. Not only do the people there speak two different languages, they also make wrong assumptions about the other side. Here is an example.
A system architect and a hardware designer are part of a team that needs to build a filter. The system architect decides what kind of filter is best and what order it should be; he talks about impulse responses, the convolution theorem and frequency transfer functions, concepts that may or may not mean anything to the hardware designer. She is concerned with which FPGA to use, how much it will cost, whether it will fit into the available power budget, what the system clock rate should be, and so on, things that probably mean just as little to the system architect. But they need to meet somewhere in the middle, and this is what the system architect comes up with:
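$$y[n] = \sum_{k=0}^{N-1} h[k] \, x[n-k]$$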
"There is a discrete signal x[n] and for every input sample you need to compute an output y[n] according to the formula above. The input and output sample rates are equal to 400 Msps, which means millions of samples per second. N is constant and equal to 39, but it might get higher or lower when this filter becomes part of an entire system and everything will be tested together. The N coefficients h[k] are constant for now but that too could change."
The system architect could explain that the coefficients are actually the filter's impulse response, that the sum is a convolution, that the entire system is linear and time invariant, and that the Fourier transform of the filter's impulse response is the frequency transfer function, but the hardware designer does not need to know all that. What she really needs to ask about are things like the precision with which data and coefficients are represented, whether the coefficients are symmetric, and if so whether N is odd or even, whether this is a half-band filter, and how many such filters need to run in parallel. But she doesn't ask.
Such questions may puzzle the system architect; isn't the simple math formula clear and good enough? Yes, it turns out that the coefficients are always symmetric, N is odd, and the filter is a half-band one, so some coefficients are always zero, but why bother mentioning it? That's just a particular case, and the original formula is simpler, clearer and more elegant. And yes, there are 64 channels, so 64 such filters need to run in parallel, but they are all identical and the same formula applies to all of them. From his point of view, his job is done.
The hardware designer does not ask all these questions. She just takes the formula given to her and implements it as a hardware design running at 400 MHz, using 39 multipliers and 38 adders. She makes it a generic design, so it will be easy to change the filter order and the coefficient values whenever the system architect changes his mind. This single-channel module is then instantiated 64 times, for a total of 2496 multipliers. She is able to close timing and the design goes into production. As the manager of this team you are happy with the result, happy to deliver on schedule, and ready to move on to the next project.
But it turns out that those unasked questions mattered. Because of the coefficient symmetry and the particular type of FPGA they plan to use, a half-band FIR can be implemented with a quarter of the resources a general non-symmetric one would require: symmetry means each pair of equal coefficients needs only one multiplier, and the half-band property makes almost half of the remaining coefficients zero, so only 10 multiplications per channel are actually needed instead of 39. Not only that, but with careful design the FPGA could close timing at 800 MHz, which would cut the number of resources by another factor of 2, so only 320 multipliers are needed in total for all 64 channels. This is nearly 8 times less than the initial estimate! These savings are so significant that they could have used a smaller and cheaper FPGA and saved $10 on the bill of materials. At the estimated production volume of 10K units per year, that would be enough to hire another system architect or another hardware designer, but not both. As the manager of this team, who would you hire, another system architect or another hardware designer? Or maybe your boss should hire another program manager?
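If you want to check the arithmetic, here is a back-of-the-envelope sketch in Python. The numbers are the ones from the story (N = 39, symmetric, half-band), and the structural facts it relies on are standard half-band properties: the taps at even offsets from the center are zero, and the center tap is exactly 0.5, which in hardware is a shift rather than a multiply.

```python
# A back-of-the-envelope multiplier count for the story's N = 39 filter.
# Illustrative sketch only; the actual tap values don't matter here.
N = 39
center = (N - 1) // 2                 # tap index 19

generic = N                           # one multiplier per tap: 39

# Symmetry h[k] == h[N-1-k]: pre-add the two samples that share a
# coefficient and multiply the sum once, so paired taps share a multiplier.
symmetric = (N + 1) // 2              # 20 distinct coefficients

# Half-band: taps at even offsets from the center are zero, and the
# center tap is exactly 0.5, a shift rather than a real multiplication.
halfband = len(range(1, center + 1, 2))   # offsets 1, 3, ..., 19

print(generic, symmetric, halfband)   # 39 20 10
```

Running the fabric at 800 MHz then lets one physical multiplier serve two 400 Msps channels, which is where the final count of 5 multipliers per channel, 320 in total, comes from.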
The disconnect runs deeper than just a language barrier. When the system architect writes x[n] he means the entire infinite series of samples. x[n−10] is simply ten samples in the past and x[n+5] five samples in the future; both are equally valid from a mathematical point of view. For the hardware designer, x[n] means the current sample, the only one available right now. You do not have access to x[n−10] at time n unless you buffer all the samples in between, and that has a cost: the further into the past you need to go, the more resources you need, and those are not free. As for x[n+5], that is simply not available at all, unless you have time travel in your toolbox of design tricks. Time means different things to the two. For the mathematician, time is just another variable in his equations, and the formula exists outside time itself. The hardware designer lives in the real world, where everything happens in the present, access to the future is impossible, and access to the past is possible but not free.
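Here is a minimal Python sketch of the hardware designer's view of time. The function name and the choice of a deque are mine, purely for illustration, but the structure is exactly a tapped delay line: one storage element per past sample, and no way to peek ahead.

```python
from collections import deque

def fir_stream(h, samples):
    """Causal FIR: the only sample that exists 'now' is x[n]."""
    # One register per past sample we need. This buffering is the
    # cost of "access to the past" (a tapped delay line in hardware).
    delay_line = deque([0.0] * len(h), maxlen=len(h))
    for x_n in samples:                 # samples arrive one at a time
        delay_line.appendleft(x_n)      # x[n] in, oldest sample falls out
        # y[n] = sum over k of h[k] * x[n-k], using only buffered history
        yield sum(c * x for c, x in zip(h, delay_line))
    # Note what is NOT here: any reference to x[n+5]. The future
    # simply does not exist inside this loop.
```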
Then there is the problem of numbers. The mathematician operates with real or complex numbers, which to him means infinite precision; x[1]=π would be perfectly fine, for example. But we cannot represent infinite-precision real numbers in hardware, as this would require an infinite amount of resources. In particular, it is impossible to represent π as an exact numerical value. In hardware we can only work with numbers of the form K·2^n, where K and n are integers, and both must have a finite range; they cannot be of arbitrary size. This quantization applies to data, coefficients and intermediate results alike. The hardware implementation will never match the mathematical model exactly, and it is impossible to make the two identical. It gets worse: because of quantization and possible overflows, the system is, strictly speaking, neither linear nor time invariant. Yes, it is possible to increase the number of bits used for K and n and reduce these undesired effects to an insignificant level, but this is not free, and how much precision do you really need? You can double the number of bits used to represent the numbers, but that roughly doubles the resources you need and increases cost and power dissipation needlessly.
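To make this concrete, here is a small Python sketch of quantizing π to the form K·2^n. The 16-bit signed K and the scaling of 2^−13 are my own illustrative choices, not a recommendation.

```python
import math

def quantize(value, word_bits=16, frac_bits=13):
    # Represent value as K * 2**-frac_bits, with K a signed word_bits integer.
    k = round(value * 2**frac_bits)        # nearest integer K
    k_min = -(2**(word_bits - 1))          # K has a finite range...
    k_max = 2**(word_bits - 1) - 1
    k = max(k_min, min(k_max, k))          # ...and clipping is where overflow lives
    return k * 2.0**-frac_bits

pi_q = quantize(math.pi)
print(pi_q, abs(math.pi - pi_q))   # 3.1416015625, error around 9e-6
```

Every extra bit in K roughly halves the error, and every bit costs silicon, so "how much precision do you really need?" is exactly the question the two sides have to answer together.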
Finally, the hardware designer will have to transform the algorithm to achieve an efficient implementation in terms of resource utilization, system clock speed and so on. She will have to consider things like pipelining and resource sharing, and she will have to make these changes in a way that does not break the original algorithm. More pipelining, for example, will increase the filter latency. This might be irrelevant at the system level, or at least easy to compensate for, or it might be totally unacceptable. She needs to clear these changes, as well as the quantization decisions, with the system architect before settling on the final implementation.
It is also possible that there are different ways to implement a particular algorithm, or even completely different algorithms that solve the same problem. Which one is best might not be obvious unless you consider both sides of the design process at the same time. We will see, for example, that an FIR equation like the one above can be implemented in two different ways, a direct form and a transposed form. From a mathematical point of view they are identical: both are described by the same formula, and the mathematician could not tell the two apart. But which one is better from an implementation point of view depends on the filter order, the type of FPGA you are using, and the amount of extra pipelining latency the system can tolerate.
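To see that the two forms really do compute the same thing, here is a small Python sketch of both. The function names are mine, and the lists of state variables stand in for what would be registers in hardware.

```python
def fir_direct(h, xs):
    # Direct form: a tapped delay line on the INPUT, then one big
    # sum of products per output sample (a wide adder tree in hardware).
    taps = [0.0] * len(h)
    out = []
    for x in xs:
        taps = [x] + taps[:-1]                        # shift the delay line
        out.append(sum(c * t for c, t in zip(h, taps)))
    return out

def fir_transposed(h, xs):
    # Transposed form: the registers sit between adders on the OUTPUT
    # side; each stage is a short multiply-add, which pipelines naturally.
    acc = [0.0] * len(h)                              # partial sums in flight
    out = []
    for x in xs:
        acc = [x * c + s for c, s in zip(h, acc[1:] + [0.0])]
        out.append(acc[0])                            # y[n] = h[0]*x[n] + carried sum
    return out

h = [0.5, 1.0, 0.5]
xs = [1.0, 2.0, 3.0, 4.0]
assert fir_direct(h, xs) == fir_transposed(h, xs)     # same formula, same outputs
```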
I hope this long introduction did not bore you to death, but I needed to point out this major disconnect, which few people are aware of, and make the case for more rigorous methods and techniques to address it. In the next two posts we will look at the basic building blocks that mathematicians use to describe DSP algorithms and that hardware designers use to implement them, as a first step toward building a bridge over this large gap.
Back to the top: The Art of FPGA Design Season 2