Multichannel Symmetric FIRs
In the last two posts we considered the case where the FPGA clock frequency is higher than the FIR sample rate. The ratio between the system clock and the data sample rate is called the overclocking factor M. We have seen that there are two ways to take advantage of this. We can either implement M identical FIR channels with K DSP48s, where the filter order N is K for non-symmetric FIRs, 2*K for even-symmetric ones and 2*K-1 for the odd-symmetric case, or implement a single FIR using only K/M DSP48s. I have addressed all three types of FIR symmetry for the second case, but only non-symmetric multi-channel designs for the first case. In this post we will look at even and odd symmetric multi-channel FIRs.
As we did for the non-symmetric multi-channel FIR design, we start with a non-pipelined, even-symmetric, single rate FIR, using for illustration a filter with K=8 and M=4. We then replace every single delay in the design with M delays, which transforms it into an M-channel, single rate, even-symmetric FIR:
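The effect of swapping each delay for M delays can be checked with a small behavioral model. The following Python sketch is not a hardware description; the function name `symmetric_fir`, the coefficient values and the test data are all made up for illustration. It filters M time-multiplexed channels with the M-delay structure and compares the result with filtering each channel separately with single delays.

```python
def symmetric_fir(x, h_half, delay=1):
    """Even-symmetric FIR with 2*K taps built from K pre-adder taps.

    Every unit delay of the single-channel design is replaced by `delay` registers.
    Samples before the start of the stream are treated as zero.
    """
    K = len(h_half)

    def tap(t):
        return x[t] if t >= 0 else 0

    y = []
    for t in range(len(x)):
        acc = 0
        for k in range(K):
            fwd = tap(t - delay * k)                 # forward delay line tap
            rev = tap(t - delay * (2 * K - 1 - k))   # matching reverse delay line tap
            acc += h_half[k] * (fwd + rev)           # pre-adder, multiplier, post-adder
        y.append(acc)
    return y


M = 4                                    # overclocking factor = number of channels
h_half = [1, -2, 3, 5, -1, 4, 2, 7]      # K = 8, arbitrary illustrative coefficients
samples = 40

# Arbitrary integer test data, one list per channel
channels = [[(7 * c + 3 * n * n + n) % 11 - 5 for n in range(samples)]
            for c in range(M)]

# Time-multiplex the channels: c0[0], c1[0], c2[0], c3[0], c0[1], ...
interleaved = [channels[t % M][t // M] for t in range(M * samples)]

y_multi = symmetric_fir(interleaved, h_half, delay=M)
per_channel = [y_multi[c::M] for c in range(M)]
reference = [symmetric_fir(ch, h_half, delay=1) for ch in channels]

print("channels match:", per_channel == reference)
```

Because every tap is now M samples apart, each output only ever combines samples that belong to the same channel, which is why the interleaved result splits cleanly back into M independent filter outputs.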
The next step is deciding where to make the pipelining cuts. We need three longitudinal cuts to pipeline the pre-adder and the multiplier and one transversal cut for each filter tap to pipeline the post-adder:
We need to add one register for every wire that crosses a pipeline cut in the forward, input to output direction and remove a delay for every wire crossing a cut in the opposite direction. If you need a refresher on how pipeline cuts work, you can look back at Post 4. In the next diagram, the registers added for crossing the three longitudinal cuts are marked in magenta, the ones added for the transversal cuts in yellow, and the ones removed in cyan:
Note that there is one M-delay register at the last tap which remains unaffected, because neither its input wire nor its output wire crosses a pipeline cut. Except for that middle delay, all the forward data delay line registers are now M+1 deep, while all the reverse ones are M-1 deep. We can no longer use the DSP48 A cascade, since there are only one or two delays available inside the DSP48 for that, and we need at least 3 when M is 2 or larger. This next diagram shows what resources can fit inside DSP48s, and which ones, marked in white, need to be implemented in the FPGA fabric:
Like all the other FIR implementations we have seen so far, this one is also very efficient and scalable to any filter size.
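A quick way to convince yourself that the M+1 and M-1 delay depths do not change the filter is to count registers along each input-to-output path. The sketch below does this under two assumptions that are not spelled out in the text: the post-adder chain runs from tap 0 to tap K-1, where the output is taken, and each transversal cut adds exactly one register to that chain. The helper name `path_delays` is hypothetical. The three longitudinal cuts add the same latency to every path, so they are left out.

```python
def path_delays(K, M):
    """Register count from input to output for each tap's forward and reverse sample.

    Assumes the post-adder chain runs from tap 0 to tap K-1 (the output),
    with one pipeline register between consecutive taps.
    """
    forward, reverse = [], []
    for k in range(K):
        chain = K - 1 - k          # post-adder registers between tap k and the output
        fwd_line = (M + 1) * k     # forward delay line, M+1 registers per tap
        # Reverse sample: the whole forward line, the untouched M-delay at the
        # last tap, then back along the reverse line at M-1 registers per tap.
        rev_line = (M + 1) * (K - 1) + M + (M - 1) * (K - 1 - k)
        forward.append(fwd_line + chain)
        reverse.append(rev_line + chain)
    return forward, reverse


K, M = 8, 4
forward, reverse = path_delays(K, M)

latency = K - 1  # extra cycles introduced by the transversal cuts
expected_forward = [M * k + latency for k in range(K)]
expected_reverse = [M * (2 * K - 1 - k) + latency for k in range(K)]

print(forward == expected_forward, reverse == expected_reverse)
```

Every path ends up with the original multiple of M plus the same K-1 cycles, so under these assumptions the pipelined filter computes the same result as the non-pipelined one, just later.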
For the odd-symmetric case we can directly modify this pipelined implementation instead of repeating all the steps. We need to make two changes. First, we need to remove the unique M-delay pipeline register mentioned earlier. This leads to the same data sample being multiplied with the last coefficient and added to the output result twice. The second change corrects for this by dividing that coefficient by a factor of two. Alternatively, we could scale up all the other coefficients by a factor of two, or leave the coefficients unchanged and remove the pre-adder for the last DSP48 stage:
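The coefficient halving can also be checked with a behavioral model. This Python sketch uses made-up coefficients and hypothetical function names (`direct_fir`, `folded_odd_fir`), and it models only the arithmetic, not the pipelining. It compares a plain 2*K-1 tap filter against the folded structure in which the last tap's pre-adder sees the same sample on both inputs.

```python
def direct_fir(x, h, M=1):
    """Reference FIR: one multiply per coefficient, taps spaced M samples apart."""
    y = []
    for t in range(len(x)):
        acc = 0
        for i, c in enumerate(h):
            if t - M * i >= 0:
                acc += c * x[t - M * i]
        y.append(acc)
    return y


def folded_odd_fir(x, h_half, M=1):
    """Odd-symmetric FIR (2*K-1 taps) built from K pre-adder taps.

    With the middle M-delay removed, the last tap adds the same sample
    to itself, so its coefficient is halved to compensate.
    """
    K = len(h_half)
    coeffs = list(h_half[:-1]) + [h_half[-1] / 2]

    def tap(t):
        return x[t] if t >= 0 else 0

    y = []
    for t in range(len(x)):
        acc = 0
        for k in range(K):
            fwd = tap(t - M * k)
            rev = tap(t - M * (2 * K - 2 - k))   # equals fwd when k == K-1
            acc += coeffs[k] * (fwd + rev)
        y.append(acc)
    return y


M = 4
h_half = [1, -2, 3, 5, -1, 4, 2, 7]        # last entry is the centre coefficient
h_full = h_half + h_half[-2::-1]           # mirror without repeating the centre
x = [(3 * t * t + 5 * t) % 13 - 6 for t in range(64)]

print("taps:", len(h_full))
print("match:", folded_odd_fir(x, h_half, M) == direct_fir(x, h_full, M))
```

In fixed-point hardware, halving an odd coefficient would cost one bit of precision, which is one reason the alternatives of doubling the other coefficients or dropping the last pre-adder may be preferable.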
This concludes the posts dedicated to overclocking FIRs. Next we will examine the opposite case, when the system clock rate is lower than the data sample rate.