The Single Rate Odd-Symmetric FIR
In the last post we examined the even-symmetric FIR, a filter with N=2*K taps. The main conclusion was that we need only K DSP48s to implement such a filter, and we came up with a basic building block that is efficient in terms of device utilization, generic, and scalable. We can build filters of virtually any size simply by cascading this block, with no speed degradation as the filter gets larger.
We will now consider the odd case, where N=2*K-1. The mathematical starting point looks like this:
It is not fundamentally different from anything we have seen so far. The coefficients are symmetric and there is an odd number of them, with a single central coefficient, C(4) in this example.
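For the 9-tap example (K=5), the direct-form equation and its folded form, which groups each pair of equal coefficients C(k)=C(8-k), can be written as follows. This is a reconstruction based on the surrounding text, so the indexing convention may differ slightly from the original figure:

```latex
\begin{aligned}
y(n) &= \sum_{i=0}^{8} C(i)\,x(n-i), \qquad C(i) = C(8-i) \\
     &= \sum_{k=0}^{3} C(k)\,\bigl[x(n-k) + x(n-8+k)\bigr] + C(4)\,x(n-4)
\end{aligned}
```

In general, with N=2*K-1 taps the paired sum runs over k=0..K-2, and the single unpaired center term is C(K-1)x(n-K+1).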
We could go once more through all the steps we went through for the even-symmetric case. Instead, we can skip all that and start from the result we have already achieved (see The Art of FPGA Design Season 2 - Post 6), modifying it to make the filter odd. For reference, here is the even-symmetric pipelined FIR implementation we will start from, with each DSP48 colored differently and one fabric delay shown in white:
The odd-symmetric FIR is shorter by one tap, and we can easily achieve that by changing the fabric delay length from 2*K-1 clocks to 2*K-2. The undesired side effect is that the last DSP48 now pre-adds the center sample to itself, so the center tap coefficient is effectively twice what we really need. We compensate for that by replacing C(4) with C(4)/2, which gives us the following pipelined odd-symmetric FIR implementation:
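The effect of shortening the fabric delay can be checked with a small behavioral model. This is a sketch under stated assumptions, not the original HDL: `direct_fir` and `folded_fir` are hypothetical helpers, pipeline registers and latency are ignored, and each DSP48 number k is modeled as pre-adding x(n-k) with the mirrored sample x(n-delay+k). The coefficient and input values are made up, and integers are used so the comparison is exact.

```python
def direct_fir(x, c):
    """Reference direct-form FIR: y(n) = sum_i c[i] * x(n-i), with x(m) = 0 for m < 0."""
    return [sum(c[i] * x[n - i] for i in range(len(c)) if n - i >= 0)
            for n in range(len(x))]


def folded_fir(x, half, delay):
    """Behavioral (not cycle-accurate) model of the pre-adder symmetric FIR.

    half:  the K coefficients loaded into the K DSP48s
    delay: hypothetical fabric delay length, 2*K-1 for the even-symmetric
           filter, 2*K-2 for the odd-symmetric one
    """
    def s(m):  # input sample x(m), zero outside the input stream
        return x[m] if 0 <= m < len(x) else 0

    # DSP48 number k pre-adds x(n-k) and the mirrored sample x(n-delay+k)
    return [sum(h * (s(n - k) + s(n - delay + k)) for k, h in enumerate(half))
            for n in range(len(x))]


K = 5
c = [1, -3, 5, 10, 16, 10, 5, -3, 1]               # hypothetical 9-tap symmetric set
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 0, 0, 0, 0]  # arbitrary test input

y_ref = direct_fir(x, c)
y_raw = folded_fir(x, c[:K], 2 * K - 2)                        # center tap counted twice
y_fix = folded_fir(x, c[:K - 1] + [c[K - 1] // 2], 2 * K - 2)  # C(4)/2 compensates
print(y_fix == y_ref)
```

With the delay shortened to 2*K-2 and C(4) left unchanged, the output behaves like a filter whose center coefficient is 2*C(4); storing C(4)/2 restores the intended response.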
This is mathematically correct and does exactly what we need. However, we are using fixed-point representations for both coefficients and data, and dividing the center coefficient, which is always the largest one, might have a negative impact on accuracy. If the original quantized coefficient C(4) is even, then using C(4)/2 instead has no impact on numerical precision and the design above is perfectly OK. But if the quantized coefficient is odd, C(4)/2 cannot be represented exactly in the same coefficient format, and this scheme leads to a loss of precision: the effective center coefficient can be off by one LSB.
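A minimal integer illustration of that LSB loss, assuming the quantized center coefficient is held as an integer number of LSBs and that C(4)/2 is formed by truncation (an arithmetic shift right by one). The helper name and the two test values are hypothetical.

```python
def halved_center_error(q4):
    """Error, in coefficient LSBs, caused by storing C(4)/2 instead of C(4).

    q4 is the quantized center coefficient as an integer. The coefficient
    register can only hold an integer, so q4 // 2 truncates when q4 is odd;
    the pre-adder then doubles the center sample, so the effective tap is
    2 * stored.
    """
    stored = q4 // 2          # what actually fits in the coefficient format
    effective = 2 * stored    # contribution after the pre-adder doubling
    return q4 - effective


print(halved_center_error(22938), halved_center_error(22937))  # prints 0 1
```

An even value survives the halving unchanged, while an odd one loses exactly one LSB. Rounding instead of truncating would still leave a one-LSB error.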
There are, however, two different ways around this problem. As mentioned above, the center coefficient is always the largest one, and it dictates the fixed-point range for representing the entire coefficient set. Let's assume, without loss of generality, that the coefficients are scaled so that the center coefficient is in the range [0.5,1.0). If all other coefficients are smaller than 0.5 in absolute value, then we can do the scaling differently and gain, rather than lose, dynamic range:
Because all coefficients other than the center one are smaller than 0.5 in absolute value, we can double them with no risk of overflow, which gives us one extra LSB of precision when they are quantized. The center coefficient C(4) is used as is, since the pre-adder already doubles its contribution. The filter output is now twice as large, but we can easily compensate for that by dividing it by 2 after the last post-adder, an operation which is free in hardware because it only moves the binary point by one position.
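The two scaling options can be compared by looking at the effective coefficients each one actually implements. This is a sketch under assumptions: the coefficient word has a hypothetical B=15 fractional bits, the coefficient values are made up (with the center chosen so that its quantized value is odd), `Fraction` is used only to keep the arithmetic exact, and the halving in the first scheme is done by truncation.

```python
from fractions import Fraction

B = 15  # hypothetical number of fractional bits in the coefficient word (LSB = 2**-B)


def quantize(v):
    """Round a real coefficient to an integer number of LSBs."""
    return round(Fraction(v) * 2**B)


def effective_halved(half):
    """C(4)/2 scheme: every coefficient quantized normally, the center one stored
    as an integer C(4)/2 (truncated) and doubled again by the pre-adder."""
    *others, center = half
    eff = [Fraction(quantize(c), 2**B) for c in others]
    eff.append(Fraction(2 * (quantize(center) // 2), 2**B))
    return eff


def effective_doubled(half):
    """Scaling trick: non-center coefficients doubled before quantization, center
    coefficient used as is, final output divided by 2 (binary point moved by one),
    so each non-center coefficient effectively gains one LSB of resolution."""
    *others, center = half
    assert all(abs(c) < 0.5 for c in others), "trick requires |C(k)| < 0.5"
    eff = [Fraction(quantize(2 * c), 2**B) / 2 for c in others]
    # the pre-adder doubles the center sample and the final divide by 2 undoes it
    eff.append(Fraction(quantize(center), 2**B))
    return eff


half = [0.01, -0.04, 0.12, 0.31, 0.70003]  # hypothetical C(0)..C(4), center in [0.5, 1.0)
for name, eff in (("halved", effective_halved(half)), ("doubled", effective_doubled(half))):
    # absolute error of each effective coefficient, expressed in original LSBs
    err = [float(abs(e - Fraction(c)) * 2**B) for e, c in zip(eff, half)]
    print(name, err)
```

In the doubled scheme each non-center coefficient is within a quarter of an original LSB and the center within half an LSB, whereas halving an odd center value can cost up to an extra full LSB.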
If some coefficients are 0.5 or larger in absolute value, then we cannot use the trick described above, but we can still avoid the loss of precision introduced by dividing the center tap coefficient by 2. This does, however, require changing the last building block, which then differs from all the others:
We simply remove the D input path and the pre-adder from the last DSP48, making sure the latency through the A input path stays unchanged. There is no need to divide C(4) now, so there is no loss of precision.
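A behavioral sketch of this variant, again ignoring pipeline latency and using hypothetical helper names and values: DSP48s 0..K-2 keep their pre-adders with the odd-symmetric fabric delay of 2*K-2, while the last DSP48 only multiplies the center sample, coming through the A path, by the full, undivided C(4). An odd center coefficient is used on purpose, since that is the case the C(4)/2 scheme handles poorly.

```python
def direct_fir(x, c):
    """Reference direct-form FIR: y(n) = sum_i c[i] * x(n-i), with x(m) = 0 for m < 0."""
    return [sum(c[i] * x[n - i] for i in range(len(c)) if n - i >= 0)
            for n in range(len(x))]


def odd_fir_plain_last_tap(x, half):
    """Odd-symmetric FIR where the last DSP48 has no D input and no pre-adder."""
    K = len(half)

    def s(m):  # input sample x(m), zero outside the input stream
        return x[m] if 0 <= m < len(x) else 0

    y = []
    for n in range(len(x)):
        # DSP48s with pre-adder, fabric delay 2*K-2 as in the odd-symmetric filter
        acc = sum(half[k] * (s(n - k) + s(n - (2 * K - 2) + k)) for k in range(K - 1))
        # last DSP48: A path only, full center coefficient C(K-1)
        acc += half[K - 1] * s(n - (K - 1))
        y.append(acc)
    return y


c = [1, -3, 5, 10, 17, 10, 5, -3, 1]               # hypothetical set, odd C(4) = 17
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 0, 0, 0, 0]  # arbitrary test input
y = odd_fir_plain_last_tap(x, c[:5])
print(y == direct_fir(x, c))
```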
In conclusion, we can use the same building block, a DSP48 with a pre-adder, to implement both even- and odd-symmetric single rate FIRs:
Depending on how we decide to handle the scaling of the middle tap in the odd case, we might have to make some minor changes to the last DSP48, but we can build both types of symmetric FIRs with essentially the same building block.
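To illustrate that the same folded block covers both cases, here is a behavioral sketch (hypothetical helper names, pipeline latency ignored) that picks the number of DSP48s and the fabric delay from the filter length. It uses the C(4)/2 option for the odd case. A convenient observation is that the fabric delay equals N-1 in both cases: 2*K-1 for N=2*K and 2*K-2 for N=2*K-1.

```python
def direct_fir(x, c):
    """Reference direct-form FIR: y(n) = sum_i c[i] * x(n-i), with x(m) = 0 for m < 0."""
    return [sum(c[i] * x[n - i] for i in range(len(c)) if n - i >= 0)
            for n in range(len(x))]


def symmetric_fir(x, c):
    """Even- or odd-symmetric FIR built from the same folded (pre-adder) block."""
    assert list(c) == list(c)[::-1], "coefficients must be symmetric"
    N = len(c)
    K = (N + 1) // 2          # number of DSP48 blocks: N = 2K or N = 2K-1
    delay = N - 1             # fabric delay: 2K-1 (even N) or 2K-2 (odd N)
    half = list(c[:K])
    if N % 2:                 # odd N: the last pre-adder doubles the center sample
        half[-1] = half[-1] / 2

    def s(m):  # input sample x(m), zero outside the input stream
        return x[m] if 0 <= m < len(x) else 0

    return [sum(h * (s(n - k) + s(n - delay + k)) for k, h in enumerate(half))
            for n in range(len(x))]


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 0, 0, 0, 0]  # arbitrary test input
c_even = [1, -3, 5, 10, 10, 5, -3, 1]              # hypothetical 8-tap set, K = 4
c_odd = [1, -3, 5, 10, 17, 10, 5, -3, 1]           # hypothetical 9-tap set, K = 5
print(symmetric_fir(x, c_even) == direct_fir(x, c_even))  # True
print(symmetric_fir(x, c_odd) == direct_fir(x, c_odd))    # True
```

The small integer values keep the floating-point C(4)/2 exact here; in a fixed-point design the scaling options discussed above would replace the plain division.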
In the next post we will see whether it is possible to have a scalable, symmetric, low-latency implementation based on the transposed FIR architecture, similar to the non-symmetric one we saw previously in The Art of FPGA Design Season 2 - Post 5.