Taking advantage of coefficient symmetry in Polyphase FIRs
We have seen in previous posts that when the FIR coefficients are symmetric, we can use a DSP48 feature called a pre-adder and reduce the number of multipliers required in half. Essentially, an FIR of order N can be implemented with N/2 DSP48s.Taking advantage of the filter symmetry is important, especially when the FIR is large or there are many instances of such filters in a design. The DSP48s are a scarce resource and reducing their utilization number by a factor of 2x is very significant.
While the prototype filters used for polyphase interpolation and decimation FIRs are virtually always symmetric, the individual filter branches are generally not. With the very notable exception of 2-phase FIRs when the prototype filter order is odd, the individual branches of a polyphase FIR are not symmetric. We can illustrate this with the example of a 3-phase interpolator we have used earlier:
While the middle filter branch is symmetric, the first and last branches are not. This is not helpful, especially in a time division multiplexed implementation, where we want to use the same hardware resources to implement all the polyphase filter branches. If the number of polyphase branches M is odd, then only one branch is symmetric, while for M even there is no symmetric branch at all.
For this reason people tend to assume that even when the prototype filter is symmetric, it is not possible to take advantage of that with of polyphase FIRs and reduce the number of multipliers in half. This is not true. As a general rule, it is always possible to arrange things such that an M-phase FIR with a symmetric prototype filter of order N can be implemented using N/M/2 multipliers instead of the usual N/M. The reason this is so is the fact that while individual branches are not symmetric, they can always be grouped in pairs of branches that are transposed of each other. In the example above, the first and last branch have coefficient sets that are transposed of each other, [C0, C3, C6, C8, C5, C2] respectively [C2, C5, C8, C6, C3, C0]. The middle branch has the coefficient set [C1, C4, C7, C7, C4, C1] and is already symmetric. This fact can be used to reduce the number of multipliers in half. It does not matter what the number of polyphase branches M is, or if M is even or odd, you will always be able to group the non-symmetric ones in transposed pairs. So let's consider such a pair of branches of an interpolating FIR:
We will use z transform notation to keep the formulas simpler. Here H0(z) and H1(z) are the transfer functions of the two branches, they are polynomials in z−1. While neither one is symmetric, they are transposed of each other. A direct consequence of this fact is that H0(z)+H1(z) will be a symmetric transfer function, while H0(z)-H1(z) will be anti-symmetric. So we will replace the original non-symmetric transfer functions with these symmetric respectively anti-symmetric ones:
This does not produce exactly the result we need, but with the help of a post-adder and subtractor and some divide by 2s which are free in a hardware implementation, we can recover the desired results. For the cost of two adders, we can now reduce the number of multipliers needed in half. Not only that, but we can implement both branches with the same hardware resources since the pre-adder in a DSP48 can dynamically change to a pre-subtractor using the ALUMODE input port of the primitive.
A very similar solution exists for the polyphase decimator. The individual polyphase branches are identical in the two cases, they are just connected differently:
The same trick is used to convert a pair of non-symmetric but transposed FIR branches into a symmetric and an anti-symmetric one. This time, however, we need to use a pre-adder and pre-subtractor to make sure we get the correct result:
The net result is exactly the same, for the price of two extra adders we can again reduce the number of required multipliers in half by taking advantage of the symmetry of the modified polyphase branches.
If K=ceil(N/M) is the number of multipliers required by a classic, non-symmetric polyphase FIR implementation, this modified version will require only K/2 multipliers at the cost of one extra adder per branch. We have to use the ceiling function for the case when N is not an exact multiple of M. In the non-symmetric case this is handled in general by zero padding the set of N coefficients to K*M. For the symmetric version, this zero padding has to be done in a symmetric way, adding an equal number of zeros at the beginning and at the end of the coefficient set. If the zero padding is not done this way, the branch pairs will no longer be transposed of each other and the symmetry will be lost.
In particular, when K*M is even but N is odd, it is impossible to zero pad the coefficient set without breaking symmetry. The workaround for this situation is to increase slightly the prototype filter order from N to N+1, or even better to K*M. This will only improve the filter performance at no extra cost and will also make the order of the prototype filter even. This is not a problem, there is nothing special about odd order FIRs and there is nothing an even order FIR could not do equally well, with the very special exception of half-band polyphase filters. But these are very special cases, we already have efficient symmetric implementation for them as we have seen in Post 10 and Post 11.
Back to the top: The Art of FPGA Design Season 2