Sorting Networks - The Verification Problem
In the final post on this subject of parallel sorting using FPGAs I will talk about the difficult issue of how to verify that such a design actually works. That is when given a set of any N input numbers in any arbitrary order the design outputs them in ascending sorted order. There are two main verification strategies for hardware designs, based on formal proof and exhaustive testing. The first one uses logical mathematical methods to prove that a particular assertion, in this case that a sorting network design actually works, is true. As you can imagine this is not trivial and requires that the sorting network is algorithmically generated, usually using recursion - this is actually true in the case of Batcher's odd-even mergesort algorithm but it is not in the case of the optimal sorting networks, which are more or less random. This is one of the reasons why we do not even know what the optimal sorting networks are for N>10. Exhaustive testing on the other hand works on any sorting network and is based on testing every single possible input combination - if all of them yield a sorted output then the design obviously works.
The problem with exhaustive testing is that the number of different input sets is virtually infinite, even for very small Ns. Instead of testing all possible different input combinations of any numbers we can only test all the possible permutations of the numbers from 1 to N. While finite, the number of such permutations is N!, which grows extremely fast with N. For example 16!=20.9·1012 so exhaustive testing for even such a small case is not practical.
An important concept in sorting network testing is the 0-1 principle. Instead of testing all the N! possible input permutations it is enough to test all the possible combinations of inputs that have only values of 0 or 1. There are "only" 2N such input test cases and 216=6.6·104 , a much smaller number. It can be mathematically proved that if a sorting network correctly sorts all possible input combinations of 0s and 1s then it will also correctly sort any permutations of the numbers from 1 to N. The actual testing is pretty simple, an N-bit binary counter can generate all possible combinations of N 0s and 1s by simply counting from 0 to 2N−1 . On the output side the checking is also very simple, two priority encoders determine the position of the first 0 from the bottom and of the first 1 from the top, if the first index plus 1 is equal to the second index then the output is sorted, otherwise we have an error. Writing a simulation testbench along these ideas is easy and the simulation times for N up to 16 or even a bit more are probably reasonable, considering any task that takes less than a day to run as reasonable. We can accelerate this testing process for larger sorting networks by doing the testing in hardware, using an FPGA. It is easy to create a test design along the same idea as the testbench, an N-bit binary counter driving a sorting network of size N with the base type UNSIGNED(0 downto 0), followed by two priority encoders and a comparator. Testing an N=16 sorting network in hardware this way would take only 73us. Since sorting networks of binary elements are really small, for larger Ns we can even have multiple instances working in parallel on subsets of the input test values. But the execution time still grows exponentially with N and if we want to test exhaustively an N=64 sorting network with 128 test instances running in parallel in an FPGA at 891MHz it would take over 5 years!
Things are not that bad with the Batcher sorting network design however, especially if we apply the 0-1 principle. Remember that the algorithm consists of dividing the input data into two halves, sorting them separately and then merging the two sorted halves back into the final result. Let's assume for now that the two smaller N/2 sorters are known to be working, then we only need to prove that the final merge step works. The number of distinct sorted lists of size N/2 made out of only 0s and 1s is actually N2 +1 so the total number of input test vectors required to test a merger of size N is just (N2 +1)2 . The only thing missing is the proof that the two N/2 sorters do also work. This requires the proof through exhaustive testing that an N/2 merger works and the assumption that N/4 sorters work. We apply this reasoning recursively until we reach mergers and sorters of size 1 or 2, which are trivial to test. In the end, all we need to test to validate a sorter of size N are mergers of sizes N, N/2, N/4... etc. Through this clever combination of formal recursive testing of the entire mergesort design combined with polynomial exhaustive testing only for the merge step of the algorithm we can test the N=64 Batcher sorting network with only 33*33+17*17+9*9+5*5+3*3=1493 different test cases, which is virtually nothing compared with the previous case of exhaustively testing an optimal sorting network of the same size.
We can now conclude this long sorting network saga with the final observation that while the Batcher odd-even mergesort algorithm is not optimal, in the sense that it will not produce the smallest possible sorting network for larger Ns, it is efficient enough and its recursive definition makes both a generic implementation as well as a complete mixed formal and exhaustive verification possible for any size N. If you are faced with the task of sorting small to medium sets of data at very high rates in FPGA hardware then Batcher's algorithm is probably the best possible choice we know of so far.
Back to the top: The Art of FPGA Design