Our earlier lightweight CPU benchmarking provided some confidence that the various boards tested had no major performance faults and were working roughly in line with expectations given their clock speeds and processor families. Networking, by contrast, is an area of performance that either goes unmeasured or is measured by ad hoc means that are hard to compare, and implementation anomalies are known to occur occasionally.
To try to put this on a more quantitative and even footing, I've picked a network measurement system with an extremely long pedigree, the TTCP family of utilities. This has evolved from the original "ttcp" of the 1980s through "nttcp" and finally into "nuttcp". It has become a very useful networking tool: simple to use with repeatable results, open source, cross-platform, and it works over both IPv4 and IPv6. It's in the Debian repository, and if the O/S to be tested doesn't have it then it can be compiled from sources just by typing 'make' on the great majority of systems. (I cross-compiled it for Angstrom.)
Usage is extremely simple. A pair of machines is required to test the link between them. One is nominated the 'server' and has "nuttcp -S" executed on it, which turns it into a daemon running in the background. The other is nominated the 'client', and all the tests are run from it regardless of desired direction. The two most common tests to run on the client are a Transmission Test (Tx) using "nuttcp -t server", and a Reception Test (Rx) using "nuttcp -r server", both executed on the client with the hostname or IP address of the 'server' provided as argument.
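Under the hood, a transmission test amounts to streaming data flat-out over a TCP connection for a fixed interval and then dividing bytes moved by elapsed time. Here is a minimal sketch of that measurement loop in Python, run over loopback; this is an illustration of the principle only, not nuttcp's actual implementation, and port 5001 is an arbitrary choice:

```python
import socket
import threading
import time

def sink_server(port):
    # Accept one connection and discard everything received,
    # playing the role of "nuttcp -S".
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    while conn.recv(65536):
        pass  # discard until the client closes
    conn.close()
    srv.close()

def tx_test(port, seconds=1.0):
    # Stream zeroes at maximum rate for `seconds`, then report Mbps,
    # playing the role of "nuttcp -t <server>".
    cli = socket.create_connection(("127.0.0.1", port))
    buf = b"\0" * 65536
    sent = 0
    start = time.monotonic()
    while time.monotonic() - start < seconds:
        sent += cli.send(buf)  # send() may be partial; count actual bytes
    elapsed = time.monotonic() - start
    cli.close()
    return sent * 8 / elapsed / 1e6  # bits per second -> Mbps

threading.Thread(target=sink_server, args=(5001,), daemon=True).start()
time.sleep(0.2)  # give the server a moment to start listening
mbps = tx_test(5001)
print(f"{mbps:.1f} Mbps over loopback")
```

Over loopback this measures memory bandwidth rather than any NIC, which is exactly why a real tool like nuttcp is run between two machines.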
These simple tests transfer data at maximum rate in the specified direction over TCP (by default), for an interval of approximately 10 seconds, and on completion the measured throughput is returned in Mbps for easiest comparison with the rated Mbps speed of the link. Here is a table showing my initial tests executed on various ARM client boards through a gigabit switch, with the server (nuttcp -S) running on a 2.33GHz Core2 Duo machine possessing a gigabit NIC. The final set of results was obtained between the Core2 Duo and an old Xeon server over a fully gigabit network path, just to confirm that the Core2 Duo wasn't bottlenecked in the ARM board tests.
Max theoretical TCP throughput over 100Mbps Ethernet is 94.1482 Mbps with TCP TimeStamps, or 94.9285 Mbps without.
For fairness, rows are ordered by 4 attributes: 1) Fast or Gigabit, 2) TCP TS or not, 3) ARM Freq, 4) Rx Speed.
Submitter | Rx Mbps | Tx Mbps | Client Board | SoC | MHz | Limits | O/S, kernel, driver |
---|---|---|---|---|---|---|---|
selsinork | 30.60 | 17.28 | 233-OLinuXino | i.MX23 ARM926 | 233 | No TS | ArchLinux 3.7.2-2 |
morgaine | 93.84 | 72.82 | RPi Model B | BCM2835 | 700 | | Raspbian 3.1.9+ #272 |
morgaine | 93.84 | 93.75 | BB (white) | AM3359 | 720 | | Angstrom v2012.01, 3.2.5+ |
Tim.Annan | 94.14 | 91.74 | Gumstix Pepper | AM3359 | 600 | 100M mode | Yocto 9.0.0 Dylan, 3.2 |
morgaine | 93.82 | 76.94 | RPi Model B | BCM2835 | 800 | Raspbian 3.1.9+ #272 | |
morgaine | 93.82 | 78.71 | RPi Model B | BCM2835 | 800 | 7/2012 u/s | Raspbian 3.6.11+ #545 |
morgaine | 94.14 | 78.87 | RPi Model B | BCM2835 | 800 | 9/2013 u/s | Raspbian 3.6.11+ #545 |
morgaine | 93.80 | 93.75 | BBB | AM3359 | 1000 | | Angstrom v2012.12, 3.8.6 |
selsinork | 93.92 | 94.46 | Cubieboard2 | A20 | 912 | VLAN TS | Debian 7.1, 3.3.0+ |
morgaine | 94.16 | 94.14 | BBB | AM3359 | 1000 | | Debian 7.0, 3.8.13-bone20 |
selsinork | 94.33 | 94.55 | Cubieboard2 | A20 | 912 | No TS | Debian 7.1, 3.3.0+ |
selsinork | 94.91 | 94.90 | BBB | AM3359 | 1000 | No TS | Angstrom 3.8.6 |
selsinork | 94.94 | 94.91 | i.MX53-QSB | i.MX53 | 996 | No TS | 3.4.0+ |
selsinork | 243.30 | 454.88 | Sabre-Lite | i.MX6 | 996 | No TS | 3.0.15-ts-armv7l |
Tim.Annan | 257.79 | 192.22 | Gumstix Pepper | AM3359 | 600 | Gbit mode | Yocto 9.0.0 Dylan, 3.2 |
notzed | 371.92 | 324.49 | Parallella-16 | Zynq-70x0 | 800 | | Ubuntu Linaro |
selsinork | 525.18 | 519.41 | Cubietruck | A20 | 1000 | No TS | LFS-ARM 3.4.67 + gmac |
selsinork | 715.63 | 372.17 | Minnowboard | Atom E640 | 1000 | No TS | Angstrom 3.8.13-yocto |
morgaine | 725.08 | 595.28 | homebuilt | E6550 | 2330 | PCI 33MHz | Gentoo 32-bit, 3.8.2, r8169 |
selsinork | 945.86 | 946.38 | homebuilt | E8200 | 2666 | PCIe X1 | 32-bit, 3.7.0, e1000 |
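The theoretical maximums quoted above the table follow directly from per-frame overheads on the wire. A short calculation, assuming a standard 1500-byte MTU and the usual Ethernet framing (preamble, header, FCS, inter-frame gap):

```python
# Bytes on the wire alongside each 1500-byte frame:
# preamble (7) + SFD (1) + Ethernet header (14) + FCS (4) + inter-frame gap (12)
WIRE_OVERHEAD = 7 + 1 + 14 + 4 + 12   # = 38 bytes
MTU = 1500
LINK_MBPS = 100

def max_tcp_throughput(timestamps):
    # IP header (20) + TCP header (20), plus 12 bytes of TCP options
    # when RFC 1323 TimeStamps are enabled.
    headers = 20 + 20 + (12 if timestamps else 0)
    payload = MTU - headers
    return LINK_MBPS * payload / (MTU + WIRE_OVERHEAD)

print(f"{max_tcp_throughput(True):.4f}")   # 94.1482 with TimeStamps
print(f"{max_tcp_throughput(False):.4f}")  # 94.9285 without
```

This also shows why the "No TS" rows can legitimately exceed 94.15 Mbps on a 100Mbps link: disabling TimeStamps frees 12 bytes of TCP options per segment for payload.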
In addition to the results displayed in the table, I also ran servers (nuttcp -S) on all my boards and kicked off transfers in both directions from the x86 machine, and then followed that with board-to-board transfers just to check that the choice of clients and servers was not affecting results. It wasn't: the results are very repeatable regardless of the choice, with throughput always limited by the slower machine for the selected direction of transfer. Running tests multiple times showed that variations typically held to less than 0.5%, probably a result of occasional unrelated network and/or machine activity.
The above measurements were performed over IPv4. (See below for IPv6.)
Hint: You can run nuttcp client commands even if a server is running on the same machine, so the most flexible approach is to execute "nuttcp -S" on all machines first, and then run client commands on any machine from anywhere to anywhere in any direction.
Initial observations: The great uniformity in BeagleBone network throughput (both white and Black) stands out, and is clearly not affected by CPU clock speed. Raspberry Pi Model B clearly has a problem on transmit (now confirmed to be limited by CPU clock) --- I'll have to investigate this further after upgrading my very old Raspbian version. And finally, my x86 machinery and/or network gear is clearly operating at far below the rated gigabit equipment speed --- this will require urgent investigation and upgrades, especially of NIC bus interfaces.
Confirmation or disproof of my figures would be very welcome, as would extending the tests to other boards and O/S versions.
Morgaine.
Addendum: Note about maximum theoretical throughput added just above the table after analysis in thread below.