Single-Board Computers Forum

Forum Thread Details
  • Replies: 69
  • Subscribers: 58
  • Views: 8492
  • Tags: nuttcp, network, raspberry-pi, bbb, BeagleBone, throughput

SBC Network Throughput

morgaine over 12 years ago

Our earlier lightweight CPU benchmarking provided some confidence that the various boards tested had no major performance faults and were working roughly in line with expectations given their clock speeds and processor families.  Networking is an area of performance that either doesn't get measured much or is measured by ad hoc means that are hard to compare, and implementation anomalies are known to occur occasionally.

 

To try to put this on a more quantitative and even footing, I've picked a network measurement system with an extremely long pedigree, the TTCP family of utilities.  This has evolved from the original "ttcp" of the 1980s through "nttcp" and finally into "nuttcp".  It has become a very useful networking tool: simple to use, repeatable, open source, cross-platform, and it works over both IPv4 and IPv6.  It's in the Debian repository, and if the O/S to be tested doesn't have it, it can be compiled from source just by typing 'make' on the great majority of systems.  (I cross-compiled it for Angstrom.)
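
For reference, a minimal install sketch (package name as in the Debian repository; the from-source step assumes an already downloaded and unpacked nuttcp source tree):

     # From the Debian/Raspbian repository:
     sudo apt-get install nuttcp

     # Or, inside an unpacked source tree on a distribution that doesn't package it:
     make
     # ...then copy the resulting nuttcp binary somewhere on your PATH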

 

Usage is extremely simple.  A pair of machines is required to test the link between them.  One is nominated the 'server' and has "nuttcp -S" executed on it, which turns it into a daemon running in the background.  The other is nominated the 'client', and all the tests are run from it regardless of desired direction.  The two most common tests to run on the client are a Transmission Test (Tx) using "nuttcp -t server", and a Reception Test (Rx) using "nuttcp -r server", both executed on the client with the hostname or IP address of the 'server' provided as argument.

 

These simple tests transfer data at maximum rate in the specified direction over TCP (by default), for an interval of approximately 10 seconds, and on completion the measured throughput is returned in Mbps for easiest comparison with the rated Mbps speed of the link.  Here is a table showing my initial tests executed on various ARM client boards through a gigabit switch, with the server (nuttcp -S) running on a 2.33GHz Core2 Duo machine possessing a gigabit NIC.  The final set of results was obtained between the Core2 Duo and an old Xeon server over a fully gigabit network path, just to confirm that the Core2 Duo wasn't bottlenecked in the ARM board tests.
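
To make the workflow concrete, a typical session looks something like this (hostnames are placeholders; the summary line format matches the nuttcp output quoted later in this thread):

     # On the machine nominated as server (runs as a background daemon):
     nuttcp -S

     # On the client: transmit test (client -> server), then receive test (server -> client):
     nuttcp -t server.example
     nuttcp -r server.example

     # Each run ends with a single summary line of the form:
     #   113.3673 MB /  10.07 sec =   94.4586 Mbps 12 %TX 10 %RX 0 retrans 0.56 msRTT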

 

 

Max theoretical TCP throughput over 100Mbps Ethernet is 94.1482 Mbps with TCP timestamps, or 94.9285 Mbps without.

For fairness, rows are ordered by four attributes: 1) Fast or Gigabit Ethernet, 2) TCP timestamps (TS) or not, 3) ARM frequency, 4) Rx speed.

 

Submitter  | Rx Mbps | Tx Mbps | Client Board   | SoC           | MHz  | Limits     | O/S, kernel, driver
-----------|---------|---------|----------------|---------------|------|------------|----------------------------
selsinork  |   30.60 |   17.28 | 233-OLinuXino  | i.MX23 ARM926 |  233 | No TS      | ArchLinux 3.7.2-2
morgaine   |   93.84 |   72.82 | RPi Model B    | BCM2835       |  700 |            | Raspbian 3.1.9+ #272
morgaine   |   93.84 |   93.75 | BB (white)     | AM3359        |  720 |            | Angstrom v2012.01, 3.2.5+
Tim.Annan  |   94.14 |   91.74 | Gumstix Pepper | AM3359        |  600 | 100M mode  | Yocto 9.0.0 Dylan, 3.2
morgaine   |   93.82 |   76.94 | RPi Model B    | BCM2835       |  800 |            | Raspbian 3.1.9+ #272
morgaine   |   93.82 |   78.71 | RPi Model B    | BCM2835       |  800 | 7/2012 u/s | Raspbian 3.6.11+ #545
morgaine   |   94.14 |   78.87 | RPi Model B    | BCM2835       |  800 | 9/2013 u/s | Raspbian 3.6.11+ #545
morgaine   |   93.80 |   93.75 | BBB            | AM3359        | 1000 |            | Angstrom v2012.12, 3.8.6
selsinork  |   93.92 |   94.46 | Cubieboard2    | A20           |  912 | VLAN TS    | Debian 7.1, 3.3.0+
morgaine   |   94.16 |   94.14 | BBB            | AM3359        | 1000 |            | Debian 7.0, 3.8.13-bone20
selsinork  |   94.33 |   94.55 | Cubieboard2    | A20           |  912 | No TS      | Debian 7.1, 3.3.0+
selsinork  |   94.91 |   94.90 | BBB            | AM3359        | 1000 | No TS      | Angstrom 3.8.6
selsinork  |   94.94 |   94.91 | i.MX53-QSB     | i.MX53        |  996 | No TS      | 3.4.0+
selsinork  |  243.30 |  454.88 | Sabre-Lite     | i.MX6         |  996 | No TS      | 3.0.15-ts-armv7l
Tim.Annan  |  257.79 |  192.22 | Gumstix Pepper | AM3359        |  600 | Gbit mode  | Yocto 9.0.0 Dylan, 3.2
notzed     |  371.92 |  324.49 | Parallella-16  | Zynq-70x0     |  800 |            | Ubuntu Linaro
selsinork  |  525.18 |  519.41 | Cubietruck     | A20           | 1000 | No TS      | LFS-ARM 3.4.67 + gmac
selsinork  |  715.63 |  372.17 | Minnowboard    | Atom E640     | 1000 | No TS      | Angstrom 3.8.13-yocto
morgaine   |  725.08 |  595.28 | homebuilt      | E6550         | 2330 | PCI 33MHz  | Gentoo 32-bit, 3.8.2, r8169
selsinork  |  945.86 |  946.38 | homebuilt      | E8200         | 2666 | PCIe X1    | 32-bit, 3.7.0, e1000

 

 

In addition to the results displayed in the table, I also ran servers (nuttcp -S) on all my boards and kicked off transfers in both directions from the x86 machine, and then followed that with board-to-board transfers just to check that the choice of clients and servers was not affecting results.  It wasn't: the results are very repeatable regardless of the choice, with throughput always limited by the slowest machine for the selected direction of transfer.  Running tests multiple times showed that variations typically held to less than 0.5%, probably a result of occasional unrelated network and/or machine activity.

 

The above measurements were performed over IPv4.  (See below for IPv6.)

 

Hint:  You can run nuttcp client commands even if a server is running on the same machine, so the most flexible approach is to execute "nuttcp -S" on all machines first, and then run client commands on any machine from anywhere to anywhere in any direction.
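
In other words (board names here are just placeholders):

     # Run once on every machine under test:
     nuttcp -S

     # Then, from any machine, test any link in either direction:
     nuttcp -t bbb       # this machine -> bbb
     nuttcp -r cubie     # cubie -> this machine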

 

Initial observations:  The great uniformity in BeagleBone network throughput (both white and Black) stands out, and is clearly not affected by CPU clock speed.  Raspberry Pi Model B clearly has a problem on transmit (now confirmed to be limited by CPU clock) --- I'll have to investigate this further after upgrading my very old Raspbian version.  And finally, my x86 machinery and/or network gear is clearly operating at far below the rated gigabit equipment speed --- this will require urgent investigation and upgrades, especially of NIC bus interfaces.

 

Confirmation or disproof of my figures would be very welcome, as would extending the tests to other boards and O/S versions.

 

Morgaine.

 

 

Addendum:  Note about maximum theoretical throughput added just above the table after analysis in thread below.


  • Former Member over 12 years ago

    cubieboard A20

     

    transmit:

    113.3673 MB /  10.07 sec =   94.4586 Mbps 12 %TX 10 %RX 0 retrans 0.56 msRTT

     

    receive:

    112.5225 MB /  10.05 sec =   93.9224 Mbps 0 %TX 30 %RX 0 retrans 0.59 msRTT

  • morgaine over 12 years ago in reply to Former Member

    Added your figures for Cubieboard A20 to the table, thanks!

     

    It's beginning to look asymptotic to around 95Mbps, which has me a bit puzzled.  I'm not at the stage of wanting to look at inter-frame gaps on the wire yet, but there may be more than meets the eye at first glance here.  After all, we know that our server sides aren't the limiting factor.

     

    selsinork wrote:

     

    [Tests in both directions] simultaneously on the cubie shows a slight drop in transmit to ~92Mbps which could be within normal measurement error range. Receive however appears to decline to ~30Mbps.

    Oh, very interesting indeed!  Unfortunately it won't be possible to simply conjure up some sort of "maximum combined Rx/Tx fabric throughput" scalar metric, because it may be the case that the fabric is contention-limited only at the highest rates of simultaneous Rx/Tx traffic --- only a family of curves is going to tell the whole story, and that's beyond the kind of measurement work I'm willing to carry out.

     

    For free, anyway.

     

    Fortunately, most loads tend to max out only one direction at a time, so our nuttcp figures for single directions are still useful.

  • Former Member over 12 years ago in reply to morgaine

    Morgaine Dinova wrote:

     

    It's beginning to look asymptotic to around 95Mbps, which has me a bit puzzled.

    I was going to suggest that we try a UDP test instead of the default TCP since it would remove some of the overheads; however, the result of that is exactly 1.0000 Mbps on both the cubie and the x86 system, so either I'm doing something wrong or there are other problems with nuttcp. Suggestions welcome.  I may try netperf later to see if I can get better results.

     

    The 95Mbps figure is fairly accurate if we assume that nuttcp is measuring payload throughput: 125Mbps raw wire rate, 4B/5B encoding, then subtract not just inter-frame gaps but also Ethernet and TCP headers. TCP will also incur some overhead due to being a reliable protocol and having to deal with sending ACKs and such like. I'll not pretend to understand how all of that works other than to know that it's potentially complex. UDP avoids most of that since it's send-and-forget.

     

    Someone else already did the numbers for us http://sd.wareonearth.com/~phil/net/overhead/

  • morgaine over 12 years ago in reply to Former Member

    Excellent link, thanks selsinork!  I'll extract a few lines of most relevance here, leaving out the 802.1q lines for brevity as VLANs can be avoided when benchmarking.  I've highlighted some fields for reference below.

     

    From http://sd.wareonearth.com/~phil/net/overhead/, assuming no VLANs:

     

    TCP over Ethernet:
         Assuming no header compression (e.g. not PPP)
         Add 20 IPv4 header or 40 IPv6 header (no options)
         Add 20 TCP header
         Add 12 bytes optional TCP timestamps
         Max TCP Payload data rates over ethernet are thus:
              (1500-40)/(38+1500) = 94.9285 %  IPv4, minimal headers
              (1500-52)/(38+1500) = 94.1482 %  IPv4, TCP timestamps
              (1500-60)/(38+1500) = 93.6281 %  IPv6, minimal headers
              (1500-72)/(38+1500) = 92.8479 %  IPv6, TCP timestamps

              (9000-40)/(38+9000) = 99.1370 %  Jumbo IPv4, minimal headers
              (9000-52)/(38+9000) = 99.0042 %  Jumbo IPv4, TCP timestamps
              (9000-60)/(38+9000) = 98.9157 %  Jumbo IPv6, minimal headers
              (9000-72)/(38+9000) = 98.7829 %  Jumbo IPv6, TCP timestamps

    UDP over Ethernet:
         Add 20 IPv4 header or 40 IPv6 header (no options)
         Add 8 UDP header
         Max UDP Payload data rates over ethernet are thus:
              (1500-28)/(38+1500) = 95.7087 %  IPv4
              (1500-48)/(38+1500) = 94.4083 %  IPv6

              (9000-28)/(38+9000) = 99.2697 %  Jumbo IPv4
              (9000-48)/(38+9000) = 99.0485 %  Jumbo IPv6

    Theoretical maximum UDP throughput on GigE using jumbo frames:
              (9000-20-8)/(9000+14+4+7+1+12)*1000000000/1000000 = 992.697 Mbps

    Theoretical maximum TCP throughput on GigE without using jumbo frames:
              (1500-20-20-12)/(1500+14+4+7+1+12)*1000000000/1000000 = 941.482 Mbps

    Theoretical maximum UDP throughput on GigE without using jumbo frames:
              (1500-20-8)/(1500+14+4+7+1+12)*1000000000/1000000 = 957.087 Mbps

     

    Because the per-frame overhead (preamble, Ethernet header, FCS and the interframe gap, 38 bytes in total) is the same at both 100Mbps and 1Gbps, the percentage calculations apply identically at both speeds when determining the maximum payload rate, i.e. one just has to apply the percentage to the data bitrate of the link.  This is clear from the fact that, taking TCP as an example, the cited 941.482 Mbps is just 94.1482 % of 1Gbps.  Likewise, at 100Mbps the corresponding maximum payload rate of TCP would be 94.1482 Mbps.
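
    As a quick sanity check of that scaling argument (plain arithmetic, nothing nuttcp-specific):

     # Max TCP payload fraction with timestamps = (1500-52)/(1500+38)
     awk 'BEGIN { eff = (1500-52)/(1500+38);
                  printf "%.4f Mbps at 100 Mbps\n", eff*100;
                  printf "%.3f Mbps at 1 Gbps\n",   eff*1000 }'
     # prints 94.1482 Mbps at 100 Mbps and 941.482 Mbps at 1 Gbps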

     

    (If examining the traffic on the wire, we would have to consider the higher symbol rate on the physical link which for 4B/5B encoding is 125% of the data bitrate.  However, this doesn't matter at the layer above on each host, since the data bitrate of a "100Mbps Ethernet" really is 100Mbps after decoding.)

     

    So, our measurements of around 94.1 Mbps seem to indicate that the BBB and Cubieboard2 reach their theoretical limit of performance for "IPv4, TCP timestamps" over 100Mbps Ethernet, although the absolute maximum of 94.9285 Mbps for "IPv4, minimal headers" still remains to be reached.  (Of course, we'll have to check whether TCP timestamps are actually being sent to understand the results fully.)  Your measurements of 94.46Mbps for Cubieboard2 Tx and 946.38Mbps for Asus Tx show that this very highest limit of throughput is being approached.

     

    So, a very good result!

     

     

    PS. I've placed a one-line note about the 94.9285 Mbps limit just above the table to save wear and tear on eyeballs.

  • Former Member over 12 years ago in reply to morgaine

    Morgaine Dinova wrote:

     

    Cubieboard2 reach their theoretical limit of performance for "IPv4, TCP timestamps" over 100Mbps Ethernet, although the absolute maximum of 94.9285 Mbps for "IPv4, minimal headers" still remains to be reached.

    On the cubieboard, cat /proc/sys/net/ipv4/tcp_timestamps shows 1, while on my x86 systems it's 0.  So, as the server end is running on the x86, some disparity between TX & RX is certainly possible simply due to there being flaws in our methods - the two ends have different settings.

     

    Theoretical max and reality are always going to disagree in one way or another, and we could spend weeks trying to work out a 3Kbps difference. Probably not worth it.

     

    As an aside, my cubie2 results have a router, a NAT layer, and tagged VLANs on the link from router to switch all sitting in between server and client. So, given the additional overheads, it's still an impressive enough result.

  • morgaine over 12 years ago in reply to morgaine

    I wrote:

    (Of course, we'll have to check whether TCP timestamps  are actually being sent to understand the results fully.)

    The first step towards that is to check whether the systems under test have TCP timestamps enabled.  That's easy to discover on any normal Linux system by executing:

     

    cat  /proc/sys/net/ipv4/tcp_timestamps

    All of my machines have TCP timestamps enabled (value 1), including the four ARM boards I tested.  It seems to be the default in Linux.  So, our limiting TCP throughput is 94.1482 Mbps.  (The final confirmation will be when we see TCP timestamps in Wireshark.)
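
    (For anyone who wants to equalise the two ends, the setting can also be flipped at runtime on a stock Linux kernel; the change is not persistent across reboots unless added to sysctl.conf.)

     # As root, on either end:
     sysctl -w net.ipv4.tcp_timestamps=0    # disable TCP timestamps
     sysctl -w net.ipv4.tcp_timestamps=1    # enable TCP timestamps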

     

     

    Addendum.  Oops, nice overlap between our two posts there.

     

    It's interesting to hear that one of your x86 machines does not  have it enabled.  That's easily remedied, but unfortunately it means that we can't be sure whether your previous results included timestamps or not.  Clearly they could be enabled everywhere or disabled everywhere for the simplest setup.  The other detail is more troubling:  because you're using VLAN tagging, your throughput limits will be different yet again, 93.9040 Mbps if TCP timestamps are being sent.  I've added "VLAN" under "Limits" for Cubieboard2.

     

    Addendum 2.  Although the article you linked does not specify throughput for the case of 802.1q without TCP timestamps, the missing lines are easy to calculate:

     

    TCP
        (1500-40)/(42+1500) = 94.6822 %  802.1q, IPv4, without TCP timestamps
        (1500-60)/(42+1500) = 93.3852 %  802.1q, IPv6, without TCP timestamps
        (9000-40)/(42+9000) = 99.0931 %  Jumbo 802.1q, IPv4, without TCP timestamps
        (9000-60)/(42+9000) = 98.8719 %  Jumbo 802.1q, IPv6, without TCP timestamps

    So, it seems that 94.6822 Mbps and 946.822 Mbps will be the limiting throughputs for the non-timestamping x86 server on your VLAN.
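
    The same arithmetic is easy to parameterise if anyone wants to check further header combinations (a small sketch; the per-frame overheads, 38 bytes for plain Ethernet and 42 bytes with an 802.1q tag, and the header sizes are those from the linked page):

     # payload % = (MTU - IP/TCP headers) * 100 / (MTU + per-frame overhead)
     awk 'BEGIN {
         printf "%.4f %%  802.1q, IPv4, no TCP timestamps\n",   (1500-40)*100/(1500+42);
         printf "%.4f %%  802.1q, IPv4, with TCP timestamps\n", (1500-52)*100/(1500+42);
     }'
     # prints 94.6822 % and 93.9040 %, matching the figures quoted earlier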

  • Former Member over 12 years ago in reply to morgaine

    Morgaine Dinova wrote:

     

    All of my machines have TCP timestamps enabled (value 1), including the four ARM boards I tested. It seems to be the default in Linux.

     

    [...]

     

    It's interesting to hear that one of your x86 machines does not  have it enabled. 

    Actually, none of my x86 systems have it enabled... remember when I said that I have mostly self-compiled systems?  It's likely I make different choices from the distro default of "turn on all the crap & bloat".  Enabling something that steals my bandwidth doesn't sound like me.

     

    That's easily remedied, but unfortunately it means that we can't be sure whether your previous results included timestamps or not. 

    I'd suggest they do not include timestamps, after reading RFC 1323, but I don't know enough about the implementation detail to be sure. I can put Wireshark in the middle and find out, though.

    I'll redo the tests with both client and server on the same network to remove the VLAN, NAT and timestamp stuff when I get time.

     

    The current setup is mainly down to the main machines having public IPs, but not having enough free for the growing collection of ARM devices, which therefore have to sit behind NAT. It'll take a bit of messing around to get a capable machine onto the same LAN without other complications like Open vSwitch making interpreting the results even more interesting.

     

    The x86 to x86 results are good as those have uncomplicated networking, but should be taken as having timestamps disabled.

  • morgaine over 12 years ago in reply to Former Member

    selsinork wrote:

     

    Actually, none of my x86 systems have it enabled... remember when I said that I have mostly self-compiled systems?  It's likely I make different choices from the distro default of "turn on all the crap & bloat".

     

    Networking aside, it is indeed a very bad trend.  Although I don't use "desktop" Linux distributions myself, it's very sad to see major players like Gnome, KDE and Ubuntu taking Linux in the opposite direction from being slim and functional by composition to being fat and dysfunctional through adding layers upon layers and forever feeding the GUI monster.  Open software designers seem to have lost all esteem for inherent power-by-design, and now worship the power-by-adding-features meme common in other operating systems instead.

     

    The end result of this mess is not only bloat, but also something far worse --- an explosion of dependencies.  Adding features to applications very commonly brings in yet another suite of libraries, and so the dependency tree grows and grows and dependency management becomes ever more problematic.  At the end of this road lies a future of "everything depends on everything else".  Although Gentoo automates dependency management, it doesn't try to hide away the very strong smell of this malaise, so I see the signs of impending calamity every few months on upgrades.  Software is heading along a road full of potholes and ending in a precipice.

  • Former Member over 12 years ago in reply to morgaine

    Morgaine Dinova wrote:

     

    and now worship the power-by-adding-features meme common in other operating systems instead.

    Useful features are one thing; however, the trend seems to be for things that nobody needs, wants, or will ever have any use for.

    But let's not get too off-topic; we could discuss just this aspect for weeks.
