element14 Community
Single-Board Computers Forum
Forum Thread Details
  • Replies: 69
  • Subscribers: 57
  • Views: 7,236
  • Tags: nuttcp, network, raspberry-pi, bbb, BeagleBone, throughput
SBC Network Throughput

morgaine, over 11 years ago

Our earlier lightweight CPU benchmarking provided some confidence that the various boards tested had no major performance faults and were working roughly in line with expectations, given their clock speeds and processor families.  Networking is an area of performance that either doesn't get measured much or is measured by ad hoc means which are hard to compare, and implementation anomalies are known to occur occasionally.

 

To try to put this on a more quantitative and even footing, I've picked a network measurement system with an extremely long pedigree: the TTCP family of utilities.  This has evolved from the original "ttcp" of the 1980s through "nttcp" and finally into "nuttcp".  It has become a very useful networking tool: simple to use with repeatable results, open source, cross-platform, and it works over both IPv4 and IPv6.  It's in the Debian repository, and if the O/S to be tested doesn't have it, it can be compiled from source just by typing 'make' on the great majority of systems.  (I cross-compiled it for Angstrom.)

 

Usage is extremely simple.  A pair of machines is required to test the link between them.  One is nominated the 'server' and has "nuttcp -S" executed on it, which turns it into a daemon running in the background.  The other is nominated the 'client', and all the tests are run from it regardless of desired direction.  The two most common tests to run on the client are a Transmission Test (Tx) using "nuttcp -t server", and a Reception Test (Rx) using "nuttcp -r server", both executed on the client with the hostname or IP address of the 'server' provided as argument.

 

These simple tests transfer data at maximum rate in the specified direction over TCP (by default), for an interval of approximately 10 seconds, and on completion the measured throughput is returned in Mbps for easiest comparison with the rated Mbps speed of the link.  Here is a table showing my initial tests executed on various ARM client boards through a gigabit switch, with the server (nuttcp -S) running on a 2.33GHz Core2 Duo machine possessing a gigabit NIC.  The final set of results was obtained between the Core2 Duo and an old Xeon server over a fully gigabit network path, just to confirm that the Core2 Duo wasn't bottlenecked in the ARM board tests.

 

 

Max theoretical TCP throughput over 100Mbps Ethernet is 94.1482 Mbps with TCP TimeStamps, or 94.9285 Mbps without.

For fairness, rows are ordered by 4 attributes: 1) Fast or Gigabit, 2) TCP TS or not, 3) ARM Freq, 4) Rx Speed.
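Those two ceiling figures follow from simple frame accounting: a full-size Ethernet frame occupies 1538 bytes on the wire (1500 IP MTU + 18 Ethernet header/FCS + 8 preamble + 12 inter-frame gap), of which 1448 bytes are TCP payload with timestamps enabled, or 1460 without.  A quick sketch of the arithmetic:

```shell
# Max theoretical TCP goodput over 100 Mbps Ethernet, from frame overheads.
# On-wire bytes per full frame = 1500 (IP MTU) + 18 (Eth hdr+FCS) + 8 (preamble) + 12 (IFG)
awk 'BEGIN {
    wire = 1500 + 18 + 8 + 12          # 1538 bytes on the wire per frame
    printf "with TCP timestamps: %.4f Mbps\n", 100 * 1448 / wire
    printf "without timestamps:  %.4f Mbps\n", 100 * 1460 / wire
}'
```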

 

| Submitter | Rx Mbps | Tx Mbps | Client Board | SoC | MHz | Limits | O/S, kernel, driver |
|-----------|---------|---------|--------------|-----|-----|--------|---------------------|
| selsinork | 30.60 | 17.28 | 233-OLinuXino | i.MX23 ARM926 | 233 | No TS | ArchLinux 3.7.2-2 |
| morgaine | 93.84 | 72.82 | RPi Model B | BCM2835 | 700 | | Raspbian 3.1.9+ #272 |
| morgaine | 93.84 | 93.75 | BB (white) | AM3359 | 720 | | Angstrom v2012.01, 3.2.5+ |
| Tim.Annan | 94.14 | 91.74 | Gumstix Pepper | AM3359 | 600 | 100M mode | Yocto 9.0.0 Dylan, 3.2 |
| morgaine | 93.82 | 76.94 | RPi Model B | BCM2835 | 800 | | Raspbian 3.1.9+ #272 |
| morgaine | 93.82 | 78.71 | RPi Model B | BCM2835 | 800 | 7/2012 u/s | Raspbian 3.6.11+ #545 |
| morgaine | 94.14 | 78.87 | RPi Model B | BCM2835 | 800 | 9/2013 u/s | Raspbian 3.6.11+ #545 |
| morgaine | 93.80 | 93.75 | BBB | AM3359 | 1000 | | Angstrom v2012.12, 3.8.6 |
| selsinork | 93.92 | 94.46 | Cubieboard2 | A20 | 912 | VLAN TS | Debian 7.1, 3.3.0+ |
| morgaine | 94.16 | 94.14 | BBB | AM3359 | 1000 | | Debian 7.0, 3.8.13-bone20 |
| selsinork | 94.33 | 94.55 | Cubieboard2 | A20 | 912 | No TS | Debian 7.1, 3.3.0+ |
| selsinork | 94.91 | 94.90 | BBB | AM3359 | 1000 | No TS | Angstrom 3.8.6 |
| selsinork | 94.94 | 94.91 | i.MX53-QSB | i.MX53 | 996 | No TS | 3.4.0+ |
| selsinork | 243.30 | 454.88 | Sabre-Lite | i.MX6 | 996 | No TS | 3.0.15-ts-armv7l |
| Tim.Annan | 257.79 | 192.22 | Gumstix Pepper | AM3359 | 600 | Gbit mode | Yocto 9.0.0 Dylan, 3.2 |
| notzed | 371.92 | 324.49 | Parallella-16 | Zynq-70x0 | 800 | | Ubuntu Linaro |
| selsinork | 525.18 | 519.41 | Cubietruck | A20 | 1000 | No TS | LFS-ARM 3.4.67 + gmac |
| selsinork | 715.63 | 372.17 | Minnowboard | Atom E640 | 1000 | No TS | Angstrom 3.8.13-yocto |
| morgaine | 725.08 | 595.28 | homebuilt | E6550 | 2330 | PCI 33MHz | Gentoo 32-bit, 3.8.2, r8169 |
| selsinork | 945.86 | 946.38 | homebuilt | E8200 | 2666 | PCIe X1 | 32-bit, 3.7.0, e1000 |

 

 

In addition to the results displayed in the table, I also ran servers (nuttcp -S) on all my boards and kicked off transfers in both directions from the x86 machine, then followed that with board-to-board transfers, just to check that the choice of clients and servers was not affecting results.  It wasn't: results are very repeatable regardless of the choice, with throughput always limited by the slowest machine for the selected direction of transfer.  Running tests multiple times showed that variations typically held to less than 0.5%, probably a result of occasional unrelated network and/or machine activity.

 

The above measurements were performed over IPv4.  (See below for IPv6.)

 

Hint:  You can run nuttcp client commands even if a server is running on the same machine, so the most flexible approach is to execute "nuttcp -S" on all machines first, and then run client commands on any machine from anywhere to anywhere in any direction.

 

Initial observations:  The great uniformity in BeagleBone network throughput (both white and Black) stands out, and is clearly not affected by CPU clock speed.  Raspberry Pi Model B clearly has a problem on transmit (now confirmed to be limited by CPU clock) --- I'll have to investigate this further after upgrading my very old Raspbian version.  And finally, my x86 machinery and/or network gear is clearly operating at far below the rated gigabit equipment speed --- this will require urgent investigation and upgrades, especially of NIC bus interfaces.

 

Confirmation or disproof of my figures would be very welcome, as would extending the tests to other boards and O/S versions.

 

Morgaine.

 

 

Addendum:  Note about maximum theoretical throughput added just above the table after analysis in thread below.


Replies
  • morgaine, over 11 years ago

    I've now also obtained some data points over IPv6, but only for the Angstrom BB white and the Debian BBB.  In both cases, the Rx and Tx figures are about 1% slower than over IPv4.  This is very likely due to IPv6 having received relatively little optimization so far, versus the extreme optimization applied to IPv4 under strong commercial pressure.  It's very encouraging to find their performances already so close.

     

    Hint: Use the '-6' flag in both the server and client commands to select IPv6 operation.

  • Former Member, over 11 years ago

    I'll try this out on some of the Arm boards I have over the weekend.

     

    On your x86 machines' low figures: gigabit wire speed is a much harder thing to accomplish.  You'd need to investigate Ethernet packet sizes, switch capability, and things like whether your Ethernet adapter is PCI or PCIe, and whether it's a PCIe x1, x4, etc.  Also, for example, I've seen various problems with onboard Realtek 8169-series gigabit chips in the past, so you would have to look at the chip, the kernel and driver versions too.

    Typical cheap 'desktop' gigabit adapters will often struggle, server adapters are better but come with much increased cost and usually require a type of slot that's not available in a desktop.

     

    As you can imagine this becomes even more of a problem when you have a 10G card in your machine.

     

    Anyway, as a first step, here are some results from a 32-bit x86 machine (kernel 3.7.0) to a 64-bit x86 machine (kernel 3.8.0).

     

    1128.4432 MB /  10.01 sec =  945.8611 Mbps 3 %TX 11 %RX 0 retrans 0.26 msRTT

    1129.1255 MB /  10.01 sec =  946.3785 Mbps 3 %TX 15 %RX 0 retrans 0.25 msRTT

     

    client machine has the following NIC

    [4.540835] e1000e 0000:02:00.0 eth0: (PCI Express:2.5GT/s:Width x1)
    [4.540978] e1000e 0000:02:00.0 eth0: Intel(R) PRO/1000 Network Connection
    [4.541152] e1000e 0000:02:00.0 eth0: MAC: 1, PHY: 4, PBA No: D50854-003

     

    Asus P5Q with 4GB RAM and Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz

     

     

    Server

    [1.200887] e1000e 0000:20:00.0 eth1: (PCI Express:2.5GT/s:Width x4)
    [1.201103] e1000e 0000:20:00.0 eth1: Intel(R) PRO/1000 Network Connection
    [1.201318] e1000e 0000:20:00.0 eth1: MAC: 0, PHY: 4, PBA No: D51930-006

     

    HP ML110 G6

    Intel(R) Core(TM) i7 CPU     870  @ 2.93GHz

     

    There's a Cisco SG300-52 switch sitting between them.

     

    The OS isn't particularly meaningful.  Once upon a time the client was Slackware, but virtually everything on it has been replaced with self-compiled stuff.  The server is entirely self-compiled.

     

    Both TX and RX results drop to ~932 Mbps when using IPv6.

  • Former Member, over 11 years ago

    It would also be interesting to have the output of ethtool -k eth0 on some of these, to see if there are differences in the offload settings for the various NICs, and whether that influences the results at all.
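    For anyone collecting these, a small sketch of how the interesting lines can be filtered out of that output.  The sample data below is made up for illustration; on a real system you'd capture it with "ethtool -k eth0" (usually as root):

```shell
# Sample lines in the "ethtool -k" output format (a real run would use:
#   ethtool -k eth0 > features.txt )
cat > features.txt <<'EOF'
rx-checksumming: on
tx-checksumming: on
generic-receive-offload: on
highdma: on [fixed]
large-receive-offload: off [fixed]
EOF

# Keep only offloads that are enabled AND actually changeable --
# the [fixed] ones are hardwired by the driver or hardware.
grep ': on' features.txt | grep -v '\[fixed\]'
```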

  • Former Member, over 11 years ago, in reply to Former Member

    ethtool -k eth0 results for cubieboard

     

    Features for eth0:

    rx-checksumming: off [fixed]

    tx-checksumming: off

            tx-checksum-ipv4: off [fixed]

            tx-checksum-ip-generic: off [fixed]

            tx-checksum-ipv6: off [fixed]

            tx-checksum-fcoe-crc: off [fixed]

            tx-checksum-sctp: off [fixed]

    scatter-gather: off

            tx-scatter-gather: off [fixed]

            tx-scatter-gather-fraglist: off [fixed]

    tcp-segmentation-offload: off

            tx-tcp-segmentation: off [fixed]

            tx-tcp-ecn-segmentation: off [fixed]

            tx-tcp6-segmentation: off [fixed]

    udp-fragmentation-offload: off [fixed]

    generic-segmentation-offload: off [requested on]

    generic-receive-offload: on

    large-receive-offload: off [fixed]

    rx-vlan-offload: off [fixed]

    tx-vlan-offload: off [fixed]

    ntuple-filters: off [fixed]

    receive-hashing: off [fixed]

    highdma: off [fixed]

    rx-vlan-filter: off [fixed]

    vlan-challenged: off [fixed]

    tx-lockless: off [fixed]

    netns-local: off [fixed]

    tx-gso-robust: off [fixed]

    tx-fcoe-segmentation: off [fixed]

    fcoe-mtu: off [fixed]

    tx-nocache-copy: off

    loopback: off [fixed]

     

    So GRO is about the only offload in use; the things marked [fixed] I'm assuming either the hardware or the driver doesn't support.

     

    Compare to the Intel e1000e in my x86 system:

     

    Features for eth0:

    rx-checksumming: on

    tx-checksumming: on

            tx-checksum-ipv4: off [fixed]

            tx-checksum-ip-generic: on

            tx-checksum-ipv6: off [fixed]

            tx-checksum-fcoe-crc: off [fixed]

            tx-checksum-sctp: off [fixed]

    scatter-gather: on

            tx-scatter-gather: on

            tx-scatter-gather-fraglist: off [fixed]

    tcp-segmentation-offload: on

            tx-tcp-segmentation: on

            tx-tcp-ecn-segmentation: off [fixed]

            tx-tcp6-segmentation: on

    udp-fragmentation-offload: off [fixed]

    generic-segmentation-offload: on

    generic-receive-offload: on

    large-receive-offload: off [fixed]

    rx-vlan-offload: on

    tx-vlan-offload: on

    ntuple-filters: off [fixed]

    receive-hashing: on

    highdma: on [fixed]

    rx-vlan-filter: on [fixed]

    vlan-challenged: off [fixed]

    tx-lockless: off [fixed]

    netns-local: off [fixed]

    tx-gso-robust: off [fixed]

    tx-fcoe-segmentation: off [fixed]

    fcoe-mtu: off [fixed]

    tx-nocache-copy: on

    loopback: off [fixed]

    rx-fcs: off

    rx-all: off

     

    So: more capabilities supported and enabled.

  • Former Member, over 11 years ago

    Morgaine,

       Is your RPi overclocked to 1000?

     

    It would be interesting to see how these results are affected by loading the USB, since the BBB is advertised as benefiting compared to the RPi by having separate data paths to the CPU.

  • morgaine, over 11 years ago, in reply to Former Member

    coder27 wrote:

     

    Is your RPi overclocked to 1000?

    Excellent observation!!!  The answer is no --- I wrote "1000" in the table entirely because it has been so long since I've messed significantly with the Pi that I'd totally forgotten that it runs so slowly by default, and in my head everything was running at 1GHz except the BB white.  Oh dear, LOL.  I've corrected the Pi entry in the table to 700MHz.

     

    My Pi was briefly running at 800MHz at some point but I commented it out because of the massive USB problems that I was experiencing and wrote about.  It didn't help of course (in fact it probably made the problem worse since the ARM's response time to USB events became even slower), but that's why it was running at 700MHz when I brought the board back to life yesterday for the nuttcp tests.  I only use the Pi headless now, as its USB is unusable for me.

     

    I've just rerun nuttcp on the 800MHz Pi and have updated the table with a new row.  Notice how the increase of ARM clock frequency by 14% has raised the Pi's Tx throughput by 5.7%.  The Rx throughput is unaffected.
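    A quick check of those percentages, using the Tx values from the table (72.82 Mbps at 700MHz, 76.94 Mbps at 800MHz):

```shell
# Relative changes: ARM clock 700 -> 800 MHz, Pi Tx throughput 72.82 -> 76.94 Mbps
awk 'BEGIN {
    printf "clock increase: %+.1f%%\n", (800 / 700 - 1) * 100
    printf "Tx increase:    %+.1f%%\n", (76.94 / 72.82 - 1) * 100
}'
```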

     

    It would be interesting to see how these results are affected by loading the USB, since BBB is advertised as benefiting compared to RPi by having separate data paths to the CPU.

    Yes, that would be very interesting, but someone else will have to do it as the USB on my Pi is in such a dire state.

  • morgaine, over 11 years ago, in reply to Former Member

    Awesome info, thanks selsinork!

     

    The big problem is trying to summarize the info that most affects the results, so I've added a column "Limits" for information on anything that may be a significant limit or constraint on the throughput measured.  For example the bus through which the NIC operates can be a major limiting factor, and in my case is a known constraint on gig-to-gig transfers since my server's D-Link DGE-528T card is plugged into a lowly PCI slot running at 33MHz --- adequate at 100Mbps but certainly not at gigabit speeds.

     

    I've added your Asus machine as "client" in the table, and have made the assumption that your two nuttcp output lines correspond to client Rx and Tx throughput in that order.  (Not that there is any huge difference, of course.)

     

    I have machines with on-motherboard gigabit Ethernet  too (typically the NIC is on a PCIe channel of one or more lanes), so I'll try to find a box that's more suitable for network throughput tests and rerun everything.

     

    This is an interesting area of testing, since not only will we gain a better understanding of our ARM boards, but we'll also improve our home networks as our bottlenecks show up in the numbers.  I intend to update any entry that is affected by improvements at the server end or in network infrastructure, since each entry is intended to quantify the client's Rx/Tx throughput only (the server and network are assumed faster).  This will be an asymptotic process, getting progressively closer to valid measurements for the SBCs.

     

    As the engineering mantra says, you don't really know something until you measure it.

     

     

    PS.  Disappointingly, upgrading the Pi kernel and firmware with a whole year's improvements (now on 3.6.11+ #545) has only increased the Pi's internally-constrained Tx throughput by 2.3%.  No other variables had been modified prior to running the new tests.  New table entry added.

  • Former Member, over 11 years ago, in reply to morgaine

    Morgaine Dinova wrote:

     

    The big problem is trying to summarize the info that most affects the results,

    Yeah, and trying is probably a pointless task. Think of it more as a set of starting points to look at when you have a result that seems out of place.

     

    I've added your Asus machine as "client" in the table, and have made the assumption that your two nuttcp output lines correspond to client Rx and Tx thoughput in that order.  (Not that there is any huge difference of course.)

    Yes, receive then transmit; it would probably be nice if the output reflected which direction.  The other bit of info provided in the output that's likely to be interesting/relevant is the transmitter and receiver CPU utilisation when running these tests.  Being able to do the full 100Mbps in a synthetic test is one thing, but if it takes 100% CPU to do it, then it's not realistic to expect to be able to do that while the system is doing other things, like reading the file it's transferring from disk.
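    The nuttcp lines quoted in this thread already carry those figures; as a sketch, they can be pulled out with awk (the field positions are assumed from the output format shown in this thread):

```shell
# Parse one nuttcp result line into throughput and CPU-utilisation figures.
# Assumed fields, from the output format seen in this thread:
#   $7 = Mbps, $9 = transmitter CPU %, $11 = receiver CPU %
line='1128.4432 MB /  10.01 sec =  945.8611 Mbps 3 %TX 11 %RX 0 retrans 0.26 msRTT'
echo "$line" | awk '{ printf "%s Mbps, tx cpu %s%%, rx cpu %s%%\n", $7, $9, $11 }'
```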

    I have machines with on-motherboard gigabit Ethernet  too (typically the NIC is on a PCIe channel of one or more lanes), so I'll try to find a box that's more suitable for network throughput tests and rerun everything.

    In my experience it's unusual for onboard gigabit NICs on desktop-class boards to use anything other than x1, and usually the cheapest chip they can find; even on high-end servers it's often only x2.  Multi-port add-in cards and 10G cards can be different, of course.

  • Former Member, over 11 years ago

    cubieboard A20

     

    transmit:

    113.3673 MB /  10.07 sec =   94.4586 Mbps 12 %TX 10 %RX 0 retrans 0.56 msRTT

     

    receive:

    112.5225 MB /  10.05 sec =   93.9224 Mbps 0 %TX 30 %RX 0 retrans 0.59 msRTT

  • Former Member, over 11 years ago

    There's probably also another interesting thing to look at for networking: full-duplex performance.  The numbers we're currently looking at are transmit-only or receive-only, which may not be representative of normal networking usage patterns.

     

    Simple attempts at doing this, using two nuttcp instances simultaneously on the Cubie, show a slight drop in transmit to ~92Mbps, which could be within normal measurement error range.  Receive, however, appears to decline to ~30Mbps.  It's difficult to read too much into those numbers without further investigation, but I do think it's worth looking at.  We may find some SoC internal bottlenecks this way; we know the i.MX6 on the Sabre-Lite is supposed to have some problems in this area, so it would be interesting to see if other SoCs do as well.

     

    On another note, we know that both the A20 and AM3359 have an onboard gigabit MAC, so you'd not really expect them to have too many problems in implementations with only 100Mbps PHYs attached.  It would be very interesting to see how the same SoCs perform with gigabit PHYs as well.
