element14 Community
Single-Board Computers

SBC CPU Throughput

morgaine over 12 years ago

I notice that people are doing some initial benchmarking of BBB and other boards on the RPF forum.  Results roughly as expected I guess:

 

Using just a simple

 

time echo "scale=2000;4*a(1)" | bc -l

 

as a lightweight benchmark, I see these numbers reported (smaller Time is better):

 

[table now updated with extra datapoints reported in current thread below]

 

| Submitter | Time (s) | Board | SoC | Clock (MHz) | O/S |
|---|---|---|---|---|---|
| shuckle | 26.488 | Raspberry Pi B | BCM2835 | 700 | Raspbian 3.1.9 |
| morgaine | 25.719 | Raspberry Pi B | BCM2835 | 700 | Raspbian 3.1.9+ #272 |
| shuckle | 25.009 | Raspberry Pi B | BCM2835 | 700 | Raspbian 3.2.27 |
| trn | 24.280 | Raspberry Pi B | BCM2835 | 700 | Raspbian ? |
| morgaine | 22.456 | Raspberry Pi B | BCM2835 | 800 | Raspbian 3.1.9+ #272 |
| morgaine | 21.256 | Raspberry Pi B | BCM2835 | 800 | Raspbian 3.6.11+ #545, new firmware only |
| selsinork | 21.0 | Minnowboard | Atom E640T | 1000 | Angstrom minnow-2013.07.10.img |
| shuckle | 17.0 | Raspberry Pi B | BCM2835 | 1000 | Raspbian ? |
| morgaine | 16.153 | BB (white) | AM3359 | 720 | Angstrom v2012.01-core 3.2.5+, user-gov |
| selsinork | 15.850 | A20-OLinuXino-MICRO | A20 | 912 | Debian 7.0, 3.4.67+ |
| selsinork | 15.328 | Cubieboard | A20 | 912 | Ubuntu/Debian 7.1 |
| pluggy | 14.510 | BBB | AM3359 | 1000 | Debian |
| morgaine | 14.153 | BBB | AM3359 | 1000 | Debian 7.0, 3.8.13-bone20, perf-gov |
| selsinork | 13.927 | A10-OLinuXino-LIME | A10 | 1000 | Debian 7.0, 3.4.67+ |
| Heydt | 13.159 | Cubieboard | A10 | 1000 | ? |
| selsinork | 12.8 | Sabre-lite | i.MX6 | 1000 | Debian armhf |
| selsinork | 12.752 | Cubieboard | A20 | 912 | Ubuntu/Debian 7.1 + Angstrom bc |
| selsinork | 12.090 | BBB | AM3359 | 1000 | Angstrom dmnd-gov |
| pluggy | 11.923 | BBB | AM3359 | 1000 | Angstrom |
| selsinork | 11.86 | BBB | AM3359 | 1000 | Angstrom perf-gov |
| selsinork | 9.7 | Sabre-lite | i.MX6 | 1000 | Debian armhf + Angstrom bc |
| selsinork | 9.606 | Sabre-lite | i.MX6 | 1000 | LFS 3.12, gcc-4.8.2, glibc-2.18 |

 

 

As usual, take benchmarks with a truckload of salt, and evaluate with a suitable mixture of suspicion, snoring, and mirth. Use the numbers wisely, and don't draw inappropriate conclusions.

 

Morgaine.
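
A single `time` run can be noisy, particularly on boards whose CPU governor changes the clock under load, so repeating the command and looking at the spread is one way to firm up a figure before submitting it. The following is only a sketch of how that might be done, not how the numbers in the table were collected; the function names `pi_digits` and `bench_pi` are made up for illustration.

```shell
# Hypothetical helpers; the names pi_digits and bench_pi are illustrative only.
# Requires bash: `time` is a shell keyword there, so it can time a function.

# Print pi to the requested number of decimal places using bc's arctangent.
pi_digits() {
    echo "scale=${1:-2000};4*a(1)" | bc -l
}

# Run the benchmark several times and print the elapsed time of each run.
# Usage: bench_pi [digits] [runs]
bench_pi() {
    digits=${1:-2000}
    runs=${2:-5}
    TIMEFORMAT='%R s'      # bash: report only the real (wall-clock) time
    i=1
    while [ "$i" -le "$runs" ]; do
        printf 'run %s: ' "$i"
        time pi_digits "$digits" >/dev/null   # timing is written to stderr
        i=$((i + 1))
    done
}
```

Calling `bench_pi 2000 5` repeats the thread's benchmark five times; if the runs disagree noticeably, the governor setting is worth recording alongside the time, as several rows in the table already do.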


Top Replies

  • Former Member
    Former Member over 12 years ago in reply to gdstew +2
    floating point doesn't get you 2000 digits.
  • morgaine
    morgaine over 12 years ago in reply to gdstew +1
    Data is always good, and sharing it is also good. The warnings are to help people avoid unwarranted conclusions. And when used properly, synthetic and other artificial benchmarks can be very valuable,…
  • Former Member
    Former Member over 12 years ago in reply to gdstew +1
    > and don't understand why you think it is a good idea to keep it in the loop so you can benchmark it. Come on. It's not that complicated. Johnny wanted to know how fast his new computer was. He decided…
  • mconners
    mconners over 12 years ago

    Well, this has been interesting. (not really)

    Here is what I liked about the data.

     

    1) It cost me nothing

    2) It confirmed what I expected

     

    If the data had said that given the problem in question the BBB took twice as long to compute the answer as the RasPi, it may have given me pause and perhaps prompted me to investigate further using other more appropriate benchmarking techniques.

     

    So while at first blush it may come across as "Move along, nothing to see here" data, I find that type of data can be useful as well. Especially when it is free.

     

    Mike

  • morgaine
    morgaine over 12 years ago in reply to mconners

    Michael Conners wrote:

     

    So while at first blush it may come across as "Move along, nothing to see here" data, I find that type of data can be useful as well. Especially when it is free.

     

    Exactly.  Perfectly good data, perfectly usable in the correct context, and the usual cautions given to discourage people from deriving inappropriate conclusions.

     

    Why anyone would want to vent their ire over this simple gathering of useful data is hard to see, but it clearly hasn't been aimed at being helpful to the forum.

  • mcb1
    mcb1 over 12 years ago in reply to morgaine

    Guys/Gals

    I learnt a lot from this .....um ....discussion.

     

    Not from the bickering about benchmarks, but from coder's detailed explanation of how some of the inner workings applied.

     

    I note that in the tests for various computers in magazines, they have a number of 'real use' benchmarks, and the results vary, so I've always regarded benchmarks with some concern.

     

    While I can understand the passion about the subject, I think the short single responses were far less threatening (for us in the observation deck) than a long 'attack' on what seemed like each line.

     

    I would also like to point out, that not everyone understands the inner workings, nor do they need to, to have it actually do something.

    You guys all drive cars (I presume), but how many of you know the inner workings of the engine, down to the nuts and bolts and their interaction to make it go?

    For most there are some pedals that have to be operated in some form of order, a thing in the middle you shift around, and fuel that is always running out.

     

    Morgaine

    I wonder if a slightly differently worded caution about the figures (for the less knowledgeable ones that might read it) might satisfy the masses.

    Something from Dab's comment maybe:

    True performance is always dependent upon the compiler, application structure, operating system, I/O drivers and many other details.

     

    Raw power is seldom the only answer needed for a good system design.

     

    What is the next  ... discussion ...

     

    Mark

  • Former Member
    Former Member over 12 years ago in reply to mcb1

    Mark,

       I'm happy to see the discussion was useful!

     

       With regard to a differently worded caution, I think your suggestion misses the point. This benchmark is a decent benchmark, not just a "poor-man's benchmark" as Morgaine originally described it, and it produced results which demonstrated a clear winner in price/performance between BBB and RPi, within its domain of applicability, which is integer compute bound.  The warning should warn against extrapolating the results beyond the domain of applicability, but not much more than that.

    Prior to the benchmark results, there were legitimate concerns raised about things like memory bandwidth on a 16-bit bus, but these concerns have been laid to rest.   Similarly, there were concerns raised about the supposed triviality of the test, and the domain of the test, but I think those concerns have been shown to be false.

    In any benchmarking activity, there will always be claims that the results might have turned out differently if a different compiler design or application structure or operating system or I/O drivers, etc., had been used.  But in a result this decisive, those concerns ring hollow.

  • Former Member
    Former Member over 12 years ago in reply to Former Member

    coder27 wrote:

     

    In any benchmarking activity, there will always be claims that the results might have turned out differently if a different compiler design or application structure

    I'm still trying to get to the bottom of the difference between debian & angstrom on the same hardware

     

    The obvious difference is that Angstrom has built everything with vfpv3 and neonv1 support where debian has vfpv3-d16 and no neon.  Since I don't know the insides of bc at all I have no way of knowing if this is even relevant.

     

    readelf -A /usr/bin/bc  will show arch specific details
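
To check the vfpv3/NEON theory directly, one option is to diff the attribute sections of the two bc binaries rather than reading them side by side. This is a minimal sketch under assumptions: the helper name `arch_attrs_diff` and the file names `bc.debian` and `bc.angstrom` are placeholders for wherever copies of the two binaries happen to live.

```shell
# Hypothetical helper: show how the build attributes of two binaries differ.
# Usage: arch_attrs_diff ./bc.debian ./bc.angstrom   (file names are placeholders)
arch_attrs_diff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    readelf -A "$1" >"$a"
    readelf -A "$2" >"$b"
    diff "$a" "$b"
    status=$?              # 0 = identical attributes, 1 = they differ
    rm -f "$a" "$b"
    return "$status"
}

# On the board itself, the kernel lists the FP/SIMD units the CPU offers:
#   grep -m1 Features /proc/cpuinfo
```

On ARM the floating-point and SIMD attribute lines are the ones that would be expected to differ here. Note that a difference in attributes only shows what each binary was built for, not that bc's integer-heavy arithmetic actually uses those units.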

  • Former Member
    Former Member over 12 years ago in reply to Former Member

    Just a guess, but Debian shared libraries are normally compiled with -fPIC (for Position Independent Code), and maybe Angstrom doesn't do that.

  • Former Member
    Former Member over 12 years ago in reply to Former Member

    possibly, but short of disassembling the code I'm not sure how to tell.

     

    One other possibility is that the debian version is compiled against readline which also pulls in ncurses, angstrom isn't. So possibly there's a penalty for scanning through directories to find the right termcap files.

     

    Unfortunately debian have screwed with the ABI, they have ld-linux buried under /lib/arm-linux-gnueabihf instead of simply /lib making it hard to take a binary from angstrom and run it on debian.
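
Both points above (which libraries a bc build pulls in, and which dynamic loader path it expects) can be read out of the binary with readelf. The filters below are a sketch only; the names `needed_libs` and `requested_interp` are invented for illustration, and they simply pick fields out of readelf's usual text output.

```shell
# Hypothetical filters over readelf output; the function names are illustrative.

# List the shared libraries a binary depends on.
# Usage: readelf -d /usr/bin/bc | needed_libs
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Print the dynamic loader path baked into a binary.
# Usage: readelf -l /usr/bin/bc | requested_interp
requested_interp() {
    sed -n 's/.*Requesting program interpreter: \(.*\)].*/\1/p'
}
```

If libreadline shows up for one build and not the other, that supports the readline/ncurses explanation. When the requested loader path does not exist on the host, glibc's loader can also be invoked explicitly with the program as its argument (optionally with `--library-path`), which is one way a foreign binary can be persuaded to run; the thread does not say which method was actually used.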

     

    Anyway, having persuaded the angstrom binary to run on debian on the sabre-lite, it runs in 9.7s vs 12.8 for the debian native version.

    It's still using the debian glibc, which probably points more to readline/ncurses rather than -fPIC
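
Expressed as a ratio, the two Sabre-lite timings quoted in this post come out at roughly 1.3x. The helper below is just the arithmetic, with a made-up name (`speedup`), so the same comparison can be repeated for other rows of the table.

```shell
# Hypothetical helper: how many times faster the second timing is than the first.
# Usage: speedup SLOWER_SECONDS FASTER_SECONDS
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

speedup 12.8 9.7    # Debian-native bc vs Angstrom bc on the Sabre-lite; prints 1.32
```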

  • Former Member
    Former Member over 12 years ago in reply to Former Member

    I think Debian is compiled for ARMv4.  Angstrom might be compiled for ARMv7, which may account for the difference.  Angstrom might also be compiled with a later version of GCC than Debian.  I'm pretty sure the vector floating point (vfp) and neon are not relevant factors.

    Edited to add:  Looking again at your table, I see Debian listed as ARMHF, which rules out ARMv4, so maybe that isn't the difference.

  • morgaine
    morgaine over 12 years ago in reply to Former Member

    selsinork wrote:

     

    Anyway, having persuaded the angstrom binary to run on debian on the sabre-lite, it runs in 9.7s vs 12.8 for the debian native version.

     

    I added your new data point to the table.

     

    This is really cool, I think it might lead to some interesting analysis and end up improving our understanding of the figures.

  • Former Member
    Former Member over 12 years ago in reply to Former Member

    coder27 wrote:

     

    I think Debian is compiled for ARMv4.  Angstrom might be compiled for ARMv7, which may account for the difference.  Angstrom might also be compiled with a later version of GCC than Debian.  I'm pretty sure the vector floating point (vfp) and neon are not relevant factors.

    Edited to add:  Looking again at your table, I see Debian listed as ARMHF, which rules out ARMv4, so maybe that isn't the difference.

     

    The debian gcc was built with these options

    --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard

    gcc version 4.7.2 (Debian 4.7.2-4)

     

    the angstrom one is slightly newer and is a Linaro build gcc version 4.7.3 20130205 (prerelease) (Linaro GCC 4.7-2013.02-01)

    however it was cross compiled and the particular arch settings don't appear in the output of gcc -v
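
For comparing compilers like this, the ARM-related defaults can be pulled out of the `gcc -v` banner with a small filter. This is a sketch, and the function name `gcc_arm_defaults` is made up; it only matches configure-time options, so it prints nothing for a compiler (such as the cross-built one mentioned above) whose banner omits them.

```shell
# Hypothetical filter: pull the ARM-related configure defaults out of `gcc -v`.
# Usage: gcc -v 2>&1 | gcc_arm_defaults
gcc_arm_defaults() {
    tr ' ' '\n' | grep -E '^--with-(arch|cpu|tune|fpu|float|mode)='
}
```

Where the banner is silent, `gcc -Q --help=target` run on the board reports the values the compiler will actually use for options such as -march, -mfpu and -mfloat-abi, which should fill the gap.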

     

    Since both the BBB and Sabre-Lite are armv7 it seems that it'll be possible to get the same rootfs to run on either, so I should be able to put the debian armhf that I have on the SL onto the BBB and the BBB's angstrom onto the SL.

    I'll give it a try and let you know how I get on.

  • johnbeetem
    johnbeetem over 12 years ago in reply to Former Member

    selsinork wrote:

     

    I'm still trying to get to the bottom of the difference between debian & angstrom on the same hardware

    Any chance they're using different amounts of L2 cache?  From a quick read, it looks like the Sabre-Lite has 1MB L2 cache which the application should easily fit into, but maybe Ångström only allocates 256KB?

     

    General principle: benchmarks that fit into on-chip cache don't test external memory performance.
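
One way to see what cache the running kernel reports, under each distro in turn, is to read the cache entries in sysfs. This is a sketch with an invented helper name (`list_caches`), and it assumes the kernel exposes the directory at all; it may be absent on older ARM kernels, in which case the function prints nothing.

```shell
# Hypothetical helper: print level, type and size of each cache the kernel
# reports for cpu0. The sysfs directory may not exist on older ARM kernels.
# Usage: list_caches [sysfs_cache_dir]
list_caches() {
    base=${1:-/sys/devices/system/cpu/cpu0/cache}
    for idx in "$base"/index*; do
        [ -r "$idx/size" ] || continue
        printf 'L%s %s %s\n' "$(cat "$idx/level")" "$(cat "$idx/type")" "$(cat "$idx/size")"
    done
}
```

Running `list_caches` under both Debian and Angstrom on the same board would show whether the two kernels report the same L2 size, though it says nothing about how well a given bc build makes use of it.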

  • Former Member
    Former Member over 12 years ago in reply to johnbeetem

    Well yes, cache can be a factor.  IME usually about the only thing you can do with cache is turn it on or off, beyond that the cache architecture will govern what gets used and how.

    According to the datasheet the AM3359 only has 256KB, so as you say if it fits in that then it should fit in the i.MX6's 1MB. However, we're talking A8 vs A9 and from different implementers. Chances of the cache architecture being somehow different are good.

     

    So yes, maybe it's simply a case of debian being built with every possible knob turned on and therefore suffering from bloat and excessive cache misses.

    Angstrom being the roadrunner to debian's coyote.

     

    Now all we need is to get Acme Corporation to start building BBB and all our problems will be solved.

  • morgaine
    morgaine over 12 years ago in reply to Former Member

    selsinork wrote:

     

    Well yes, cache can be a factor.  IME usually about the only thing you can do with cache is turn it on or off, beyond that the cache architecture will govern what gets used and how.

     

    Turning off the on-chip cache would eliminate that variable from any tests that want to measure external memory throughput.  If it's not feasible during normal running, it might still be possible as a kernel boot option, which would be good enough for testing even if not all that convenient.  I haven't come across such a kernel option yet, but it wouldn't surprise me if it were available among the CPU options.

     

    Inevitably turning off the cache will be architecture-dependent, although that's not really an obstacle when the entire point is to make a device-dependent measurement.  A quick web search shows that 32-bit Intel CPUs allow you to turn cache off by setting bit 30 of control register cr0.  People have done this by loading a kernel module to do the operation, apparently with success judging by the large slowdown.
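
To make the "bit 30" part concrete, the arithmetic looks like the sketch below. It only computes the value; actually writing cr0 needs privileged (ring 0) code such as the kernel module mentioned above. The function name and the example input value are assumptions chosen purely for illustration.

```shell
# Illustration only: compute a CR0 value with the cache-disable bit (bit 30) set.
# Writing the register itself requires ring-0 code, e.g. a kernel module.
# Usage: cr0_cache_disabled CURRENT_CR0_VALUE
cr0_cache_disabled() {
    printf '0x%08x\n' $(( $1 | (1 << 30) ))
}

cr0_cache_disabled 0x80050033   # example input only; prints 0xc0050033
```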

     

    It doesn't directly address our interest though, which is to measure cached throughput identically on two different devices which may have different cache sizes and may not be caching a given measurement program in the same way.  That's substantially harder.
