element14 Community
Single-Board Computers > Forum
Forum Thread Details
  • 88 replies
  • 63 subscribers
  • 8509 views
  • Tags: cubieboard, olinuxino, sabrelite, bbb, BeagleBone, rpi

SBC CPU Throughput

morgaine over 12 years ago

I notice that people are doing some initial benchmarking of the BBB and other boards on the RPF forum. Results are roughly as expected, I guess:

 

Using just a simple

 

time echo "scale=2000;4*a(1)" | bc -l

 

as a lightweight benchmark, I see these numbers reported (smaller Time is better):

 

[table now updated with extra datapoints reported in current thread below]

 

Submitter | Time (s) | Board | SoC | Clock (MHz) | O/S
--------- | -------- | ----- | --- | ----------- | ---
shuckle | 26.488 | Raspberry Pi B | BCM2835 | 700 | Raspbian 3.1.9
morgaine | 25.719 | Raspberry Pi B | BCM2835 | 700 | Raspbian 3.1.9+ #272
shuckle | 25.009 | Raspberry Pi B | BCM2835 | 700 | Raspbian 3.2.27
trn | 24.280 | Raspberry Pi B | BCM2835 | 700 | Raspbian ?
morgaine | 22.456 | Raspberry Pi B | BCM2835 | 800 | Raspbian 3.1.9+ #272
morgaine | 21.256 | Raspberry Pi B | BCM2835 | 800 | Raspbian 3.6.11+ #545, new firmware only
selsinork | 21.0 | Minnowboard | Atom E640T | 1000 | Angstrom minnow-2013.07.10.img
shuckle | 17.0 | Raspberry Pi B | BCM2835 | 1000 | Raspbian ?
morgaine | 16.153 | BB (white) | AM3359 | 720 | Angstrom v2012.01-core 3.2.5+, user-gov
selsinork | 15.850 | A20-OLinuXino-MICRO | A20 | 912 | Debian 7.0, 3.4.67+
selsinork | 15.328 | Cubieboard | A20 | 912 | Ubuntu/Debian 7.1
pluggy | 14.510 | BBB | AM3359 | 1000 | Debian
morgaine | 14.153 | BBB | AM3359 | 1000 | Debian 7.0, 3.8.13-bone20, perf-gov
selsinork | 13.927 | A10-OLinuXino-LIME | A10 | 1000 | Debian 7.0, 3.4.67+
Heydt | 13.159 | Cubieboard | A10 | 1000 | ?
selsinork | 12.8 | Sabre-lite | i.MX6 | 1000 | Debian armhf
selsinork | 12.752 | Cubieboard | A20 | 912 | Ubuntu/Debian 7.1 + Angstrom bc
selsinork | 12.090 | BBB | AM3359 | 1000 | Angstrom dmnd-gov
pluggy | 11.923 | BBB | AM3359 | 1000 | Angstrom
selsinork | 11.86 | BBB | AM3359 | 1000 | Angstrom perf-gov
selsinork | 9.7 | Sabre-lite | i.MX6 | 1000 | Debian armhf + Angstrom bc
selsinork | 9.606 | Sabre-lite | i.MX6 | 1000 | LFS 3.12, gcc-4.8.2, glibc-2.18

 

 

As usual, take benchmarks with a truckload of salt, and evaluate with a suitable mixture of suspicion, snoring, and mirth. Use the numbers wisely, and don't draw inappropriate conclusions.

 

Morgaine.


Top Replies

  • Former Member over 12 years ago in reply to gdstew (+2): "floating point doesn't get you 2000 digits."
  • morgaine over 12 years ago in reply to gdstew (+1): "Data is always good, and sharing it is also good. The warnings are to help people avoid unwarranted conclusions. And when used properly, synthetic and other artificial benchmarks can be very valuable,…"
  • Former Member over 12 years ago in reply to gdstew (+1): "> and don't understand why you think it is a good idea to keep it in the loop so you can benchmark it. Come on. It's not that complicated. Johnny wanted to know how fast his new computer was. He decided…"
Parents
  • mconners over 12 years ago

    Well, this has been interesting. (not really)

    Here is what I liked about the data.

     

    1) It cost me nothing

    2) It confirmed what I expected

     

    If the data had said that given the problem in question the BBB took twice as long to compute the answer as the RasPi, it may have given me pause and perhaps prompted me to investigate further using other more appropriate benchmarking techniques.

     

    So while at first blush it may come across as "Move along, nothing to see here" data, I find that type of data can be useful as well. Especially when it is free.

     

    Mike

  • morgaine over 12 years ago in reply to mconners

    Michael Conners wrote:

     

    So while at first blush it may come across as "Move along, nothing to see here" data, I find that type of data can be useful as well. Especially when it is free.

     

    Exactly.  Perfectly good data, perfectly usable in the correct context, and the usual cautions given to discourage people from deriving inappropriate conclusions.

     

    Why anyone would want to vent their ire over this simple gathering of useful data is hard to see, but it clearly hasn't been aimed at being helpful to the forum.

  • morgaine over 12 years ago in reply to mcb1

Replying to Mark and coder27 together, as you both commented on phraseology:

     

Mark, you're quite right that depending on the audience, a more extensive and explanatory discussion about appropriate use of benchmarks could be very useful. My comment was the bare minimum intended for an audience of engineers and techies, who typically know very well already that measuring A doesn't generally allow you to conclude anything about B. It wasn't intended as an explanation, but merely a reminder not to do anything silly with the numbers, written in a humorous style so that no expert would be affronted by teaching grandma to suck eggs.

     

That said, your suggested comment or wording addresses a very different topic to mine, namely "performance" and "raw power". I purposely said nothing about such things, because you can easily slip into the realm of inappropriate conclusions by assuming that a particular benchmark is a good metric of specific types of performance unless you have checked that it directly measures or correlates properly with them. As a result, your suggestion is much bolder than mine, but also more likely to be wrong without further study. I just advised caution, which is always safe to do.

     

    coder27, you're entirely right that calling this a "poor-man's benchmark" might be underselling the utility of this measurement.  That was not intended though, since the measurement was useful to me as I indicated, and I assumed that it would be useful to others as well.

     

    I used the phrase only to mean easy, lightweight, simple, fast, built-in, no package to buy or compile, etc etc, and not implying any criticism whatsoever.  After all, if I had been critical of the benchmark or of the results then I wouldn't have gathered and displayed these measurements in the first place!  Nevertheless, it's possible that the phrase conveyed the wrong message to some readers, so a more neutral description might have been better, I do agree.  The word lightweight seems especially suitable.

     

    coder27 wrote:

     

    Prior to the benchmark results, there were legitimate concerns raised about things like memory bandwidth on a 16-bit bus, but these concerns have been laid to rest.

     

    I've seen that matter raised a number of times myself, so it is indeed very useful to possess numeric data that addresses the issue objectively.

     

    Morgaine

     

    Addendum.  After considering the impact of "poor man's" still further, I'm now convinced that your observations require it to be changed to avoid derailing new visitors to the thread.  I've changed "poor man's" to "lightweight" in the opening article.  Many thanks!

  • Former Member over 12 years ago in reply to Former Member

coder27 wrote:

In any benchmarking activity, there will always be claims that the results might have turned out differently if a different compiler design or application structure

I'm still trying to get to the bottom of the difference between debian & angstrom on the same hardware.

The obvious difference is that Angstrom has built everything with vfpv3 and neonv1 support where debian has vfpv3-d16 and no neon. Since I don't know the insides of bc at all, I have no way of knowing if this is even relevant.

     

readelf -A /usr/bin/bc will show arch-specific details.
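As a concrete sketch of that check (the path is the one from the thread; the tag names are what recent binutils print for ARM binaries, and the command prints nothing for non-ARM ELFs):

```shell
#!/bin/sh
# List the ARM build-attribute tags of a binary. On an armhf build,
# Tag_FP_arch shows the VFP variant (e.g. VFPv3 vs VFPv3-D16) and a
# Tag_Advanced_SIMD_arch line appears when NEON code generation was
# enabled, which is exactly the Angstrom-vs-Debian difference above.
readelf -A /usr/bin/bc | grep -E 'Tag_(CPU_arch|FP_arch|Advanced_SIMD_arch)'
```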

  • Former Member over 12 years ago in reply to Former Member

just a guess, but Debian shared libraries are normally compiled with -fPIC (for Position Independent Code), and maybe Angstrom doesn't do that.

  • Former Member over 12 years ago in reply to Former Member

    possibly, but short of disassembling the code I'm not sure how to tell.

     

One other possibility is that the debian version is compiled against readline, which also pulls in ncurses; the angstrom one isn't. So possibly there's a penalty for scanning through directories to find the right termcap files.
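The readline/ncurses difference is easy to confirm from the link dependencies; a sketch (path as in the thread; on a minimal build without readline the grep simply finds nothing):

```shell
#!/bin/sh
# Show which of bc's shared-library dependencies come from readline
# and the ncurses/terminfo stack; an Angstrom-style minimal build
# would list neither.
ldd /usr/bin/bc | grep -E 'readline|ncurses|tinfo' \
  || echo "no readline/ncurses dependency"
```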

     

Unfortunately debian have screwed with the ABI: they have ld-linux buried under /lib/arm-linux-gnueabihf instead of simply /lib, making it hard to take a binary from angstrom and run it on debian.

     

Anyway, having persuaded the angstrom binary to run on debian on the sabre-lite, it runs in 9.7s vs 12.8s for the debian native version.

It's still using the debian glibc, which probably points more to readline/ncurses than to -fPIC.

  • Former Member over 12 years ago in reply to Former Member

I think Debian is compiled for ARMv4. Angstrom might be compiled for ARMv7, which may account for the difference. Angstrom might also be compiled with a later version of GCC than Debian. I'm pretty sure the vector floating point (vfp) and neon are not relevant factors.

Edited to add: Looking again at your table, I see Debian listed as ARMHF, which rules out ARMv4, so maybe that isn't the difference.

  • morgaine over 12 years ago in reply to Former Member

    selsinork wrote:

     

Anyway, having persuaded the angstrom binary to run on debian on the sabre-lite, it runs in 9.7s vs 12.8s for the debian native version.

     

    I added your new data point to the table.

     

This is really cool; I think it might lead to some interesting analysis and end up improving our understanding of the figures.

  • Former Member over 12 years ago in reply to Former Member

coder27 wrote:

I think Debian is compiled for ARMv4. Angstrom might be compiled for ARMv7, which may account for the difference. Angstrom might also be compiled with a later version of GCC than Debian. I'm pretty sure the vector floating point (vfp) and neon are not relevant factors.

Edited to add: Looking again at your table, I see Debian listed as ARMHF, which rules out ARMv4, so maybe that isn't the difference.

     

The debian gcc was built with these options:

--with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard
gcc version 4.7.2 (Debian 4.7.2-4)

The angstrom one is slightly newer and is a Linaro build: gcc version 4.7.3 20130205 (prerelease) (Linaro GCC 4.7-2013.02-01). However, it was cross-compiled, and the particular arch settings don't appear in the output of gcc -v.

     

Since both the BBB and Sabre-Lite are armv7, it seems that it'll be possible to get the same rootfs to run on either, so I should be able to put the debian armhf that I have on the SL onto the BBB and the BBB's angstrom onto the SL.

I'll give it a try and let you know how I get on.

  • johnbeetem over 12 years ago in reply to Former Member

    selsinork wrote:

     

    I'm still trying to get to the bottom of the difference between debian & angstrom on the same hardware

    Any chance they're using different amounts of L2 cache?  From a quick read, it looks like the Sabre-Lite has 1MB L2 cache which the application should easily fit into, but maybe Ångström only allocates 256KB?

     

General principle: benchmarks that fit into on-chip cache don't test external memory performance.
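On Linux the cache sizes themselves are easy to inspect, which helps when judging whether a benchmark fits in cache; a sketch using the standard sysfs cacheinfo nodes (some embedded kernels of that era didn't export them, in which case the loop prints nothing):

```shell
#!/bin/sh
# Print level, type and size of each cache visible to CPU 0 from sysfs.
for d in /sys/devices/system/cpu/cpu0/cache/index*; do
  [ -f "$d/size" ] || continue
  printf 'L%s %s: %s\n' "$(cat "$d/level")" "$(cat "$d/type")" "$(cat "$d/size")"
done
```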

  • Former Member over 12 years ago in reply to johnbeetem

Well yes, cache can be a factor. IME, usually about the only thing you can do with cache is turn it on or off; beyond that, the cache architecture will govern what gets used and how.

According to the datasheet the AM3359 only has 256KB, so as you say, if it fits in that then it should fit in the iMX6's 1MB. However, we're talking A8 vs A9, and from different implementers. Chances of the cache architecture being somehow different are good.

     

    So yes, maybe it's simply a case of debian being built with every possible knob turned on and therefore suffering from bloat and excessive cache misses.

Angstrom being the roadrunner to debian's coyote.

     

Now all we need is to get Acme Corporation to start building BBB and all our problems will be solved.

    • Cancel
    • Vote Up 0 Vote Down
    • Sign in to reply
    • Cancel
  • morgaine over 12 years ago in reply to Former Member

    selsinork wrote:

     

    Well yes, cache can be a factor.  IME usually about the only thing you can do with cache is turn it on or off, beyond that the cache architecture will govern what gets used and how.

     

    Turning off the on-chip cache would eliminate that variable from any tests that want to measure external memory throughput.  If it's not feasible during normal running, it might still be possible as a kernel boot option, which would be good enough for testing even if not all that convenient.  I haven't come across such a kernel option yet, but it wouldn't surprise me if it were available among the CPU options.

     

Inevitably, turning off the cache will be architecture-dependent, although that's not really an obstacle when the entire point is to make a device-dependent measurement. A quick web search shows that 32-bit Intel CPUs allow you to turn the cache off by setting bit 30 of control register cr0. People have done this by loading a kernel module to do the operation, apparently with success judging by the large slowdown.

     

    It doesn't directly address our interest though, which is to measure cached throughput identically on two different devices which may have different cache sizes and may not be caching a given measurement program in the same way.  That's substantially harder.
