SBC CPU Throughput

Forum thread in the Single-Board Computers group: 88 replies, 63 subscribers, 8480 views
Tags: cubieboard, olinuxino, sabrelite, bbb, BeagleBone, rpi

morgaine over 12 years ago

I notice that people are doing some initial benchmarking of BBB and other boards on the RPF forum. Results roughly as expected, I guess.

Using just a simple

time echo "scale=2000;4*a(1)" | bc -l

as a lightweight benchmark (with bc -l, a() is arctangent, so this computes pi to 2000 decimal places), I see these numbers reported (smaller Time is better):

[table now updated with extra datapoints reported in the current thread below]

Submitter  | Time (s) | Board               | SoC        | Clock (MHz) | O/S
---------- | -------- | ------------------- | ---------- | ----------- | ------------------------------------------
shuckle    | 26.488   | Raspberry Pi B      | BCM2835    | 700         | Raspbian 3.1.9
morgaine   | 25.719   | Raspberry Pi B      | BCM2835    | 700         | Raspbian 3.1.9+ #272
shuckle    | 25.009   | Raspberry Pi B      | BCM2835    | 700         | Raspbian 3.2.27
trn        | 24.280   | Raspberry Pi B      | BCM2835    | 700         | Raspbian ?
morgaine   | 22.456   | Raspberry Pi B      | BCM2835    | 800         | Raspbian 3.1.9+ #272
morgaine   | 21.256   | Raspberry Pi B      | BCM2835    | 800         | Raspbian 3.6.11+ #545, new firmware only
selsinork  | 21.0     | Minnowboard         | Atom E640T | 1000        | Angstrom minnow-2013.07.10.img
shuckle    | 17.0     | Raspberry Pi B      | BCM2835    | 1000        | Raspbian ?
morgaine   | 16.153   | BB (white)          | AM3359     | 720         | Angstrom v2012.01-core 3.2.5+, user-gov
selsinork  | 15.850   | A20-OLinuXino-MICRO | A20        | 912         | Debian 7.0, 3.4.67+
selsinork  | 15.328   | Cubieboard          | A20        | 912         | Ubuntu/Debian 7.1
pluggy     | 14.510   | BBB                 | AM3359     | 1000        | Debian
morgaine   | 14.153   | BBB                 | AM3359     | 1000        | Debian 7.0, 3.8.13-bone20, perf-gov
selsinork  | 13.927   | A10-OLinuXino-LIME  | A10        | 1000        | Debian 7.0, 3.4.67+
Heydt      | 13.159   | Cubieboard          | A10        | 1000        | ?
selsinork  | 12.8     | Sabre-lite          | i.MX6      | 1000        | Debian armhf
selsinork  | 12.752   | Cubieboard          | A20        | 912         | Ubuntu/Debian 7.1 + Angstrom bc
selsinork  | 12.090   | BBB                 | AM3359     | 1000        | Angstrom dmnd-gov
pluggy     | 11.923   | BBB                 | AM3359     | 1000        | Angstrom
selsinork  | 11.86    | BBB                 | AM3359     | 1000        | Angstrom perf-gov
selsinork  | 9.7      | Sabre-lite          | i.MX6      | 1000        | Debian armhf + Angstrom bc
selsinork  | 9.606    | Sabre-lite          | i.MX6      | 1000        | LFS 3.12, gcc-4.8.2, glibc-2.18

As usual, take benchmarks with a truckload of salt, and evaluate with a suitable mixture of suspicion, snoring, and mirth. Use the numbers wisely, and don't draw inappropriate conclusions.
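If you'd like to add a datapoint, here is a minimal sketch for collecting one (assuming bash and a bc built with its math library; CPU governors and background load move the numbers, so run it a few times and report a typical real time):

# Run the pi-to-2000-decimal-places benchmark three times;
# the digits go to /dev/null, the timings go to the terminal (stderr).
for run in 1 2 3; do
    time echo "scale=2000;4*a(1)" | bc -l > /dev/null
done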

 

Morgaine.


gdstew over 12 years ago

> As usual, take benchmarks with a truckload of salt, and evaluate with a suitable mixture of suspicion, snoring, and mirth.

You're absolutely right! As an indicator of real-world application performance this "benchmark" is worthless. Thanks for sharing it.

morgaine over 12 years ago in reply to gdstew

Data is always good, and sharing it is also good. The warnings are to help people avoid unwarranted conclusions.

And when used properly, synthetic and other artificial benchmarks can be very valuable, for example as a way of checking that an upgrade hasn't altered your compiler optimization defaults. As part of regression testing, they're a very useful engineering tool. You just have to be conscious of their limits, appropriate use versus inappropriate use.

Former Member over 12 years ago in reply to morgaine

I think this benchmark is actually pretty decent in a lot of ways.

1) It is trivially easy to run: nothing to download, nothing to compile. So you can easily get results from lots of people, allowing you to see how consistent the results are, and they seem to be pretty consistent.

2) It is not subject to personal differences in what compiler was used to compile it, or what optimization levels or other compiler switches were used, although it will exhibit such differences between distros (see the sketch after this list).

3) It doesn't rely on computing the same value over and over in a loop. Benchmarks that do that can be overly sensitive to compiler loop optimizations, and to just-in-time code-generation techniques.

4) It has a pretty well-understood area of application: integer compute bound. Obviously you wouldn't use it to measure floating-point performance, or GPU performance, or I/O performance, etc.

5) It uses data that is large enough to show the benefit of large data caches, similar to typical user applications.

6) It takes about the right amount of time to run: not so short that the time to load the benchmark matters, or that the accuracy of the clock matters, and not so long that you can't easily run it several times to see that the results are consistent.
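Point 2 cuts both ways: as the "+ Angstrom bc" rows in the table show, the time depends heavily on how the distro built bc itself. A minimal sketch for checking that on one board, assuming you have a second bc build available (the /opt path below is hypothetical; substitute wherever yours lives):

# Time the identical calculation under two different bc builds.
time echo "scale=2000;4*a(1)" | /usr/bin/bc -l > /dev/null
# Hypothetical location of an alternative build, e.g. one copied from another distro:
time echo "scale=2000;4*a(1)" | /opt/other/bin/bc -l > /dev/null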

gdstew over 12 years ago in reply to Former Member

> 1) It is trivially easy to run: nothing to download, nothing to compile. So you can easily get results from lots of people, allowing you to see how consistent the results are, and they seem to be pretty consistent.

I was totally unaware that triviality in any form was considered a good trait for a benchmark. Doesn't make sense to me though.

> 2) It is not subject to personal differences in what compiler was used to compile it, or what optimization levels or other compiler switches were used, although it will exhibit such differences between distros.

Absolutely no way to know this at all. The programs run in the "benchmark" were almost certainly compiled using different versions of GCC, with differing levels of optimization built in and with unknown compile switches used (most are probably the same, some depend on the CPU) when the OS was built.

> 3) It doesn't rely on computing the same value over and over in a loop. Benchmarks that do that can be overly sensitive to compiler loop optimizations, and to just-in-time code-generation techniques.

That is why good synthetic benchmarks consist of many programs. The level of loop optimization available is a good thing to know, as is knowing whether JIT is something you can use if you prefer it (see first statement).

> 4) It has a pretty well-understood area of application: integer compute bound. Obviously you wouldn't use it to measure floating-point performance, or GPU performance, or I/O performance, etc.

I think all those other things are actually good things to benchmark too, since they can all affect applications.

> 5) It uses data that is large enough to show the benefit of large data caches, similar to typical user applications.

Although in the real world this will probably be the exception, not the rule. Yes, big caches are good, but using a benchmark that executes (or mostly executes) in cache, or keeps (most of) its data in cache, skews the results too. Something I believe you mentioned earlier as not being desirable.

> 6) It takes about the right amount of time to run: not so short that the time to load the benchmark matters, or that the accuracy of the clock matters, and not so long that you can't easily run it several times to see that the results are consistent.

OK. Not really what I consider to be in the top 10 on my list of desirable benchmark traits, and pretty much in direct opposition to getting useful results. The phrase that comes to mind is "quick and dirty". Personally, I don't want to wait a really long time for results either, so I prefer to be able to choose what I need to check and how many iterations to run for reasonably repeatable results.

Former Member over 12 years ago in reply to gdstew

> The level of loop optimization available is a good thing to know ...

No, it's not.

In order to have a benchmark that runs long enough to get decent timings, many benchmarks put a loop around the code they want to test, on the theory that it will take N times longer to execute a loop N times than it would to do it once. That theory is just plain wrong, because any decent compiler will hoist as much code as possible outside the loop, where it's only done once. In some cases, the entire contents of the loop can be done only once. So you think you're measuring the time it takes to do some computation, but you're really not.

It's tempting to think that it doesn't matter, because a compiler that does good loop optimizations is better than one that doesn't. Which may be true to some extent. But the problem is that your application most likely doesn't repeatedly do the same calculation over and over in a loop, so your application won't see the same speedup as your synthetic benchmark.

gdstew over 12 years ago in reply to Former Member

> In order to have a benchmark that runs long enough to get decent timings, many benchmarks put a loop around the code they want to test, on the theory that it will take N times longer to execute a loop N times than it would to do it once. That theory is just plain wrong, because any decent compiler will hoist as much code as possible outside the loop, where it's only done once. In some cases, the entire contents of the loop can be done only once. So you think you're measuring the time it takes to do some computation, but you're really not.

Yes, it is good to know where and how much the loops have been inlined, otherwise bad interpretations of the results are probable. So I guess that you should know what you are doing to get good results.

> That theory is just plain wrong, because any decent compiler will hoist as much code as possible outside the loop, where it's only done once.

In most decent compilers the level of inlining available is also usually selectable, using one or more compile-time directives, or by the compiler itself, mainly (but not exclusively) due to code size limitations.

Former Member over 12 years ago in reply to gdstew

> Yes, it is good to know where and how much the loops have been inlined, otherwise bad interpretations of the results are probable.

Let me spell it out for you. I said nothing about inlining. Inlining is something that applies to subprograms, and reduces the call/return overhead. It doesn't apply to loops at all, although there is an optimization called loop unrolling that is similar to subprogram inlining.

The loop optimization I referred to is called loop-invariant code hoisting. It involves moving code from inside the loop to outside (before) the loop, where it is only done once instead of N times. This optimization prevents you from knowing how long the code you were intending to measure takes to run. And it is very significant that this pi benchmark isn't susceptible to this optimization.

gdstew over 12 years ago in reply to Former Member

Loop unrolling is what I meant to say. Loop-invariant code hoisting removes code that should not be in the loop to begin with, because the result obtained from executing the code does not change (invariant) with each iteration of the loop, so you are just wasting CPU cycles each time it executes inside the loop. I personally try to keep such code out of loops in the first place, because as far as I know it is generally considered to be bad programming (wasting CPU cycles and all that), and I don't understand why you think it is a good idea to keep it in the loop so you can benchmark it.

Former Member over 12 years ago in reply to gdstew

> and don't understand why you think it is a good idea to keep it in the loop so you can benchmark it.

Come on. It's not that complicated.

Johnny wanted to know how fast his new computer was. He decided to measure how long it takes to multiply two numbers together. He wrote a program that did that, but it ran so fast that his stopwatch was useless. He got a brilliant idea: put the multiplication inside a loop that iterates 1,000,000 times, time that, and divide the time by 1,000,000. But his compiler recognized the multiplication as loop invariant, and hoisted it out of the loop, so his program only ended up doing one multiplication, and his stopwatch was still useless. Finally he recognized that an important feature of a benchmark program is that it avoids doing a calculation repeatedly in a loop.

gdstew over 12 years ago in reply to Former Member

> Come on. It's not that complicated.

Read it again, carefully. Yes, you can waste as many CPU cycles as you want executing code over and over again inside a loop that produces a result that never changes (that's why it's called invariant), no matter how many times the code is run in the loop.

Why would you want to? It serves no purpose at all, other than wasting CPU cycles.

Former Member over 12 years ago in reply to gdstew

Johnny put his multiplication statement inside a loop so that his program would run long enough for his stopwatch to be useful. Turned out his stopwatch still wasn't useful, because the compiler hoisted the multiplication outside of the loop. So Johnny learned his lesson, and from now on insists on benchmarks that don't involve a loop containing an invariant calculation.

For example, Johnny would much rather benchmark a calculation of pi to 2000 digits than he would a loop that calculates 200 digits 10 times over.
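In shell-and-bc terms, the two shapes Johnny is choosing between look like this. (Nothing gets hoisted here, since each bc invocation is a separate process; in compiled code, the looped form is the one a compiler could gut.)

# One large calculation: the shape Johnny prefers.
time echo "scale=2000;4*a(1)" | bc -l > /dev/null

# A looped mini-benchmark: the same invariant calculation at a tenth the scale, ten times.
time for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "scale=200;4*a(1)" | bc -l > /dev/null
done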

gdstew over 12 years ago in reply to Former Member

> So Johnny learned his lesson, and from now on insists on benchmarks that don't involve a loop containing an invariant calculation.

Not sure why any good benchmark would do that anyway, since it wouldn't produce useful results.

Former Member over 12 years ago in reply to gdstew

You're exactly right. No good benchmark would do that, which is why I said:

> 3) It doesn't rely on computing the same value over and over in a loop. Benchmarks that do that can be overly sensitive to compiler loop optimizations, and to just-in-time code-generation techniques.
