SBC CPU Throughput

Forum thread · 88 replies · 63 subscribers · 8453 views
Tags: cubieboard, olinuxino, sabrelite, bbb, BeagleBone, rpi
morgaine · over 12 years ago

I notice that people are doing some initial benchmarking of BBB and other boards on the RPF forum. Results roughly as expected, I guess.

Using just a simple

    time echo "scale=2000;4*a(1)" | bc -l

as a lightweight benchmark, I see these numbers reported (smaller Time is better):

[table now updated with extra datapoints reported in current thread below]

Submitter  | Time (s) | Board               | SoC        | Clock (MHz) | O/S
-----------|----------|---------------------|------------|-------------|------------------------------------------
shuckle    | 26.488   | Raspberry Pi B      | BCM2835    | 700         | Raspbian 3.1.9
morgaine   | 25.719   | Raspberry Pi B      | BCM2835    | 700         | Raspbian 3.1.9+ #272
shuckle    | 25.009   | Raspberry Pi B      | BCM2835    | 700         | Raspbian 3.2.27
trn        | 24.280   | Raspberry Pi B      | BCM2835    | 700         | Raspbian ?
morgaine   | 22.456   | Raspberry Pi B      | BCM2835    | 800         | Raspbian 3.1.9+ #272
morgaine   | 21.256   | Raspberry Pi B      | BCM2835    | 800         | Raspbian 3.6.11+ #545, new firmware only
selsinork  | 21.0     | Minnowboard         | Atom E640T | 1000        | Angstrom minnow-2013.07.10.img
shuckle    | 17.0     | Raspberry Pi B      | BCM2835    | 1000        | Raspbian ?
morgaine   | 16.153   | BB (white)          | AM3359     | 720         | Angstrom v2012.01-core 3.2.5+, user-gov
selsinork  | 15.850   | A20-OLinuXino-MICRO | A20        | 912         | Debian 7.0, 3.4.67+
selsinork  | 15.328   | Cubieboard          | A20        | 912         | Ubuntu/Debian 7.1
pluggy     | 14.510   | BBB                 | AM3359     | 1000        | Debian
morgaine   | 14.153   | BBB                 | AM3359     | 1000        | Debian 7.0, 3.8.13-bone20, perf-gov
selsinork  | 13.927   | A10-OLinuXino-LIME  | A10        | 1000        | Debian 7.0, 3.4.67+
Heydt      | 13.159   | Cubieboard          | A10        | 1000        | ?
selsinork  | 12.8     | Sabre-lite          | i.MX6      | 1000        | Debian armhf
selsinork  | 12.752   | Cubieboard          | A20        | 912         | Ubuntu/Debian 7.1 + Angstrom bc
selsinork  | 12.090   | BBB                 | AM3359     | 1000        | Angstrom dmnd-gov
pluggy     | 11.923   | BBB                 | AM3359     | 1000        | Angstrom
selsinork  | 11.86    | BBB                 | AM3359     | 1000        | Angstrom perf-gov
selsinork  | 9.7      | Sabre-lite          | i.MX6      | 1000        | Debian armhf + Angstrom bc
selsinork  | 9.606    | Sabre-lite          | i.MX6      | 1000        | LFS 3.12, gcc-4.8.2, glibc-2.18

 

 

As usual, take benchmarks with a truckload of salt, and evaluate with a suitable mixture of suspicion, snoring, and mirth. Use the numbers wisely, and don't draw inappropriate conclusions.
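For completeness, what the one-liner actually does: bc -l loads bc's math library, in which a(x) is arctangent, so 4*a(1) is pi evaluated to scale=2000 decimal digits, and time reports how long bc takes to produce it. A minimal C sketch of the same measurement, for anyone who prefers a compiled harness (an illustration only; it assumes a POSIX system with bc on the PATH, and is not code posted in this thread):

    /* Hypothetical harness: reproduces what
       `time echo "scale=2000;4*a(1)" | bc -l` measures (wall time). */
    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        /* bc -l loads the math library, where a(1) = arctan(1) = pi/4 */
        FILE *p = popen("echo 'scale=2000;4*a(1)' | bc -l", "r");
        if (!p) { perror("popen"); return 1; }

        char buf[4096];
        while (fgets(buf, sizeof buf, p))
            ;                              /* drain the 2000-digit result */
        pclose(p);

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("elapsed: %.3f s\n", secs);
        return 0;
    }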

 

Morgaine.


Replies

  • gdstew · over 12 years ago, in reply to Former Member

> 1) it is trivially easy to run, nothing to download, nothing to compile, so you can easily get results from lots of people, allowing you to see how consistent the results are, and they seem to be pretty consistent.

I was totally unaware that triviality in any form was considered a good trait for a benchmark. Doesn't make sense to me, though.

> 2) it is not subject to personal differences in what compiler was used to compile it, or what optimization levels or other compiler switches were used, although it will exhibit such differences between distros.

There is absolutely no way to know this at all. The programs run in the "benchmark" were almost certainly compiled using different versions of GCC, with differing levels of optimization built in and with unknown compile switches used (most are probably the same, some depend on the CPU) when the OS was built.

> 3) it doesn't rely on computing the same value over and over in a loop. Benchmarks that do that can be overly sensitive to compiler loop optimizations, and to just-in-time code-generation techniques.

That is why good synthetic benchmarks consist of many programs. The level of loop optimization available is a good thing to know, as is knowing whether JIT is something you can use if you prefer to (see first statement).

> 4) It has a pretty-well understood area of application, integer compute bound. Obviously you wouldn't use it to measure floating-point performance, or gpu performance, or I/O performance, etc.

I think all those other things are actually good things to benchmark too, since they can all affect applications.

> 5) It uses data that is large enough to show the benefit of large data caches, similar to typical user applications.

Although in the real world this will probably be the exception, not the rule. Yes, big caches are good, but using a benchmark that executes (or mostly executes) in cache, or keeps (most of) its data in cache, skews the results too. Something I believe you mentioned earlier as not being desirable.

> 6) It takes about the right amount of time to run--not so short that the time to load the benchmark matters, or that the accuracy of the clock matters, and not so long that you can't easily run it several times to see that the results are consistent.

OK. Not really what I consider to be in the top 10 on my list of desirable benchmark traits, and pretty much in direct opposition to getting useful results. The phrase that comes to mind is "quick and dirty". Personally, I don't want to wait a really long time for results either, so I prefer to be able to choose what I need to check and how many iterations to run for reasonably repeatable results.

  • Former Member · over 12 years ago, in reply to gdstew

> The level of loop optimization available is a good thing to know ...

No, it's not.

In order to have a benchmark that runs long enough to get decent timings, many benchmarks put a loop around the code they want to test, on the theory that it will take N times longer to execute a loop N times than it would to do it once. That theory is just plain wrong, because any decent compiler will hoist as much code as possible outside the loop, where it's only done once. In some cases, the entire contents of the loop can be done only once. So you think you're measuring the time it takes to do some computation, but you're really not.

It's tempting to think that it doesn't matter, because a compiler that does good loop optimizations is better than one that doesn't. Which may be true to some extent. But the problem is that your application most likely doesn't repeatedly do the same calculation over and over in a loop, so your application won't see the same speedup as your synthetic benchmark.
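To make the failure mode concrete, a hedged C sketch (hypothetical code, not something posted in this thread): in the naive loop the body is invariant, so an optimizing compiler may compute it once, or delete the loop entirely, and the measured time stops scaling with N; the second loop carries a dependency from one iteration to the next, so the work cannot be hoisted.

    /* Hypothetical example of a benchmark loop defeated by
       loop-invariant optimization, plus one way to prevent it. */
    #include <stdio.h>

    #define N 100000000L

    int main(void)
    {
        const double x = 123.456, y = 789.012;

        /* naive: r never depends on i, so at -O2 the compiler may do
           the division once; the timing then says nothing about
           divide throughput */
        double r = 0.0;
        for (long i = 0; i < N; i++)
            r = x / y;

        /* safer: each iteration depends on the previous result, so
           the division cannot be moved out of the loop */
        double acc = 1.0;
        for (long i = 0; i < N; i++)
            acc = acc / y + x;

        printf("%f %f\n", r, acc);   /* use the results so they are not dead */
        return 0;
    }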

  • gdstew · over 12 years ago, in reply to morgaine

> The interpretation that people give to good data is an entirely different matter, and my cautions were the usual advice about taking perfectly good numbers and making wholly incorrect conclusions about them.

How can you provide a useful interpretation of data that, in your own words, you should "take (benchmarks) with a truckload of salt, and evaluate with a suitable mixture of suspicion, snoring, and mirth"? I mean really, if this is your idea of normal cautions for data, I'd like to see your idea of a real bona-fide warning.

> AFAWK the numbers are totally accurate.

But not really useful for the real world (see previous responses), which is the point you keep dancing around.

> Even you have agreed with that,

I agree with what you said, but not that the data in this "benchmark" is good in the sense that a useful interpretation of much of anything is possible using it. Which is again the point you keep dancing around. It is far too simplistic a benchmark to provide that.

> so you're really just looking for a fight as usual.

Something I've seen you do on numerous occasions yourself.

  • Former Member · over 12 years ago, in reply to gdstew

> As to the benchmark itself, no single line program line can be considered as a valuable synthetic benchmark of anything.
...
> I was totally unaware that triviality in any form was considered a good trait for a benchmark. Doesn't make sense to me, though.
...
> It is far too simplistic a benchmark to provide that.

You obviously have no clue what this benchmark does. It isn't a "single line program" at all. Invoking the program takes a single line, and that is a good thing. There's a very important difference between the command to invoke a program and the program itself.

  • gdstew · over 12 years ago, in reply to Former Member

> In order to have a benchmark that runs long enough to get decent timings, many benchmarks put a loop around the code they want to test, on the theory that it will take N times longer to execute a loop N times than it would to do it once. That theory is just plain wrong, because any decent compiler will hoist as much code as possible outside the loop, where it's only done once. In some cases, the entire contents of the loop can be done only once. So you think you're measuring the time it takes to do some computation, but you're really not.

Yes, it is good to know where and how much the loops have been inlined, otherwise bad interpretations of the results are probable. So I guess you should know what you are doing to get good results.

> That theory is just plain wrong, because any decent compiler will hoist as much code as possible outside the loop, where it's only done once.

In most decent compilers the level of inlining available is also usually selectable, using one or more compile-time directives, or by the compiler itself, mainly (but not exclusively) due to code size limitations.

  • morgaine · over 12 years ago, in reply to gdstew

Gary Stewart wrote:

> How can you provide a useful interpretation of data that, in your own words, you should "take (benchmarks) with a truckload of salt, and evaluate with a suitable mixture of suspicion, snoring, and mirth"?

A rational person can provide a useful interpretation by using ordinary engineering knowledge and common sense.

I refer you to coder27's post about the data confirming that execution times are proportional to clock rates for a given architecture, and likewise confirming that there is a significant improvement from ARM11 to Cortex-A8 for a given clock rate. These are helpful observations in that they confirm what is expected, and if the numbers had indicated something entirely different then we would have some very serious issues to investigate.
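As a quick sanity check of that proportionality, using only figures already in the table above: scaling my 700 MHz Raspberry Pi time by the clock ratio predicts

    25.719 s × (700 / 800) ≈ 22.50 s

for the 800 MHz run, and the measured figure was 22.456 s, a difference of well under one percent.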

     

> I wrote:
> AFAWK the numbers are totally accurate.

> But not really useful for the real world (see previous responses), which is the point you keep dancing around.

See the previous section. It is you who chose to dance around and ignore the useful and helpful interpretations of this data, explained well in coder27's post, and instead dived in here directly at me without provocation or valid reason.

I have consistently stated that good data is useful when used appropriately, but unhelpful when used inappropriately. I clearly said "Results roughly as expected" in my opening post, which points to the data being useful, and then I gave the usual cautions about using benchmark data wrongly to reach inappropriate conclusions.

Nobody here has made any inappropriate conclusions from this data, so whatever are you arguing about?

Morgaine.

  • shabaz · over 12 years ago, in reply to Former Member

As far as I know, some open source* and possibly some commercial benchmarks do a similar computation as part of a CPU-intensiveness test (at least for single-core processors). For an embedded app where we may not be interested in, say, file I/O, multimedia extensions, or multi-core results, it probably does have some value.

*Source: OSmark

  • Former Member · over 12 years ago, in reply to gdstew

> Yes, it is good to know where and how much the loops have been inlined, otherwise bad interpretations of the results are probable.

Let me spell it out for you. I said nothing about inlining. Inlining is something that applies to subprograms, and reduces the call/return overhead. It doesn't apply to loops at all, although there is an optimization called loop unrolling that is similar to subprogram inlining.

The loop optimization I referred to is called loop-invariant code hoisting. It involves moving code from inside the loop to outside (before) the loop, where it is only done once instead of N times. This optimization prevents you from knowing how long the code you were intending to measure actually takes to run. And it is very significant that this Pi benchmark isn't susceptible to this optimization.
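To make the distinction concrete, a hedged before/after sketch in C (hypothetical code, not from this thread; GCC, for instance, typically performs this transformation at -O1 and above):

    /* before: x * y is the same on every iteration, yet is
       recomputed n times */
    void scale_before(double *out, const double *in, int n,
                      double x, double y)
    {
        for (int i = 0; i < n; i++)
            out[i] = in[i] * (x * y);
    }

    /* after: the invariant product is hoisted above the loop and
       computed once; note this is neither loop unrolling (replicating
       the body) nor subprogram inlining (removing call overhead) */
    void scale_after(double *out, const double *in, int n,
                     double x, double y)
    {
        double xy = x * y;           /* hoisted loop invariant */
        for (int i = 0; i < n; i++)
            out[i] = in[i] * xy;
    }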

  • gdstew · over 12 years ago, in reply to Former Member

Yes, actually, I do know that it runs as an interpreted program in the bash shell: it executes echo to send a math expression to the external compiled program (bc), which computes pi to 2000 digits, and times how long that takes. So you are mainly testing floating-point performance. While this is not really simplistic (not really complicated either), it is not really useful for much of anything other than as an FP benchmark.

  • Former Member · over 12 years ago, in reply to gdstew

    floating point doesn't get you 2000 digits.
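To make that concrete with a hypothetical snippet (illustration only, not code from this thread): an IEEE-754 double carries roughly 15 to 17 significant decimal digits, so hardware floating point cannot get anywhere near a 2000-digit result; bc does its arithmetic in software at arbitrary precision instead.

    /* The hardware floating-point ceiling: digits printed beyond
       about the 17th are rounding noise, nothing like bc's 2000.
       Build with: cc pi.c -lm */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double pi = 4.0 * atan(1.0);   /* same formula as the bc one-liner */
        printf("%.30f\n", pi);
        return 0;
    }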
