element14 Community
Single-Board Computers
Forum Thread Details
  • Replies 88 replies
  • Subscribers 63 subscribers
  • Views 8534 views
  • Users 0 members are here
  • cubieboard
  • olinuxino
  • sabrelite
  • bbb
  • BeagleBone
  • rpi

SBC CPU Throughput

morgaine
morgaine over 12 years ago

I notice that people are doing some initial benchmarking of the BBB and other boards on the RPF forum. The results are roughly as expected, I guess.

Using just a simple

time echo "scale=2000;4*a(1)" | bc -l

as a lightweight benchmark (it uses bc's math library to compute π to 2000 decimal places as 4 × arctan(1)), I see these numbers reported (smaller Time is better):

[Table now updated with extra data points reported in the thread below.]

| Submitter | Time (s) | Board | SoC | Clock (MHz) | O/S |
|---|---|---|---|---|---|
| shuckle | 26.488 | Raspberry Pi B | BCM2835 | 700 | Raspbian 3.1.9 |
| morgaine | 25.719 | Raspberry Pi B | BCM2835 | 700 | Raspbian 3.1.9+ #272 |
| shuckle | 25.009 | Raspberry Pi B | BCM2835 | 700 | Raspbian 3.2.27 |
| trn | 24.280 | Raspberry Pi B | BCM2835 | 700 | Raspbian ? |
| morgaine | 22.456 | Raspberry Pi B | BCM2835 | 800 | Raspbian 3.1.9+ #272 |
| morgaine | 21.256 | Raspberry Pi B | BCM2835 | 800 | Raspbian 3.6.11+ #545, new firmware only |
| selsinork | 21.0 | Minnowboard | Atom E640T | 1000 | Angstrom minnow-2013.07.10.img |
| shuckle | 17.0 | Raspberry Pi B | BCM2835 | 1000 | Raspbian ? |
| morgaine | 16.153 | BB (white) | AM3359 | 720 | Angstrom v2012.01-core 3.2.5+, user-gov |
| selsinork | 15.850 | A20-OLinuXino-MICRO | A20 | 912 | Debian 7.0, 3.4.67+ |
| selsinork | 15.328 | Cubieboard | A20 | 912 | Ubuntu/Debian 7.1 |
| pluggy | 14.510 | BBB | AM3359 | 1000 | Debian |
| morgaine | 14.153 | BBB | AM3359 | 1000 | Debian 7.0, 3.8.13-bone20, perf-gov |
| selsinork | 13.927 | A10-OLinuXino-LIME | A10 | 1000 | Debian 7.0, 3.4.67+ |
| Heydt | 13.159 | Cubieboard | A10 | 1000 | ? |
| selsinork | 12.8 | Sabre-lite | i.MX6 | 1000 | Debian armhf |
| selsinork | 12.752 | Cubieboard | A20 | 912 | Ubuntu/Debian 7.1 + Angstrom bc |
| selsinork | 12.090 | BBB | AM3359 | 1000 | Angstrom dmnd-gov |
| pluggy | 11.923 | BBB | AM3359 | 1000 | Angstrom |
| selsinork | 11.86 | BBB | AM3359 | 1000 | Angstrom perf-gov |
| selsinork | 9.7 | Sabre-lite | i.MX6 | 1000 | Debian armhf + Angstrom bc |
| selsinork | 9.606 | Sabre-lite | i.MX6 | 1000 | LFS 3.12, gcc-4.8.2, glibc-2.18 |

 

 

As usual, take benchmarks with a truckload of salt, and evaluate with a suitable mixture of suspicion, snoring, and mirth. Use the numbers wisely, and don't draw inappropriate conclusions.
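A single run is easily perturbed by background activity or by the CPU-frequency governor ramping up, so one practical way to follow that advice is to repeat the run a few times and report the minimum and median. This is a minimal bash sketch, assuming bash and bc are installed; the function names `pi_bench` and `summarize_times` are hypothetical helpers, not part of any existing tool:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original post) for repeating the bc pi test.

# pi_bench [SCALE] [RUNS]: print the elapsed wall-clock seconds of each run.
pi_bench() {
  local scale=${1:-2000} runs=${2:-5} i
  local TIMEFORMAT=%R   # make bash's `time` keyword report real time only
  for ((i = 0; i < runs; i++)); do
    # `time` reports on the group's stderr, which is redirected to stdout;
    # bc's own output (the digits of pi) is discarded.
    { time echo "scale=${scale};4*a(1)" | bc -l >/dev/null; } 2>&1
  done
}

# summarize_times T1 T2 ...: print the minimum and median of the given times.
summarize_times() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | LC_ALL=C sort -n | LC_ALL=C awk '
    { t[NR] = $1 }
    END {
      med = (NR % 2) ? t[(NR + 1) / 2] : (t[NR / 2] + t[NR / 2 + 1]) / 2
      printf "min=%.3f median=%.3f runs=%d\n", t[1], med, NR
    }'
}
```

Running `summarize_times $(pi_bench)` does five full-size passes and prints a single summary line. The minimum is usually the most repeatable figure on an otherwise idle board, while the median gives a feel for how noisy the runs were.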

 

Morgaine.

  • morgaine
    morgaine over 11 years ago in reply to morgaine

Nice little graphic of ARM family evolution:

[Image: cortex-a50_series_original.jpg]

It doesn't seem to normalize for core count, though; it shows the aggregate throughput of all cores on a SoC together, which is good for marketing but not as clear for our purposes.

     

    Morgaine.

  • Former Member
    Former Member over 11 years ago in reply to morgaine

The A7 is basically a tweaked A8, the idea being that it's feature-compatible with the A15 but uses much less power. To achieve that, it's still an in-order architecture, but ARM made some compromises to keep power consumption low while making the core easier to synthesize. Remember, the goal of the A7 is mostly to be the low-power half of a big.LITTLE SoC paired with an A15, the target being a background-task processor for something like a smartphone, where you don't want a power-hungry core eating your battery when you're not actively using it.

    There's an interesting discussion here http://www.anandtech.com/show/4991/arms-cortex-a7-bringing-cheaper-dualcore-more-power-efficient-highend-devices

I'm not normally a reader of AnandTech, and you'll need to ignore the fanboys from both sides in the comments, but the article gives a reasonable explanation of some of the reasons the A7 is slower than the A8 in some areas.

  • Former Member
    Former Member over 11 years ago in reply to morgaine

    Morgaine Dinova wrote:

     

It doesn't seem to normalize for core counts though, but shows the aggregate throughput of all cores on a SoC together --- good for marketing but not as clear for our purposes.

Yep, definitely a marketing slide. Performance of the A15 graphed against power consumption of the A7 certainly makes it look good. Once you make the jump from fairly simple in-order execution to complex out-of-order execution, there's a penalty in the form of increased power. Intel discovered that in the Prescott days and dropped back to a simpler architecture starting with Core.

    While Arm is ahead on power consumption, that's at the cost of performance. As they start aiming for increased performance and x86 territory, the power consumption will have to increase as well. That's not to say they can't get similar performance with lower power than Intel can today, but who knows what Intel will be doing by then.

     

That said, I'm finding the A9-powered i.MX6 to be far more performant than I'd expected. Perhaps the fact that my first proper interaction with ARM was the ARM11-based RPi has left me with expectations that were too low.

  • johnbeetem
    johnbeetem over 11 years ago in reply to Former Member

    selsinork wrote:

    Morgaine Dinova wrote:

     

    It doesn't seem to normalize for core counts though, but shows the aggregate throughput of all cores on a SoC together --- good for marketing but not as clear for our purposes.

    Yep, definitely a marketing slide. Performance of the A15 graphed against power consumption of the A7 certainly makes it look good. Once you make the jump from fairly simple in-order execution to complex out-of-order there's a penalty in the form of increased power. Intel discovered that in the Prescott days and dropped back to a simpler architecture starting with Core.

     

    While Arm is ahead on power consumption, that's at the cost of performance. As they start aiming for increased performance and x86 territory, the power consumption will have to increase as well. That's not to say they can't get similar performance with lower power than Intel can today, but who knows what Intel will be doing by then.

    Yes, that's a very pretty slide Morgaine posted.  But we know we have to be careful with the term "peak performance", which a clever wag once defined as "a guarantee from the manufacturer that you can't go faster than this".

     

    Speaking of clever aphorisms, you may have heard Hamming's quote: "The purpose of computation is insight, not numbers".  Gio Wiederhold transformed this into: "The number of computations without purpose is out of sight."  Well, that's what you get with out-of-order speculative execution: you know a bunch of results will get discarded and the power expended to compute them will be wasted.  Whenever I see an implementation with long pipelines I think about all those flushes and all those electrons rushing from ground to Vdd acting like a big switched-capacitor resistor.  And for what purpose?  So that wasteful software has acceptable performance (sigh).
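The switched-capacitor picture can be made concrete with the standard first-order model of dynamic CMOS power, where α is the fraction of the capacitance C that toggles each cycle, V_dd is the supply voltage and f is the clock frequency (a simplified model that ignores leakage):

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f,
\qquad
E_{\mathrm{cycle}} = \frac{P_{\mathrm{dyn}}}{f} = \alpha \, C \, V_{dd}^{2}
```

Energy per cycle depends on V_dd² rather than directly on f, but reaching higher clock rates generally needs a higher supply voltage, so energy per operation rises with clock speed. Work that is speculatively executed and then flushed still toggles capacitance, so it still costs energy without producing a useful result.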

  • morgaine
    morgaine over 11 years ago in reply to johnbeetem

    John Beetem wrote:

     

    Well, that's what you get with out-of-order speculative execution: you know a bunch of results will get discarded and the power expended to compute them will be wasted.  Whenever I see an implementation with long pipelines I think about all those flushes and all those electrons rushing from ground to Vdd acting like a big switched-capacitor resistor.  And for what purpose?  So that wasteful software has acceptable performance (sigh).

     

    So true.  And software is often wasteful even in places where we don't usually expect it, simply because you can't normally satisfy a wide range of requirements at the same time equally well.  Here's an example.

     

A few years ago, desktop machines surpassed the processing power needed to implement the desktop metaphor perfectly in terms of performance, meaning that normal "metaphoric paper" operations such as organizing documents and folders are perceptually instantaneous. (More power is needed only by algorithmic operations such as searching, which typically aren't yet perceptually instantaneous.)

     

And yet, the dumb software merrily runs the CPUs turned up to 11 in order to respond in 100 microseconds instead of in the few milliseconds required by frame rate and our perceptions. It's a waste of power because the efficiency/clock-rate curve is nonlinear (the CPU does less work per joule at higher clock speeds), and the time saved by doing something faster can't be turned into energy savings by idling sooner, because switching to idle is not an instantaneous operation. The combination of these two factors means that the fast CPUs of the last several years waste energy for the bulk of what we do on the desktop.
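The two factors can be illustrated with a toy "race to idle versus pace" energy model for one frame's worth of work. This is a sketch only: the effective capacitance, voltages, idle power and idle-transition cost below are invented illustration values, not measurements of any board, and the model assumes the supply voltage must rise with clock rate. `frame_energy_mj` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Toy energy model for one frame of work; every parameter value is hypothetical.
#   dynamic energy = C_eff * V^2 * cycles       (nF * V^2 * Mcycles = mJ)
#   idle energy    = P_idle * idle time         (mW * s = mJ)
#   switch energy  = fixed cost of entering/leaving idle, paid if the core idles
# Usage: frame_energy_mj C_EFF_NF VOLTS MCYCLES F_MHZ FRAME_MS P_IDLE_MW E_SWITCH_MJ
frame_energy_mj() {
  LC_ALL=C awk -v c="$1" -v v="$2" -v w="$3" -v f="$4" \
    -v frame="$5" -v pidle="$6" -v esw="$7" 'BEGIN {
      t_active = w / f                   # Mcycles / MHz = seconds
      t_idle = frame / 1000 - t_active   # rest of the frame, in seconds
      if (t_idle < 0) { print "deadline missed" | "cat 1>&2"; exit 1 }
      e = c * v * v * w + pidle * t_idle + (t_idle > 0 ? esw : 0)
      printf "%.3f\n", e
    }'
}

# "Race": 1 Mcycle at 1000 MHz and 1.2 V, then idle for most of a 16 ms frame.
frame_energy_mj 1 1.2 1 1000 16 10 0.05
# "Pace": the same work at 100 MHz and 0.9 V, idling for less of the frame.
frame_energy_mj 1 0.9 1 100 16 10 0.05
```

With these made-up numbers the first call prints 1.640 and the second 0.920 (mJ): the slower, lower-voltage run uses less energy even though it finishes later, because the V² term dominates and racing buys only a little extra idle time. Different idle-power or voltage assumptions can change the outcome, so this only illustrates the shape of the argument.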

     

    Morgaine.

  • Former Member
    Former Member over 11 years ago

OK, so here are some more results for Allwinner-based boards. These are all with my minimal LFS-based armhf userspace and an essentially identical 3.4.67 kernel from sunxi. I've rebuilt u-boot and the kernel with the latest versions. The Cubieboard2 and A20-OLinuXino-Micro use exactly the same kernel, each with its own script.fex/script.bin.

     

    A10-Lime 11.742s

    A20-OLinuXino-Micro 12.145s

    A20-Cubieboard2 12.147s

     

So there are some reasonable improvements over the previous Debian-based numbers, around what we expected. The Cortex-A7 vs Cortex-A8 differential we discussed is easily visible and can really only be explained by architecture and clock-speed differences, as everything else is effectively identical.
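One way to separate the clock-rate part of that differential from the per-clock part is to multiply each time by its clock rate, giving an approximate cycle count. A minimal sketch, assuming the bc run is single-threaded and each core stays at its nominal clock for the whole run (for example with the performance governor); `mcycles` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Approximate cycle count (millions) = run time (s) * clock (MHz).
# Assumes a single-threaded run at a constant, nominal clock rate.
mcycles() {
  LC_ALL=C awk -v t="$1" -v mhz="$2" 'BEGIN { printf "%.0f\n", t * mhz }'
}

# selsinork's LFS results from this post: board, time (s), clock (MHz).
while read -r board t mhz; do
  printf '%-20s %6s Mcycles\n' "$board" "$(mcycles "$t" "$mhz")"
done <<'EOF'
A10-Lime 11.742 1000
A20-OLinuXino-Micro 12.145 912
A20-Cubieboard2 12.147 912
EOF
```

Under those assumptions the A10 run comes to about 11742 million cycles and the two A20 runs to about 11076 and 11078 million, so for this one test the A20's longer time appears to come from its 912 MHz clock rather than from needing more cycles. Memory speed and other board-level differences are not captured by such a simple normalization.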

The next step is to put the same code onto a BBB, but I'm not really expecting any difference compared to the A10-based LIME.

  • Former Member
    Former Member over 11 years ago

Our tests here have been limited to a single, very specific, easy-to-run test, so they show only one aspect of performance.

     

There are some results for the Allwinner devices at http://sunxi.org/Benchmarks and it's interesting to compare the openssl speed numbers between the A10 and A20 and note that the A10 is significantly faster (I've confirmed similar numbers between the A10 LIME and the A20 Micro).

Then compare the A10 and A20 Linpack results, where the A20 seems significantly faster. Of course, the comparison isn't quite fair: judging from the compile options, they appear to be comparing NEON on the A10 against VFPv4 on the A20, and the A10 doesn't have VFPv4.

     

However, having tried identical Linpack binaries on both the A10 and A20, I'm getting results that suggest the A20 is approx 3x faster for these floating-point operations, and it doesn't seem to matter whether I use VFPv3 or NEON on both; the A20 still outperforms the A10.
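For anyone wanting to repeat the identical-binary comparison, the FPU variants map onto GCC's ARM `-mfpu=` option roughly as sketched below. The function only prints command lines; `linpack.c` and the `linpack-*` output names are hypothetical placeholders, and the flags assume a hard-float (armhf) GCC toolchain. A `neon-vfpv4` build may use VFPv4 fused multiply-add instructions, which the A10's Cortex-A8 lacks, so it may fail with an illegal-instruction error there; only the VFPv3 and NEON builds are safe to run on both chips.

```shell
#!/usr/bin/env bash
# Print (not run) a GCC command line for one FPU variant of a Linpack build.
# linpack.c and the linpack-* output names are hypothetical placeholders.
linpack_build_cmd() {
  local variant=$1 fpu
  case "$variant" in
    vfpv3) fpu=vfpv3 ;;       # VFPv3: runs on Cortex-A8 (A10) and Cortex-A7 (A20)
    neon)  fpu=neon ;;        # NEON with VFPv3 (GCC treats "neon" as "neon-vfpv3")
    vfpv4) fpu=neon-vfpv4 ;;  # adds VFPv4 fused multiply-add: Cortex-A7 (A20) only
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  echo "gcc -O2 -mfloat-abi=hard -mfpu=$fpu linpack.c -o linpack-$variant -lm"
}

for v in vfpv3 neon vfpv4; do
  linpack_build_cmd "$v"
done
```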

     

The Cortex-A9-based i.MX6 still beats both the A10 and A20, but the A20 is surprisingly close: approx 120000 KFLOPS for the A20 compared to approx 150000 KFLOPS for the i.MX6 when using VFPv3 or NEON, while the A20 manages approx 145000 KFLOPS with VFPv4. Some of this will be down to the raw clock speed difference: 912 MHz for the A20 versus 996 MHz for the i.MX6.
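The clock-speed adjustment can be applied directly by dividing each approximate Linpack figure by its clock rate. A sketch using the rounded numbers from this post, assuming single-threaded Linpack at a constant clock; `kflops_per_mhz` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# KFLOPS per MHz: a rough per-clock throughput figure.
# Assumes single-threaded Linpack and a constant clock rate during the run.
kflops_per_mhz() {
  LC_ALL=C awk -v k="$1" -v mhz="$2" 'BEGIN { printf "%.1f\n", k / mhz }'
}

# Approximate figures from this post: label, KFLOPS, clock (MHz).
while read -r label kflops mhz; do
  printf '%-18s %s KFLOPS/MHz\n' "$label" "$(kflops_per_mhz "$kflops" "$mhz")"
done <<'EOF'
A20_VFPv3/NEON 120000 912
i.MX6_VFPv3/NEON 150000 996
A20_VFPv4 145000 912
EOF
```

This gives roughly 131.6 KFLOPS/MHz for the A20 with VFPv3/NEON, 150.6 for the i.MX6 and 159.0 for the A20 with VFPv4, so in the VFPv3/NEON case the clock difference explains only part of the i.MX6's lead.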

So it seems that, to gain feature parity with the Cortex-A15, the Cortex-A7 has been gifted the newer floating-point unit in its entirety. That's a bonus if you're doing floating-point work on the A20.

  • Former Member
    Former Member over 11 years ago

    A20 Cubietruck

    12.149s

     

    No real surprise that it's very similar to other boards with the A20 chip.

An Avnet Company © 2025 Premier Farnell Limited. All Rights Reserved.

Premier Farnell Ltd, registered in England and Wales (no 00876412), registered office: Farnell House, Forge Lane, Leeds LS12 2NE.

ICP filing number: 10220084.
