RoadTest: Try out the Raspberry Pi Model 3 B Plus!
Author: Nitin_Bhaskar
Creation date:
Evaluation Type: Development Boards & Tools
Did you receive all parts the manufacturer stated would be included in the package?: True
What other parts do you consider comparable to this product?: Raspberry Pi, Raspberry Pi 2B, Dragonboard 410c
What were the biggest problems encountered?: Native compilation performance is poor. SSH disconnects during long runs.
Detailed Review:
Over the years, the Raspberry Pi has proved to be an ideal platform for education and quick prototyping. I was lucky enough to be selected as one of the roadtesters for the new Raspberry Pi 3B+. My review focuses on the improvements over previous versions, benchmarking, and the ease of developing and running ML/AI frameworks such as Tensorflow and the ARM NN SDK.
The Rpi 3B+ arrived in the box shown below.
The box contained a safety instruction sheet with a quick start manual and the Rpi 3B+ (of course!).
Roadtest prerequisite:
I used the following for this roadtest:
Below is the image of Raspberry pi with micro SD card used.
{tabbedtable} Tab Label | Tab Content | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CPU | This Raspberry Pi uses the Broadcom BCM2837B0 SoC, a 64-bit Cortex-A53 (ARMv8) part clocked at 1.4GHz. The processor comes in a new package with a heat spreader for better thermal control (see the image below).
Below is the console output from "/proc/cpuinfo":
    processor : 0
    model name : ARMv7 Processor rev 4 (v7l)
    BogoMIPS : 38.40
    Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant : 0x0
    CPU part : 0xd03
    CPU revision : 4

    processor : 1
    model name : ARMv7 Processor rev 4 (v7l)
    BogoMIPS : 38.40
    Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant : 0x0
    CPU part : 0xd03
    CPU revision : 4

    processor : 2
    model name : ARMv7 Processor rev 4 (v7l)
    BogoMIPS : 38.40
    Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant : 0x0
    CPU part : 0xd03
    CPU revision : 4

    processor : 3
    model name : ARMv7 Processor rev 4 (v7l)
    BogoMIPS : 38.40
    Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant : 0x0
    CPU part : 0xd03
    CPU revision : 4

    Hardware : BCM2835
    Revision : a020d3
    Serial : 00000000379948e6
As can be seen, the reported hardware is BCM2835 rather than BCM2837, as shown in previous versions of the Rpi.
Some information on CPU frequency scaling:
    pi@raspberrypi:~ $ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
    600000
    pi@raspberrypi:~ $ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
    1400000
    pi@raspberrypi:~ $ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
    600000
    pi@raspberrypi:~ $ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
    600000 1400000
    pi@raspberrypi:~ $ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
    600000
    pi@raspberrypi:~ $ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    ondemand
Clearly, only 600MHz and 1400MHz are available for frequency scaling, and the scaling governor in use is "ondemand".
During "sysbench" profiling, I logged the CPU temperature and CPU frequency. The log below shows the frequency dropping back at the end of the run:
    temp=54.2'C 1400000
    temp=53.7'C 1400000
    temp=53.7'C 1400000
    temp=54.8'C 1400000
    temp=54.2'C 1400000
    temp=53.7'C 1400000
    temp=53.7'C 600000
    temp=53.7'C 600000
    ^C
    pi@raspberrypi:~ $
Note that the logged temperature is the die temperature, and it reached a maximum of 54.8 degrees Celsius. The CPU frequency changed according to demand, and once the test was over it returned to 600MHz.
I ran a few benchmark tests on the Rpi 3B+; my findings are below.
Sysbench:
    pi@raspberrypi:~ $ sysbench --num-threads=4 --test=cpu --cpu-max-prime=20000 --validate run
    sysbench 0.4.12: multi-threaded system evaluation benchmark
    Running the test with following options:
    Number of threads: 4
    Additional request validation enabled.
    Doing CPU performance benchmark
    Threads started!
    Done.
    Maximum prime number checked in CPU test: 20000
    Test execution summary:
    total time: 186.6278s
    total number of events: 10000
    total time taken by event execution: 746.3490
    per-request statistics:
    min: 73.83ms
    avg: 74.63ms
    max: 97.98ms
    approx. 95 percentile: 76.14ms
    Threads fairness:
    events (avg/stddev): 2500.0000/33.25
    execution time (avg/stddev): 186.5872/0.03
Below is the result from the Rpi 2B+:
    Test execution summary:
    total time: 431.4357s
    total number of events: 10000
    total time taken by event execution: 1725.2872
    per-request statistics:
    min: 76.31ms
    avg: 172.53ms
    max: 357.22ms
    approx. 95 percentile: 188.58ms
    Threads fairness:
    events (avg/stddev): 2500.0000/12.51
    execution time (avg/stddev): 431.3218/0.04
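Comparison table (sysbench CPU test, 4 threads): total time 186.6278s on the Rpi 3B+ vs. 431.4357s on the Rpi 2B+; average per-request time 74.63ms vs. 172.53ms.
Clearly, the execution time taken by the Rpi 3B+ is less than half of that taken by the Rpi 2B+. | |||||||||||||||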
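The "less than half" claim can be checked from the two sysbench summaries quoted above. The following is a minimal Python sketch, not part of the original test procedure: the helper name `parse_total_time` is my own, and the regular expression assumes the sysbench 0.4.12 text layout shown in this review.

```python
import re


def parse_total_time(sysbench_output: str) -> float:
    """Extract the 'total time' value (seconds) from sysbench 0.4.12 text output.

    Hypothetical helper for illustration; it only understands the
    'total time: <number>s' line seen in the runs quoted in this review.
    """
    match = re.search(r"total time:\s*([0-9.]+)s", sysbench_output)
    if match is None:
        raise ValueError("no 'total time' line found")
    return float(match.group(1))


# Summary excerpts copied from the two runs quoted in this review.
RPI_3B_PLUS = "Test execution summary: total time: 186.6278s total number of events: 10000"
RPI_2B_PLUS = "Test execution summary: total time: 431.4357s total number of events: 10000"

t_3b_plus = parse_total_time(RPI_3B_PLUS)
t_2b_plus = parse_total_time(RPI_2B_PLUS)
ratio = t_2b_plus / t_3b_plus

print(f"3B+: {t_3b_plus:.1f}s, 2B+: {t_2b_plus:.1f}s, ratio: {ratio:.2f}x")
```

With the figures above, the ratio comes out at a little over 2.3, which is consistent with the statement that the Rpi 3B+ needs less than half the time.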
RAM | The RAM used in this version of the Raspberry Pi is the same as the one used in the previous version of the Rpi 3B+ and in the Rpi 2B+: the B8132B4PB-8D-F RAM chip (1GB) from Elpida. Below are some benchmark numbers.
Sysbench:
    pi@raspberrypi:~ $ sysbench --test=memory --memory-block-size=1K --memory-total-size=1G --num-threads=1 run
    sysbench 0.4.12: multi-threaded system evaluation benchmark
    Running the test with following options:
    Number of threads: 1
    Doing memory operations speed test
    Memory block size: 1K
    Memory transfer size: 1024M
    Memory operations type: write
    Memory scope type: global
    Threads started!
    Done.
    Operations performed: 1048576 (276646.74 ops/sec)
    1024.00 MB transferred (270.16 MB/sec)
    Test execution summary:
    total time: 3.7903s
    total number of events: 1048576
    total time taken by event execution: 3.0114
    per-request statistics:
    min: 0.00ms
    avg: 0.00ms
    max: 3.63ms
    approx. 95 percentile: 0.00ms
    Threads fairness:
    events (avg/stddev): 1048576.0000/0.00
    execution time (avg/stddev): 3.0114/0.00
Below is the output from the Rpi 2B+:
    Test execution summary:
    total time: 8.7923s
    total number of events: 1048576
    total time taken by event execution: 6.7600
    per-request statistics:
    min: 0.00ms
    avg: 0.01ms
    max: 20.28ms
    approx. 95 percentile: 0.00ms
    Threads fairness:
    events (avg/stddev): 1048576.0000/0.00
    execution time (avg/stddev): 6.7600/0.00
Comparison table (sysbench memory test, 1 thread): total time 3.7903s on the Rpi 3B+ vs. 8.7923s on the Rpi 2B+.
Memtester:
    pi@raspberrypi:~ $ time memtester 128M 1
    memtester version 4.3.0 (32-bit)
    Copyright (C) 2001-2012 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    pagesize is 4096
    pagesizemask is 0xfffff000
    want 128MB (134217728 bytes)
    got 128MB (134217728 bytes), trying mlock ...locked.
    Loop 1/1:
    Stuck Address : ok
    Random Value : ok
    Compare XOR : ok
    Compare SUB : ok
    Compare MUL : ok
    Compare DIV : ok
    Compare OR : ok
    Compare AND : ok
    Sequential Increment: ok
    Solid Bits : ok
    Block Sequential : ok
    Checkerboard : ok
    Bit Spread : ok
    Bit Flip : ok
    Walking Ones : ok
    Walking Zeroes : ok
    8-bit Writes : ok
    16-bit Writes : ok
    Done.
    real 6m8.503s
    user 6m7.887s
    sys 0m0.590s
Below is the output from the Rpi 2B+:
    real 20m6.274s
    user 13m22.910s
    sys 0m1.940s
Comparison table (memtester 128M, 1 loop): real time 6m8.503s on the Rpi 3B+ vs. 20m6.274s on the Rpi 2B+.
In both tests, the Rpi 2B+ took more than twice as long as the Rpi 3B+. | |||||||||||||||
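To put numbers on "more than twice as long", the `time` values can be converted to seconds and divided. This is a small Python sketch under the assumption that the values follow the bash `time` format seen above (`<minutes>m<seconds>s`); the helper name `time_to_seconds` is hypothetical.

```python
import re


def time_to_seconds(value: str) -> float:
    """Convert a bash `time` value such as '6m8.503s' to seconds."""
    match = re.fullmatch(r"(\d+)m([0-9.]+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# memtester wall-clock ("real") times quoted in this review
real_3b_plus = time_to_seconds("6m8.503s")
real_2b_plus = time_to_seconds("20m6.274s")
memtester_ratio = real_2b_plus / real_3b_plus

# sysbench memory "total time" values quoted in this review
sysbench_ratio = 8.7923 / 3.7903

print(f"memtester: Rpi 2B+ took {memtester_ratio:.2f}x as long")
print(f"sysbench memory: Rpi 2B+ took {sysbench_ratio:.2f}x as long")
```

Both ratios are above 2, so the statement holds for each test, with the memtester gap being the larger of the two.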
PMIC | There is an onboard power management IC from MaxLinear, part number MxL7704. It is a five-output PMIC optimized for powering low-power microprocessors. | |||||||||||||||
USB + Ethernet | This version of the Raspberry Pi uses the LAN7515, which provides both USB hub and Gigabit Ethernet functionality, hence the common section for USB and Ethernet. Although the LAN7515 supports Gigabit Ethernet, the USB speed limitation restricts throughput to about 300Mbps. The LAN7515 is an upgrade over the LAN9514 used in the previous version of the Rpi.
My throughput tests covered the following:
USB speed test: time taken to transfer a 0.99GB file: 1m16s
Ethernet throughput, TCP: Rx - 203Mbps, Tx - 327Mbps
Ethernet throughput, UDP: Rx - 212Mbps, Tx - 265Mbps
USB + Ethernet: TCP Rx - 166Mbps; time taken for the USB transfer of the 0.99GB file during simultaneous TCP Rx - 1m32s
USB + Ethernet: TCP Tx - 244Mbps; time taken for the USB transfer of the 0.99GB file during simultaneous TCP Tx - 1m22s | |||||||||||||||
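The USB transfer times can be turned into approximate throughput figures so they are easier to compare with the Ethernet numbers. The sketch below is illustrative only: it assumes 1GB means 10^9 bytes (the review does not say whether GB or GiB was meant), and the function name `throughput_mbps` is my own.

```python
def throughput_mbps(size_gb: float, seconds: float, bytes_per_gb: float = 1e9) -> float:
    """Average throughput in megabits per second for a transfer.

    bytes_per_gb defaults to 10^9, which is an assumption; pass 2**30
    if the file size was actually measured in GiB.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_gb * bytes_per_gb * 8 / seconds / 1e6


FILE_GB = 0.99  # file size quoted in the review

# Transfer times quoted in the review, converted to seconds
runs = {
    "USB only": 76,            # 1m16s
    "USB during TCP Rx": 92,   # 1m32s
    "USB during TCP Tx": 82,   # 1m22s
}

results = {name: throughput_mbps(FILE_GB, secs) for name, secs in runs.items()}
for name, mbps in results.items():
    print(f"{name}: {mbps:.1f} Mbps ({mbps / 8:.1f} MB/s)")
```

Under that assumption the standalone USB transfer averages roughly 104Mbps, and it slows by about 17% when Ethernet is receiving at the same time and by about 7% when it is transmitting.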
WiFi | This revision of the Raspberry Pi received a major upgrade on the WiFi front. The new Rpi 3B+ supports 802.11ac. The wireless solution used here is the CYW43455 from Cypress, which supports single-stream IEEE 802.11b/g/n/ac wireless LAN as well as Bluetooth 4.2/BLE. This is the first Raspberry Pi that supports both the 2.4GHz and 5GHz bands. The CYW43455 supports WPA/WPA2 and WPS.
As far as the Linux driver is concerned, the open-source FMAC driver from Cypress is used. Below are the throughput numbers,
For BT/BLE, the drivers used are:
    hci_uart 36864 1
    btbcm 16384 1 hci_uart
    serdev 20480 1 hci_uart
    bluetooth 368640 29 hci_uart,bnep,btbcm,rfcomm
    ecdh_generic 28672 1 bluetooth
Below is the output during BT pairing:
    pi@raspberrypi:~ $ bluetoothctl
    [NEW] Controller B8:27:EB:33:xx:xx raspberrypi [default]
    [bluetooth]# agent KeyboardOnly
    Agent registered
    [bluetooth]# default-agent
    Default agent request successful
    [bluetooth]# power on
    Changing power on succeeded
    [bluetooth]# scan on
    Discovery started
    [CHG] Controller B8:27:EB:33:xx:xx Discovering: yes
    [NEW] Device 04:D1:3A:8D:xx:xx Mi 5
    [bluetooth]# pair 04:D1:3A:8D:xx:xx
    Attempting to pair with 04:D1:3A:8D:xx:xx
    [CHG] Device 04:D1:3A:8D:xx:xx Connected: yes
    Request passkey
    [agent] Enter passkey (number in 0-999999): 773283
    [CHG] Device 04:D1:3A:8D:xx:xx Modalias: bluetooth:v001Dp1200d1436
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 00001105-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 0000110a-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 0000110c-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 0000110e-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 00001112-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 00001115-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 00001116-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 0000111f-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 0000112d-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 0000112f-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 00001132-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 00001200-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 00001800-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx UUIDs: 00001801-0000-1000-8000-00805f9b34fb
    [CHG] Device 04:D1:3A:8D:xx:xx ServicesResolved: yes
    [CHG] Device 04:D1:3A:8D:xx:xx Paired: yes
    Pairing successful | |||||||||||||||
Running Tensorflow | Running Tensorflow on the Rpi 3B+ is very simple. First, download and install Tensorflow using pip:
sudo pip install tensorflow
Download the sample ImageNet example, which classifies an image:
    wget https://raw.githubusercontent.com/tensorflow/models/master/tutorials/image/imagenet/classify_image.py
Run the downloaded example. On the first run it downloads the Inception model, so the profiling below was done on the second run.
    pi@raspberrypi:~ $ time python classify_image.py
    2018-06-06 14:52:55.854013: W tensorflow/core/framework/op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
    giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
    indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
    lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
    custard apple (score = 0.00147)
    earthstar (score = 0.00117)
    real 0m50.943s
    user 0m54.723s
    sys 0m3.696s | |||||||||||||||
Running ARM NN SDK | ARM has released its neural network SDK, which is optimized for OpenCL and NEON. Since the Rpi 3B+ has NEON support, I tried running an ARM NN example optimized for NEON.
First, create a project folder:
    mkdir project
    cd project
Download the AlexNet model:
    wget https://developer.arm.com//-/media/developer/technologies/Machine%20learning%20on%20Arm/Tutorials/Running%20AlexNet%20on%20Pi%20with%20Compute%20Library/compute_library_alexnet.zip
Unzip the AlexNet model:
    unzip compute_library_alexnet.zip -d assets_alexnet
Install scons:
    sudo apt-get install scons
Clone the Compute Library SDK:
    git clone https://github.com/Arm-software/ComputeLibrary.git
Build using scons with the option "neon=1" to enable NEON optimization:
    cd ComputeLibrary
    scons Werror=1 debug=0 asserts=0 neon=1 opencl=0 examples=1 build=native -j4
Running the AlexNet example:
    pi@raspberrypi:~/project/ComputeLibrary $ export LD_LIBRARY_PATH=build/
    pi@raspberrypi:~/project/ComputeLibrary $ PATH_ASSETS=../assets_alexnet
    pi@raspberrypi:~/project/ComputeLibrary $ time ./build/examples/graph_alexnet 0 $PATH_ASSETS $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt
    ./build/examples/graph_alexnet
    Usage: ./build/examples/graph_alexnet 0 ../assets_alexnet ../assets_alexnet/go_kart.ppm ../assets_alexnet/labels.txt [fast_math_hint]
    No fast math info provided: disabling fast math
    ---------- Top 5 predictions ----------
    0.9736 - [id = 573], n03444034 go-kart
    0.0118 - [id = 518], n03127747 crash helmet
    0.0108 - [id = 751], n04037443 racer, race car, racing car
    0.0022 - [id = 817], n04285008 sports car, sport car
    0.0006 - [id = 670], n03791053 motor scooter, scooter
    ---------- Top 5 predictions ----------
    0.9736 - [id = 573], n03444034 go-kart
    0.0118 - [id = 518], n03127747 crash helmet
    0.0108 - [id = 751], n04037443 racer, race car, racing car
    0.0022 - [id = 817], n04285008 sports car, sport car
    0.0006 - [id = 670], n03791053 motor scooter, scooter
    Test passed
    real 0m5.827s
    user 0m14.469s
    sys 0m1.583s | |||||||||||||||
ZRAM | The performance of the Raspberry Pi can be improved slightly using ZRAM.
First, download the script from https://wiki.debian.org/ZRam
Let's experiment with the ARM NN example. Without ZRAM:
    pi@raspberrypi:~/project/ComputeLibrary $ export LD_LIBRARY_PATH=build/
    pi@raspberrypi:~/project/ComputeLibrary $ PATH_ASSETS=../assets_alexnet
    pi@raspberrypi:~/project/ComputeLibrary $ time ./build/examples/graph_alexnet 0 $PATH_ASSETS $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt
    ./build/examples/graph_alexnet
    Usage: ./build/examples/graph_alexnet 0 ../assets_alexnet ../assets_alexnet/go_kart.ppm ../assets_alexnet/labels.txt [fast_math_hint]
    No fast math info provided: disabling fast math
    ---------- Top 5 predictions ----------
    0.9736 - [id = 573], n03444034 go-kart
    0.0118 - [id = 518], n03127747 crash helmet
    0.0108 - [id = 751], n04037443 racer, race car, racing car
    0.0022 - [id = 817], n04285008 sports car, sport car
    0.0006 - [id = 670], n03791053 motor scooter, scooter
    ---------- Top 5 predictions ----------
    0.9736 - [id = 573], n03444034 go-kart
    0.0118 - [id = 518], n03127747 crash helmet
    0.0108 - [id = 751], n04037443 racer, race car, racing car
    0.0022 - [id = 817], n04285008 sports car, sport car
    0.0006 - [id = 670], n03791053 motor scooter, scooter
    Test passed
    real 0m5.827s
    user 0m14.469s
    sys 0m1.583s
Now start the zram compression:
    pi@raspberrypi:~/project/ComputeLibrary $ sudo /home/pi/zram.sh start
    Setting up swapspace version 1, size = 173.9 MiB (182292480 bytes)
    no label, UUID=358dc673-1764-4897-9ab7-27e4b19d1a62
    Setting up swapspace version 1, size = 173.9 MiB (182292480 bytes)
    no label, UUID=596434c2-483e-453b-933b-816d44c4587f
    Setting up swapspace version 1, size = 173.9 MiB (182292480 bytes)
    no label, UUID=e2cf31ed-d23b-45a3-af5d-fa10a0754220
    Setting up swapspace version 1, size = 173.9 MiB (182292480 bytes)
    no label, UUID=88836e77-647a-4470-94b2-c7357c8d84d9
Rerun the ARM NN SDK example:
    pi@raspberrypi:~/project/ComputeLibrary $ time ./build/examples/graph_alexnet 0 $PATH_ASSETS $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt
    ./build/examples/graph_alexnet
    Usage: ./build/examples/graph_alexnet 0 ../assets_alexnet ../assets_alexnet/go_kart.ppm ../assets_alexnet/labels.txt [fast_math_hint]
    No fast math info provided: disabling fast math
    ---------- Top 5 predictions ----------
    0.9736 - [id = 573], n03444034 go-kart
    0.0118 - [id = 518], n03127747 crash helmet
    0.0108 - [id = 751], n04037443 racer, race car, racing car
    0.0022 - [id = 817], n04285008 sports car, sport car
    0.0006 - [id = 670], n03791053 motor scooter, scooter
    ---------- Top 5 predictions ----------
    0.9736 - [id = 573], n03444034 go-kart
    0.0118 - [id = 518], n03127747 crash helmet
    0.0108 - [id = 751], n04037443 racer, race car, racing car
    0.0022 - [id = 817], n04285008 sports car, sport car
    0.0006 - [id = 670], n03791053 motor scooter, scooter
    Test passed
    real 0m5.836s
    user 0m14.186s
    sys 0m1.600s
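Comparison table (graph_alexnet, without ZRAM vs. with ZRAM): real 5.827s vs. 5.836s; user 14.469s vs. 14.186s; sys 1.583s vs. 1.600s.
A small improvement is seen in user CPU time (14.469s down to 14.186s), while the wall-clock time is essentially unchanged (5.827s vs. 5.836s). |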
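The size of the ZRAM effect is easier to judge as a percentage change per timing field. Below is a minimal Python sketch using only the `time` figures quoted above; the helper name `percent_change` is my own, and a single run of each configuration is assumed, as in the review.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = faster)."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


# `time` results for graph_alexnet quoted in this review, in seconds
without_zram = {"real": 5.827, "user": 14.469, "sys": 1.583}
with_zram = {"real": 5.836, "user": 14.186, "sys": 1.600}

changes = {
    field: percent_change(without_zram[field], with_zram[field])
    for field in without_zram
}

for field, change in changes.items():
    print(f"{field}: {change:+.2f}%")
```

User time drops by about 2%, while real and sys time move by roughly 0.2% and 1% in the other direction. With only one run per configuration, differences this small could plausibly be run-to-run variation, so repeating the measurement several times would give a firmer answer.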
Overall, I found the Rpi 3B+ to be a neat piece of hardware with pretty stable OS support. I would have liked to see a RAM upgrade, hopefully in the next version of the Rpi.
Top Comments
Nitin_Bhaskar To confirm only RPI 3B+ was provided and not SD card? May I suggest you to put the values you obtained in the console in a table or graph. I really had a hard time comparing them myself!
…Hi Dixon.
Yes, only Rpi 3B+ was provided and not SD card. I have created comparison table for easy reference.
Thanks,
Nitin
nice to see the deep learning tests.
agree on the RAM update.