RoadTest: RoadTest the Raspberry Pi 4 Model B (2GB)
Author: bernhardmayer
Creation date:
Evaluation Type: Development Boards & Tools
Did you receive all parts the manufacturer stated would be included in the package?: True
What other parts do you consider comparable to this product?: BeagleBone AI, Nvidia Jetson Nano
What were the biggest problems encountered?: Getting the right power and display adapters ;-)
Detailed Review:
This is a work in progress. I will be adding my test results and findings over the next hours and days...
These are my first impressions just after unpacking and before powering up:
The mechanical changes are critical for some industrial applications where wiring or case cut-outs are made to fit exactly. But I think this is acceptable given the large performance improvements of the Raspberry Pi 4. Additionally, the previous versions of the Raspberry Pi will remain available for a long time.
I don't think there is much to say about the Raspberry Pi which hasn't already been said. Most of you know the basic functions, the operating system and its capabilities. The main new features of the Raspberry Pi 4 are two HDMI outputs, real Gigabit Ethernet and USB 3.0.
I am going to test how suitable it is for robotics, especially autonomously driving robots. These robots have a particular set of requirements.
The coming generation of robots will rely heavily on optical sensors, i.e. cameras. There are of course robots which drive around using only ultrasonic sensors or lidars, but in the best case these sensors give you a 2D scan with the distances to obstacles around the robot at a defined height. They give you no information on the type of obstacle (solid wall, approaching car, blade of grass that can simply be pushed aside) or whether there is anything above or below their limited field of view. Additionally, these sensors are expensive, even more so if you need several of them to cover the whole size of the robot. Their only advantage is that they don't require much computing power.
When you use cameras on your robot, the whole system gets much cheaper. A camera doesn't give you direct distance measurements to obstacles, but it gives you far more information about the type of obstacle and has a much wider field of view. One camera (maybe with a fish-eye lens) can cover the whole area in front of the robot, and from the size and position of an obstacle in the image you can estimate its position (see the sketch below). Additionally, with a camera you can read signs and markings on the ground or determine the type of ground (street, grass, ...). The downside of this approach is that it requires image processing, and image processing is complicated. Progress in artificial intelligence and neural networks simplifies the image processing somewhat, but it still requires high processing power.
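As a rough illustration of that idea, here is a minimal sketch of estimating the distance to an obstacle from where its base touches the ground in the image. It assumes a calibrated pinhole camera mounted at a known height and looking parallel to a flat ground plane; all names and numbers are made up for the example and are not part of my test setup:

```cpp
#include <iostream>

// Hypothetical example values, not measured on my robot:
const double cameraHeightM = 0.20;   // camera height above the ground in meters
const double focalLengthPx = 600.0;  // focal length in pixels (from calibration)
const double horizonRowPx  = 240.0;  // image row of the horizon (optical axis parallel to ground)

// Estimate the distance to a point on the ground that appears at image row 'rowPx'.
// Flat-ground pinhole geometry: distance = height * focal length / (rowPx - horizonRow).
double groundDistanceM(double rowPx)
{
    double dRow = rowPx - horizonRowPx;   // pixels below the horizon
    if (dRow <= 0.0)
        return -1.0;                      // at or above the horizon: no ground intersection
    return cameraHeightM * focalLengthPx / dRow;
}

int main()
{
    // An obstacle whose base appears at row 400 would be roughly 0.75 m away
    // with the example numbers above.
    std::cout << "estimated distance: " << groundDistanceM(400.0) << " m" << std::endl;
    return 0;
}
```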
This leads to two rival requirements on the robot:
First of all, the power consumption needs to be low. If your system needs hundreds of watts, you have to carry big batteries, which drives up costs and limits your payload. The second requirement is that the system needs enough computing power for the image processing. A good starting point is to process two to ten images per second (i.e. a budget of roughly 100 to 500 ms per image), so that the robot can drive at a reasonable speed and still react to obstacles fast enough.
In this RoadTest we will see how well the Raspberry Pi 4 meets these requirements.
Similar boards which also target AI and autonomous robots are the Nvidia Jetson Nano (https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/) and the BeagleBone AI (BeagleBoard.org - AI), but both have a higher price tag.
I already did a similar test with the Raspberry Pi 3 Model B+ last year: Deep Neural Network Benchmark with Raspberry Pi 2, 3 and 3+
Back then I took the pre-trained GoogLeNet network for image classification and checked how fast it was executed on the Raspberry Pi. I used OpenCV as the library; at the time it was version 3.4.1, now it is version 4.1.2. To get the most up-to-date version on the Raspberry Pi you have to compile it yourself on the system. For the installation follow these instructions: https://www.pyimagesearch.com/2019/09/16/install-opencv-4-on-raspberry-pi-4-and-raspbian-buster/ On my system parallel compiling on the four cores of the Raspberry Pi didn't work, so I compiled it using only a single thread. This leads us to the first benchmark:
task | duration |
---|---|
compiling OpenCV 3.4.1 on Raspberry Pi 3 Model B+ | about 5 hours |
compiling OpenCV 4.1.2 on Raspberry Pi 4 Model B | about 2 hours |
So the first point goes to the Raspberry Pi 4.
Now to the processing of the neural network itself. I posted my test program in last year's test (linked above), and the program still runs with the new installation.
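For readers who don't want to follow the link, here is a minimal sketch of what such a GoogLeNet timing test looks like with the OpenCV dnn module. This is not the exact program from last year's post, just an illustration of the measurement; the file names and the 224 x 224 input with ImageNet mean subtraction are the usual values for the Caffe GoogLeNet model and are assumptions here:

```cpp
#include <iostream>
#include <opencv2/dnn.hpp>
#include <opencv2/opencv.hpp>

int main()
{
    // Hypothetical file names for the pre-trained Caffe GoogLeNet model
    cv::dnn::Net net = cv::dnn::readNetFromCaffe("bvlc_googlenet.prototxt",
                                                 "bvlc_googlenet.caffemodel");
    cv::Mat image = cv::imread("test.jpg");       // any test image

    // GoogLeNet expects a 224 x 224 input with the ImageNet mean subtracted
    cv::Mat blob = cv::dnn::blobFromImage(image, 1.0, cv::Size(224, 224),
                                          cv::Scalar(104, 117, 123));
    net.setInput(blob);

    cv::TickMeter t;
    t.start();
    cv::Mat prob = net.forward();                 // run the network
    t.stop();

    // Find the class with the highest probability
    cv::Point classId;
    double confidence;
    cv::minMaxLoc(prob.reshape(1, 1), nullptr, &confidence, nullptr, &classId);

    std::cout << "class id: " << classId.x
              << "  confidence: " << confidence
              << "  time: " << t.getTimeMilli() << " ms" << std::endl;
    return 0;
}
```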
Last year I had the following results with OpenCV 3.4.1:
Board with OpenCV 3.4.1 | time (ms) |
---|---|
Raspberry Pi 2 Rev 1.1 | 2635 |
Raspberry Pi 3 | 1804 |
Raspberry Pi 3 Model B+ | 1548 |
Now I have the following results with OpenCV 4.1.2:
Board with OpenCV 4.1.2 | time (ms) | compared to Raspberry Pi 4 |
---|---|---|
Raspberry Pi 2 Rev 1.1 | 2054 | 574 % |
Raspberry Pi 3 | 951 | 266 % |
Raspberry Pi 3 Model B+ | 818 | 228 % |
Raspberry Pi 4 Model B | 358 | 100 % |
This shows that there were big improvements in the OpenCV dnn module. With OpenCV 4.1.2 the Raspberry Pi 3 and Raspberry Pi 3 Model B+ are now nearly twice as fast as with OpenCV 3.4.1, while the Raspberry Pi 2 is only about 30 % faster. I don't know why the improvement on the Raspberry Pi 2 is not as high as on the Raspberry Pi 3; maybe it is because my Raspberry Pi 2 is Rev 1.1 with a BCM2836 (Cortex-A7) while the Raspberry Pi 3 has a BCM2837 (Cortex-A53). Nevertheless, the Raspberry Pi 4 is still more than twice as fast as the Raspberry Pi 3 Model B+.
This is a screenshot of my test object:
The next network I tested is ENet (https://arxiv.org/abs/1606.02147). This is a network for image segmentation which is trained on the popular Cityscapes dataset (https://www.cityscapes-dataset.com/). It classifies every part of the image as road, sidewalk, terrain, person or one of 20 different classes in total. Such a network could be helpful for an autonomous robot: it tells it where the drivable surface is and supports path planning.
Although the network is open source, I had some problems getting hold of the trained network itself. Finally I was successful with the help of this blog post (https://www.pyimagesearch.com/2018/09/03/semantic-segmentation-with-opencv-and-deep-learning/): I downloaded the code and extracted the neural network (Caffe framework). This blog post also helped me to write my testing code, but it is in Python, so I ported it to C++. The resulting code is a mixture of the one from the blog post and my code from the GoogLeNet benchmark.
Here is the code:
#include <iostream>
#include <string>
#include <opencv2/dnn.hpp>
#include <opencv2/core/utils/trace.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/imgcodecs.hpp>
#include <thread>

// global variables for exchange between threads
cv::VideoCapture cap;  // create camera input
cv::Mat cameraImage;   // create opencv mat for camera

void cameraThread(void)  // function for the camera thread
{
    while(1)  // loop forever
    {
        cap >> cameraImage;  // copy camera input to opencv mat
    }
}

int main( int argc, char** argv )
{
    int ende=0;
    std::thread tcam;  // create thread object

    cv::Mat colorMap=cv::Mat(256,1,CV_8UC3);  // define enet color map
    colorMap.setTo(0);
    colorMap.at<cv::Vec3b>(0,0)=cv::Vec3b(  0,  0,  0);
    colorMap.at<cv::Vec3b>(1,0)=cv::Vec3b( 81,  0, 81);
    colorMap.at<cv::Vec3b>(2,0)=cv::Vec3b(244, 35,232);
    colorMap.at<cv::Vec3b>(3,0)=cv::Vec3b( 70, 70, 70);
    colorMap.at<cv::Vec3b>(4,0)=cv::Vec3b(102,102,156);
    colorMap.at<cv::Vec3b>(5,0)=cv::Vec3b(190,153,153);
    colorMap.at<cv::Vec3b>(6,0)=cv::Vec3b(153,153,153);
    colorMap.at<cv::Vec3b>(7,0)=cv::Vec3b(250,170, 30);
    colorMap.at<cv::Vec3b>(8,0)=cv::Vec3b(220,220,  0);
    colorMap.at<cv::Vec3b>(9,0)=cv::Vec3b(107,142, 35);
    colorMap.at<cv::Vec3b>(10,0)=cv::Vec3b(152,251,152);
    colorMap.at<cv::Vec3b>(11,0)=cv::Vec3b( 70,130,180);
    colorMap.at<cv::Vec3b>(12,0)=cv::Vec3b(220, 20, 60);
    colorMap.at<cv::Vec3b>(13,0)=cv::Vec3b(  0,  0,142);
    colorMap.at<cv::Vec3b>(14,0)=cv::Vec3b(  0,  0, 70);
    colorMap.at<cv::Vec3b>(15,0)=cv::Vec3b(  0, 60,100);
    colorMap.at<cv::Vec3b>(16,0)=cv::Vec3b(  0, 80,100);
    colorMap.at<cv::Vec3b>(17,0)=cv::Vec3b(  0,  0,230);
    colorMap.at<cv::Vec3b>(18,0)=cv::Vec3b(119, 11, 32);
    colorMap.at<cv::Vec3b>(19,0)=cv::Vec3b(255,255,255);

    std::cout << "OpenCV version : " << CV_VERSION << std::endl;  // print opencv version for debug

    std::string model = "enet-model.net";  // define filenames for neural network
    std::string proto = "";

    cv::dnn::Net net = cv::dnn::readNet(model, proto);  // open net
    if (net.empty())
    {
        std::cerr << "Can't load network by using the following files: " << std::endl;
        std::cerr << "proto: " << proto << std::endl;
        std::cerr << "model: " << model << std::endl;
        return -1;
    }

    // uncomment if camera is used
    /*
    cap.open(0);  // open camera
    if(!cap.isOpened())
    {
        std::cout << "no camera found!" << std::endl;
        return -1;
    }
    cap >> cameraImage;           // copy camera input to opencv mat to get data at startup
    tcam=std::thread(cameraThread);  // start extra thread to get camera input
    */

    // use a static image instead of the camera
    cameraImage = cv::imread("test.jpg");
    cv::resize(cameraImage, cameraImage, cv::Size(), 0.2, 0.2, cv::INTER_LINEAR);  // resize to useful size

    std::cout << "starting ..." << std::endl;

    while(ende==0)
    {
        cv::Mat image;
        cameraImage.copyTo(image);  // copy camera image to have a local copy for modifications

        cv::Mat inputBlob = cv::dnn::blobFromImage(image, 1.0f/255.0, cv::Size(512, 512), cv::Scalar(0), true);  // convert Mat to batch of images

        cv::TickMeter t;
        net.setInput(inputBlob);           // set the network input
        t.start();                         // start timer
        cv::Mat result=net.forward();      // compute output
        t.stop();                          // stop timer

        std::cout << "Out shape:" << result.size[0] << " x " << result.size[1] << " x " << result.size[2] << " x " << result.size[3] << "\n";  // print size of output shape
        std::cout << "Time: " << (double)t.getTimeMilli() / t.getCounter() << " ms" << std::endl;  // print result of timer

        cv::Mat outSmall(result.size[2], result.size[3], CV_8UC1);  // generate Mat for output image
        outSmall.setTo(0);  // reset image to 0

        for(int i=0;i<result.size[2];i++)  // go through all rows
        {
            for(int j=0;j<result.size[3];j++)  // go through all cols
            {
                float maxv=-9999;  // set standard value
                for(int k=0;k<result.size[1];k++)  // go through all channels
                {
                    float maxa=(float) *(result.ptr<float>(0,k,i)+j);
                    if(maxa>maxv)  // check if result of this channel is higher
                    {
                        maxv=maxa;
                        outSmall.at<uchar>(i,j)=(uchar)k;  // set highest channel at this pixel
                    }
                }
            }
        }

        cv::Mat combined;
        cv::Mat outColor;
        cv::applyColorMap(outSmall, outColor, colorMap);             // generate colored output with colormap
        cv::resize(outColor, outColor, image.size());                // resize output to input size
        cv::addWeighted(image, 0.5, outColor, 0.5, 0.0, combined);   // generate blended image

        cv::imshow("out", outColor);        // show image
        cv::imshow("combined", combined);   // show image
        //cv::imshow("image", image);       // show image

        char key=cv::waitKey(1);  // check if end
        if(key=='e') ende=1;
    }

    return 0;
}
The code is not very advanced, but it does the benchmarking job.
The previous code used live input from the Raspberry Pi camera, but for this benchmark I switched to a static image I took outside. It was taken in a park and shows a street, some trees and some people. For the benchmark itself it would have been too much effort to carry the setup to a suitable place every time.
This screenshot shows the output of the network blended with the image.
And this is the raw output of the network:
Now to the results:
The code of the blog post uses an input resolution of 1024 x 512 pixels for the network. This led to a processing time of 2008 ms, which is a bit too long for robots. So I reduced the resolution to 512 x 512 pixels, which gave me 893 ms; still a bit long, but one can work with it. Further reduction of the resolution led to unusable output of the network, so that is not recommended.
Input resolution (Raspberry Pi 4 Model B, OpenCV 4.1.2) | time (ms) |
---|---|
512 x 512 pixels | 893 |
1024 x 512 pixels | 2008 |
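For reference, the input resolution is simply the size passed to blobFromImage in the code above. The 1024 x 512 variant used by the blog post would look like this (only this line changes, everything else stays the same):

```cpp
// 1024 x 512 input as in the original blog post code (my benchmark uses 512 x 512)
cv::Mat inputBlob = cv::dnn::blobFromImage(image, 1.0f/255.0, cv::Size(1024, 512), cv::Scalar(0), true);
```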
I also tested this network on different Raspberry Pi generations with a resolution of 512 x 512 pixels:
Board with OpenCV 4.1.2 | time (ms) | compared to Raspberry Pi 4 |
---|---|---|
Raspberry Pi 2 Rev 1.1 | 3415 | 382 % |
Raspberry Pi 3 | 2059 | 231 % |
Raspberry Pi 3 Model B+ | 1859 | 208 % |
Raspberry Pi 4 Model B | 893 | 100 % |
In this test the Raspberry Pi 4 is also more than twice as fast as its predecessor, the Raspberry Pi 3 Model B+, although the advantage is not as big as in the previous test, especially compared to the Raspberry Pi 2.
During network execution I measured the following temperatures (the internal temperature was read with the command "vcgencmd measure_temp", the external temperature was measured with a handheld pyrometer):
Network | internal CPU temperature (°C) | externally measured temperature (°C) | time (ms) | CPU load | ambient temperature (°C) |
---|---|---|---|---|---|
GoogLeNet just after start | 35 | 32 | 322 | 95 % | 19 |
GoogLeNet after 1 hour | 82 | 72 | 359 | 95 % | 19 |
ENet just after start | 35 | 33 | 891 | 75 % | 19 |
ENet after 1 hour | 69 | 62 | 889 | 75 % | 19 |
One interesting point is that GoogLeNet generates 95 % CPU load and ENet only 75 %. I don't know why this is the case and will have to investigate further. Accordingly, the temperature gets higher with GoogLeNet.
Nevertheless, the temperature seems to have no or only a small impact on the execution time (GoogLeNet after 1 hour). So from this point of view a heat spreader is not necessary.
Another interesting point is that the internal temperature is about 10 degrees higher than the externally measured one. This may be partly related to measurement errors, or the thermal resistance of the CPU's package and lid may be quite high, which would reduce the effect of a heat spreader.
However, I think a heat spreader is a good investment for the Raspberry Pi, especially when it is running at high CPU load for a long time. It helps to keep the temperatures low and reduces thermal stress on the whole device.
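As a side note, the internal temperature can also be read from within a test program instead of calling "vcgencmd measure_temp" externally. Here is a minimal sketch, assuming the standard Linux sysfs thermal interface that Raspbian exposes on the Raspberry Pi (the path may differ on other systems):

```cpp
#include <fstream>
#include <iostream>

int main()
{
    // SoC temperature in millidegrees Celsius (assumed sysfs path on Raspbian)
    std::ifstream f("/sys/class/thermal/thermal_zone0/temp");
    if (!f)
    {
        std::cerr << "could not open thermal zone" << std::endl;
        return -1;
    }
    int milliCelsius = 0;
    f >> milliCelsius;
    std::cout << "CPU temperature: " << milliCelsius / 1000.0 << " °C" << std::endl;
    return 0;
}
```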