RoadTest the Raspberry Pi 4 Model B (2GB) - Review

Table of contents

RoadTest: RoadTest the Raspberry Pi 4 Model B (2GB)

Author: bernhardmayer

Creation date:

Evaluation Type: Development Boards & Tools

Did you receive all parts the manufacturer stated would be included in the package?: True

What other parts do you consider comparable to this product?: BeagleBone AI, Nvidia Jetson Nano

What were the biggest problems encountered?: Getting the right power and display adapters ;-)

Detailed Review:

Hint

 

This is a work in progress. I am putting together my test results and findings over the next hours and days...

 

First impressions

These are my first impressions just after unpacking and before powering up:

  • Box: The Raspberry Pi 4 comes in a simple paper box. It was just lying in there with no ESD shielding bag. I think this is a little bit critical, but it was already the same with the Raspberry Pi 3.
  • Power input and HDMI: The power input changed from Micro USB to USB C, and HDMI changed from Type A (standard) to Type D (micro). So you have to get new adapters.
  • Ethernet position changed: The Ethernet connector is now on the right.
  • Raspberry logo on the WiFi shielding: There is no longer a logo on the WiFi shielding.

 

The mechanical changes are critical for some industrial applications where wires or case cutouts fit the old layout exactly. But I think this is acceptable given the big performance improvements of the Raspberry Pi 4. Additionally, the previous versions of the Raspberry Pi will remain available for a long time.

 

Introduction

I don't think there is much to say about the Raspberry Pi which hasn't already been said. Most of you know the basic function, the operating system and its capabilities. The new things are that the Raspberry Pi 4 now has two HDMI outputs, real Gigabit Ethernet and USB 3.0.

 

I am going to test how suitable it is for robotics, especially autonomous driving robots. These robots have specific requirements.

 

These coming robots will rely heavily on optical sensors, i.e. cameras. There are also robots which drive around using only ultrasonic sensors or lidars, but these sensors give you, in the best case, only a 2D scan with the distances to all obstacles around the robot at a defined height. They give you no information on the type of obstacle (solid wall, approaching car, blade of grass which can simply be pushed away) or whether there is anything above or below the sensor's limited field of view. Additionally, these sensors are expensive, even more so if you need several of them to cover the whole size of the robot. Their only advantage is that they don't require much computing power.

When you use cameras on your robot the whole system gets much cheaper. A camera doesn't give you discrete distance information for obstacles, but it gives you much more information on the type of obstacle with a much wider field of view. So one camera (maybe with a fish-eye lens) can cover the whole area in front of the robot, and from the size and position of an obstacle in the image you can triangulate its position. Additionally, with a camera you can read signs and markings on the ground or determine the type of ground (street, grass, ...). The downside of this approach is that it requires image processing, and image processing is complicated. The progress in artificial intelligence and neural networks simplifies the image processing a little bit, but it still requires high processing power.
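As a rough illustration of the triangulation idea (a sketch with made-up numbers, not part of my test setup): with a simple pinhole camera model, an obstacle of known real-world height that appears h pixels tall in the image is approximately at distance d = f * H / h, where f is the focal length in pixels.

#include <iostream>

// Pinhole-camera distance estimate: an object of known real height H (in meters)
// that appears h pixels tall in the image is roughly at distance d = f * H / h,
// where f is the focal length expressed in pixels.
double estimateDistance(double focalLengthPx, double realHeightM, double pixelHeight)
{
    return focalLengthPx * realHeightM / pixelHeight;
}

int main()
{
    // example numbers (assumptions for illustration only):
    // focal length of 600 px, a 1.7 m tall person appearing 150 px tall in the image
    std::cout << estimateDistance(600.0, 1.7, 150.0) << " m" << std::endl;    // about 6.8 m
    return 0;
}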

 

This leads to two competing requirements for the robot:

First of all, the power consumption needs to be low. If your system needs hundreds of watts you have to carry big batteries, which drives up cost and limits your payload. The second requirement is that the system needs enough computing power to do the image processing. A good starting point is to process two to ten images per second, so that the robot can drive at a reasonable speed and still react to obstacles fast enough.

 

In this RoadTest we will see how well the Raspberry Pi 4 fits these requirements.

 

Similar boards

Similar boards which also target AI and autonomous robots are the Nvidia Jetson Nano (https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/ ) and the BeagleBone AI (BeagleBoard.org - AI ) but both have a higher price tag.

 

Tests

Neural networks

OpenCV 4.1.2

 

I already did a similar test with Raspberry Pi 3 Model B+ last year: Deep Neural Network Benchmark with Raspberry Pi 2, 3 and 3+

Back then I took the pre-trained GoogLeNet network for image classification and checked how fast it executed on the Raspberry Pi, using OpenCV as the library. Then it was version 3.4.1; now it is version 4.1.2. To get the most up-to-date version on the Raspberry Pi you have to compile it yourself on the system. For the installation follow these instructions: https://www.pyimagesearch.com/2019/09/16/install-opencv-4-on-raspberry-pi-4-and-raspbian-buster/  On my system parallel compiling on the four cores of the Raspberry Pi didn't work, so I compiled it using only a single thread (plain make instead of make -j4). This leads us to the first benchmark:

 

task | duration
compiling OpenCV 3.4.1 on Raspberry Pi 3 Model B+ | about 5 hours
compiling OpenCV 4.1.2 on Raspberry Pi 4 Model B | about 2 hours

 

So the first point goes to the Raspberry Pi 4.

 

GoogLeNet

Now to the processing of the neural network. I posted my test program in last year's test (linked above) and the program still runs with the new installation.
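For readers who don't want to dig through the old post, the timing part boils down to something like the following sketch. The file names and preprocessing parameters are assumptions based on the publicly available bvlc_googlenet Caffe files; the program actually used for the numbers below is the one from the linked post.

#include <iostream>
#include <opencv2/dnn.hpp>
#include <opencv2/opencv.hpp>

int main()
{
    // load the pre-trained GoogLeNet Caffe model (file names are assumptions)
    cv::dnn::Net net = cv::dnn::readNetFromCaffe("bvlc_googlenet.prototxt",
                                                 "bvlc_googlenet.caffemodel");

    cv::Mat image = cv::imread("test.jpg");    // test image

    // GoogLeNet expects a 224 x 224 input with the ImageNet mean subtracted
    cv::Mat blob = cv::dnn::blobFromImage(image, 1.0, cv::Size(224, 224),
                                          cv::Scalar(104, 117, 123));

    net.setInput(blob);
    cv::TickMeter t;
    t.start();
    cv::Mat prob = net.forward();    // run the network once
    t.stop();

    std::cout << "Time: " << t.getTimeMilli() << " ms" << std::endl;
    return 0;
}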

Last year I had the following results with OpenCV 3.4.1:

Board with OpenCV 3.4.1 | time (ms)
Raspberry Pi 2 Rev 1.1 | 2635
Raspberry Pi 3 | 1804
Raspberry Pi 3 Model B+ | 1548

 

Now I have the following results with OpenCV 4.1.2:

Board with OpenCV 4.1.2 | time (ms) | compared to Raspberry Pi 4
Raspberry Pi 2 Rev 1.1 | 2054 | 574 %
Raspberry Pi 3 | 951 | 266 %
Raspberry Pi 3 Model B+ | 818 | 228 %
Raspberry Pi 4 Model B | 358 | 100 %

 

This shows that there were big improvements in the OpenCV dnn module. With OpenCV 4.1.2 the Raspberry Pi 3 and Raspberry Pi 3 Model B+ are now nearly twice as fast as with OpenCV 3.4.1; the Raspberry Pi 2 is only about 30 % faster. I don't know why the improvement on the Raspberry Pi 2 is not as high as on the Raspberry Pi 3. Maybe it is because my Raspberry Pi 2 is Rev 1.1 with a BCM2836 (Cortex-A7) while the Raspberry Pi 3 has a BCM2837 (Cortex-A53). Nevertheless, the Raspberry Pi 4 is still more than twice as fast as the Raspberry Pi 3 Model B+.

 

This is a screenshot of my test object:

image

 

ENet

The next network tested is ENet (https://arxiv.org/abs/1606.02147 ). This is a network for image segmentation, trained on the popular Cityscapes dataset (https://www.cityscapes-dataset.com/ ). It takes every part of the image and tells you whether it is road, sidewalk, terrain, a person or something else; in total it knows 20 different classes. Such a network could be helpful for an autonomous robot, telling it where the drivable surface is and supporting path planning.

 

Although the network is open source, I had some problems getting hold of the actual trained network. Finally I was successful with the help of this blog post (https://www.pyimagesearch.com/2018/09/03/semantic-segmentation-with-opencv-and-deep-learning/ ), downloaded the code and extracted the neural network (caffe framework). This blog post also helped me to write my testing code, but it is in Python, so I ported it to C++. The resulting code is therefore a mixture of the one from the blog post and my code from the GoogLeNet benchmark.

 

Here is the code:

 

 

#include <iostream>  
#include <string>  
#include <opencv2/dnn.hpp>  
#include <opencv2/core/utils/trace.hpp>  
#include <opencv2/opencv.hpp> 
#include <opencv2/imgcodecs.hpp>
#include <thread>  
      
// global variables for exchange between threads  
cv::VideoCapture cap;    // create camera input  
cv::Mat cameraImage;  // create opencv mat for camera  

void cameraThread(void)    // function for the camera thread  
{  
    while(1)    // loop forever  
    {  
        cap >> cameraImage;    // copy camera input to opencv mat  
    }  
}  
      
int main( int argc, char** argv )  
{  
    int ende=0;  
    std::thread tcam;    // create thread pointer  
     
    cv::Mat colorMap=cv::Mat(256,1,CV_8UC3);  // define enet color map
    colorMap.setTo(0);
    colorMap.at<cv::Vec3b>(0,0)=cv::Vec3b(  0,  0,  0);
    colorMap.at<cv::Vec3b>(1,0)=cv::Vec3b( 81,  0, 81);
    colorMap.at<cv::Vec3b>(2,0)=cv::Vec3b(244, 35,232);
    colorMap.at<cv::Vec3b>(3,0)=cv::Vec3b( 70, 70, 70);
    colorMap.at<cv::Vec3b>(4,0)=cv::Vec3b(102,102,156);
    colorMap.at<cv::Vec3b>(5,0)=cv::Vec3b(190,153,153);
    colorMap.at<cv::Vec3b>(6,0)=cv::Vec3b(153,153,153);
    colorMap.at<cv::Vec3b>(7,0)=cv::Vec3b(250,170, 30);
    colorMap.at<cv::Vec3b>(8,0)=cv::Vec3b(220,220,  0);
    colorMap.at<cv::Vec3b>(9,0)=cv::Vec3b(107,142, 35);
    colorMap.at<cv::Vec3b>(10,0)=cv::Vec3b(152,251,152);
    colorMap.at<cv::Vec3b>(11,0)=cv::Vec3b( 70,130,180);
    colorMap.at<cv::Vec3b>(12,0)=cv::Vec3b(220, 20, 60);
    colorMap.at<cv::Vec3b>(13,0)=cv::Vec3b(  0,  0,142);
    colorMap.at<cv::Vec3b>(14,0)=cv::Vec3b(  0,  0, 70);
    colorMap.at<cv::Vec3b>(15,0)=cv::Vec3b(  0, 60,100);
    colorMap.at<cv::Vec3b>(16,0)=cv::Vec3b(  0, 80,100);
    colorMap.at<cv::Vec3b>(17,0)=cv::Vec3b(  0,  0,230);
    colorMap.at<cv::Vec3b>(18,0)=cv::Vec3b(119, 11, 32);
    colorMap.at<cv::Vec3b>(19,0)=cv::Vec3b(255,255,255);
    
    std::cout << "OpenCV version : " << CV_VERSION << std::endl;    // print opencv version for debug  
     
    std::string model = "enet-model.net";    // define filenames for neural network  
    std::string proto = "";  
      
    cv::dnn::Net net = cv::dnn::readNet(model, proto); // open net  
          
    if (net.empty())  
    {  
        std::cerr << "Can't load network by using the following files: " << std::endl;  
        std::cerr << "proto: " << proto << std::endl;  
        std::cerr << "model: " << model << std::endl;  
        return -1;  
    }

// uncomment if camera is used
/*    cap.open(0);        // open camera  
    if(!cap.isOpened())   
    {  
        std::cout << "no camera found!" << std::endl;  
        return -1;  
    }  
    cap >> cameraImage;    // copy camera input to opencv mat to get data to startup  
    tcam=std::thread(cameraThread);    // start extra thread to get camera input  
*/      
// used static image instead of camera    
    cameraImage= cv::imread("test.jpg");
    cv::resize(cameraImage, cameraImage, cv::Size(), 0.2, 0.2, cv::INTER_LINEAR);  // resize to useful size
    std::cout << "starting ..." << std::endl;  
      
    while(ende==0)  
    {  
        cv::Mat image;  
        cameraImage.copyTo(image);    // copy camera image to have local copy for modifications  
        cv::Mat inputBlob = cv::dnn::blobFromImage(image, 1.0f/255.0, cv::Size(512, 512),cv::Scalar(0), true);   //Convert Mat to batch of images  
        cv::TickMeter t;  
        net.setInput(inputBlob); //set the network input  
        t.start();  // start timer  
        cv::Mat result=net.forward();    // compute output  
        t.stop();  // stop timer  

 std::cout << "Out shape:" << result.size[0] << " x " << result.size[1] << " x " << result.size[2] << " x " << result.size[3] << "\n";  // print size of output shape
        std::cout << "Time: " << (double)t.getTimeMilli() / t.getCounter() << " ms" << std::endl;  // print result of timer  
        cv::Mat outSmall(result.size[2],result.size[3],CV_8UC1);  // genrate Mat for output image
        outSmall.setTo(0);  // reset image to 0

        for(int i=0;i<result.size[2];i++) // go through all rows
        {
            for(int j=0;j<result.size[3];j++)  // go through all cols
            {
                float maxv=-9999;  // set standard value
                for(int k=0;k<result.size[1];k++)  // go through all channels
                {
                    float maxa=(float) *(result.ptr<float>(0,k,i)+j);
                    if(maxa>maxv)  // check if result of this channel is higher
                    {
                        maxv=maxa;
                        outSmall.at<uchar>(i,j)=(uchar)k;  // set highest channel at this pixel
                    }
                }
            }
        }
        cv::Mat combined;
        cv::Mat outColor;
        cv::applyColorMap(outSmall,outColor,colorMap);  // generate colored output with colormap

        cv::resize(outColor,outColor,image.size());  //resize output to input size
        cv::addWeighted(image,0.5,outColor,0.5,0.0,combined);  // generate blended image
    
        cv::imshow("out",outColor);    // show image  
        cv::imshow("combined",combined);    // show image  
      //  cv::imshow("image",image);    // show image  
        char key=cv::waitKey(1);    // check if end  
        if(key=='e') ende=1;  
    }  
    return 0;  
}  


 

The code is not very advanced, but it will do the benchmarking job.

 

The former code used live input from the Raspberry Pi camera, but for this benchmark I switched to a static image I took outside. It was taken in a park and shows a street, some trees and some persons. Carrying the whole setup to a suitable place just for the benchmark would have been too much effort.

 

This screenshot shows the output of the network blended with the image.

image

And this is the raw output of the network:

image

Now to the results:

The code of the blog post uses an input resolution of 1024 x 512 pixels for the network. This led to a processing time of 2008 ms, which is a little too long for robots. So I reduced the resolution to 512 x 512 pixels (by changing the cv::Size passed to blobFromImage in the code above). This gave me 893 ms, which is still a little long (roughly one image per second) but workable. Further reduction of the resolution led to unusable output of the network, so that is not recommended.

 

Board with OpenCV 4.1.2 | time (ms)
Raspberry Pi 4 Model B with 512 x 512 pixel | 893
Raspberry Pi 4 Model B with 1024 x 512 pixel | 2008

 

I also tested this network on different Raspberry Pi generations with a resolution of 512 x 512 pixels:

 

Board with OpenCV 4.1.2 | time (ms) | compared to Raspberry Pi 4
Raspberry Pi 2 Rev 1.1 | 3415 | 382 %
Raspberry Pi 3 | 2059 | 231 %
Raspberry Pi 3 Model B+ | 1859 | 208 %
Raspberry Pi 4 Model B | 893 | 100 %

 

In this test the Raspberry Pi 4 is also more than twice as fast as its predecessor, the Raspberry Pi 3 Model B+, although the advantage is not as big as in the previous test, especially compared to the Raspberry Pi 2.

 

Temperatures

 

During network execution I measured the following temperatures (the internal temperature was read with the command "vcgencmd measure_temp", the external temperature was measured with a handheld pyrometer):

 

Network | internal CPU temperature (°C) | externally measured temperature (°C) | time (ms) | CPU load | ambient temperature (°C)
GoogLeNet just after start | 35 | 32 | 322 | 95 % | 19
GoogLeNet after 1 hour | 82 | 72 | 359 | 95 % | 19
ENet just after start | 35 | 33 | 891 | 75 % | 19
ENet after 1 hour | 69 | 62 | 889 | 75 % | 19
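If you want to log the internal temperature automatically during such a benchmark instead of calling "vcgencmd measure_temp" by hand, the same sensor can also be read from the Linux thermal sysfs interface. This is only a small sketch and assumes that /sys/class/thermal/thermal_zone0/temp is present, as it is on Raspbian:

#include <fstream>
#include <iostream>

// Read the SoC temperature from the Linux thermal sysfs interface.
// The file contains the temperature in millidegrees Celsius.
double readCpuTemperature()
{
    std::ifstream f("/sys/class/thermal/thermal_zone0/temp");
    double milliDegrees = 0.0;
    f >> milliDegrees;
    return milliDegrees / 1000.0;
}

int main()
{
    std::cout << "CPU temperature: " << readCpuTemperature() << " °C" << std::endl;
    return 0;
}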

 

One interesting point is that GoogLeNet generates 95 % CPU load and ENet only 75 %. I don't know why this is the case and will have to investigate further. Accordingly, the temperature with GoogLeNet gets higher.

Nevertheless, the temperature seems to have no or only a small (GoogLeNet after 1 hour) impact on the execution time. So from this point of view a heat spreader is not necessary.

Another interesting point is that the internal temperature is about 10 degrees higher than the externally measured one. This may partly be due to measurement errors, or the thermal resistance of the CPU's case and lid may be quite high, which would reduce the effect of a heat spreader.

 

However, I think a heat spreader is a good investment for the Raspberry Pi, especially when it is running at high CPU loads for a long time. It helps to keep the temperatures low and reduces temperature stress on the whole device.
