CycleSafe - #4 - The Road Test and Object Detections with YOLOv8

9 Jun 2024

Video Recording

I recorded a bicycle ride from the tail camera's perspective around sunset on a nice spring evening, using the components and the app described in my previous blog, CycleSafe - #3 - The Preparations for the Road Test and the Data Collection. The published recording is 8 minutes long; I cut the other 12 minutes because no cars appear in them.

What is YOLOv8

YOLO v8 is a recent version of the popular YOLO (You Only Look Once) object detection algorithm, released by Ultralytics in January 2023. It introduces several improvements and new features over previous versions, making it a powerful tool for real-time object detection and instance segmentation tasks.

Key Features of YOLO v8

Instance Segmentation: In addition to object detection, YOLO v8 can perform instance segmentation, which means it can identify and segment individual objects within an image, providing pixel-level masks for each instance.
Improved Accuracy: YOLO v8 incorporates new techniques and architectural changes that enhance its accuracy in detecting and localizing objects, especially small objects, compared to previous versions.
Faster Inference Speed: YOLO v8 has been optimized for faster inference, making it suitable for real-time applications that require high processing speeds, such as surveillance systems and autonomous vehicles.
New Loss Function: YOLO v8 uses an anchor-free detection head and combines Distribution Focal Loss (DFL) and CIoU loss for bounding-box regression with binary cross-entropy for classification, rather than the classic "focal loss"; the DFL term helps the model localize hard-to-detect objects more precisely.
Input Resolution: YOLO v8 processes images at 640x640 pixels by default, and the input size can be raised (the imgsz setting) to help detect smaller objects at the cost of slower inference.
Training-Time Augmentation: The term "trainable bag-of-freebies" comes from YOLOv7 rather than YOLO v8, but the YOLO v8 training pipeline likewise relies on data augmentation (for example, mosaic) and regularization techniques to improve performance without adding inference cost.

I selected it for its accuracy in real-time object detection and its built-in ability to classify cars, trucks, and bicycles.
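For reference, the pretrained YOLOv8 detection weights are trained on the COCO dataset, and the road users relevant here have the class IDs listed below. This is only a small illustrative sketch: the `VEHICLE_CLASSES` dictionary and the `is_vehicle` helper are names made up for this example and are not part of the Ultralytics API.

```python
# COCO class IDs for the road users relevant to this project.
# The dictionary and helper names are hypothetical, for illustration only.
VEHICLE_CLASSES = {
    1: "bicycle",
    2: "car",
    3: "motorcycle",
    5: "bus",
    7: "truck",
}


def is_vehicle(class_id: int) -> bool:
    """Return True if the COCO class ID is one of the road users above."""
    return class_id in VEHICLE_CLASSES


# The IDs as a sorted list, the form accepted by the `classes` argument of predict()
vehicle_ids = sorted(VEHICLE_CLASSES)
print(vehicle_ids)
```

Passing a list like this as `classes` restricts the detections to those categories; the script in the Video Processing section uses the subset 2, 3, 5, 7 and therefore leaves bicycles out.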

YOLOv8 comes in several model sizes. The smallest one is nano (YOLOv8n), which is suitable for constrained devices with limited computing power.

Video Processing

I wrote a Python app that uses OpenCV and the Ultralytics library with the YOLOv8 nano model to process the recorded video.

import cv2
import datetime
from ultralytics import YOLO

# Load the YOLOv8n model
model = YOLO('yolov8n.pt')

model.info()

def get_current_datetime_string():
    """
    Returns the current date and time as a string in the format "YYYYMMDDHHMMSS".
    """
    now = datetime.datetime.now()
    return now.strftime("%Y%m%d%H%M%S")

input_video_path = 'tail-video.mp4'

cap = cv2.VideoCapture(input_video_path)

if not cap.isOpened():
    # Stop with a non-zero exit status if the video cannot be opened
    raise SystemExit("Error: Could not open video source.")

# Frame rate (frames per second); also used as the sampling interval below
fps = 30
 
# Width and height of the frames in the video stream

size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
videoWriter = cv2.VideoWriter('/var/lib/tail/tail-detection'+get_current_datetime_string()+'.mp4', 
    fourcc, fps, size)
 
frame_index = 0
# Loop until there are no more frames
while True:
    success, frame = cap.read()
    if not success:
        # End of the video (or a read error): stop instead of passing an empty frame to the model
        break
    # Process only one frame per second of the source video
    if frame_index % fps == 0:
        # Perform object detection https://docs.ultralytics.com/modes/predict/#inference-sources
        # COCO class IDs: 2 = car, 3 = motorcycle, 5 = bus, 7 = truck
        results = model.predict(frame, classes=[2, 3, 5, 7], conf=0.25)

        # Visualize the results on the frame
        annotated_frame = results[0].plot()
        videoWriter.write(annotated_frame)
    frame_index += 1

cap.release()
videoWriter.release()
cv2.destroyAllWindows()

The recording was processed with the YOLO v8 nano model. Processing took almost 2 seconds per frame on the Raspberry Pi 4 (RPi4).

It also used only one of the RPi4's four CPU cores.
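A per-frame figure like this can be measured with a simple wall-clock timer around the detection call. The sketch below shows one possible way to do it and is not the measurement code used for this post; `average_seconds_per_call` and `fake_detect` are hypothetical names, and `fake_detect` is only a stand-in for `model.predict(frame)` so that the example runs without the model.

```python
import time


def average_seconds_per_call(func, items):
    """Call func on each item and return the mean wall-clock time per call."""
    start = time.perf_counter()
    calls = 0
    for item in items:
        func(item)
        calls += 1
    elapsed = time.perf_counter() - start
    return elapsed / calls if calls else 0.0


def fake_detect(frame):
    """Hypothetical stand-in for model.predict(frame); it only burns a little CPU."""
    return sum(i * i for i in range(10_000))


avg = average_seconds_per_call(fake_detect, range(5))
print(f"{avg:.4f} s per frame")
```

In the real script, the stand-in would be replaced by the detection call and the items by the sampled frames.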

I decided to process only one frame per second of the original recording, so the result plays back like a fast-forwarded movie at about 30x speed. I've uploaded the video to YouTube, but its resolution there is not great.
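The arithmetic behind the 30x figure can be sketched as follows. It assumes that the 8-minute cut was the input, that the source was recorded at 30 fps (the value hard-coded in the script), and that inference takes roughly 2 seconds per frame; the constant names are made up for this example.

```python
SOURCE_FPS = 30          # assumed frame rate of the recording (matches fps in the script)
VIDEO_SECONDS = 8 * 60   # length of the published recording
SECONDS_PER_FRAME = 2.0  # approximate YOLOv8 nano inference time on the RPi4

total_frames = SOURCE_FPS * VIDEO_SECONDS             # frames in the source video
sampled_frames = total_frames // SOURCE_FPS           # one frame kept per second
processing_seconds = sampled_frames * SECONDS_PER_FRAME
output_seconds = sampled_frames / SOURCE_FPS          # output is written at 30 fps
speedup = VIDEO_SECONDS / output_seconds

print(total_frames, sampled_frames, processing_seconds, output_seconds, speedup)
```

Under these assumptions, 480 of the 14,400 frames are processed, which takes about 16 minutes on the RPi4 and yields a 16-second clip.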

Some math

In my estimation, a cyclist needs at least 2 seconds to react to a potential danger. A car travelling at 100 km/h covers ~28 m per second. My app needs to process at least two frames to measure the distance and speed of the car relative to the cyclist, which takes ~4 seconds with YOLOv8 nano without further optimization. So it needs to detect the car 6 seconds in advance, which corresponds to a distance of 168 meters. That is not realistic with my current setup. If the car approaches at 50 km/h, the detection distance drops to 84 meters, which is more achievable, but limiting the solution to slower cars reduces its usefulness.
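The same estimate can be written as a short calculation. This is a minimal sketch under the assumptions stated above (2 seconds of reaction time, about 4 seconds to process two frames); like the text, it uses the car's absolute speed and ignores the cyclist's own speed. The function name is hypothetical.

```python
def required_detection_distance(car_speed_kmh, reaction_s=2.0, processing_s=4.0):
    """Distance in meters a car covers during processing plus the cyclist's reaction time."""
    speed_ms = car_speed_kmh / 3.6  # km/h -> m/s
    return speed_ms * (reaction_s + processing_s)


for speed in (100, 50):
    print(f"{speed} km/h -> {required_detection_distance(speed):.0f} m")
```

Without rounding the speed up to 28 m/s first, this gives about 167 m at 100 km/h and about 83 m at 50 km/h, in line with the figures above.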

Alternatives

Use a higher camera resolution so the app can detect cars at a longer distance, although this may produce many false positives, which would make the solution useless.

Find a way to use more CPU cores for video processing.

Use a different algorithm (FOMO, other versions of YOLO).

Crop the frame to reduce the processing time.
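As an illustration of the cropping idea, the sketch below converts fractional bounds into pixel coordinates for a region of interest. Which part of the frame to keep is an open question; the 1280x720 size and the 25% to 75% band are hypothetical values, and `crop_box` is a name made up for this example.

```python
def crop_box(width, height, top=0.0, bottom=1.0, left=0.0, right=1.0):
    """Convert fractional bounds into pixel coordinates (x0, y0, x1, y1)."""
    if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
        raise ValueError("bounds must satisfy 0 <= start < end <= 1")
    return (int(width * left), int(height * top),
            int(width * right), int(height * bottom))


# Example: keep the middle half of the height of a 1280x720 frame (hypothetical values)
x0, y0, x1, y1 = crop_box(1280, 720, top=0.25, bottom=0.75)
print(x0, y0, x1, y1)
# With an OpenCV frame (a NumPy array) the crop would be: frame[y0:y1, x0:x1]
```

Whether a smaller input actually shortens inference on the RPi4 would still need to be measured.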

DAB over 1 year ago

Nice update.