Video Recording
I recorded my bicycle ride from the tail camera's perspective near sunset on a nice spring evening, using the components and the app I described in my previous blog, CycleSafe - #3 - The Preparations for the Road Test and the Data Collection. The published recording is 8 minutes long; I cut the other 12 minutes because no cars appear in them.
What is YOLOv8
YOLO v8 is a recent version of the popular YOLO (You Only Look Once) object detection algorithm, released by Ultralytics in January 2023. It introduces several improvements and new features over previous versions, making it a powerful tool for real-time object detection and instance segmentation tasks.
Key Features of YOLO v8
- Instance Segmentation: In addition to object detection, YOLO v8 can perform instance segmentation, which means it can identify and segment individual objects within an image, providing pixel-level masks for each instance.
- Improved Accuracy: YOLO v8 incorporates new techniques and architectural changes that enhance its accuracy in detecting and localizing objects, especially small objects, compared to previous versions.
- Faster Inference Speed: YOLO v8 has been optimized for faster inference, making it suitable for real-time applications that require high processing speeds, such as surveillance systems and autonomous vehicles.
- Updated Loss Functions: YOLO v8 combines CIoU loss and Distribution Focal Loss (DFL) for bounding-box regression with binary cross-entropy for classification, rather than the classic "focal loss," which down-weights well-classified examples to focus on hard-to-detect objects.
- Input Resolution: YOLO v8 processes images at 640x640 pixels by default. The size is configurable through the imgsz parameter, and larger values can help with smaller objects at the cost of speed.
- Training-Time Augmentation: YOLO v8 is trained with data augmentation techniques such as mosaic, which is switched off for the final epochs of training. (The term "trainable bag-of-freebies" comes from the YOLOv7 paper rather than from YOLO v8.)
I selected it for its accuracy in real-time object detection and its built-in ability to classify cars, trucks, and bicycles.
YOLOv8 comes in several model sizes. The smallest one, nano, is suited to constrained devices with limited computing power.
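As a small illustration of that built-in class list: the pretrained YOLOv8 weights are trained on the COCO dataset, where each class has a fixed numeric ID. The sketch below hard-codes only the COCO IDs relevant to road traffic. The names `ROAD_CLASSES` and `class_ids_for` are hypothetical helpers for this post; with the Ultralytics package installed, the full mapping can be read from `model.names` instead.

```python
# COCO class IDs used by the pretrained YOLOv8 weights (traffic-related subset only).
ROAD_CLASSES = {
    "bicycle": 1,
    "car": 2,
    "motorcycle": 3,
    "bus": 5,
    "truck": 7,
}


def class_ids_for(names):
    """Return the sorted COCO class IDs for the given class names (hypothetical helper)."""
    return sorted(ROAD_CLASSES[name] for name in names)


# IDs that could be passed as the `classes` argument of model.predict()
vehicle_ids = class_ids_for(["car", "motorcycle", "bus", "truck"])
print(vehicle_ids)  # [2, 3, 5, 7]
```

Restricting detection to a few class IDs keeps unrelated objects out of the results.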
Video Processing
I wrote a Python app using OpenCV and the Ultralytics library with the YOLOv8 nano model to process the recorded video.
    import datetime
    import sys

    import cv2
    from ultralytics import YOLO

    # Load the YOLOv8 nano model
    model = YOLO('yolov8n.pt')
    model.info()


    def get_current_datetime_string():
        """Returns the current date and time as a string in the format "YYYYMMDDHHMMSS"."""
        now = datetime.datetime.now()
        return now.strftime("%Y%m%d%H%M%S")


    input_video_path = 'tail-video.mp4'
    cap = cv2.VideoCapture(input_video_path)
    if not cap.isOpened():
        print("Error: Could not open video source.")
        sys.exit(1)

    # Frame rate (frames per second)
    fps = 30
    # Width and height of the frames in the video stream
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    videoWriter = cv2.VideoWriter(
        '/var/lib/tail/tail-detection' + get_current_datetime_string() + '.mp4',
        fourcc, fps, size)

    count = 0
    # Loop until there are no more frames
    while True:
        success, frame = cap.read()
        if not success:
            print("No more frames to read from the video source.")
            break
        # Process only one frame out of every `fps` frames (one per second of footage)
        if count % fps == 0:
            # Perform object detection https://docs.ultralytics.com/modes/predict/#inference-sources
            # Class IDs for car, motorcycle, bus, truck
            results = model.predict(frame, classes=[2, 3, 5, 7], conf=0.25)
            # Visualize the results on the frame
            annotated_frame = results[0].plot()
            videoWriter.write(annotated_frame)
        count += 1

    cap.release()
    videoWriter.release()
    cv2.destroyAllWindows()
The recording was processed with the YOLO v8 nano model. Processing took almost 2 seconds per frame on the RPi4, and it used only 1 of its 4 CPU cores.
I decided to process only 1 frame per second of the original recording, so the result plays back like a fast replay at 30x speed. I've uploaded the video to YouTube, but its resolution there is not great.
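To put rough numbers on this sampling, here is a back-of-the-envelope sketch. It assumes the figures mentioned in this post (an 8-minute recording at 30 fps and about 2 seconds of inference per frame) and assumes the output is written at the same frame rate as the source. The function name `sampling_plan` is made up for this illustration.

```python
def sampling_plan(duration_s, fps, seconds_per_inference):
    """Estimate the workload when one frame per second of footage is processed.

    duration_s and fps must be integers; seconds_per_inference is the
    measured (here: assumed) inference time for a single frame.
    """
    total_frames = duration_s * fps
    # One processed frame for every `fps` source frames
    processed = len(range(0, total_frames, fps))
    # The output video is assumed to be written at the same fps as the source
    output_duration_s = processed / fps
    return {
        "processed_frames": processed,
        "processing_time_s": processed * seconds_per_inference,
        "output_duration_s": output_duration_s,
        "replay_speedup": duration_s / output_duration_s,
    }


plan = sampling_plan(duration_s=8 * 60, fps=30, seconds_per_inference=2)
print(plan)
```

Under these assumptions, 480 of the 14,400 frames are processed, which takes roughly 16 minutes of inference and yields a 16-second clip at 30x speed.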
Some math
In my estimation, a cyclist needs at least 2 seconds to react to a potential danger. A car travelling at 100 km/h covers ~28 m per second. My app needs to process at least two frames to measure the distance and speed of the car relative to the cyclist, which takes ~4 seconds (2 frames at ~2 seconds each) with YOLOv8 nano without further optimization. So it needs to detect the car 6 seconds in advance, which corresponds to a distance of 168 meters. That is not realistic with my current setup. If a car approaches at 50 km/h, the required detection distance drops to 84 meters, which is more realistic, but limiting the solution to slower cars reduces its usefulness.
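The same estimate can be written as a small calculation. This is only a sketch of the arithmetic above, and the function name and default values are my assumptions taken from the text (2 seconds of reaction time, 2 frames, ~2 seconds per frame). Because the text rounds the speed to 28 m/s, the unrounded results come out about a meter lower.

```python
def detection_distance_m(car_speed_kmh, reaction_time_s=2.0,
                         frames_needed=2, seconds_per_frame=2.0):
    """Distance a car covers while the app processes frames and the cyclist reacts."""
    speed_ms = car_speed_kmh * 1000 / 3600  # km/h -> m/s
    total_time_s = reaction_time_s + frames_needed * seconds_per_frame
    return speed_ms * total_time_s


for kmh in (100, 50):
    print(f"{kmh} km/h -> {detection_distance_m(kmh):.0f} m")
# 100 km/h -> 167 m
# 50 km/h -> 83 m
```

Lowering `seconds_per_frame` shows how much a faster model would shrink the required distance. At 0.5 seconds per frame, for example, the total time falls from 6 to 3 seconds and the distance halves.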
Alternatives
- Use a higher camera resolution so the app can detect cars at a longer distance, although this may produce many false positives, which would make the solution useless.
- Find a way to use more CPU cores for video processing.
- Use a different algorithm (FOMO, other versions of YOLO).
- Crop the frame to reduce the processing time.
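As an illustration of the cropping idea, the sketch below computes a crop rectangle that keeps only a horizontal band of the frame, where approaching cars are likely to appear. The function name, the fractions, and the 1920x1080 frame size are all assumptions for the example and not measurements from my setup.

```python
def crop_box(width, height, keep_width=1.0, keep_height=0.5, top_offset=0.25):
    """Return (x0, y0, x1, y1) for a horizontally centered crop region.

    keep_width / keep_height are the fractions of the frame to keep;
    top_offset is the fraction of the height skipped from the top.
    """
    x0 = int(width * (1 - keep_width) / 2)
    x1 = x0 + int(width * keep_width)
    y0 = int(height * top_offset)
    y1 = min(height, y0 + int(height * keep_height))
    return x0, y0, x1, y1


x0, y0, x1, y1 = crop_box(1920, 1080)  # assumed frame size
kept_fraction = ((x1 - x0) * (y1 - y0)) / (1920 * 1080)
print((x0, y0, x1, y1), kept_fraction)  # (0, 270, 1920, 810) 0.5

# With an OpenCV frame (a NumPy array) the crop itself would be:
# cropped = frame[y0:y1, x0:x1]
```

How much time this actually saves depends on how the model resizes its input, so it would need to be measured on the device.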