Color Classification on ESP32Cam

14 Apr 2020

Introduction

In machine vision systems, image processing techniques can be used for color measurement, color matching, and color-based recognition.

For decades, companies have been using vision systems for inspection purposes when their products are of high value or have potential liabilities if associated with a defect. Now vision inspection is becoming more affordable for companies that are interested in producing quality products, even if they are neither of high value nor pose a potential risk if defective. Examples include hardware, such as nails and screws, corrugated cardboard and plastic containers.

The essentials of machine vision are summarized below as a quick primer.

Line Scan vs Area Scan

The type of camera an end-user needs, whether it is a line scan or area scan, will depend on the application. Most people understand the technology behind an area scan camera, which is akin to a high-resolution digital camera. Area scan cameras produce 2D images with horizontal and vertical elements.

Line scan cameras are more complex than their area scan counterparts. They work by looking at one line at a time and build a 2D image by moving the object relative to the camera while continually grabbing one-line slices. If the object cannot be contained in a practical-size field of view, a line scan camera is worth considering. The most typical use of line scan cameras is in applications that inspect a “web,” such as large rolls of paper, fabric, aluminum, steel, or sheets of glass.

Monochrome vs Color

Monochrome cameras are usually the better choice if the application does not require color analysis. Because they don’t need a color filter, they are more sensitive than color cameras and deliver more detailed images. Most color machine vision cameras use the Bayer matrix to capture color data. Each pixel has a color filter: half of them are green, a quarter are red, and a quarter are blue. The debayering algorithm uses the information from adjoining pixels to determine the full color of each pixel. A 2×2 debayering reads the information from three adjoining pixels, while a 5×5 debayering reads the information from 24 adjoining pixels. So if you need a color camera, the bigger the debayering number, the better.
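As a rough illustration of how debayering fills in a missing channel from adjoining pixels, the sketch below estimates green at a red or blue site by averaging its direct neighbours (simple bilinear interpolation). The RGGB layout, the `BayerImage` type and the function names are assumptions made for this example; real cameras use larger and smarter kernels.

```cpp
#include <cstdint>
#include <vector>

// Raw Bayer mosaic, one sample per pixel. Assumed RGGB layout:
//   row 0: R G R G ...
//   row 1: G B G B ...
struct BayerImage {
    int width;
    int height;
    std::vector<uint8_t> raw;  // width * height samples, row-major

    uint8_t at(int x, int y) const { return raw[y * width + x]; }
};

// In an RGGB mosaic the green samples sit where x + y is odd.
inline bool is_green_site(int x, int y) { return ((x + y) % 2) == 1; }

// Estimate green at (x, y). Green sites return their own sample; red and blue
// sites average the in-bounds left/right/up/down neighbours, which are all green.
inline uint8_t green_at(const BayerImage &img, int x, int y) {
    if (is_green_site(x, y)) return img.at(x, y);
    const int dx[4] = {-1, 1, 0, 0};
    const int dy[4] = {0, 0, -1, 1};
    int sum = 0;
    int count = 0;
    for (int k = 0; k < 4; ++k) {
        const int nx = x + dx[k];
        const int ny = y + dy[k];
        if (nx < 0 || ny < 0 || nx >= img.width || ny >= img.height) continue;
        sum += img.at(nx, ny);
        ++count;
    }
    return count ? static_cast<uint8_t>(sum / count) : 0;
}
```

Red and blue would be filled in the same way from their own nearest samples, which is why a larger neighbourhood generally gives a smoother color estimate.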

Frame Rate

Frame rate is the number of images that the sensor can capture and transmit per second. The human brain detects approximately 14 to 16 images per second; the frame rate of a movie is usually 24 fps. For fast-moving applications such as newspaper inspection, the camera needs to “shoot” within milliseconds. At the other end are microscopy applications, which get by with low frame rates comparable to those of the human eye.

Environment

Many vision systems are installed in sites that are quite warm. Cameras are composed of sensitive electronic equipment and are impacted by temperature. Usually, if the environment exceeds the specifications, it doesn’t mean the camera won’t operate. Instead, it means the camera’s performance may diminish as the bounds of these specifications are exceeded. The most common way that performance diminishes in a hot environment is that the camera will become “noisier,” meaning there are unintended brightness variations in the image. Also, exceeding the temperature specifications usually has a negative impact on the life of the camera.

Light wavelength

Light wavelength is not an issue of concern for every application. Some applications, though, will require the camera to be sensitive to certain wavelengths of light. For example, with currency inspection, there may be security features that are only visible in UV or infrared light. In this case, be sure the camera has enough sensitivity at that wavelength to execute a successful inspection. There will also be a need for a light source with corresponding wavelength properties in order to allow the camera to visualize the feature in question. For a majority of applications, white light, or broadband light, is used, and any standard camera should suffice. If the inspection involves any special lighting, be sure that the camera choice is appropriate for the light wavelength required.

RGB or CMYK?

The color systems used by scientists and artists are entirely different. An artist will mix blue and yellow paint to get a shade of green; a scientist will mix green and red light to create yellow. The printed page in a magazine is yet another system. It's important to define the two different kinds of color that we see in the world as the first step in understanding color systems. First, there's the color you can touch, such as the skin of an apple or a painted wall. These colors are part of the surface of an object. Next, there's the color you can't touch, such as a beam of red light and the colors produced by your computer monitor. Colors generated by light are part of one color system. The tangible colors which are on the surface of objects or on the printed page are another color system.

In this blog I have tried to demonstrate color classification and recognition on the ESP32Cam, a popular 32-bit WiFi/BT MCU board with an integrated camera that costs about 5 USD or less.

The ESP32Cam is a module from Ai-Thinker built around Espressif Systems' 32-bit ESP32 MCU. It has 520 KB of on-chip SRAM, can be clocked at up to 240 MHz, and comes pre-equipped with an OV2640 camera that offers JPEG, BMP and GRAYSCALE image output formats. A color model based on individual pixel identification is used to train the hardware. Some methods shoot a still image, store it as an array of RGB pixel values, and post-process it for classification. Instead, I have tried continuous frame capture from the camera stream: each frame goes through color format conversion, is down-sampled, and is then flattened into an image feature vector. This approach suffers from noise caused by ambient light conditions, and several other factors influence the accuracy of the color model, such as stray movement of the camera, false labeling of the ROI, direct light exposure on the sensor, and the lack of a color filter on the sensor module. Still, the concept works and can be improved with a more detailed training dataset.

Once a frame is classified as Red or Blue, the ESP32Cam writes the pixel value derived from the approximate feature vector to a WS2812b tile, which changes the color of the LEDs accordingly. In addition, an ambient light sensor (ALS, a VCNL4040) is used alongside the ESP32 to measure the ambient brightness of the ROI and apply 'adaptive brightness' to the 7x6 WS2812b tile.
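The post does not show how the ambient reading is turned into an LED brightness, so the following is only a minimal sketch of one way to do it: clamp the lux value, map it linearly to a brightness level, and scale the detected color by that level. The 1000 lux ceiling, the minimum level of 16 and all names here are illustrative assumptions, not values from the project.

```cpp
#include <algorithm>
#include <cstdint>

struct Rgb {
    uint8_t r, g, b;
};

// Map an ambient-light reading (lux) to a brightness level in [min_level, 255].
// max_lux and min_level are assumed defaults chosen for illustration only.
inline uint8_t brightness_from_lux(float lux, float max_lux = 1000.0f,
                                   uint8_t min_level = 16) {
    const float clamped = std::min(std::max(lux, 0.0f), max_lux);
    const float level = min_level + (255.0f - min_level) * (clamped / max_lux);
    return static_cast<uint8_t>(level + 0.5f);  // round to nearest integer
}

// Scale a color by a brightness level, where 255 means full brightness.
inline Rgb scale_color(Rgb c, uint8_t brightness) {
    return Rgb{
        static_cast<uint8_t>((c.r * brightness) / 255),
        static_cast<uint8_t>((c.g * brightness) / 255),
        static_cast<uint8_t>((c.b * brightness) / 255),
    };
}
```

On the board, the scaled color would then be written to every LED of the tile through whichever WS2812b library is in use. A brighter room giving brighter LEDs is itself an assumption; the mapping can be inverted if the opposite behaviour is wanted.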

Stuff you'll need..

- ESP32Cam from AiThinker (works with other variants too)

- CP2102 USB-to-serial converter for programming the ESP32Cam (FTDI, Prolific, whatever)

- Sparkfun VCNL4040 ALS

- WS2812b LEDs

- An Element14 breadboard (must have) & some jumper wires.

Connection flow diagram:

Code fragments for color model:

// ESP32Camera PIO setup (AiThinker)
#include "esp_camera.h"
// The AiThinker pin macros (Y2_GPIO_NUM ... RESET_GPIO_NUM) are expected to be
// defined before this point, e.g. in a board-specific pin header.

bool setup_camera(framesize_t frameSize) {
    camera_config_t config = {};  // zero-initialise fields that are not set below
    config.ledc_channel = LEDC_CHANNEL_0;
    config.ledc_timer = LEDC_TIMER_0;
    config.pin_d0 = Y2_GPIO_NUM;
    config.pin_d1 = Y3_GPIO_NUM;
    config.pin_d2 = Y4_GPIO_NUM;
    config.pin_d3 = Y5_GPIO_NUM;
    config.pin_d4 = Y6_GPIO_NUM;
    config.pin_d5 = Y7_GPIO_NUM;
    config.pin_d6 = Y8_GPIO_NUM;
    config.pin_d7 = Y9_GPIO_NUM;
    config.pin_xclk = XCLK_GPIO_NUM;
    config.pin_pclk = PCLK_GPIO_NUM;
    config.pin_vsync = VSYNC_GPIO_NUM;
    config.pin_href = HREF_GPIO_NUM;
    config.pin_sscb_sda = SIOD_GPIO_NUM;
    config.pin_sscb_scl = SIOC_GPIO_NUM;
    config.pin_pwdn = PWDN_GPIO_NUM;
    config.pin_reset = RESET_GPIO_NUM;
    config.xclk_freq_hz = 20000000;          // 20 MHz sensor clock
    config.pixel_format = PIXFORMAT_RGB565;  // 2 bytes per pixel
    config.frame_size = frameSize;
    config.jpeg_quality = 12;                // only relevant for JPEG output
    config.fb_count = 1;

    if (esp_camera_init(&config) != ESP_OK)
        return false;

    // Only touch the sensor once initialisation has succeeded
    sensor_t *sensor = esp_camera_sensor_get();
    if (!sensor)
        return false;
    sensor->set_framesize(sensor, frameSize);
    return true;
}

Once the stream is running at a 160x120 frame size (about 25.6 fps on average), each RGB565 pixel in the bitstream is unpacked into separate red, green and blue values.

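// Converting the RGB565 stream to per-block R, G, B values
// WIDTH, BLOCK_SIZE, W, H and rgb_frame are defined elsewhere in the sketch;
// W and H are the dimensions of the down-sampled rgb_frame grid.
void convert_to_rgb(uint8_t *buf, size_t len) {
    // Clear the accumulators left over from the previous frame
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            rgb_frame[y][x][0] = 0;
            rgb_frame[y][x][1] = 0;
            rgb_frame[y][x][2] = 0;
        }
    }
    // Each pixel is two bytes, high byte first
    for (size_t i = 0; i + 1 < len; i += 2) {
        const uint8_t high = buf[i];
        const uint8_t low  = buf[i + 1];
        const uint16_t pixel = (high << 8) | low;
        const uint8_t r = (pixel & 0b1111100000000000) >> 11;  // 5 bits
        // 6-bit field; shifting by 6 drops its lowest bit, so green is 0..31 like red and blue
        const uint8_t g = (pixel & 0b0000011111100000) >> 6;
        const uint8_t b = (pixel & 0b0000000000011111);        // 5 bits
        const size_t j = i / 2;        // pixel index within the frame
        const uint16_t x = j % WIDTH;
        const uint16_t y = j / WIDTH;  // integer division, no floor() needed
        const uint8_t block_x = x / BLOCK_SIZE;
        const uint8_t block_y = y / BLOCK_SIZE;
        if (block_x >= W || block_y >= H)
            continue;  // ignore bytes that fall outside the grid
        // Sum the channel values of every pixel that lands in this block
        rgb_frame[block_y][block_x][0] += r;
        rgb_frame[block_y][block_x][1] += g;
        rgb_frame[block_y][block_x][2] += b;
    }
}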
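The loop above keeps each channel in a 5-bit range (0 to 31) rather than producing true 8-bit values. If full RGB888 values are wanted, for example to drive the LEDs directly, one common approach is bit replication, sketched below. The `Rgb888` type and the function name are made up for this example; the byte order (high byte first) follows the loop above.

```cpp
#include <cstdint>

struct Rgb888 {
    uint8_t r, g, b;
};

// Expand one RGB565 pixel, given as two bytes (high byte first), to 8 bits per channel.
inline Rgb888 rgb565_to_rgb888(uint8_t high, uint8_t low) {
    const uint16_t pixel = static_cast<uint16_t>((high << 8) | low);
    const uint8_t r5 = (pixel >> 11) & 0x1F;  // top 5 bits
    const uint8_t g6 = (pixel >> 5) & 0x3F;   // middle 6 bits
    const uint8_t b5 = pixel & 0x1F;          // bottom 5 bits
    // Copy the top bits of each field into the newly created low bits so that
    // the maximum field value maps to 255 and zero stays zero.
    return Rgb888{
        static_cast<uint8_t>((r5 << 3) | (r5 >> 2)),
        static_cast<uint8_t>((g6 << 2) | (g6 >> 4)),
        static_cast<uint8_t>((b5 << 3) | (b5 >> 2)),
    };
}
```

For instance, the byte pair `0xF8, 0x00` (pure red in RGB565) expands to `(255, 0, 0)`.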

Combining the pixels of each block in this way down-samples the frame, which simplifies classification and keeps memory use low enough to avoid crashing this little board. The down-sampled frame is then flattened into a feature vector and compared against the pre-trained dataset, so classification is based on real-time frame values.

// Capture a frame and down-sample it
bool capture_still() {
    camera_fb_t *frame = esp_camera_fb_get();
    if (!frame)
        return false;
    convert_to_rgb(frame->buf, frame->len);
    // Hand the buffer back to the driver so the next capture can reuse it
    esp_camera_fb_return(frame);
    return true;
}

// Convert the down-sampled image to a features vector
void linearize_features() {
    size_t i = 0;
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            features[i++] = rgb_frame[y][x][0];
            features[i++] = rgb_frame[y][x][1];
            features[i++] = rgb_frame[y][x][2];
        }
    }
}
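The classifier itself is not shown in the post. As one possible illustration of comparing a feature vector with trained data, the sketch below uses a nearest-centroid rule: average each channel over the feature vector and pick the class whose stored mean color is closest. This is a stand-in technique, not necessarily the one used in the project; the `Centroid` type, the `uint16_t` feature type and any centroid values are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

enum class ColorClass { Red, Blue };

struct Centroid {
    ColorClass label;
    float r, g, b;  // mean channel values measured from labelled training frames
};

// Average each channel over an interleaved R,G,B feature vector.
inline void mean_rgb(const uint16_t *features, size_t count, float out[3]) {
    out[0] = out[1] = out[2] = 0.0f;
    const size_t pixels = count / 3;
    if (pixels == 0) return;
    for (size_t i = 0; i + 2 < count; i += 3) {
        out[0] += features[i];
        out[1] += features[i + 1];
        out[2] += features[i + 2];
    }
    for (int c = 0; c < 3; ++c) out[c] /= static_cast<float>(pixels);
}

// Return the label of the centroid closest (squared Euclidean distance) to the
// mean color of the frame. Expects at least one centroid.
inline ColorClass classify(const uint16_t *features, size_t count,
                           const Centroid *centroids, size_t n_centroids) {
    float m[3];
    mean_rgb(features, count, m);
    float best = std::numeric_limits<float>::max();
    ColorClass label = centroids[0].label;
    for (size_t k = 0; k < n_centroids; ++k) {
        const float dr = m[0] - centroids[k].r;
        const float dg = m[1] - centroids[k].g;
        const float db = m[2] - centroids[k].b;
        const float dist = dr * dr + dg * dg + db * db;
        if (dist < best) {
            best = dist;
            label = centroids[k].label;
        }
    }
    return label;
}
```

A rule like this is cheap enough for a microcontroller, but it shares the sensitivity to ambient light described earlier, since a change in lighting shifts the mean color of every frame.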

Demonstration:

Conclusion: For a limited dataset, the classification between Red and Blue achieved a score of 93%, and the accuracy increases in dim light or pitch-dark conditions. The present model does not yet classify green. In short, there is still plenty of room for exploring computer vision on low-cost microcontrollers, and initiatives like TinyML pave the way for further development. In the coming days, I would like to integrate dynamic data with the WS2812b for some serious LiFi communication.

_________________________________________________________________________________________________

When a radio wave passes across a metal object, its EM fields make the free electrons in the metal oscillate, inducing small AC currents at the same frequency. If a microwave source such as a mobile phone is brought near the loop and a call or text is made, the radio waves emitted by the phone pass across the loop and induce a voltage in it (the loop acts as the antenna). If the phone is close enough, this voltage is large enough to light a small LED. Since the loop is about one wavelength in size, it is resonant, so power transfers well (low reactance) from the radio wave to the LED. The mobile phone automatically tests the network and adjusts its transmission power to maximize battery life and minimize network interference. As a result, the brightness of the LED depends on the data being sent (the average signal), the local signal strength, and how close the loop is to the phone.

Is the germanium diode needed at all? The LED is itself a (light emitting) diode, after all, so one would not expect another diode to help. However, the initial prototypes built without it, to save a few cents, failed. The LED has a relatively high capacitance, which at these frequencies tends to de-tune the loop and short out the LED. The germanium diode, however, is made of a tiny wire that makes only a point contact with a piece of semiconducting germanium, so its self-capacitance is very low, which keeps the loop resonant. The germanium diode rectifies the AC signal from the loop into a series of DC pulses that are nicely smoothed by the LED's capacitance. Without the diode, the raw AC signal from the loop tends to be averaged to zero by the LED's capacitance.

Mobiles use various radio frequency bands to communicate from mobile to base and from base to mobile: in Europe these include 900 and 1800 MHz (850 and 1900 MHz in the USA, Canada and likewise in Asia).

The relationship between wavelength, speed of light and the frequency follows the well known formula:
Wavelength λ(m) = speed / frequency = c(m/s) / ν(Hz)
λ(m) = 300,000,000 / ν(Hz) or approximately:
λ(m) = 300 / ν(MHz)
So for a mid-range of about 1000 MHz (1 GHz) we get a typical mobile phone wavelength of about:
λ = 300/ 1000 = 0.3 m = 30 cm.

The loop consists of about one wavelength of wire, approx. 30 cm, so each side is about 30/4 = 7.5 cm. The two ends are connected through a series circuit consisting of a red LED and a germanium diode. The loop is made from a piece of copper wire bent into a square (or any other shape). If the wire is insulated, remember to scrape off the insulation and tin the ends with solder. Then simply solder on the germanium diode and LED.
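The same arithmetic can be repeated for other bands with a small helper. This is just the formula above in code, using the rounded value c = 300,000,000 m/s; the function names are made up for this example.

```cpp
// Free-space wavelength in centimetres for a frequency given in MHz.
// lambda(m) = 300 / f(MHz), so lambda(cm) = 30000 / f(MHz).
inline double wavelength_cm(double freq_mhz) {
    return 30000.0 / freq_mhz;
}

// Side length in centimetres of a square loop whose perimeter is one wavelength.
inline double square_loop_side_cm(double freq_mhz) {
    return wavelength_cm(freq_mhz) / 4.0;
}
```

For 1000 MHz this gives a 30 cm wavelength and 7.5 cm sides, matching the figures above. For the 900 MHz band the sides come out at roughly 8.3 cm, and for 1800 MHz at roughly 4.2 cm.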

This works on GSM (2G). Almost everyone prefers LTE these days, or even 5G-NR, so the phone needs to be switched to a 2G-only network mode. Some modern phones use higher frequencies and less power, and use that power in a slightly different way (e.g. spread spectrum). Hence some smartphones work and some don't, and success may also depend on the signal strength of the nearby mobile phone mast.

Top Comments

Lorenz 6 months ago

I am trying to partly replicate the project. Could you maybe post your entire code, or did you already do that? I am especially unsure what the values for the constants WIDTH and BLOCK_SIZE are in your function that converts the rgb565 buffer to the rgb888 array.
dubbie over 4 years ago in reply to ankur608

Ankur,

This is the first 'colour picker' that I have seen so it is not something I am that familiar with.

Dubbie
ankur608 over 4 years ago in reply to dubbie

Thanks Dubbie. This project is based on an ESP32Cam that is trained to recognise color values (R and B) and reads ambient light levels through a sensor, then overwrites the neopixels with the approx. pixel value of the detected color. It is yet another fancy color picker based on computer vision.

-Ankur
dubbie over 4 years ago

I liked the LED array. I'm sure it must be my fault but I wasn't entirely sure what this was doing. Can you clarify a little?

Dubbie