In evaluating the HVC-P2, I'm going to focus on the feasibility of using it in a security monitoring application. As I began to think about which features I might want in the application, I realized that I may be putting the cart before the horse, i.e. I'm using a paper spec to decide which features to implement without understanding how those specs translate into real-life application performance (primarily because I don't know how those specs were tested).
As I gain familiarity with the hardware and software, I've started to notice issues with accuracy and repeatability. I've attributed that to the fact that I'm using live images, but I realize that I really need to quantify how well this device works.
The application examples that Omron uses are vending and ticketing machines, POS terminals, digital signage, and interactive robots. In those applications the detection occurs at close proximity with mostly static targets. They also use an occupancy sensor example, but in that case lower precision and accuracy can be tolerated. The literature states a detection rate of up to 4 per second, but it indicates that the rate depends on the specific detection function (e.g. human body takes longer than face), distance (smaller image size takes longer), object stability, and number of objects.
In this blog post I'm going to outline which performance parameters I'm going to test and indicate what level of performance I think my application would require. In future posts I'll describe my testing methodology and present the test results.
It turns out that many of the parameters that I consider important are not explicitly specified but can be quantified under specific test conditions.
Here is my list of key parameters:
1) Accuracy
2) Repeatability
3) Acquisition Time
4) Acquisition Distance
5) Acquisition Window
These parameters would need to be measured for each of the various detection modes and for the facial recognition mode. In reality the first 3 parameters are a function of the last 2, so that is how I'll structure my tests. I would also need to run the tests with and without the stability function enabled.
1) Accuracy is an interesting parameter to quantify as I don't think there are specific image standards relating to classification (I could be wrong here). So I'll use a bit of subjectivity in this case. I'll try to use clearly identifiable images and only expect correct detection of the major categories (body, face, and hand). I'll use the subcategories (age, gender, expression) to measure algorithm repeatability but won't score the classification.
2) Repeatability is a parameter that I'll only test on static images (i.e. not "live"). It will be interesting to see how this parameter degrades with increasing distance (decreasing image size).
3) Acquisition time varies with the detection modes that are enabled and it is not specified. In Omron literature the detection rate is listed at "up to four times per second".
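Since the acquisition time is not specified, it has to be measured. Here is a minimal timing sketch, assuming the detection request can be wrapped in a plain Python callable. The names `measure_acquisition_time`, `summarize`, and `fake_detect` are hypothetical and are not part of any Omron software; `fake_detect` is only a stand-in that sleeps.

```python
import time
from statistics import mean

def measure_acquisition_time(detect, runs=20):
    """Call detect() `runs` times and return the duration of each call in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        detect()
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations):
    """Reduce a list of durations to mean, worst case, and equivalent rate."""
    avg = mean(durations)
    return {"mean_s": avg, "max_s": max(durations), "rate_per_s": 1.0 / avg}

# Stand-in for the real request (assumption): replace with the call that
# sends a detection command to the unit and waits for the response.
def fake_detect():
    time.sleep(0.01)

stats = summarize(measure_acquisition_time(fake_detect, runs=5))
print(stats)
```

A mean of 0.25 s per call would correspond to the "up to four times per second" figure. Repeating the measurement per detection mode would show how much each mode costs.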
4) Acquisition distance depends on whether you have the long distance (-10) or wide angle (-20) camera.
Here are the maximum "accurate" detection distance specs for the -10 unit that I am testing:
One caveat here is that there are detection threshold settings that can limit the detection range. It is interesting to note the defaults that are set in the sample software (total image size is 1600x1200). Detection size is a square bounding box, so for example the minimum default detection size for Human Body is 30x30 and maximum is 8192x8192.
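To get a feel for how the minimum detection size limits range, the target's width in pixels can be estimated with a simple pinhole camera model. This is only a sketch under stated assumptions: the horizontal field of view is passed in as a parameter, and the value used in the example is a placeholder rather than a spec value. The 0.5 m body width is also an assumption, and lens distortion is ignored.

```python
import math

IMAGE_WIDTH_PX = 1600  # full image width quoted above

def target_width_px(target_width_m, distance_m, hfov_deg,
                    image_width_px=IMAGE_WIDTH_PX):
    """Approximate width in pixels of a target at a given distance."""
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    return image_width_px * target_width_m / scene_width_m

def max_distance_m(target_width_m, min_size_px, hfov_deg,
                   image_width_px=IMAGE_WIDTH_PX):
    """Distance at which the target shrinks to min_size_px pixels wide."""
    half_tan = math.tan(math.radians(hfov_deg) / 2.0)
    return image_width_px * target_width_m / (min_size_px * 2.0 * half_tan)

# Placeholder values (assumptions, not taken from the datasheet).
HFOV_DEG_PLACEHOLDER = 50.0
BODY_WIDTH_M = 0.5

print(round(target_width_px(BODY_WIDTH_M, 10.0, HFOV_DEG_PLACEHOLDER), 1))
print(round(max_distance_m(BODY_WIDTH_M, 30, HFOV_DEG_PLACEHOLDER), 1))
```

Substituting the real field of view for the unit under test would show whether the default minimum size of 30 pixels, rather than the algorithm itself, sets the practical range limit.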
5) Acquisition window is partially a limitation of the camera lens and partially a limitation of the detection algorithm.
The camera limits the field of view:
And the algorithm limits the detectable image orientation:
The algorithm limits will be somewhat difficult to test (i.e. time consuming), so I think that I'll measure only the Roll angle, since that is straightforward with a static image that is rotated. That will give me a sense of whether I get numbers that are consistent with their spec.
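A roll sweep like this could be reduced to a single detectable range with a small helper. The sketch below assumes the results are recorded as a mapping from roll angle (degrees) to whether a detection occurred. Both function names are hypothetical, and the example data is synthetic, not a measurement.

```python
def roll_angles(step_deg=5, max_deg=180):
    """Angles to test, symmetric about 0 (step should divide max_deg)."""
    return list(range(-max_deg, max_deg + 1, step_deg))

def detectable_roll_range(results):
    """Return (min, max) of the contiguous detected range around 0 degrees.

    `results` maps angle -> bool. Returns None if 0 degrees was not
    tested or was not detected.
    """
    if not results.get(0, False):
        return None
    angles = sorted(results)
    lo = hi = angles.index(0)
    while lo > 0 and results[angles[lo - 1]]:
        lo -= 1
    while hi < len(angles) - 1 and results[angles[hi + 1]]:
        hi += 1
    return (angles[lo], angles[hi])

# Synthetic example: pretend detection succeeds within +/-45 degrees.
example = {a: abs(a) <= 45 for a in roll_angles(step_deg=15, max_deg=90)}
print(detectable_roll_range(example))  # (-45, 45)
```

The step size sets the resolution of the answer, so a coarse sweep followed by a finer one near the edges keeps the number of rotated images manageable.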
Performance requirements:
For my application I think that good accuracy and repeatability at the following distances would be adequate:
Human Body Detection - 10m
Face Detection - 2m
Face Recognition - 1m
So, well within the listed specs. I think repeatability is more important than accuracy since software can compensate for some mis-detection if it is repeatable. I would put the accuracy requirement at 75% and repeatability at 90% at these distances. I would want facial recognition to be 100% accurate (i.e. no false matches).
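These thresholds can be turned into a simple pass/fail check over repeated trials. The sketch below uses one possible set of definitions, which are assumptions rather than anything from Omron: accuracy is the fraction of trials that return the expected category, and repeatability is the fraction of trials that agree with the most common result.

```python
from collections import Counter

def accuracy(results, expected):
    """Fraction of trials that returned the expected category."""
    return sum(r == expected for r in results) / len(results)

def repeatability(results):
    """Fraction of trials that agree with the most common result."""
    most_common_count = Counter(results).most_common(1)[0][1]
    return most_common_count / len(results)

def meets_requirements(results, expected,
                       min_accuracy=0.75, min_repeatability=0.90):
    """Check a set of trials against the 75% / 90% targets."""
    return (accuracy(results, expected) >= min_accuracy
            and repeatability(results) >= min_repeatability)

# Synthetic example: 9 of 10 trials detect a face, 1 detects nothing.
trials = ["face"] * 9 + ["none"]
print(accuracy(trials, "face"), repeatability(trials),
      meets_requirements(trials, "face"))
```

With these definitions a detector that is consistently wrong scores 0% accuracy but 100% repeatability, which is the case where software compensation is at least possible.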
The stated detection rate of up to 4 per second is adequate for detection, but a security monitoring application would need other camera(s) for image capture.
Other specifications:
1) Power - the current spec is less than 400mA @ 5V. I haven't measured the peak current yet, but the average current is about 130mA. Since this unit cannot operate standalone (has no GPIO or networking), it is probably not a good candidate for battery operation.
2) Image output - the image output is provided primarily for setup and debug. The full resolution camera image (1600x1200) is used for detection but is not available as an output. The available output sizes are none/160x120/320x240. You can get a scaled image at 1/10 or 1/5 of the full image resolution.
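The two output sizes follow directly from dividing the full resolution by 10 or 5. If detection results are reported in full-resolution coordinates (an assumption worth verifying against the command specification), a bounding box has to be scaled by the same divisor before it can be drawn on the output image. A minimal sketch, with hypothetical function names:

```python
FULL_RES = (1600, 1200)  # full camera resolution used for detection

def output_size(divisor):
    """Output image size for a given scale divisor (10 or 5)."""
    return (FULL_RES[0] // divisor, FULL_RES[1] // divisor)

def scale_box(x, y, size, divisor):
    """Map a square box (centre x, y and side length, in full-resolution
    pixels) onto the scaled output image. Side length is kept >= 1 px."""
    return (x // divisor, y // divisor, max(1, size // divisor))

print(output_size(10))               # (160, 120)
print(output_size(5))                # (320, 240)
print(scale_box(800, 600, 30, 10))   # (80, 60, 3)
```

Note that a minimum-size 30x30 detection shrinks to about 3x3 pixels on the 160x120 output, so the output image is of little use for inspecting distant targets.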
3) Size - the main processing board is 45x45x8mm and the camera board is 25x25x8mm, compact enough to be used as a reasonably inconspicuous sensor.
4) Communication - USB UART, default rate 9600bps, max rate 921600bps. It would have been nice if the unit had WiFi and a web server like IP cameras do. That would have made it a lot more attractive and possibly usable as a standalone unit. They did have a WiFi-enabled unit (HVC-C2W) that was packaged like an IP camera. I'm not sure if that is still available, but maybe there is hope for the next revision of this product.
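The baud rate matters most when the output image is requested, because the image has to come back over the same serial link. Here is a rough estimate of the transfer time, assuming one byte per pixel (8-bit grayscale), 8N1 framing (10 bits on the wire per byte), and no protocol overhead. All three are assumptions, so treat the results as lower bounds.

```python
def transfer_time_s(num_bytes, baud, bits_per_byte=10):
    """Time to move num_bytes over a UART at the given baud rate.

    bits_per_byte=10 assumes 8N1 framing (start bit + 8 data + stop bit).
    """
    return num_bytes * bits_per_byte / baud

# Assumption: one byte per pixel for the output image.
SIZES = {"160x120": 160 * 120, "320x240": 320 * 240}

for name, num_bytes in SIZES.items():
    for baud in (9600, 921600):
        print(f"{name} @ {baud} bps: {transfer_time_s(num_bytes, baud):.2f} s")
```

Under these assumptions a 320x240 image takes 80 s at the default 9600bps and a little under 1 s at 921600bps, so image output at the default rate would swamp the detection time.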