Hello,
I'm working on a fleet control project on a very tight schedule (which I didn't set). I'm the primary (probably the only) developer, and the project should be completed in around 3 months. The system is based on an Arduino Due, a SIMCom 5360 (3G + GPS), accelerometers, an OBD2 interface and Bluetooth Low Energy, all of which will be integrated into a single PCB by the hardware guy. I don't expect any problems using these peripherals. What I'm really wondering is how to make the system robust enough that it doesn't fail once it's delivered. A crash and reset wouldn't be a disaster, but having the hardware fail and stop doing what it is supposed to do (process and send telemetry) would be a major disaster. So my question is: what are some good practices/recommendations to make the system as robust as possible, so that once it's delivered it will keep working for months?
So far I've been writing as much code as possible on the PC, because it's faster to compile and easier to debug. My plan is to create wrappers for some Arduino functions so that I can test as much code as possible on my PC. I'm writing exhaustive unit tests for all functions, including cases with corrupt serial data (I wouldn't like garbage to cause a hang or crash). I would also like to measure code coverage, but I need to find the right tools for it, as Visual Studio Community 2017 doesn't support it. I plan to write a server that simulates different situations, including different network conditions, to test whether the client performs as it should. I also plan to use an OBD2 simulator to test different conditions at home. A watchdog is going to be used to make sure the main loop keeps running. That is pretty much my current approach to making the system robust.
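To make the wrapper idea concrete, here is a minimal sketch of the kind of thing I mean, not a finished design. `ByteSource`, `FakeSource` and `SentenceReader` are names made up for illustration; on the target, a small adapter would forward `read()` to the real serial port. The parser assumes NMEA-style framing (`$payload*hh`, where `hh` is the XOR of the payload bytes in hex), and the 96-byte limit is an arbitrary assumption. `std::string` is used for brevity on the PC; a fixed-size array may be preferable on the board.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for the part of a serial port the parser needs.
struct ByteSource {
    virtual ~ByteSource() = default;
    virtual int read() = 0;  // next byte, or -1 when nothing is available
};

// PC-side fake that replays a canned buffer, garbage included.
class FakeSource : public ByteSource {
public:
    explicit FakeSource(std::string data) : data_(std::move(data)) {}
    int read() override {
        if (pos_ >= data_.size()) return -1;
        return static_cast<unsigned char>(data_[pos_++]);
    }
private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Bounded state machine: noise, bad checksums and overlong input are
// counted and dropped, never buffered without limit.
class SentenceReader {
public:
    // Consumes available bytes; true once a sentence with a valid checksum arrives.
    bool poll(ByteSource& src) {
        int c;
        while ((c = src.read()) >= 0) {
            if (feed(static_cast<char>(c))) return true;
        }
        return false;
    }
    const std::string& payload() const { return payload_; }
    unsigned rejected() const { return rejected_; }

private:
    enum class State { Idle, Payload, Check1, Check2 };
    static constexpr std::size_t kMaxPayload = 96;  // assumed limit

    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }
    bool reject() { ++rejected_; state_ = State::Idle; return false; }

    bool feed(char c) {
        if (c == '$') {  // a start marker always resynchronises
            if (state_ != State::Idle) ++rejected_;  // previous one was cut short
            state_ = State::Payload;
            buf_.clear();
            sum_ = 0;
            return false;
        }
        switch (state_) {
        case State::Idle:
            return false;  // ignore noise between sentences
        case State::Payload:
            if (c == '*') { state_ = State::Check1; return false; }
            if (buf_.size() >= kMaxPayload || c < 0x20 || c > 0x7E) return reject();
            buf_.push_back(c);
            sum_ ^= static_cast<uint8_t>(c);
            return false;
        case State::Check1:
            hi_ = hexValue(c);
            if (hi_ < 0) return reject();
            state_ = State::Check2;
            return false;
        case State::Check2: {
            int lo = hexValue(c);
            if (lo < 0 || ((hi_ << 4) | lo) != sum_) return reject();
            state_ = State::Idle;
            payload_ = buf_;
            return true;
        }
        }
        return false;
    }

    State state_ = State::Idle;
    std::string buf_, payload_;
    uint8_t sum_ = 0;
    int hi_ = 0;
    unsigned rejected_ = 0;
};
```

In a unit test, a `FakeSource` loaded with something like `"\xffjunk$AB*04xx$AB*03"` should yield the payload `AB` with one rejected sentence, since `'A' ^ 'B'` is `0x03` and the first checksum is wrong.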
One thing I'm not completely sure about is the best way to perform field tests. Ideally I would like to minimise them, as they are expensive and time-consuming. What are some good practices to get the most out of them? If something fails in the field, I would like to be able to trace it back to the source of the issue, rather than end up wondering what caused it and repeating field tests over and over under different conditions.
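One option I'm considering for tracing field failures is a small "breadcrumb" log of recent events that could be dumped to storage or sent along with the telemetry. The following is only a sketch under my own assumptions: the `Event` codes, the `Record` layout and the `Breadcrumbs` class are hypothetical, and how the buffer survives a reset (external flash, SD card, upload on reconnect) is left open.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical event codes; a real project would define its own list.
enum class Event : uint8_t { Boot, ModemTimeout, GpsFixLost, ObdBadFrame };

struct Record {
    uint32_t timeMs;  // timestamp, e.g. from millis() on the target
    Event event;
    uint16_t detail;  // free-form extra value (error code, retry count, ...)
};

// Fixed-size ring buffer: keeps the last N events, overwriting the oldest.
template <std::size_t N>
class Breadcrumbs {
public:
    void add(uint32_t timeMs, Event e, uint16_t detail = 0) {
        records_[head_] = Record{timeMs, e, detail};
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;
        ++total_;
    }
    std::size_t size() const { return count_; }   // records currently held
    uint32_t total() const { return total_; }     // events ever recorded
    // Index 0 is the oldest record still held.
    const Record& at(std::size_t i) const {
        return records_[(head_ + N - count_ + i) % N];
    }

private:
    std::array<Record, N> records_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
    uint32_t total_ = 0;
};
```

With a `Breadcrumbs<3>` and four events added at 10, 20, 30 and 40 ms, the buffer holds the last three, so `at(0)` is the 20 ms record and `total()` is 4. The memory use is fixed, so logging itself cannot become a new source of failures.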
A complement to trying to make the system bug-free could be to implement OTA updates. On the Espressif MCUs that is pretty straightforward, but I'm not sure how I could do it on this hardware. Any ideas?
Any other suggestions and comments would also be very welcome.
Thanks