How can I make an embedded system robust?

Hello,

I'm working on a fleet control project on a very tight schedule (which I didn't set). I'm the primary (probably the only) developer, and the project should be completed in around 3 months. The system is based on an Arduino Due, a SIMCom 5360 (3G + GPS), accelerometers, an OBD2 interface and Bluetooth Low Energy, all of which will be integrated into a single PCB by the hardware guy. I don't think I will have any problems using these peripherals, but what I'm really wondering is what the best practices are to make the system as robust as possible, so that it doesn't fail once it's delivered. A crash and reset wouldn't be a disaster, but having the hardware fail and not do what it is supposed to do (process and send telemetry) would be a major disaster. So my question is: what are some good practices/recommendations to make the system as robust as possible, so that once it's delivered it will keep working for months?

As of now, what I've been doing is coding as much as possible on the PC, because it's faster to compile and easier to debug. My plan is to create wrappers for some Arduino functions so I can test as much code as possible on my PC. I'm writing exhaustive unit tests for all functions, including cases with corrupt serial data (I wouldn't want garbage to cause a hang or crash). I would also like to measure code coverage, but I need to find the right tools for it, as Visual Studio Community 2017 doesn't support it. I plan to write a server that will simulate different situations, including different network conditions, to test whether the client performs as it should. I also plan to use an OBD2 simulator to test different conditions at home. A watchdog is going to be used to make sure the main loop keeps running. And that is pretty much my current approach to making the system robust.
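As a rough illustration of the "test on the PC, survive garbage on the serial line" idea, here is a minimal sketch of a byte-at-a-time frame parser. The frame layout (start byte `0x7E`, length, payload, XOR checksum) and the names `FrameParser`, `feed`, `at` and `rejected` are assumptions made up for this example, not the format of any of the modules mentioned above. Because the parser only sees bytes, it has no dependency on the Arduino `Serial` object and can be unit-tested on a PC.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical frame layout, assumed only for this sketch:
//   0x7E | length (1..32) | payload bytes | XOR of length and payload bytes
class FrameParser {
public:
    static constexpr std::uint8_t kStart = 0x7E;
    static constexpr std::size_t kMaxPayload = 32;

    // Feed one received byte. Returns true exactly when a complete frame with
    // a matching checksum has just been assembled. Never blocks, and never
    // writes outside buffer_ regardless of the input.
    bool feed(std::uint8_t byte) {
        switch (state_) {
        case State::WaitStart:
            if (byte == kStart) state_ = State::WaitLength;
            return false;
        case State::WaitLength:
            if (byte == 0 || byte > kMaxPayload) {
                // Impossible length: count it and resynchronise. A start byte
                // seen here may be the beginning of the next frame.
                ++rejected_;
                state_ = (byte == kStart) ? State::WaitLength : State::WaitStart;
                return false;
            }
            expected_ = byte;
            count_ = 0;
            checksum_ = byte;
            state_ = State::ReadPayload;
            return false;
        case State::ReadPayload:
            buffer_[count_++] = byte;
            checksum_ = static_cast<std::uint8_t>(checksum_ ^ byte);
            if (count_ == expected_) state_ = State::WaitChecksum;
            return false;
        case State::WaitChecksum:
            state_ = State::WaitStart;
            if (byte == checksum_) return true;
            ++rejected_;  // corrupted frame: drop it silently
            return false;
        }
        return false;
    }

    // Payload length of the most recently accepted frame.
    std::size_t size() const { return expected_; }

    // Payload byte i of the most recently accepted frame.
    std::uint8_t at(std::size_t i) const {
        assert(i < expected_);
        return buffer_[i];
    }

    // Number of frames discarded because of a bad length or checksum.
    unsigned rejected() const { return rejected_; }

private:
    enum class State { WaitStart, WaitLength, ReadPayload, WaitChecksum };
    State state_ = State::WaitStart;
    std::array<std::uint8_t, kMaxPayload> buffer_{};
    std::size_t expected_ = 0;
    std::size_t count_ = 0;
    std::uint8_t checksum_ = 0;
    unsigned rejected_ = 0;
};
```

On the target, the main loop would pass each byte it reads from the serial port to `feed()`. On the PC, a unit test can feed arrays of random or deliberately corrupted bytes and check that nothing hangs and that `rejected()` increases. This sketch has no inter-byte timeout, so a real version would probably also reset the state machine if a frame stalls halfway.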

One thing I'm not completely sure about is the best way to perform field tests. Ideally I would like to minimise them, as they are expensive and time-consuming. What are some good practices to make the most of them? If something fails in the field, I would like to be able to trace it to the source of the issue, as opposed to ending up wondering what caused it and repeating field tests over and over under different conditions.
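One way to make a field failure traceable is to keep a small "black box" of recent events that can be read back afterwards. The following is only a sketch of that idea: the `Event` fields, the event codes and the `EventLog` class are hypothetical, and where the buffer would actually live on the device (memory that survives a reset, external flash, or the telemetry stream itself) is left open.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One log entry. The meaning of `code` and `detail` is up to the application
// (for example "modem reset", "watchdog about to expire", "OBD2 timeout").
struct Event {
    std::uint32_t timestamp_ms;
    std::uint16_t code;
    std::uint16_t detail;
};

// Fixed-size ring buffer: no dynamic allocation, and when it is full the
// oldest entry is overwritten so the most recent history is always kept.
template <std::size_t N>
class EventLog {
public:
    void record(std::uint32_t timestamp_ms, std::uint16_t code,
                std::uint16_t detail) {
        events_[head_] = Event{timestamp_ms, code, detail};
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;
    }

    // Number of events currently retained (at most N).
    std::size_t size() const { return count_; }

    // Retained events in chronological order: index 0 is the oldest.
    const Event& at(std::size_t i) const {
        assert(i < count_);
        return events_[(head_ + N - count_ + i) % N];
    }

private:
    std::array<Event, N> events_{};
    std::size_t head_ = 0;   // next slot to write
    std::size_t count_ = 0;  // number of valid entries
};
```

The firmware would call `record()` at interesting points (state changes, retries, error returns) and dump the log after a reset or on request, so that a failed field test leaves behind the last few things the device did.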

An alternative to trying to make the system bug-free could be to implement OTA updates, which is pretty straightforward on the Espressif MCUs, but here I'm not sure how I could do it. Any ideas?
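Whatever mechanism ends up writing a new image to the Due, one piece that can be written and tested on the PC is the integrity check run before an update is accepted. The sketch below is an assumption-laden illustration, not a description of any existing bootloader: `ImageHeader` and `imageIsValid` are hypothetical names, and the header is assumed to carry the image length and a standard CRC-32 (reflected polynomial `0xEDB88320`) computed by the server.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (the common IEEE 802.3 / zlib variant). Slow but tiny, and
// needs no lookup table.
std::uint32_t crc32Ieee(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // If the low bit is set, shift and XOR in the polynomial.
            const std::uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical metadata sent alongside a firmware image.
struct ImageHeader {
    std::uint32_t length;  // expected image size in bytes
    std::uint32_t crc;     // expected CRC-32 of the image
};

// Accept the image only if it is complete and its CRC matches the header.
bool imageIsValid(const ImageHeader& header, const std::uint8_t* data,
                  std::size_t received) {
    if (received != header.length) return false;  // truncated or oversized
    return crc32Ieee(data, received) == header.crc;
}
```

A truncated download or a flipped bit then leaves the currently running firmware in place instead of bricking the unit. A CRC only guards against accidental corruption, so it does not authenticate who sent the update.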

Also any suggestions and comments would be gladly welcomed...

Thanks

Fred27 over 6 years ago

Well, for a start, an Arduino is probably not really suitable. There are microcontrollers and peripherals that are appropriate for automotive and industrial environments, such as TI's Hercules series.

neuromodulator over 6 years ago in reply to Fred27

I was reading about the Hercules, and I think it's a bit of overkill, being a Cortex-R with lots of safety mechanisms. It also wouldn't make the system more robust unless the other components are equally robust, which I think could increase the price quite a bit. Here, nobody would die if the system fails, and I think a failure rate below 5% per month of usage would be at the limit of what's acceptable. The Due is a Cortex-M3; why do you think such an MCU isn't really suitable for that task?

Jan Cumps over 6 years ago in reply to neuromodulator

neuromodulator wrote:

I was reading about the Hercules, and I think it's a bit of overkill, being a Cortex-R with lots of safety mechanisms. It also wouldn't make the system more robust unless the other components are equally robust, which I think could increase the price quite a bit. Here, nobody would die if the system fails, and I think a failure rate below 5% per month of usage would be at the limit of what's acceptable. The Due is a Cortex-M3; why do you think such an MCU isn't really suitable for that task?
If a failure causes a major disaster and overheating causes instability, you may want to read about functional safety and safety controllers one more time. You will not get what you need with firmware alone. You need hardware support and a thorough risk analysis.

neuromodulator over 6 years ago in reply to Jan Cumps

No major disaster would occur; no life depends on the system. After checking the prices of the Hercules parts I noticed they are quite affordable, but at this point in the development it would probably go into rev. 2 (if we get to that point).