Questions about risk management
In the comments on last week's blog post, mcb1 and DAB brought to my attention the importance of exception handling.
The following questions need to be answered:
- What happens when communication from a detector is not received within a preset time?
- How do I verify the connections and the operation of the sensors?
- What happens when there is more than one detector in the same area and one of them stops reporting?
- What happens when the only detector is in 'maintenance' mode or has been removed?
Make it safer
I decided to revisit the original design to address the first and second questions. At least I'll know when I'm at risk.
I've added two new error processing components. One is an internal Error Processing component deployed on the Intel Edison. Its main responsibility is to communicate errors from the other processing components and to monitor detector disconnects. In addition, it will maintain a heartbeat with External Monitoring. If the internal Error Processing component fails, External Monitoring will detect the loss of communication and send an alert (a "dead man's switch").
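To make the disconnect monitoring concrete, here is a minimal sketch of how the internal Error Processing component could track detectors. The detector IDs, the five-minute timeout and the reportError helper are placeholders of my own, not the final implementation:

// Sketch: watchdog for detector disconnects (timeout and helper names are assumptions).
var DETECTOR_TIMEOUT = 5 * 60 * 1000; // preset time without a report
var lastSeen = {};                     // detector id -> timestamp of last message

// Called from the message handler whenever a detector reports in.
function detectorReported(detectorId) {
  lastSeen[detectorId] = Date.now();
}

// Placeholder: the real version would forward the problem to External Monitoring / Slack.
function reportError(message) {
  console.error('[Error Processing] ' + message);
}

// Check once a minute for detectors that have gone silent.
setInterval(function () {
  var now = Date.now();
  Object.keys(lastSeen).forEach(function (detectorId) {
    if (now - lastSeen[detectorId] > DETECTOR_TIMEOUT) {
      reportError('No data from detector ' + detectorId +
                  ' for more than ' + (DETECTOR_TIMEOUT / 60000) + ' minutes');
    }
  });
}, 60 * 1000);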
External Monitoring
For the External Monitoring role I'm considering the Cronitor service. It can detect a missed heartbeat and integrates with Slack; hooking it up to a Slack channel requires only configuration changes.
Cronitor offers several plans. I've chosen the free option, which is limited to a single monitor but can be upgraded if required.
The code for the heartbeat is pretty straightforward:
// Heartbeats API setup
var heartbeats = require('heartbeats');
var request = require('request');

// a heart that beats every 5 minutes.
var heart = heartbeats.createHeart(5 * 60 * 1000);

heart.createEvent(1, function (count, last) {
  var ping = 'https://cronitor.link/' + process.env.CRONITOR_MONITOR +
             '/complete?auth_key=' + process.env.CRONITOR_AUTH_TOKEN;
  request(ping, function (error, response, body) {
    if (!error && response.statusCode == 200) {
      console.log(body);
    }
  });
});
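The snippet assumes the CRONITOR_MONITOR and CRONITOR_AUTH_TOKEN environment variables are set on the Edison before the app starts. A small guard I'm thinking of adding (my own addition, not something Cronitor requires) fails fast if they are missing:

// Sketch: fail fast if the Cronitor settings are missing.
['CRONITOR_MONITOR', 'CRONITOR_AUTH_TOKEN'].forEach(function (name) {
  if (!process.env[name]) {
    console.error('Missing environment variable: ' + name);
    process.exit(1);
  }
});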
When the monitor goes down, Cronitor sends an alert to the Slack channel, and it sends another message when the heartbeat recovers.
I still need to add the functionality to the Error Processing component, but that will be covered next week. Stay tuned.