Downtime Incident - 30/11/2019
Between 30/11/2019 and 04/12/2019, the ev.energy app and smart charging service experienced a degradation in service that affected some users' ability to load the mobile application and to schedule smart charging sessions. The degradation did not affect all users and did not prevent all smart charging sessions. However, we take any impact on our users' ability to access the service seriously, and this report outlines our investigation and the actions taken to prevent similar issues in future.
Timeline
28/11/19 11:00 - Update released to improve the accuracy of scheduling smart charging on Tesla cars.
30/11/19 09:00 - First incident of downtime alerted, identified as a server running out of disk space due to log files. Log files removed and service restored.
03/12/19 06:00 - Second incident of downtime, restored by 09:20.
04/12/19 01:00 - Third incident of downtime, restored by 09:30.
04/12/19 12:00 - Fix deployed to the Tesla integration, which resolved the root cause and returned all operations to normal.
Root Cause
The root cause of the issue was a change made to the Tesla integration to keep our charging data up to date. This change introduced a bug that led to cars being updated far more frequently than planned, which manifested itself as a large increase in load across our infrastructure.
Resolution
The Tesla integration was corrected so that the update rate is frequent enough to report charging data accurately without being unnecessarily frequent.
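As an illustration only, a guard against over-frequent updates can be sketched as a minimum interval between updates per car. The class name UpdateThrottle, the injectable clock, and the interval value are hypothetical and are not taken from the ev.energy codebase; the actual fix may have worked differently.

```python
import time


class UpdateThrottle:
    """Allow at most one update per car within a minimum interval (hypothetical sketch)."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock        # injectable so the behaviour can be tested
        self._last_update = {}     # car_id -> time of the last allowed update

    def should_update(self, car_id):
        """Return True and record the time if the car is due for an update."""
        now = self._clock()
        last = self._last_update.get(car_id)
        if last is not None and now - last < self.min_interval_s:
            return False           # too soon since the last update: skip
        self._last_update[car_id] = now
        return True
```

A polling loop would call should_update(car_id) before contacting the car and skip the request when it returns False, so a scheduling bug cannot translate directly into extra load.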
Actions taken to improve system and response
- Introduced an out-of-hours alerting system to alert the engineering team to issues overnight.
- Improved monitoring of car and charger integrations, which makes it easier for the ev.energy team to diagnose issues.
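Since the first incident was identified as a server running out of disk space, one check that such monitoring could include is a disk usage threshold. The following is a minimal sketch, not a description of the monitoring ev.energy deployed; the function name disk_usage_alert, the 90% threshold, and the usage_fn parameter are assumptions made for illustration.

```python
import shutil


def disk_usage_alert(path="/", threshold=0.9, usage_fn=shutil.disk_usage):
    """Return an alert message if used space on `path` reaches `threshold`, else None.

    `threshold` is a fraction of total capacity (hypothetical default of 90%).
    `usage_fn` defaults to shutil.disk_usage and is injectable for testing.
    """
    usage = usage_fn(path)
    used_fraction = usage.used / usage.total
    if used_fraction >= threshold:
        return f"Disk usage on {path} at {used_fraction:.0%}"
    return None
```

Run periodically, a check like this would hand any non-None message to the alerting system before the disk fills completely.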
Future actions:
- Increase the decoupling of the backend infrastructure so that an issue with one car/charger integration does not cause downtime of the application.
- Ensure there is a fallback to non-smart charging during times when ev.energy is not able to control the car.
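The two future actions above can be sketched together: wrap each integration so that its failures are contained, and fall back to immediate (non-smart) charging when it cannot be used. This is a minimal sketch under stated assumptions; ChargePlan, IsolatedIntegration, schedule_fn, and the failure limit of 3 are hypothetical names and values, not part of the ev.energy system.

```python
from dataclasses import dataclass


@dataclass
class ChargePlan:
    mode: str          # "smart" or "immediate" (non-smart fallback)
    reason: str = ""


class IsolatedIntegration:
    """Contain failures of one car/charger integration (hypothetical sketch)."""

    def __init__(self, schedule_fn, max_failures=3):
        self._schedule_fn = schedule_fn    # integration call that may raise
        self._max_failures = max_failures
        self._failures = 0                 # consecutive failures seen so far

    def plan(self, car_id):
        # After repeated failures, stop calling the integration entirely so
        # that it cannot keep adding load to the rest of the system.
        if self._failures >= self._max_failures:
            return ChargePlan("immediate", "integration disabled after repeated failures")
        try:
            window = self._schedule_fn(car_id)
        except Exception as exc:
            self._failures += 1
            return ChargePlan("immediate", f"integration error: {exc}")
        self._failures = 0                 # a success resets the counter
        return ChargePlan("smart", f"scheduled window: {window}")
```

In this sketch the caller always receives a usable plan, so a failing integration degrades that car to ordinary charging rather than affecting the application as a whole. A production version would also need a way to re-enable the integration after a cool-down period, which is omitted here.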