Resolved -
As of Sunday, March 29 at 15:08 CET, the incident has been resolved. All affected customers who raised an incident ticket during the outage will automatically receive a Root Cause Analysis (RCA) report.
We will contact these customers by email from IoT Problem Management with the expected timeline for delivery of the report.
Kind regards,
KPN IoT
Mar 30, 07:18 CEST
Identified -
The root cause of the event that started around 02:00–03:00 CET (winter-to-summer time change) is still under investigation.
What we do see is that, since the switch from winter to summer time, a very high number of devices have started reconnecting simultaneously. This has placed an exceptional load on the iNAT environment, exceeding its capacity. As a result, both new connections and existing connections that attempt to re-establish or set up a new session may be impacted.
To restore service, we have initiated a controlled, step-by-step recovery process to relieve the iNAT environment of the overload and gradually bring traffic back to normal levels. This will allow iNAT to process both new and existing traffic successfully again.
At this moment, we estimate this recovery process will take approximately 2 hours.
The next business call will take place at 15:30 CET, during which our engineers will provide the latest status and outlook. Based on that update, we will inform you again at 16:00 CET.
Mar 29, 14:03 CEST
Investigating -
During the transition from winter time to summer time (02:00–03:00), an issue occurred within the KPN domain. Customers whose APN uses an iNAT solution are currently unable to establish new connections, and existing connections are partially affected.
This morning, the impact increased to such an extent that KPN initiated the high-priority incident process. Several analyses have been performed, and we have attempted to mitigate the impact through failovers and by isolating nodes. So far, this has not led to a solution.
We have been able to narrow the issue down to the interaction between the iNAT master and worker environment.
KPN engineers have engaged the technology partner (vendor) responsible for this environment. Together, we are now performing deep dives to determine the root cause.
The next business call regarding status and progress will take place at 13:30 CET.
After that, we will provide IoT customers with a new update at 14:30 CET.
We apologize for this disruption and the impact on your organization and customers.
KPN IoT
Mar 29, 12:35 CEST