Summary:
On Wednesday, October 23rd at 3:01 PM MST, a system status check failure occurred in a core server for handling remote desktop connections into the OnDemand OBeer environment. The system reboot completed and the server became fully available at 3:45 PM, after which connections resumed, and the issue was confirmed as resolved by 3:50 PM.
Timeline:
Root Cause Analysis:
The primary cause was an unexpected system check failure in AWS. This means that AWS owned hardware that was hosting our VMs had an issue, which necessitated an immediate restart so AWS could change the hardware used for running our server. Once the restart was initiated and completed, service was restored