Post-Mortem: ridley.fastlizard4.org downtime due to hypervisor bug

Did you know that servers are capable of detecting when a sysadmin wants to get a decent night of sleep for once?

One of LizardNet’s servers, ridley.fastlizard4.org, experienced several unscheduled reboots, the first at around 12:27 UTC today (Saturday 29 July 2017).  The reboots were followed by an extended downtime while the server was migrated to a stable host server.  Ridley was back up and operating normally by 14:20 UTC.  The first reboot was caused by a bug in the hypervisor software on the old host server; the subsequent reboots were performed while trying to diagnose some strange performance issues that surfaced in its aftermath.  The extended downtime was due to the migration of ridley to a host server where the hypervisor bug was patched.  This should resolve the unexpected reboots, and I believe it will also resolve the performance problems observed after the first one.  Everything should be back to normal now.  Thank you for your patience, and many thanks to Linode’s excellent support team for their assistance in resolving this.

The following is a partial list of services that were unavailable during this unexpected downtime:

  • LizardWiki
  • Ladies On Two Wheels forums
  • Star Trek Games wiki
  • Wikitroid Skintest
  • LizardNet Code Review (Gerrit)
  • LizardNet Code Explorer (Gitblit)
  • LizardVPN
  • LizardNet Minecraft servers s1, c1, and c2
  • LizardNet’s Teamspeak3 server
  • LizardIRC server diamond.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on ridley.fastlizard4.org