All Servers: Ubuntu Distribution Upgrades

With the upcoming end-of-life for Ubuntu Linux 12.04 LTS, I will soon be upgrading the installed Ubuntu version on all LizardNet servers, which currently all run 12.04.

The plan is to perform a double upgrade on all three servers, first from 12.04 LTS (precise) to 14.04 LTS (trusty), then from 14.04 LTS to 16.04 LTS (xenial), which is the newest LTS (long-term-support) release of Ubuntu Linux available and which is anticipated to be supported until April 2021.[1]

Servers will be upgraded in sequence, starting with minecraft1, followed by phazon, then ridley.  Before each upgrade, a full backup of the system being upgraded will be taken to allow for a rollback in case something goes seriously wrong, and once all upgrades on a system are complete I will make a detailed analysis to ensure that everything is working as expected.
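As a rough illustration of the ordering described above, the plan can be sketched as a flat checklist. This is only a sketch: the `build_plan` helper and the step wording are hypothetical, and only the server names, release names, and the backup/upgrade/verify order come from this post.

```python
# Release hops named in this post: precise -> trusty -> xenial.
UPGRADE_PATH = [
    ("12.04 LTS", "precise"),
    ("14.04 LTS", "trusty"),
    ("16.04 LTS", "xenial"),
]

# Servers in the order they are planned to be upgraded.
SERVERS = ["minecraft1", "phazon", "ridley"]


def build_plan(servers, path):
    """Return an ordered checklist (hypothetical helper, for illustration only)."""
    steps = []
    for server in servers:
        # A full backup comes first so a rollback is possible.
        steps.append(f"{server}: take full backup")
        # Pair each release with its successor: one hop per pair.
        for (old_ver, old_name), (new_ver, new_name) in zip(path, path[1:]):
            steps.append(
                f"{server}: upgrade {old_ver} ({old_name}) -> {new_ver} ({new_name})"
            )
        # Analysis happens only after all hops on this server are done.
        steps.append(f"{server}: verify services")
    return steps


if __name__ == "__main__":
    for step in build_plan(SERVERS, UPGRADE_PATH):
        print(step)
```

Each server gets four steps (backup, two release hops, verification), and no work starts on the next server until the previous one has been verified.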

As with any OS upgrade, there will be major changes that may require reconfiguration of software and services, and these will only be compounded by the fact that I am upgrading two major releases in one go.  Among the changes that I know will be problematic is the significantly different configuration format for the Apache web server introduced in version 2.4 – upgrading the OS will also entail upgrading from Apache 2.2 to Apache 2.4, then updating the existing configuration to be compatible with the new version of Apache.  Certain software, especially older software, may also be broken in the upgrade process.  Such changes will increase the time required beyond what is needed to simply upgrade the operating system components themselves.
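One well-known part of the Apache 2.2 to 2.4 change is access control: the 2.2-style `Order`, `Allow from`, `Deny from`, and `Satisfy` directives are superseded by `Require` in 2.4 (for example, `Allow from all` becomes `Require all granted`). The following is a minimal sketch of how existing configuration could be scanned for those directives before the upgrade. The function name and the sample configuration are made up for illustration, it is not the actual procedure used on LizardNet servers, and it does not cover every 2.4 incompatibility.

```python
import re

# Apache 2.2 access-control directives that 2.4 replaces with "Require".
LEGACY_DIRECTIVE = re.compile(
    r"^\s*(Order|Allow\s+from|Deny\s+from|Satisfy)\b", re.IGNORECASE
)


def find_legacy_directives(config_text):
    """Return (line_number, stripped_line) for each 2.2-style directive found."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if LEGACY_DIRECTIVE.match(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample configuration, not taken from any real server.
SAMPLE = """<Directory /var/www/example>
    Options FollowSymLinks
    Order allow,deny
    Allow from all
</Directory>
"""

if __name__ == "__main__":
    for number, line in find_legacy_directives(SAMPLE):
        print(f"line {number}: {line}")
```

The scan only flags lines for manual review rather than rewriting them, since the correct `Require` replacement depends on how the original directives were combined.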

Because of this, I am initially allocating a downtime period of 12 hours for each server.  The actual time the server and the services it hosts will be unavailable will probably vary; for example, certain parts of the server will continue operating while the new operating system files are being downloaded, and the downtime may be shorter if things go better than planned or longer if unexpected difficulties arise.  As minecraft1 is the first server I plan to upgrade, I will use the upgrade experience on that server to inform my estimates for the remaining two servers, and will adjust the planned downtime periods accordingly if needed.

Because of the extended downtime required by OS upgrades, certain services on servers scheduled for downtime may be relocated to other servers during the scheduled downtime period to ensure their availability; these relocations, if any, will be detailed in each server’s downtime notification.

Finally, because of the anticipated extended nature of the downtimes, the downtime periods are not set in stone and may change if necessary to ensure that I have enough time to devote to the upgrade task.  However, I will do my best to give at least 24 hours’ notice before the start of a downtime, and to announce any schedule changes at least 24 hours before the originally scheduled start of the downtime.

Downtime Schedule

This post will be updated with the downtimes for all servers as they are scheduled, and will also be updated as the upgrades are completed.

Notes

  1. “Anticipated” because the actual end-of-life date has not yet been formally announced by Canonical; however, for Long-Term-Support releases, the EOL is generally five years from the month and year of release.
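The five-year rule of thumb in the note above can be worked through as simple arithmetic. This is a sketch only: it assumes the usual Ubuntu YY.MM version numbering (so 16.04 means April 2016), the helper names are hypothetical, and the result is an estimate rather than an official date from Canonical.

```python
def parse_release(version):
    """Split an Ubuntu version such as "16.04" into (year, month)."""
    year_part, month_part = version.split(".")
    return 2000 + int(year_part), int(month_part)


def anticipated_eol(version, support_years=5):
    """Estimate the EOL (year, month) as release month plus support_years."""
    year, month = parse_release(version)
    return year + support_years, month


if __name__ == "__main__":
    for version in ("12.04", "14.04", "16.04"):
        year, month = anticipated_eol(version)
        print(f"{version} LTS: anticipated EOL {year}-{month:02d}")
```

For 16.04 this gives April 2021, matching the date anticipated in this post, and for 12.04 it gives April 2017, the upcoming end-of-life that prompted these upgrades.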

Post-Mortem: ridley.fastlizard4.org downtime due to hard crash

Earlier today, at 23:00:02 UTC on Wednesday 23 November 2016, ridley.fastlizard4.org suffered a hard crash resulting in a brief unexpected downtime.  The server was automatically brought back up by monitoring systems, after which I verified that everything was functioning normally.  All services should be restored to normal at this time.  I have not yet identified a definitive cause for the crash; however, I will continue to analyze the data available to me and monitor for any further unexpected events.

The following is a partial list of services that were unavailable during this unexpected downtime:

  • LizardWiki
  • Ladies On Two Wheels forums
  • Star Trek Games wiki
  • Wikitroid Skintest
  • LizardNet Code Review (Gerrit)
  • LizardNet Code Explorer (Gitblit)
  • LizardVPN
  • LizardNet Minecraft servers s1, c1, and c2
  • LizardNet’s Teamspeak3 server
  • LizardIRC server diamond.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on ridley.fastlizard4.org (emails sent to fastlizard4.org users during the downtime will be delivered after the downtime concludes)

20 November 2016: Scheduled reboot for critical Xen security fix

This is a past/expired downtime notification. The downtimes specified below have been completed, and remarks/results are given below as well.

Unless otherwise noted, all dates and times are given in Coordinated Universal Time (UTC), with time in 24-hour notation.

The Xen development team has released several critical and so far undisclosed Xen Security Advisories (XSAs), and as such, Linode (LizardNet’s provider) will be performing emergency maintenance on all of their Xen hosts.  LizardNet’s sole Xen system, phazon.fastlizard4.org, will be rebooted as part of the endeavour to patch the Xen vulnerabilities before the public disclosure date of 22 November 2016.  (More information can be found on the Linode status blog here.)

The following server and services will experience downtime:

phazon.fastlizard4.org
Date and time of downtime start: 12:00 Sunday 20 November 2016 UTC (convert to other timezones)
Duration of downtime: Expected between 30 minutes and 1 hour, but up to 2 hours is possible
Status: Completed on schedule with no issues!
Partial list of services affected:

  • LizardWiki
  • LizardNet OTRS (emails sent to OTRS during the downtime will be delivered after the downtime concludes)
  • LizardNet Continuous Integration (Jenkins) (Gerrit will not be able to trigger any jobs during the downtime, and they will not be run after the downtime concludes)
  • LizardNet Minecraft dynamic web maps
  • LizardIRC server emerald.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on phazon.fastlizard4.org (emails sent to phazon.fastlizard4.org users during the downtime will be delivered after the downtime concludes)

Apologies for the short notice on this downtime (both from me and Linode).

25 October 2016: Emergency reboots to patch “Dirty COW” vulnerability

This is a past/expired downtime notification. The downtimes specified below have been completed, and remarks/results are given below as well.

Unless otherwise noted, all dates and times are given in Coordinated Universal Time (UTC), with time in 24-hour notation.

A few hours before this post, I rebooted all servers to apply kernel updates to patch the so-called “Dirty COW” privilege escalation vulnerability in the Linux kernel.  The vulnerability is indexed as CVE-2016-5195, and more information about it can be found here (with some more technical explanation here).

Due to the emergency nature of these reboots, they needed to be conducted without advance warning. I apologize for not being able to provide advance notice, and thank you for your understanding.

There is a silver lining, though – since reboots needed to be performed anyway, I took advantage of them to apply pending hardware upgrades from Linode: servers minecraft1 and ridley have both now had their RAM doubled.  This only added a few minutes to the downtime the reboots would have caused otherwise.

Servers affected:

phazon.fastlizard4.org
Date and time of downtime start: In the past
Duration of downtime: Minutes
Status: Completed with no issues!
Partial list of services affected:

  • LizardWiki
  • LizardNet OTRS (emails sent to OTRS during the downtime will be delivered after the downtime concludes)
  • LizardNet Continuous Integration (Jenkins) (Gerrit will not be able to trigger any jobs during the downtime, and they will not be run after the downtime concludes)
  • LizardNet Minecraft dynamic web maps
  • LizardIRC server emerald.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on phazon.fastlizard4.org (emails sent to phazon.fastlizard4.org users during the downtime will be delivered after the downtime concludes)
ridley.fastlizard4.org
Date and time of downtime start: In the past
Duration of downtime: Minutes
Status: Completed with no issues, hardware upgrades applied!
Partial list of services affected:

  • LizardWiki
  • Ladies On Two Wheels forums
  • Star Trek Games wiki
  • Wikitroid Skintest
  • LizardNet Code Review (Gerrit)
  • LizardNet Code Explorer (Gitblit)
  • LizardVPN
  • LizardNet Minecraft servers s1, c1, and c2
  • LizardNet’s Teamspeak3 server
  • Rav3nZNC
  • LizardIRC server diamond.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on ridley.fastlizard4.org (emails sent to fastlizard4.org users during the downtime will be delivered after the downtime concludes)
minecraft1.fastlizard4.org
Date and time of downtime start: In the past
Duration of downtime: Minutes
Status: Completed with no issues, hardware upgrades applied!
Services affected:

7 September 2016: Scheduled reboot for critical Xen security fix

This is a past/expired downtime notification. The downtimes specified below have been completed, and remarks/results are given below as well.

Unless otherwise noted, all dates and times are given in Coordinated Universal Time (UTC), with time in 24-hour notation.

The Xen development team has released a critical and so far undisclosed Xen Security Advisory (XSA), and as such, Linode (LizardNet’s provider) will be performing emergency maintenance on all of their Xen hosts.  LizardNet’s sole Xen system, phazon.fastlizard4.org, will be rebooted as part of the endeavour to patch the Xen vulnerabilities before the public disclosure date of 8 September 2016.  (More information can be found on the Linode status blog here.)

Edit: The downtimes have been completed on schedule with no issues.  When more information becomes publicly available about the specific XSA(s) that led to this downtime, I will update this post.

The following server and services will experience downtime:

phazon.fastlizard4.org
Date and time of downtime start: 11:00 Wednesday 7 September 2016 UTC (convert to other timezones)
Duration of downtime: Expected between 30 minutes and 1 hour, but up to 2 hours is possible
Status: Completed on schedule with no issues!
Partial list of services affected:

  • LizardWiki
  • LizardNet OTRS (emails sent to OTRS during the downtime will be delivered after the downtime concludes)
  • LizardNet Continuous Integration (Jenkins) (Gerrit will not be able to trigger any jobs during the downtime, and they will not be run after the downtime concludes)
  • LizardNet Minecraft dynamic web maps
  • LizardIRC server emerald.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on phazon.fastlizard4.org (emails sent to phazon.fastlizard4.org users during the downtime will be delivered after the downtime concludes)

Apologies for the short notice on this downtime (both from me and Linode).

Post-Mortem: ridley.fastlizard4.org downtime due to hardware problems

One of LizardNet’s servers, ridley.fastlizard4.org, was unexpectedly down from just after 14:00 UTC to just after 16:00 UTC today (Wednesday 3 August 2016) due to its host Linode server suffering a hardware problem.  The problem has since been fixed, and all services should now be running normally.  Thank you for your patience!

The following is a partial list of services that were unavailable during this unexpected downtime:

  • LizardWiki
  • Ladies On Two Wheels forums
  • Star Trek Games wiki
  • Wikitroid Skintest
  • LizardNet Code Review (Gerrit)
  • LizardNet Code Explorer (Gitblit)
  • LizardVPN
  • LizardNet Minecraft servers s1, c1, c2, and s2.5
  • LizardNet’s Teamspeak3 server
  • Rav3nZNC
  • LizardIRC server diamond.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on ridley.fastlizard4.org

22 July 2016: Scheduled reboot for critical Xen security fix

This is a past/expired downtime notification. The downtimes specified below have been completed, and remarks/results are given below as well.

Unless otherwise noted, all dates and times are given in Coordinated Universal Time (UTC), with time in 24-hour notation.

The Xen development team has released a critical and so far undisclosed Xen Security Advisory (XSA), and as such, Linode (LizardNet’s provider) will be performing emergency maintenance on all of their Xen hosts.  LizardNet’s sole Xen system, phazon.fastlizard4.org, will be rebooted as part of the endeavour to patch the Xen vulnerabilities before the public disclosure date of 26 July 2016.  (More information can be found on the Linode status blog here.)

Update: The embargo on the Xen Security Advisory that triggered this emergency scheduled reboot has been lifted, and the issue responsible seems to have been XSA-182.  Some excellent (as always) commentary about the cause and implications of this XSA has been released by the QubesOS team, and can be found here.

The following server and services will experience downtime:

phazon.fastlizard4.org
Date and time of downtime start: 11:00 Friday 22 July 2016 UTC (convert to other timezones)
Duration of downtime: Expected between 30 minutes and 1 hour, but up to 2 hours is possible
Status: Completed on schedule with no issues!
Partial list of services affected:

  • LizardWiki
  • LizardNet OTRS (emails sent to OTRS during the downtime will be delivered after the downtime concludes)
  • LizardNet Continuous Integration (Jenkins) (Gerrit will not be able to trigger any jobs during the downtime, and they will not be run after the downtime concludes)
  • LizardNet Minecraft dynamic web maps
  • LizardIRC server emerald.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on phazon.fastlizard4.org (emails sent to phazon.fastlizard4.org users during the downtime will be delivered after the downtime concludes)

Apologies for the short notice on this downtime (both from me and Linode).

9 April 2016: Scheduled maintenance downtime

This is a past/expired downtime notification. The downtimes specified below have been completed, and remarks/results are given below as well.

Unless otherwise noted, all dates and times are given in Coordinated Universal Time (UTC), with time in 24-hour notation.

Update: The scheduled maintenance has been completed with no issues.

Linode has scheduled required maintenance downtime for the server that hosts one of LizardNet’s servers, specifically, phazon.fastlizard4.org.  This downtime only affects phazon, and is expected to last about 60 minutes, though a full two hours is allocated to the downtime and may be necessary.  This downtime does not seem to be security related.

The following server and services will experience downtime:

phazon.fastlizard4.org
Date and time of downtime start: 03:00 Saturday 9 April 2016 UTC (convert to other timezones)
Duration of downtime: One hour expected, but a window of two hours has been allocated and the full two hours may be necessary.
Status: Completed with no issues.
Partial list of services affected:

  • LizardWiki
  • LizardNet OTRS (emails sent to OTRS during the downtime will be delivered after the downtime concludes)
  • LizardNet Continuous Integration (Jenkins) (Gerrit will not be able to trigger any jobs during the downtime, and they will not be run after the downtime concludes)
  • LizardNet Minecraft dynamic web maps
  • LizardIRC server emerald.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on phazon.fastlizard4.org (emails sent to phazon.fastlizard4.org users during the downtime will be delivered after the downtime concludes)

Thank you in advance for your patience!

18 February 2016: Emergency reboots to fix multiple critical security issues

This is a past/expired downtime notification. The downtimes specified below have been completed, and remarks/results are given below as well.

Unless otherwise noted, all dates and times are given in Coordinated Universal Time (UTC), with time in 24-hour notation.

Update: The reboots have all been completed.

In the hours following this post, all LizardNet servers (ridley.fastlizard4.org, phazon.fastlizard4.org, and minecraft1.fastlizard4.org) will be rebooted so patches for multiple critical security vulnerabilities can be applied.  The patches include fixes for CVE-2016-0728 (Linux kernel privilege escalation) and CVE-2015-7547 (glibc getaddrinfo stack-based buffer overflow) – more information about these vulnerabilities can be found at their respective links.

Due to the emergency nature of these reboots, they will be occurring almost immediately after this post.  I apologize for not being able to provide more advance notice, and thank you for your understanding.

Servers affected:

phazon.fastlizard4.org
Date and time of downtime start: Immediately
Duration of downtime: Minutes
Status: Completed with no issues!
Partial list of services affected:

  • LizardWiki
  • LizardNet OTRS (emails sent to OTRS during the downtime will be delivered after the downtime concludes)
  • LizardNet Continuous Integration (Jenkins) (Gerrit will not be able to trigger any jobs during the downtime, and they will not be run after the downtime concludes)
  • LizardNet Minecraft dynamic web maps
  • LizardIRC server emerald.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on phazon.fastlizard4.org (emails sent to phazon.fastlizard4.org users during the downtime will be delivered after the downtime concludes)
ridley.fastlizard4.org
Date and time of downtime start: Immediately
Duration of downtime: Minutes
Status: Completed with no issues!
Partial list of services affected:

  • LizardWiki
  • Star Trek Games wiki
  • Wikitroid Skintest
  • LizardNet Code Review (Gerrit)
  • LizardNet Code Explorer (Gitblit)
  • LizardVPN
  • LizardNet Minecraft servers s1, c1, and c2
  • LizardNet’s Teamspeak3 server
  • Rav3nZNC
  • LizardIRC server diamond.lizardirc.org
  • LizardIRC’s website
  • LizardMail services on ridley.fastlizard4.org (emails sent to fastlizard4.org users during the downtime will be delivered after the downtime concludes)
minecraft1.fastlizard4.org
Date and time of downtime start: Immediately
Duration of downtime: Minutes
Status: Completed with no issues!
Services affected:

(Resolved) Resumed/ongoing DDoS attacks targeting Linode infrastructure causing service interruptions

Update 14 January 2016: The attacks seem to have finally subsided and this issue is now resolved.  Linode has not reported any signs of the attack for a few days now, and they seem to have declared the incident to be over.  They’ll be publishing a full report on the attacks soon, and I’ll update this post when that becomes available.

All LizardNet services should now be operating normally, with no further risk of downtime or interruptions caused by the attacks.  Thanks for your patience!


Update 6 January 2016: The attacks against Linode are, unfortunately, still ongoing, though it seems that the network engineers have made good headway in mitigating and hardening against the attacks.  No significant service disruptions have occurred for over a week now; the most that has been seen is occasional slow performance due to increased latency or packet loss.  Besides that, though, everything seems to be operating mostly smoothly.  Of course, until the attacks either cease or are completely mitigated (which will still take some time yet), there remains a chance of occasional slow/degraded performance, along with a slight chance of temporary outages (though, based on the pattern, no further outages are expected as of this update).

In other words, expect perhaps some occasional slowness and nothing more, though don’t be too surprised if outages start occurring again if the attacks shift.


Original post: Unfortunately, the DDoS attacks targeting service provider Linode’s infrastructure have resumed and are ongoing.  According to a preliminary report released by Linode, since Christmas Day, Linode has received over 30 attacks “of significant duration and impact”.  Linode’s network engineers are working around the clock to mitigate the attacks; however, it is inevitable that the attacks will cause service interruptions ranging from degraded performance to full outages of LizardNet and LizardNet-hosted sites and services.  Hopefully, as attack vectors are mitigated, the interruptions will become less frequent and severe, but until the attacks cease, it’s worth noting that service interruptions may occur, though hopefully not as often or as severely now that network protective measures are in place.

Fortunately, it seems that the Fremont datacenter, which houses LizardNet’s servers, has been spared the brunt of the attacks, or for some reason has been better able to cope with them than some other datacenters.  This morning there was a period of an hour or two of increased latency and packet loss, but otherwise all LizardNet services were still available.  That doesn’t rule out future service interruptions, though, so if you start having trouble accessing LizardNet services, it’s almost certainly due to a shift in the ongoing attacks.

As before, this is out of my hands and there’s nothing that can be done except to wish Linode and the other upstream service providers luck in defending against these attacks.  It’s worth noting that this is an extremely massive attack, targeting networking infrastructure both at Linode’s datacenters and at upstream interconnection points; indeed, I would even hazard to call these attacks unprecedented in severity, coordination, persistence, and duration.

Linode has indicated that they plan to publish a detailed report once the attacks are fully mitigated and/or cease, which will allow for a more detailed analysis of the attacks.  Until then, though, thank you for bearing with me.

Best of luck to the Linode network engineering teams!

(Note: LizardIRC has servers outside of the Fremont datacenter and with other non-Linode providers; for more information specific to LizardIRC, please visit LizardIRC’s social networking pages: Twitter, Facebook, Google+.)
