Hello, ModX Cloud is down for my two sites for more than 9 hours now, without a single update. Can you please let us know what is happening and when we may expect to be live again?
Kind regards --Mike
There was a fire in the datacenter in Amsterdam.
Hi @michiel,
Our sincere apologies. We are not able to publish notices everywhere—including the forums.
All MODX Cloud status notifications are published at status.modxcloud.com and all modx.com notices are published at status.modx.com. In addition, inside the MODX Cloud dashboard, for this event, we added a pushdown notice to let customers know what was going on and provide quick access to the incident link.
To be clear, our infrastructure provider for our AMS3 platform has their hardware infrastructure at the NorthC data center in Almere, NL. There was a fire at this facility, and there was a total power loss. We published the first status update within minutes of the server becoming unavailable.
Last night, I sent an email to all affected customers to let them know what actions to take.
It should be noted that we had a disaster recovery plan for catastrophic server failure but not for a complete data center shutdown. While backups are stored in offsite, redundant cloud object storage, our normal recovery process would be to rebuild a new machine in sequence with the backups in the same data center (which we’ve done several times before), and it would recover. We would then reassign the IP addresses and sites would come back online.
Given that the data center had its power shut down during firefighting, the entire data center was inaccessible. As such, we had to refactor our site migration system on the fly to migrate sites to a new data center using the last available backup. The other issue is that our normal data center migration system includes a proxy so that there’s no downtime. However, since the origin IP is locked in the shut-down data center, there was no way to proxy traffic to the new location.
We’re currently in the midst of helping our customers move their sites to new locations and are here for any issues that may arise in the process (although so far everything seems to be working as it should be).
If you have questions about your own sites, please contact our support team directly. If you have general thoughts about how we could better handle a rare catastrophic event such as this, please feel free to respond here.
Hello @smashingred,
Thanks for that insight. For some reason I did not get the pushdown notice, and only after @jako pointed out exactly where the updates were posted did I find the correct information.
My sites have both successfully been transferred, so thank you for that work.
As someone who has worked with telcos and assisted in similar (although not this dramatic) events, I can only advise you to take the time to do a proper post-mortem later in the week when the pressure has subsided, collect feedback from all involved, and use it to improve your procedures. It is very useful for increasing service quality even more.
Thanks & kind regards --Mike