Server outage (disk failure)
Yesterday morning, I noticed that my server was running slowly. I couldn’t see any processes hogging resources, though.
Instead of really looking into the problem, I decided to reboot the machine. That was a mistake. When the server did not come back online, I realised that there was probably a problem with the disks.
I have a dedicated server at http://www.hetzner.de/ , and this is the first time I have run into problems. I can really recommend this hosting provider.
The server has a software RAID with 2 disks and runs CentOS.
I assumed that mdadm was trying to recover, but I had no way of knowing, since the machine did not come back online.
At this point, I got very scared - I feared loss of data.
Fortunately, the guys at Hetzner supply a self-service console to the machine (a rescue system).
I could log in using that mechanism, and then I was able to mount the filesystems on the RAID. It quickly became clear that 1 disk had indeed died.
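For anyone in the same situation, here is a minimal sketch of how the array state can be read from the rescue system. On Linux, software RAID reports its status in /proc/mdstat, where a healthy two-disk mirror shows `[UU]` and a missing member shows up as `_`. The sample text and the function name `degraded_arrays` are my own illustration, not output from my server.

```python
import re

# Illustrative /proc/mdstat content (made up for this example):
# md0 has lost one member, md1 is healthy.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      524224 blocks [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      976630336 blocks [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: member_flags} for arrays with a missing member."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md0 : active raid1 ..."
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with e.g. "[2/1] [U_]"; "_" marks a missing disk.
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(1):
                result[current] = status.group(1)
            current = None
    return result


print(degraded_arrays(SAMPLE))  # {'md0': 'U_'}
```

On a real machine you would pass in the contents of /proc/mdstat instead of `SAMPLE`. Running `mdadm --detail` on the array device gives a more detailed view of the same information.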
Now I had 2 options:
- request a disk replacement. This was going to take a while, and during that time I would not have a redundant disk. And chances are high that when 1 disk fails, the other will fail too.
- move my installation to a new server. I know that it only takes around 1 hour between ordering a new server and having it ready for use with the OS installed (did I mention these guys are great? Note that this is physical hardware, not some cloud service!)
I decided to go with option 2.
This consisted of copying the data from the old server to the new one (this took a long time), reinstalling the software, reapplying the configuration for my mail servers and other stuff, and then adjusting the Domino configuration (changing the IP addresses).
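One common way to do the copy step is rsync over SSH, so here is a small sketch that builds such a command. I am not claiming this is the exact command I ran. The paths, the host name `new-server.example` and the function name `rsync_command` are placeholders, and the script only prints the command instead of running it.

```python
import shlex


def rsync_command(source_dir, dest_host, dest_dir, dry_run=True):
    """Build an rsync command line as a list of arguments.

    -a keeps permissions, owners and timestamps, -H keeps hard links,
    -A keeps ACLs and -X keeps extended attributes. --numeric-ids avoids
    remapping user and group IDs by name between the two machines.
    """
    cmd = ["rsync", "-aHAX", "--numeric-ids"]
    if dry_run:
        # Only list what would be transferred, without copying anything.
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory contents.
    source = source_dir.rstrip("/") + "/"
    cmd += ["-e", "ssh", source, f"root@{dest_host}:{dest_dir}"]
    return cmd


# Placeholder values: the old disk mounted in the rescue system, and a new host.
print(shlex.join(rsync_command("/mnt/old/home", "new-server.example", "/home/")))
```

Starting with a dry run is a cheap way to check the paths before a transfer that will take hours. Passing `dry_run=False` gives the command that actually copies.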
In the end, it took me 10 hours in all to get the new server up and running, including copying the data. Now I just have to decommission the old server, and I’m done :-)