As you may know, our server is hosted by OVH and is SAN-based. This means that the root filesystem (/) of our server is not stored on a local hard disk drive but on a remote SAN. We had already encountered “little” uptime problems because of OVH service outages. Well, last week was the most horrible week ever[fr].
We had terrible throughput, down to 70 kbit/s, latencies of up to 2 seconds, and several outages that may have lasted as long as 8 hours in a row.
On top of that, we had updates for the website that we could only apply by crossing our fingers, because some of them require the website to be restarted. But when it restarts, the Django-based website touches a lot of files, and with such terrible latency it could take up to 10 minutes to launch.
The problem was that Cherokee was trying to spawn too many processes, which in turn slowed down the website’s boot sequence and often led the server to run out of RAM, much as a fork bomb would. I have fixed this problem by putting a lock at the beginning of the boot sequence.
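For the curious, here is a minimal sketch of what such a lock can look like. It is not the actual code running on our server: the names startup_lock and boot, the load_application callback and the lock file path are all made up for illustration. The idea is simply that every newly spawned process must take an exclusive file lock before doing the expensive start-up work, so only one process at a time hammers the SAN.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def startup_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    # The lock file is created on first use; its content does not matter.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Blocks until no other process holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


def boot(load_application, lock_path="/tmp/website-boot.lock"):
    """Run the expensive start-up step while holding the lock.

    `load_application` and `lock_path` are hypothetical placeholders.
    """
    with startup_lock(lock_path):
        return load_application()
```

With this in place, processes spawned at the same moment queue up on the lock instead of all loading the application simultaneously. Note that this relies on fcntl.flock, so it only works on Unix-like systems.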
A few hours ago, OVH switched our server to a new SAN which delivers close-to-normal performance. We are using this time to fix as many problems as we can. If you spot any problem, please send us a message.
Anyway, we are sorry for any inconvenience this may have caused.