A review of recent cPanel hosting issues – and the way forward

Over the past few days, several of you have told us that your recent experience with our cPanel Linux hosting has been less than ideal. The main complaint has been that websites take a long time to load. Naturally, this concerned us greatly. This week, we undertook a comprehensive review of our Linux hosting setup to get to the bottom of the problem.

In the course of our investigation, we had to take some servers offline today (Tuesday) for brief periods, ranging from 5 minutes on a couple of servers to 45 minutes on one server. During that time, your customers’ websites would not have resolved. Our analysis confirmed that there were indeed problems with the infrastructure. Here is a recap of the issues:

  • We use the CloudLinux kernel, which ensures a fair distribution of resources among the accounts on each server. A bug in the CloudLinux kernel caused it to report incorrect values for the %iowait and %idle parameters.
  • Because of this bug, the servers’ load averages were reported incorrectly, which is why our alerting systems were not triggered.
  • We also disabled a few sites that appeared to be consuming excessive server resources, but this did not help.
  • We realized quite late that our storage devices (SANs) had run out of I/O capacity and were saturated. We should have caught this earlier, but the numbers reported by our hosting servers indicated otherwise.
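Because the faulty %iowait and %idle figures hid the saturation, it can help to look at disk utilization from a separate source. The following is a minimal Python sketch, not the alerting described in this post, that estimates per-device %util (the same figure `iostat -x` reports) from two samples of the Linux `/proc/diskstats` file, using the "milliseconds spent doing I/Os" counter rather than the CPU %iowait/%idle values. The function names and the 90% threshold are hypothetical choices for illustration. For SAN-backed devices that serve many requests in parallel, %util near 100% shows the device is constantly busy but is only an approximate sign of true saturation.

```python
"""Hypothetical sketch: estimate per-device I/O utilization (%util) from two
samples of Linux /proc/diskstats, without relying on CPU %iowait/%idle."""

import time

# Each /proc/diskstats line starts with major, minor and device name; the
# 10th statistics field after those is "milliseconds spent doing I/Os"
# (io_ticks), which sits at index 12 once the line is split on whitespace.
IO_TICKS_INDEX = 12


def parse_io_ticks(diskstats_text):
    """Return {device_name: io_ticks_ms} parsed from /proc/diskstats text."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) <= IO_TICKS_INDEX:
            continue  # skip blank or malformed lines
        ticks[fields[2]] = int(fields[IO_TICKS_INDEX])
    return ticks


def io_util_percent(before, after, interval_ms):
    """Percentage of the interval during which each device had I/O in flight."""
    util = {}
    for dev, end in after.items():
        if dev in before:
            busy_ms = end - before[dev]
            # Clamp to 100% in case of timing jitter between the two samples.
            util[dev] = min(100.0, 100.0 * busy_ms / interval_ms)
    return util


def saturated_devices(util, threshold=90.0):
    """Devices at or above the (hypothetical) alert threshold, sorted by name."""
    return sorted(dev for dev, pct in util.items() if pct >= threshold)


def sample(path="/proc/diskstats", interval_s=5.0):
    """Take two snapshots interval_s seconds apart and return %util per device."""
    with open(path) as f:
        first = parse_io_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_io_ticks(f.read())
    return io_util_percent(first, second, interval_s * 1000.0)
```

On a Linux host, `saturated_devices(sample())` would list the block devices that were busy for at least 90% of a 5-second window; graphing these values over time gives the kind of utilization trend mentioned below.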

It was a blunder on our part to miss the %ioutil trends in our graphs 🙁. We unreservedly apologise for this, and we’ll make sure it never recurs. This exercise did help us put together a plan of action to fix these issues permanently. What follows is a preview of that plan; our product engineering team will post updates in the comments on this post as we go, to keep you up to speed:

  • We were already building a new storage architecture that makes heavy use of SSDs to overcome I/O bottlenecks. This was planned for the coming weeks, but we are bringing it forward to begin within the next 2-3 days. This may require server downtime; we’ll notify you in advance.
  • We’re communicating with the folks at CloudLinux to ensure that the bug causing the faulty stats reporting is fixed.
  • Meanwhile, we’ve increased the memory (RAM) on all our hosting servers to improve MySQL performance by allowing it to cache more aggressively.

We’re confident that, with the steps we’re taking, the issues that customers have been facing will be fixed permanently. I recommend that you follow the comments on this page for our updates as we go forward.


5 thoughts on “A review of recent cPanel hosting issues – and the way forward”

  1. Rizwan says:

    * The SSDs for the servers will arrive at the data center this evening (US time). Adding them should not be an issue: our servers support hot-swapping, so installation will not require downtime.

    * However, we will need downtime to set up the SSDs with FlashCache. This will most likely happen the day after tomorrow, 14 October 2010.
    We will keep you informed before making any changes.

  2. rizwan.i says:

    We are currently installing the SSDs, and to stress-test them we have scheduled emergency maintenance on one of our cPanel servers.

    A notice listing the affected IP addresses has also been posted at:

    We will update this thread with the progress made.

  3. vidhi.s says:

    The SSDs have been set up with FlashCache on the first server.

    We will now set up the SSDs with FlashCache on the second server. This server will undergo maintenance according to the schedule posted at:

  4. vidhi.s says:

    FlashCache has now been set up with SSDs on two of our servers. The affected IP addresses are listed here:

  5. Rizwan says:

    We are implementing FlashCache on CP-1, so some websites on this server will be unavailable for 45 minutes. The list of affected IP addresses is available on
