A Better Approach to Restarting a Mongrel Cluster


At Karabunga, we use Mongrel a lot. As our Rails applications grow larger, the startup time of a Mongrel process becomes significant.

As you know, restarting a Mongrel cluster is a matter of issuing this command: /etc/init.d/mongrel_cluster restart. Here’s what happens for a cluster of 4 Mongrels:

A typical Mongrel Cluster restart

For one of our applications, the “stop” time is about 2 seconds and the “start” time is somewhere around 10 seconds for each Mongrel process. This means that for a cluster of 15, there is a window of about 3 minutes (15 × 12 seconds) during which at least 1 and often many Mongrels are inaccessible. Worse, there’s even a point where no Mongrel is running at all. In a high-traffic production environment, this is not acceptable.
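To make the arithmetic concrete, here is a rough timing model in Ruby. It assumes the stock restart stops every Mongrel in turn and only then starts them in turn, which is my reading of the behaviour described above; the method names are made up for this illustration, and the 2 and 10 second figures are the ones quoted.

```ruby
# Hypothetical timing model; the durations are the approximate figures quoted above.
STOP_SECONDS  = 2
START_SECONDS = 10

# Returns one [down_at, up_at] pair per Mongrel, assuming the restart stops
# every process in turn and only then starts them in turn.
def stop_all_then_start_all(count, stop = STOP_SECONDS, start = START_SECONDS)
  all_stopped_at = count * stop
  (0...count).map do |i|
    [i * stop, all_stopped_at + (i + 1) * start]
  end
end

# Length of the whole restart window, and of the stretch with nothing serving.
def restart_summary(intervals)
  window_end   = intervals.map(&:last).max
  last_down_at = intervals.map(&:first).max
  first_up_at  = intervals.map(&:last).min
  { window: window_end, none_running: [first_up_at - last_down_at, 0].max }
end

summary = restart_summary(stop_all_then_start_all(15))
puts "restart window: #{summary[:window]}s, no Mongrel serving for #{summary[:none_running]}s"
```

Under these assumptions, a cluster of 15 gives a 180 second window, with roughly 12 seconds during which nothing is serving requests.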

With some hacking, I managed to modify mongrel_cluster_ctl (the script called by /etc/init.d/mongrel_cluster) to avoid the above scenario. My hack also makes sure that at most 1 Mongrel will be down at any given time. Here’s a graphical representation of my hack:

A Mongrel Cluster restart with my hack
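The idea can be sketched as a standalone Ruby script. This is not the actual change to mongrel_cluster_ctl, just a minimal illustration of restarting one Mongrel at a time and waiting for it to come back before touching the next. The pid file layout, the helper names and the TCP readiness check are all assumptions made for this example; the mongrel_rails options used are -d, -e, -p and -P.

```ruby
require "socket"

# Assumed layout: one pid file per port. Adjust to match your cluster config.
def pid_file(port)
  "tmp/pids/mongrel.#{port}.pid"
end

def stop_command(port)
  ["mongrel_rails", "stop", "-P", pid_file(port)]
end

def start_command(port, environment = "production")
  ["mongrel_rails", "start", "-d", "-e", environment,
   "-p", port.to_s, "-P", pid_file(port)]
end

# True once something accepts TCP connections on the port (an assumed
# readiness check; a real one might request a page instead).
def accepting?(port, host = "127.0.0.1")
  TCPSocket.new(host, port).close
  true
rescue SystemCallError
  false
end

# Polls the block until it returns true, or raises after `timeout` seconds.
def wait_until(timeout, poll, what)
  deadline = Time.now + timeout
  until yield
    raise "timed out waiting for #{what}" if Time.now > deadline
    sleep poll
  end
end

# Restarts one Mongrel at a time, so at most one is down at any moment.
def rolling_restart(ports, run: ->(cmd) { system(*cmd) },
                    ready: method(:accepting?), timeout: 60, poll: 0.5)
  ports.each do |port|
    run.call(stop_command(port))
    wait_until(timeout, poll, "port #{port} to close") { !ready.call(port) }
    run.call(start_command(port))
    wait_until(timeout, poll, "port #{port} to accept connections") { ready.call(port) }
  end
end

# Example (hypothetical ports): rolling_restart((8000..8003).to_a)
```

The restart takes about as long in total as before, but the rest of the cluster keeps serving while each process is cycled. The run and ready arguments exist only so the loop can be exercised without a real cluster.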

I believe this is much more efficient. Moreover, when implemented with Swiftiply, downtime is reduced to zero since Swiftiply will detect a “dead” Mongrel and route requests to one that is alive.