by Greg Retkowski
Computer immunology is a hot topic in system administration. Wouldn’t it be great to have our servers solve their own problems? System administrators would be free to work proactively, rather than reactively, to improve the quality of the network.
This is a noble goal, but few solutions have made it out of the lab and into the real world. Most real-world environments automate service monitoring, then notify a human to repair any detected fault. Other sites invest considerable time in creating and maintaining a custom patchwork of scripts to detect and repair frequently recurring faults. This article demonstrates how to build a self-healing network infrastructure from two mature open source components that system administrators already use widely: NAGIOS and Cfengine.