Versioned Will Enforcement (And You Can Too!)
If security is a never-ending process, operations is systematic refutation of entropy.
When I first started out, I did everything by hand. Installs, configuration, every aspect of management. Eventually I started writing scripts. Scripts would install things for me, and copy configuration files, and whatever else. But the scripts were stupid. If they were run more than once, bad things might happen.
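To make the "run more than once" problem concrete, here is a minimal Python sketch of the difference between a script step that blindly repeats itself and one that checks first. The function names and the sample line are hypothetical, invented for illustration. They are not taken from my old scripts or from any particular tool.

```python
import os
import tempfile


def append_line(path, line):
    """Naive step: always appends, so every extra run adds a duplicate."""
    with open(path, "a") as f:
        f.write(line + "\n")


def ensure_line(path, line):
    """Idempotent step: appends only when the line is not already present.

    Returns True if the file was changed, False if nothing needed doing.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    if line in content.splitlines():
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if not content or content.endswith("\n") else "\n"
    with open(path, "a") as f:
        f.write(prefix + line + "\n")
    return True


def count_lines(path):
    """Count the lines currently in the file."""
    with open(path) as f:
        return len(f.read().splitlines())


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    naive = os.path.join(workdir, "naive.conf")
    safe = os.path.join(workdir, "safe.conf")
    for _ in range(3):
        append_line(naive, "setting=on")  # hypothetical config line
        ensure_line(safe, "setting=on")
    print(count_lines(naive), count_lines(safe))
```

Running the naive step three times leaves three copies of the line, while the checking version leaves one no matter how often it runs. That second property is what my early scripts lacked.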
The day I did my first automated network install was an epiphany. No more hitting enter or clicking next until my eyes bled a merry pattern on the keyboard.
The weird thing is, my first job involved using Norton Ghost to install entire labs of workstations with an operating system image. But it never occurred to me, until many years later, that a similar thing could be had for servers. A major hole in my experience.
So then I started using images to install new systems. Of course, the problem with using images is that as soon as you build them, they're out of date. What's in the image is not actually representative of what you have in production. The image has new stuff the production boxes don't, or the production systems were changed in some undocumented way that is not reflected in the image, or... Anyway, then you end up writing more scripts. To keep things in sync. Only they aren't perfect, because by this point every system is just different enough that you can't find all the edge cases until they cause a boom.
Two years ago I discovered Puppet. I had seen change management before, but in the form of cfengine, and it didn't really grab me. Its syntax didn't make my life any easier. It didn't offer a mental model for how the different pieces of my infrastructure interacted. Puppet did. Maybe Luke just explained it properly in the videos I watched while researching change management tools.
The joy of change management comes from documenting your infrastructure, and then enforcing that singular vision across it with a minimum of effort.
When you install a new host (presumably using Jumpstart/JET, or FAI, or Cobbler), you install Puppet. A few minutes later, that host is now configured with the same base as the rest of your installed hosts. They're all the same. File permissions, users, directories, services, cron jobs...
If a service needs to be installed on a group of hosts, you write the service class, include it in the service group, and Puppet does the rest.
There's no more "Oh, right, we changed how that works, but I guess this system we never think about didn't get updated, and now we've totally screwed ourselves in some really unexpected way."
There's no more "Hm, someone changed something on this box, and I don't know why, but I'd better not touch it," because your Puppet classes are in a versioned repository. You always know who did something, and why. (If someone does make a local change, well, too bad for them, because Puppet is bloody well going to change it back until they create an auditable configuration trail.)
I think there's a threshold: Once you hit a certain number of hosts, you can't keep them all in your head. I have 20 physical hosts and 87 virtual ones. When I bring up a new Solaris zone, I don't want to have to run some script that configures it. Heck, I don't even want to bring it up myself. I just tell Puppet to do it, and then Puppet enables itself in the zone, and then the zoned Puppet configures the zone and suddenly whatever service I wanted to be running is.
I don't want to have my installation method add a bunch of users. What if I have new users? Now I need to make sure my user-adding scripts, and my post-installation scripts, will do the right thing! No, I think I'll just let Puppet ensure, every 20 minutes, that users who are supposed to exist, do, and those who shouldn't, don't. (Not to mention that Puppet makes sure each user's environment is always set up. No more having to copy your dot-files around, or checking them out from your version control system, or...)
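The "supposed to exist, do; shouldn't, don't" idea can be sketched in a few lines of Python. This is only a toy model of the reconcile step, not how Puppet is implemented. The function names and the usernames are hypothetical, and real accounts are replaced by plain sets so the logic is easy to see.

```python
def plan_user_changes(desired, unwanted, actual):
    """Work out what must change to reach the declared state.

    desired:  usernames that must exist
    unwanted: usernames that must not exist
    actual:   usernames currently on the host
    Returns (to_add, to_remove) as sorted lists.
    """
    conflict = set(desired) & set(unwanted)
    if conflict:
        raise ValueError(f"declared both present and absent: {sorted(conflict)}")
    to_add = sorted(set(desired) - set(actual))
    to_remove = sorted(set(unwanted) & set(actual))
    return to_add, to_remove


def apply_changes(actual, to_add, to_remove):
    """Return the new set of users after applying a plan (a pure model)."""
    return (set(actual) | set(to_add)) - set(to_remove)


if __name__ == "__main__":
    desired = {"alice", "bob"}             # hypothetical users
    unwanted = {"mallory"}
    host = {"alice", "mallory", "root"}

    # Simulate two consecutive runs, as a periodic agent would make.
    for run in (1, 2):
        to_add, to_remove = plan_user_changes(desired, unwanted, host)
        host = apply_changes(host, to_add, to_remove)
        print(run, to_add, to_remove, sorted(host))
```

The first pass adds the missing user and removes the unwanted one. The second pass finds nothing to do, which is why running the check every 20 minutes is harmless. In this sketch, accounts that appear in neither list are left alone. That is a design choice of the example.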
Once you reach a certain amount of platform complexity, you need to abstract management into something you can keep in your head. Otherwise you end up spinning repetitively instead of focusing on newer, more interesting work.
It isn't even really that much of a paradigm shift. We always end up writing scripts to manage our systems for us. Taking the next step and writing classes and functions in Puppet's declarative language really isn't a leap.
Once a codebase reaches a certain amount of complexity, it has to be refactored. It has to be abstracted. Otherwise it becomes unmaintainable. As with development, so too for operations.
If you've been at this game for a number of years, and you find yourself performing the same tasks over and over; or, like me, you are administering a moderate number of hosts; or you have thousands upon thousands of systems, and you aren't using some form of versioned change management: Consider this an intervention.
Dude. You're doing it wrong.