Friday, February 22, 2008

In which

I re-discover the power of checklists and standards and the saying 'If you don't have time to do it right, when will you have the time to fix it?'

I had a task - on short notice [1] - to move a local zone [2] to a new host. Couldn't migrate the sucker using the cunning shell scripts [1] we've deveoped so I did it the hard way - create a new zone, create the local users, mount points, startup scripts and then turn it all on.

I like to create checklists for situations like this. The act of creating a checklist forces logic on the task, and nothing beats having a rote list of steps to follow when you're in a hurry; it saves 'thinking' for exceptions to the process.

But I didn't have time so I scribbled out a rough list of steps in a text file and called it good.

This reinforced the value of a properly done checklist because I spent (or so it felt) nearly as much time going back and forth on basic tasks [3] as I did getting anything done and thinking way too much.

This exercise also reinforced the value of standards.

We have a reasonably smart way to shortcut hardware fail over [4] but one directory, with one script was the sole exception on this host.

I spent 90 minutes on the failover. I spent 60 minutes tracking that bug down and fixing it.

I wonder what the time code is for 'flailed around like a dumb ass but a valuable habit was reinforced'?

[1] For a variety of reasons that are tedious and not worthy of discussion.
[2] Solaris-speak for a virtual machine.
[3] Couldn't create the users without the mount points for their home directories. Couldn't mount the directories without editing vfstab. Etc.
[4] Mounting everything but the OS on NFS mounts.
blog comments powered by Disqus