Well, it took ages, but the EBN and the different VMs it hosts are back. Add "sysadmin" to the list of occupations I probably shouldn't attempt without (1) more training and (2) a stricter schedule. The NLUUG spring conference on systems administration was quite educational -- and fun, too, chatting with various companies and learning about NanoBSD and ZFS -- but it didn't give me any magic beans to fix what ailed the EBN.

So what was the problem? Well, the whole thing started (yay, placing the blame!) with Bertjan, who wanted a newer Qt version on the EBN for his software quality checking tools. The EBN ran FreeBSD 6.2-RELEASE, and the necessary Qt versions and related ports are not supported on that OS anymore. While the EOL for FreeBSD 6 is still six months away, the ports maintainers don't necessarily want to keep supporting it. So we needed to update the OS to something newer.

There are tools to do that now, but I've never used them, and anyway I don't think they support FreeBSD 6. So that means lots of "make buildworld buildkernel installkernel installworld" kinds of steps. First off, I found that the compilations took a lot longer than I expected (or hoped). I had planned to go 6.2 -> 6.4 -> 7.3 -> 8.0 in one day, but just compiling was going to take longer than that. I couldn't precompile everything with the machine still up either, because FreeBSD 8 doesn't compile in a FreeBSD 6 environment. Hence the multiple steps. Note to self: update more frequently to avoid this kind of large upgrade.
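For reference, one hop of that kind of source upgrade looks roughly like the sketch below. It is a dry run that only prints the commands, one block per hop. The GENERIC kernel config, the /usr/src location and the function names are assumptions for illustration, not a record of what was typed on the EBN.

```shell
#!/bin/sh
# Dry run only: prints the commands for each hop of a multi-release
# source upgrade and executes nothing.
# Assumptions: sources live in /usr/src, kernel config defaults to GENERIC.

print_hop() {
    release="$1"
    kernconf="${2:-GENERIC}"
    echo "# --- hop to ${release}: /usr/src must already hold the ${release} sources ---"
    echo "cd /usr/src"
    echo "make buildworld"
    echo "make buildkernel KERNCONF=${kernconf}"
    echo "make installkernel KERNCONF=${kernconf}"
    echo "# reboot into the new kernel, then:"
    echo "mergemaster -p"
    echo "make installworld"
    echo "mergemaster"
    echo "# reboot again before starting the next hop"
}

# Print the full plan for a list of releases, in order.
print_plan() {
    for release in "$@"; do
        print_hop "$release"
    done
}

# Usage: print_plan 6.4 7.3 8.0
```

Printing the plan first makes it obvious that every hop is a full build, two reboots and two mergemaster runs, which is why three hops do not fit in one day.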

The second problem was that the jails (virtual machines) on the server were poorly set up. They all had their own copies of the world. I hadn't realized that a 6.2 jail wouldn't work in a 7.3 host (for instance, ps fails and lots of other system tools don't like it). If I had spent more time thinking, I would have realized that I could installworld to each jail again and things would be ok. Note to self: set up jails with an easily upgradeable world, as described in lots of best-practices documents on jails.
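The "installworld to each jail" fix can be sketched as below, run from /usr/src after the host's buildworld has finished. The jail paths are hypothetical, and the helper defaults to a dry run that only prints the command, so nothing is touched unless DRY_RUN is set to "no".

```shell
#!/bin/sh
# Reinstall the host's freshly built world into a jail's root directory.
# Assumption: this is run from /usr/src after "make buildworld".
# DRY_RUN defaults to "yes", in which case commands are printed, not run.

run() {
    if [ "${DRY_RUN:-yes}" = "yes" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# DESTDIR tells installworld to install under the jail root instead of /.
refresh_jail() {
    run make installworld DESTDIR="$1"
}

# Usage (hypothetical jail roots):
#   for root in /jails/www /jails/build; do refresh_jail "$root"; done
```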

So I upgraded the host onwards to FreeBSD 8.0. Another long long compile, with no GNU screen to make it easier to deal with. Thank goodness for the ILOM and the system console redirection it provides.

Of course, then I went on to make delete-old-libs, which meant that the ports on the system -- all of which were compiled against the 6.2 libraries -- didn't work anymore. Note to self: see that little note "in case no 3rd party program uses them anymore"? Keep it in mind next time.

So, after about two days, I had a base system updated to 8.0, no working jails at all, and all ports -- both in the host and in the jails -- broken. At this point, I started doing two things in parallel. Note to self: don't. I started rebuilding the ports in the host system, and reconfiguring the jails to share a single base installation, with just /home, /etc, /var and /usr/local local to each jail, using nullfs mounts. I also decided to stop starting the jails from /etc/rc.local and to use the jail-launching support that is now built in (but which wasn't, as far as I know, available in 6.0, which is when I first configured the machine). Note to self: that was actually a good idea, and thanks also to Sjors, who reminded me of the jail_* variables.
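A setup along those lines might look like the following /etc/rc.conf fragment. This is a minimal sketch: the jail name "www", every path, the hostname and the address are hypothetical, and only the jail_* variable names come from the stock rc.conf defaults.

```shell
# /etc/rc.conf on the host (jail name, paths and address are hypothetical)
jail_enable="YES"
jail_list="www"
jail_www_rootdir="/jails/www"
jail_www_hostname="www.example.org"
jail_www_ip="192.0.2.10"
jail_www_devfs_enable="YES"
jail_www_mount_enable="YES"      # mount the entries from the fstab below at jail start
jail_www_fstab="/etc/fstab.www"

# /etc/fstab.www would then carry the nullfs mounts: one shared read-only
# base, plus the per-jail writable directories layered on top of it.
#   /jails/base               /jails/www            nullfs  ro  0  0
#   /jails/rw/www/etc         /jails/www/etc        nullfs  rw  0  0
#   /jails/rw/www/var         /jails/www/var        nullfs  rw  0  0
#   /jails/rw/www/home        /jails/www/home       nullfs  rw  0  0
#   /jails/rw/www/usr-local   /jails/www/usr/local  nullfs  rw  0  0
```

With this layout a later installworld only has to hit the shared base once, instead of once per jail.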

So, rebuilding ports after a big step like that is complicated by the fact that perl, ruby, php and python all needed to be recompiled and portupgrade -apP sometimes doesn't quite get it right. In any case I needed to rebuild the ruby stack first to get a working portupgrade. The other three languages were a mess, with some modules of the languages disappearing at inopportune points along the upgrade path. Basically I did portupgrade -apP ; pkgdb -F ; portinstall an awful lot until things were working again. This morning I finally got rid of the last missing PHP 5.3 modules which brought the EBN parts back to life. Note to self: read UPDATING twice before doing this again.
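The "run it an awful lot until it works" routine could be wrapped in a small retry helper like the one below. It is only a sketch: the function names are made up, the limit of five passes is arbitrary, and it assumes portupgrade exits non-zero when some port failed to build. Note that pkgdb -F may ask questions interactively, so this is not meant to run unattended.

```shell
#!/bin/sh
# retry MAX CMD...: run CMD up to MAX times and stop at the first success.
retry() {
    max="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            echo "succeeded on attempt ${attempt}"
            return 0
        fi
        echo "attempt ${attempt} failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# One rebuild pass: upgrade whatever can be upgraded, then repair the
# package database regardless, and report the upgrade's exit status.
# Assumption: portupgrade returns non-zero if any port failed.
rebuild_pass() {
    portupgrade -apP
    status=$?
    pkgdb -F
    return "$status"
}

# Usage on the real host: retry 5 rebuild_pass
```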

Of course, all that would have been less problematic if the disk array hadn't given out twice during the whole operation. Once, the ridiculously heavy load on the machine caused a panic, and once, the power on one of the disks fluctuated enough to cause another panic. Running fsck on a 600GB filesystem with 14M inodes is not quick (especially if there are a few directories with 1M files in each, as is the case with KDE SVN mirrors). Note to self: badger more people about a better disk array for KDE.

Combine all that with sickness and family time and that's why it took a week. I'm blogging this for the notes to self for the next time I run an upgrade (resolution: when FreeBSD 8.1 comes out) and to notify folks that things should be back to normal. (If not, drop me a note in the comments.) On the positive side, the server is better organized now, disk usage is down a little bit, and future upgrades should be much easier.