Monday 31 August 2009

Home network rebuild, part two: Virtual Hosts

Having previously got my new fileserver/NAS box up and running, I've now moved on to splitting the roles of my previous monolithic server out into several virtual servers.

Virtual Servers
The first server (not actually virtual) hosts the basic network infrastructure required to get everything else running: NFS for various filesystems (not least of all home dirs), AoE for network-attached disk space, DNS, DHCP and TFTP for booting clients, and NTP for network time. On top of that I'll probably set up networked syslog there too.
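As a sketch, that infrastructure box needs surprisingly little configuration. The snippet below is hypothetical (addresses, paths and filenames are all examples, and using dnsmasq is my illustration rather than anything settled) but shows DHCP, DNS and TFTP coming from one daemon, plus the NFS export for home directories:

```
# /etc/dnsmasq.conf -- hypothetical snippet; dnsmasq serves
# DHCP, DNS and TFTP from a single daemon
dhcp-range=192.168.1.50,192.168.1.150,12h
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/srv/tftp

# /etc/exports -- NFS export for home directories
/srv/nfs/home  192.168.1.0/24(rw,sync,no_subtree_check)
```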

The second server is the phone system -- Asterisk. It's going onto a virtual machine of its own because having functioning phones is important, and I'll probably install it, configure it and then keep updates to a bare minimum.

Then we get onto the two fun boxes. One is the main, stable server. It runs Debian lenny (stable) and hosts mail (exim and dovecot), apache, subversion, squid, network monitoring, music streaming, printing, and the like. The second is running Debian squeeze (though it'll probably become sid at some point) and is for more bleeding edge stuff.

The idea is that if the version in lenny isn't up to my needs, instead of back-porting or installing individual packages from unstable as I used to, I can simply install it on the unstable box. It's also intended to be a bit more of a play area for ideas. If I find myself actually needing the unstable version of dovecot, say, I'll probably spin it off onto its own box.

There are a few remaining areas which are going to present a problem. One is MythTV: it requires hardware (the tuner cards) and often bleeding-edge releases of various things. At the moment virtualizing it doesn't seem like a win, so it will stay on the old server for the time being and eventually, perhaps, be shifted onto a spare Via M10000 board I have, powered up and down for recordings with wake-on-lan packets. We'll see.

Implementation
The two physical servers I have are the new Atom 330 based NAS box and my old dual-core Athlon 3800 box. Each has a couple of gigs of memory and a gigabit Ethernet connection. The Atom box won't support kvm, but kqemu seems to run fine on it. In fact, I've found the version of kvm I'm running (kvm85 on linux-2.6.30 amd64) slightly unstable and prone to crashing and locking up, so I'm using qemu (with kqemu) on both servers at the moment.
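A quick way to check whether a given box can run kvm at all is to look for the hardware virtualisation flags in /proc/cpuinfo:

```shell
# Count CPU flags advertising hardware virtualisation:
# vmx = Intel VT-x, svm = AMD-V. kvm needs one of these,
# and the Atom 330 reports neither, hence kqemu there.
egrep -c '(vmx|svm)' /proc/cpuinfo
```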

I've handled networking using bridging. For storage, having experimented with a few approaches, I've gone with creating an LVM volume holding the basic operating system for each host and presenting that to the VM as /dev/hda; any data partitions (e.g. mail stores, home directories) are separate LVM volumes accessed from within the guest over AoE.
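As a sketch of that arrangement (the volume group "vg0" and the volume and interface names are invented for illustration), the physical host carves out a data volume and exports it with vblade, and the guest attaches it with the aoe driver:

```shell
# On the physical host: create a data volume and export it
# over AoE as shelf 0, slot 1 on eth0 (names are examples).
lvcreate -L 20G -n mailstore vg0
vbladed 0 1 eth0 /dev/vg0/mailstore

# Inside the guest: load the aoe driver, discover exported
# shelves, and mount the device that appears.
modprobe aoe
aoe-discover
mount /dev/etherd/e0.1 /srv/mail
```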

Having these partitions available to all hosts while they're up, rather than having to attach them to a machine as virtual devices, makes managing them easier. I did play with booting the whole virtual machine over AoE using pxegrub, but found the complexity of managing the vmlinuz and initrd images externally to the host running them (they have to be external so they can be served up by TFTP) outweighed any benefits.

So far, since switching to kqemu instead of kvm, they seem to be up and running and performing well enough for the tasks they have. The original server is steadily being stripped of its responsibilities and services; once I've built an Asterisk virtual machine it will be taken down, stripped of its disks and rebuilt as a minimal physical host for running kvm/qemu images. Plus that inconvenient MythTV server, of course.


Future Plans
The main thing I'm planning to get around to at some point is sorting out live migration of virtual machines. I'd like the virtual hosts all to sit on the low-powered Atom server for much of the time but, as other machines come up, migrate over to them. So, for example, when I fire up my quad-core Athlon box it'd be nice to have the two main servers migrate transparently onto it for increased speed. With wake-on-lan configured, and an appropriate shutdown script to migrate them back, I could have CPU power on demand without actually having to deal with downtime.
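qemu of this vintage can already do this by hand, so a first cut might look something like the sketch below. The MAC address, hostname, port and disk path are all examples, and it assumes the guest's disk is reachable from both physical hosts (which keeping data on network-accessible LVM volumes provides):

```shell
# Hypothetical sketch of a manual live migration.

# Wake the fast box with a wake-on-lan magic packet
# (etherwake takes the target NIC's MAC address):
etherwake 00:11:22:33:44:55

# On the fast box, start a qemu instance waiting to
# receive the migration stream:
qemu -m 512 -hda /dev/etherd/e0.1 -incoming tcp:0:4444

# Then, in the qemu monitor of the guest running on the
# Atom box, push it across:
#   (qemu) migrate tcp:fastbox:4444
```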

A variation on this would be to have the various desktop machines around the house boot into a virtual-hosting shell and grab their (or any other) virtual machine from the network. I could then power off my desktops for much of the time without losing session state (including open network connections) and remote machine access.

Why?
None of this seems like an immediately obvious Big Win, and in truth a large part of the reason for doing it is for its own sake, so that I have up-to-date, hands-on experience of these things.

All that said, there are real advantages I'm already seeing at this early stage. The virtual machines are all hosted on LVM volumes. When I perform an upgrade I can first snapshot the volume and if it all goes wrong, roll back to the previously working state. There have been times when I've really wished I could have done that with a buggy upgrade.
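For example (volume names are invented), the snapshot-and-rollback cycle looks something like this. As far as I know the LVM tools of this era have no in-place snapshot merge, so rolling back means copying the snapshot's pre-upgrade contents back over the origin:

```shell
# Sketch: snapshot a guest's root volume before an upgrade.
# The snapshot needs enough space to absorb the writes made
# during the upgrade.
lvcreate -s -L 2G -n server1-preupgrade /dev/vg0/server1

# If it all goes wrong, shut the guest down and restore the
# pre-upgrade state from the snapshot:
dd if=/dev/vg0/server1-preupgrade of=/dev/vg0/server1 bs=1M

# Either way, remove the snapshot when done:
lvremove /dev/vg0/server1-preupgrade
```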

Also, having the main parts of my network infrastructure on virtual hosts means that I can upgrade them, repartition them and generally carry out all the sorts of low-level maintenance which previously meant crawling about in the attic swearing at scratched CD-ROMs and buggy BIOSes, all from the comfort of a laptop in the garden.

For a home network, virtualization is perhaps not as exciting a prospect as in a large data centre and the advantages of scale are lost, but there are still some more mundane advantages that I'm looking forward to.

Home network rebuild, part one: Storage

Over the past couple of months (a period dictated more by lack of time than anything technical) I have been undergoing some fairly major rebuilding of the house server infrastructure.

Background
Up until July we had a server in the attic with a terabyte of RAID 5 storage (Linux software RAID on SATA disks) which ran virtually everything except the routing tasks, which were moved onto an OpenWRT-based Netgear WGT634u some time ago.

In July I needed to upgrade the disk space and took the opportunity to build a new server and explore the possibility of using ATA over Ethernet (AoE) and moving some of my services onto virtual hosts. The server I bought was a TranquilPC BBS2, a low-power, Atom 330 based thing with three hot-swappable SATA drive bays and gigabit Ethernet. Being an Atom it won't support things that require the AMD-V or VT-x extensions (e.g. kvm), but it should be good for lightweight kqemu-based virtual machines. To this machine I added three 1TB Western Digital WD10EADS drives.

The BBS2 is a nice piece of kit, originally intended for use as a Windows Home Server. It runs Linux very nicely though, and I went about setting it up as a Debian server. The idea was that it'd have a 1GB RAID 1 root partition mirrored across two drives and the rest as a chunk of RAID 5 LVM space. If I had been bothered about performance I could have sliced it into RAID 0, RAID 1 and RAID 5 chunks for different purposes, but it's not something I really care about at home.
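That layout, sketched with mdadm (the partition names are examples and assume each disk carries a small first partition and a large second one; the volume group name matches the /dev/array devices in the hdparm figures):

```shell
# 1GB RAID 1 root mirrored across two of the drives:
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
    /dev/sda1 /dev/sdb1

# RAID 5 across the remainder of all three:
mdadm --create /dev/md1 --level=5 --raid-devices=3 \
    /dev/sda2 /dev/sdb2 /dev/sdc2

# LVM on top of the RAID 5 chunk:
pvcreate /dev/md1
vgcreate array /dev/md1
```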

BBS2 Installation
Debian pretty much installed out of the box, complete with my LVM and RAID requirements. I booted from a flash stick and everything worked fine until I rebooted, when it couldn't find a bootable device. I booted into rescue mode and accidentally installed a boot-block to the flash stick which at least got me up and running.

I burned several hours trying to get it to boot from the internal disks, because grub is known to be slightly finicky about the BIOS order of devices versus the operating system order and I wondered if I'd mucked that up. But no, it seems the problem is simply that the hot-swap disks on the SiI3124 controller aren't bootable. Damn.

For a while I booted it off the USB stick with a grub install, but for the longer term I got a CompactFlash-to-SATA adapter and a spare 128MB CF card and stuck that in as a bootable disk. It seems to work, and having yanked out each of the RAID disks in turn, the box still boots. Success!

I set up 5GB LVM volumes for /usr and /var and a 100GB home partition which I installed as /srv/nfs/home. Most of the machines around the house already use autofs to mount /home from either /var/export/home on the local machine or over NFS. It seems /srv has become an official part of the FHS since I last looked so I'm switching from /var/export to /srv/nfs. Finally I also set up a /srv/backup for my rsync/hardlink based online backup system.
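As a sketch (sizes from the above; the filesystem choice and NFS server name are examples), the volumes and the client-side autofs map look something like:

```shell
# Carve out the volumes in the "array" volume group:
lvcreate -L 5G -n usr array
lvcreate -L 5G -n var array
lvcreate -L 100G -n home array

# Filesystem choice here is illustrative:
mkfs.ext3 /dev/array/home
mount /dev/array/home /srv/nfs/home

# Hypothetical autofs map so clients pick up home
# directories over NFS ("nas" is an example hostname);
# e.g. in /etc/auto.home:
#   *   nas:/srv/nfs/home/&
```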

The idea for this box is that as a low-power server it will be on all the time, hosting the basic network services I need to boot anything else -- disk space (NFS and AoE), DHCP, DNS, TFTP and NTP. Everything else can live on virtual machines hosted either on the BBS2 or, if necessary, on a more high-powered machine.

Initial tests seem to show that the disks manage real-world performance (on top of RAID 5 and LVM) of ~30MB/s which, though a bit disappointing, is good enough for my purposes. For those interested, the hdparm -t results for various devices are:

Raw hard disk (/dev/sda): ~75MB/s
Raw RAID 1 (/dev/md0): ~70MB/s
Raw RAID 5 (/dev/md1): ~35MB/s
LVM/RAID 5 (/dev/array/test): ~28MB/s

I'm actually quite surprised by the loss in performance from raw disk to RAID 5, and that might be something I investigate at some point. I don't actually need the performance to be better but it bothers me.

Edit: The problem was that the RAID 5 array had lost a disk. It wasn't rebuilding but the performance still suffered, interestingly. Rebuilding occurred at ~30MB/s when I kicked it off and the performance this morning is ~85MB/s for both raw RAID 5 and LVM/RAID 5.

Disclaimer: I have nothing to do with TranquilPC. When I started thinking about this project I liked the idea of something prebuilt for a change, particularly given that I could then shop based on whole-system power consumption. A quick search for low-powered servers turned up TranquilPC as highly rated, quite cheap, small and independent, and, while not officially supporting Linux, at least very open-minded about it. So far I'm quite happy with my purchase.