ZFS on FreeBSD and Benefits of Software RAID
This was an unplanned journal entry. I wasn't planning to upgrade my home server, which runs FreeBSD. Bad things seem to happen all at once: soon after I got a nasty throat infection, the server's motherboard died. Then, while I was installing the replacement motherboard, one of the mirrored disks of the main file storage device failed. Time to make lemonade, I guess.
A few lessons here:
- Always have RAID-1 or RAID-5/RAID-Z, even for workstations. In this case, no priceless family photos or videos were lost. For workstations, you don't lose any time from work, and can grab a replacement disk later.
- Software RAID is flexible on commodity hardware, which often has no one-to-one replacement at the shop a year or so after you bought it. You can usually just connect the old drives to a new motherboard, controller or another PC and it will just work. For desktop users, Fedora Linux lets you set it up via the GUI during installation. Hopefully Ubuntu will have it too, as I think it's a good thing if it's easy for home users.
- The RAID-1 of most motherboards works as it should, and you can disable the RAID setting and the drive(s) will still be easily accessible as a normal drive. Even so, as per the previous point, software RAID is what I recommend.
Time for ZFS
The two failures conspired to force this upgraded setup earlier than anticipated. FreeBSD 7.1 had problems booting from the PATA drive on the MSI KA70VM, forcing me to do a FreeBSD 8.0 binary upgrade from CD (totally trouble-free, I might add). The current best bang-for-the-buck drives are 1TB, and that size is painful with UFS2. With ZFS production-ready in 8.0, it's time for a modern storage layout.
ZFS Man (YouTube) is a funny and informative introduction to ZFS on FreeBSD.
These resources will get you going:
- http://wiki.freebsd.org/ZFS
- http://wiki.freebsd.org/ZFSQuickStartGuide
- http://en.wikipedia.org/wiki/ZFS
Some more tips here:
RAID-Z or Mirror?
Constantin Gonzalez has written an informative blog post on this.
Your options are RAID-Z, which gives more usable space per drive (so cheaper per GB) but is less flexible, or a mirror, which gives less space but is more flexible and performs better. With 6 SATA ports and an Antec P182 case with a 4 + 2 drive cage layout, a mirror makes more sense on commodity hardware when data loss matters more than space.
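To put rough numbers on the space trade-off, here is a small sketch. The function names are my own invention, and the figures ignore metadata overhead, so treat them as ballpark only.

```sh
# Rough usable capacity, in GB, for a vdev built from N equal drives.
# Ignores metadata overhead and GB-vs-GiB marketing differences.

# N-way mirror: every drive holds the same data, so capacity is one drive.
mirror_usable_gb() {
    drive_gb=$2
    echo "$drive_gb"
}

# Single-parity RAID-Z: one drive's worth of space goes to parity.
raidz_usable_gb() {
    drives=$1
    drive_gb=$2
    echo $(( (drives - 1) * drive_gb ))
}

echo "3 x 1000GB, 3-way mirror: $(mirror_usable_gb 3 1000) GB"
echo "3 x 1000GB, RAID-Z:       $(raidz_usable_gb 3 1000) GB"
```

The flip side is that a 3-way mirror survives two drive failures, while single-parity RAID-Z survives only one.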
Here is my list on why mirror makes more sense for commodity hardware:
- I don't need that much space. I don't have large media requirements for critical shared data. Non-critical data can also sit safely on my mirrored workstation drives.
- You need boot disks, which should be mirrored. Currently I'm using 2 x 80GB PATA drives, but this won't be feasible in the near future. So that leaves you with 4 SATA ports.
- Another SATA port is taken up by your DVDR drive.
- So you're left with 3 slots. With that few, it doesn't make sense for me to run RAID-Z, especially with the option of a 3-way mirror and of swapping in larger drives to upgrade the mirror seamlessly. That makes sense on a household budget, where it's hard to justify buying 5 disks.
- More drives = more heat and power usage = more noise.
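The "swap in larger drives" idea from the list above could look roughly like this. It is only a sketch: the function is hypothetical, it prints the zpool commands instead of running them, and ad4 is a made-up device name for the new drive.

```sh
# Print (not run) the steps for moving a mirror member onto a larger drive.
# Arguments: pool name, an existing mirror member, the new drive.
# Device names such as ad4 are hypothetical.
plan_mirror_upgrade() {
    pool=$1
    old=$2
    new=$3
    # Attach the new drive alongside an existing member: temporary 3-way mirror.
    echo "zpool attach $pool $old $new"
    # Check here until the output shows the resilver has completed.
    echo "zpool status $pool"
    # Only then drop the old drive out of the mirror.
    echo "zpool detach $pool $old"
}

plan_mirror_upgrade data ad0 ad4
```

Repeat for the other member. The extra space only becomes usable once every drive in the mirror is the larger size, and how the pool picks it up depends on the ZFS version, so check zpool(8) for your release.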
Since commodity drives are likely to fail anyway, I grabbed a pair of the cheapest 1TB drives available, which currently are the Samsung Spinpoint F1. Surprisingly, performance was not bad for these drives.
Setting it up
This part blew me away. ZFS rocks.
Using atacontrol list, I found that my two new drives are ad0 and ad1:
ATA channel 0:
    Master: ad0 <SAMSUNG HD103UJ/1AA01118> SATA revision 2.x
    Slave: ad1 <SAMSUNG HD103UJ/1AA01118> SATA revision 2.x
ATA channel 1:
    Master: ad2 <ST380023A/3.33> ATA/ATAPI revision 6
    Slave: ad3 <Maxtor 6L250R0/BAH41G10> ATA/ATAPI revision 7
ATA channel 2:
    Master: no device present
    Slave: no device present
ATA channel 3:
    Master: no device present
    Slave: no device present
ATA channel 4:
    Master: no device present
    Slave: no device present
ATA channel 5:
    Master: acd0 <PIONEER DVD-RW DVR-212/1.21> SATA revision 1.x
    Slave: no device present
So let's create our mirror pool:
zpool create data mirror ad0 ad1
That's it. data is the pool name I chose, and it's automatically mounted at /data (no need to mess around with fstab and such).
Let's find out our new pool status:
[kaeru@xavier ~]$ zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0

errors: No known data errors
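Since a failed disk is what started all this, a tiny health check seems worth having. This is a sketch with a function name of my own choosing. It relies on zpool status -x, which prints a single "all pools are healthy" line when nothing is wrong.

```sh
# Exit 0 if every pool is healthy, non-zero otherwise.
# Relies on 'zpool status -x', which reports only pools with problems.
pools_healthy() {
    [ "$(zpool status -x)" = "all pools are healthy" ]
}

# Usage sketch, e.g. from a cron job:
#   pools_healthy || zpool status -x | mail -s "ZFS problem on xavier" root
```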
And where it's mounted and how much space is available:
[kaeru@xavier ~]$ zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
data   105G   808G    27K  /data
...
I've snipped the output for some other mountpoints, which is why some space is already used. The pool is immediately usable like any other filesystem.
Some clarification is needed here. The pool acts as both a device and a filesystem, so by default data is the name of both the pool and its top-level filesystem.
You can already copy files to this /data filesystem, but everything in it is treated as a single partition, so you can't do fancy stuff like setting quotas, additional copies, compression and so on for individual subdirectories.
In order to do that, you need to create additional filesystems using the data pool:
zfs create data/jails
zfs set mountpoint=/jails data/jails
This is going to create a jails filesystem in the data pool, and automatically mount it as /jails. The mount command will show how this works:
mount
...
data/jails on /jails (zfs, local)
data on /data (zfs, NFS exported, local)
...
ls /data/jails will say no such file or directory, because there is no directory there. You could mkdir /data/jails if you wish, but that would be an ordinary directory, not the filesystem.
By default, without the mountpoint option, data/jails would have been automatically mounted as /data/jails. In the above example the difference between a filesystem and normal directory is clear. This difference is important when you export filesystems and wonder why /data is empty.
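Here is roughly how the per-filesystem "fancy stuff" could be set up in one go. The function and the data/docs and /docs names are hypothetical, not part of my actual layout. The mountpoint, compression and copies properties are standard ZFS ones.

```sh
# Create a dataset with its own mountpoint and per-dataset properties.
# Arguments: dataset name, mountpoint. 'data/docs' and '/docs' in the
# usage line are hypothetical names, not part of the setup in this post.
create_dataset() {
    dataset=$1
    mountpoint=$2
    zfs create -o mountpoint="$mountpoint" "$dataset" &&
    zfs set compression=on "$dataset" &&
    zfs set copies=2 "$dataset"
}

# Usage (as root):
#   create_dataset data/docs /docs
```

Setting the mountpoint with -o at creation time saves the separate zfs set mountpoint step.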
Automatic exporting of NFS/SMB shares
Exporting filesystems can now be done automatically using zfs commands:
zfs set sharenfs=on data
This will export any "children" datasets (or filesystems) automatically like data/jails:
[kaeru@xavier ~]$ showmount -e
Exports list on localhost:
/data/videos/family                Everyone
/data/videos                       Everyone
/data/photos                       Everyone
/data/music                        Everyone
/data                              Everyone
You can set better security options, of course. Back to filesystems versus directories: if you NFS mount /data on a remote PC, you won't see the contents of /data/music or /data/photos. This is because they are separate filesystems, not directories inside the /data filesystem. If you want them available as /data/music on the client, you'll have to mount them again, maybe as a nullfs mount on the server or as additional mounts on the client. Hierarchy here applies to datasets, not subdirectories, which work as in a normal POSIX filesystem. This should not be an issue in future with NFSv4 namespace support.
You can use the old way of configuring /etc/exports if you want, but I like this way better; it makes sense.
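For the "additional mounts on the client" route, something like the following could generate the mount commands. It is a sketch: the function name is mine, it only prints the commands, and it assumes export paths without spaces and that the mountpoint directories already exist on the client.

```sh
# Print one mount command per export offered by an NFS server, parents
# first, so child datasets land on top of their parent's mount.
# Assumes export paths contain no spaces; prints rather than mounts.
plan_nfs_mounts() {
    server=$1
    # awk skips the "Exports list on ..." header line and keeps the path;
    # sort puts /data before /data/music.
    showmount -e "$server" |
        awk 'NR > 1 { print $1 }' |
        sort |
        while read -r path; do
            echo "mount -t nfs $server:$path $path"
        done
}

# Usage on the client:
#   plan_nfs_mounts xavier
```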
Quotas
Similarly, there's no need to mess around with quotas in fstab anymore. One of the reasons for keeping jail directories on MD disks was to get a hard filesystem quota. With ZFS pools this is no longer an issue:
xavier# zfs set quota=100GB data/jails
xavier# zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
data                 97.1G   816G    27K  /data
data/jails           1.80G  98.2G    19K  /jails
data/jails/kaeru.my  1.80G  98.2G  1.80G  /jails/kaeru.my
data/music           55.6G   816G  55.6G  /data/music
data/photos          21.4G   816G  21.4G  /data/photos
data/videos          18.3G   816G    19K  /data/videos
data/videos/family   18.3G   816G  18.3G  /data/videos/family
The data/jails filesystem is now limited to 100GB. Next we want to limit the kaeru.my jail to 20GB:
xavier# zfs set quota=20GB data/jails/kaeru.my
xavier# zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
data                 98.8G   815G    27K  /data
data/jails           1.80G  98.2G    19K  /jails
data/jails/kaeru.my  1.80G  18.2G  1.80G  /jails/kaeru.my
The kaeru.my jail is now limited to 20GB, whereas before it inherited the jails limit of 100GB. Neat, huh? And it's no longer UFS2 on a file-backed MD disk: no more long bgfscks after unexpected reboots, and no more double overhead from a file-backed MD disk hurting performance.
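To see which quota applies where without reading the whole zfs list table, zfs get can be asked for just the quota property. A small sketch, with a wrapper name I made up:

```sh
# List the quota set on a dataset and each of its children, one per line,
# as "name<TAB>value". -r recurses into children, -H drops the header row,
# -o picks the columns to print.
show_quotas() {
    zfs get -r -H -o name,value quota "$1"
}

# Usage:
#   show_quotas data/jails
```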
There is a long list of other ZFS features, of which snapshots and the ability to send snapshots over pipes and ssh look the most interesting.
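As a taste of the snapshot-over-ssh feature, here is a minimal sketch. I haven't set this up on this server yet; the function name, the backuphost machine and the backup/photos target are all hypothetical, and it assumes a pool already exists on the receiving side.

```sh
# Snapshot a dataset and stream it to another machine over ssh.
# Arguments: dataset, snapshot label, remote host, remote dataset.
# 'backuphost' and 'backup/photos' in the usage line are hypothetical.
send_snapshot() {
    dataset=$1
    label=$2
    host=$3
    target=$4
    zfs snapshot "$dataset@$label" &&
    zfs send "$dataset@$label" | ssh "$host" zfs receive "$target"
}

# Usage (as root):
#   send_snapshot data/photos 2010-01-15 backuphost backup/photos
```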
Some tuning needed
ZFS by default tends to eat up a lot of memory, and this can result in poor performance. After a reboot, read/write performance dropped to around 5-10MB/s after several minutes of use. I had to cap the ZFS adaptive replacement cache (ARC) at 512MB on my 4GB server.
In /boot/loader.conf:
vfs.zfs.arc_max="512M"
After this change, performance was closer to the limit of the drives and stayed there.
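To confirm the cap took effect after the reboot, the values can be read back with sysctl. A sketch with a helper name of my own; the two sysctl names in the usage lines are the ones FreeBSD's ZFS exposes for the ARC ceiling and its current size.

```sh
# Read a byte-valued sysctl and print it in whole megabytes.
sysctl_mb() {
    echo $(( $(sysctl -n "$1") / 1048576 ))
}

# Usage on the server:
#   sysctl_mb vfs.zfs.arc_max                 # configured ceiling
#   sysctl_mb kstat.zfs.misc.arcstats.size    # current ARC size
```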
FreeBSD 8.0 Errata
FreeBSD 8 has a ton of new features, which will take a long time to explore. The good thing is that the performance features are immediately available such as the new scheduler. Here are some of the errata:
- Dummynet, used for bandwidth shaping, seems to have some bugs, especially the "dummynet: OUCH! pipe should have been idle!" messages, but patches are available: http://www.mail-archive.com/freebsd-ipfw@freebsd.org/msg02261.html
- Wifi setup has changed a bit; you now need to set up wlan pseudo-devices.
- Jails have new functions and command options, including multiple IPs per jail, IPv6, jails within jails and network stack virtualization.
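For the wlan change in the list above, the new-style /etc/rc.conf entries look something like this. ath0 is an assumed device name for the physical wireless card, so substitute whatever your driver calls it. The WPA keyword also assumes a matching /etc/wpa_supplicant.conf is in place.

```sh
# /etc/rc.conf fragment: create wlan0 on top of the physical wireless NIC.
# 'ath0' is an assumed driver/device name; substitute your own.
wlans_ath0="wlan0"
# Bring wlan0 up with WPA (via wpa_supplicant) and get an address by DHCP.
ifconfig_wlan0="WPA DHCP"
```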

