About the Author

Khairil believes that one can earn a decent living and enjoy life while enriching the wider community.

 

He currently works as an IT consultant for Inigo, and advocates FOSS, free knowledge and culture.

 


All content on this site is shared under the Creative Commons Attribution License 3.0.

 

This is a personal site; comments and content here do not reflect the official viewpoint of Inigo Consulting.


ZFS on FreeBSD and Benefits of Software RAID


This was an unplanned journal entry. I wasn't planning to upgrade and update my home server, which runs FreeBSD. Bad things seem to happen all at once: soon after I got a nasty throat infection, my home server's motherboard died. During installation of the replacement motherboard, one of the mirrored disks of the main file storage device failed. Time to make lemonade, I guess.

A few lessons here:

  • Always have RAID-1 or RAID-5/RAID-Z, even for workstations. In this case, no priceless family photos or videos were lost. For workstations, you don't lose any time from work, and can grab a replacement disk later.
  • Software RAID is flexible for commodity hardware, which often does not have 1 to 1 replacements at the shop a year or so after you bought it. You can usually just connect the old drives to a new motherboard, controller or another PC and it will just work. For desktop users, Fedora Linux lets you set it up via the GUI during installation. Hopefully Ubuntu will have it too, as I think it's a good thing if it's easy for home users.
  • The RAID-1 of most motherboards works as it should, and you can disable the RAID setting and the drive(s) will still be easily accessible as a normal drive. As per the previous point, software RAID is recommended.

Time for ZFS

http://kaeru.my/journal/images/zfs-man.jpg

The two failures conspired to force this upgraded setup earlier than anticipated. FreeBSD 7.1 had problems booting up on the MSI KA70VM from the PATA drive, forcing me to do a FreeBSD 8.0 binary upgrade from CD (totally trouble free, I might add). The current best bang-for-the-buck drives are 1TB, which is painful with UFS2. With ZFS production ready on 8.0, it's time for a modern storage layout.

ZFS Man (YouTube) is a funny and informative introduction to ZFS on FreeBSD.

These resources will get you going:

Some more tips here:

RAID-Z or Mirror?

Constantin Gonzalez has written an informative blog post on this.

Your options are RAID-Z, which gives more usable space per drive at lower cost but is less flexible, or a mirror, which gives less space but is more flexible and performs faster. With 6 SATA ports and an Antec P182 case with 4 + 2 drive cages, a mirror makes more sense on commodity hardware where avoiding data loss matters more than space.

Here is my list on why mirror makes more sense for commodity hardware:

  • I don't need that much space. I don't have large media requirements for critical shared data. Non-critical data can also sit safely on my mirrored workstation drives.
  • You need boot disks, which should be mirrored. Currently I'm using 2 x 80GB PATA drives, but this won't be feasible in the near future. So that leaves you with 4 SATA ports.
  • Another SATA port is taken up by your DVDR drive.
  • So you're left with 3 slots. With that few, RAID-Z doesn't make sense for me, especially given the option of a 3-way mirror and of swapping in larger drives to upgrade the mirror seamlessly. That suits a household budget, where it's hard to justify buying 5 disks.
  • More drives = more heat and power usage = more noise.
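
To put rough numbers on the space trade-off for those 3 slots, here is a small sketch. It assumes three identical 1TB drives and ignores ZFS metadata overhead, so the figures are approximate; the function names are made up for this example.

```sh
#!/bin/sh
# Approximate usable capacity in GB for N identical drives.
# Arguments: $1 = number of drives, $2 = size of each drive in GB.

# Mirror: every drive holds the same data, so you get one drive's worth.
mirror_capacity() { echo "$2"; }

# RAID-Z (single parity): one drive's worth of space goes to parity.
raidz1_capacity() { echo $(( ($1 - 1) * $2 )); }

drives=3
size_gb=1000

echo "3-way mirror: $(mirror_capacity "$drives" "$size_gb") GB usable, survives $((drives - 1)) failed drives"
echo "RAID-Z:       $(raidz1_capacity "$drives" "$size_gb") GB usable, survives 1 failed drive"
```

So RAID-Z on three drives would give roughly double the space, while the 3-way mirror tolerates one more failure.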

Since commodity drives are likely to fail anyway, I grabbed a pair of the cheapest 1TB drives available, which currently are the Samsung Spinpoint F1. Surprisingly, performance was not bad for these drives.

Setting it up

This part blew me away.. ZFS rocks.

Running atacontrol list shows that my two new drives are ad0 and ad1:

ATA channel 0:
    Master:  ad0 <SAMSUNG HD103UJ/1AA01118> SATA revision 2.x
    Slave:   ad1 <SAMSUNG HD103UJ/1AA01118> SATA revision 2.x
ATA channel 1:
    Master:  ad2 <ST380023A/3.33> ATA/ATAPI revision 6
    Slave:   ad3 <Maxtor 6L250R0/BAH41G10> ATA/ATAPI revision 7
ATA channel 2:
    Master:      no device present
    Slave:       no device present
ATA channel 3:
    Master:      no device present
    Slave:       no device present
ATA channel 4:
    Master:      no device present
    Slave:       no device present
ATA channel 5:
    Master: acd0 <PIONEER DVD-RW DVR-212/1.21> SATA revision 1.x
    Slave:       no device present

So let's create our mirror pool:

zpool create data mirror ad0 ad1

That's it. data is the pool name I used, and it's automatically mounted at /data (no need to mess around with fstab and such).

Let's find out our new pool status:

[kaeru@xavier ~]$ zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      mirror    ONLINE       0     0     0
        ad0     ONLINE       0     0     0
        ad1     ONLINE       0     0     0

errors: No known data errors
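
Growing this pool into the 3-way mirror mentioned earlier could look something like the sketch below. The device name ad4 is hypothetical (it is not one of my drives), and the script only prints the commands unless you set DRY_RUN=0, so treat it as an outline rather than something I have run on this box.

```sh
#!/bin/sh
# Dry-run helper: prints the command by default.
# Set DRY_RUN=0 (as root, on the machine that has the pool) to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

POOL=data
EXISTING=ad0   # a disk that is already part of the mirror
NEW=ad4        # hypothetical device name for the third disk

# Attach the new disk to the existing mirror; ZFS resilvers it automatically.
run zpool attach "$POOL" "$EXISTING" "$NEW"

# Check resilver progress and confirm all disks show ONLINE.
run zpool status "$POOL"

# Verify every block in the pool against its checksum.
run zpool scrub "$POOL"
```

The scrub is also what the "scrub: none requested" line in the status output refers to.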

And where it's mounted and how much space is available:

[kaeru@xavier ~]$ zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
data                  105G   808G    27K  /data
...

I've snipped some other mountpoints from the output here, hence some space is already used. This is immediately usable like any other filesystem.

Here is where some clarification is needed. The pool can act both as a device and filesystem. So by default data is the name of the pool and also the filesystem.

You can already copy files and such to this /data filesystem. However, everything in it will be treated as if it's a single partition, so you can't do fancy stuff like setting quotas, additional copies, compression and so on for subdirectories.

In order to do that, you need to create additional filesystems using the data pool:

zfs create data/jails
zfs set mountpoint=/jails data/jails

This is going to create a jails filesystem in the data pool, and automatically mount it as /jails. The mount command will show how this works:

mount
...

data/jails on /jails (zfs, local)
data on /data (zfs, NFS exported, local)

...

ls /data/jails is going to say no such file or directory, because there is no directory there. You could mkdir /data/jails if you wish, but that would be a plain directory, not the filesystem.

By default, without the mountpoint option, data/jails would have been automatically mounted as /data/jails. In the above example the difference between a filesystem and normal directory is clear. This difference is important when you export filesystems and wonder why /data is empty.
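
Once things are split into separate filesystems, the per-filesystem options mentioned above (compression, additional copies) can be set individually. A minimal sketch: data/docs is a made-up dataset name, and the commands are only printed unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Dry-run helper: prints the command by default, runs it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a new filesystem with compression enabled from the start.
# data/docs is a hypothetical dataset used only for this example.
run zfs create -o compression=on data/docs

# Keep two copies of each block for an existing filesystem.
# This only applies to data written after the property is set.
run zfs set copies=2 data/photos

# Show the resulting property values for both filesystems.
run zfs get compression,copies data/docs data/photos
```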

Automatic exporting of NFS/SMB shares

Exporting filesystems can now be done automatically using zfs commands:

zfs set sharenfs=on data

This will export any "children" datasets (or filesystems) automatically like data/jails:

[kaeru@xavier ~]$ showmount -e
Exports list on localhost:
/data/videos/family                Everyone
/data/videos                       Everyone
/data/photos                       Everyone
/data/music                        Everyone
/data                              Everyone

You can set better security options, of course. Back to filesystems vs directories: if you NFS mount /data on a remote PC, you won't see /data/music or /data/photos. This is because they are separate filesystems, not directories inside the /data filesystem. If you want them available as /data/music on the client, you'll have to mount them again, maybe as a nullfs mount on the server or as additional mounts on the client. Hierarchy here applies to datasets, not subdirectories, which work like a normal POSIX filesystem. This should not be an issue in future with NFSv4 namespace support.
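
For the "additional mounts on the client" option, something along these lines should do. This is a sketch for a FreeBSD client, assuming the server is reachable as xavier (the hostname from my shell prompt) and that you want the same paths on the client. Commands are only printed unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Dry-run helper: prints the command by default, runs it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

SERVER=xavier   # assumed hostname of the NFS server

# Mount the parent export first, then each child dataset on top of it.
# The /data directory must already exist on the client.
mount_all() {
    for path in /data /data/music /data/photos /data/videos /data/videos/family; do
        run mount -t nfs "$SERVER:$path" "$path"
    done
}

mount_all
```

The order matters: each child filesystem is mounted over the empty directory that the parent export provides.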

You can use the old way of configuring /etc/exports if you want, but I like this way better; it makes sense.

Quotas

Similarly, there's no need to mess around with quotas in fstab anymore. One of the reasons for having jail directories on MD disks was to get a hard filesystem quota. With ZFS pools this is no longer an issue:

xavier# zfs set quota=100GB data/jails
xavier# zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
data                 97.1G   816G    27K  /data
data/jails           1.80G  98.2G    19K  /jails
data/jails/kaeru.my  1.80G  98.2G  1.80G  /jails/kaeru.my
data/music           55.6G   816G  55.6G  /data/music
data/photos          21.4G   816G  21.4G  /data/photos
data/videos          18.3G   816G    19K  /data/videos
data/videos/family   18.3G   816G  18.3G  /data/videos/family

The data/jails filesystem is now limited to 100GB, and now we want to limit the kaeru.my jail to 20GB:

xavier# zfs set quota=20GB data/jails/kaeru.my
xavier# zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
data                 98.8G   815G    27K  /data
data/jails           1.80G  98.2G    19K  /jails
data/jails/kaeru.my  1.80G  18.2G  1.80G  /jails/kaeru.my

The kaeru.my jail is now limited to 20GB, whereas before it inherited the jails limit of 100GB. Neat, huh? Oh, and it's no longer UFS2 or a file-backed MD disk: no more long bgfsck runs after unexpected reboots, and no more double overhead from a file-backed MD disk hurting performance.

There is a long list of other ZFS features, of which snapshots and the ability to send snapshots over pipes and ssh look the most interesting.
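
As a taste of the snapshot side, here is a sketch that prints the commands for taking a dated snapshot and sending it to another machine over ssh. backuphost and backup/photos are made-up names, and the script only prints the commands rather than running them.

```sh
#!/bin/sh
# Build a snapshot name of the form dataset@YYYY-MM-DD.
snapshot_name() { echo "$1@$(date +%Y-%m-%d)"; }

DATASET=data/photos
BACKUP_HOST=backuphost        # hypothetical remote machine with a ZFS pool
BACKUP_DATASET=backup/photos  # hypothetical dataset on that machine
SNAP=$(snapshot_name "$DATASET")

# Print the commands instead of running them, so this is safe anywhere.
cat <<EOF
zfs snapshot $SNAP
zfs list -t snapshot
zfs send $SNAP | ssh $BACKUP_HOST zfs receive $BACKUP_DATASET
EOF
```

The last line is the interesting one: zfs send writes the snapshot to stdout, so it can go through any pipe, ssh included.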

Some tuning needed

ZFS by default tends to eat up a lot of memory, and this can result in poor performance. After a reboot, r/w performance dropped to around 5-10MB/s after several minutes of use. I had to reduce the ZFS adaptive replacement cache (ARC) usage to 512MB on my 4GB server.

In /boot/loader.conf:

vfs.zfs.arc_max="512M"

After this change, performance was closer to the limit of the drives and stayed there.
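
To check that the setting took effect after a reboot, you can compare it with what the kernel reports. The sysctl prints the value in bytes, so this sketch converts the loader.conf-style size first. The to_bytes helper is made up for this example, and the sysctl part is simply skipped on machines without ZFS.

```sh
#!/bin/sh
# Convert a size such as 512M or 2G to bytes (binary units).
to_bytes() {
    case "$1" in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

want=$(to_bytes 512M)
echo "configured arc_max: $want bytes"

# On the FreeBSD server, ask the kernel for the limit it is actually using.
# Elsewhere the sysctl does not exist, so the value is empty and we skip it.
have=$(sysctl -n vfs.zfs.arc_max 2>/dev/null || true)
if [ -n "$have" ]; then
    echo "kernel arc_max:     $have bytes"
fi
```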

FreeBSD 8.0 Errata

FreeBSD 8 has a ton of new features, which will take a long time to explore. The good thing is that the performance features are immediately available such as the new scheduler. Here are some of the errata:

  • Dummynet, used for bandwidth shaping, seems to have some bugs, especially the "dummynet: OUCH! pipe should have been idle!" messages, but patches are available: http://www.mail-archive.com/freebsd-ipfw@freebsd.org/msg02261.html
  • Wifi setup has changed a bit; you need to set up wlan pseudo devices now.
  • Jails have new functions and command options, including multiple IPs per jail, IPv6, jails within jails and network stack virtualization.
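
For the wifi change, the new setup in /etc/rc.conf looks roughly like this. The ath0 driver name is an assumption for the example; substitute whatever your wireless card shows up as.

```sh
# /etc/rc.conf fragment (plain sh variable assignments).
# ath0 is an assumed wireless driver name, not taken from my server.

# Create the wlan0 pseudo device on top of the physical ath0 device.
wlans_ath0="wlan0"

# Configure wlan0 with WPA (via wpa_supplicant) and DHCP.
ifconfig_wlan0="WPA DHCP"

# One-off equivalent from the command line:
#   ifconfig wlan0 create wlandev ath0
```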
