Adventures with ZFS and Time Machine part 3

Well, really just adventures with ZFS this time.

About a year ago I had a disk fail—just a couple of weeks within the 5 year warranty—and when I got the replacement, ZFS dutifully resilvered the RAIDZ set. Then, recently, I had another disk (even older) fail, in another machine. And then I had another (different) disk fail in the RAIDZ disk. I was beginning to wonder what was going on (the computers are on a filtered power supply by the way).Well, to cut a long story short, it was actually the USB enclosure that failed (as far as I can tell). This led be to become acquainted with just how slow USB is.

Everyone says not to use USB for ZFS, mostly because a number of USB controllers lie about write integrity, which can cause even ZFS’s bullet-proof write system to fail. However, there are other reasons to be concerned about USB, namely performance, especially with scrub.

Given this, why use USB? Well, price and availability. A few years ago, USB + FireWire enclosures were reasonably common. Now, USB is just about all that is available unless you go to high end RAID—and I’m trying to let ZFS do the job here, rather than buy a RAID controller, so I would end up paying a lot more for something I don’t need. Of course, if I had a tower case with lots of drive bays I could just plug them in there, but I don’t. At the moment the ZFS machine is a Mac Mini, a splendid little machine, but one with FW800 and USB 2.0 only, and with its form-factor, no space for extra internal drives.

USB speed

So, after that diversion, back to my ‘discoveries’ about USB. USB 2.0 is nominally 480Mbps. Except that this is nowhere near the actual rate. The web informs me that normal USB 2.0 disk performance is 20-25MBps (160-220Mbps), or around 40% of the theoretical speed of USB 2.0. I knew that USB was slower than FireWire, but I had not realised quite how slow.

In fact, my measured speed on my current disks seems to be about 20MBps sustained. Note that this is only about 10% of the speed of even a consumer SATA disk. That has serious consequences for a number of operations

Disk Formatting

It’s quite a good idea to write zeros to every block of even a new disk. This causes the disk firmware to check every sector (and allocate spare sectors if necessary). This is also a way to check an older disk. With current disk sizes of 1, 2 and 3TB, this is going to take a long time using USB.

Time to format (hours)
1TB 2TB 3TB
USB 2.0 14.5 29 44
SATA 1.5 3 4.4

That’s also the amount of time it’s going to take to copy that much information to or from the disk.

Scrubbing/Resilvering

From the above, it’s obvious that a resilver is also going to take a very long time, although assuming that the ZFS pool is not full, it will take proportionally less time. A RAIDZ that is 75% full will take 22 hours to replace a 2TB drive.

That leads us to scrub. Normal ZFS best practice is to zpool scrub once a week for consumer grade drives, which should catch any errors while there is still time to do something about it. What scrub does is read every in-use block, on every disk in the pool, and compare it to a calculated checksum. If there is a mismatch zpool reports the error and tries to rewrite the block (which has a good chance of fixing the problem if the disk firmware remaps an ailing sector). That means, for a RAIDZ pool, a scrub will take (capacity used)/(disks in pool)/(sustained read speed). For USB 2.0, that means a long time. By comparison, SATA attached disks will be done in 1/10 the time.

There is an extra gotcha. If you are using MacZFS, there is a bug (fixed in later versions of ZFS) that restarts the scrub/resilver if you do a snapshot on the pool during said scrub/resilver. If, for instance, you take hourly snapshots…your scrub is never going to finish.

That being the case, it is a really good idea to modify any automatic scripts to check to see if a scrub or resilver is in progress and refrain from snapshotting if it is. The following Ruby fragment may prove useful.

Ruby fragment to check if a ZFS filesystem is busy

BIN_DIR = ‘/usr/sbin/’
ZPOOL_CMD = “#{BIN_DIR}zpool”

def busy?(name)
pool = name.match(‘^[^/]+’)[0]
IO.popen(“#{ZPOOL_CMD} status #{pool}”).any? { |l| l[‘scrub in progress’] or l[‘resilver in progress,’]}
end