Fix for Time Machine “Backup verification incomplete!”

I was getting some issues with Time Machine, where, after backing up, it would always attempt a verification and then complain that the verification was incomplete. These messages were only visible in the system log—no alert was popped up.

Sep  9 08:37:30 XXXXX.local com.apple.backupd[72558]: Verifying backup disk image.
Sep  9 08:37:34 XXXXX.local com.apple.backupd[72558]: Backup verification incomplete!

The backup is to an OS X Server, and the TM backup is kept in a disk image, called Machine-Name.sparsebundle.

I thought that I would manually verify the backup and opened up Disk Utility. There I noticed that the Machine-Name.sparsebundle was already in the sidebar, even though the remote volume was not mounted. I then noticed that the path to the volume was

/Root Volume Name/private/var/db/com.apple.backupd.backupVerification/Machine-Name.sparsebundle

Aha!

Closer investigation showed that /private/var/db/com.apple.backupd.backupVerification contained a slightly modified copy of an out-of-date version of the Time Machine disk image.

% sudo ls -lhFG /private/var/db/com.apple.backupd.backupVerification/Machine-Name.sparsebundle/
total 24
-rw-r--r--  1 root  wheel    11B  5 Jul 22:18 HC.progress.txt
-rw-r--r--  1 root  wheel   500B  5 Jul 22:15 Info.bckup
-rw-r--r--  1 root  wheel   500B  5 Jul 22:15 Info.plist
drwxr-xr-x  3 root  wheel   102B  5 Jul 23:49 bands/
-rw-r--r--  1 root  wheel     0B  5 Jul 22:15 token

Looking closely at this revealed that the disk image was corrupted, so I simply removed it

% sudo rm -rf /private/var/db/com.apple.backupd.backupVerification/Machine-Name.sparsebundle

and then did a manual verification (hold down option while selecting the Time Machine menu and choose Verify Backups). When the verification finished (successfully), the directory was left empty.

So, in summary, if you end up with the “Backup verification incomplete!” message, try deleting /private/var/db/com.apple.backupd.backupVerification/Machine-Name.sparsebundle and see if that fixes it.

Adventures with ZFS and Time Machine part 4

This should really be part 5, but part 4 was uneventful enough at the time that I didn’t blog it.

The MacPro upgrade

The main family computer used to be a first generation Mac Pro (model MacPro1,1). This did splendid service for many years, until Mountain Lion (OS X 10.8) came out. The MacPro1,1 was one of a number of models stuck on Lion (OS X 10.7). That was OK, but the machine was also starting to need more expansion than I could afford. Where 5GB had once seemed fine, it was now starting to be a little tight. Alas, the MacPro1,1 uses special FB-DIMMs (DDR2), which are eye-wateringly expensive compared to modern RAM pricing. So, eventually, a shiny 27” iMac was purchased and the MacPro became the Time Machine server.

That was effortless, and worked perfectly. Given its internal SATA bays, it was much easier to fill it up with 3 disks (plus a fourth for the operating system), which worked a lot faster than the previous FireWire/USB setup.

O3X

(Or part 3.5)

The new server worked fine, with only minor annoyances from MacZFS no longer being actively developed, and the odd interaction between AFP and MacZFS.

Then, along came the OpenZFS project, and O3X in particular. I tried out O3X on the iMac (with Mavericks, OS X 10.9) and it worked well. It promised to be faster; to have modern ZFS features; to play nicely with AFP; and it seemed to be stable enough to use for the backups. So, I decided to transition the Time Machine server to O3X.

This has proved to be a little more exciting than I expected.

O3X on Lion

The current (as of writing this blog) version of O3X is 1.2.7. This works on OS X versions up to 10.9—and beyond, to the as-yet-unreleased Yosemite (10.10). However, it requires a 64-bit kernel. The last version of O3X that supports a 32-bit kernel is 1.2.0—and that lacks enough features and bug fixes that I’m not happy using it.

While Lion supports a 64-bit kernel (so O3X will happily run under Lion), the MacPro1,1 won’t run a 64-bit kernel. Well, not without some hacking.

Hackintosh

EFI

The MacPro1,1 has a perfectly fine 64-bit processor. However, the EFI firmware that allows it to boot is only 32-bit, so although Lion fully supports 64-bit apps, the kernel itself, along with its low-level drivers, is only 32-bit on 32-bit EFI machines. Since ZFS is a kernel extension, I need either a 32-bit O3X (1.2.0) or a 64-bit OS X kernel.

NVIDIA GeForce 7300 GT

The MacPro1,1 shipped with a number of video card options. The base (and the card I have) is the NVIDIA GeForce 7300 GT.

This card lacks the horsepower to support some of the heavy lifting that Mountain Lion and later offload to the GPU, so it is unsupported by 10.8 and beyond, even in an otherwise supported Mac. It also lacks 64-bit firmware, and thus the ability to fully function in 64-bit mode.

boot.efi

The hacker community has come up with a solution to the EFI problem, in the form of Tiamo’s shim boot.efi, which translates between 32- and 64-bit calls. This allows a MacPro1,1 to boot and run Mavericks and even Yosemite.

That leaves the problem of the GeForce 7300 GT: either replace it, or try running with it.

Replacement video card

Although there were a number of cards compatible with the MacPro1,1 available from Apple at one time, they are no longer available except second-hand. There are several modern cards that the community attests work fine, but they tend to be:

  • High-end (and expensive)
  • 64-bit EFI only, so no graphics during boot, until the login screen
  • Reflashed PC cards

Live with the GeForce 7300 GT

Posters to the forums indicate that the GeForce 7300 GT does work, but very slowly, and with a lot of flickering. However, since this mostly runs as a headless server, that may be just fine. I’m not going to be playing games or doing any graphical work on it apart from installing the software.

So, I determined to hackintosh my MacPro1,1 and live with the stock GeForce 7300 GT.

Background

  • The Time Machine server runs purely as a backup server. It contains no original data itself, so that if it is lost, all we’ve lost is the backups. Various family machines back up using Time Machine to the server (into sparsebundle disk images).
  • There are three disks, configured as RAIDZ.
  • A LaunchDaemon job takes a snapshot each day and thins old snapshots, keeping the last 7 days and then one per month. This has proved useful for preserving ancient history when Time Machine has decided that a disk is corrupt and wants to start anew.
  • Another job rsyncs the mail from a mail server running OS X Server to a ZFS file system, which is snapshotted hourly and thinned to hourly for 7 days, then one per month.
  • Finally, BOINC runs in the background to make good use of any spare cycles.
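The thinning those jobs do can be sketched roughly as follows. This is a hypothetical Ruby fragment, not the actual LaunchDaemon script; the function name and the exact retention rules are my paraphrase.

```ruby
require 'date'

# Sketch of the thinning policy: keep every snapshot from the last
# 7 days, and before that only the first snapshot seen in each month.
# `snapshots` maps snapshot names to their creation Dates.
def snapshots_to_keep(snapshots, today)
  cutoff = today - 7
  kept_months = {}
  snapshots.sort_by { |_, date| date }.select do |_, date|
    if date >= cutoff
      true
    else
      key = [date.year, date.month]
      kept_months.key?(key) ? false : (kept_months[key] = true)
    end
  end.map(&:first)
end
```

Keeping the first snapshot seen in each older month is a simple way to get the “one per month” behaviour without tracking any extra state between runs.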

Step 1: backup the backup

Firstly, since I wanted to allow for 4k blocks (the disks are 4k, but the MacZFS pool was not set up with ashift=12), I needed to back up the existing 2TB of data before erasing the current disks and starting them up on O3X.

Since there are no spare disk slots, I needed to back up to an external disk. Fortunately, I have a spare 2TB USB 3.0 drive. Unfortunately, the MacPro1,1 has only USB 2.0.

So, I attached the disk to the iMac (which has USB 3.0), and copied the data across over 1Gbps Ethernet.

Not so fast

MacZFS supports neither zfs send -R nor piping directly between zfs send and zfs recv, so I had to produce a script that iterated through the file systems and snapshots and copied between systems using a FIFO.
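The shape of that script was roughly as follows—a hypothetical sketch, not the real thing—building one send/recv command pair per snapshot, each routed through a named pipe, with incremental sends after the first:

```ruby
# Hypothetical sketch of the FIFO workaround: since neither
# `zfs send -R` nor a direct send|recv pipe worked under MacZFS,
# build a send/recv command pair per snapshot, routed through a
# named pipe; the first send is full, the rest are incremental.
def fifo_copy_commands(src, dst, snapshots, fifo = '/tmp/zfscopy.fifo')
  cmds = [["zfs send #{src}@#{snapshots.first} > #{fifo} &",
           "zfs recv -F #{dst} < #{fifo}"]]
  snapshots.each_cons(2) do |a, b|
    cmds << ["zfs send -i @#{a} #{src}@#{b} > #{fifo} &",
             "zfs recv #{dst} < #{fifo}"]
  end
  cmds
end
```

Each pair runs the send into the FIFO in the background and the recv reading from it in the foreground, which stands in for the pipe that MacZFS couldn’t provide.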

USB strikes again

The transfer being anything but instant (several days’ worth, in fact), the iMac went to sleep, the external USB drive followed suit, and the copy was corrupted beyond ZFS’s ability to correct.

So, I bought a second 2TB drive, configured it in a ZFS mirror—and also prevented the iMac from sleeping.

gzip and dedup

The amount of data to be backed up was perilously close to the 2TB capacity of the mirror set, so I set compression to gzip and also turned on dedup. As it happened, dedup only saved me 100GB or so, but that was probably enough to let the copy finish, where it might not otherwise have done so.
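For the record, the property changes amount to something like this—the filesystem name here is a placeholder, not my actual pool layout:

```shell
# Placeholder filesystem name; enable gzip compression and dedup on
# the backup target before the copy so all newly written data benefits
sudo zfs set compression=gzip BackupPool/TimeMachine
sudo zfs set dedup=on BackupPool/TimeMachine
```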

It took several days.

Step 2: repartition the proto-Hackintosh

Since I want a fallback position in case the Hackintosh experiment fails, I decided to partition the system disk into two: one with Lion and one with the hacked boot loader and Mavericks.

Disk Utility does this nicely, and non-destructively.

Or not

Disk Utility started the partition, and then got stuck at about 50%. A couple of hours later, it was still stuck. Oops. What happens if I interrupt it, I wonder? The more observant among you have realised that Step 3 should have been to back up the system disk and are laughing at my foolishness.

I stop it. I repair the disk. I attempt a reboot.

It works.

Always Have Two Backups

Fortunately, I have spare space on the iMac, and a copy of Carbon Copy Cloner, so I back up everything to there. (Time passes.)

Then, since I also have a Mac Mini running OS X Server, I enable Time Machine and backup again, using Time Machine. (Somewhat more time passes).

Repartition

I boot into the Recovery Partition and ask to repartition into two equal disks. Which it won’t do, since it can’t unmount the entire disk. Sigh. Time for the USB key then.

Step 3: Create an Install Drive on a USB key

The instructions from Tiamo on how to do this are somewhat concise. In particular,

3.insert your board-id into OSInstall.mpkg(please google it)

is way more complicated than it seems, since I have to find the Flat Package Editor. I found a video that helps. Suffice it to say that I managed to create an OSInstall.mpkg with the appropriate board-id. The board-id for the MacPro1,1 is Mac-F4208DC8, found from the following command:

% ioreg -p IODeviceTree -r -n / -d 1
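If you’d rather not eyeball the ioreg output, the extraction can be scripted. This hypothetical Ruby helper assumes the "board-id" = <"Mac-XXXXXXXX"> line format that ioreg printed on my machine:

```ruby
# Hypothetical helper: pull the board-id value out of ioreg output.
# Assumes lines of the form:  "board-id" = <"Mac-F4208DC8">
def board_id(ioreg_output)
  line = ioreg_output.lines.find { |l| l.include?('"board-id"') }
  line && line[/<"([^"]+)">/, 1]
end
```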

Eventually, I have a bootable USB drive.

Step 4: Repartition (again)

The MacPro1,1 happily boots off the USB key and lets me repartition the disk. After which I have two empty partitions, and no way to boot off the hard drive until I’ve got some sort of O/S back there.

Step 4: Restore Lion from Backup

Time Machine

The Recovery system will happily restore off a Time Machine backup. I point it at the one I just made.

Nope, it finds something wrong with it. Sigh.

I am so very glad I have two backups.

Carbon Copy Cloner

  • I power off the MacPro.
  • I extract the newly partitioned but otherwise empty drive and put it into an external (USB 3) enclosure.
  • I attach it to the iMac
  • I restore using Carbon Copy Cloner.

At this point I discover that I forgot to save the recovery partition when I cloned the drive! Carbon Copy Cloner is happy to create me one, but only using Mavericks since that’s what the iMac runs.

  • Unmount and power off the external drive
  • Remove the disk and replace it in the MacPro1,1, losing one of the four mounting screws in the process. (The last step is purely optional, but somehow seems appropriate. I found the screw on the floor later and replaced it).

Step 5: Boot into the newly restored Lion

Which works, much to my delight. I can at least resume using MacZFS where I left off, if the Hackintosh install fails.

Step 6: Download Lion from the App Store

In order to get a Lion recovery partition, I download it again from the App Store (which is how I upgraded in the first place, so I don’t need to purchase it again). I squirrel it away so I’ll have a copy after it deletes itself (which it does during an install).

Step 7: Re-install Lion

Alarming as this sounds, all it does is create a Lion Recovery partition.

Step 8: Clone Lion to the new partition

More time passes. This is actually slower than copying via USB, since the head is having to shuttle all over the disk.

Step 9: Reboot off the USB Key and install Mavericks

More time passes. Much more time (this process has taken several days so far). The install gets to the migration bit, where the log tells me it is migrating applications. And there it stops.

Step 10: Reboot into Mavericks

Hah! Fooled you there. Regardless of which partition I select, it boots into Lion.

I try reinstalling off the USB key and it refuses to do anything other than boot into Lion.

Step 11: Zap Mavericks partition

I’ll do a clean install and use Migration Assistant afterwards.

Step 12: Recreate the USB key

Actually, I do this a couple of times, using various recipes found on the net, trying to find one that works. Eventually I succeed.

Step 13: Install Mavericks

And this time, it works! I resist any inclination to interrupt the process, and it does indeed finish cleanly and reboot. Into Mavericks! Just to be sure, I verify that it will also boot into Lion.

Step 14: Migration Assistant

Mavericks boots and asks me if I want to migrate data. Which I do and it does. It takes its time, but it works just fine. During this, the GeForce 7300 GT seems to perform just fine.

Step 15: Mavericks

And at last, I have a Hackintosh MacPro1,1 running OS X 10.9.4.

I update assorted apps, remove MacZFS and install O3X.

I’m using Screen Sharing and it is really slow. The mouse is quite decoupled at times, which makes opening windows challenging. However, ssh and Terminal sessions are fine. System Profiler reports that the graphics card has 3MB of memory (Lion reports 256MB).

Step 16: OS X Server

Since I purchased OS X Server, I install it on the Hackintosh. Without event, and now I have a snappy GUI, at least for Server functionality.

Step 17: O3X

More leaps of faith. I don’t upgrade my pool, but rather reformat using O3X and then restore from the backup USB external drive mirror set.
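The reformat itself is short. A hypothetical sketch—pool and disk names are placeholders (check diskutil list for the real device names)—with the 4k alignment set explicitly this time:

```shell
# Placeholder pool and disk names; recreate the pool under O3X with
# 4k alignment (ashift=12) baked in at creation time
sudo zpool create -o ashift=12 Pool raidz disk1 disk2 disk3
```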

Alas, the MacPro1,1 only has USB 2.0, so the restore takes a long time. However, at least I can use a pipe and a single command per filesystem!

# zfs send -R Backup/Snapshot | zfs receive Target/Filesystem

And it works.

Ongoing

Slow Screen

I stopped using Screen Sharing, and everything sped up tremendously.

It seems that the GeForce 7300 GT doesn’t play well with Screen Sharing under Mavericks.

It does mean that I need a screen and keyboard attached to boot.

Turn off Spotlight

A couple of times I found things locked up, including Terminal, after a zpool or zfs command; a reboot was necessary to clear it. Then I remembered the instructions to turn off Spotlight, so a sudo mdutil -i off pool/fs or several later, and all is working well.

Summary

It seems to be working well. Our mail is being backed up, Time Machine is backing up, and this time it’s running through OS X Server. I am happy.

Adventures with ZFS and Time Machine part 3

Well, really just adventures with ZFS this time.

About a year ago I had a disk fail—just a couple of weeks inside the 5-year warranty—and when I got the replacement, ZFS dutifully resilvered the RAIDZ set. Then, recently, I had another disk (even older) fail, in another machine. And then I had another (different) disk fail in the RAIDZ set. I was beginning to wonder what was going on (the computers are on a filtered power supply, by the way). Well, to cut a long story short, it was actually the USB enclosure that failed (as far as I can tell). This led me to become acquainted with just how slow USB is.

Everyone says not to use USB for ZFS, mostly because a number of USB controllers lie about write integrity, which can cause even ZFS’s bullet-proof write system to fail. However, there are other reasons to be concerned about USB, namely performance, especially with scrub.

Given this, why use USB? Well, price and availability. A few years ago, USB + FireWire enclosures were reasonably common. Now, USB is just about all that is available unless you go to high end RAID—and I’m trying to let ZFS do the job here, rather than buy a RAID controller, so I would end up paying a lot more for something I don’t need. Of course, if I had a tower case with lots of drive bays I could just plug them in there, but I don’t. At the moment the ZFS machine is a Mac Mini, a splendid little machine, but one with FW800 and USB 2.0 only, and with its form-factor, no space for extra internal drives.

USB speed

So, after that diversion, back to my ‘discoveries’ about USB. USB 2.0 is nominally 480Mbps. Except that this is nowhere near the actual rate. The web informs me that normal USB 2.0 disk performance is 20-25MBps (160-200Mbps), or around 40% of the theoretical speed of USB 2.0. I knew that USB was slower than FireWire, but I had not realised quite how slow.

In fact, my measured speed on my current disks seems to be about 20MBps sustained. Note that this is only about 10% of the speed of even a consumer SATA disk. That has serious consequences for a number of operations.

Disk Formatting

It’s quite a good idea to write zeros to every block of even a new disk. This causes the disk firmware to check every sector (and allocate spare sectors if necessary). This is also a way to check an older disk. With current disk sizes of 1, 2 and 3TB, this is going to take a long time using USB.

Time to format (hours)

            1TB    2TB    3TB
USB 2.0    14.5     29     44
SATA        1.5      3    4.4
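Those figures are just capacity divided by sustained throughput. A quick Ruby sanity check (the table evidently assumes a slightly lower sustained rate than a round 20MBps; the function name and the round 1TB = 10^6 MB are my own simplifications):

```ruby
# Hours to stream a whole disk end to end at a sustained rate in
# MB/s, taking 1TB = 10^6 MB for round numbers.
def format_hours(capacity_tb, mbps)
  (capacity_tb * 1_000_000.0 / mbps / 3600).round(1)
end

format_hours(2, 20)   # => 27.8 (USB 2.0, close to the 29 above)
format_hours(1, 185)  # => 1.5  (a typical consumer SATA rate)
```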

That’s also the amount of time it’s going to take to copy that much information to or from the disk.

Scrubbing/Resilvering

From the above, it’s obvious that a resilver is also going to take a very long time, although, assuming that the ZFS pool is not full, it will take proportionally less time. A RAIDZ that is 75% full will take around 22 hours to replace a 2TB drive.

That leads us to scrub. Normal ZFS best practice is to zpool scrub once a week for consumer grade drives, which should catch any errors while there is still time to do something about it. What scrub does is read every in-use block, on every disk in the pool, and compare it to a calculated checksum. If there is a mismatch zpool reports the error and tries to rewrite the block (which has a good chance of fixing the problem if the disk firmware remaps an ailing sector). That means, for a RAIDZ pool, a scrub will take (capacity used)/(disks in pool)/(sustained read speed). For USB 2.0, that means a long time. By comparison, SATA attached disks will be done in 1/10 the time.
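Plugging my own numbers into that formula—three disks, roughly 4.5TB in use, 20MBps per disk over USB; these are assumptions for illustration, not measurements of the final pool—shows why a weekly scrub is painful:

```ruby
# Approximate scrub time in hours: the in-use capacity is spread
# across the disks, which are read in parallel, each at its own
# sustained speed in MB/s (1TB taken as 10^6 MB).
def scrub_hours(used_tb, disks, mbps)
  (used_tb * 1_000_000.0 / disks / mbps / 3600).round(1)
end

scrub_hours(4.5, 3, 20)  # => 20.8 hours over USB 2.0
```

The same pool on SATA, at ten times the sustained read speed, scrubs in about two hours.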

There is an extra gotcha. If you are using MacZFS, there is a bug (fixed in later versions of ZFS) that restarts the scrub/resilver if you do a snapshot on the pool during said scrub/resilver. If, for instance, you take hourly snapshots…your scrub is never going to finish.

That being the case, it is a really good idea to modify any automatic scripts to check to see if a scrub or resilver is in progress and refrain from snapshotting if it is. The following Ruby fragment may prove useful.

Ruby fragment to check if a ZFS filesystem is busy

BIN_DIR = '/usr/sbin/'
ZPOOL_CMD = "#{BIN_DIR}zpool"

# True if the pool containing the named filesystem is currently
# being scrubbed or resilvered.
def busy?(name)
  pool = name.match('^[^/]+')[0]
  IO.popen("#{ZPOOL_CMD} status #{pool}") do |io|
    io.any? { |l| l['scrub in progress'] or l['resilver in progress'] }
  end
end