Replacing a failed drive in a raidz2 ZFS setup
This blog post details how to replace a broken drive in the mfsBSD/FreeBSD 9.0 raidz2 ZFS setup discussed earlier. The process is relatively straightforward, but it can be tricky if you have never done it before.
Symptoms
When the drive failed, the controller and the OS handled it gracefully:
backup kernel: (da0:mps0:0:0:0): SCSI command timeout on device handle 0x000c SMID 906
backup kernel: mps0: (0:0:0) terminated ioc 804b scsi 0 state c xfer 0
backup last message repeated 4 times
backup kernel: mps0: mpssas_abort_complete: abort request on handle 0x0c SMID 906 complete
backup kernel: mps0: (0:0:0) terminated ioc 804b scsi 0 state 0 xfer 0
backup kernel: mps0: mpssas_remove_complete on target 0x0000, IOCStatus= 0x8
backup kernel: (da0:mps0:0:GEOM_MIRROR0:: 0): lost device - 0 outstanding
backup kernel: Request failed (error=22). da0p2[WRITE(offset=1180434432, length=131072)]
backup kernel: GEOM_MIRROR: Device primaryswap: provider da0p2 disconnected.
backup kernel: (da0:mps0:0:0:0): removing device entry
ZFS status shows the degraded pool:
[root@backup ~]# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
config:

        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            15364271088212071398  REMOVED      0     0     0  was /dev/da0p3
            da1p3                 ONLINE       0     0     0
            da2p3                 ONLINE       0     0     0
            da3p3                 ONLINE       0     0     0
            da4p3                 ONLINE       0     0     0
            da5p3                 ONLINE       0     0     0
            da6p3                 ONLINE       0     0     0
            da7p3                 ONLINE       0     0     0

errors: No known data errors
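Before replacing anything it can help to double-check which device node actually dropped off the bus. This is just a sanity check, not part of the original procedure; listing the attached SCSI devices should show da1 through da7 but no da0:

camcontrol devlist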
Replacing the drive
Since the backplane supports hot swapping, connecting the new drive is very convenient. The kernel detects the drive as soon as it is plugged in:
da0 at mps0 bus 0 scbus0 target 0 lun 0
da0: <SEAGATE ST31000424SS 0006> Fixed Direct Access SCSI-5 device
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
Rebuilding the system requires the following steps:
- Partition the new drive using gpart
- Rebuild the gmirror swap container
- Replace device in ZFS pool and resilver
- Reinstall boot code
Partition the new drive using gpart
The original setup was done with mfsBSD, using a swap partition size of 16GB. If you are not certain how to partition the drive, just check the layout of another drive in the pool, e.g.:
[root@backup ~]# gpart list da1
Geom name: da1
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 1953525134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da1p1
   Mediasize: 65536 (64k)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 17408
   Mode: r0w0e0
   rawuuid: fa3ef576-83ed-11e1-bdd5-001517783d80
   rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
   label: (null)
   length: 65536
   offset: 17408
   type: freebsd-boot
   index: 1
   end: 161
   start: 34
2. Name: da1p2
   Mediasize: 17179869184 (16G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 82944
   Mode: r1w1e1
   rawuuid: fa45c1b1-83ed-11e1-bdd5-001517783d80
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: swap1
   length: 17179869184
   offset: 82944
   type: freebsd-swap
   index: 2
   end: 33554593
   start: 162
3. Name: da1p3
   Mediasize: 983024916992 (915G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 82944
   Mode: r1w1e1
   rawuuid: fa4f7605-83ed-11e1-bdd5-001517783d80
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: disk1
   length: 983024916992
   offset: 17179952128
   type: freebsd-zfs
   index: 3
   end: 1953525134
   start: 33554594
Consumers:
1. Name: da1
   Mediasize: 1000204886016 (931G)
   Sectorsize: 512
   Mode: r2w2e4
Using this information we can partition the new drive:
gpart create -s GPT da0
gpart add -t freebsd-boot -s 128 da0
gpart add -t freebsd-swap -s 16G -l swap0 da0
gpart add -t freebsd-zfs -l disk0 da0
dd if=/dev/zero of=/dev/da0p2 bs=512 count=560
dd if=/dev/zero of=/dev/da0p3 bs=512 count=560
(The dd commands make sure any old gmirror/ZFS metadata at the start of the partitions is wiped.)
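If the replacement drive has seen previous use, it may also be worth zeroing the end of the partitions, since gmirror stores its metadata in the last sector and ZFS keeps two of its four labels at the end of the device. This is an optional extra, not something required above; a minimal sketch, assuming 512-byte sectors and that the fourth field of diskinfo's output is the size in sectors:

# optional: also wipe the tail of each new partition (old gmirror/ZFS labels live there)
for p in da0p2 da0p3; do
    sectors=`diskinfo /dev/$p | awk '{print $4}'`
    dd if=/dev/zero of=/dev/$p bs=512 seek=$((sectors - 560)) count=560
done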
Rebuild the gmirror swap container
It's important to make gmirror forget the missing disk so it can be replaced. Then the new partition is inserted with priority 0 (the same priority the old component had).
gmirror forget primaryswap
gmirror insert -p 0 primaryswap /dev/da0p2
You can check the status any time during the process:
[root@backup ~]# gmirror status
                Name    Status  Components
  mirror/primaryswap  DEGRADED  da2p2 (ACTIVE)
                                da4p2 (ACTIVE)
                                da6p2 (ACTIVE)
                                da0p2 (SYNCHRONIZING, 0%)
mirror/secondaryswap  COMPLETE  da1p2 (ACTIVE)
                                da3p2 (ACTIVE)
                                da5p2 (ACTIVE)
                                da7p2 (ACTIVE)
Once it's done the mirror is back to its normal state:
[root@backup ~]# gmirror status
                Name    Status  Components
  mirror/primaryswap  COMPLETE  da2p2 (ACTIVE)
                                da4p2 (ACTIVE)
                                da6p2 (ACTIVE)
                                da0p2 (ACTIVE)
mirror/secondaryswap  COMPLETE  da1p2 (ACTIVE)
                                da3p2 (ACTIVE)
                                da5p2 (ACTIVE)
                                da7p2 (ACTIVE)
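At this point it doesn't hurt to confirm that swap is still active on both mirrors; swapinfo should list mirror/primaryswap and mirror/secondaryswap:

swapinfo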
Replace device in ZFS pool and resilver
The next step is to replace the drive in the ZFS pool, which starts resilvering automatically. Keep in mind that resilvering is I/O intensive; in my case it took a long time and slowed down the disk subsystem's performance quite dramatically, primarily because of the large number of small files on the system. There are ways to throttle the resilvering process, which I might try next time - namely vfs.zfs.scrub_limit, which can be set in /boot/loader.conf.
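For reference, such a tunable goes into /boot/loader.conf as a simple key/value line; the value below is only an example and I have not tested it myself:

# /boot/loader.conf - limit concurrent scrub/resilver I/O (example value, untested)
vfs.zfs.scrub_limit=5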
The procedure for replacing the drive is straightforward:
zpool replace tank da0p3
You can check the status of the resilver operation anytime using zpool status:
[root@backup ~]# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon May  7 20:18:34 2012
        11.7M scanned out of 908G at 353K/s, (scan is slow, no estimated time)
        1.31M resilvered, 0.00% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            replacing-0             REMOVED      0     0     0
              15364271088212071398  REMOVED      0     0     0  was /dev/da0p3/old
              da0p3                 ONLINE       0     0     0  (resilvering)
            da1p3                   ONLINE       0     0     0
            da2p3                   ONLINE       0     0     0
            da3p3                   ONLINE       0     0     0
            da4p3                   ONLINE       0     0     0
            da5p3                   ONLINE       0     0     0
            da6p3                   ONLINE       0     0     0
            da7p3                   ONLINE       0     0     0

errors: No known data errors
Depending on your setup this can take a substantial amount of time.
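If you don't want to keep typing the command, a simple loop (just a convenience, nothing more) prints the progress line every ten minutes:

while true; do zpool status tank | grep 'resilvered'; sleep 600; done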
Reinstall boot code
This step is easy to miss, so make sure it's done; without the boot code the machine may not be able to boot from the new drive:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
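To verify, gpart show should list the freebsd-boot partition as index 1 on da0, matching the other drives:

gpart show da0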
Conclusion
Replacing a drive is not completely plug-and-play, but it's definitely not rocket science either.