Saturday, August 15, 2009

Resilvering

The process of replacing a drive can take an extended period of time, depending on the size of the drive and the amount of data in the pool. The process of moving data from one device to another device is known as resilvering, and can be monitored by using the zpool status command.

Traditional file systems resilver data at the block level. Because ZFS eliminates the artificial layering of the volume manager, it can perform resilvering in a much more powerful and controlled manner. The two main advantages of this feature are as follows:

*

ZFS only resilvers the minimum amount of necessary data. In the case of a short outage (as opposed to a complete device replacement), the entire disk can be resilvered in a matter of minutes or seconds, rather than resilvering the entire disk, or complicating matters with “dirty region” logging that some volume managers support. When an entire disk is replaced, the resilvering process takes time proportional to the amount of data used on disk. Replacing a 500-Gbyte disk can take seconds if only a few gigabytes of used space is in the pool.
*

Resilvering is interruptible and safe. If the system loses power or is rebooted, the resilvering process resumes exactly where it left off, without any need for manual intervention.

To view the resilvering process, use the zpool status command. For example:

# zpool status tank
pool: tank
state: DEGRADED
reason: One or more devices is being resilvered.
action: Wait for the resilvering process to complete.
see: http://www.sun.com/msg/ZFS-XXXX-08
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
mirror DEGRADED 0 0 0
replacing DEGRADED 0 0 0 52% resilvered
c1t0d0 ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
c1t1d0 ONLINE 0 0 0

In this example, the disk c1t0d0 is being replaced by c2t0d0. This event is observed in the status output by presence of the replacing virtual device in the configuration. This device is not real, nor is it possible for you to create a pool by using this virtual device type. The purpose of this device is solely to display the resilvering process, and to identify exactly which device is being replaced.

Note that any pool currently undergoing resilvering is placed in the DEGRADED state, because the pool cannot provide the desired level of redundancy until the resilvering process is complete. Resilvering proceeds as fast as possible, though the I/O is always scheduled with a lower priority than user-requested I/O, to minimize impact on the system. Once the resilvering is complete, the configuration reverts to the new, complete, configuration. For example:

# zpool status tank
pool: tank
state: ONLINE
scrub: scrub completed with 0 errors on Thu Aug 31 11:20:18 2006
config:

NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
c1t1d0 ONLINE 0 0 0

errors: No known data errors

The pool is once again ONLINE, and the original bad disk (c1t0d0) has been removed from the configuration.