Monday, November 18, 2013

Solaris Volume Manager (SVM): Understanding the Kernel Tunable Parameter "md:mirrored_root_flag"

Applies to:

Sun Solaris Volume Manager (SVM) - Version 11.9.0 to 11.10.0 [Release 11.0]
Solstice DiskSuite Software - Version 4.2.1 to 4.2.1 [Release 4.0]
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on SPARC (32-bit)
Oracle Solaris on x86-64 (64-bit)
Oracle Solaris on x86 (32-bit)

Goal

When a system under SVM control boots, SVM consults the state database to determine the configuration and condition of metadevices. The state database is contained in replicas distributed over multiple disks.

SVM has a set of rules to determine the health of the state database itself. These rules take into account the possibility that some replicas may reside on disks with hardware problems or other operational issues. SVM requires that at least half of all state database replicas be available during normal operation. If fewer than half remain available, the system will panic to protect itself.

SVM requires that a majority (one more than half) of all state database replicas be available at boot time, as a way to protect against the possibility of stale data. Without this quorum rule, SVM could, in rare circumstances, choose a stale replica when some replicas are broken.
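The two thresholds are easy to confuse, so here is a minimal sketch of them in Python. The function names are hypothetical and are not part of SVM; the integer rounding used for an odd number of replicas is an assumption.

```python
def majority(total_replicas: int) -> int:
    """Smallest replica count that is strictly more than half of the total."""
    # Assumption: for an odd total, "half plus one" is rounded down to the
    # smallest whole number greater than half.
    return total_replicas // 2 + 1


def can_keep_running(total_replicas: int, available: int) -> bool:
    """Runtime rule: at least half of all replicas must be available."""
    return 2 * available >= total_replicas


def can_boot_unattended(total_replicas: int, available: int) -> bool:
    """Boot rule: a majority (half plus one) of all replicas must be available."""
    return available >= majority(total_replicas)


if __name__ == "__main__":
    total = 6  # for example, 3 replicas on each of two disks
    for available in (6, 4, 3, 2):
        print(available,
              can_keep_running(total, available),
              can_boot_unattended(total, available))
```

With six replicas, losing three leaves a running system up (exactly half remain) but does not satisfy the boot rule, which needs four.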

Consider the following scenario. There are 3 replicas on each internal disk in a system with two internal disks. Let's call them disk1 and disk2. If disk1 experiences a transient failure and SVM then makes a configuration change, the first attempt to write to the state database will mark the replicas on disk1 as broken. This state can only be written to the active replicas on disk2, of course.

The system is brought down, and at the next boot disk1 is working again, but disk2 has a transient failure making it inaccessible. Without a quorum rule requiring one more than half of all replicas to be healthy, the system would choose the stale database on disk1, which is all it can see. Remember that the replicas on disk1 do not "know" they are stale, and the replicas on disk2 are not available.
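The scenario above can be walked through with a toy model. This is only a sketch: the Replica class, the commit counter, and choose_database are hypothetical illustrations and do not reflect how SVM actually stores or compares replica data.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    disk: str
    commit: int  # hypothetical counter: a higher value is a newer configuration


def choose_database(replicas, visible_disks, enforce_quorum=True):
    """Return the commit a boot would use, or None if the boot is refused."""
    visible = [r for r in replicas if r.disk in visible_disks]
    needed = len(replicas) // 2 + 1  # one more than half of all replicas
    if enforce_quorum and len(visible) < needed:
        return None  # quorum not met: no unattended boot
    if not visible:
        return None  # nothing readable at all
    return max(r.commit for r in visible)


# Three replicas on each of two disks, all initially at commit 1.
replicas = ([Replica("disk1", 1) for _ in range(3)]
            + [Replica("disk2", 1) for _ in range(3)])

# disk1 fails transiently, so the configuration change reaches disk2 only.
for r in replicas:
    if r.disk == "disk2":
        r.commit = 2

latest = max(r.commit for r in replicas)

# Next boot: disk1 is back, disk2 is inaccessible.
with_quorum = choose_database(replicas, {"disk1"}, enforce_quorum=True)
without_quorum = choose_database(replicas, {"disk1"}, enforce_quorum=False)

print("latest commit:", latest)
print("with quorum rule:", with_quorum)
print("without quorum rule:", without_quorum)
```

With the quorum rule, only 3 of 6 replicas are visible, so the boot is refused. Without it, the model picks commit 1 from disk1 even though commit 2 exists on the unreachable disk2.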

The purpose of this document is to explain the kernel tunable parameter md:mirrored_root_flag, which can modify the way SVM implements the quorum rule. This information is of particular interest for systems with only two internal disks available.

For a more complete discussion of the SVM state database, please consult the Solaris Volume Manager Administration Guide.

Solution

For systems with more than two disks, it is often practical to distribute state database replicas across more than two disks. Guidance on best practices is in the Solaris Volume Manager Administration Guide.

However, especially for systems with two internal disks, the quorum rule can lead to a system boot issue if one of the two disks becomes inoperable. The recovery procedure is simple and is described in Document 1010270.1, "Solaris Volume Manager (SVM): Recovering From Insufficient Metadevice State Database Replicas", but it does not allow for unattended operation.

The md:mirrored_root_flag

This kernel tunable parameter is described in the Oracle Solaris Tunable Parameters Reference Manual.

This manual makes it clear that the parameter is not supported. There is no commitment about its operation, or about maintaining it in future releases. In general, it is not a good practice to carry tunable parameters from one release of Solaris to another. In this case, the parameter was first defined a number of years ago for a special purpose only, but its use has continued.

The parameter is not supported, nor is it recommended. But if you evaluate the business need of the system and determine that the risk is acceptable, you may choose to enter the following line in /etc/system.

set md:mirrored_root_flag=1

This setting overrides Solaris Volume Manager requirements for replica quorum and forces Solaris Volume Manager to start if any valid state database replicas are available.
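The effect of the setting on the start decision can be sketched as follows. The function svm_can_start is a hypothetical model written for illustration, not SVM code, and it assumes the default value of the parameter is 0.

```python
def svm_can_start(total_replicas: int, valid_available: int,
                  mirrored_root_flag: int = 0) -> bool:
    """Model of the start decision with and without md:mirrored_root_flag."""
    if mirrored_root_flag == 1:
        # Quorum is overridden: any valid replica is enough.
        return valid_available >= 1
    # Default behaviour: one more than half of all replicas is required.
    return valid_available >= total_replicas // 2 + 1


if __name__ == "__main__":
    # Two disks with 3 replicas each; one disk is lost, so 3 of 6 remain.
    print(svm_can_start(6, 3))                        # default quorum rule
    print(svm_can_start(6, 3, mirrored_root_flag=1))  # quorum overridden
```

In the two-disk case the flag turns a refused boot into a successful one, which is exactly why a stale replica can end up being used.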

If this parameter is enabled, the system might boot with a stale replica that inaccurately represents the system state. This situation could result in data corruption or system corruption.
