Veritas DG disabled due to serial split brain

Issue

A disk group cannot be imported because the configuration databases do not agree on the actual and expected serial IDs (SSBs) of all the disks within the disk group. This is a true serial split brain condition, which Volume Manager cannot correct automatically. To resolve the issue, you must choose which disk's configuration database to use.

Error

# vxdg import demodg
VxVM vxdg ERROR V-5-1-10978 Disk group demodg: import failed:
Serial Split Brain detected. Run vxsplitlines to import the diskgroup
 

Environment

All platforms

Cause

The serial split brain condition arises because Veritas Volume Manager (tm) increments the serial ID (SSB) in the disk media record of each imported disk in the disk group. If the serial IDs on some disks do not agree with the expected values recorded in the configuration copies on the other disks (for example, after a network, disk, or connectivity problem), the disk group cannot be imported automatically.
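
As an illustration, the actual SSB sequence number recorded on a disk can be inspected with vxdisk list (the full summary is shown in step 2a below); the exact line format may vary by VxVM version:

# vxdisk list c2t1d0s2 | grep ssb
ssb:       actual_seqno=0.0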

Solution

Follow these steps to resolve a "Serial Split Brain" issue:

1. Check and ensure all the disks in the disk group are in a deported state:

# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
c2t1d0s2     auto:cdsdisk    -           (dgD280silo1) online
c2t2d0s2     auto:cdsdisk    -           (dgD280silo1) online
c2t3d0s2     auto:cdsdisk    -           (dgD280silo1) online
c2t9d0s2     auto:cdsdisk    -           (dgD280silo1) online
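
A deported group name appears in parentheses, as above. If any disk still shows the disk group as imported (the group name without parentheses), deport the group first; a minimal sketch using the disk group name from this article:

# vxdg deport dgD280silo1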

2. Run vxsplitlines on the affected disk group with the -v option to display details:
# vxsplitlines -g dgD280silo1 -v          <<< for disk groups with a large number of disks, there is a delay before the output appears
VxVM vxsplitlines NOTICE  There are 2 pools
All the disks in the first pool have the same config copies
All the disks in the second pool may not have the same config copies
To see the configuration copy from a disk issue the command
/etc/vx/diag.d/vxprivutil dumpconfig <private path>
To import the diskgroup with config copy from a disk issue the command
/usr/sbin/vxdg (-s) -o selectcp=<diskid> import newdg
Pool 0
DEVICE     DISK   DISK ID                  DISK PRIVATE PATH
c2t1d0s2   d1     1092974296.21.gopal      /dev/vx/rdmp/c2t1d0s2
c2t2d0s2   d2     1092974302.22.gopal      /dev/vx/rdmp/c2t2d0s2
c2t3d0s2   d3     1092974311.23.gopal      /dev/vx/rdmp/c2t3d0s2

Pool 1
DEVICE     DISK   DISK ID                  DISK PRIVATE PATH
c2t9d0s2   d4     1092974318.24.gopal      /dev/vx/rdmp/c2t9d0s2
The disk group is split so that three disks (d1, d2, and d3) are in pool 0 and one disk (d4) is in pool 1.

2a. Optional: this output can be verified by running vxdisk list on each disk. A summary is shown below:

 
# vxdisk list c2t1d0s2
Device:    c2t1d0s2
disk:      name=d1 id=1092974296.21.gopal
group:     name=dgD280silo1 id=1095738111.20.gopal
ssb:       actual_seqno=0.0
config copy .......... DISABLED

# vxdisk list c2t2d0s2
Device:    c2t2d0s2
disk:      name=d2 id=1092974302.22.gopal
group:     name=dgD280silo1 id=1095738111.20.gopal
ssb:       actual_seqno=0.1
config copy .......... ENABLED

# vxdisk list c2t3d0s2
Device:    c2t3d0s2
disk:      name=d3 id=1092974311.23.gopal
group:     name=dgD280silo1 id=1095738111.20.gopal
ssb:       actual_seqno=0.1
config copy .......... ENABLED

# vxdisk list c2t9d0s2
Device:    c2t9d0s2
disk:      name=d4 id=1092974318.24.gopal
group:     name=dgD280silo1 id=1095738111.20.gopal
ssb:       actual_seqno=0.1
config copy .......... ENABLED
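
As an optional shortcut, the ssb and config lines for all four disks can be pulled out with a small shell loop; this assumes the labels shown in the summary above, which may differ slightly between VxVM versions:

# for d in c2t1d0s2 c2t2d0s2 c2t3d0s2 c2t9d0s2; do echo "=== $d ==="; vxdisk list $d | egrep 'ssb|config'; done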


3. Dump the configuration from at least one disk in each pool, making sure the chosen disk has its config copy ENABLED:
 
# /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c2t2d0s2 > dumpconfig_c2t2d0s2   <<< from pool 0
# /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c2t9d0s2 > dumpconfig_c2t9d0s2   <<< from pool 1

4. Using the dumpconfig output from each disk, determine which configuration to use by running:

# cat dumpconfig_c2t2d0s2 | vxprint -D - -ht
# cat dumpconfig_c2t9d0s2 | vxprint -D - -ht

By comparing the two vxprint outputs, decide which configuration best matches the state of the disk group the last time it was imported.
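
If the dumps are large, a quick way to spot the differences is to save each vxprint listing to a file and diff them (the pool0.txt and pool1.txt file names are only examples):

# cat dumpconfig_c2t2d0s2 | vxprint -D - -ht > pool0.txt
# cat dumpconfig_c2t9d0s2 | vxprint -D - -ht > pool1.txt
# diff pool0.txt pool1.txt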
 
5. At this stage, you should have chosen a configuration from one disk to use when importing the disk group. The following command imports the disk group using the configuration copy from disk d2 (disk ID 1092974302.22.gopal).
 
# /usr/sbin/vxdg -o selectcp=1092974302.22.gopal import dgD280silo1

Once the disk group has been imported, Volume Manager resets the serial IDs to 0 for the imported disks.  
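
Note: in a cluster (CVM) environment the same import is performed with the -s (shared) option, as indicated by the vxsplitlines template shown earlier; a sketch using the same disk ID:

# /usr/sbin/vxdg -s -o selectcp=1092974302.22.gopal import dgD280silo1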
 
6. Check the status of the disk group:
# vxprint -qhtg dgD280silo1
dg dgD280silo1  default      default  26000    1095738111.20.gopal
 
dm d1           c2t1d0s2     auto     2048     35838448 -
dm d2           c2t2d0s2     auto     2048     35838448 -
dm d3           c2t3d0s2     auto     2048     35838448 -
dm d4           c2t9d0s2     auto     2048     35838448 -

v  SNAP-vol_db2silo1.1 -     DISABLED ACTIVE   1024000  SELECT    -        fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 DISABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01        SNAP-vol_db2silo1.1-01 d3 0    512000   0/0       c2t3d0   ENA
sd d4-01        SNAP-vol_db2silo1.1-01 d4 0    512000   1/0       c2t9d0   ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v  SNAP-vol_db2silo1.1_dcl - DISABLED ACTIVE   544      SELECT    -        gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl DISABLED ACTIVE 544 CONCAT - RW
sd d3-02        SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0        c2t3d0   ENA

v  orgvol       -            DISABLED ACTIVE   1024000  SELECT    -        fsgen
pl orgvol-01    orgvol       DISABLED ACTIVE   1024000  STRIPE    2/128    RW
sd d1-01        orgvol-01    d1       0        512000   0/0       c2t1d0   ENA
sd d2-01        orgvol-01    d2       0        512000   1/0       c2t2d0   ENA
 
7. Recover the volumes, then check and mount the file system:

# vxrecover -g dgD280silo1 -sb

# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol
UX:vxfs mount: ERROR: V-3-21268: /dev/vx/dsk/dgD280silo1/orgvol is corrupted. needs checking

# fsck -F vxfs /dev/vx/rdsk/dgD280silo1/orgvol
log replay in progress
replay complete - marking super-block as CLEAN

# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol

# df /orgvol
/orgvol            (/dev/vx/dsk/dgD280silo1/orgvol): 1019102 blocks   127386 files
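
If log replay alone does not mark the super-block clean, a full structural file system check may be needed; this uses the generic VxFS -o full option and can take considerably longer than a log replay:

# fsck -F vxfs -o full /dev/vx/rdsk/dgD280silo1/orgvol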
 
8. Finally, check the status of the disks and the configuration:

# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
c2t1d0s2     auto:cdsdisk    d1           dgD280silo1  online
c2t2d0s2     auto:cdsdisk    d2           dgD280silo1  online
c2t3d0s2     auto:cdsdisk    d3           dgD280silo1  online
c2t9d0s2     auto:cdsdisk    d4           dgD280silo1  online

# vxprint -htg dgD280silo1
dg dgD280silo1  default      default  26000    1095738111.20.gopal

dm d1           c2t1d0s2     auto     2048     35838448 -
dm d2           c2t2d0s2     auto     2048     35838448 -
dm d3           c2t3d0s2     auto     2048     35838448 -
dm d4           c2t9d0s2     auto     2048     35838448 -

v  SNAP-vol_db2silo1.1 -     ENABLED  ACTIVE   1024000  SELECT    SNAP-vol_db2silo1.1-01 fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 ENABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01        SNAP-vol_db2silo1.1-01 d3 0    512000   0/0       c2t3d0   ENA
sd d4-01        SNAP-vol_db2silo1.1-01 d4 0    512000   1/0       c2t9d0   ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v  SNAP-vol_db2silo1.1_dcl - ENABLED  ACTIVE   544      SELECT    -        gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl ENABLED ACTIVE 544 CONCAT - RW
sd d3-02        SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0        c2t3d0   ENA

v  orgvol       -            ENABLED  ACTIVE   1024000  SELECT    orgvol-01 fsgen
pl orgvol-01    orgvol       ENABLED  ACTIVE   1024000  STRIPE    2/128    RW
sd d1-01        orgvol-01    d1       0        512000   0/0       c2t1d0   ENA
sd d2-01        orgvol-01    d2       0        512000   1/0       c2t2d0   ENA
