Veritas DG disabled due to serial split brain
Issue
A disk group cannot be imported because the configuration databases do not
agree on the actual and expected serial IDs (SSBs) of all the disks
within the disk group. This is a true serial split brain condition,
which Volume Manager cannot correct automatically. You must choose which
configuration database, from a specific disk, to use to resolve the issue.
Error
# vxdg import demodg
VxVM vxdg ERROR V-5-1-10978 Disk group demodg: import failed:
Serial Split Brain detected. Run vxsplitlines to import the diskgroup
Environment
All platforms
Cause
VERITAS Volume Manager (tm) increments the serial ID (SSB) in the disk
media record of each imported disk in the disk group. A serial split
brain condition arises when the serial IDs on the disks do not agree
with the expected values in the configuration copies on the other disks
in the disk group, for example because of a network, disk, or connection issue.
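To make the mismatch concrete, here is a toy model, assuming serial IDs are plain strings and that each configuration copy simply records the serial ID it expects on every disk. This is not how VxVM stores or compares SSBs internally. The grouping function mirrors the vxsplitlines message (shown in step 2) that all disks in a pool share the same config copies; the function names and sample values are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy model only: NOT VxVM's internal representation of SSBs.
Actual = Dict[str, str]               # disk -> serial ID actually on that disk
Expected = Dict[str, Dict[str, str]]  # config-copy holder -> {disk: expected serial ID}


def find_ssb_mismatches(actual: Actual, expected: Expected) -> Dict[str, List[str]]:
    """For each disk holding a config copy, list the disks whose actual
    serial ID differs from what that copy expects."""
    mismatches: Dict[str, List[str]] = {}
    for holder, view in expected.items():
        bad = sorted(disk for disk, seq in view.items() if actual.get(disk) != seq)
        if bad:
            mismatches[holder] = bad
    return mismatches


def group_into_pools(expected: Expected) -> List[List[str]]:
    """Group config-copy holders whose expected views are identical."""
    pools: Dict[Tuple[Tuple[str, str], ...], List[str]] = {}
    for holder, view in expected.items():
        key = tuple(sorted(view.items()))
        pools.setdefault(key, []).append(holder)
    return sorted(sorted(members) for members in pools.values())


if __name__ == "__main__":
    # Hypothetical values: d4 was updated on its own, so the two sides disagree.
    actual = {"d1": "0.1", "d2": "0.1", "d3": "0.1", "d4": "0.2"}
    side_a = {"d1": "0.1", "d2": "0.1", "d3": "0.1", "d4": "0.1"}
    side_b = {"d1": "0.0", "d2": "0.0", "d3": "0.0", "d4": "0.2"}
    expected = {"d1": side_a, "d2": side_a, "d3": side_a, "d4": side_b}
    print(find_ssb_mismatches(actual, expected))
    # {'d1': ['d4'], 'd2': ['d4'], 'd3': ['d4'], 'd4': ['d1', 'd2', 'd3']}
    print(group_into_pools(expected))
    # [['d1', 'd2', 'd3'], ['d4']]
```

Each side's copy contradicts the other side, so neither can be trusted automatically. This is why an administrator has to pick one copy.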
Solution
Follow these steps to resolve a "Serial Split Brain" issue:
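1. Check that all the disks in the disk group are in a deported state. For a deported disk group, the DISK column shows "-" and the group name appears in parentheses: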
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk - (dgD280silo1) online
c2t2d0s2 auto:cdsdisk - (dgD280silo1) online
c2t3d0s2 auto:cdsdisk - (dgD280silo1) online
c2t9d0s2 auto:cdsdisk - (dgD280silo1) online
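The step 1 check can be automated. The sketch below parses captured `vxdisk -o alldgs list` text and assumes the five-column layout shown above, where a deported group name is printed in parentheses. The function names are hypothetical.

```python
from typing import Dict


def disk_states(vxdisk_output: str, dg: str) -> Dict[str, str]:
    """Map each device of disk group `dg` to "imported" or "deported".

    Assumes the layout DEVICE TYPE DISK GROUP STATUS, where a deported
    group name is printed in parentheses.
    """
    states: Dict[str, str] = {}
    for line in vxdisk_output.splitlines():
        fields = line.split()
        # Skip the header and any short or malformed lines.
        if len(fields) < 5 or fields[0] == "DEVICE":
            continue
        device, group = fields[0], fields[3]
        if group == f"({dg})":
            states[device] = "deported"
        elif group == dg:
            states[device] = "imported"
    return states


def all_deported(vxdisk_output: str, dg: str) -> bool:
    """True only if the disk group has disks and every one of them is deported."""
    states = disk_states(vxdisk_output, dg)
    return bool(states) and all(s == "deported" for s in states.values())
```

On the host, the input could come from `subprocess.run(["vxdisk", "-o", "alldgs", "list"], capture_output=True, text=True).stdout`. Proceed only when `all_deported(output, "dgD280silo1")` is true.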
2. Run vxsplitlines on the affected disk group, using the -v option to display details:
For a disk group with a large number of disks, vxsplitlines may take some time to respond.
# vxsplitlines -g dgD280silo1 -v
VxVM vxsplitlines NOTICE There are 2 pools
All the disks in the first pool have the same config copies
All the disks in the second pool may not have the same config copies
To see the configuration copy from a disk issue the command
/etc/vx/diag.d/vxprivutil dumpconfig <private path>
To import the diskgroup with config copy from a disk issue the command
/usr/sbin/vxdg (-s) -o selectcp=<diskid> import newdg
Pool 0
DEVICE DISK DISK ID DISK PRIVATE PATH
c2t1d0s2 d1 1092974296.21.gopal /dev/vx/rdmp/c2t1d0s2
c2t2d0s2 d2 1092974302.22.gopal /dev/vx/rdmp/c2t2d0s2
c2t3d0s2 d3 1092974311.23.gopal /dev/vx/rdmp/c2t3d0s2
Pool 1
DEVICE DISK DISK ID DISK PRIVATE PATH
c2t9d0s2 d4 1092974318.24.gopal /dev/vx/rdmp/c2t9d0s2
The disk group is split so that three disks (d1, d2, and d3) are in pool 0 and one disk (d4) is in pool 1.
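For disk groups with many disks, reading the pool split by eye is error-prone. The sketch below collects the per-pool rows from captured `vxsplitlines -v` text. It assumes the "Pool N" headings and four-column rows shown above. `parse_pools` and `Row` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# (device, disk media name, disk ID, private-region path)
Row = Tuple[str, str, str, str]

POOL_RE = re.compile(r"^\s*Pool\s+(\d+)\s*$")


def parse_pools(vxsplitlines_output: str) -> Dict[int, List[Row]]:
    """Collect the per-pool disk rows printed by `vxsplitlines -v`."""
    pools: Dict[int, List[Row]] = {}
    current = None
    for line in vxsplitlines_output.splitlines():
        match = POOL_RE.match(line)
        if match:
            current = int(match.group(1))
            pools[current] = []
            continue
        fields = line.split()
        # Data rows have four columns ending in a /dev path; the header
        # ("DEVICE DISK DISK ID DISK PRIVATE PATH") has more and is skipped.
        if current is not None and len(fields) == 4 and fields[3].startswith("/dev/"):
            pools[current].append((fields[0], fields[1], fields[2], fields[3]))
    return pools
```

The disk ID column feeds the `selectcp=<diskid>` import option, and the private path feeds `vxprivutil dumpconfig`.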
2a. (Optional) You can verify this output by running vxdisk list on each disk. A summary of the relevant fields is shown below:
# vxdisk list c2t1d0s2
Device: c2t1d0s2
disk: name=d1 id=1092974296.21.gopal
group: name=dgD280silo1 id=1095738111.20.gopal
ssb: actual_seqno=0.0
config copy ..........DISABLED
# vxdisk list c2t2d0s2
Device: c2t2d0s2
disk: name=d2 id=1092974302.22.gopal
group: name=dgD280silo1 id=1095738111.20.gopal
ssb: actual_seqno=0.1
config copy ..........ENABLED
# vxdisk list c2t3d0s2
Device: c2t3d0s2
disk: name=d3 id=1092974311.23.gopal
group: name=dgD280silo1 id=1095738111.20.gopal
ssb: actual_seqno=0.1
config copy ..........ENABLED
# vxdisk list c2t9d0s2
Device: c2t9d0s2
disk: name=d4 id=1092974318.24.gopal
group: name=dgD280silo1 id=1095738111.20.gopal
ssb: actual_seqno=0.1
config copy ..........ENABLED
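3. Dump the configuration from at least one disk in each pool, making sure each chosen disk has its config copy ENABLED. In this example d1 cannot be used because its config copy is DISABLED, so d2 is used for pool 0 and d4 for pool 1.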
# /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c2t2d0s2 > dumpconfig_c2t2d0s2   # d2, pool 0
# /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c2t9d0s2 > dumpconfig_c2t9d0s2   # d4, pool 1
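The disk choice in step 3 can be sketched as follows. Assume the pool membership and each disk's config copy state have already been collected, for example from the vxsplitlines and vxdisk list output above. The "first ENABLED disk in the pool" rule is an assumption for illustration. The article only requires that the chosen disk have an ENABLED copy. Both helper names are hypothetical.

```python
from typing import Dict, List, Optional


def choose_dump_disks(pools: Dict[int, List[str]],
                      config_enabled: Dict[str, bool]) -> Dict[int, Optional[str]]:
    """Pick the first disk in each pool whose config copy is ENABLED
    (None if the pool has no usable copy)."""
    return {
        pool: next((d for d in disks if config_enabled.get(d, False)), None)
        for pool, disks in sorted(pools.items())
    }


def dumpconfig_commands(choice: Dict[int, Optional[str]],
                        device_of: Dict[str, str],
                        dev_dir: str = "/dev/rdsk") -> List[str]:
    """Build one vxprivutil dumpconfig command line per chosen disk."""
    cmds = []
    for _, disk in sorted(choice.items()):
        if disk is None:
            continue  # nothing usable in this pool; inspect it by hand
        dev = device_of[disk]
        cmds.append(f"/etc/vx/diag.d/vxprivutil dumpconfig {dev_dir}/{dev} > dumpconfig_{dev}")
    return cmds
```

With d1 DISABLED and d2 to d4 ENABLED, this picks d2 for pool 0 and d4 for pool 1. It produces the same two commands shown in step 3.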
4. Using the dumpconfig output from each disk, determine which configuration to use by running:
# cat dumpconfig_c2t2d0s2 | vxprint -D - -ht
# cat dumpconfig_c2t9d0s2 | vxprint -D - -ht
Compare the two vxprint outputs and decide which configuration best matches the disk group as it was when it was last imported.
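When the configurations are large, a line-level diff helps. The sketch below assumes each `vxprint -D - -ht` output has been saved to text, for example with `cat dumpconfig_c2t2d0s2 | vxprint -D - -ht > pool0.txt`. It uses Python's standard `difflib.unified_diff` and collapses runs of spaces so that column alignment alone does not register as a difference. The file and function names are hypothetical.

```python
import difflib
from typing import List


def _normalize(text: str) -> List[str]:
    # Collapse runs of spaces so column alignment differences do not show up.
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def compare_configs(a_text: str, b_text: str,
                    a_name: str = "pool0", b_name: str = "pool1") -> List[str]:
    """Unified diff between two `vxprint -D - -ht` outputs."""
    return list(difflib.unified_diff(_normalize(a_text), _normalize(b_text),
                                     fromfile=a_name, tofile=b_name, lineterm=""))
```

The diff only shows where the copies differ. Deciding which copy reflects the last good import still requires knowledge of recent changes to the disk group.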
5. At this stage, you should have chosen the disk whose configuration
copy will be used to import the disk group. The following command imports
the disk group using the configuration copy from disk d2.
# /usr/sbin/vxdg -o selectcp=1092974302.22.gopal import dgD280silo1
Once the disk group has been imported, Volume Manager resets the serial IDs to 0 for the imported disks.
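If the import is scripted, building the argument list explicitly avoids passing a mistyped disk ID. This sketch follows the form printed by vxsplitlines, `/usr/sbin/vxdg (-s) -o selectcp=<diskid> import <dg>`, where `-s` is optional. The disk-ID pattern is an assumption based on the IDs in this article (`<digits>.<digits>.<hostname>`). Adjust it if your IDs look different. The function and constant names are hypothetical.

```python
import re
from typing import List

# Assumed disk-ID shape, taken from the IDs in this article (e.g. 1092974302.22.gopal).
DISK_ID_RE = re.compile(r"\d+\.\d+\.[A-Za-z0-9_.-]+")


def selectcp_import_argv(disk_id: str, dg: str, shared: bool = False) -> List[str]:
    """Build the vxdg command that imports `dg` using the config copy on `disk_id`."""
    if not DISK_ID_RE.fullmatch(disk_id):
        raise ValueError(f"unexpected disk ID format: {disk_id!r}")
    argv = ["/usr/sbin/vxdg"]
    if shared:
        argv.append("-s")  # the optional (-s) shown in the vxsplitlines hint
    argv += ["-o", f"selectcp={disk_id}", "import", dg]
    return argv
```

For step 5, `selectcp_import_argv("1092974302.22.gopal", "dgD280silo1")` yields the command shown above. It can be run with `subprocess.run(argv, check=True)`.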
6. Check the status. Immediately after the import, the volumes are DISABLED:
# vxprint -qhtg dgD280silo1
dg dgD280silo1 default default 26000 1095738111.20.gopal
dm d1 c2t1d0s2 auto 2048 35838448 -
dm d2 c2t2d0s2 auto 2048 35838448 -
dm d3 c2t3d0s2 auto 2048 35838448 -
dm d4 c2t9d0s2 auto 2048 35838448 -
v SNAP-vol_db2silo1.1 - DISABLED ACTIVE 1024000 SELECT - fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 DISABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01 SNAP-vol_db2silo1.1-01 d3 0 512000 0/0 c2t3d0 ENA
sd d4-01 SNAP-vol_db2silo1.1-01 d4 0 512000 1/0 c2t9d0 ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v SNAP-vol_db2silo1.1_dcl - DISABLED ACTIVE 544 SELECT - gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl DISABLED ACTIVE 544 CONCAT - RW
sd d3-02 SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0 c2t3d0 ENA
v orgvol - DISABLED ACTIVE 1024000 SELECT - fsgen
pl orgvol-01 orgvol DISABLED ACTIVE 1024000 STRIPE 2/128 RW
sd d1-01 orgvol-01 d1 0 512000 0/0 c2t1d0 ENA
sd d2-01 orgvol-01 d2 0 512000 1/0 c2t2d0 ENA
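To confirm which volumes still need to be started (here, all of them), the `v` records can be filtered by kernel state. The sketch assumes the column order seen in the vxprint -ht output above: record type, name, a "-" placeholder column, kernel state, state, and so on. The same check applies to the final output in step 8, where the list should be empty. The helper names are hypothetical.

```python
from typing import List, Tuple


def volume_states(vxprint_ht: str) -> List[Tuple[str, str, str]]:
    """Return (name, kernel state, state) for every "v" record.

    Assumes the column order: v NAME <placeholder> KSTATE STATE ...
    """
    rows = []
    for line in vxprint_ht.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "v":
            rows.append((fields[1], fields[3], fields[4]))
    return rows


def volumes_not_enabled(vxprint_ht: str) -> List[str]:
    """Names of volumes whose kernel state is not ENABLED."""
    return [name for name, kstate, _ in volume_states(vxprint_ht) if kstate != "ENABLED"]
```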
7. Start the volumes, then mount the file system (running fsck first if the mount reports that it needs checking)
# vxrecover -g dgD280silo1 -sb
# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol
UX:vxfs mount: ERROR: V-3-21268: /dev/vx/dsk/dgD280silo1/orgvol is corrupted. needs checking
# fsck -F vxfs /dev/vx/rdsk/dgD280silo1/orgvol
log replay in progress
replay complete - marking super-block as CLEAN
# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol
# df /orgvol
/orgvol (/dev/vx/dsk/dgD280silo1/orgvol): 1019102 blocks 127386 files
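The step 7 sequence (start volumes, try to mount, run fsck and retry if the mount fails) can be sketched with an injectable command runner, so the control flow can be checked without a VxVM host. The `-F vxfs` syntax matches the Solaris/HP-UX commands in this article. On Linux the equivalent is `-t vxfs`. A full fsck, as opposed to the log replay shown above, may prompt interactively. Treating any non-zero mount status as "needs checking" is a simplifying assumption. The function names are hypothetical.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def run(argv: List[str]) -> int:
    """Default runner: execute argv and return its exit status."""
    return subprocess.run(argv).returncode


def recover_and_mount(dg: str, vol: str, mountpoint: str,
                      runner: Runner = run) -> List[List[str]]:
    """Start volumes, mount; on mount failure run fsck once and retry.
    Returns the commands issued, in order."""
    issued: List[List[str]] = []

    def call(argv: List[str]) -> int:
        issued.append(argv)
        return runner(argv)

    # Same command as step 7: -s starts the volumes, -b backgrounds recovery.
    if call(["vxrecover", "-g", dg, "-sb"]) != 0:
        raise RuntimeError("vxrecover failed")
    mount_cmd = ["mount", "-F", "vxfs", f"/dev/vx/dsk/{dg}/{vol}", mountpoint]
    if call(mount_cmd) != 0:
        # fsck runs against the raw (rdsk) device, as in the transcript.
        if call(["fsck", "-F", "vxfs", f"/dev/vx/rdsk/{dg}/{vol}"]) != 0:
            raise RuntimeError("fsck failed; investigate before mounting")
        if call(mount_cmd) != 0:
            raise RuntimeError("mount still failing after fsck")
    return issued
```

Passing a fake runner shows the path taken in the transcript: vxrecover, a failed mount, fsck, then a successful mount.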
8. Check the final status of the disks and the configuration
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk d1 dgD280silo1 online
c2t2d0s2 auto:cdsdisk d2 dgD280silo1 online
c2t3d0s2 auto:cdsdisk d3 dgD280silo1 online
c2t9d0s2 auto:cdsdisk d4 dgD280silo1 online
# vxprint -htg dgD280silo1
dg dgD280silo1 default default 26000 1095738111.20.gopal
dm d1 c2t1d0s2 auto 2048 35838448 -
dm d2 c2t2d0s2 auto 2048 35838448 -
dm d3 c2t3d0s2 auto 2048 35838448 -
dm d4 c2t9d0s2 auto 2048 35838448 -
v SNAP-vol_db2silo1.1 - ENABLED ACTIVE 1024000 SELECT SNAP-vol_db2silo1.1-01 fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 ENABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01 SNAP-vol_db2silo1.1-01 d3 0 512000 0/0 c2t3d0 ENA
sd d4-01 SNAP-vol_db2silo1.1-01 d4 0 512000 1/0 c2t9d0 ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v SNAP-vol_db2silo1.1_dcl - ENABLED ACTIVE 544 SELECT - gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl ENABLED ACTIVE 544 CONCAT - RW
sd d3-02 SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0 c2t3d0 ENA
v orgvol - ENABLED ACTIVE 1024000 SELECT orgvol-01 fsgen
pl orgvol-01 orgvol ENABLED ACTIVE 1024000 STRIPE 2/128 RW
sd d1-01 orgvol-01 d1 0 512000 0/0 c2t1d0 ENA
sd d2-01 orgvol-01 d2 0 512000 1/0 c2t2d0 ENA