Friday, November 22, 2013

Duplicate (ghost) disk entries reported by Veritas Volume Manager (VxVM) following incorrect (Solaris) bootdisk LUN removal procedure (cfgadm)

Issue



Veritas Volume Manager (VxVM) reports duplicate (ghost) disk entries for the Veritas disk access (da) name "c1t1d0" following a recent attempt to replace the boot disk LUN.


Sample output

# vxdisk -eo alldgs list
<snippet>
VxVM vxconfigd ERROR V-5-1-13992 open failed on path: /dev/vx/rdmp/c1t1d0s2
VxVM vxconfigd ERROR V-5-1-13992 open failed on path: /dev/vx/rdmp/c1t1d0s2
VxVM vxconfigd ERROR V-5-1-13992 open failed on path: /dev/vx/rdmp/c1t1d0s2
DEVICE       TYPE           DISK         GROUP        STATUS           OS_NATIVE_NAME   ATTR
c1t0d0s2     auto:sliced    rootdisk     rootdg       online           c1t0d0s2         -
c1t1d0s2     auto           -            -            error            c1t1d0s2         -
c1t1d0s2     auto:none      -            -            online invalid   c1t1d0s2         -
c1t1d0s2     auto           -            -            error            c1t1d0s2         -
.
.
<snippet>

NOTE: Veritas disk access (da) name "c1t1d0" is reported three times by VxVM.
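
On a host with many disks, the duplicates can be harder to spot by eye than in the output above. The following is a minimal sketch, not part of the original procedure, that reads "vxdisk list" style output on stdin and prints each device name that appears more than once, with its count. The function name duplicate_da_names is hypothetical, and the parsing assumes the column layout shown in the sample output. On Solaris, nawk or /usr/xpg4/bin/awk may be a safer choice than the default awk.

```shell
#!/bin/sh
# duplicate_da_names (hypothetical helper): read "vxdisk list" style output
# on stdin and print "<device> <count>" for every DEVICE value seen more
# than once. The header row and "VxVM vxconfigd ERROR" lines are skipped.
duplicate_da_names() {
    awk '
        $1 == "DEVICE" || $1 == "VxVM" { next }   # header and error lines
        NF >= 5 { seen[$1]++ }                    # data rows only
        END {
            for (d in seen)
                if (seen[d] > 1)
                    print d, seen[d]
        }
    '
}

# Assumed usage on a live system (not executed here):
#   vxdisk -eo alldgs list 2>/dev/null | duplicate_da_names
```

Fed the sample output above, the function reports c1t1d0s2 with a count of 3.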





Environment




Solaris SPARC.


Cause




In this instance, the Solaris cfgadm (Leadville) stack is reporting access points for the removed LUN under two different WWNs.

# cfgadm -alo show_FCP_dev
Ap_Id                          Type         Receptacle   Occupant     Condition
c1                             fc-private   connected    configured   unknown
c1::500000e0139f7621,0         disk         connected    configured   failing                 <<<< reference to c1t1d0
c1::500000e013f04cb1,0         disk         connected    configured   unknown
c1::5000cca004ac2c85,0         disk         connected    configured   unknown          <<<< reference to c1t1d0


The boot disk (c1t1d0) was removed from the server incorrectly, which left a stale disk entry in the /etc/vx/disk.info file.


# cat /etc/vx/disk.info
FUJITSU%5FMAX3147FCSUN146G%5FDISKS%5F500000E013F04CB0 c1t0d0 0x4680000 0x1 c1t0d0 Disk DISKS
HITACHI%5FOPEN-V%20%20%20%20%20%20-SUN%5F05C13%5F177E c4t60d3 0x4680010 0x1 c4t60d3 TagmaStore-USP 05C13
HITACHI%5FOPEN-V%20%20%20%20%20%20-SUN%5F05C13%5F177C c4t60d1 0x4680018 0x1 c4t60d1 TagmaStore-USP 05C13
HITACHI%5FOPEN-V%20%20%20%20%20%20-SUN%5F05C13%5F177B c4t60d0 0x4680020 0x1 c4t60d0 TagmaStore-USP 05C13
HITACHI%5FOPEN-V%20%20%20%20%20%20-SUN%5F05C13%5F177D c4t60d2 0x4680028 0x1 c4t60d2 TagmaStore-USP 05C13
HITACHI%5FOPEN-V%20%20%20%20%20%20-SUN%5F05C13%5F177F c4t60d4 0x4680030 0x1 c4t60d4 TagmaStore-USP 05C13
INVALID c1t1d0 0x4680008 0x1 c1t1d0 Disk DISKS   <<<<< stale disk entry
#
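
The stale entry can also be picked out mechanically. The sketch below assumes, based only on the sample above, that a stale entry is one whose first field is the literal string INVALID and that the second field is the OS device name. The function name stale_disk_info_entries is hypothetical.

```shell
#!/bin/sh
# stale_disk_info_entries (hypothetical helper): print the device name
# (second field) of every entry whose first field is INVALID.
# $1: path to a disk.info style file; defaults to /etc/vx/disk.info.
stale_disk_info_entries() {
    awk '$1 == "INVALID" { print $2 }' "${1:-/etc/vx/disk.info}"
}

# Assumed usage on a live system (not executed here):
#   stale_disk_info_entries
```

Against the file contents shown above, this would print c1t1d0.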


Upon further investigation, c1t1d0 maps to two WWNs: "5000cca004ac2c85" and "500000e0139f7621".


# ls -la /dev/rdsk/c1t1d0*
<snippet>
lrwxrwxrwx   1 root     root          74 Jan 26 05:32 /dev/rdsk/c1t1d0s0 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w5000cca004ac2c85,0:a,raw
lrwxrwxrwx   1 root     root          74 Jan 26 05:33 /dev/rdsk/c1t1d0s1 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w5000cca004ac2c85,0:b,raw
lrwxrwxrwx   1 root     root          74 Jan 26 05:33 /dev/rdsk/c1t1d0s2 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w5000cca004ac2c85,0:c,raw
lrwxrwxrwx   1 root     root          74 Jan 26 05:33 /dev/rdsk/c1t1d0s3 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w5000cca004ac2c85,0:d,raw
lrwxrwxrwx   1 root     root          74 Jan 26 05:33 /dev/rdsk/c1t1d0s4 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w5000cca004ac2c85,0:e,raw
lrwxrwxrwx   1 root     root          74 Jan 26 05:33 /dev/rdsk/c1t1d0s5 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w5000cca004ac2c85,0:f,raw
.

<snippet>



# ls -la /dev/rdsk/* | grep -i 7621
<snippet>
lrwxrwxrwx   1 root     root          74 Jan 26 06:15 /dev/rdsk/c1t1d0s0 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0139f7621,0:a,raw
lrwxrwxrwx   1 root     root          74 Jan 26 06:15 /dev/rdsk/c1t1d0s1 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0139f7621,0:b,raw
lrwxrwxrwx   1 root     root          74 Jan 26 06:15 /dev/rdsk/c1t1d0s2 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0139f7621,0:c,raw
lrwxrwxrwx   1 root     root          74 Jan 26 06:15 /dev/rdsk/c1t1d0s3 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0139f7621,0:d,raw
lrwxrwxrwx   1 root     root          74 Jan 26 06:15 /dev/rdsk/c1t1d0s4 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0139f7621,0:e,raw
lrwxrwxrwx   1 root     root          74 Jan 26 06:15 /dev/rdsk/c1t1d0s5 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0139f7621,0:f,raw
.
.
<snippet>
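
To make the device-to-WWN mapping easier to read, the symlink listings above can be reduced to one line per device and WWN. This is a rough sketch that assumes the link targets follow the "ssd@w<WWN>,<LUN>" form seen in this output. The function name device_wwns is hypothetical.

```shell
#!/bin/sh
# device_wwns (hypothetical helper): read "ls -l /dev/rdsk/..." style lines
# on stdin and print unique "<cXtYdZ> <wwn>" pairs. The WWN is taken from
# the "@w<wwn>," portion of the symlink target.
device_wwns() {
    sed -n 's|.*/dev/rdsk/\(c[0-9]*t[0-9]*d[0-9]*\)s[0-9]* -> .*@w\([0-9a-fA-F]*\),.*|\1 \2|p' |
        LC_ALL=C sort -u
}

# Assumed usage on a live system (not executed here):
#   ls -la /dev/rdsk/c1t1d0* | device_wwns
```

A device that maps to more than one WWN across the collected listings, as c1t1d0 does here, shows up on more than one line.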






Solution




1.] Remove the Veritas disk access (da) name that "vxdisk list" reports more than once.


# vxdisk list
<snippet>
DEVICE       TYPE           DISK         GROUP        STATUS           OS_NATIVE_NAME   ATTR
c1t0d0s2     auto:sliced    rootdisk     rootdg       online           c1t0d0s2         -
c1t1d0s2     auto           -            -            error            c1t1d0s2         -
c1t1d0s2     auto:none      -            -            online invalid   c1t1d0s2         -
c1t1d0s2     auto           -            -            error            c1t1d0s2         -
<snippet>


In this instance, there are three entries for the same Veritas disk access name "c1t1d0", so "vxdisk rm <da-name>" needs to be issued three times:


# vxdisk rm c1t1d0
# vxdisk rm c1t1d0
# vxdisk rm c1t1d0
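
If the number of ghost entries is not known in advance, the list of removal commands can be generated from the listing itself. The sketch below is a dry run only: it prints one "vxdisk rm" line per matching row and executes nothing. The function name ghost_rm_commands is hypothetical, and it assumes the DEVICE column carries the disk access name either bare or with an "s2" suffix, as in the output above.

```shell
#!/bin/sh
# ghost_rm_commands (hypothetical helper): read "vxdisk list" style output
# on stdin and print one "vxdisk rm <da-name>" line for each row whose
# DEVICE column equals the given name, with or without a trailing "s2".
# $1: disk access name, e.g. c1t1d0
ghost_rm_commands() {
    awk -v da="$1" '
        $1 == da || $1 == (da "s2") { print "vxdisk rm " da }
    '
}

# Assumed usage on a live system (not executed here); review the printed
# commands before running any of them:
#   vxdisk list | ghost_rm_commands c1t1d0
```

For the listing in this step, the function prints the same three commands shown above.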


2.] Confirm with the "vxdisk list" command that VxVM no longer reports the problematic Veritas disk access name.

# vxdisk list
<snippet>
DEVICE       TYPE           DISK         GROUP        STATUS           OS_NATIVE_NAME   ATTR
c1t0d0s2     auto:sliced    rootdisk     rootdg       online           c1t0d0s2         -
<snippet>



3.] Using the Solaris luxadm interface, mark all the paths to the disk as offline.


# luxadm -e offline /dev/rdsk/c#t#d#s2


NOTE: In this instance, there is only a single path via both WWNs to c1t1d0.


4.] Clean up the stale OS device handles using the Solaris command "devfsadm -Cvc disk".


Sample output


# devfsadm -Cvc disk
devfsadm[5935]: verbose: removing file: /dev/dsk/c1t1d0s0
devfsadm[5935]: verbose: removing file: /dev/dsk/c1t1d0s1
devfsadm[5935]: verbose: removing file: /dev/dsk/c1t1d0s2
devfsadm[5935]: verbose: removing file: /dev/dsk/c1t1d0s3
devfsadm[5935]: verbose: removing file: /dev/dsk/c1t1d0s4
devfsadm[5935]: verbose: removing file: /dev/dsk/c1t1d0s5
devfsadm[5935]: verbose: removing file: /dev/dsk/c1t1d0s6
devfsadm[5935]: verbose: removing file: /dev/dsk/c1t1d0s7
devfsadm[5935]: verbose: removing file: /dev/rdsk/c1t1d0s0
devfsadm[5935]: verbose: removing file: /dev/rdsk/c1t1d0s1
devfsadm[5935]: verbose: removing file: /dev/rdsk/c1t1d0s2
devfsadm[5935]: verbose: removing file: /dev/rdsk/c1t1d0s3
devfsadm[5935]: verbose: removing file: /dev/rdsk/c1t1d0s4
devfsadm[5935]: verbose: removing file: /dev/rdsk/c1t1d0s5
devfsadm[5935]: verbose: removing file: /dev/rdsk/c1t1d0s6
devfsadm[5935]: verbose: removing file: /dev/rdsk/c1t1d0s7
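
When devfsadm removes links for several disks at once, the verbose output gets long. As an optional aid, the sketch below condenses it to the distinct disk names whose links were removed. It assumes the "removing file:" message format shown above. The function name removed_disk_names is hypothetical.

```shell
#!/bin/sh
# removed_disk_names (hypothetical helper): read "devfsadm -Cv" style output
# on stdin and print each distinct cXtYdZ name whose /dev/dsk or /dev/rdsk
# slice links were reported as removed.
removed_disk_names() {
    sed -n 's|.*removing file: /dev/r\{0,1\}dsk/\(c[0-9]*t[0-9]*d[0-9]*\)s[0-9]*$|\1|p' |
        LC_ALL=C sort -u
}

# Assumed usage on a live system (not executed here):
#   devfsadm -Cvc disk 2>&1 | removed_disk_names
```

For the sample output in this step, only c1t1d0 is listed.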



5.] In this instance, the Solaris cfgadm (Leadville) stack was reporting multiple access points for the removed LUN (c1t1d0).


Refresh the cfgadm output by running "cfgadm -alo show_FCP_dev".



# cfgadm -alo show_FCP_dev
Ap_Id                          Type         Receptacle   Occupant     Condition
c1                             fc-private   connected    configured   unknown
c1::500000e013f04cb1,0         disk         connected    configured   unknown
c1::5000cca004ac2c85,0         disk         connected    configured   unusable          <<<< reference to c1t1d0


NOTE: The previous access point "c1::500000e0139f7621,0" is no longer listed by the cfgadm interface. Before the clean-up it appeared as:


c1::500000e0139f7621,0         disk         connected    configured   failing                 <<<< did reference c1t1d0
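
To check quickly whether any access points are still in a problem state, the cfgadm output can be filtered on its Condition column. The sketch below treats "failing", "failed" and "unusable" as suspect; the first and last of these appear in the outputs in this article, and "failed" is added as a further cfgadm condition value. The function name suspect_access_points is hypothetical.

```shell
#!/bin/sh
# suspect_access_points (hypothetical helper): read "cfgadm -al" style output
# on stdin and print the Ap_Id of every row whose fifth column (Condition)
# is failing, failed or unusable. Trailing annotations are ignored.
suspect_access_points() {
    awk '$5 == "failing" || $5 == "failed" || $5 == "unusable" { print $1 }'
}

# Assumed usage on a live system (not executed here):
#   cfgadm -alo show_FCP_dev | suspect_access_points
```

Run against the refreshed output in this step, it lists only c1::5000cca004ac2c85,0, which is still marked unusable at this point.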


6.] Force a Fibre Channel SAN disk rescan using the Solaris "luxadm" interface.


a.] Identify the port to be rescanned.


# luxadm -e port
/devices/pci@9,600000/SUNW,qlc@2/fp@0,0:devctl                     CONNECTED


b.] Force the rescan using the port reference.

# luxadm -e forcelip /devices/pci@9,600000/SUNW,qlc@2/fp@0,0:devctl
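
On servers with more than one HBA port, sub-steps a.] and b.] can be combined by filtering the port list for connected ports. The sketch below only prints the port paths and does not issue a forcelip itself. It assumes the two-column "path, state" layout shown above and that unconnected ports are reported as "NOT CONNECTED". The function name connected_fc_ports is hypothetical.

```shell
#!/bin/sh
# connected_fc_ports (hypothetical helper): read "luxadm -e port" style
# output on stdin and print the device path of each port whose state column
# is exactly CONNECTED. Lines reading "NOT CONNECTED" are skipped because
# their second field is "NOT".
connected_fc_ports() {
    awk '$2 == "CONNECTED" { print $1 }'
}

# Assumed usage on a live system (not executed here); this prints the
# forcelip commands for review rather than running them:
#   luxadm -e port | connected_fc_ports | while read -r port; do
#       echo "luxadm -e forcelip $port"
#   done
```

A forcelip is disruptive to I/O on the loop, so printing the commands first and running them by hand is the more cautious choice.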


7.] Refresh the cfgadm (Leadville) stack.


# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             fc-private   connected    configured   unknown
c1::500000e013f04cb1           disk         connected    configured   unknown
c1::5000cca004ac2c85           disk         connected    configured   unknown    <<<< updated from "unusable" to "unknown"
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb0/3                         unknown      empty        unconfigured ok
usb0/4                         unknown      empty        unconfigured ok


8.] Refresh both VxVM and the /etc/vx/disk.info file contents.


# vxdisk scandisks


# vxddladm assign names   (available from 5.0 MP3 onwards)


9.] Run "vxdisk scandisks" once more to ensure the duplicate (ghost) disk entries do not reappear.

# vxdisk scandisks

# vxdisk list
<snippet>
DEVICE       TYPE           DISK         GROUP        STATUS           OS_NATIVE_NAME   ATTR
c1t0d0s2     auto:sliced    rootdisk     rootdg       online           c1t0d0s2         -
c1t1d0s2     auto:none      -            -            online invalid   c1t1d0s2         -
<snippet>


NOTE: Veritas disk access name "c1t1d0s2" now appears as a single entry, so it can be safely initialized for VxVM use.


Process complete
