Before you replace what you believe is a failed Solaris Volume Manager (SVM) disk, establish whether it has actually failed or is still in the process of failing. The distinction matters because replacing a failed SVM disk can take a little less time than replacing a failing one.
Read How To Tell The Difference Between A Failed Disk And A Failing Disk to find out which one your disk is. If your disk hasn’t quite failed yet, this article will show you How To Replace A Failing SVM Disk.
Now that you have established that you do have a failed SVM disk, find out whether the disk contains SVM metadatabase replicas and, if it does, delete them. The examples below assume that the failed disk is c1t1d0.
# metadb | grep c1t1d0
W p l 16 8192 /dev/dsk/c1t1d0s7
W p l 8208 8192 /dev/dsk/c1t1d0s7
W p l 16400 8192 /dev/dsk/c1t1d0s7
#
# metadb -d c1t1d0s7
#
# metadb
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16400 8192 /dev/dsk/c1t0d0s7
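The replica check at the start of this step can also be scripted. metadb marks replicas that have problems with uppercase status flags (such as the W above), while lowercase flags mean the replica is okay. The following is a minimal sketch, not a tested tool: the function name failed_replicas is hypothetical, and it assumes the column layout shown above, with the flag columns first and the device path last.

```shell
# failed_replicas: reads `metadb` output on stdin and prints the device path
# of every replica line that carries an uppercase (problem) status flag.
# Hypothetical helper; assumes the flag columns precede the numeric columns.
failed_replicas() {
    awk '
        {
            bad = 0
            # Scan the flag columns only: stop at the first purely numeric
            # field, which is the "first blk" value.
            for (i = 1; i <= NF && $i !~ /^[0-9]+$/; i++)
                if ($i ~ /[A-Z]/) bad = 1
            # Print the device path (last field) of flagged replicas.
            if (bad && $NF ~ /^\/dev\//) print $NF
        }
    '
}

# Usage on a live system (run as root):
#   metadb | failed_replicas
```

Each flagged replica is printed on its own line, so a slice that holds three bad replicas appears three times.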
Unconfigure the failed SVM disk.
# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
c2 scsi-bus connected unconfigured unknown
c3 fc-fabric connected configured unknown
c3::5006016239a02018 disk connected configured unknown
c3::5006016b39a02018 disk connected configured unknown
c3::5006048452a70c17 disk connected configured unknown
c3::5006048c52a70c07 disk connected configured unknown
c4 fc-fabric connected configured unknown
c4::5006016339a02018 disk connected configured unknown
c4::5006016a39a02018 disk connected configured unknown
c4::5006048452a70c18 disk connected configured unknown
c4::5006048c52a70c08 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb1/1 unknown empty unconfigured ok
usb1/2 unknown empty unconfigured ok
#
# cfgadm -c unconfigure c1::dsk/c1t1d0
cfgadm: Component system is busy, try again: failed to offline:
Resource Information
------------------ -------------------------
/dev/dsk/c1t1d0s2 Device being used by VxVM
Note: This host uses SVM to manage internal disks and Veritas Volume Manager (VxVM) to manage SAN-attached disks. VxVM keeps track of the internal disks, even though it doesn’t actually manage them, and may not allow you to unconfigure them. To get around this restriction, you may need to forcibly unconfigure the failed SVM disk by passing the -f option to cfgadm.
# cfgadm -f -c unconfigure c1::dsk/c1t1d0
#
# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t1d0 disk connected unconfigured unknown
c1::dsk/c1t2d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
c2 scsi-bus connected unconfigured unknown
c3 fc-fabric connected configured unknown
c3::5006016239a02018 disk connected configured unknown
c3::5006016b39a02018 disk connected configured unknown
c3::5006048452a70c17 disk connected configured unknown
c3::5006048c52a70c07 disk connected configured unknown
c4 fc-fabric connected configured unknown
c4::5006016339a02018 disk connected configured unknown
c4::5006016a39a02018 disk connected configured unknown
c4::5006048452a70c18 disk connected configured unknown
c4::5006048c52a70c08 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb1/1 unknown empty unconfigured ok
usb1/2 unknown empty unconfigured ok
Verify that the failed SVM disk is marked “unconfigured” as above. On Sun servers with hot-swappable disks, the disk’s blue “ready to remove” LED will also be lit.
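Rather than scanning the whole cfgadm listing by eye, the Occupant column for a single attachment point can be picked out with awk. This is a minimal sketch under the assumption that the listing has the five columns shown above (Ap_Id, Type, Receptacle, Occupant, Condition); the function name occupant_state is hypothetical.

```shell
# occupant_state AP_ID: reads `cfgadm -al` output on stdin and prints the
# Occupant column (configured or unconfigured) for the given Ap_Id.
# Returns non-zero if the Ap_Id is not listed. Hypothetical helper.
# Note: on Solaris, nawk may be needed instead of awk for the -v option.
occupant_state() {
    awk -v ap="$1" '
        $1 == ap { print $4; found = 1 }
        END { exit !found }
    '
}

# Usage on a live system:
#   cfgadm -al | occupant_state c1::dsk/c1t1d0
```

The same check applies after the new disk is configured, when the expected value is “configured”.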
Pull the failed SVM disk out of the drive bay and insert the new disk. The following message will come up in /var/adm/messages.
Jul 20 14:46:09 eap52 rmclomv: [ID 978967 kern.error] DISK @ HDD1 has been inserted.
Configure the new disk.
# cfgadm -c configure c1::dsk/c1t1d0
#
# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
c2 scsi-bus connected unconfigured unknown
c3 fc-fabric connected configured unknown
c3::5006016239a02018 disk connected configured unknown
c3::5006016b39a02018 disk connected configured unknown
c3::5006048452a70c17 disk connected configured unknown
c3::5006048c52a70c07 disk connected configured unknown
c4 fc-fabric connected configured unknown
c4::5006016339a02018 disk connected configured unknown
c4::5006016a39a02018 disk connected configured unknown
c4::5006048452a70c18 disk connected configured unknown
c4::5006048c52a70c08 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb1/1 unknown empty unconfigured ok
usb1/2 unknown empty unconfigured ok
Verify that the new disk has been configured as above.
Copy the volume table of contents (VTOC) from the other disk in the mirror set, c1t0d0, onto the new disk.
# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
fmthard: New volume table of contents now in place.
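After the copy, one way to double-check the result is to compare the partition rows of the two disks. In prtvtoc output, the lines that start with * are comments (they include the device name, so they differ between disks), and the remaining lines are the partition rows. The sketch below is only an illustration: vtoc_slices is a hypothetical helper name, and it keeps just the first six columns of each row so that any trailing mount-point column is ignored.

```shell
# vtoc_slices: reads `prtvtoc` output on stdin and prints only the partition
# rows, with whitespace normalised. Hypothetical helper.
vtoc_slices() {
    awk '
        # Skip blank lines and "*" comment lines; keep the six numeric
        # columns (partition, tag, flags, first sector, count, last sector).
        NF && substr($1, 1, 1) != "*" { print $1, $2, $3, $4, $5, $6 }
    '
}

# Usage on a live system:
#   prtvtoc /dev/rdsk/c1t0d0s2 | vtoc_slices > /tmp/src.vtoc
#   prtvtoc /dev/rdsk/c1t1d0s2 | vtoc_slices > /tmp/new.vtoc
#   cmp -s /tmp/src.vtoc /tmp/new.vtoc && echo "VTOCs match"
```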
If the prtvtoc | fmthard command above fails with an error similar to “/dev/rdsk/c1t1d0s2: Cannot get disk geometry”, the new disk has not been labeled yet. Run format to label the disk, then repeat the VTOC copy.
# format
Searching for disks...done
c1t1d0: configured with capacity of 72.36GB
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
/pci@1f,700000/scsi@2/sd@0,0
1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
/pci@1f,700000/scsi@2/sd@1,0
2. c1t2d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
/pci@1f,700000/scsi@2/sd@2,0
3. c1t3d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
/pci@1f,700000/scsi@2/sd@3,0
Specify disk (enter its number): 1
selecting c1t1d0
[disk formatted]
Disk not labeled. Label it now? y
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format> q
Recreate the metadatabase replicas on the new disk.
# metadb -a -c 3 c1t1d0s7
#
# metadb
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16400 8192 /dev/dsk/c1t0d0s7
a u 16 8192 /dev/dsk/c1t1d0s7
a u 8208 8192 /dev/dsk/c1t1d0s7
a u 16400 8192 /dev/dsk/c1t1d0s7
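To confirm that both disks in the mirror now carry the same number of replicas, the metadb output can be counted per disk. A minimal sketch, assuming the device path is the last column as in the listing above; count_replicas is a hypothetical helper name.

```shell
# count_replicas DISK: reads `metadb` output on stdin and prints how many
# replica lines live on DISK (for example c1t0d0). Hypothetical helper.
# Note: on Solaris, nawk may be needed instead of awk for the -v option.
count_replicas() {
    awk -v disk="$1" '
        # Match /dev/dsk/<disk>s<N> exactly, so c1t1d0 does not match c1t10d0.
        $NF ~ ("^/dev/dsk/" disk "s[0-9]+$") { n++ }
        END { print n + 0 }
    '
}

# Usage on a live system:
#   metadb | count_replicas c1t0d0
#   metadb | count_replicas c1t1d0
```

With the listing above, both commands should report the same count.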
Update the new disk’s device ID entry in SVM. This step may not be required, but it’s a good idea to do it just in case.
# metadevadm -u c1t1d0
Updating Solaris Volume Manager device relocation information for c1t1d0
Old device reloc information:
id1,sd@THITACHI_HUS103073FL3800_V3X6MDDA
New device reloc information:
id1,sd@THITACHI_HUS103073FL3800_V3X6MDDA
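In the output above, the old and new device IDs are identical. To tell quickly whether metadevadm actually changed anything, the two values can be compared. This is a rough sketch that assumes the output keeps the layout shown above, with each ID on the line after its heading; devid_status is a hypothetical helper name.

```shell
# devid_status: reads `metadevadm -u` output on stdin and prints "changed",
# "unchanged", or "unknown" (with a non-zero exit) if either ID is missing.
# Hypothetical helper; assumes each ID follows its heading line.
devid_status() {
    awk '
        /Old device reloc information:/ { want = "old"; next }
        /New device reloc information:/ { want = "new"; next }
        want == "old" { old_id = $1; want = "" }
        want == "new" { new_id = $1; want = "" }
        END {
            if (old_id == "" || new_id == "") { print "unknown"; exit 1 }
            if (old_id == new_id) print "unchanged"
            else print "changed"
        }
    '
}

# Usage on a live system:
#   metadevadm -u c1t1d0 | devid_status
```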
Enable the submirrors on the replacement disk. Start with the swap partition, as this won’t affect any data in case SVM runs into a problem. You may enable the submirrors on the new disk in parallel or in sequence: if the I/O load on the system is heavy, enable them in sequence; otherwise, enable them in parallel.
# metareplace -e d1 c1t1d0s1
d1: device c1t1d0s1 is enabled
solaris_1# metastat d1
d1: Mirror
Submirror 0: d11
State: Okay
Submirror 1: d21
State: Resyncing
Resync in progress: 0 % done
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 10491456 blocks (5.0 GB)
d11: Submirror of d1
State: Okay
Size: 10491456 blocks (5.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s1 0 No Okay Yes
d21: Submirror of d1
State: Resyncing
Size: 10491456 blocks (5.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s1 0 No Resyncing Yes
Device Relocation Information:
Device Reloc Device ID
c1t0d0 Yes id1,sd@SFUJITSU_MAW3073NCSUN72G_000707B0KHT4____DAN0P720KHT4
c1t1d0 Yes id1,sd@THITACHI_HUS103073FL3800_V3X6MDDA
SVM will resync the submirrors as soon as they are enabled. The resync runs in the background and may take a fair amount of time, depending on the size of the submirrors. Now is a good time to go for a cup of coffee. Don’t forget to check the progress of the resync when you return.
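Checking the progress comes down to reading the “Resync in progress” line from metastat. A minimal sketch, assuming that line has the form shown in the metastat output above; resync_percent is a hypothetical helper name.

```shell
# resync_percent: reads `metastat` output on stdin and prints the percentage
# from each "Resync in progress: N % done" line. Prints nothing when no
# resync line is present. Hypothetical helper.
resync_percent() {
    awk '/Resync in progress:/ { print $4 }'
}

# Usage on a live system:
#   metastat d1 | resync_percent
```

When enabling submirrors in sequence, a reasonable approach is to wait until this prints nothing for the current mirror, and metastat reports the submirror as Okay, before running the next metareplace -e.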