Friday, December 13, 2013

Systems With Solaris 10 or Solaris 11 Installed in an Oracle VM for SPARC I/O Domain Will Panic Repeatedly in Certain Setups

Description
sun4v systems that have Solaris 10 or Solaris 11 installed on a virtual disk (VDISK) in an Oracle VM for SPARC I/O domain, and that provide guests with access to a physical disk, will go into a panic loop on reboot. The panic loop stops only when the guests are unbound; shutting them down is not sufficient.

Occurrence
This issue can occur in the following releases:

SPARC sun4v Platform:
• Solaris 10 without patch 150840-01 or later
• Solaris 11 without SRU11.1.12.5.0 or later

Notes:

  1. Solaris 8, Solaris 9, and Solaris x86 are not affected by this issue. The sun4u and sun4us platforms are not affected by this issue.

  2. This issue only occurs on SPARC sun4v systems that are partitioned using Oracle VM for SPARC.

  3. This issue does not affect the primary domain, since it cannot be installed on a VDISK.

To determine the architecture of the system, execute the following command:

   # uname -m
   sun4v
To determine if a system is partitioned using Oracle VM for SPARC, execute the following command:

   # ldm list
   NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
   primary          active     -n-cv-  UART    16    34560M   0.1%  0.0%  35d 48m
   dom1             active     -n----  5001    16    16G      0.0%  0.0%  3d 23h 12m
   dom2             active     -n--v-  5000    16    16G      0.1%  0.0%  55m
More than three lines with the state 'active' indicates that the system might be vulnerable.
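
The two checks above can be combined in a small helper. This is only a sketch: the function name `count_active_domains` is hypothetical, and it assumes the column layout shown in the sample output (NAME in the first column, STATE in the second). It reports the number of active domains and leaves the interpretation to the reader.

```shell
# Count the domains whose STATE column reads "active" in `ldm list`
# output supplied on stdin. The header row is skipped by its NAME field.
# Assumption: columns are laid out as in the sample output above.
count_active_domains() {
    awk '$1 != "NAME" && $2 == "active" { n++ } END { print n + 0 }'
}

# Possible usage in the primary domain of a sun4v system:
#   [ "$(uname -m)" = "sun4v" ] && ldm list | count_active_domains
```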

Note 4: This issue only affects Solaris installed on a virtual disk (VDISK) in an Oracle VM for SPARC I/O domain that provides guests with access to physical disks.

To determine whether a domain is an I/O domain and provides guests with access to physical devices, execute the following command in the primary domain:

   # ldm list-services dom2
   ...
   VDS
   NAME        LDOM  VOLUME  OPTIONS  MPGROUP  DEVICE
   vds-dom2    dom2  disk1                    /dev/rdsk/c0t5000CCA01537C66Cd0s6
                     disk2                    /dev/rdsk/c0t5000CC301557C67Cd0s6
   ...
If the output for a domain other than 'primary' contains a 'VDS' section and at least one of its entries refers to a physical disk (a device path starting with '/dev/rdsk/'), then the domain is an I/O domain providing guests with access to physical devices.
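
The rule above can be expressed as a filter over the command output. The following is a minimal sketch, not a supported tool: the function name `has_physical_vds` is hypothetical, and it assumes that section headers such as 'VDS' appear as a single uppercase word on their own line, as in the sample output.

```shell
# Succeed (exit 0) if `ldm list-services <domain>` output on stdin has a
# VDS section containing at least one /dev/rdsk/ device path.
# Assumption: each section starts with a one-word uppercase header line.
has_physical_vds() {
    awk '
        NF == 1 && $1 ~ /^[A-Z]+$/ { in_vds = ($1 == "VDS"); next }
        in_vds && /\/dev\/rdsk\//  { found = 1 }
        END { exit !found }
    '
}

# Possible usage in the primary domain:
#   ldm list-services dom2 | has_physical_vds && echo "dom2 exports physical disks"
```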

To determine whether Solaris in a given I/O domain is installed on a VDISK, execute the following commands while logged in to the Solaris instance in the domain:

For Solaris 10 and Solaris 11 installed on ZFS root:

   # mount | grep '/\ '
   / on rpool/ROOT/s12_25 read/write/setuid/devices/rstchown/dev=4750002 on Wed Dec 31 19:00:00 1969
   # zpool status rpool
   ...
             c2d0s0  ONLINE       0     0     0
   ...
   # ls -la /dev/dsk/c2d0s0
   lrwxrwxrwx   1 root     root          62 Jun 10 08:46 /dev/dsk/c2d0s0 -> ../../devices/virtual-devices@100/channel-devices@200/disk@0:a

For Solaris 10 installed on UFS:

   # mount | grep '/\ '
   / on /dev/dsk/c2d0s0 read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=1d80010 on Wed Jun 19 09:03:01 2013
   # ls -la /dev/dsk/c2d0s0
   lrwxrwxrwx   1 root     root          62 Jun 10 08:46 /dev/dsk/c2d0s0 -> ../../devices/virtual-devices@100/channel-devices@200/disk@0:a
If the device path contains 'virtual-devices' then the I/O domain is installed on a VDISK.
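
The final test on the device path can be scripted as follows. This is a sketch with a hypothetical function name, `is_vdisk_link`; it simply checks the `ls -la` line for the 'virtual-devices' component described above.

```shell
# Succeed (exit 0) if the given `ls -la` line (or bare link target)
# contains the 'virtual-devices' path component.
is_vdisk_link() {
    case "$1" in
        *virtual-devices*) return 0 ;;
        *)                 return 1 ;;
    esac
}

# Possible usage inside the I/O domain, with the root device found earlier:
#   is_vdisk_link "$(ls -la /dev/dsk/c2d0s0)" && echo "root is on a VDISK"
```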

Symptoms
If the described issue occurs, the system will panic repeatedly with the following message and top of the stack:

   panic[cpu0]/thread=10012000: recursive rw_enter, lp=1086a7a0 wwwh=10012004 thread=10012000

   Warning - stack not written to the dumpbuf
   000000001000d100 platsvc:mdeg_register+3c (c4006b7f8ab8, 70515288, 7bef05f0, c4006a7ec750, c4006a7ec768, 1086a400)
     %l0-3: 0000c4006b7f8ab8 0000000000000004 000000001086a7a0 000000000000000c
     %l4-7: 0000000000000003 0000000000000003 0000000000000004 00000000107d2000
   000000001000d1b0 vpci:vpci_do_attach+108 (4000d2a0c00, 70515000, 0, 70515380, 7bef4340, c4006a7ec750)
     %l0-3: 000000007bef4000 0000000000000000 0000c4006b7f8ab8 0000c4006a181388
     %l4-7: 000000007bef0400 0000000070515000 000000007bef4378 0000000000000000
   000000001000d260 vpci:vpci_attach+2c (4000d2a0c00, 0, 0, 0, 12d4ab0, 0)
     %l0-3: feedfacefeedface 00000000107c4c00 0000000000000001 0000000000000000
     %l4-7: 0000000000000000 00000000feedf800 0000000000000000 000000000000000e
   000000001000d310 genunix:devi_attach+c0 (4000d2a0c00, 1081fc00, 1000d3c0, 0, 0, 7bef0bb0)

Workaround
To prevent this issue from occurring, add the following line to '/etc/system':

   forceload: drv/vpci
If the panic loop has already been triggered, it can be stopped only by stopping and unbinding all guests that are provided with services by the I/O domain.
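
The '/etc/system' change can be applied in a way that is safe to repeat. The following is a minimal sketch, assuming a hypothetical helper named `ensure_vpci_forceload` that takes the file path as an optional argument; it appends the line only if it is not already present. The boot archive still has to be updated afterwards.

```shell
# Append 'forceload: drv/vpci' to the given file (default: /etc/system)
# unless an identical entry already exists. Must be run as root when
# applied to the real /etc/system.
ensure_vpci_forceload() {
    file=${1:-/etc/system}
    if grep '^forceload: *drv/vpci *$' "$file" >/dev/null 2>&1; then
        return 0    # entry already present, nothing to do
    fi
    printf '%s\n' 'forceload: drv/vpci' >> "$file"
}

# Possible usage inside the I/O domain:
#   ensure_vpci_forceload /etc/system
```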

Note: The boot archive must be updated after adding entries to the '/etc/system' file. This can be done by using the following command:

   # bootadm update-archive

This issue is resolved in the following releases:

SPARC sun4v Platform:
• Solaris 10 with patch 150840-01 or later
• Solaris 11 with SRU11.1.12.5.0 or later
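
On Solaris 10, the patch level can be checked against the list of installed patches. The following is a sketch only: the function name `has_patch_150840` is hypothetical, and it assumes the usual `showrev -p` line format beginning with 'Patch: <id>-<revision>'. It does not detect the case where 150840 has been obsoleted by a later patch, and it does not cover the Solaris 11 SRU check.

```shell
# Succeed (exit 0) if `showrev -p` output on stdin lists patch 150840
# at revision 01 or later.
# Assumption: each line starts with "Patch: <id>-<revision>".
has_patch_150840() {
    awk '
        $1 == "Patch:" && $2 ~ /^150840-[0-9]+$/ {
            split($2, id, "-")
            if (id[2] + 0 >= 1) found = 1
        }
        END { exit !found }
    '
}

# Possible usage on Solaris 10:
#   showrev -p | has_patch_150840 && echo "patch 150840 is installed"
```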
