Friday, December 13, 2013

On Solaris 10/11 Systems With Read-Modify-Write (RMW) Enabled, Unaligned SD/SSD Disk I/Os May Cause Data Integrity Issues

Description
When Read-Modify-Write (RMW) is enabled on a Solaris 10 or Solaris 11 system, unaligned sd(7D) or ssd(7D) disk I/Os may cause data loss and/or other data integrity issues.

Note: Read-Modify-Write is used only if the alignment of the I/O workload differs from that of the physical device, e.g., the filesystem issues 512-byte aligned I/O while the device uses 4K physical sectors. In that case the driver must read the surrounding physical block, merge in the new data, and write the whole block back.

This issue is only triggered when an I/O workload modifies the same data block (in the >2TB Logical Block Address (LBA) range) before the previous I/O to that block has been committed to disk, i.e., sd/ssd has sent, or is in the process of sending, the earlier I/O to the storage and has not yet received its acknowledgment.

Occurrence
This issue can occur in the following releases:

SPARC Platform

• Solaris 10 with patch 147440-18 through 147440-26 and without patch 147440-27
• Solaris 11 11/11 with SRU 10 or later
• Solaris 11.1 without SRU 3.4

x86 Platform

• Solaris 10 with patch 147441-18 through 147441-26 and without patch 147441-27
• Solaris 11 11/11 with SRU 10 or later
• Solaris 11.1 without SRU 3.4

Notes:

This issue only occurs if all of the following conditions are met:

- The filesystem resides on a partition that starts beyond, or extends past, the 2TB boundary (i.e., the system has a zpool(1M) or UFS/VxFS filesystem with more than 2TB of data). Other filesystems may also be affected by this issue.

- The physical block size is different from the filesystem's I/O block size

- The RMW (Read-Modify-Write) feature is enabled (the "emulation-rmw" property is set to 1) in either '/kernel/drv/sd.conf' or '/kernel/drv/ssd.conf', and the physical-block-size property is set (see the illustrative fragment after this list)
  
- Concurrent I/Os are issued to unaligned blocks within a physical block
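
For illustration only, a configuration matching these conditions might look like the following hypothetical '/kernel/drv/sd.conf' fragment. This is a sketch, not a recommendation: the vendor/product string ("ACME    " / "SUPERDISK") is a placeholder, and the exact property syntax supported depends on the storage device and kernel patch level, so consult the sd(7D) documentation for your release before changing driver configuration.

    # Hypothetical example only -- placeholder vendor/product string.
    # Declares a 4K physical block size and enables RMW emulation for
    # matching LUNs.
    sd-config-list = "ACME    SUPERDISK",
                     "physical-block-size:4096, emulation-rmw:1";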

To verify if RMW is enabled, use the following commands as 'root':

    # echo "::sd_state ! grep un_f_enable_rmw" | mdb -k
    # echo "::ssd_state ! grep un_f_enable_rmw" | mdb -kIf RMW is ENABLED = 1, the output will be:

    un_f_enable_rmw = 1
    un_f_enable_rmw = 1
    un_f_enable_rmw = 1This value will be set to "0" if the RMW feature is disabled.
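
To check the >2TB condition, prtvtoc(1M) (a standard tool, not part of the original alert text) reports each slice's starting sector and sector count. The device path below is a placeholder for the device backing the affected pool or filesystem:

    # prtvtoc /dev/rdsk/cXtYdZs2

With 512-byte sectors, the 2TB boundary falls at sector 4294967296 (2 x 2^40 / 512), so any slice whose first sector plus sector count exceeds that value covers LBAs beyond 2TB.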

Symptoms
If the described issue occurs, systems may encounter data loss or loss of data integrity and will exhibit symptoms similar to the following:

For ZFS pools, 'zpool status [poolname]' will show a high number of checksum errors and may show the pool as 'state: DEGRADED' depending on the number of errors detected, as in the following:    

# zpool status -v tankin
          pool: tankin
         state: DEGRADED
        status: One or more devices has experienced an error resulting in data
                corruption.  Applications may be affected.
        action: Restore the file in question if possible.  Otherwise restore the
                entire pool from backup.
           see: http://www.sun.com/msg/ZFS-8000-8A
          scan: none requested
        config:

                NAME                                       STATE     READ WRITE CKSUM
                tankin                                     DEGRADED     0     0   736
                  c0t60A980006467682D4B6F6D5746614632d0    DEGRADED     0     0 4.35K  too many errors

        errors: Permanent errors have been detected in the following files:

                /san/tankin/part01-msg/msg/=user/e0/63/=user151880/00
                /san/tankin/part01-msg/msg/=user/60/12/=user106134/00
                /san/tankin/part01-msg/msg/=user/c2/06/=user40993/=+Drafts
                [...snip...]

If the checksum errors are reported against metadata, which is replicated by default, no files will be shown in the list. Likewise, if the pool is resilient (i.e., a mirror or raidz), it is possible that no files will be listed.

'fmdump -e' will show zfs.data and zfs.checksum errors:

        TIME                 CLASS
        Sep 11 12:51:42.9213 ereport.fs.zfs.data
        Sep 11 12:51:42.9213 ereport.fs.zfs.checksum
        Sep 11 12:51:42.9213 ereport.fs.zfs.checksum
        Sep 11 12:52:23.7862 ereport.fs.zfs.data

'/var/adm/messages' will contain a message similar to the following:

        Sep 11 05:41:44 hostname fmd: SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
        Sep 11 05:41:44 hostname EVENT-TIME: Tue Sep 11 05:41:44 UTC 2012
        Sep 11 05:41:44 hostname PLATFORM: ORCL,SPARC-T4-2, CSN: -, HOSTNAME: hostname
        Sep 11 05:41:44 hostname SOURCE: zfs-diagnosis, REV: 1.0
        Sep 11 05:41:44 hostname EVENT-ID: 8242ba3d-4725-4ee6-9530-e1eccfa03e06
        Sep 11 05:41:44 hostname DESC: The ZFS pool has experienced currently unrecoverable I/O
        Sep 11 05:41:44 hostname failures.  Refer to http://sun.com/msg/ZFS-8000-HC for more information.
        Sep 11 05:41:44 hostname AUTO-RESPONSE: No automated response will be taken.
        Sep 11 05:41:44 hostname IMPACT: Read and write I/Os cannot be serviced.
        Sep 11 05:41:44 hostname REC-ACTION: Make sure the affected devices are connected, then run
        Sep 11 05:41:44 hostname 'zpool clear'.

For UFS filesystems, the system may panic and/or report errors in /var/adm/messages indicating an fsck(1M) is required.
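
If a UFS filesystem does require checking, a manual fsck(1M) can be run against the unmounted filesystem; the device path below is a placeholder for the affected slice:

    # fsck -F ufs /dev/rdsk/cXtYdZs6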

Workaround
There is no workaround for this issue.

This issue is addressed in the following releases:

SPARC Platform

• Solaris 10 with patch 147440-27 or later
• Solaris 11.1 with SRU 3.4 or later

x86 Platform

• Solaris 10 with patch 147441-27 or later
• Solaris 11.1 with SRU 3.4 or later
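
To confirm whether a system already includes the fix, the installed kernel patch revision (Solaris 10) or SRU level (Solaris 11.x) can be checked as 'root'. These are standard commands, not taken from the original alert:

    Solaris 10 (SPARC):
    # showrev -p | grep 147440

    Solaris 10 (x86):
    # showrev -p | grep 147441

    Solaris 11.x:
    # pkg info entire

On Solaris 11.x, the version string reported for the 'entire' package reflects the installed SRU.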
