A Network Data Management Protocol (NDMP) Issue With Solaris 11.1 and ZFSSA Releases 2011.1.4.0 and 2011.1.4.2 may Cause Truncation of Data During Tape Backup

Description
An issue with the Network Data Management Protocol (NDMP) in Solaris 11.1 and in IDRs for Sun ZFS Storage Appliance (ZFSSA) Software releases 2011.1.4.0 and 2011.1.4.2 may cause truncation of data on tape backups. Attempts to restore any data from such a backup will fail entirely.

Occurrence
This issue can occur in the following releases:

SPARC Platform:

•Solaris 11.1 without SRU 1.4
x86 Platform:

•Solaris 11.1 without SRU 1.4
Sun ZFS Storage Appliance Software 2011.1.4.0 with the following IDRs:

•2011.04.24.4.0,1-2.21.13.1
•2011.04.24.4.0,1-2.21.12.1
•2011.04.24.4.0,1-2.21.11.1
•2011.04.24.4.0,1-2.21.10.1
•2011.04.24.4.0,1-2.21.9.1
•2011.04.24.4.0,1-2.21.8.1
•2011.04.24.4.0,1-2.21.7.1
•2011.04.24.4.0,1-2.21.6.1
•2011.04.24.4.0,1-2.21.5.1
Sun ZFS Storage Appliance Software 2011.1.4.2 with the following IDRs:

•2011.04.24.4.2,1-2.28.11.1
•2011.04.24.4.2,1-2.28.10.1
•2011.04.24.4.2,1-2.28.9.1
•2011.04.24.4.2,1-2.28.8.1
•2011.04.24.4.2,1-2.28.7.1
•2011.04.24.4.2,1-2.28.6.1
•2011.04.24.4.2,1-2.28.5.1
•2011.04.24.4.2,1-2.28.4.1
•2011.04.24.4.2,1-2.28.3.1
•2011.04.24.4.2,1-2.28.2.1
•2011.04.24.4.2,1-2.28.1.1
•2011.04.24.4.2,1-2.28.6.4
•2011.04.24.4.2,1-2.28.6.3
•2011.04.24.4.2,1-2.28.6.2
•2011.04.24.4.2,1-2.28.5.2
•2011.04.24.4.2,1-2.28.1.2
Note 1: Solaris 8, 9, 10 and 11 are not impacted by this issue.

Note 2: ZFS Storage Appliance Software 2011.1.3.0 (or earlier) and 2011.1.5.0 (or later) are not impacted by this issue.

Note 3: All future ZFSSA IDRs are based upon 2011.1.5 where this issue is fixed. Therefore, no further IDRs will be affected by this issue.

To determine the software release on Appliance systems using the Browser User Interface (BUI), do the following:

1.Navigate to: Maintenance -> System
2.Click on the "i" next to the "Current System Software" entry in the table of available releases
A pop-up will show the release. For example: "2011.04.24.4.2,1-2.28.11.1". This is one of the affected IDRs listed above.
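
The comparison against the IDR lists above can be scripted. The following is a minimal sketch, assuming the release string is copied by hand from the BUI pop-up or from the CLI. The table and the function name `is_affected` are illustrative and not part of any Oracle tool. The IDR suffixes are transcribed from the lists in this alert.

```python
# Sketch: check a ZFSSA release string against the affected IDRs in this alert.
# AFFECTED_IDRS and is_affected() are hypothetical names used for illustration.

# Key: common prefix of the IDR version. Value: affected suffixes listed above.
AFFECTED_IDRS = {
    "2011.04.24.4.0,1-2.21": {
        "13.1", "12.1", "11.1", "10.1", "9.1", "8.1", "7.1", "6.1", "5.1",
    },
    "2011.04.24.4.2,1-2.28": {
        "11.1", "10.1", "9.1", "8.1", "7.1", "6.1", "5.1", "4.1", "3.1",
        "2.1", "1.1", "6.4", "6.3", "6.2", "5.2", "1.2",
    },
}


def is_affected(release: str) -> bool:
    """Return True if `release` is one of the affected IDRs listed above."""
    # Tolerate strings pasted from the CLI listing, which marks an entry
    # with "*" and prefixes the version with "ak-nas@".
    release = release.strip().lstrip("*").strip()
    if release.startswith("ak-nas@"):
        release = release[len("ak-nas@"):]
    for base, suffixes in AFFECTED_IDRS.items():
        prefix = base + "."
        if release.startswith(prefix) and release[len(prefix):] in suffixes:
            return True
    return False


if __name__ == "__main__":
    for candidate in ("2011.04.24.4.2,1-2.28.11.1",
                      "ak-nas@2011.04.24.5.0,1-2.33.12.1"):
        print(candidate, "affected" if is_affected(candidate) else "not listed")
```

A result of "not listed" only means the string does not match an IDR named in this alert. It does not replace checking the release against the notes above.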

To determine the software release on Appliance systems using the Command Line Interface (CLI), run the command below. The entry whose STATUS is "current" shows the running release:

   hostname:> maintenance system updates ls
   Updates:

   UPDATE                                   DATE                      STATUS
   ak-nas@2011.04.24.1.0,1-1.8              2011-12-21 22:32:50       previous
   ak-nas@2011.04.24.5.0,1-1.33             2012-12-13 21:28:50       previous
   * ak-nas@2011.04.24.5.0,1-2.33.12.1      2013-2-15 17:05:58        current

Symptoms
When an NDMP restore of an affected backup is attempted, it fails. Error messages appear in both the Data Management Application (DMA) and the NDMP "Data Service" logs. Error messages from the DMA (Oracle Secure Backup, OSB, in this case) will be as follows, where the DMA job ID is 171748:

<OSB snippet>
    # obtool catxcr -l 0 admin/171748|tail
    19:11:51 await_ndmp_event timed out waiting for a service to halt, flag is 4
    19:11:51 MNPO: mover halted, data service didn't (its state=active)
    19:13:51 await_ndmp_event timed out waiting for a service to halt, flag is 4
    19:13:51 MNPO: mover halted, data service didn't (its state=active)
    19:15:51 await_ndmp_event timed out waiting for a service to halt, flag is 4
    19:15:51 MNPO: mover halted, data service didn't (its state=active)
    19:17:51 await_ndmp_event timed out waiting for a service to halt, flag is 4
    19:17:51 MNPO: mover halted, data service didn't (its state=active)
    19:19:51 await_ndmp_event timed out waiting for a service to halt, flag is 4
    19:19:51 MNPO: mover halted, data service didn't (its state=active)
    19:21:51 await_ndmp_event timed out waiting for a service to halt, flag is 4
    19:21:51 MNPO: mover halted, data service didn't (its state=active)
    19:23:51 await_ndmp_event timed out waiting for a service to halt, flag is 4
    19:23:51 MNPO: mover halted, data service didn't (its state=active)
    19:25:51 await_ndmp_event timed out waiting for a service to halt, flag is 4
</OSB snippet>
NDMP log error messages from "Data Service" for the restore will be as follows:

<Restore>
    1/01 11:54:26 env(DATA_BLOCK_SIZE): "64"
    1/01 11:54:26 env(DIRECT): "n"
    1/01 11:54:26 env(EXTRACT_ACL): "y"
    1/01 11:54:26 env(FILESYSTEM): "/pool22a/local/tcfs4o_22/o007_fmw@AUTOHOTSNAP_vmohscfst008_TCFS4O_PROJ_20121214_2205"
    1/01 11:54:26 env(HIST): "y"
    1/01 11:54:26 env(LEVEL): "0"
    1/01 11:54:26 env(REPLICATE): "n"
    1/01 11:54:26 env(TYPE_OVERRIDE): "off"
    1/01 11:54:26 env(UPDATE): "y"
    1/01 11:54:26 env(ZFS_BACKUP_SIZE): "11783310336"
    1/01 11:54:26 Local operation: 64512
    1/01 11:54:26 ZFS_BACKUP_SIZE: 11783310336
    1/01 11:54:26 env(LEVEL): "0"
    1/01 11:54:26 env(ZFS_MODE) not specified, defaulting to recursive
    1/01 11:54:26 env(ZFS_FORCE) not specified, defaulting to FALSE
    1/01 11:54:26 env(UPDATE): "y"
    1/01 11:54:26 env(DMP_NAME) not specified, defaulting to 'level'
    1/01 11:54:26 restore path: pool22a/local/tcfs4o_22_3_31BI2N3/o007_fmw
    1/01 11:54:26 tape header: NDMPUTF8MAGIC; 0 0; 64512
    1/01 11:54:26 nz_zfs_force: 0
    1/01 12:17:28 *** E R R O R *** [ndmpd_zfs_abort:1299]:ndmpd_zfs_abort() called...aborting recover operation
    1/01 12:17:35 Restoring to "pool22a/local/tcfs4o_22_3_31BI2N3/o007_fmw" aborted.
    1/01 17:20:30
</Restore>
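
Both log signatures shown above can be searched for mechanically. The following is a minimal sketch, assuming the DMA transcript and the NDMP Data Service log have already been saved as text. The function name `find_symptoms` and the result keys are hypothetical. The two patterns are taken verbatim from the excerpts above. A match only indicates that a restore stalled or aborted in the same way; similar messages may appear for unrelated failures, so it is not proof of a truncated backup.

```python
import re

# Patterns copied from the log excerpts in this alert.
DMA_STALL = re.compile(r"mover halted, data service didn't \(its state=active\)")
DATA_SERVICE_ABORT = re.compile(
    r"ndmpd_zfs_abort\(\) called\.\.\.aborting recover operation"
)


def find_symptoms(log_text: str) -> dict:
    """Count lines matching each signature in the given log text."""
    counts = {"dma_stall": 0, "data_service_abort": 0}
    for line in log_text.splitlines():
        if DMA_STALL.search(line):
            counts["dma_stall"] += 1
        if DATA_SERVICE_ABORT.search(line):
            counts["data_service_abort"] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "19:11:51 await_ndmp_event timed out waiting for a service to halt, flag is 4",
        "19:11:51 MNPO: mover halted, data service didn't (its state=active)",
        "1/01 12:17:28 *** E R R O R *** [ndmpd_zfs_abort:1299]:"
        "ndmpd_zfs_abort() called...aborting recover operation",
    ])
    print(find_symptoms(sample))
```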
Workaround
There is no workaround for this issue. Please see the resolution below.

Resolution
This issue is addressed in the following releases:

SPARC Platform:

•Solaris 11.1 with SRU 1.4 or later
x86 Platform:

•Solaris 11.1 with SRU 1.4 or later
ZFS:

•Sun ZFS Storage Appliance Software 2011.1.5.0 or later.
Sun ZFS Storage Appliance software updates are available from My Oracle Support (MOS) by doing the following:

1.Select the "Patches & Updates" tab.
2.Search for "Sun ZFS Storage Appliance" by product family.
For a listing of ZFS Storage Appliance Software Releases and version information, please see:

•https://wikis.oracle.com/display/FishWorks/Software+Updates
Note: This fix does not make existing truncated backups restorable. Backups that were truncated by this issue are not recoverable, and their data is permanently lost.
