Tuesday, June 10, 2014

To resolve fmd errors

After any hardware replacement, or even just a reboot of a server, we need to check and clear the fmd errors on the system.

To check the fmd errors:

# fmdump -v
# fmadm faulty
# fmadm faulty -r
# fmadm faulty -a

To rotate the fmd logs, use the script below (run it at least two times).

#GZ (global zone):
fmdump

#Clean up old entries (repair every fault UUID listed by fmdump, then rotate the logs):
for i in `/usr/sbin/fmdump | awk 'NR>1 {print $4}'`
do
 /usr/sbin/fmadm repair $i
done
sleep 2
/usr/sbin/fmadm rotate fltlog
/usr/sbin/fmadm rotate errlog

If the errors still exist in # fmadm faulty -a, then we need to clear the fmd cache using the steps below.

Clear ereports and resource cache:
# cd /var/fm/fmd
# rm e* f* c*/eft/* r*/*

Clearing out FMA files with no reboot needed:
svcadm disable -s svc:/system/fmd:default
cd /var/fm/fmd
find /var/fm/fmd -type f -exec ls {} \;
find /var/fm/fmd -type f -exec rm {} \;
svcadm enable svc:/system/fmd:default
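
Confirm fmd is back online and that the fault list is now clean:

# svcs svc:/system/fmd:default
# fmadm faulty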

Then monitor the system for a few hours (or one day); if the errors come back, we need to raise an Oracle SR to get it fixed.

How to Force a Crash Dump When the Solaris Operating System is Hung

First of all, you need to drop the system to the OK prompt. On older models with a Sun keyboard you can press STOP+A; on newer models, or over a terminal/console connection, send a break key sequence, for example:
————-
~.
#.
#~
~#

If your console is a terminal, you can type:

"shift-break"    or
"ctrl-break"    or
"ctrl-\" (ctrl-backslash), or
"<enter>" followed by "~" and "ctrl-break"    on Solaris SPARC.

To send a <BREAK> from HyperTerminal, use <Ctrl>-<Pause> or <Alt>-<Pause> (Ctrl-Break).
————

Once you are able to drop the system to the OK prompt, you will see the following prompt:

Type 'go' to resume
ok



All you need to do is type 'sync' (without the quotes) and press Enter. The system will immediately panic.

Now the hang condition has been converted into a panic, so an image of memory can be collected for later analysis.

The system will attempt to reboot after the dump is complete.
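
To verify where the dump will be written, and to confirm that savecore saved it after the reboot, something like the following can be used (the /var/crash path is the usual default and may differ on your system):

# dumpadm
# ls -l /var/crash/`hostname`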

Avoid filling zpools beyond 80% of their capacity

Keep pool space under 80% utilization to maintain pool performance. Currently, pool performance can degrade when a pool is very full and file systems are updated frequently, such as on a busy mail server. Full pools might cause a performance penalty, but no other issues. If the primary workload is immutable files (write once, never remove), then you can keep a pool in the 95-96% utilization range. Keep in mind that even with mostly static content in the 95-96% range, write, read, and resilvering performance might suffer.

•    Issues specific to a zpool that is more than 80% full:
o    If the zpool is fragmented and has little free space available, it will take longer and require more CPU cycles in the kernel to find a suitable block of free space for each write. This results in lower write performance when the zpool has less than 20% free space. This issue is addressed by the following document:

SunAlert: ZFS(zfs(1M)) filesystem(5) Performance May Drop Significantly if the ZFS Pool Becomes Full (Doc ID 1019947.1)

Bug 15418573 / 6596237 - Stop looking and start ganging

Even with the fix, write performance still degrades as the pool becomes more fragmented and runs with less than 20% free. There is another bug open to address the shortcoming.

Bug 15702274 / 7026795: Stop looking and start ganging, really
The fix is still being worked by ZFS sustaining.
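
A quick way to keep an eye on pool utilization is zpool list; the CAP column shows the percentage of each pool that is in use:

# zpool list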

Important when increasing mirrored stripe/concat volumes

The attribute mirror=target specifies that volumes should be mirrored between identical target IDs on different controllers.

root@# vxdisk -e list | egrep 'emc2_07ce|emc2_07cf|apevmx14_0880|apevmx14_0881'
apevmx14_0880 auto:sliced    emc1_1ffc_ol1_1dbdg_m2  ol1_1dbdg   online thinrclm      c15t5000097408472964d111s2 tprclm
apevmx14_0881 auto:sliced    emc1_200c_ol1_1dbdg_m2  ol1_1dbdg   online thinrclm      c15t5000097408472964d112s2 tprclm

emc2_07ce    auto:sliced    emc1_1ffc_ol1_1dbdg_m  ol1_1dbdg   online thinrclm      c15t500009740847255Cd105s2 tprclm
emc2_07cf    auto:sliced    emc1_200c_ol1_1dbdg_m  ol1_1dbdg   online thinrclm      c15t500009740847255Cd106s2 tprclm


mirror=enclr enclr:enc1 enclr:enc2

The disks in one data plex are all taken from enclosure enc1 and the disks in the other data plex are all taken from enclosure enc2.
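
As a rough sketch of how these attributes are used when growing or re-mirroring such a volume (the disk group, volume, and enclosure names below are only examples):

# vxassist -g ol1_1dbdg mirror vol01 mirror=target
# vxassist -g ol1_1dbdg growby vol01 10g mirror=enclr enclr:emc1 enclr:emc2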

================================

root@# vxdisk -e list | egrep 'emc7_0326|emc7_034a|emc6_0327|emc6_034b'
emc6_034b    auto:cdsdisk   dbs10c1_us3db5dg  us3db5dbdg  online               c4t5006048452A8CCD2d218s2 std
emc6_0327    auto:cdsdisk   dbs09c1_us3db5dg  us3db5dbdg  online               c4t5006048452A8CCD2d214s2 std

emc7_034a    auto:cdsdisk   dbs10c0_us3db5dgm  us3db5dbdg  online               c4t5006048452A9CD92d212s2 std
emc7_0326    auto:cdsdisk   dbs09c0_us3db5dgm  us3db5dbdg  online               c4t5006048452A9CD92d208s2 std

LiveUpgrade issue - Solution

root@# time lucreate -c s10u9 -m /:/dev/md/dsk/d210:ufs,mirror -m /:/dev/md/dsk/d12:detach,attach,preserve -m /var:/dev/md/dsk/d230:ufs,mirror -m /var:/dev/md/dsk/d32:detach,attach,preserve -m /opt:/dev/md/dsk/d250:ufs,mirror -m /opt:/dev/md/dsk/d52:detach,attach,preserve -n s10u11 -C /dev/dsk/c3t0d0s0
Determining types of file systems supported
Validating file system requests
The device name </dev/md/dsk/d210> expands to device path </dev/md/dsk/d210>
The device name </dev/md/dsk/d230> expands to device path </dev/md/dsk/d230>
The device name </dev/md/dsk/d250> expands to device path </dev/md/dsk/d250>
Preparing logical storage devices
Preparing physical storage devices
Configuring physical storage devices
Configuring logical storage devices
Analyzing system configuration.
No name for current boot environment.
Current boot environment is named <s10u9>.
Creating initial configuration for primary boot environment <s10u9>.
INFORMATION: No BEs are configured on this system.
The device </dev/dsk/c3t0d0s0> is not a root device for any boot environment; cannot get BE ID.
PBE configuration successful: PBE name <s10u9> PBE Boot Device </dev/dsk/c3t0d0s0>.
Updating boot environment description database on all BEs.
Updating system configuration files.
The device </dev/dsk/c0t0d0s0> is not a root device for any boot environment; cannot get BE ID.
Creating configuration for boot environment <s10u11>.
Source boot environment is <s10u9>.
Creating file systems on boot environment <s10u11>.
Preserving <ufs> file system for </> on </dev/md/dsk/d210>.
Preserving <ufs> file system for </opt> on </dev/md/dsk/d250>.
Preserving <ufs> file system for </var> on </dev/md/dsk/d230>.
Mounting file systems for boot environment <s10u11>.
ERROR: mount: The state of /dev/md/dsk/d210 is not okay
        and it was attempted to be mounted read/write
mount: Please run fsck and try again
ERROR: cannot mount mount point </.alt.tmp.b-fng.mnt> device </dev/md/dsk/d210>
ERROR: failed to mount file system </dev/md/dsk/d210> on </.alt.tmp.b-fng.mnt>
ERROR: cannot mount boot environment by icf file </etc/lu/ICF.2>
WARNING: Unable to mount BE <s10u11>.
Removing incomplete BE <s10u11>.
ERROR: Cannot make file systems for boot environment <s10u11>.

real    3m1.695s
user    0m14.315s
sys     0m24.271s

root@# fsck /dev/md/rdsk/d210
** /dev/md/rdsk/d210
** Last Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
127965 files, 4754892 used, 1441355 free (6331 frags, 179378 blocks, 0.1% fragmentation)

NOTE: After this you will have to re-run lucreate. However, to do so, you will first have to remove the newly created metadevices, sync the sub-mirrors back to the original metadevices, and wait until they have all resynced before you split them again by re-executing the same lucreate as above... Time consuming? Yes, it certainly is! So use the trick below instead!
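
For reference, the tedious path would look roughly like this (a sketch only; d10/d30/d50 are assumed to be the original mirrors from which d12/d32/d52 were split):

# metaclear d210                        (remove the one-way mirror lucreate built for /; submirror d12 remains defined)
# metattach d10 d12                     (re-attach d12 to the original root mirror)
# metaclear d230 ; metattach d30 d32    (same for /var)
# metaclear d250 ; metattach d50 d52    (same for /opt)
# metastat | grep -i resync             (wait until all resyncs complete before splitting again)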

root@# time lucreate -c s10u9 -m /:/dev/md/dsk/d210:ufs -m /var:/dev/md/dsk/d230:ufs -m /opt:/dev/md/dsk/d250:ufs  -n s10u11 -C /dev/dsk/c3t0d0s0
Determining types of file systems supported
Validating file system requests
Preparing logical storage devices
Preparing physical storage devices
Configuring physical storage devices
Configuring logical storage devices
Analyzing system configuration.
Updating boot environment description database on all BEs.
Updating system configuration files.
The device </dev/dsk/c0t0d0s0> is not a root device for any boot environment; cannot get BE ID.
Creating configuration for boot environment <s10u11>.
Source boot environment is <s10u9>.
Creating file systems on boot environment <s10u11>.
Creating <ufs> file system for </> in zone <global> on </dev/md/dsk/d210>.
Creating <ufs> file system for </opt> in zone <global> on </dev/md/dsk/d250>.
Creating <ufs> file system for </var> in zone <global> on </dev/md/dsk/d230>.
Mounting file systems for boot environment <s10u11>.
Calculating required sizes of file systems for boot environment <s10u11>.
Populating file systems on boot environment <s10u11>.
Analyzing zones.
Mounting ABE <s10u11>.
Cloning mountpoint directories.
Generating file list.
Copying data from PBE <s10u9> to ABE <s10u11>.
100% of filenames transferred
Finalizing ABE.
Fixing zonepaths in ABE.
Unmounting ABE <s10u11>.
Fixing properties on ZFS datasets in ABE.
Reverting state of zones in PBE <s10u9>.
Making boot environment <s10u11> bootable.
Setting root slice to Solaris Volume Manager metadevice </dev/md/dsk/d210>.
Population of boot environment <s10u11> successful.
Creation of boot environment <s10u11> successful.

HTH!!

_________________________________________________________________

# luupgrade -n sol10U11 -u -s /mnt -k /tmp/sysidcfg

67352 blocks
miniroot filesystem is <lofs>
Mounting miniroot at </mnt/Solaris_10/Tools/Boot>
ERROR: Unable to mount boot environment <>.

Solution for the above error -

Live Upgrade uses /a as a temporary mount point and directory for some of its actions.  Accordingly, it needs to be an empty directory, or ideally shouldn't exist at all.

To resolve this issue, move your current /a out of the way on both the original and the new boot environment as follows:

# mv /a /a.orig
# lumount <alt_BE> /mnt
# mv /mnt/a /mnt/a.orig
# luumount <alt_BE>

If you have no need for the contents of /a, you can safely delete the file or directory instead of renaming it.
Once you have removed or renamed /a, you should find luupgrade will now operate as expected.

# luupgrade -n s10u11 -u -s /mnt -k /tmp/sysidcfg

[...]

Upgrading Solaris: 100% completed
Installation of the packages from this media is complete.
Updating package information on boot environment <s10u11>.
Package information successfully updated on boot environment <s10u11>.
Adding operating system patches to the BE <s10u11>.
The operating system patch installation is complete.

[...]

The Solaris upgrade of the boot environment <s10u11> is partially complete.
Installing failsafe
Failsafe install is complete.
___________________________________________________________________________

root@# lucreate -n s10u11 -p rpool
Analyzing system configuration.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <s10u11>.
Source boot environment is <s10u7>.
Creating file systems on boot environment <s10u11>.
Populating file systems on boot environment <s10u11>.
Analyzing zones.
Duplicating ZFS datasets from PBE to ABE.
Creating snapshot for <rpool/ROOT/s10u7> on <rpool/ROOT/s10u7@s10u11>.
Creating clone for <rpool/ROOT/s10u7@s10u11> on <rpool/ROOT/s10u11>.
Creating snapshot for <rpool/ROOT/s10u7/var> on <rpool/ROOT/s10u7/var@s10u11>.
Creating clone for <rpool/ROOT/s10u7/var@s10u11> on <rpool/ROOT/s10u11/var>.
Mounting ABE <s10u11>.
ERROR: error retrieving mountpoint source for dataset < >
ERROR: failed to mount file system < > on </.alt.tmp.b-.nb.mnt/opt>
ERROR: unmounting partially mounted boot environment file systems
ERROR: cannot mount boot environment by icf file </etc/lu/ICF.2>
ERROR: Failed to mount ABE.
Reverting state of zones in PBE <s10u7>.
ERROR: Unable to copy file systems from boot environment <s10u7> to BE <s10u11>.
ERROR: Unable to populate file systems on boot environment <s10u11>.
Removing incomplete BE <s10u11>.
ERROR: Cannot make file systems for boot environment <s10u11>.

Problem - Having /opt as a separate ZFS filesystem outside the boot environment's rpool/ROOT hierarchy prevents Live Upgrade from functioning properly.

root@:/root# zfs list -r rpool
NAME                   USED  AVAIL  REFER  MOUNTPOINT
rpool                 93.3G  40.6G   100K  /rpool
rpool/ROOT            14.3G  40.6G    21K  legacy
rpool/ROOT/s10u7      14.3G  40.6G  2.29G  /
rpool/ROOT/s10u7/var  11.3G  41.3G  11.3G  /var
rpool/dump            32.0G  40.6G  32.0G  -
rpool/export_home       24K  20.0G    24K  /rpool/export/home
rpool/homevol          218K  20.0G   218K  /homevol
rpool/opt             13.0G  2.04G  7.74G  /opt
rpool/opt/oracle      5.22G  2.04G  5.22G  /opt/oracle
rpool/swap              32G  72.6G    16K  -
rpool/var_log         23.3M  30.0G  23.3M  /var/log

root@:/root# init 0

{9} ok boot -F failsafe
Resetting...

# mv opt opt_save                            (working from /a, where the root BE is mounted in failsafe mode)
# zfs set mountpoint=/opt_save rpool/opt
# zfs create rpool/ROOT/s10u7/opt
# mv * ../opt/                               (run from inside /a/opt_save to move the data into the new dataset)
# zfs set quota=15g rpool/ROOT/s10u7/opt

# df -kh /a/opt
Filesystem             size   used  avail capacity  Mounted on
rpool/ROOT/s10u7/opt    15G   7.7G   7.3G    52%    /a/opt

# zfs set mountpoint=none rpool/opt

# zfs set mountpoint=/opt/oracle rpool/opt/oracle

# zfs list -r
NAME                   USED  AVAIL  REFER  MOUNTPOINT
rpool                 93.3G  40.6G   100K  /a/rpool
rpool/ROOT            22.0G  40.6G    21K  legacy
rpool/ROOT/s10u7      22.0G  40.6G  2.29G  /a
rpool/ROOT/s10u7/opt  7.74G  7.26G  7.74G  /a/opt
rpool/ROOT/s10u7/var  11.3G  41.3G  11.3G  /a/var
rpool/dump            32.0G  40.6G  32.0G  -
rpool/export_home       24K  20.0G    24K  /a/rpool/export/home
rpool/homevol          218K  20.0G   218K  /a/homevol
rpool/opt             5.22G  9.78G    21K  none
rpool/opt/oracle      5.22G  2.78G  5.22G  /a/opt/oracle
rpool/swap              32G  72.6G    16K  -
rpool/var_log         23.3M  30.0G  23.3M  /a/var/log

# zfs set canmount=off rpool/opt

# init 6

After reboot.

root@:/root# svcs -vx;uptime;df -kh | grep rpool
  1:50am  up 3 min(s),  1 user,  load average: 1.17, 0.64, 0.27
rpool/ROOT/s10u7       134G   2.3G    41G     6%    /
rpool/ROOT/s10u7/var   134G    11G    41G    22%    /var
rpool/ROOT/s10u7/opt    15G   7.7G   7.3G    52%    /opt
rpool/homevol           20G   217K    20G     1%    /homevol
rpool/opt/oracle       8.0G   5.2G   2.8G    66%    /opt/oracle
rpool                  134G    99K    41G     1%    /rpool
rpool/export_home       20G    24K    20G     1%    /rpool/export/home
rpool/var_log           30G    23M    30G     1%    /var/log

Now let's try and run lucreate

root@:/root# lucreate -n s10u11 -p rpool
Analyzing system configuration.
ERROR: All datasets within a BE must have the canmount value set to noauto.
/usr/lib/lu/ludefine: cannot return when not in function
ERROR: All datasets within a BE must have the canmount value set to noauto.
/usr/lib/lu/ludefine: cannot return when not in function
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <s10u11>.
Source boot environment is <s10u7>.
Creating file systems on boot environment <s10u11>.
ERROR: All datasets within a BE must have the canmount value set to noauto.
/usr/lib/lu/ludefine: cannot return when not in function
Populating file systems on boot environment <s10u11>.
ERROR: All datasets within a BE must have the canmount value set to noauto.
/usr/lib/lu/ludefine: cannot return when not in function
Analyzing zones.
Duplicating ZFS datasets from PBE to ABE.
Creating snapshot for <rpool/ROOT/s10u7> on <rpool/ROOT/s10u7@s10u11>.
Creating clone for <rpool/ROOT/s10u7@s10u11> on <rpool/ROOT/s10u11>.
Creating snapshot for <rpool/ROOT/s10u7/opt> on <rpool/ROOT/s10u7/opt@s10u11>.
Creating clone for <rpool/ROOT/s10u7/opt@s10u11> on <rpool/ROOT/s10u11/opt>.
Creating snapshot for <rpool/ROOT/s10u7/var> on <rpool/ROOT/s10u7/var@s10u11>.
Creating clone for <rpool/ROOT/s10u7/var@s10u11> on <rpool/ROOT/s10u11/var>.
Mounting ABE <s10u11>.
Generating file list.
Finalizing ABE.
Fixing zonepaths in ABE.
Unmounting ABE <s10u11>.
Fixing properties on ZFS datasets in ABE.
Reverting state of zones in PBE <s10u7>.
Making boot environment <s10u11> bootable.
Population of boot environment <s10u11> successful.
Creation of boot environment <s10u11> successful.


It's successful. :)

root@:/root# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
s10u7                      yes      yes    yes       no     -
s10u11                     yes      no     no        yes    -

e1000g_LSO_BUG_6909685_description

Large Send Offload
Large Send Offload (LSO) is a hardware off-loading technology. LSO off-loads TCP segmentation to the NIC hardware to improve network performance by reducing the workload on the CPUs. LSO is helpful for 10Gb network adoption on systems with slow CPU threads or a lack of CPU resources. This feature integrates a basic LSO framework into the Solaris TCP/IP stack, so that any LSO-capable NIC can be enabled with LSO capability.
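
To check whether LSO is currently enabled in the IP stack, the same ndd parameter used in the workaround below can be read (a value of 0 means LSO is disabled for outbound TCP):

# ndd -get /dev/ip ip_lso_outbound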

Bug ID: 6909685
Synopsis: TCP/LSO should imply HW cksum
State: 11-Closed: Will Not Fix (Closed)
Category:Subcategory: kernel:tcp-ip
Keywords: BOP
Responsible Engineer: Jonathan Anderson
Reported Against: s10u9_02, s10u8_fcs, solaris_10u8
Related Bugs: 6838180, 6855964, 6908844
Submit Date: 11-December-2009
Last Update Date: 5-March-2010
Description
As described in 6908844, the customer disabled HW cksum in the IP stack via an /etc/system modification:

set ip:dohwcksum=0

As a result, the LSO packets are dropped by the NIC driver, because the driver believes they are invalid. When sending TCP over LSO, the IP stack must guarantee that hwcksum
is set to either a) or b):

a) HCK_IPV4_HDRCKSUM + HCK_PARTIALCKSUM
b) HCK_FULLCKSUM

Depending on hw capabilities announced by NIC driver.

In 6908844 the root cause of the TX stall is that the NIC driver gets a TCP LSO packet from the IP stack without the hwcksum flag. That implies the checksum fields are already set, which is not correct. A large LSO packet is sent by the ethernet controller as multiple frames, each of size <= MTU. Each frame has unique checksum values, and they need to be calculated by hw:

- IP checksum is different in each frame
  (as IP ident field is sequentially increased in each frame)
- TCP checksum is different in each frame
  (as each frame might contain different payload, although
  the TCP pseudo header is same for each frame)

When hwcksum is disabled in the NIC driver (e.g. in the .conf file),
the NIC driver also disables LSO automatically. Perhaps a similar check should
be done in the IP stack.
The stack should not do such a check. If the user modifies dohwcksum to 0, they should understand the impact on the system. If the driver drops the packets, that is a problem with the driver; an LSO packet that includes a software-calculated checksum is valid.
Work Around
If hardware checksum needs to be disabled:

If the device itself supports this (e.g. e1000g) then do it there. This will
implicitly disable LSO.

If the IP variable dohwcksum has to be used then explicitly disable LSO at the
same time with:

ndd -set /dev/ip ip_lso_outbound 0
Comments
N/A


It is also possible to disable TCP offloading in the e1000g driver directly:

/kernel/drv/e1000g.conf:

tx_hcksum_enable=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
lso_enable=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;

Then run update_drv, unplumb/plumb the interface, or reboot.
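
A minimal sequence to make the new e1000g.conf values take effect without a full reboot might look like this (the interface instance and addressing are examples; your plumbing details will differ):

# update_drv -f e1000g                         (force the driver configuration to be re-read)
# ifconfig e1000g2 unplumb
# ifconfig e1000g2 plumb
# ifconfig e1000g2 inet <ip-address> netmask <netmask> up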

Slow NFS due to FIXEDMTU after LiveUpgrade

Over the last few days we were facing a performance issue with an NFS share after a live upgrade. Later we noticed it was due to the FIXEDMTU flag on the interface.
FIXEDMTU was not caused by Live Upgrade itself; it was due to the mtu parameter in the hostname.<interface> file, which took effect on reboot.


There is no reason to set an MTU value of 1500; the default MTU for Ethernet is already 1500.

example:
./hostname.e1000g3
auksvaqs-prod mtu 1500

./hostname.e1000g2
auksvaqs-nas mtu 1500
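
The fix is simply to drop the mtu keyword from these files, leaving only the hostname, so the interfaces come up without the FIXEDMTU flag on the next reboot:

./hostname.e1000g3
auksvaqs-prod

./hostname.e1000g2
auksvaqs-nas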


Please also make sure LSO stays disabled across reboots.

A side note regarding LSO, which can cause performance degradation over the network:
Large Send Offload and Network Performance
One issue that I continually see reported by customers is slow network performance.  Although there are literally a ton of issues that can affect how fast data moves to and from a server, there is one fix I’ve found that will resolve this 99% of time — disable Large Send Offload on the Ethernet adapter.
So what is Large Send Offload (also known as Large Segmentation Offload, and LSO for short)?  It’s a feature on modern Ethernet adapters that allows the TCP/IP network stack to build a large TCP message of up to 64KB in length before sending it to the Ethernet adapter.  Then the hardware on the Ethernet adapter — what I’ll call the LSO engine — segments it into smaller data packets (known as “frames” in Ethernet terminology) that can be sent over the wire. This is up to 1500 bytes for standard Ethernet frames and up to 9000 bytes for jumbo Ethernet frames.  In return, this frees up the server CPU from having to handle segmenting large TCP messages into smaller packets that will fit inside the supported frame size.  Which means better overall server performance.  Sounds like a good deal.  What could possibly go wrong?
Quite a lot, as it turns out.  In order for this to work, the other network devices — the Ethernet switches through which all traffic flows — all have to agree on the frame size.  The server cannot send frames that are larger than the Maximum Transmission Unit (MTU) supported by the switches.  And this is where everything can, and often does, fall apart.
The server can discover the MTU by asking the switch for the frame size, but there is no way for the server to pass this along to the Ethernet adapter.  The LSO engine doesn’t have ability to use a dynamic frame size.  It simply uses the default standard value of 1500 bytes, or if jumbo frames are enabled, the size of the jumbo frame configured for the adapter.  (Because the maximum size of a jumbo frame can vary between different switches, most adapters allow you to set or select a value.)  So what happens if the LSO engine sends a frame larger than the switch supports?  The switch silently drops the frame.  And this is where a performance enhancement feature becomes a performance degradation nightmare.
1.    With LSO enabled, the TCP/IP network stack on the server builds a large TCP message.
2.    The server sends the large TCP message to the Ethernet adapter to be segmented by its LSO engine for the network.  Because the LSO engine cannot discover the MTU supported by the switch, it uses a standard default value.
3.    The LSO engine sends each of the frame segments that make up the large TCP message to the switch.
4.    The switch receives the frame segments, but because LSO sent frames larger than the MTU, they are silently discarded.
5.    On the server that is waiting to receive the TCP message, the timeout clock reaches zero when no data is received, and it sends back a request to retransmit the data.  Although the timeout is very short in human terms, it is rather long in computer terms.
6.    The sending server receives the retransmission request and rebuilds the TCP message.  But because this is a retransmission request, the server does not send the TCP message to the Ethernet adapter to be segmented.  Instead, it handles the segmentation process itself.  This appears to be designed to overcome failures caused by the offloading hardware on the adapter.
7.    The switch receives the retransmission frames from the server, which are the proper size because the server is able to discover the MTU, and forwards them on to the router.
8.    The other server finally receives the TCP message intact.
This can basically be summed up as: offload data, segment data, discard data, wait for timeout, request retransmission, segment retransmission data, resend data.  The big delay is waiting for the timeout clock on the receiving server to reach zero.  And the whole process is repeated the very next time a large TCP message is sent.  So is it any wonder that this can cause severe network performance issues?  This is by no means an issue that affects only Peer 1.  Google is littered with articles by major vendors of both hardware and software telling their customers to turn off Large Send Offload.  Nor is it specific to one operating system.  It affects both Linux and Windows.
____________________________________________________________________





The issue (slow copy over NFS) was solved.

I set the following in /kernel/drv/e1000g.conf...

tx_hcksum_enable=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
    # this parameter disables hardware checksum generation
lso_enable=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
    # this parameter disables the LSO feature in the driver

...and rebooted the server.


With this, the LSO feature is completely disabled in the e1000g driver.

We had already disabled this feature via an ndd parameter, but obviously this was not enough and the driver still affected the transfer!

Upgrade zone attach/detach

Since the Solaris 10 5/08 Operating System (Solaris 10 5/08 s10s_u5wos_10 SPARC ==> 10_u5), system administrators have had the ability to detach and attach zones, that is, detach a zone from one system and attach it to another. Some restrictions applied to the initial functionality: the source system and the destination system where the non-global zone was being attached had to be at the same software level in terms of package versions, patch levels, and architecture. In other words, you could not move a zone from a sun4v system to a sun4u system, or from a prior Solaris release to the current Solaris update release; with later updates this is now possible.

In the Solaris 10 10/08 release (Solaris 10 10/08 s10s_u6wos_07b SPARC ==> 10_u6), new functionality was provided by way of the "update on attach" command, the -u argument to zoneadm attach.

Upgrade zones -

Shut down the zones -

root:XXXXXX:/ # zlogin XXXXXX-z1 shutdown -y -g0 -i 0
root:XXXXXX:/ # zlogin XXXXXX-z2 shutdown -y -g0 -i 0

root:XXXXXX:/ # zoneadm list -icv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   - XXXXXX-z1      installed  /zones/XXXXXX-z1             native   shared
   - XXXXXX-z2      installed  /zones/XXXXXX-z2             native   shared


Detach the zones -

root:XXXXXX:/ # zoneadm -z XXXXXX-z1 detach
root:XXXXXX:/ # zoneadm -z XXXXXX-z2 detach

Attach the zones -

root:XXXXXX:/ # zoneadm -z XXXXXX-z1 attach -U
root:XXXXXX:/ # zoneadm -z XXXXXX-z2 attach -U


BEFORE:

root:XXXXXX-z1:$PWD # head -1 /etc/release
                   Oracle Solaris 10 9/10 s10s_u9wos_14a SPARC

AFTER:

root:XXXXXX-z1:$PWD # head -1 /etc/release
                   Oracle Solaris 10 1/13 s10s_u11wos_24a SPARC

vxlist command in vxvm version 5.1 and above


Example:

root@:/etc/vx/bin# ./vxlist
VxVM DCLI vxlist ERROR V-50-49971-158 Authentication or communication could not be established with the server.

vxdclid is not setup.
Run /opt/VRTSsfmh/adm/dclisetup.sh as root.

If you get the above error, run the dclisetup.sh script as shown below:
root@depuc4ia:/etc/vx/bin# /opt/VRTSsfmh/adm/dclisetup.sh
root@depuc4ia:/etc/vx/bin#

root@:/etc/vx/bin# ./vxlist disk
TY   DEVICE          DISK            DISKGROUP        SIZE    FREE STATUS
disk apevmx09_01b9   -               -                   -       - notsetup
disk apevmx09_01d6   apevmx09_01d6   unica_dg      119.96g   4.96g imported
disk apevmx09_01d7   -               -              59.97g       - free
disk apevmx09_01d8   -               -              17.97g       - free
disk apevmx09_01d9   -               -              17.97g       - free


root@:/root# cd /etc/vx/bin
root@:/etc/vx/bin# ./vxlist disk
TY   DEVICE          DISK            DISKGROUP           SIZE     FREE STATUS
disk apevmx09_058a   -               vxfencoorddg     987.87m        - deported
disk apevmx09_058b   -               vxfencoorddg     987.87m        - deported
disk apevmx09_0582   -               pkgbksae_dg       59.97g        - foreign
disk apevmx09_0585   apevmx09_0585   pkgbksaf_dg       59.97g   10.95g imported
disk apevmx09_0589   -               vxfencoorddg     987.87m        - deported
disk apevmx10_024c   appl1_01        appl1_dg         119.96g   60.95g imported
disk apevmx10_0516   -               -                      -        - notsetup
disk apevmx10_0517   -               -                      -        - notsetup

VxVM Bible - Quick Ref

Section 1. Foundation Suite Concepts
1.01 Identify the physical objects used by VERITAS Volume Manager to store data
            VTOC & partitions
1.02 Describe the process by which a physical disk is brought under Volume Manager control
            Removes partition table entries from VTOC (except slice 2)
            Rewrites VTOC and creates private (disk header, configuration database & kernel logs) & public regions
            Default size of private region is 2048 blocks (sectors) & maximum size is 524288 blocks (sectors)
            With 512b blocks, default size is 1048576b (1MB) & maximum size is 268435456b (256MB)
1.03 Identify the virtual objects created by Volume Manager to manage data storage
            disk groups, VM disks, subdisks, plexes and volumes
1.04 Identify common virtual storage layout types
            concat, striped, mirrored, RAID-5, layered
Section 2. Installing Foundation Suite
2.01 Identify VxVM and VxFS installation prerequisites
            license key requirements:
            customer number from License Key Request form
            order number from License Key Request form
            host id  # hostid
            machine type  # uname -i
            # vxlicense -c  Add license key
            # vxlicense -p  View license key
2.02 Identify guidelines for planning initial setup of VxVM
VxVM 3.5 Packages
VRTSvxvm
VRTSvlic
VRTSvmdoc
            VRTSvmman
2.03 Describe how to set up VxVM by using the vxinstall program
            # /cdrom/cdname/installer
            # vxinstall
Section 3. Managing Storage with Volume Manager
3.01 Describe the three VxVM user interfaces
            cli, vxdiskadm, VMSA
3.02 Explain how to access the VxVM CLI commands
            /etc/vx/bin, /usr/sbin, /usr/lib/vxvm/bin
            vxassist           Creates and manages volumes in a single step
            vxprint             Lists information from the VxVM configuration records
            vxdg                 Operates on disk groups—creates new disk groups and administers existing disk groups
            vxdisk              Administers disks under VxVM control—defines special disk devices, initializes information, etc.
3.03 Explain how to access the vxdiskadm main menu
            # vxdiskadm
3.04 Determine the steps to start the VMSA server and client
            # vmsa &
3.05 Describe how to customize VMSA security
            Users need appropriate privileges and access can be restricted to specific users
3.06 Identify the steps to run VMSA in read-only mode
            Menu bar / Options / Set VMSA to read-only mode
3.07 Describe the two device-naming schemes available in VxVM
            Standard device naming and Enclosure-based naming
3.08 Explain how to add a disk to a VxVM disk group
            Configure a disk for VxVM control:
# vxdisksetup -i devicetag [attributes: config, noconfig, privlen=length, privoffset=offset, publen=length, puboffset=offset]
            Add initialized disks to a disk group:
# vxdg -g diskgroup adddisk diskname=devicetag
3.09 Choose the effective method to view disk information, given various methods
            # vxdisk list                             List disk information
            # vxdisk list diskname             List disk information for specific disk
# vxdisk -o alldgs list              List disk information for all disk groups
# vxdisk -s list                         Summarize for all disks
# prtvtoc /dev/rdsk/…               Displays VTOC configuration ; Tag 14=public region, Tag 15=private region
# vxdg -g diskgroup free         Show free space in diskgroup
3.10 Explain how to remove a disk from a disk group
            # vxevac -g diskgroup  from_disk  to_disk
# vxdg [-g diskgroup]  rmdisk  diskname
            If no diskgroup is specified, default is rootdg; Returns disk to the free disk pool
            # vxdiskunsetup [-C]  devicetag
            -C to clear host locks on device
            Deconfigures a disk by returning it to an uninitialized state
•      To exclude specific disks from VxVM control, create /etc/vx/disks.exclude
•      To exclude all disks on a specific controller, create /etc/vx/cntrls.exclude
•      To exclude all disks in specific enclosure, create /etc/vx/enclr.exclude
3.11 Explain the method, or methods, for moving a disk from one disk group to another
           # vxdg -g diskgroup rmdisk diskname
            # vxdg -g diskgroup adddisk diskname=devicetag
Renaming a disk:
# vxedit -g diskgroup rename old_diskname new_diskname
3.12 Identify the purposes of disk groups
            enable logical grouping, move host to host, high availability
3.13 Explain how to create, deport, import, rename and destroy a disk group
•     Create a disk group using initialized disk:
            # vxdg init diskgroup diskname=devicename
•     Create a spare disk for a disk group:
            # vxedit -g diskgroup set spare=on | off disk_media_name
•     Deport a disk group:
           # vxdg deport diskgroup
•     Deport and rename:
           # vxdg -n new_name deport old_name
•     Deport to a new host:
           # vxdg -h hostname deport diskgroup
•     Import a disk group:
•     # vxdg import diskgroup
•     # vxvol start
•     Rename:
            # vxdg -n new_name deport old_name
            # vxdg import new_name
            # vxvol start
or
# vxdg deport old_name
# vxdg -n new_name import old_name
            # vxvol start
•     Import and temporarily rename:
            # vxdg -t -n temp_new_name import real_name
            # vxvol start
•     Import, clear locks and temporarily rename:
            # vxdg -tC -n temp_new_name import real_name
            # vxvol start
•     Force an import:
            # vxdg -f import diskgroup
•     Move a disk group:
            # vxdg -h hostname deport diskgroup
            # vxdg import diskgroup
            # vxvol start
•     Destroy a disk group:
            # vxdg destroy diskgroup
3.14 View information about a disk group
            # vxdg list diskgroup
3.15 Describe how to upgrade the disk group version
            # vxdg [-T version] upgrade diskgroup    (if no version is specified, default is the latest version)
            Specify version when creating disk group:
            # vxdg -T 50 init diskgroup diskname=devicetag
3.16 Identify the features of volume layouts supported by VxVM
•      Disk spanning—concatenation, striping
•      Data redundancy—mirroring, parity
•      Resilience—layered volume
•      RAID

•      RAID-0—simple concatenation or striping
•      RAID-1—mirroring
•      RAID-5—stripe with parity
•      RAID-0+1—adding a mirror to concat or striped layout; striping + mirroring = mirror-stripe layout; concatting + mirroring = mirror-concat layout; mirroring occurs ABOVE the concat or striping
•      RAID-1+0—adding a mirror BELOW concat or striped layout; mirrors each column of stripe or chunk of concat; called the layered volume

Layout types:
•      Concatenated
•      Striped
•      Mirrored
•      RAID-5
•      Layered volumes—stripe-mirror (Striped Pro) and concatenated-mirror (Concatenated Pro)
3.17 Explain how to create a volume
            # vxassist [-g diskgroup] [-b] make volume_name length[m | k | g] [attributes]
            Attributes:
            layout=nostripe
            layout=stripe
            layout=mirror-stripe
            layout=mirror-concat
ncol=n
            nstripe=n or stripes=n
            stripeunit=size (default is 64k)
            layout=raid5,nolog (default is to create log)  (default stripe unit size is 16k)
            layout=mirror
            nmirror=n
            layout=stripe,mirror  (striped volume that is mirrored)
            logtype=drl | drlseq
            nlog=n
            [disk names…]
•      /dev/vx/dsk/diskgroup/volume = block device
•      /dev/vx/rdsk/diskgroup/volume = character (raw) device
•      To determine largest possible size for the volume to be created:
            # vxassist -g diskgroup maxsize attributes…
•         To determine how much a volume can expand:
            # vxassist -g diskgroup maxgrow volume
3.18 Describe how to display volume layout information
            # vxprint -g diskgroup  [options]
            options:
            -vpsd   Select volume, plex, subdisk, or disk
            -h         List hierarchies below selected records
            -r          Display related records
            -t          Print single-line output
            -l          Display long listing
            -a         Display all information
            -A         Select from all active disk groups
            -e pattern   Show matching records
3.19 Remove a volume from VxVM
            # vxassist [-g diskgroup] remove volume vol_name
            or
            # vxedit [-g diskgroup] -rf rm vol_name
3.20 Explain how to add a mirror to, and remove a mirror from, an existing volume
•      Adding a mirror [to specific disk]:
      # vxassist -g diskgroup mirror vol_name [disk_name]
•      Mirroring all volumes:
      # vxmirror -g diskgroup -a
•      Creating | removing mirror volumes by default:
      # vxmirror -d yes | no
•      Creating a non-mirrored volume when mirroring is the default:
       # vxassist [-g diskgroup] make vol_name length nmirror=1
•      Removing a mirror:
      # vxassist [-g diskgroup] remove mirror volume [!]dm_name        
      or
      # vxplex [-g diskgroup] dis plex_name      
      # vxplex [-g diskgroup] -rf rm plex_name
      or
      # vxplex -g diskgroup -o rm dis plex_name           
3.21 Identify the steps to add a log to an existing volume
            # vxassist -g diskgroup  addlog  vol_name  [logtype=drl]  [nlog=n]  [disks_for_log]
•      Removing a log:
            # vxassist -g diskgroup remove log [nlog=n] vol_name
3.22 Explain how to change the volume read policy for a mirrored volume
            3 read policies:
            - round robin
            - preferred
            - select (default)
            # vxvol -g diskgroup rdpol round | prefer | select  vol_name [preferred_plex]
3.23 Describe how to allocate storage for a volume
[adding a file system to a volume]
            # mkfs -F fstype /dev/vx/rdsk/diskgroup/volume
            # mkdir /mount_point
            # mount -F fstype /dev/vx/dsk/diskgroup/volume /mount_point
            Specifying storage attributes:
            # vxassist [-g diskgroup]  make vol_name  length  [layout=layout]  storage_attributes…
            Layouts:
            layout=nostripe
            layout=stripe
            ncol=n
            nstripe=n or stripes=n
            stripeunit=size (default is 64k)
            layout=raid5,nolog (default is to create log)  (default stripe unit size is 16k)
            layout=mirror
            layout=mirror-stripe  (striped below, mirrored above)
nmirror=n
            layout=stripe,mirror  (striped volume that is mirrored)
            logtype=drl | drlseq
            nlog=n
maxsize
Storage attributes:
            disk_name
            ctlr:controller_name
            enclr:enclosure_name
            target:target_name
            c#tray#  (for trays)
            ! (to exclude)
            mirror=ctlr
            mirror=enclr
            mirror=target
            (mirror=disk default)

            Specifying ordered allocation of storage for volumes:
            Storage is allocated in the following order:
1.    VxVM concatenates the disk
2.    VxVM forms columns
3.    VxVM forms mirrors
            # vxassist [-g diskgroup] [-o ordered]  make vol_name length [layout=layout] storage_attributes…
            -o ordered options:
            col_switch=size1,size2…
            logdisk=disk  (for RAID-5 volumes unless nolog or noraid5log is specified)

            Regular (nonlayered) mirrored volume is mirror-stripe layout (striped first, then mirrored)
Layered volume is stripe-mirror layout
3.24 List the benefits of layered volumes
            Regular mirrored volume:  if 2 drives fail, volume survives 2 out of 6 (1/3) times
            Layered volume:  if 2 drives fail, volume survives 4 out of 6 (2/3) times
            Improved redundancy
            Faster recovery times
Cons:   Requires more VxVM objects
            Fills up disk group configuration database sooner
3.25 Identify the types of mirrored and enhanced mirrored (layered) volume layouts available in VxVM
            Nonlayered (recommended for less than 1GB of space):
•      mirror-concat—top level volume contains more than one plex, and plexes are concatenated
•      mirror-stripe—top level volume contains more than one plex, and plexes are striped

Layered (not recommended for root or swap volumes):
•      concat-mirror—top level volume is concatenated plex, and subdisks are mirrored (Concatenated Pro) (requires at least 2 disks)
•      stripe-mirror—top level volumes is striped plex, and subdisks are mirrored (Striped Pro) (requires at least 4 disks)
3.26 Explain how to create a layered volume
            # vxassist [-g diskgroup]  make vol_name length layout=concat-mirror [attributes…]
            # vxassist [-g diskgroup]  make vol_name length layout=stripe-mirror [attributes…]
3.27 Identify how to control the default behavior of VxVM when creating mirrored volume layouts
            trigger point default = 1GB
            Default mirroring behavior:
            Striped layouts:
•      mirror-stripe—trigger points ignored
•      stripe,mirror—trigger points applied
•      stripe-mirror—mirroring at column or subdisk level and trigger point attributes applied
            Concatenated layouts:
•      mirror—trigger points applied
•      mirror-concat—trigger points ignored
•      concat-mirror—mirroring at subdisk level and trigger point attributes applied
3.28 View information about a layered volume
# vxprint -rth vol_name
            -r  display subvolume configuration for layered volume
            -t  prints single-line output records
            -h hierarchical
3.29 Describe how to resize a volume
            vxresize—automatically resizes both the volume and the file system  (resize everything at once)
            vxassist—resizes the volume but not the file system  (assist me with volume, then the file system)

            # vxassist -g diskgroup growto | growby | shrinkto | shrinkby vol_name size

            # vxresize [-bsx] -F fstype  -g diskgroup  vol_name  [+ | -]new_length
            -b  background
            -s  make sure to shrink volume length
            -x  make sure to expand volume length
3.30 Determine when and how to change the volume layout
            vxassist relayout—changes the layout of an existing volume online (see "Changing volume layouts" under Other stuff to know below)
            vxrelayout—displays the status of, reverses, or restarts a relayout operation
3.31 Describe how to manage volume maintenance tasks
            vxtask—monitor tasks, modify task states (pause, continue, and abort), modify rate of progress

            # vxtask [-ahlpr]  list  [task_id | task_tag]
            -a  show aborted tasks
            -h  hierarchical
            -l  long format
            -p  show paused tasks
            -r  show running tasks

            # vxtask [-c count]  [-ln]  [-t time]  [-w interval]  monitor  [task_id | task_tag]
            -c  show this number of tasks
-l   long format
            -n  show newly registered tasks
-t  exit after time in seconds
            -p  show paused tasks
            -w  show waiting after interval seconds with no activity

            # vxtask  abort | pause | resume   [task_id | task_tag]

            # vxtask [-i task_id]  set  slow=value

            Can display status of, control progress rate, reverse, or start a relayout operation:
            # vxrelayout -g diskgroup  status | reverse | start  vol_name
            # vxrelayout  -o slow=iodelay | iosize=size  [task_id | task_tag]

Other stuff to know:
            Create volume snapshot mirror to be backed up:
            # vxassist -g diskgroup  -b snapstart  vol_name

            To ensure the snapshot mirror is synchronized before detaching:
            # vxassist -g diskgroup  snapwait  orig_vol_name
           
            Create snapshot volume (detaches snapshot mirror, creates new volume and attaches mirror to new volume):
            # vxassist -g diskgroup  snapshot  orig_volume  new_volume

            To remove snapshot mirror that has not been detached and moved to new volume (not needed):
            # vxassist -g diskgroup  snapabort  orig_vol_name

            To remove the snapshot new_volume:
            # vxassist -g diskgroup  remove volume  new_volume

            Reassociate snapshot volume with original volume:
            # vxassist -g diskgroup  snapback  new_volume

            To use new_volume to merge back to orig_volume:
            # vxassist -g diskgroup  -o resyncfromreplica  snapback  new_volume

            To disassociate the snapshot from its original volume:
            # vxassist -g diskgroup  snapclear  new_volume
            Changing volume layouts:
            # vxassist -g diskgroup  relayout  vol_name | plex_name  layout=layout  ncol=+-n  stripeunit=size
[tmpsize=size]
Temporary (scratch) storage space defaults:
orig volume is less than 50MB, temp storage=volume size
orig volume is 50MB-1GB, temp storage=50MB
orig volume 1GB+, temp storage=1GB
Changing resilience level of volume (converting nonlayered to layered or vice versa):
            # vxassist -g diskgroup  convert   vol_name | plex_name  layout=layout
mirror-stripe    to         stripe-mirror
stripe-mirror    to         mirror-stripe
mirror-concat   to         concat-mirror
concat-mirror   to         mirror-concat

Section 4. Managing File Systems
4.01 Identify how to create a file system using the mkfs command
            # mkfs -F vxfs  [-o specific_options]  /dev/vx/rdsk/datadg/datavol  [size]
4.02 Explain how to set file system properties using mkfs command options
            -o  options:
N                      doesn’t really create it
            largefiles         supports 2+GB files  (default is nolargefiles)
            version=          specify layout version  (4 and 5 are valid; 5 is default)
            bsize=              sets logical block size (default is 1024b/1K)
            logsize=           number of blocks allocated for logging area (intent log) (default log size is 16384)
4.03 Select the best method to mount a file system using the mount command
            # mount -F vxfs  [-r] [-o specific_options]  /dev/vx/dsk/datadg/datavol  /mydata
            -r   read-only
            mount -v   status of mounted file systems and options, verbose
            mount -p   display mounted file systems in  /etc/vfstab format
            mount -a   mount all file systems listed in /etc/vfstab
4.04 Explain how to maintain file system consistency by using the fsck command
            # fsck -F vxfs  [-m]  [-y|-n]  [-o full,nolog] [-o p]  /dev/vx/rdsk/datadg/datavol
            -m        check but don’t repair
            -o full   perform full system check
            -o nolog            do not replay the log
            -o p      perform parallel log replay (versus checking devices sequentially)
4.05 Identify how to resize a file system
            Before resizing a file system, check available free space of underlying device:
            prtvtoc or format to check size of disk partitions for file systems mounted on partitions
            vxprint to check size of VxVM volumes
            vxdg to check available free space on any disk within disk group
•      Resize VxVM file system:
            # fsadm -F vxfs [-b newsize]  [-r raw_device]  /mount_point

            newsize specified by sectors/512b
            raw_dev specifies path of raw device if not listed in /etc/vfstab and fsadm cannot determine raw device

            # vxdg -g datadg free
            # vxassist -g datadg growto datavol 1024000  (expand the volume)
            # fsadm -F vxfs -b 1024000 -r /dev/vx/rdsk/datadg/datavol  /datavol  (expand the file system)
            # df -k /datavol

            # fsadm -F vxfs -b 512000 -r /dev/vx/rdsk/datadg/datavol  /datavol  (shrink the file system)
            # vxassist -g datadg shrinkto datavol 512000  (shrink the volume)
4.06 Describe how to create and manage a snapshot file system
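            A snapshot file system of a mounted VxFS is created by mounting a second device with the snapof option; a minimal sketch (device, size, and mount point names are examples):
            # mount -F vxfs -o snapof=/datavol,snapsize=131072 /dev/vx/dsk/datadg/snapvol /snapmount
            # umount /snapmount      (the snapshot is discarded when it is unmounted)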

4.07 List features of the four VxFS file system layout options
            Version 1 Layout—VxFS 1.x—intent logging, extent allocation, unlimited inodes
            Version 2 Layout—VxFS 2.x—added dynamic inode allocation, ACLs, quotas
            Version 4 Layout—VxFS 3.2.x—added large file support, ability for extents to span allocation units
            Version 5 Layout—VxFS 3.5 +—added support for file system sizes up to 32TB
4.08 Define two types of fragmentation
            Directory fragmentation & Extent fragmentation—extent fragmentation most critical
            Fragmented file system has one or more of the following characteristics:
•      More than 50% of free space used by small extents of less than 64 blocks in length
•      A large number of small extents that are free, usually greater than 5% of free space in extents of less than 8 blocks in length
•      Less than 5% of the total file system size available is in large extents (i.e., lengths of 64 or more blocks)
4.09 Explain how to run fragmentation reports by using the fsadm command
            # fsadm -D /mount_point                    monitor Directory fragmentation
            # fsadm -E [-l largesize] /mount_point            monitor Extent fragmentation (largesize default value is 64 blocks)
            # df -F vxfs -o s                                   prints the number of free extents of each size
4.10 Define how to defragment a file system by using the fsadm command
            # fsadm [-d] [-D] [-e] [-E] [-s] [-v] [-l largesize] [-a days] [-t time] [-p passes] [-r rawdev]  /mount_point
            options:
                  -d                    reorganizes directories: entries are reordered to place subdirectory entries first, then other entries in decreasing order of time of last access; the directory is also compacted to remove free space
                  -e                    reorganizes extents, to have the minimum number of extents
                  -D                    reports on directory fragmentation; directory reorgs complete before extent reorgs
                  -E                    reports on extent fragmentation
                  fragmentation reports are produced both before and after the reorganization (for -d, -e, -D, -E combinations)
                  -s                     summary of activity
                  -v                     verbose
                  -l largesize    extent reorg tries to group large files into large extents of 64 (default) blocks
                  -a days           treat files not accessed within the specified number of days as "aged" files (default is 14 days)
                  -t time             maximum length of time to run in seconds; fsadm exits when condition met
                  -p passes      number of passes (default is 5 passes); fsadm exits when condition met
                  -r rawdev       specify raw device
Section 5. Foundation Suite Architecture
5.01 Describe the role of the intent log in a VERITAS file system
The intent log records pending changes to the file system structure and writes the log records to disk in advance of the changes to the file system.  Once the intent log has been written, the other updates to the fs can be written in any order.  In the event of a system failure, the VxFS fsck utility replays the intent log to recover a fs without completing a full structural check of the entire fs.  The fs can be recovered and mounted only seconds after a failure.  Default intent log size is 16384 blocks.
5.02 Identify guidelines for selecting an intent log size to maximize file system performance
            Larger log sizes may improve performance for intensive synchronous writes, but may increase recovery time, memory requirements, and log maintenance time.  Log size should never be more than 50% of the physical memory size of the system.
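            For example, a larger intent log can be requested when the file system is created (the logsize value, in blocks, is only illustrative):
            # mkfs -F vxfs -o largefiles,logsize=32768 /dev/vx/rdsk/datadg/datavol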
5.03 Explain how to control logging behavior by using mount command options
            # mount -F vxfs  [generic_options]  [-o specific_options]  /dev/vx/dsk/diskgroup/volume  /mount_point
            -o options:
                log                  guarantees that all structural changes to the file system have been logged on disk when the system call returns; if a system failure occurs, fsck replays recent changes so that they are not lost.
                delaylog             the default option, which does not need to be specified. Some system calls return before the intent log is written; the logging delay improves system performance, but some changes are not guaranteed until shortly after the system call returns, when the intent log is written. If a system failure occurs, recent changes may be lost.
                tmplog               intent logging is almost always delayed. This option greatly improves performance, but recent changes disappear if the system crashes. This mode is only recommended for temporary file systems.
                nodatainlog          use on systems with disks that do not support bad block revectoring.
                blkclear             used in increased-security environments; guarantees that all storage is initialized before being allocated to files. Increased integrity is provided by clearing extents on disk when they are allocated within a file. About 10% slower than a standard mode VxFS file system.
                logiosize=size       performance of storage devices that use read-modify-write features improves if writes are performed in a particular size or a multiple thereof. Specify a size in bytes of 512, 1024, 2048, 4096, or 8192.
            INTEGRITY  <-  -o blkclear | -o log | -o delaylog | -o tmplog  ->  PERFORMANCE     (-o nodatainlog)
5.04 Interpret VxVM configuration database information, given examples
            configuration database quotas:
            By default, for each disk group, VxVM maintains a minimum of 5 active database copies on the same controller.  If different controllers are represented among the disks in the same disk group, VxVM maintains a minimum of 2 active copies per controller.
            To list detailed information from the configuration database about specific disks:
            # vxdisk -g diskgroup list  disk_name
5.05 Describe how to control the VxVM configuration daemon
            vxconfigd modes:
            Enabled—normal operating state
            Disabled—most operations not allowed
            Booted—part of normal system startup while acquiring rootdg
            Use vxdctl to control vxconfigd
•     To determine whether the configuration daemon is enabled / display the status:
            # vxdctl mode
•     To enable vxconfigd
            # vxdctl enable
            This command forces the configuration daemon to read all the disk drives in the system and to set up its tables to reflect each known drive.  When a drive fails and the admin fixes the drive, this command enables VxVM to recognize the drive.
•     To disable | stop  vxconfigd
            # vxdctl disable | stop
            or #vxdctl -k stop to send a kill -9 to vxconfigd
•     To start vxconfigd:
            # vxconfigd
            Once started, vxconfigd automatically becomes a background process; by default vxconfigd issues errors to the console, but it can be configured to issue errors to a log file.
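A typical sequence for restarting the configuration daemon (a sketch only; run it in a quiet maintenance window) looks like this:
            # vxdctl mode                        (confirm the current state: enabled, disabled, or booted)
            # vxdctl -k stop                     (send a kill -9 to the running vxconfigd)
            # vxconfigd                          (restart; it backgrounds itself)
            # vxdctl enable                      (rescan all drives and rebuild the daemon's tables)
            # vxdctl mode                        (verify that it is enabled again)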
Other stuff to know:
vxconfigd    when a system is booted, the command vxdctl enable is automatically executed to start the VxVM configuration daemon, vxconfigd.  VxVM reads the /etc/vx/volboot file to determine disk ownership and imports rootdg and the other disk groups owned by the host.  vxconfigd reads the kernel log to determine the state of VxVM objects, then reads the configuration database on the disks, and then uses the kernel log to update the state information of VxVM objects.
/etc/vx/volboot      contains a host ID that is used by VxVM to establish ownership of physical disks, and a list of disks to scan for the rootdg disk group
•       To view contents of volboot:
                # vxdctl list
•       To change the host ID in volboot (for example, after changing the UNIX host name) and on all disks in disk groups currently imported on the machine:
                # vxdctl hostid host_name
                # vxdctl enable
•       To recreate the volboot file if removed or invalidated:
              # vxdctl init [host_name]
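Putting these together (the new host name is hypothetical), after renaming a server you might run:
            # vxdctl hostid newhostname          (writes the new host ID into volboot and onto the imported disks)
            # vxdctl enable                      (re-reads the configuration under the new ownership)
            # vxdctl list                        (verify the hostid line in the volboot contents)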
3 types of VxVM disks:
simple disk—created dynamically in the kernel and has public and private regions contiguous inside a single partition
sliced disk—has separate slices for the public and private regions
NOPRIV disk—does not have a private region

Section 6. Introduction to Recovery
6.01 Describe how VxVM maintains data consistency after a system crash
Resynchronization—after a system crash, VxVM must ensure that all mirrors in mirrored volumes contain exactly the same data, and that data and parity in RAID-5 volumes agree.  VxVM records when a volume is first written to and marks it as dirty; when all the writes have completed, Volume Manager removes the dirty flag for the volume.  Only volumes marked as dirty when the system reboots require resynchronization.
VxVM uses 2 types of resynchronization processes to maintain consistency of plexes in a volume:
Atomic-Copy resynchronization—sequential writing of all blocks of a volume to a plex
•      used in adding a new plex
•      reattaching a detached plex to a volume
•      online reconfiguration operations like moving or copying a plex, creating a snapshot, or moving a subdisk
Read-Writeback resynchronization—for volumes that were fully mirrored prior to a system failure, and there may be outstanding writes to the volume when system crashed
Plexes that were ACTIVE at crash are set to ACTIVE but volume is placed in SYNC or NEEDSYNC state
Read thread started for entire volume and blocks written back to other plexes
When resync is complete, SYNC flag changed to ACTIVE
To minimize resynchronization impact on performance use:
•          Dirty region logging for mirrored volumes
•          RAID-5 logging for RAID-5 volumes
•          FastResync for mirrored and snapshot volumes
•          SmartSync Recovery Accelerator for volumes used by database applications (resilvering)
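As an illustration (the disk group, volume name, and size are hypothetical), a mirrored volume can be created with a dirty region log so that only recently written regions need resynchronizing after a crash:
            # vxassist -g datadg make vol01 2g layout=mirror nmirror=2 logtype=drl
            # vxprint -g datadg -ht vol01        (the volume should show a DRL log plex in addition to its two data plexes)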
6.02 Describe the hot-relocation process
vxrelocd     hot-relocation daemon; if redundancy failures are detected, vxrelocd automatically relocates affected data from mirrored or RAID-5 subdisks to spare disks/free space within disk group; notifies sys admin by email of relocation activity.
1.    vxrelocd detects disk failure
2.    administrator notified by email
3.    subdisks relocated to a spare
4.    volume recovery is attempted
Partial disk failure—redundant data on the failed portion of the disk is relocated and existing volumes on the unaffected portions of the disk remain accessible.  The disk is not removed from VxVM control and labeled FAILING rather than FAILED.  Before removing a failing disk for replacement, you must evacuate any remaining volumes on the disk.
Hot relocation is performed for redundant (mirrored or RAID-5) subdisks on a failed disk.  Nonredundant subdisks on a failed disk are not relocated, but sys admin is notified of the failure.
6.03 Identify the steps to manage spare disks
•     Create a spare disk for a disk group:
            # vxedit -g diskgroup set spare=on | off disk_media_name
•     Exclude a disk from hot relocation:
            # vxedit -g diskgroup set nohotuse=on | off disk_media_name
•     To force hot relocation to only use spare disks:
            Add spare=only to /etc/default/vxassist
•     To include spare disks in a space check:
            # vxassist -g diskgroup  -r  maxsize | maxgrow  layout=stripe
•     Reserve a disk for special purposes (a reserved disk is not used by vxassist unless it is specifically named on the vxassist make command line):
            # vxedit  set reserve=on | off  disk_name
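For example (disk group and disk media names are hypothetical; the exact flag display may vary by VxVM version), to dedicate one disk as a hot-relocation spare and keep another out of the hot-relocation pool:
            # vxedit -g datadg set spare=on datadg05             (datadg05 becomes a dedicated spare)
            # vxedit -g datadg set nohotuse=on datadg03          (free space on datadg03 will not be used for relocation)
            # vxdisk -g datadg list                              (check that the spare flag now shows for datadg05)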
6.04 Describe how to replace a failed disk
Temporary failure
            Status (vxdisk list):  disabled
            Solution:  turn power back on
            # devfsadm -C
            # vxdctl enable
            # vxreattach -r
Permanent failure
            Status (vxdisk list):  disabled
            Solution:  disk replacement
            # devfsadm -C
            # vxdctl enable
            # vxdisksetup -i c#t#d#
            # vxdg -g <dg> -k adddisk <disk>=c#t#d#            (-k keeps the original disk media name)
            # vxrecover
Intermittent failure
            Status (vxdisk list):  failing or io fail
            Solution:  disk replacement
            # vxevac <source> <destination>                    (copies data off and reassociates objects)
            # vxdg -g <dg> -k rmdisk <failing_disk>            (keeps the object name, in removed state)
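A concrete walk-through of the permanent-failure case (disk group, disk media name, and device name are hypothetical) might look like this:
            # devfsadm -C                                      (clean up stale device links after swapping the drive)
            # vxdctl enable                                    (make vxconfigd rescan and pick up the new device)
            # vxdisksetup -i c1t4d0                            (initialize the replacement disk for VxVM use)
            # vxdg -g datadg -k adddisk datadg02=c1t4d0        (re-attach it under the old disk media name)
            # vxrecover -g datadg                              (resynchronize the plexes that were using the failed disk)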
6.05 Return relocated subdisks back to their original disk, given commands
            # vxunreloc [-f]  [-g diskgroup] [-t tasktag] [-n diskname]  orig_diskname
            orig_diskname            disk where relocated subdisks originally resided
            -n diskname    unrelocates to a disk other than the original disk & specify new media name
            -f                      forces unrelocation if exact offsets are not possible

            To display all subdisks that were hot-relocated from a failed disk:
            # vxprint -g diskgroup -se 'sd_orig_dmname="disk_name"'
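For instance (disk group and disk names are hypothetical), after replacing failed disk datadg02 you could move its relocated subdisks back home:
            # vxprint -g datadg -se 'sd_orig_dmname="datadg02"'        (list the subdisks that were hot-relocated away from datadg02)
            # vxunreloc -g datadg datadg02                             (move them back to their original offsets)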
6.06 Describe how to recover a volume
•      To reattach disks to disk group (for temporarily failed disks):
# vxreattach [-bcr]  [disk_name]
-b         performs in background
-c         checks to determine if reattachment is possible; no operation is performed
-r          attempts to recover stale plexes by invoking vxrecover
•      To recover specific volumes or all volumes on a disk:
# vxrecover [-bnpsvV] [-g diskgroup]  [volume_name | disk_name]
-b         performs in background
-n         starts volumes but does not perform recovery
-p         displays list of startable volumes
-s         starts disabled volumes
-v         displays info about each task started by vxrecover
-V         even more info
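Putting the two commands together (the device and disk group names are hypothetical), after a transient outage of an array you might run:
            # vxreattach -c c2t0d0               (check whether the disk can simply be reattached; nothing is changed)
            # vxreattach -r c2t0d0               (reattach and recover stale plexes via vxrecover)
            # vxrecover -g datadg -p             (list any volumes that are startable but still need attention)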
6.07 Describe tasks used to protect the VxVM configuration
•      To save VxVM disk group configuration database:
# vxprint -g diskgroup -hmQqr > backup.diskgroup 
(saves definition of volumes, plexes, subdisks, and diskgroup itself)
# vxprint -g diskgroup -hmvpsQqr > backup.diskgroup
(saves definition of volumes, plexes, and subdisks only)
•      To display entire disk group definition with all its objects:
# vxprint -D - -rht < backup.diskgroup
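A small sketch (the backup directory and loop are illustrative, not part of the original procedure) that saves the configuration of every imported disk group:

for dg in `/usr/sbin/vxdg list | awk 'NR>1 {print $1}'`
do
 /usr/sbin/vxprint -g $dg -hmQqr > /var/tmp/backup.$dg
done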
Section 7. Troubleshooting Foundation Suite
7.01 Identify examples of state changes in VxVM objects caused by I/O failure
            If VxVM can still access the private region on disk:
•      Disk marked as FAILING
•      Plex with affected subdisk is set with IOFAIL condition flag
•      Hot relocation relocates the affected subdisk, if enabled and if there is available redundancy
            If VxVM cannot access private region on disk:
•      Failed disk is detached and marked as FAILED
•      Plexes using that disk are changed to NODEVICE state
•      Nonredundant volumes on disk are disabled
•      If hot relocation is enabled, it is performed for redundant volumes
7.02 Explain how to resolve disk failures by using VxVM commands
Temporary failure
            Status (vxdisk list):  disabled
            Solution:  turn power back on
            # devfsadm -C
            # vxdctl enable
            # vxreattach -r
            # vxrecover
Permanent failure
            Status (vxdisk list):  disabled
            Solution:  disk replacement
            # devfsadm -C
            # vxdctl enable
            # vxdisksetup -i c#t#d#
            # vxdg -g <dg> -k adddisk <disk>=c#t#d#            (-k keeps the original disk media name)
            # vxrecover
Intermittent failure
            Status (vxdisk list):  failing or io fail
            Solution:  disk replacement
            # vxevac <source> <destination>                    (copies data off and reassociates objects)
            # vxdg -g <dg> -k rmdisk <failing_disk>            (keeps the object name, in removed state)
7.03 Choose how to display state information for VxVM objects
            # vxprint -g diskgroup -ht
7.04 Interpret volume states, given appropriate information
Volume states:
EMPTY—volume contents have not yet been initialized
CLEAN—volume was stopped cleanly; plexes are consistent
ACTIVE—volume is started, or was in use when the system went down
SYNC—plexes are being resynchronized (read-writeback in progress)
NEEDSYNC—resynchronization is required, but the internal read thread has not been started
NODEVICE—bad drives (no underlying device)
7.05 Interpret kernel states, given appropriate information
Kernel states:
ENABLED—object can transfer both system I/O to private region and user I/O to public region
DETACHED—object can transfer system I/O but not user I/O (maintenance mode)
DISABLED—no I/O can be transferred (offline state for volume or plex)
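To see these states in practice (the disk group name is hypothetical), pull just the volume and plex records out of vxprint:
            # vxprint -g datadg -ht | egrep '^v |^pl'          (the KSTATE and STATE columns show, e.g., ENABLED/ACTIVE for a healthy started volume)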