Tuesday, June 10, 2014

e1000g_LSO_BUG_6909685_description

Large Send Offload
Large Send Offload (LSO) is a hardware offloading technology. LSO offloads TCP segmentation to the NIC hardware, which improves network performance by reducing the workload on the CPUs. LSO helps 10Gb network adoption on systems with slow CPU threads or limited CPU resources. This feature integrates a basic LSO framework into the Solaris TCP/IP stack, so that any LSO-capable NIC can have LSO enabled.

Bug ID 6909685
Synopsis TCP/LSO should imply HW cksum
State 11-Closed:Will Not Fix (Closed)
Category:Subcategory kernel:tcp-ip
Keywords BOP
Responsible Engineer Jonathan Anderson
Reported Against s10u9_02, s10u8_fcs, solaris_10u8
Duplicate Of
Introduced In
Commit to Fix
Fixed In
Release Fixed
Related Bugs 6838180, 6855964, 6908844
Submit Date 11-December-2009
Last Update Date 5-March-2010
Description
As described in 6908844, the customer disabled hardware checksum offload in the IP stack with this /etc/system setting:

set ip:dohwcksum=0

As a result, the NIC driver drops the LSO packets because it considers them
invalid. When sending TCP over LSO, the IP stack must guarantee that the hwcksum
flags are set to either a) or b):

a) HCK_IPV4_HDRCKSUM + HCK_PARTIALCKSUM
b) HCK_FULLCKSUM

Which one applies depends on the hardware capabilities announced by the NIC driver.
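
The a)/b) rule above can be sketched as a small check. This is only an illustration in Python, not the driver's or the stack's actual code: the numeric flag values and the helper name lso_cksum_flags_ok are assumptions made for the example, and the real definitions live in the Solaris kernel headers.

```python
# Illustrative flag values (assumed for this sketch; check the Solaris
# kernel headers for the real definitions).
HCK_IPV4_HDRCKSUM = 0x01
HCK_PARTIALCKSUM = 0x02
HCK_FULLCKSUM = 0x04


def lso_cksum_flags_ok(flags: int) -> bool:
    """Return True if the flags satisfy option a) or option b).

    a) HCK_IPV4_HDRCKSUM + HCK_PARTIALCKSUM (both bits set)
    b) HCK_FULLCKSUM
    """
    option_a = HCK_IPV4_HDRCKSUM | HCK_PARTIALCKSUM
    has_a = (flags & option_a) == option_a
    has_b = bool(flags & HCK_FULLCKSUM)
    return has_a or has_b


# With dohwcksum=0 the LSO packet arrives with no checksum flags at all,
# which is the case a driver following this rule would reject.
print(lso_cksum_flags_ok(0))
print(lso_cksum_flags_ok(HCK_IPV4_HDRCKSUM | HCK_PARTIALCKSUM))
print(lso_cksum_flags_ok(HCK_FULLCKSUM))
```

A packet carrying only HCK_PARTIALCKSUM, without the IPv4 header checksum flag, fails the check in this sketch, because option a) needs both bits.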

In 6908844 the root cause of the TX stall is that the NIC driver receives a TCP LSO packet from the IP stack without a hwcksum flag. The missing flag says that the checksum fields are already set, which is not correct. The Ethernet controller sends a large LSO packet as multiple frames, each no larger than the MTU. Each frame has its own checksum values, and they need to be calculated by the hardware:

- IP checksum is different in each frame
  (as IP ident field is sequentially increased in each frame)
- TCP checksum is different in each frame
  (as each frame might contain a different payload, although
  the TCP pseudo header is the same for each frame)
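
To see why a single precomputed IP checksum cannot be reused for every frame, here is a minimal Python sketch that builds two IPv4 headers differing only in the ident field and computes the standard Internet checksum for each. The addresses, TTL, and lengths are made-up example values, and the helper names are hypothetical.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Ones' complement sum of 16-bit words, as used for the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(ident: int, total_len: int) -> bytes:
    """20-byte IPv4 header with the checksum field left as zero."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                    # version 4, header length 5 words
        0,                       # TOS
        total_len,               # total length of this frame
        ident,                   # IP ident, increased for each frame
        0x4000,                  # flags: don't fragment
        64,                      # TTL (example value)
        6,                       # protocol: TCP
        0,                       # checksum placeholder
        bytes([192, 0, 2, 1]),   # example source address
        bytes([192, 0, 2, 2]),   # example destination address
    )


# Two consecutive frames cut from the same LSO packet.
c1 = internet_checksum(ipv4_header(1000, 1500))
c2 = internet_checksum(ipv4_header(1001, 1500))
print(hex(c1), hex(c2), c1 != c2)
```

The two checksums differ even though only the ident changed, so whoever does the segmentation (here the NIC) also has to produce the checksums.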

When hwcksum is disabled in the NIC driver (e.g. in the .conf file),
the NIC driver also disables LSO automatically. Perhaps a similar check should
be done in the IP stack.

The stack should not do such a check. If the user sets dohwcksum to 0, the user should understand the impact on the system. If the driver drops the packets, that is a problem in the driver: an LSO packet with a software-calculated checksum is valid.
Work Around
If hardware checksum needs to be disabled:

If the device itself supports this (e.g. e1000g) then do it there. This will
implicitly disable LSO.

If the IP variable dohwcksum has to be used then explicitly disable LSO at the
same time with:

ndd -set /dev/ip ip_lso_outbound 0
Comments
N/A


…it is also possible to disable TCP offloading in the e1000g driver directly:

/kernel/drv/e1000g.conf:

tx_hcksum_enable=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
lso_enable=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;

To apply the change: update_drv, then unplumb and plumb the interface, or reboot.
