[OpenAFS] Windows client and firewall/antivirus/MTU conflict - possibility of file corruption

Richard Brittain Richard.Brittain@dartmouth.edu
Tue, 12 Jan 2010 12:40:31 -0500 (EST)

Hi all,
   we have noticed file corruption in certain circumstances when using the 
OpenAFS for Windows client in combination with antivirus products and 
non-default network MTU settings.

Isolating exactly what factors _must_ be present has been a challenge, but 
with tremendous help from Jeff Altman, we think we are close enough now to 
report a general warning.

The combination of properties we have investigated most thoroughly is:

  Windows XP (32-bit)
  Symantec EndPoint Protection 11.0.4 firewall (Network Threat Protection)
  OpenAFS 1.5.34 or later
  Windows network interface with a non-default MTU set

Only the 'Network Threat Protection' component of Symantec EndPoint 
Protection (SEP) affects OpenAFS.  Disabling that software makes the 
problem go away.  It seems that SEP 11.0.5 (latest) does NOT cause 
corruption, although we find terrible performance issues with that 
version.  It also seems to be necessary that the network MTU be explicitly 
set.  Several of our problematic systems turned out to have this set to 
1300, although this parameter isn't exposed through the normal Windows 
tools.  It is a registry setting that 3rd party network tuning tools might 
tweak, though.  We don't know yet what range of MTU values causes a problem, 
or whether just setting it at all is enough to cause the problem.

The relevant change with OpenAFS 1.5.34 is that, prior to that version, 
the client internally capped packet size at 1260 to work around problems 
with some VPNs, and that reduced packet size may have been shielding this 
problem.  The default behaviour since then is to use whatever Windows 
reports for MTU.

Other observations:

- The corruption appears as blocks of changed bytes in the file, often a
multiple of 168 adjacent bytes, and shows up clearly if checksums are
performed, or the file format has inherent checksums, but might be subtle
and hard to detect if there is nothing to compare to.  The corruption
occurs only on file writes to AFS, never reads.

- Small files (<10MB) were never affected.  Files > 50MB had a high 
probability of corruption.

- The corruption is similar in general characteristics, but different in 
detail, each time we copy a file.

- Re-reading a file just written to AFS, and small enough to be entirely 
in cache, gives a correct checksum on the Windows client, but a wrong one 
from any other client, implying the data were changed between the cache 
manager and the network.

- We also tried some variants of Windows (XP 32/64, Vista 32/64) running 
under Parallels on a Mac.  Parallels has its own 'security' tool which 
implements antivirus/firewall functionality, although there is not much 
documentation about exactly what it does.  Some of the tests with Parallels 
security turned on _also_ generated file corruption.  The Windows guest 
did not have the SEP firewall installed.  The Parallels tests were all done 
with 1.5.66.  We haven't been able to do much testing of the Parallels VMs, 
but the corruption was superficially similar to that produced by SEP -- 
isolated blocks of adjacent changed bytes scattered around the file.  As 
far as I can tell Parallels for Mac licenses 'Kaspersky' antivirus 
software - it doesn't use Symantec behind the scenes.
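
For anyone wanting to check copies for this kind of damage, here is a minimal 
Python sketch of the comparison described in the observations above: checksum 
the source file and the copy as read back from a different AFS client, and if 
they differ, list the runs of adjacent changed bytes.  The function names 
(sha256_of, diff_runs, compare_files) and the choice of SHA-256 are 
illustrative assumptions and are not taken from the testing reported here.

```python
import hashlib
from typing import List, Tuple


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diff_runs(original: bytes, copy: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) for each run of adjacent differing bytes.

    Only the common length is compared; a size mismatch between the two
    inputs should be checked separately.
    """
    runs = []
    start = None
    n = min(len(original), len(copy))
    for i in range(n):
        if original[i] != copy[i]:
            if start is None:
                start = i  # a new run of changed bytes begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, n - start))  # run extends to the end
    return runs


def compare_files(src: str, dst: str) -> List[Tuple[int, int]]:
    """Return the differing runs between two files, or [] if checksums match."""
    if sha256_of(src) == sha256_of(dst):
        return []
    # Reads both files fully into memory; fine for a one-off check.
    with open(src, "rb") as a, open(dst, "rb") as b:
        return diff_runs(a.read(), b.read())
```

Calling compare_files(local_path, afs_path) from a client other than the one 
that wrote the file avoids being misled by a correct copy still sitting in the 
writer's cache.  Run lengths that are multiples of 168 would match the 
pattern described above.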

Given that other software and registry settings on the client systems are 
largely beyond our control, we'll probably be setting the RxMaxMTU 
parameter to its old value of 1260 as a workaround for now.
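
To illustrate the effect of that workaround, the following Python sketch 
models how a cap such as RxMaxMTU limits the packet size relative to what 
Windows reports.  It is only a model of the behaviour described above, not 
OpenAFS code; the function name effective_mtu and the treatment of 0 as 
"no cap" are assumptions.

```python
def effective_mtu(reported_mtu: int, rx_max_mtu: int = 0) -> int:
    """Model the packet-size ceiling the client would use.

    Assumption: rx_max_mtu == 0 means "no cap, use what Windows reports";
    any other value is treated as an upper bound.
    """
    if rx_max_mtu <= 0:
        return reported_mtu
    return min(reported_mtu, rx_max_mtu)


# Interface MTU of 1300 (as on several affected systems), no cap.
print(effective_mtu(1300))        # 1300
# Same interface with the old cap of 1260 restored.
print(effective_mtu(1300, 1260))  # 1260
# Standard Ethernet MTU of 1500 with the cap.
print(effective_mtu(1500, 1260))  # 1260
```

With the cap in place, the packet size stays at or below 1260 regardless of 
what the interface MTU has been set to.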

The relevant registry keys to look for are:

(not present by default)


(default value is 0)

The MTU on the real network interface is the relevant one (not the 
loopback adapter).


Richard Brittain,  Research Computing Group,
                    Kiewit Computing Services, 6224 Baker/Berry Library
                    Dartmouth College, Hanover NH 03755
Richard.Brittain@dartmouth.edu 6-2085