[OpenAFS] Windows client and firewall/antivirus/MTU conflict - possibility of
file corruption
Richard Brittain
Richard.Brittain@dartmouth.edu
Tue, 12 Jan 2010 12:40:31 -0500 (EST)
Hi all,
We have noticed file corruption in certain circumstances when using the
OpenAFS for Windows client in combination with antivirus products and
non-default network MTU settings.
Isolating exactly what factors _must_ be present has been a challenge, but
with tremendous help from Jeff Altman, we think we are close enough now to
report a general warning.
The combination of properties we have investigated most thoroughly is:
Windows XP (32-bit)
Symantec EndPoint Protection 11.0.4 firewall (Network Threat Protection)
OpenAFS 1.5.34 or later
Windows network interface with a non-default MTU set
Only the 'Network Threat Protection' component of Symantec EndPoint
Protection (SEP) affects OpenAFS. Disabling that component makes the
problem go away. It seems that SEP 11.0.5 (latest) does NOT cause
corruption, although we find terrible performance issues with that
version. It also seems to be necessary that the network MTU be explicitly
set. Several of our problematic systems turned out to have this set to
1300, although this parameter isn't exposed through the normal Windows
tools. It is a registry setting that third-party network tuning tools might
tweak, though. We don't know yet what range of MTU values causes a problem,
or whether just setting it at all is enough to cause the problem.
The relevant change in OpenAFS 1.5.34 is that, prior to that version,
the client internally capped packet size at 1260 to work around problems
with some VPNs, and that reduced packet size may have been shielding this
problem. The default behaviour since then is to use whatever Windows
reports for MTU.
Other observations:
- The corruption appears as blocks of changed bytes in the file, often a
multiple of 168 adjacent bytes, and shows up clearly if checksums are
performed, or the file format has inherent checksums, but might be subtle
and hard to detect if there is nothing to compare to. The corruption
occurs only on file writes to AFS, never reads.
- Small files (<10MB) were never affected. Files > 50MB had a high
probability of corruption.
- The corruption is similar in general characteristics, but different in
detail, each time we copy a file.
- Re-reading a file just written to AFS, and small enough to be entirely
in cache, gives a correct checksum on the Windows client, but a wrong one
from any other client, implying the data were changed between the cache
manager and the network.
- We also tried some variants of Windows (XP 32/64, Vista 32/64) running
under Parallels on a Mac. Parallels has its own 'security' tool which
implements antivirus/firewall functionality, although there is not much
documentation about exactly what it does. Some of the tests with Parallels
security turned on _also_ generated file corruption. The Windows guest
did not have the SEP firewall installed. The Parallels tests were all done
with 1.5.66. We haven't been able to do much testing of the Parallels VMs,
but the corruption was superficially similar to that produced by SEP --
isolated blocks of adjacent changed bytes scattered around the file. As
far as I can tell Parallels for Mac licenses 'Kaspersky' antivirus
software - it doesn't use Symantec behind the scenes.
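
For anyone wanting to check their own copies, here is a minimal Python
sketch of the kind of comparison described above: checksum the source file
and the copy written to AFS, and if they differ, list the runs of adjacent
changed bytes. This is not the tool we used. The function names are made up
for illustration, and it assumes both files can be read from the machine
running the script (ideally reading the AFS copy from a different client, so
the local cache is not involved).

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_runs(path_a, path_b, chunk_size=1 << 20):
    """Return a list of (offset, length) for each run of adjacent bytes
    that differ between the two files.

    Only the common prefix is compared if the files differ in length.
    """
    runs = []
    run_start = None  # offset where the current run of differences began
    offset = 0        # file offset of the start of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            n = min(len(a), len(b))
            if n == 0:
                break
            if a[:n] == b[:n]:
                # Identical chunk: any open run ended at the chunk boundary.
                if run_start is not None:
                    runs.append((run_start, offset - run_start))
                    run_start = None
            else:
                for i in range(n):
                    if a[i] != b[i]:
                        if run_start is None:
                            run_start = offset + i
                    elif run_start is not None:
                        runs.append((run_start, offset + i - run_start))
                        run_start = None
            offset += n
    if run_start is not None:
        runs.append((run_start, offset - run_start))
    return runs
```

If sha256_of() disagrees for the two paths, the lengths returned by
changed_runs() show whether the damage comes in blocks like the 168-byte
multiples we saw.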
Given that other software and registry settings on the client systems are
largely beyond our control, we'll probably be setting the RxMaxMTU
parameter to its old value of 1260 as a workaround for now.
The relevant registry keys to look for are:
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<adapter-id>\MTU
(not present by default)
and
HKLM\System\CurrentControlSet\Services\TransarcAFSDaemon\Parameters\RxMaxMTU
(default value is 0)
The MTU on the real network interface is the relevant one (not the
loopback adapter).
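
As a rough way to audit a client, the two values above can be read with
Python's standard winreg module. This is only a sketch: the function names
are hypothetical, the key paths are the ones listed above, and on a
non-Windows system (where winreg does not exist) the functions simply
return nothing.

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

# Key paths as given in this message, relative to HKEY_LOCAL_MACHINE.
TCPIP_INTERFACES = r"System\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"
AFS_PARAMETERS = r"System\CurrentControlSet\Services\TransarcAFSDaemon\Parameters"


def read_value(subkey, name):
    """Return the named value under HKLM\\subkey, or None if the key or
    value is absent (or if this is not a Windows system)."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except OSError:
        return None


def interface_mtus():
    """Return {adapter-id: MTU} for adapters with an explicitly set MTU."""
    result = {}
    if winreg is None:
        return result
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_INTERFACES) as key:
            subkey_count = winreg.QueryInfoKey(key)[0]
            adapters = [winreg.EnumKey(key, i) for i in range(subkey_count)]
    except OSError:
        return result
    for adapter in adapters:
        mtu = read_value(TCPIP_INTERFACES + "\\" + adapter, "MTU")
        if mtu is not None:
            result[adapter] = mtu
    return result
```

A non-empty result from interface_mtus() means some adapter has an explicit
MTU set, and read_value(AFS_PARAMETERS, "RxMaxMTU") shows what the AFS
client is configured to use (None if the value is not present).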
Richard
--
Richard Brittain, Research Computing Group,
Kiewit Computing Services, 6224 Baker/Berry Library
Dartmouth College, Hanover NH 03755
Richard.Brittain@dartmouth.edu 6-2085