[OpenAFS] Re: OS X 10.5 oddity

Joel hashbang@gmail.com
Wed, 8 Apr 2009 03:55:06 -0500


An update: this still occurs using 1.4.9rc1

Perhaps this is an OS issue, unless OpenAFS itself is responsible for
checksumming the packet.
Any ideas (or reports of similar experiences) would be appreciated.

Thanks

On Sun, Feb 22, 2009 at 3:01 AM, Joel <hashbang@gmail.com> wrote:

I'm at a loss as to where to turn with this problem. I've used
OpenAFS on OS X for years, but recently I've been having some strange
issues. At seemingly random points during AFS operations, my client
will hang, contact with the fileserver will eventually be lost, and
then it is restored soon after.

Going into detail...

I'm using the OpenAFS 1.4.8 OS X package on OS X 10.5.6. The
machine is a Mac Pro with the onboard Intel8254X NIC.

When the client hangs, dmesg repeats this until the "afs: Lost contact
with file server..." message:
in_delayed_cksum_offset: ip_len 51200 (200) doesn't match actual length 214

which means pretty much what it says: the length recorded in the
packet (ip_len) doesn't match the actual on-wire length. The recorded
length seems to always be 14 bytes shy, which would match the size of
an Ethernet header rather than a UDP header (a UDP header is only 8
bytes). This packet never makes it through, so I assume the client
times out waiting for it and triggers the "lost contact" state.
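
For anyone checking the numbers in that log line, here is a minimal
sketch of the arithmetic. It assumes the value in parentheses is the
byte-swapped form of ip_len (the numbers are consistent with that, but
I have not confirmed it against the kernel source); the helper names
are made up for illustration.

```python
import re

# The dmesg line quoted above, copied verbatim.
LOG = "in_delayed_cksum_offset: ip_len 51200 (200) doesn't match actual length 214"


def swap16(value):
    """Swap the two bytes of a 16-bit value (network <-> little-endian host)."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def parse(line):
    """Pull the three numbers out of the log line (hypothetical helper)."""
    m = re.search(r"ip_len (\d+) \((\d+)\).*actual length (\d+)", line)
    if m is None:
        raise ValueError("unrecognized log line")
    return tuple(int(g) for g in m.groups())


raw, shown, actual = parse(LOG)

# Assumption: 51200 (0xC800) is 200 (0x00C8) with its bytes swapped.
swapped_matches = swap16(raw) == shown

# Difference between the on-wire length and the recorded length.
shortfall = actual - shown

print(f"swap16({raw}) == {shown}: {swapped_matches}")
print(f"actual - ip_len = {shortfall} bytes")
```

The 14-byte difference equals the size of an Ethernet header (6 + 6
bytes of MAC addresses plus a 2-byte EtherType), which may be a hint
about where the extra bytes come from.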

Also, when I watch the traffic on both the client and fileserver, the
offending packet's checksum is wrong.
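
To check a captured packet by hand, the standard Internet checksum
(RFC 1071) can be recomputed as sketched below. IPv4 header, UDP, and
TCP checksums all use this one's-complement sum; UDP and TCP also
cover a pseudo-header, which this sketch leaves out. The sample bytes
are a generic IPv4 header used only for illustration, not data from
the capture described here.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Illustrative 20-byte IPv4 header with the checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)

# Write the checksum into the header; a valid header then sums to zero.
stamped = header[:10] + checksum.to_bytes(2, "big") + header[12:]
is_valid = internet_checksum(stamped) == 0

print(hex(checksum), is_valid)
```

Running the same function over a captured header and getting a
non-zero result would confirm that the checksum on the wire is bad.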

One more data point: looking back at two weeks of logs, the packet
sizes are always one of 206, 208, 209, 210, 212, 213, 214, or 218.

If anyone can provide insight into this, I would love to hear it. I'd
be willing to test or try anything, since this is an annoying problem.

Thanks, Joel