[OpenAFS] Connection timed out?

Harald Barth haba@kth.se
Wed, 11 Mar 2009 18:20:42 +0100 (CET)


> During this test we encounter 'Permission denied' errors, which seem to
> coincide with 'kernel: afs: failed to store file (110)' entries in
> /var/log/messages. 110=Connection timed out. The fileserver is busy but
> responsive, about 25 builds (out of 50) complete normally.

I don't know if this is a coincidence or not. I have 1.4.8 clients
that do not behave against a 1.4.2 (yeah, I know...) server:

Mar 11 13:21:18 a03c11n14 kernel: afs: Waiting for busy volume 537086116 (prj.sbc.aronh.13) in cell pdc.kth.se
Mar 11 13:21:20 a03c11n14 kernel: afs: failed to store file (network problems)
Mar 11 13:23:33 a03c11n14 last message repeated 3 times
Mar 11 13:25:26 a03c11n14 last message repeated 4 times
Mar 11 13:27:23 a03c11n14 last message repeated 4 times
Mar 11 13:29:30 a03c11n14 last message repeated 4 times
Mar 11 13:31:37 a03c11n14 last message repeated 4 times
Mar 11 13:33:39 a03c11n14 last message repeated 4 times
Mar 11 13:35:36 a03c11n14 last message repeated 4 times
Mar 11 13:37:34 a03c11n14 last message repeated 4 times
Mar 11 13:39:38 a03c11n14 last message repeated 4 times
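
To get a feeling for how many store failures hide behind those
"last message repeated" lines, a rough tally can be scripted. This is
only a sketch: it assumes each repeat line refers to the nearest
preceding non-repeat line (the usual syslogd behaviour), and the
function name count_store_failures is made up for illustration.

```python
import re

# Log excerpt copied from the client's /var/log/messages above.
LOG = """\
Mar 11 13:21:18 a03c11n14 kernel: afs: Waiting for busy volume 537086116 (prj.sbc.aronh.13) in cell pdc.kth.se
Mar 11 13:21:20 a03c11n14 kernel: afs: failed to store file (network problems)
Mar 11 13:23:33 a03c11n14 last message repeated 3 times
Mar 11 13:25:26 a03c11n14 last message repeated 4 times
Mar 11 13:27:23 a03c11n14 last message repeated 4 times
Mar 11 13:29:30 a03c11n14 last message repeated 4 times
Mar 11 13:31:37 a03c11n14 last message repeated 4 times
Mar 11 13:33:39 a03c11n14 last message repeated 4 times
Mar 11 13:35:36 a03c11n14 last message repeated 4 times
Mar 11 13:37:34 a03c11n14 last message repeated 4 times
Mar 11 13:39:38 a03c11n14 last message repeated 4 times
"""

REPEAT = re.compile(r"last message repeated (\d+) times")


def count_store_failures(text):
    """Count 'failed to store file' events, expanding syslog repeat lines.

    Assumption: a repeat line refers to the nearest preceding
    non-repeat line in the same excerpt.
    """
    total = 0
    previous_was_failure = False
    for line in text.splitlines():
        match = REPEAT.search(line)
        if match:
            # Only add the repeats if the repeated message was a store failure.
            if previous_was_failure:
                total += int(match.group(1))
            continue  # a repeat line does not change the "last message"
        previous_was_failure = "afs: failed to store file" in line
        if previous_was_failure:
            total += 1
    return total


print(count_store_failures(LOG))
```

For the excerpt above this comes to 36 failed stores on this one
client in roughly 18 minutes.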

Then silence.

Console said something like:
	Call Trace: ... system_call+0x7e/0x83
	.... do_sys_open+0x5c/0xbe
.. Kernel panic - not syncing: Fatal exception

As this is (eh, was) a parallel job, several but not all of the clients
involved crashed like this. Unfortunately, I have no way to reproduce
it. To start with, I have moved the volume to a 1.4.8 server.

Harald.