[OpenAFS] vos move operation stalls

Volkmar Glauche volkmar.glauche@uniklinik-freiburg.de
Mon, 24 Mar 2014 13:47:33 +0100


Dear list,

I have installed a new fileserver, running Gentoo Linux, kernel 3.8.13,
OpenAFS 1.6.5 (also tried 1.6.6). The server has two /vicep partitions,
each with an XFS filesystem:
* ~800GB /vicepa on the system disk - mdadm mirror with two SATA disks
* 24TB /vicepb on an external RAID system, connected via iSCSI

Moves to the /vicepa partition work as expected. Moves to the /vicepb
partition start, but after some time (usually 10-20 sec) they make no
further progress:
- no disk i/o
- packet counters (vos status/rxdebug) stay constant forever
- transactions do not time out within 3-4 days

Other fileserver/volserver operations on other volumes seem to be
unaffected. Access to the iSCSI partition from the OS also remains
possible, and no disk or iSCSI problems are reported.

When I try to stop the move operation manually or when bos tries to
execute the configured weekly server restart, there are volserver
threads that do not respond to any signal. In that situation, I also
cannot use bos to restart or shut down the fileserver anymore. The volserver
threads keep files on /vicepb open, forcing an unclean shutdown or
reboot of the entire server.
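
Threads that ignore every signal (including SIGKILL) are typically in
uninterruptible sleep, which Linux reports as state "D". One way to check
this for the stuck volserver threads is sketched below. It is only a
sketch, assuming a Linux /proc filesystem; the function names
parse_thread_state and thread_states are made up for illustration and are
not part of OpenAFS. It lists each thread's state and, where the kernel
exposes it, the kernel function the thread is waiting in (wchan).

```python
import os
import sys


def parse_thread_state(stat_line):
    """Return (comm, state) from the contents of a /proc/.../stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so locate it via the first '(' and the last ')'.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    comm = stat_line[open_idx + 1:close_idx]
    # The first field after the command name is the one-letter state.
    state = stat_line[close_idx + 1:].split()[0]
    return comm, state


def thread_states(pid, proc_root="/proc"):
    """Yield (tid, comm, state, wchan) for every thread of process pid."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in sorted(os.listdir(task_dir), key=int):
        base = os.path.join(task_dir, tid)
        try:
            with open(os.path.join(base, "stat")) as f:
                comm, state = parse_thread_state(f.read())
        except OSError:
            continue  # thread exited while we were scanning
        try:
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # not readable or not provided by this kernel
        yield int(tid), comm, state, wchan


# Hypothetical usage: python thread_states.py <pid of volserver>
if __name__ == "__main__" and len(sys.argv) == 2:
    for tid, comm, state, wchan in thread_states(int(sys.argv[1])):
        marker = "  <-- uninterruptible" if state == "D" else ""
        print(f"{tid:>8} {comm:<16} {state} {wchan}{marker}")
```

If the stuck threads show state D, the wchan column may hint at which
kernel path (for example XFS or the iSCSI/block layer) they are blocked in.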

Has anyone seen similar problems before? Does anyone have suggestions
for what I could try to debug the problem?

Best regards,

Volkmar