[OpenAFS] vos move operation stalls

Harald Barth haba@kth.se
Tue, 25 Mar 2014 09:25:54 +0100 (CET)

> I have installed a new fileserver, running Gentoo Linux, kernel 3.8.13,
> OpenAFS 1.6.5 (also tried 1.6.6). The server has two /vicep partitions,
> each with an XFS filesystem:
> * ~800GB /vicepa on the system disk - mdadm mirror with two SATA disks
> * 24TB /vicepb on an external RAID system, connected via iSCSI
> Moves to the /vicepa partitions work as expected. Moves to the /vicepb
> start, but after some time (usually 10-20sec) there is no progress anymore:
> - no disk i/o
> - package counters (vos status/rxdebug) stay constant forever
> - transactions do not time out within 3-4 days

That's very strange, since the AFS server processes access the file
system just like any other process would, so there is nothing special
about them. We used XFS on Linux for AFS server /vicep* partitions for
a long time (though not over iSCSI) and never had any problems with it.
We have since moved to ZFS on local disks.

Can you strace the hanging process and see what syscall it's hanging on?
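
If strace is not at hand, or attaching to the server process is not
wanted, Linux exposes similar information under /proc. The following
is only a rough sketch, assuming a kernel that provides
/proc/<pid>/task/<tid>/syscall and .../wchan, and assuming it is run as
root. The function names are made up for illustration; the PID of the
stuck process has to be supplied on the command line.

```python
import os
import sys


def parse_syscall_line(line):
    """Parse the contents of /proc/<pid>/task/<tid>/syscall."""
    fields = line.split()
    if not fields or fields[0] == "running":
        # The thread is currently on a CPU, not blocked.
        return {"state": "running"}
    if fields[0] == "-1":
        # Blocked, but not inside a system call.
        return {"state": "blocked-outside-syscall"}
    return {
        "state": "in-syscall",
        "number": int(fields[0]),
        # Up to six argument registers, printed by the kernel in hex.
        "args": [int(a, 16) for a in fields[1:7]],
    }


def thread_report(pid):
    """Return {tid: info} for every thread of the given process."""
    report = {}
    task_dir = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(f"{task_dir}/{tid}/syscall") as fh:
                info = parse_syscall_line(fh.read())
            with open(f"{task_dir}/{tid}/wchan") as fh:
                # Kernel symbol the thread sleeps in, if the kernel shows it.
                info["wchan"] = fh.read().strip()
        except OSError as exc:
            info = {"state": f"unreadable: {exc}"}
        report[int(tid)] = info
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    for tid, info in thread_report(int(sys.argv[1])).items():
        print(tid, info)
```

The syscall number has to be looked up in the syscall table for your
architecture. A thread that stays in the same syscall with the same
arguments across several runs is the one to look at more closely.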

> Other fileserver/volserver operations on other volumes seem to be
> unaffected. 

That's something, at least.

> Also, access to the iSCSI partition from the OS is still
> possible, there are no disk/iSCSI problems reported.

Does a find /vicepb/ run through completely?
Can you write files to /vicepa/TEST/ or something like that?
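
Both checks can also be scripted so that you get timings. This is just
a minimal sketch: the function names and the file name probe.tmp are
made up for illustration, and the directories to test are whatever you
pass in. If the file system really hangs, the script will hang in the
same place, which is an answer in itself.

```python
import os
import time


def walk_count(root):
    """Walk the whole tree under root and count the entries seen."""
    count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        count += len(dirnames) + len(filenames)
    return count


def write_probe(directory, size=1 << 20):
    """Write and fsync a scratch file in directory; return seconds taken."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "probe.tmp")
    start = time.monotonic()
    with open(path, "wb") as fh:
        fh.write(b"\0" * size)
        fh.flush()
        # Force the data to the device so a stuck disk path shows up here.
        os.fsync(fh.fileno())
    elapsed = time.monotonic() - start
    os.remove(path)
    return elapsed
```

For example, call walk_count() on the partition root and write_probe()
on a test directory, once while no move is running and once while a
move is stalled, and compare the results.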

> Has anyone seen similar problems before? Does anyone have suggestions
> what I could try to debug the problem?

Have you tried something other than XFS?

Have you tried putting the XFS log on a different device? For
performance, you might want it on a mirrored local disk.