[OpenAFS] File server, bos salvage hang
Miles Davis
miles@cs.stanford.edu
Fri, 5 Nov 2004 20:14:12 -0800
Over the past couple of days, one of my file servers (RedHat Linux 9,
openafs 1.2.11, nothing custom, LD_ASSUME_KERNEL=2.4.1 is set) has
developed an annoying problem.
On occasion, we hit the classic gconf problem, where for reasons I don't
know (but which I have heard are fixed in 1.3.X) a user's gconf lock
file .gconfd/lock/ior becomes corrupt and/or unusable, requiring a salvage
of the volume. Normally, not a big deal, it happens only rarely. However,
I've got a file server that I can no longer salvage volumes on: running
bos salvage <server> <part> <vol> never finishes, and the file server is
never quite the same again until a restart (killing the file server) or
reboot. By "never quite the same" I mean things like 'vos listvol' failing,
though the file server is still working for volumes other than the one
being salvaged. I haven't seen this behaviour with any of our other file
servers, ever.
Before starting another salvage, I turned on debug logging via kill -TSTP
<fileserver>, but I don't see anything standing out in the log. Maybe
somebody else does. I let it run in debug mode for about 10 minutes and
then turned it off again. The result is at
http://cs.stanford.edu/people/miles/FileLog.
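
For reference, the logging step above can be sketched as a small shell
helper. This is an illustration only, not the exact commands run on the
server: debug_window, run, and the RUN variable are hypothetical names, and
resetting the debug level with kill -HUP is an assumption, since the text
above only says that logging was turned off again. By default the helper
just prints the commands; it executes them only when RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1, execute) the signal sequence for a
# temporary fileserver debug-logging window.

# Hypothetical helper: echo the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# debug_window PID SECONDS
#   PID     - process ID of the fileserver process (supplied by the caller)
#   SECONDS - how long to leave debug logging on
debug_window() {
    pid=$1
    seconds=$2

    run kill -TSTP "$pid"   # turn on fileserver debug logging, as in the post
    run sleep "$seconds"    # leave it on while the salvage is attempted
    run kill -HUP "$pid"    # ASSUMPTION: reset the debug level afterwards
}
```

With the defaults, debug_window 12345 600 prints the three commands for a
ten-minute window against a made-up PID of 12345 without sending any signal.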
--
// Miles Davis - miles@cs.stanford.edu - http://www.cs.stanford.edu/~miles
// Computer Science Department - Computer Facilities
// Stanford University