[OpenAFS] Re: Need volume state / fileserver / salvage knowledge

Andrew Deason adeason@sinenomine.net
Mon, 31 Jan 2011 11:36:24 -0600


On Mon, 31 Jan 2011 11:54:24 -0500
Steve Simmons <scs@umich.edu> wrote:

> > Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15
> > Wed Jan 26 12:28:13 2011: upclientbin exited on signal 15
> > Wed Jan 26 12:28:24 2011: fs:vol exited on signal 15
> > Wed Jan 26 12:58:19 2011: bos shutdown: fileserver failed to shutdown within 1800 seconds
> > Wed Jan 26 12:58:37 2011: fs:file exited on signal 9
> 
> We have seen similar issues. It occurs when there is a given vice
> partition where lots of clients have registered callbacks but those
> clients are no longer accessible. Not all the clients have responded
> when the 1800 second timer goes off, and the fileserver goes down
> uncleanly.

Also, in this specific case, the cause may not be only that shutting down
volumes took too long. OpenAFS 1.4.11 has known problems that can cause this
(e.g. the host list gets a loop in it, and any code that tries to traverse
the whole list spins forever).
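
To illustrate the failure mode, here is a minimal sketch in Python. It is
not the fileserver code; the Host class and the function names are made up
for the example, and I am assuming a simple singly linked list. A plain
"walk until next is None" traversal never terminates once the list has a
loop in it, so the sketch shows two ways to guard against that: a step
limit on the walk, and Floyd's two-pointer loop check.

```python
class Host:
    """Hypothetical stand-in for one entry in a singly linked host list."""

    def __init__(self, name):
        self.name = name
        self.next = None


def build_list(names):
    """Link Host objects in order and return them as a Python list."""
    hosts = [Host(n) for n in names]
    for cur, nxt in zip(hosts, hosts[1:]):
        cur.next = nxt
    return hosts


def has_loop(head):
    """Floyd's check: a fast pointer catches a slow one only if there is a loop."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False


def count_hosts(head, limit):
    """Walk the list, but give up after `limit` entries instead of spinning."""
    count = 0
    node = head
    while node is not None:
        count += 1
        if count > limit:
            raise RuntimeError("walked more than %d entries; possible loop" % limit)
        node = node.next
    return count


if __name__ == "__main__":
    hosts = build_list(["hostA", "hostB", "hostC"])
    print(has_loop(hosts[0]), count_hosts(hosts[0], limit=100))

    # Corrupt the list: the last entry points back at the second one.
    hosts[2].next = hosts[1]
    print(has_loop(hosts[0]))
```

Without the limit, count_hosts would loop forever on the corrupted list,
which is the kind of spin that would keep a shutdown from ever finishing.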

-- 
Andrew Deason
adeason@sinenomine.net