[OpenAFS] Re: Need volume state / fileserver / salvage knowledge
Stephen Joyce
stephen@physics.unc.edu
Mon, 31 Jan 2011 12:17:04 -0500 (EST)
On Mon, 31 Jan 2011, Steve Simmons wrote:
> We have seen similar issues. It occurs when there is a given vice
> partition where lots of clients have registered callbacks but those
> clients are no longer accessible. Not all the clients have responded when
> the 1800 second timer goes off, and the fileserver goes down uncleanly.
>
> We have about 235,000 volumes spread across 40 vice partitions. Our 'fix'
> is a combination of lengthening that timeout to 3600 seconds and
> keeping our vice partitions no larger than 2TB. Active volumes are
> spread roughly equally across those 40 partitions. But that's just a
> stopgap; the longer a server stays up, the more likely it accumulates
> dead callbacks.
Assuming this is true, isn't this a good argument to keep the weekly server
process restarts?
Cheers,
Stephen