[OpenAFS] Odd behavior during vos release
Wed, 9 Nov 2011 17:31:23 -0500
In addition to Andrew's questions, something else that would be useful:
run the release in verbose mode, and tell us which messages correspond
with these time points.
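
One way to line the verbose messages up against the times when the
connections pile up is to timestamp every line the release prints. What
follows is only a rough sketch, not an OpenAFS tool: the helper name
run_with_timestamps is made up for illustration, and the volume name in
the usage comment is a placeholder. It assumes nothing about the command
beyond it writing its progress to stdout or stderr.

```python
import subprocess
import sys
from datetime import datetime


def run_with_timestamps(cmd, out=sys.stdout):
    """Run cmd and echo each line of its output prefixed with HH:MM:SS.

    Returns (exit_code, list_of_stamped_lines). The helper name and the
    return shape are illustrative choices, not part of any OpenAFS tool.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so all messages share one timeline
        text=True,
    )
    stamped_lines = []
    for line in proc.stdout:
        stamped = f"{datetime.now():%H:%M:%S} {line.rstrip()}"
        print(stamped, file=out, flush=True)
        stamped_lines.append(stamped)
    proc.wait()
    return proc.returncode, stamped_lines


# Example use (placeholder volume name, substitute your own):
#   run_with_timestamps(["vos", "release", "example.volume", "-verbose"])
```

The stamped lines can then be compared with the FileLog entries and with
rxdebug snapshots taken during the same release.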
On Wed, Nov 9, 2011 at 2:38 PM, Kevin Hildebrand <firstname.lastname@example.org> wrote:
> We've been having unusual slowness and hangs at times on some of our
> fileservers, and I think I have a handle on the sequence of events, if not
> the cause.  I could use some assistance in filling in the gaps so I can see
> if we can fix things.
> Right now, I have a heavily used volume (by many clients) that is released
> on a frequent basis (as often as every ten minutes).  This volume has t=
> read-only replicas.  The volume is about 200MB in size.
> What I'm observing is that as soon as the vos release begins, one or more of
> the readonly replicas start accumulating connections in the 'error' state.
> FileLog shows incoming FetchStatus RPCs to that replica are not being
> answered.  If this condition occurs long enough, all of these connections
> eventually fill up the thread pool and the fileserver stops serving data for
> everything else.
> At some point, up to five minutes later, as the release proceeds, the
> replica in question gets marked offline by the release process.  At this
> time, all of the stuck RPCs get 'FetchStatus returns 106' (VOFFLINE), at
> which point the connection pool clears, and life on the fileserver returns
> to normal.
> What I can't figure out is what's going on during the time the RPCs are
> hung, and why the connections show 'error'.  (How does one determine what
> the error condition is, when viewing rxdebug output?)
> Why would an RO replica be hung during a vos release?
> Any clues on where to look next would be appreciated.
> Kevin Hildebrand
> University of Maryland, College Park
> Office of Information Technology