[OpenAFS] Re: advice on troubleshooting blocked cache manager on MacOS?

Derrick Brashear shadow@gmail.com
Wed, 27 Jan 2010 12:30:49 -0500

On Wed, Jan 27, 2010 at 12:10 PM, Adam Megacz <adam@megacz.com> wrote:
> Derrick Brashear <shadow@gmail.com> writes:
>>> I might be able to try that, but it will take a few days.
>> if true, you should see output in cmdebug now
> Okay, I just caught it red-handed. Can anybody help with reading the
> tea leaves here?
> megacz@quine:~$cmdebug localhost
> Lock afs_xvcache status: (none_waiting, write_locked(pid:11013 at:335)

      writelocked = (0 == NBObtainWriteLock(&afs_xvcache, 335));

That's in afs_vop_reclaim.

xvreclaim is not held, which means we're presumably in afs_FlushVCache.

> Lock afs_xserver status: (none_waiting, 1 read_locks(pid:0))

Something has afs_xserver read-locked. Since any number of threads can
hold a read lock at once, we can't track who holds these. No one's
blocked on it.

> Lock afs_xvcb status: (writer_waiting, write_locked(pid:0 at:273), 1 w

       ObtainWriteLock(&afs_xvcb, 273);

That's in afs_FlushVCBs (called with lockit true). Assuming you're not
running disconnected and actively trying to disconnect, the caller is
the system daemon (afs_Daemon), which also explains "pid:0". We don't
know who's waiting, but only this function, QueueVCB, and RemoveVCB
actually *get* afs_xvcb.

So, let's be clever. afs_FlushVCache calls QueueVCB, so we can assume
that's where the reclaim thread is blocking: it holds afs_xvcache and
is waiting for afs_xvcb.
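To make the chain explicit, here is a minimal sketch that models it as a wait-for graph, assuming each lock has at most one exclusive holder and each thread waits on at most one lock. The types and the root_blocker() helper are hypothetical illustrations, not OpenAFS code. Following "waits for lock, held by thread" edges from any stuck thread ends at whoever is not waiting on a lock at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the wait-for chain; not OpenAFS code.
 * Each lock has at most one exclusive (write) holder, and each
 * thread is blocked on at most one lock. */
enum lock_id { XVCACHE, XVCB, NLOCKS };
#define NO_LOCK (-1)

struct thread {
    const char *name;
    int waits_for;          /* lock this thread is blocked on, or NO_LOCK */
};

/* Follow "thread waits for lock L, L is held by thread T" edges from
 * `start` until reaching a thread that is not blocked on a held lock
 * (e.g. one stuck inside an RPC). Returns that thread's index, or -1
 * if the chain revisits a thread, i.e. a genuine lock-order deadlock.
 * holder[l] is the index of the thread holding lock l, or -1. */
int root_blocker(const struct thread *t, size_t nthreads,
                 const int holder[NLOCKS], int start)
{
    assert(start >= 0 && (size_t)start < nthreads);
    int cur = start;
    for (size_t steps = 0; steps <= nthreads; steps++) {
        int l = t[cur].waits_for;
        if (l == NO_LOCK || holder[l] < 0)
            return cur;     /* end of the chain */
        cur = holder[l];
    }
    return -1;              /* more edges than threads: a cycle */
}
```

In the cmdebug output above, the reclaim thread (pid 11013) would hold XVCACHE and wait for XVCB, while afs_Daemon holds XVCB and waits on no lock. Any other thread wanting afs_xvcache then resolves to afs_Daemon, which suggests a network wait rather than a lock cycle.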

So then the question is why afs_FlushVCBs is blocking you. Well, you
said you had multihomed fileservers.

RXAFS_GiveUpCallBacks is called here. You didn't perchance grab
rxdebug output for the client at this point? (No is fine; this is
probably the answer.)

So, presumably (and now I'm going from memory, not looking at the
code), you block for about a minute while it times out one fileserver
address, then it fails over to another address: afs_Analyze returns
shouldretry=1, you loop, afs_ConnByHost probably gets the other
address, and the loop proceeds and wins.
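A minimal sketch of that retry loop for one multihomed fileserver, assuming a fixed 60-second timeout ("like a minute") per dead address. The struct, pick_addr() and call_with_failover() are hypothetical stand-ins for afs_ConnByHost and the afs_Analyze retry path, not the real code. The point is that the blocked time grows with the number of dead addresses tried before a live one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the client's retry loop; the names and the
 * timeout value are assumptions, not taken from the OpenAFS source. */
#define RPC_TIMEOUT_SECS 60      /* assumed: "like a minute" */

struct srv_addr {
    bool reachable;              /* ground truth: does this address answer? */
    bool down;                   /* client's belief, set after a failure */
};

/* Stand-in for afs_ConnByHost: first address not marked down, or -1. */
static int pick_addr(const struct srv_addr *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!a[i].down)
            return (int)i;
    return -1;
}

/* Issue one RPC, failing over between addresses. Returns the seconds
 * spent blocked and stores the index of the address that answered in
 * *used (-1 if none did). Successful calls are treated as instant. */
int call_with_failover(struct srv_addr *a, size_t n, int *used)
{
    int waited = 0;
    int i;
    while ((i = pick_addr(a, n)) >= 0) {
        if (a[i].reachable) {
            *used = i;
            return waited;
        }
        waited += RPC_TIMEOUT_SECS;  /* the call times out */
        a[i].down = true;            /* roughly: address marked down, shouldretry = 1 */
    }
    *used = -1;
    return waited;
}
```

With one dead address listed ahead of a live one, the first call waits 60 seconds and the next one goes straight through. If afs_xvcb is held for the duration, every thread that needs it waits just as long.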

Could we address this? Yes! How? Well, I suppose that on network
events (MacOS has support for this) and whenever a new server is
discovered, we could probe all of its addresses, so any unreachable
addresses are marked down in advance.
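A minimal sketch of that idea, assuming some notification source for network changes (on MacOS, possibly the SystemConfiguration framework) calls a handler from outside the lock-holding paths. reprobe_all(), first_up() and the placeholder probe() are hypothetical, not existing OpenAFS functions. The timeouts are paid up front, so a later call made with afs_xvcb held goes straight to a live address.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the proposed fix, not OpenAFS code. */
struct srv_addr {
    bool reachable;   /* ground truth, consulted only by the fake probe */
    bool down;        /* client's belief, used when choosing an address */
};

/* Placeholder probe. A real client would send a lightweight RPC with a
 * short timeout from a background thread; here we just read the flag. */
static bool probe(const struct srv_addr *a)
{
    return a->reachable;
}

/* Run on a network-change notification or when a new server is
 * discovered: probe every address and record the result, so a later
 * call made under a lock picks a live address immediately instead of
 * discovering a dead one by timing out. Returns how many are down. */
size_t reprobe_all(struct srv_addr *a, size_t n)
{
    size_t ndown = 0;
    for (size_t i = 0; i < n; i++) {
        a[i].down = !probe(&a[i]);
        if (a[i].down)
            ndown++;
    }
    return ndown;
}

/* First address currently believed to be up, or -1 if none. */
int first_up(const struct srv_addr *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!a[i].down)
            return (int)i;
    return -1;
}
```

Re-probing on every event also lets an address that comes back be marked up again, rather than staying down until something else clears it.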