[OpenAFS] System hangs, OSX 10.6.8, OpenAFS 1.6.5

dorian taylor dorian.taylor.lists@gmail.com
Sun, 1 Dec 2013 12:06:43 -0800


On Sun, Dec 1, 2013 at 10:37 AM, Sergio Gelato
<Sergio.Gelato@astro.su.se> wrote:

> The subject line of this thread does contain exact (and hopefully accurate)
> numbers.

Yes, that's correct. To my knowledge, 1.6.5 is the latest binary for
OSX 10.6.x. (I haven't tried compiling a newer version to see if the
problem goes away on its own.)

>> This seems like a very strange problem, and I don't have any
>> concrete ideas off the top of my head.  In terms of trying random
>> things to see if they help, you might try explicitly setting a FILE:
>> credentials cache instead of using the default API: one.

I'll definitely give this a shot.
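
For anyone following along, here is a minimal sketch of what I understand
that suggestion to mean: point KRB5CCNAME at a FILE: cache, then run kinit
and aklog in that environment. The cache path and the function names are
my own placeholders, and I'm assuming kinit and aklog are on the PATH.

```python
import os
import subprocess


def file_ccache_env(cache_path, base_env=None):
    """Return a copy of the environment with KRB5CCNAME set to a FILE: cache."""
    env = dict(os.environ if base_env is None else base_env)
    env["KRB5CCNAME"] = "FILE:" + cache_path
    return env


def login_with_file_ccache(principal, cache_path="/tmp/krb5cc_afs_test"):
    """Get a TGT and an AFS token using the FILE: cache.

    cache_path is a hypothetical location; kinit will prompt for a password.
    """
    env = file_ccache_env(cache_path)
    subprocess.run(["kinit", principal], env=env, check=True)
    subprocess.run(["aklog"], env=env, check=True)
```

Only processes started with that environment see the FILE: cache, so
anything launched from the GUI would presumably keep using the default
API: one.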

> I haven't seen that kind of behaviour either. My (vague) guess is that some
> important part of the system is loaded out of AFS so that bad things happen
> when it becomes inaccessible, but on second thought I can't come up with a
> scenario that actually fits all the reported details.

Yes, it seems to happen if a process is touching the AFS file system,
where "touching" could just mean reading or having read (e.g. an Emacs
session or Finder window). I'm not sure about the counterfactual; I've
yet to see it misbehave when I can guarantee my AFS cell remains
untouched.

There are definitely all sorts of things that touch the AFS cell (e.g.
my iTunes). The difference in experience between 10.5 and 10.6,
however, is that on 10.5 it wasn't fatal for AFS to go away for a
second, or even for hours or days.

I don't know though: if a file on an AFS cell is open and the
authentication momentarily goes away, does that choke things out?

> One question, though: did the system have an OpenAFS client installed while
> running OSX 10.5, and if so, what version was that? (I'm assuming that the
> OS upgrade was really an upgrade and not a full reinstall.)

Correct; after upgrading the system (which hadn't been upgraded in a
while), I just installed the OpenAFS 1.6.5 package on top of whatever
OpenAFS was already there. All the file mtimes in /Library/OpenAFS,
however, are either the day of the package release (Jul 23) or the day
I installed it (Sep 18). I suspected a conflict but have yet to hunt
down any other places OpenAFS hides files on OSX.

> Another clarification: does the system hang only when an attempt is made to
> renew the ticket and token, or is expiration of the old credentials a
> sufficient trigger by itself?

This isn't entirely clear. What I've been trying to do is just renew
the ticket/token by hand before the expiration, which usually
cooperates but not always. If it doesn't cooperate, it's as I
described: Any process that tries to open a file (maybe?) or socket
(definitely) will hang. Existing sockets stay open; new sockets can't
be created; trying to kill a program with an open socket will also
hang. All standard tools capable of diagnosing the problem hang, as
does shutdown/reboot.

What I haven't been able to determine is if it's explicitly kinit or
aklog that's causing the problem, or if the system is already messed
up by the time I try.
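
One way I could try to narrow that down is to run the two steps separately
with a timeout, logging before each one so the last line written names the
step that hung. This is only a sketch: the function names and the 30-second
timeout are my own choices, and if the child is stuck inside the kernel the
timeout itself may never fire.

```python
import subprocess
import sys
import time


def run_step(name, argv, timeout=30.0):
    """Run one command with a timeout; return 'ok', 'failed', 'timeout' or 'missing'."""
    # Log *before* starting, so a hang leaves the step's name as the last line.
    print(f"{time.strftime('%H:%M:%S')} starting {name}", file=sys.stderr, flush=True)
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        result = "missing"
    except subprocess.TimeoutExpired:
        result = "timeout"
    else:
        result = "ok" if proc.returncode == 0 else "failed"
    print(f"{time.strftime('%H:%M:%S')} {name}: {result}", file=sys.stderr, flush=True)
    return result


def renew_by_hand():
    """Renew the TGT, then the AFS token; stop at the first step that is not ok."""
    for name, argv in (("kinit -R", ["kinit", "-R"]), ("aklog", ["aklog"])):
        result = run_step(name, argv)
        if result != "ok":
            return name, result
    return None, "ok"
```

Calling renew_by_hand() with stderr redirected to a file on local disk
(not in AFS) should at least show whether the machine was already wedged
before kinit started.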

> Oh, one more thing: what's the lifetime of your ticket and token? (I'm
> wondering whether that window of a few minutes after the credentials appear
> to have expired could be due to the token lifetime having been rounded up
> to the next representable value.)

Both 10 hours, the default from the server (which is in the same room,
Ubuntu somethingorother, running OpenAFS 1.6.1 and MIT Kerberos 1.10,
plus an ntpd the rest of the network uses). The logs on the KDC don't
show anything interesting; in fact, the TGT is getting automatically
renewed by OSX (in the middle of the night while I'm sleeping). That
implies that the AFS token is getting destroyed. I don't know the
exact mechanism by which the AFS token gets renewed, but I can say
that it never "worked", even in 10.5, at least in conjunction with the
OS's native Kerberos ticket renewing behaviour. The difference,
however, is that if the AFS token didn't get renewed in OSX 10.5, that
didn't freeze the machine.

(For the record, I also double-checked mdutil to make sure Spotlight
wasn't trying to index the AFS cell.)

Thanks,

-- 
Dorian Taylor
http://doriantaylor.com/