[OpenAFS] aklog Lies OR Cache Manager Not Storing Tokens

Randy Kemp rkemp@srhs.net
Mon, 24 Nov 2008 14:32:22 -0500

I'm running openafs 1.4.7 client on Ubuntu Intrepid.  It's running on a
multi-user application server where all the users connect from thin
clients via SSH sessions, in other words LTSP 5.  I'm using
pam_afs_session to get tokens at login.  I'm having an intermittent
problem where users will sometimes log in and not get an AFS token. 
Since it's trying to load a graphical session, the users in effect can't
log in because their home directory can't be accessed.  When this starts
happening for users it tends to occur for almost all users.  Users that
are already logged in don't lose their tokens.  Restarting the
application server will fix it, but sometimes it just resolves itself
after a while.
If I have a user log in to a shell via SSH they can manually exec
'aklog' and then it will work fine, even if they log out and back in. 
Once it starts occurring, the users that are already logged in can
typically log out and back in and get their token without a problem but
if they exec 'unlog' before logging out they won't get a token when they
attempt to log back in.

I've enabled debugging for pam_afs_session which shows that 'aklog' is
being called.  I've even made a shell script which I've set
pam_afs_session to call instead that execs 'aklog -d' and logs the
output and confirmed that 'aklog' is (or says it is) always getting a
token.  Yet the output of 'tokens' shows that the cache manager is not
holding a token.  So is 'aklog' not really getting the token even though
it says it is, or is the cache manager not storing it for some reason?
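
For anyone who wants to reproduce the check, a wrapper along these
lines would capture both sides of the question in one log entry: what
'aklog -d' reported, and what 'tokens' showed immediately afterwards.
This is only a sketch, not the exact script in use here.  The function
names and the LOGFILE path are made up for illustration, and the test
for a held token assumes that 'tokens' prints a line containing
"tokens for afs@" for each AFS token.

```shell
#!/bin/sh
# Sketch of a logging wrapper around aklog.
# LOGFILE and the function names are hypothetical; the log must live
# somewhere writable without an AFS token (so not in the home directory).
LOGFILE="${LOGFILE:-/tmp/aklog-wrapper.$(id -u).log}"

# Succeeds if the given text looks like `tokens` output listing an AFS token.
# Assumption: a held token shows up as a line containing "tokens for afs@".
has_afs_token() {
    case "$1" in
        *"tokens for afs@"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run "aklog -d" with any extra arguments, log its output and exit status,
# then log what the cache manager reports right afterwards.
run_aklog_logged() {
    rc=0
    {
        echo "=== $(date) uid=$(id -u) pid=$$ ==="
        aklog -d "$@" || rc=$?
        echo "aklog exit status: $rc"
        tokens_out=$(tokens 2>&1)
        echo "$tokens_out"
        if has_afs_token "$tokens_out"; then
            echo "result: token present in cache manager"
        else
            echo "result: NO token in cache manager after aklog"
        fi
    } >>"$LOGFILE" 2>&1
    return "$rc"
}
```

To use it as the wrapper, end the script with `run_aklog_logged "$@"`
so that it passes aklog's exit status back to pam_afs_session.  A log
entry with exit status 0 followed by "NO token" would be the case
described above.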

Another problem that may be related: some may remember my messages from
last month where the cache manager would hang causing all of the user
sessions to in effect hang.  I discovered by accident that if I run
'cmdebug' it will cause the cache manager to un-hang.  Since I had no
idea of any other way to fix the problem I just set a cron job to run
'cmdebug localhost' every minute and I haven't had the cache manager
hang since then unless I remove the cron job.  Yeah, I know that's an
ugly hack of a semi-fix, but it does work; the question is why.
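
One way to get closer to the "why" might be to have the cron job keep
what 'cmdebug' prints instead of discarding it, so there is a
timestamped record of the cache manager's state around the time a hang
would have occurred.  A rough sketch follows; the CMDEBUG_LOG path and
the function name are hypothetical, and the only 'cmdebug' invocation
used is the 'cmdebug localhost' form mentioned above.

```shell
#!/bin/sh
# Sketch of a helper for the once-a-minute cron job.
# CMDEBUG_LOG and the function name are hypothetical.
CMDEBUG_LOG="${CMDEBUG_LOG:-/var/tmp/cmdebug-poke.log}"

# Run "cmdebug localhost" and append its output, with a timestamp and
# the exit status, to the log file.
poke_cache_manager() {
    rc=0
    {
        echo "=== $(date) ==="
        cmdebug localhost || rc=$?
        echo "cmdebug exit status: $rc"
    } >>"$CMDEBUG_LOG" 2>&1
    return "$rc"
}
```

The cron job would then run a script that ends with a call to
`poke_cache_manager`.  The log grows by one entry per minute, so it
would need rotating or trimming.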

Any clues on the token problem?

Randy Kemp