[OpenAFS] Re: perl afs module question

Norbert Gruener nog@MPA-Garching.MPG.DE
Wed, 16 Apr 2003 12:26:01 +0200


Hi David,

On Tue, Apr 15 2003, David Botsch wrote:
> 
> I have been trying to use your perl afs module to allow a user to 
> change his or her password.
> 
> However, it is failing with the following error when I look at 
> $AFS::CODE
> 	"ticket contained unknown key version number"
[snipped]
> Thanks for any insights you can provide!

In principle I have the same problem.  There was a thread about it in
1996 that explains what is going on.

-------------------  thread 1996 --------------------------------
From: Marcus Watts <mdw@umich.edu>
To: schemers@stanford.edu
Cc: info-afs@transarc.com
Subject: Re: unknown key version number... 
Date: Wed, 31 Jul 96 21:56:06 -0400

schemers@stanford.edu writes:

> 
> Hi. We have some account creation scripts that run out of cron every
> night (in a pagsh) that grab an admin token and start creating users,
> volumes, etc. After a while the script fails with the following error:
> 
> Creating user xxxxxxx  : [rxk] ticket contained unknown key version number
> 
> Anyone know why we get this error?

and

> I also forgot to mention that each create is done using 
> "kas create ... -pass ..." (it's on a secure server), so every
> create gets a new token. I'll probably rewrite things to use
> a custom "kas" command that grabs the admin DES key from a srvtab and
> creates multiple accounts in one fell swoop, checking for errors and
> getting a new token if need be.
> 
> I'm still puzzled as to why the "kas create" command would fail since
> each creation is done with a fresh token.

That message sounded so familiar, and now I know why!  (I think...)

It definitely has to do with running lots of those commands right in a
row.  There are, indeed, some vaguely evil things about all this.
The first is that each kas command is creating a separate connection.
This is, actually, the root of the problem.  Those connections
don't go away when the kas command exits, but hang around for
"a while" in kaserver, consuming bits of server memory & such in the meantime.
That isn't so bad in itself (other than slowing the server down), but
I recall some other problems somewhere, that caused the server to somehow
eventually become "confused" about which connection a packet belongs to.
The end result is that you're getting that message because an old useless
connection is snagging the packet and becoming unhappy.

There is also another problem, somewhere in ubik, that means when you
do tons of back to back operations, eventually, one of them is going
to hit a bad timing case, and not work.  I vaguely recall a fix in 3.4
that "improves" this, but doesn't make it perfect.

So, the following two things will definitely help:

	(1) batch the kas operations up, and run a bunch of
	them with one "kas" command.  Don't run "too many",
	because while they're running, you are hurting your
	kaserver's performance.  10-30 may be a good number,
	depending on the size of your cell.  Sleep a while
	between each batch.  This should eliminate the unknown
	key version problem, but you will still see other
	occasional problems.

	(2) look for failures, & retry them, perhaps after a
	suitable short delay.

The "custom" program is definitely a useful approach.  At UM,
uniqname is our answer to the problem of dealing with
the whole mess.

The "pts" command doesn't come with an "interactive" mode, unlike
kas, so it's not so easy to batch "pts" commands up.  We ended
up adding an "interactive" mode to pts, & a "sleep" command, so
that we could run scripts that add lots of users to groups.  Also,
our ptserver uses a "ubik" that is just a little bit different...

				-Marcus Watts
				UM ITD PD&D Umich Systems Group

From: auvenj@vnet.ibm.com
To: info-afs@transarc.com
Subject: unknown key version number...
Date: Wed, 31 Jul 96 13:20:52 PDT


 >Hi. We have some account creation scripts that run out of cron every
 >night (in a pagsh) that grab an admin token and start creating users,
 >volumes, etc. After a while the script fails with the following error:
 >
 >Creating user xxxxxxx  : [rxk] ticket contained unknown key version number
 >
 >Anyone know why we get this error?
 >
 >thanks, roland

 I can't help with the "why" but I can say that we have received this error
 also when creating accounts with a shell script.  What we had to do was
 parse the output for this error and, if it occurred, try the operation again
 up to 10 times after a sleep of 10 seconds.  This seemed to give us reliable
 functionality.

                           ...Jason Auvenshine
                           (auvenj@vnet.ibm.com)

 ISSC Tucson/San Jose AFS Team
-------------------  thread 1996 --------------------------------

So my solution to this problem is the following:

   - create a KAS instance: $kas = AFS::KAS->AuthServerConn(...);

   - do your KAS action: $ok = $kas->ChangePassword(...)

   - check the return code to see whether the action failed

   - if it failed, destroy your KAS instance and sleep for 5 seconds: undef $kas; sleep 5;

I loop over these four steps at most 5 times.  Then I abort that
task if it still was not successful.  With this procedure that problem
has never again shown up.
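
The loop described above might be sketched roughly as follows.  This is
only an illustration, not the actual script: with_retry, $connect and
$action are hypothetical names, and the two closures are stubs standing
in for the AFS::KAS->AuthServerConn(...) and $kas->ChangePassword(...)
calls, whose arguments are left elided here.  The stub action fails on
the first two connections, and the delay is set to 0 so that the sketch
runs instantly; in real use it would be 5.

```perl
use strict;
use warnings;

# Hypothetical helper: run $action against a fresh connection, up to
# $max_tries times, dropping the connection and sleeping after a failure.
# Returns the number of the successful attempt, or 0 if all attempts failed.
sub with_retry {
    my ($connect, $action, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $conn = $connect->();          # new instance on every attempt
        return $try if $conn && $action->($conn);
        undef $conn;                      # destroy the failed instance
        sleep $delay if $try < $max_tries;
    }
    return 0;                             # give up after $max_tries attempts
}

# Stubs standing in for the real calls, e.g.
#   $kas = AFS::KAS->AuthServerConn(...);  $ok = $kas->ChangePassword(...);
my $calls   = 0;
my $connect = sub { return { id => ++$calls } };
my $action  = sub { my ($conn) = @_; return $conn->{id} >= 3 };  # fails twice

my $tries = with_retry($connect, $action, 5, 0);   # use a delay of 5 in practice
print $tries ? "succeeded on attempt $tries\n" : "giving up\n";
```

With these stubs the third attempt succeeds.  Creating the connection
inside the loop is the point of the exercise: a failed attempt never
reuses the instance that just produced the error.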

I hope this helps you to solve your problem.  Otherwise you should
send me your script and I will have a look at it.

Cheers,

Norbert
-- 
Ceterum censeo          | PGP encrypted mail preferred.
Redmond esse delendam.  | PGP Key at www.MPA-Garching.MPG.de/~nog/