[OpenAFS] Occasional "VLDB: no permission access for call"

Benjamin Kaduk kaduk@mit.edu
Sun, 28 Mar 2021 22:27:35 -0700

On Mon, Mar 29, 2021 at 03:23:42PM +1100, Ian Wienand wrote:
> Hello,
> A new thing I've noticed after we have upgraded everything to 1.8.6 is
> like the following:

What were you upgrading from?  (And I assume you mean 1.8.7?)

>  ~$ vos remove -server afs01.dfw.openstack.org -id 536870937 -partition a
>  Could not lock VLDB entry for the volume 536870937
>    VLDB: no permission access for call
>    VLDB: no permission access for call
>  Error in vos remove command.
>  VLDB: no permission access for call

This only happens if the code thinks that your authenticated identity is
not in the UserList on the server in question.  I find it unlikely that the
view of the file on disk is changing, so I assume that something goes wrong
on the way to that check.  (It looks like the SetLock operation is what
fails rather than the Delete itself, for what it's worth.)
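
A minimal sketch of how one might check that precondition by hand: show the
tokens the client holds, then ask each database server for its UserList.
The server names below are hypothetical placeholders (use the hosts from
your CellServDB), and the run() wrapper is only an illustration that prints
the commands unless RUN=1 is set.

```shell
# Hypothetical database server names; substitute the hosts from your CellServDB.
DB_SERVERS="${DB_SERVERS:-afsdb01.example.org afsdb02.example.org}"

# Print commands by default; set RUN=1 to actually execute them.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Ask every database server which identities are in its UserList.
list_superusers() {
    for host in $DB_SERVERS; do
        run bos listusers -server "$host"
    done
}

# Show the identity the client will present, then the server-side lists.
run tokens
list_superusers
```

If the identity shown by tokens appears in every list, the UserList contents
themselves are probably not the problem, which would fit the guess that
something goes wrong before the check.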

>  $ vos remove -server afs01.dfw.openstack.org -id 536870937 -partition a
>  Volume 536870937 on partition /vicepa server afs01.dfw.openstack.org deleted
> The first failed, I literally pressed "up and enter" to try again and
> it then worked.  In a similar fashion I can run a loop of "vos
> release" and have random volumes fail to release, then work just fine
> on another try.
> None of our afsdb servers have anything in VLLog or PtLog (nothing at
> all since they started).  All my tokens are granted and valid, etc. --
> I mean it works one time and not another doing nothing at all in
> between.
> I have straces of a failing and a good "vos remove" attempt, if it
> would help.
> Any ideas?  Any suggestions what else to trace to see what's going on
> here?

strace on the client is unlikely to help.  A packet capture of the rx
challenge/response might, and an audit log from the vlserver might as well.
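
A rough sketch of the packet capture, assuming the vlserver is on its
standard port (7003/udp).  The capture file path is a placeholder, and
capture_cmd is just an illustrative helper that prints the tcpdump command
rather than running it.  For the audit log, the vlserver man page documents
an -auditlog option; check the page for your version before relying on it.

```shell
# Assumed values: 7003/udp is the standard vlserver port; the output path
# is a placeholder.
VL_PORT=7003
CAPTURE_FILE="${CAPTURE_FILE:-/tmp/vlserver-rx.pcap}"

# Build the tcpdump command line: all interfaces, full packets, written to
# a file so that the rx challenge/response packets can be inspected later.
capture_cmd() {
    echo "tcpdump -i any -s 0 -w $CAPTURE_FILE udp port $VL_PORT"
}

# Print the command.  To capture for real, run it as root while
# reproducing the failure, e.g.:  sh -c "$(capture_cmd)"
capture_cmd
```

Start the capture, repeat the "vos remove" or "vos release" until one
attempt fails, then stop tcpdump and keep the file.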