[OpenAFS] Re: [OpenAFS-announce] OpenAFS client crashes on RHEL 5.10 and RHEL 6.5

Tue, 11 Mar 2014 14:54:29 -0400

On Tue, Mar 11, 2014 at 11:33:37AM -0500, Andrew Deason wrote:
> [Moving from openafs-info to openafs-devel.]
> 
> On Tue, 11 Mar 2014 11:45:47 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > 	- d_materialise_unique: which was originally designed which
> > 	  distributed filesystems in mind, which have the special issue
> > 	  that the directory tree can be modified by other nodes without
> > 	  their knowledge, so they need to be ble to deal with lookup
> > 	  encountering an inode which was previously known under some
> > 	  other name.
> 
> I'm not clear on if this really helps, as I'm not completely sure what
> the effects of this are, but maybe you can clarify. It seems to just
> 'move' the dentry from one location to another, but I'm not sure if
> that's all. Say we have the following AFS mountpoints (which appear as
> directories, not e.g. vfsmounts or anything like that). These can be
> anywhere in the hierarchy:
> 
> /afs/.../parent1/dir1
> /afs/.../parent2/dir2
> 
> 'dir1' and 'dir2' are AFS mountpoints to the same place. So, they are
> effectively the same directory.
> 
> So if someone accesses dir1, say they have a reference to dentry
> 0x1234, which has d_name "dir1" and d_parent pointing to parent1.
> And say someone accesses dir2 while dir1 is still held open. With
> d_materialise_unique, it looks like they also get dentry 0x1234, but it
> is 'moved' so now its d_name is "dir2" and its d_parent points to
> parent2. Is that correct?

That's right.  (There's a yuchy corner case which makes this a little
unreliable: the lookup operation already has the i_mutex on the parent
directory, and we need other locks to do the move succesfully, but we
would have needed to acquire them in a different order--so
d_materialise_unique instead makes nonblocking attempts to get those
locks (these are the mutex_trylock's in __d_unalias) and returns -EBUSY
if that fails.  So you can get EBUSY on the lookup if there happens to
be, say, a rename going on at the same moment.)

> It seems like that does not get rid of the problem, since someone (call
> it P1) could lookup dir1, and get back dentry 0x1234. Then someone else
> looks up dir2 and renames/moves dentry 0x1234. P1 now has a reference to
> dentry 0x1234, whose d_parent does not match the parent1, the directory
> where we looked up 'dir1'. As we've seen, that can cause a panic on the
> box (I'm glossing over how, if it's not readily apparent I can go into
> more detail.)
> 
> I would also guess this causes an ELOOP error when trying to access a
> mountpoint inside itself. That is maybe not the worst thing in the
> world, but does preclude access which was previously allowed.
> 
> And just to be clear, I wasn't really expecting there to be some other
> function to call that would make all of this "okay". I am working on
> something to use actual mount points (that is, not "AFS mountpoints",
> but 'struct vfsmount'/'struct mount' mountpoints). It is just a little
> difficult since we cannot use the builtin kernel functions to create new
> mountpoints.

Right, the vfs depends on the assumption that the directory tree is
really a tree.  d_splice_alias as currently implemented can violate that
assumption by creating multiple dentries pointing to the same directory,
but that really can cause deadlocks down the road, so it's a bug we
intend to fix.

Other filesystems, like NFS, do as you describe and create mountpoints.

--b.