[Port-solaris] EBUSY unmount check

Frank Batschulat (Home) Frank.Batschulat@Sun.COM
Tue, 27 Apr 2010 12:23:13 +0200

On Mon, 12 Apr 2010 23:08:09 +0200, Andrew Deason <adeason@sinenomine.net> wrote:

> Right now the OpenAFS solaris kernel module doesn't check if someone is
> accessing something in AFS when we .vfs_unmount, and solaris doesn't
> check for us, either. This has the effect of possibly panic'ing when we
> umount AFS (typically at shutdown). This has been brought up before:
> <http://www.openafs.org/pipermail/openafs-devel/2007-February/014853.html>,
> but I don't think anything ever came of it.
> I'd like to add a check, but I'm no expert on the Solaris VFS layer. At
> first, I thought that checking the vfs_count member of the given
> struct vfs* would work for this, but that always appears to be 1, even
> if we have files in AFS open at the time of unmounting. So, my
> understanding is that we must inc/dec that field for it to be useful for
> this, presumably with VFS_HOLD/VFS_RELE.
> If we VFS_HOLD in our (OpenAFS') .vfs_vget function, and VFS_RELE in our
> .vop_inactive function, would that make checking vfs_count in unmount be
> a sufficient check? Or should we just check the v_count of the vnode
> referenced by the struct vfs* given to us?
> The original way I was going to check for this was by checking all of
> our vnodes to see if they were in use, but that's slow. I presume a
> similar walk will be necessary to support force-unmounting, but I want
> to get EBUSY for regular unmounts first.

Sorry for taking so long to respond, but I changed gears and no longer
work in file systems land, so new project work takes precedence.

Yes, indeed, the old-fashioned way would be to do what UFS does,
ie. walk the list of active inodes in the UFS inode cache
(or the corresponding list in AFS), check whether each vnode is busy
(v_count > 1), and fail VFS_UNMOUNT() with EBUSY in the case
of a non-forcible umount operation. Unfortunately this is what
ufs_unmount() still does. That's not really nice.

The better way to do this is, as you've already guessed, to use
vfs reference counting for that purpose. The VFS_HOLD()/VFS_RELE()
infrastructure was put in place in Solaris 8 mainly in order to
support forcible unmounts. A side effect of this is that it can be used
for unmount's busy check as well. This is what zfs_unmount() does.

Of course the file system has to implement code supporting that protocol;
that's why you see a vfs_count of 1 for your AFS vfs_t most of the time.

Here's a step-by-step guide on how to make use of the VFS_HOLD()/VFS_RELE()
protocol inside the file system implementation. This is what I had
planned for UFS a long time ago, but it never made it in. In particular,
it also allows implementing forcible umounts in a safe manner.

        - Keep track of all vnodes that have been created via VFS_ROOT(),
          VOP_MKDIR(), VOP_REALVP() and that have not yet been released via
          VOP_INACTIVE(), by adding a corresponding hold to the vfs_t via
          VFS_HOLD(). This is essentially done at the place where you really
          go and create a new file system object and allocate the
          corresponding vnode via vn_alloc().

          This will bump up the vfs_t reference count for every active object.

        - As vnodes may continue to have references held in the rest of the
          system after the VFS_UNMOUNT(MS_FORCE) has taken place, the file
          system's top-level VOPs will return EIO for these dangling vnodes.
          The exception is VOP_INACTIVE() called on them as a result of the
          last VN_RELE(): there, VOP_INACTIVE() should free the vnode and
          also release the corresponding hold on the vfs_t via VFS_RELE().
          That way such dangling vnodes are freed eventually.

          Consequently, leaving forcible umounts out of the picture: when you
          really go and destroy a file system object via vn_free(), you
          decrement the reference count using VFS_RELE().

        - Care must be taken if you maintain a cache of file system objects
          and unreferenced but still alive, inactive objects can change
          identity, ie. if you use vn_invalid()/vn_reinit(). Those places
          also need to deal with VFS_HOLD()/VFS_RELE().

        - You should implement a VFS_FREEVFS() callback. Once all references
          to a vfs_t are gone, the file-system-independent vfs layer will
          invoke VFS_FREEVFS() from VFS_RELE() so that the file-system-dependent
          code can do any still-pending internal cleanup work needed and
          eventually free the private data in vfs_t->vfs_data.

          This essentially allows you to safely support forcible umounts of
          file systems with active objects.

        - A net effect of implementing this is that you can now check for
          active objects in your VFS_UNMOUNT() routine just by looking at
          whether your corresponding vfs_t reference count is > 1, and you
          can return EBUSY here for a non-forcible umount.

        - NB: when domount() allocates a vfs via vfs_alloc()/VFS_INIT(), it
          is allocated with a reference count of 0, but domount() immediately
          does a VFS_HOLD(), which is the first reference; the last reference
          is dropped by the framework in dounmount().

hth, good luck!