[Port-solaris] EBUSY unmount check

Frank Batschulat (Home) Frank.Batschulat@Sun.COM
Thu, 29 Apr 2010 09:11:02 +0200


On Tue, 27 Apr 2010 12:23:13 +0200, Frank Batschulat (Home) <Frank.Batschulat@sun.com> wrote:

> On Mon, 12 Apr 2010 23:08:09 +0200, Andrew Deason <adeason@sinenomine.net> wrote:
>
>> Right now the OpenAFS Solaris kernel module doesn't check whether someone is
>> accessing something in AFS when we .vfs_unmount, and Solaris doesn't
>> check for us, either. This has the effect of possibly panicking when we
>> umount AFS (typically at shutdown). This has been brought up before:
>> <http://www.openafs.org/pipermail/openafs-devel/2007-February/014853.html>,
>> but I don't think anything ever came of it.
>>
>> I'd like to add a check, but I'm no expert on the Solaris VFS layer. At
>> first, I thought that checking the vfs_count member of the given
>> struct vfs* would work for this, but that always appears to be 1, even
>> if we have files in AFS open at the time of unmounting. So, my
>> understanding is that we must inc/dec that field for it to be useful for
>> this, presumably with VFS_HOLD/VFS_RELE.
>>
>> If we VFS_HOLD in our (OpenAFS') .vfs_vget function, and VFS_RELE in our
>> .vop_inactive function, would that make checking vfs_count in unmount be
>> a sufficient check? Or should we just check the v_count of the vnode
>> referenced by the struct vfs* given to us?
>>
>> The original way I was going to check for this was by checking all of
>> our vnodes to see if they were in use, but that's slow. I presume a
>> similar walk will be necessary to support force-unmounting, but I want
>> to get EBUSY for regular unmounts first.
>
> Sorry for the long delay in responding, but I changed gears and no longer
> work in file systems land, so new project work takes precedence.
>
> Yes, indeed, the old-fashioned way would be to do what UFS does,
> i.e. walk the list of active inodes in the UFS inode cache
> (or the similar list in AFS), check whether each vnode is busy
> (v_count > 1), and fail VFS_UNMOUNT() with EBUSY in the case
> of a non-forcible umount operation. This is what ufs_unmount() still does,
> unfortunately. That's not really nice.
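>
> Roughly, such a walk might look like the following sketch; the
> afs_vcache type, list and field names here are made up, the real
> AFS cache structures differ:
>
>     #include <sys/vnode.h>
>     #include <sys/errno.h>
>
>     /* hypothetical cache entry type and list head */
>     struct afs_vcache {
>             struct afs_vcache *vc_next;
>             vnode_t *vc_vnode;
>     };
>     static struct afs_vcache *afs_vcache_list;
>
>     /*
>      * Busy check: one pass over the cache of live objects, the way
>      * ufs_unmount() scans the inode cache. Locking is omitted.
>      */
>     static int
>     afs_check_busy(void)
>     {
>             struct afs_vcache *vc;
>
>             for (vc = afs_vcache_list; vc != NULL; vc = vc->vc_next) {
>                     /* one reference is the cache's own hold */
>                     if (vc->vc_vnode->v_count > 1)
>                             return (EBUSY);
>             }
>             return (0);
>     }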
>
> The better way to do it is, as you've already guessed, to use
> vfs reference counting for that purpose. The VFS_HOLD()/VFS_RELE()
> infrastructure was put in place in Solaris 8 mainly in order to
> support forcible unmounts. A side effect of this is that it can be used
> for unmount's busy check as well. This is what zfs_umount() does.
>
> Of course, the file system has to implement code supporting that protocol;
> that's why you see a vfs_count of 1 for your AFS vfs_t most of the time.
>
> Here's a step-by-step guide on how to make use of the VFS_HOLD()/VFS_RELE()
> protocol inside a file system implementation. This is what I had
> planned a long time ago for UFS, but it never made it in. In particular,
> it also allows for implementing forcible umounts in a safe manner.
> (Condensed code sketches follow the list.)
>
>   - Keep track of all vnodes that have been created via VFS_ROOT(),
>     VFS_VGET(), VFS_SWAPVP(), VOP_OPEN(), VOP_CREATE(), VOP_LOOKUP(),
>     VOP_MKDIR() or VOP_REALVP() and that have not yet been released via
>     VOP_INACTIVE(), by adding a corresponding hold to the vfs_t via
>     VFS_HOLD().
>
>     This will essentially be done at the place where you really go and
>     create a new file system object and allocate the corresponding vnode
>     via vn_alloc().
>
>     This bumps up the vfs_t reference count for every active object.
>
>   - As vnodes may continue to have references held by the rest of the
>     system after VFS_UNMOUNT(MS_FORCE) has taken place, the filesystem's
>     top-level VOPs will return EIO for these dangling vnodes. The
>     exception is VOP_INACTIVE() called on them as a result of the last
>     VN_RELE(): there, VOP_INACTIVE() should free the vnode, release the
>     corresponding hold on the vfs_t via VFS_RELE(), and return. That way
>     such dangling vnodes are freed eventually.
>
>     Consequently, leaving forcible umounts out of the picture: when you
>     eventually go and destroy a file system object via vn_free(), you
>     decrement the reference count using VFS_RELE().
>
>   - Care must be taken if you maintain a cache of file system objects
>     in which unreferenced but alive, inactive objects can change
>     identity, i.e. if you use vn_invalid()/vn_reinit(). Those places
>     also need to deal with VFS_HOLD()/VFS_RELE().
>
>   - You should implement a VFS_FREEVFS() callback. Once all references
>     to a vfs_t are gone, the filesystem-independent VFS layer will
>     invoke VFS_FREEVFS() from VFS_RELE() so that the filesystem-dependent
>     code can do any still-pending internal cleanup work and eventually
>     free the private data at vfs_t->vfs_data.
>
>     This essentially allows you to safely support forcible umounts of
>     file systems with active objects.
>
>   - A net effect of implementing this is that you can now check for
>     active objects in your VFS_UNMOUNT() routine by just looking at
>     whether your corresponding vfs_t reference count is > 1, and you can
>     return EBUSY here for a non-forcible umount.
>
>   - NB: when domount() allocates a vfs via vfs_alloc()/VFS_INIT(), it is
>     allocated with a reference count of 0, but domount() immediately does
>     a VFS_HOLD(), which is the first reference; the last reference is
>     dropped by the framework in dounmount().
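>
> To make the hold/release side concrete, here is a minimal, untested
> sketch; all afs_* names are invented, only vn_alloc()/vn_free() and
> VFS_HOLD()/VFS_RELE() are the real Solaris interfaces:
>
>     #include <sys/vfs.h>
>     #include <sys/vnode.h>
>     #include <sys/kmem.h>
>
>     /*
>      * Hypothetical allocation path: every entry point that creates a
>      * new object (VFS_ROOT, VFS_VGET, VOP_LOOKUP, ...) funnels
>      * through here.
>      */
>     static vnode_t *
>     afs_obj_alloc(struct vfs *vfsp)
>     {
>             vnode_t *vp = vn_alloc(KM_SLEEP);
>
>             vp->v_vfsp = vfsp;
>             /* one hold on the vfs_t per live file system object */
>             VFS_HOLD(vfsp);
>             return (vp);
>     }
>
>     /*
>      * Hypothetical teardown path, reached from VOP_INACTIVE() on the
>      * last VN_RELE(); it must keep working after a forced unmount so
>      * that dangling vnodes are freed eventually.
>      */
>     static void
>     afs_obj_free(vnode_t *vp)
>     {
>             struct vfs *vfsp = vp->v_vfsp;
>
>             vn_free(vp);
>             /*
>              * Drop the hold taken in afs_obj_alloc(); the final
>              * VFS_RELE() triggers the VFS_FREEVFS() callback.
>              */
>             VFS_RELE(vfsp);
>     }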
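>
> And the unmount side, covering the busy check and the freevfs
> callback; struct afs_mntdata is invented here, the vfs_unmount and
> vfs_freevfs entry point signatures are the standard Solaris ones:
>
>     #include <sys/vfs.h>
>     #include <sys/cred.h>
>     #include <sys/errno.h>
>     #include <sys/kmem.h>
>     #include <sys/mount.h>
>
>     /* hypothetical per-mount private data hanging off vfs_data */
>     struct afs_mntdata {
>             int am_dummy;
>     };
>
>     static int
>     afs_unmount(struct vfs *vfsp, int flag, cred_t *cr)
>     {
>             if (!(flag & MS_FORCE)) {
>                     /*
>                      * domount() put the first hold on the vfs_t, so
>                      * a count > 1 means objects are still active.
>                      */
>                     if (vfsp->vfs_count > 1)
>                             return (EBUSY);
>             }
>             /* tear down what we can; dangling vnodes get EIO later */
>             return (0);
>     }
>
>     /* invoked by the VFS layer from the final VFS_RELE() */
>     static void
>     afs_freevfs(struct vfs *vfsp)
>     {
>             /* last reference is gone: free the per-mount data */
>             kmem_free(vfsp->vfs_data, sizeof (struct afs_mntdata));
>     }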

I forgot to mention that once your filesystem-dependent VFS_UNMOUNT() routine
has finished all the work and you are done with unmounting from the AFS PoV,
you should mark the corresponding vfs_t with: vfsp->vfs_flag |= VFS_UNMOUNTED;

That keeps path name traversal and lookups in the generic vnode layer away from then on.
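
In the afs_unmount() sketch above, that would be the last step of the
success path (error paths must not set the flag):

    /* all AFS-side teardown succeeded; fence off new lookups */
    vfsp->vfs_flag |= VFS_UNMOUNTED;
    return (0);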

----
frankB