[Port-solaris] EBUSY unmount check
Frank Batschulat (Home)
Thu, 29 Apr 2010 09:11:02 +0200
On Tue, 27 Apr 2010 12:23:13 +0200, Frank Batschulat (Home) <Frank.Batschulat@sun.com> wrote:
> On Mon, 12 Apr 2010 23:08:09 +0200, Andrew Deason <firstname.lastname@example.org> wrote:
>> Right now the OpenAFS solaris kernel module doesn't check if someone is
>> accessing something in AFS when we .vfs_unmount, and solaris doesn't
>> check for us, either. This has the effect of possibly panic'ing when we
>> umount AFS (typically at shutdown). This has been brought up before:
>> but I don't think anything ever came of it.
>> I'd like to add a check, but I'm no expert on the Solaris VFS layer. At
>> first, I thought that checking the vfs_count member of the given
>> struct vfs* would work for this, but that always appears to be 1, even
>> if we have files in AFS open at the time of unmounting. So, my
>> understanding is that we must inc/dec that field for it to be useful for
>> this, presumably with VFS_HOLD/VFS_RELE.
>> If we VFS_HOLD in our (OpenAFS') .vfs_vget function, and VFS_RELE in our
>> .vop_inactive function, would that make checking vfs_count in unmount be
>> a sufficient check? Or should we just check the v_count of the vnode
>> referenced by the struct vfs* given to us?
>> The original way I was going to check for this was by checking all of
>> our vnodes to see if they were in use, but that's slow. I presume a
>> similar walk will be necessary to support force-unmounting, but I want
>> to get EBUSY for regular unmounts first.
> Sorry for the slow response, but I changed gears and no longer
> work in file systems land, so new project work takes precedence.
> Yes, indeed, the old-fashioned way would be to do what UFS does,
> i.e. walk the list of active inodes in the UFS inode cache
> (or the equivalent list in AFS) and check whether each vnode is busy
> (v_count > 1), failing VFS_UNMOUNT() with EBUSY in the case
> of a non-forcible umount operation. This is what ufs_unmount() still does,
> unfortunately. That's not really nice.
> The better way to do this is, as you've already guessed, to use
> vfs reference counting for that purpose. The VFS_HOLD()/VFS_RELE()
> infrastructure was put in place in Solaris 8 mainly in order to
> support forcible unmounts. A side effect of this is that it can be used
> for the unmount busy check as well. This is what zfs_umount() does.
> Of course the file system has to implement code supporting that protocol;
> that's why you see a vfs_count of 1 for your AFS vfs_t most of the time.
> Here's a step-by-step guide on how to use the VFS_HOLD()/VFS_RELE()
> protocol inside the file system implementation. This is what I had
> planned a long time ago for UFS but it never made it. In particular,
> it also allows forcible umounts to be implemented safely.
> - keep track of all vnodes that have been created via VFS_ROOT(),
> VFS_VGET(), VFS_SWAPVP(), VOP_OPEN(), VOP_CREATE(), VOP_LOOKUP(),
> VOP_MKDIR(), VOP_REALVP() and which have not been released via VOP_INACTIVE(),
> by adding a corresponding hold on the vfs_t via VFS_HOLD().
> This will essentially be done at the place where you really go
> and create a new file system object and allocate the corresponding vnode
> via vn_alloc().
> This bumps the vfs_t reference count for every active object.
> - As vnodes may continue to have references held by the rest of the system
> after a VFS_UNMOUNT(MS_FORCE) has taken place, the filesystem's top-level
> VOPs should return EIO for these dangling vnodes. The exception is VOP_INACTIVE()
> called on them as a result of the last VN_RELE(), in which case VOP_INACTIVE()
> should free the vnode, return, and also release the corresponding hold
> on the vfs_t via VFS_RELE(); that way such dangling vnodes are freed eventually.
> Consequently, leaving forcible umounts out of the picture, when you really
> go and destroy a file system object eventually via vn_free(), you decrement
> the reference count using VFS_RELE().
> - care must be taken if you maintain a cache of file system objects in which
> unreferenced but alive, inactive objects can change identity, i.e. if you use
> vn_invalid()/vn_reinit(). Those places also need to deal with VFS_HOLD()/VFS_RELE().
> - you should implement a VFS_FREEVFS() callback. Once all references to a
> vfs_t are gone, the filesystem-independent VFS layer
> will invoke VFS_FREEVFS() from VFS_RELE() so that the filesystem-dependent code
> can do any still-pending internal cleanup work
> and eventually free the private data vfs_t->vfs_data.
> This essentially allows you to safely support forcible umounts of file systems
> with active objects.
> - a net effect of implementing this is that you can now check in your VFS_UNMOUNT()
> routine for active objects by just looking at whether your corresponding vfs_t
> reference count is > 1, and you can return EBUSY here for a non-forcible umount.
> - NB: when domount() allocates a vfs via vfs_alloc()/VFS_INIT() it is allocated
> with a reference count of 0, but domount() immediately does a VFS_HOLD(),
> this being the first reference; the last reference goes away in dounmount(),
> handled by the framework.
I forgot to mention that once your filesystem-dependent VFS_UNMOUNT() routine
has finished all the work and you are done with unmounting from the AFS PoV,
you should mark the corresponding vfs_t as: vfsp->vfs_flag |= VFS_UNMOUNTED;
that keeps path name traversal and lookups in the generic vnode layer away from then on.