[OpenAFS-devel] Re: Breaking callbacks on unlink

Matt W. Benjamin matt@linuxbox.com
Tue, 28 Feb 2012 06:43:46 -0500 (EST)


Hi,

I tend to share your feelings about .trash and .backup.  I would also
tentatively agree that a snapshot-capable AFS would solve this particular
formulation of file recovery better than another magic clone would.

However, I'm not sure I agree that Andrew hasn't made a fairly good case
that "the object could persist as long as a registration on it exists" is
consistent with the current semantics of AFS3.  The containing directory is
permanently changed, and all clients share a consistent view, which
corresponds to the unlinked state.  The side effect, the extra BCB
(callback break) on the file being unlinked, might be just that: an
implementation side effect.  The registrations on the file belong to both
the client(s) and the fileserver.  If the fileserver chooses to keep the
registration alive, how exactly does AFS3 forbid it from doing so?
Correspondingly, as was noted at the start of this discussion, Unix clients
in particular would expect behavior closer to the one Andrew and Troy are
proposing (files are reclaimed on last unlink) than the "files are
immutably gone on first unlink" semantics; and on other platforms, where
the open-but-unlinked case cannot arise, the question is perhaps moot?
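
To make that Unix expectation concrete: POSIX removes the name immediately
on unlink(2), but the file's storage survives until the last open
descriptor is closed.  A minimal standalone sketch, local filesystem only
(the /tmp path is arbitrary):

    /* unlink_demo.c: POSIX keeps an unlinked file's data alive until
     * the last open descriptor is closed.  Build: cc unlink_demo.c */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[16] = {0};
        int fd = open("/tmp/unlink_demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }

        write(fd, "still here", 10);
        unlink("/tmp/unlink_demo");   /* the name is gone at once...   */

        lseek(fd, 0, SEEK_SET);
        read(fd, buf, 10);            /* ...but the data is not        */
        printf("read after unlink: %s\n", buf);

        close(fd);                    /* last reference: now reclaimed */
        return 0;
    }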

I apologize in advance if I'm still missing something obvious, or am just
wrong...

Regards,

Matt

----- "Jeffrey Altman" <jaltman@your-file-system.com> wrote:

> Troy:
> 
> With all due respect, what you are describing is an utter hack.  What
> you are looking for is called "snapshots," and the fact is that AFS as
> it is currently designed cannot support an arbitrary number of them.
> But if any protocol design efforts are going to be made, they should
> be made in that direction.
> 
> A ".trash" volume cannot be added to the AFS protocol because
> ".trash"
> may already be in use in volume names.  In any case, creating a new
> volume would change the File ID required to access the object and
> that
> would defeat the entire goal of maintaining the file so that active
> users could continue to do so.
> 
> Callbacks are not kept alive on deleted objects.  They were deleted;
> their status can no longer change.
> 
> As for pulling the data into the cache, who's to say that there even
> is a cache (cache bypass), or that the cache is even large enough to
> hold the file?  What about a file that is open for append only?  Or
> one accessed over a very slow and expensive link?
> 
> In the AFS protocol, the file server does not maintain open file
> handles.  It is not possible for the file server to know whether a
> file is actively in use.  The existing AFS unlink RPC has the
> semantics that it has.  If new semantics are to be implemented, a new
> unlink RPC must be created to support them.  That is not only OpenAFS
> policy but also a requirement for backward compatibility between
> clients and servers.
> 
> "vos backup" creates a snapshot of the volume at the time the command
> is
> executed.  To put objects into the snapshot that do not exist at the
> time the snapshot is created makes taking the snapshot (a) a
> non-atomic
> operation; and (b) utterly unpredictable as to the resulting data
> contents.
> 
> If you want files to be in the .backup, you could conceivably take a
> new .backup on every unlink operation.  That would give the human user
> the ability to undelete, but it would do nothing to address the
> unpredictability of when a file becomes unavailable to a cache
> manager.  Of course, the correct long-term answer to that problem is
> to treat every change to a volume as a snapshot, just as ZFS and ReFS
> do, and then give a cache manager that has to satisfy an open file
> handle request the ability to read from the snapshot taken prior to
> the deletion.  Of course, none of the protocol support to do this has
> been designed yet.
> 
> Jeffrey Altman
> 
> 
> On 2/28/2012 12:51 AM, Troy Benjegerdes wrote:
> > If I ever feel sufficiently motivated, then I suppose I can create a
> > special ".trash" volume, which basically holds all the orphaned
> > vnodes until 'vos backup' is run, at which time they can be moved
> > into the backup volume.
> > 
> > It seems like no new RPCs are needed at all: just keep the callback
> > alive, and maybe add some hooks so a client-side disconnected-
> > operation manager can pull all files with open FDs into the cache.
> > 
> > (I'm also thinking a short doc page summarizing our discussion here
> > would be useful.)
> > 
> > Now, to throw another wrench in the works: does this make read/write
> > replication more or less complicated?
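
To make the "object persists while a registration exists" idea concrete,
here is a minimal sketch of the reclaim-on-last-reference semantics being
discussed.  None of these names exist in OpenAFS; the sketch only assumes
a hypothetical new unlink RPC that decrements the link count, with storage
reclaimed when both the link count and the callback registration count
reach zero:

    /* Sketch only: hypothetical reclaim-on-last-reference semantics.
     * None of these names are OpenAFS code; they illustrate the idea
     * that an unlinked vnode's storage could stay alive while callback
     * registrations on it remain. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct afs_vnode {
        int  link_count;      /* directory entries naming this vnode */
        int  callback_count;  /* live client callback registrations  */
        char *data;           /* file contents                       */
    };

    static void maybe_reclaim(struct afs_vnode *v)
    {
        /* Reclaim only when no name AND no registration refers to the
         * vnode.  The containing directory was already updated, so all
         * clients share the consistent, unlinked view. */
        if (v->link_count == 0 && v->callback_count == 0 && v->data) {
            free(v->data);
            v->data = NULL;
            printf("storage reclaimed\n");
        }
    }

    static void vnode_unlink(struct afs_vnode *v)
    {
        v->link_count--;      /* directory change is visible at once */
        maybe_reclaim(v);
    }

    static void callback_drop(struct afs_vnode *v)
    {
        v->callback_count--;  /* a client closed the file or expired */
        maybe_reclaim(v);
    }

    int main(void)
    {
        struct afs_vnode v = { 1, 1, strdup("contents") };

        vnode_unlink(&v);     /* name gone; data survives because a
                               * client still holds a registration   */
        printf("after unlink: %s\n", v.data ? v.data : "(reclaimed)");
        callback_drop(&v);    /* last registration: now reclaimed    */
        return 0;
    }

Whether the fileserver may do this without a new RPC is exactly the point
in dispute above; the sketch only shows what the proposed lifetime rule
would look like.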

-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309