[OpenAFS-devel] Re: Breaking callbacks on unlink

Jeffrey Altman jaltman@your-file-system.com
Tue, 28 Feb 2012 07:26:32 -0500


Callbacks in current AFS have a lifetime on the order of minutes.  They are
not renewed by the client until after they expire and then only on next
access to the file.

It occurred to me last night why the callback is not broken on the last
unlink: because it is a wasted message.  Breaking the callback does not
guarantee that the object will in fact be deleted on the client in a timely
manner, because unlike with XCB there is no context to say that it has in
fact been deleted.  Neither receiving the callback break nor having it
expire triggers any polling of the server.  Therefore there is no guarantee
of consistent behavior in any case.
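
To make that concrete, here is a minimal sketch of the client-side logic
being described.  It is illustrative C, not actual OpenAFS cache manager
code; the struct and function names are invented.  It shows why a break on
the last unlink buys nothing: both a break and an expiry merely invalidate
local state, and nothing is fetched from the fileserver until the next
access.

    #include <stdbool.h>
    #include <time.h>

    /* Illustrative only: cached per-file state with the callback promise
     * the fileserver handed out (a lifetime on the order of minutes). */
    struct cached_vnode {
        time_t cb_expires;   /* absolute expiry of the callback promise */
        bool   cb_valid;     /* cleared when the server breaks the callback */
    };

    /* Hypothetical handler for an incoming break-callback (BCB): it only
     * marks the entry stale.  No RPC back to the fileserver happens here. */
    void on_break_callback(struct cached_vnode *v)
    {
        v->cb_valid = false;
    }

    /* Only on the next access does the client check the promise; if it is
     * broken or expired it goes back to the fileserver for fresh status
     * (and, for a deleted file, discovers the deletion then).  So an extra
     * BCB changes nothing about when the client actually notices. */
    bool must_refetch_on_access(const struct cached_vnode *v, time_t now)
    {
        return !v->cb_valid || now >= v->cb_expires;
    }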

The addition of XCB changes the landscape because clients can request an
extended lifetime.  However, XCB is a new RPC that does not exist with
current deployments.
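
Purely as a hypothetical illustration of what "request an extended
lifetime" could look like from the client's side (XCB is a draft extension;
the type and field names below are invented for this sketch and are not an
existing RPC):

    /* Hypothetical XCB-style registration: the client names the FID it
     * cares about and asks the fileserver to keep notifying it for longer
     * than the usual few minutes, e.g. across the final unlink. */
    struct xcb_registration_request {
        unsigned int volume;                   /* FID: volume id */
        unsigned int vnode;                    /* FID: vnode number */
        unsigned int uniquifier;               /* FID: uniquifier */
        unsigned int requested_lifetime_secs;  /* desired callback lifetime */
    };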

Sent from my iPad

On Feb 28, 2012, at 6:43 AM, "Matt W. Benjamin" <matt@linuxbox.com> wrote:

> Hi,
>
> I tend to share your feelings about .trash and .backup.  I would also agree,
> potentially, that a snapshot AFS would solve in particular this formulation of
> file recovery better than another magic clone.
>
> However, I'm not sure I agree that Andrew hasn't made a fairly good case that
> the "object could persist as long as a registration on it exists" is consistent
> with current semantics of AFS3.  The containing directory is permanently changed,
> all clients share a consistent view, which corresponds to the unlinked state.
> The side effect--the extra BCB on the file being unlinked--might be just that,
> an implementation side effect.  The registrations on the file belong to both the
> client(s) and fileserver.  If the fileserver chooses to keep the registration alive,
> how exactly does AFS3 forbid it doing so?  Correspondingly, as was indeed noted at
> the start of this discussion, Unix clients, in particular, actually would expect
> a behavior closer to the one Andrew and Troy are proposing (files are reclaimed
> on last unlink) than the "files are immutably gone on first unlink" semantics, and
> on other platforms, the impossibility of producing the case perhaps makes the
> question moot?
>
> I apologize in advance if I'm still missing something obvious, or just wrong...
>
> Regards,
>
> Matt
>
> ----- "Jeffrey Altman" <jaltman@your-file-system.com> wrote:
>
>> Troy:
>>
>> With all due respect, what you are describing is an utter hack.  What
>> you are looking for is called "snapshots" and the fact is that AFS as
>> it is currently designed cannot support an arbitrary number of them.
>> But if any protocol design efforts are going to be made, they should
>> be made in that direction.
>>
>> A ".trash" volume cannot be added to the AFS protocol because
>> ".trash"
>> may already be in use in volume names.  In any case, creating a new
>> volume would change the File ID required to access the object and
>> that
>> would defeat the entire goal of maintaining the file so that active
>> users could continue to do so.
>>=20
>> Callbacks are not kept alive on deleted objects.  They were deleted;
>> their status can no longer change.
>>
>> As for pulling the data into the cache, who's to say that there even
>> is a cache (cache bypass) or that the cache is even large enough to
>> hold the file?  What about a file that is open for append only?  Or
>> accessed over a very slow and expensive link?
>>
>> In the AFS protocol, the file server does not maintain open file
>> handles.  It is not possible for the file server to know if a file is
>> actively in use.  The existing AFS unlink RPC has the semantics that it
>> has.  If new semantics are to be implemented, a new unlink RPC must be
>> created to support them.  That is not only OpenAFS policy but a
>> requirement for backward compatibility between clients and servers.
>>
>> "vos backup" creates a snapshot of the volume at the time the command
>> is
>> executed.  To put objects into the snapshot that do not exist at the
>> time the snapshot is created makes taking the snapshot (a) a
>> non-atomic
>> operation; and (b) utterly unpredictable as to the resulting data
>> contents.
>>=20
>> If you want files to be in the .backup, you could conceivably take a
>> new .backup on every unlink operation.  That would provide the human
>> user the ability to undelete but would do nothing to address the
>> problem of the unpredictability of when a file will become unavailable
>> to a cache manager.  Of course, the correct long-term answer to that
>> problem is treating every change to a volume as a snapshot, just as ZFS
>> and ReFS do, and then giving a cache manager that has to satisfy an
>> open file handle request the ability to read from the snapshot taken
>> prior to the deletion.  Of course, none of the protocol support to do
>> this has been designed yet.
>>
>> Jeffrey Altman
>>
>>
>> On 2/28/2012 12:51 AM, Troy Benjegerdes wrote:
>>> If I ever feel sufficiently motivated, then I suppose I can create a
>>> special ".trash" volume, which basically holds all the orphaned vnodes
>>> until 'vos backup' is run, at which time they can be moved into the
>>> backup volume.
>>>
>>> It seems like no new RPCs are needed at all, just keep the callback
>>> alive, and maybe some hooks for a client process disconnected operation
>>> manager to pull all files for open FD's into cache.
>>>
>>> (I'm also thinking a short doc page summarizing our discussion here
>>> would be useful)
>>>
>>> Now.. to throw another wrench in the works... does this make read/write
>>> replication more or less complicated?
>
> --
> Matt Benjamin
> The Linux Box
> 206 South Fifth Ave. Suite 150
> Ann Arbor, MI  48104
>
> http://linuxbox.com
>
> tel. 734-761-4689
> fax. 734-769-8938
> cel. 734-216-5309