[OpenAFS-devel] Re: Breaking callbacks on unlink

Jeffrey Altman jaltman@your-file-system.com
Tue, 28 Feb 2012 01:27:50 -0500


Troy:

With all due respect, what you are describing is an utter hack.   What
you are looking for is called "snapshots" and the fact is that AFS as it
is currently designed cannot support an arbitrary number of them.  But
if any protocol design efforts are going to be made, they should be made
in that direction.

A ".trash" volume cannot be added to the AFS protocol because ".trash"
may already be in use in volume names.  In any case, creating a new
volume would change the File ID required to access the object and that
would defeat the entire goal of maintaining the file so that active
users could continue to access it.

Callbacks are not kept alive on deleted objects.  Once an object has
been deleted, its status can no longer change.

As for pulling the data into the cache, who's to say that there even is
a cache (cache bypass) or that the cache is even large enough to hold
the file?   What about a file that is open for append only?  Or accessed
over a very slow and expensive link?

In the AFS protocol, the file server does not maintain open file
handles.  It is not possible for the file server to know if a file is
actively in use.  The existing AFS unlink RPC has the semantics that it
has.  If new semantics are to be implemented, a new unlink RPC must be
created to support them.  That is not only OpenAFS policy but also a
requirement for backward compatibility between clients and
servers.
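
To make the compatibility point concrete, here is a toy sketch in Python.
It is not AFS or Rx code.  The opcode numbers, the "deferred" semantics,
and every name in it are hypothetical.  The existing opcode keeps its
behaviour, the new behaviour gets its own opcode, and a newer client falls
back when the server does not implement it.

```python
OP_REMOVE = 1            # hypothetical opcode for the existing unlink RPC
OP_REMOVE_DEFERRED = 2   # hypothetical opcode for an RPC with new semantics


class Server:
    """Toy file server that dispatches requests by opcode."""

    def __init__(self, files, supports_deferred):
        self.files = dict(files)
        self.orphans = {}
        # The existing RPC is always present and never changes meaning.
        self.handlers = {OP_REMOVE: self._remove}
        if supports_deferred:
            self.handlers[OP_REMOVE_DEFERRED] = self._remove_deferred

    def _remove(self, name):
        # Existing semantics: the object is gone immediately.
        del self.files[name]

    def _remove_deferred(self, name):
        # Hypothetical new semantics: park the object instead of dropping it.
        self.orphans[name] = self.files.pop(name)

    def call(self, opcode, name):
        handler = self.handlers.get(opcode)
        if handler is None:
            raise NotImplementedError(opcode)
        handler(name)


def client_unlink(server, name):
    """A newer client tries the new RPC and falls back to the old one."""
    try:
        server.call(OP_REMOVE_DEFERRED, name)
        return "deferred"
    except NotImplementedError:
        server.call(OP_REMOVE, name)
        return "immediate"
```

An older client that only knows OP_REMOVE sees exactly the same behaviour
against either server, which is the whole point of not redefining the
existing RPC.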

"vos backup" creates a snapshot of the volume at the time the command is
executed.  To put objects into the snapshot that do not exist at the
time the snapshot is created makes taking the snapshot (a) a non-atomic
operation; and (b) utterly unpredictable as to the resulting data contents.

If you want files to be in the .backup, you could conceivably take a new
.backup on every unlink operation.  That would provide the human user
the ability to undelete but would do nothing to address the problem of
unpredictability of when a file will become unavailable to a cache
manager.  Of course, the correct long-term answer to that problem is
treating every change to a volume as a snapshot, just as ZFS and ReFS do,
and then giving a cache manager that has to satisfy an open file
handle request the ability to read from the snapshot prior to the
deletion.  None of the protocol support to do this has been
designed yet.
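
As a rough illustration of the snapshot-per-change idea, here is a toy
in-memory model in Python.  It is not AFS code and does not describe any
existing protocol.  The Volume and Handle classes and their methods are
hypothetical.  Each change commits a new immutable snapshot, and an open
handle remembers the version it was opened against, so a read after the
unlink is still served from the earlier snapshot.

```python
class Volume:
    """Toy volume: every change produces a new snapshot (hypothetical model)."""

    def __init__(self):
        # snapshots[i] maps file name -> contents as of version i
        self.snapshots = [{}]

    @property
    def version(self):
        return len(self.snapshots) - 1

    def _commit(self, new_state):
        self.snapshots.append(new_state)

    def write(self, name, data):
        state = dict(self.snapshots[-1])  # copy, never mutate old versions
        state[name] = data
        self._commit(state)

    def unlink(self, name):
        state = dict(self.snapshots[-1])
        del state[name]
        self._commit(state)

    def open(self, name):
        if name not in self.snapshots[-1]:
            raise FileNotFoundError(name)
        # The handle pins the version that was current at open time.
        return Handle(self, name, self.version)


class Handle:
    def __init__(self, volume, name, version):
        self.volume = volume
        self.name = name
        self.version = version

    def read(self):
        # Served from the pinned snapshot, even if the file was unlinked later.
        return self.volume.snapshots[self.version][self.name]


vol = Volume()
vol.write("a.txt", b"hello")
handle = vol.open("a.txt")
vol.unlink("a.txt")
data = handle.read()  # still readable through the pinned snapshot
```

A real design would also have to decide when old snapshots can be
reclaimed, which this sketch ignores entirely.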

Jeffrey Altman


On 2/28/2012 12:51 AM, Troy Benjegerdes wrote:
> If I ever feel sufficiently motivated, then I suppose I can create a special
> ".trash" volume, which basically holds all the orphaned vnodes until 'vos
> backup' is run, at which time they can be moved into the backup volume.
>
> It seems like no new RPCs are needed at all, just keep the callback alive, and
> maybe some hooks for a client process disconnected operation manager to pull
> all files for open FD's into cache.
>
> (I'm also thinking a short doc page summarizing our discussion here would be
> useful)
>
> Now.. to throw another wrench in the works... does this make read/write
> replication more or less complicated?

