[AFS3-std] first draft: ubik update proposal
Derrick Brashear
shadow@gmail.com
Tue, 15 Feb 2011 01:07:50 -0500
On Mon, Feb 14, 2011 at 5:23 PM, Jeffrey Hutzelman <jhutz@cmu.edu> wrote:
> --On Friday, February 04, 2011 09:32:53 AM -0500 Derrick Brashear
> <shadow@gmail.com> wrote:
>
>> This is not a complete refresh of all Ubik RPCs. It would allow
>> capability for IPv6, 64 bit times, multiple files in a database,
>> beacon returns not precluding errors. Comments welcome, I will refine
>> further into a draft.
>
> I'm trying to decide whether I think Ubik is even a subject for
> standardization, rather than an implementation detail of OpenAFS.  I don't
> think we necessarily expect ubik servers of different implementations to be
> able to operate together, and certainly database _contents_ are an
> implementation detail.
>
> That said, I'll go ahead and comment on this anyway.
>
>
>> struct ubik_nversion {
>>    afs_int64 epoch;
>>    afs_int64 counter;
>> };
>>
>> struct ubik_ntid {
>>    afs_int64 epoch;
>>    afs_int64 counter;
>> };
>
> 64-bit epochs are necessary.
> 64-bit counters seem excessive.
i considered that, i am certainly not wed to 64 bit counter.
>
>> New package DISK2, would supersede DISK.
>
> Why a new package, instead of just adding RPC's?
so they can have the same name. not wed to that, either.
>
>> GetFile         (IN afs_int32 file,
>>                 IN ubik_nversion *haveVersion,
>>                 OUT ubik_nversion *gotVersion)
>
> What's the haveVersion argument for here?
if the version i claim i have of a given file (not of the whole
database, but the particular file) is the same one you'd send me,
there's no need to send it.
>
>> SendFileDiff    (IN afs_int32 file,
>>                 afs_int64 length,
>>                 ubik_nversion *fromVersion,
>>                 ubik_nversion *toVersion)
>
> What does the length mean here?  The size of the diff?
yes.
>
>
>> when using sendfile or getfile, all files need to be gotten (to the
>> same version) before any are moved into place. snapversion should do a
>> copy-on-write snapshot of the current files; dropsnap will drop them
>> when done. the *filediff can use the snapshot plus locks to do an
>> incremental "finish" after transferring all files, in much the same
>> way as a volume release is performed.
>
>
> I'm not clear on how snapshotting interacts with GetFile/SendFile and active
> operations.  I think in practice the mechanism you need is one that allows
> you to "freeze" the target's databases so that active transactions read from
> the frozen copy, while sendfile prepares a "new" copy; note that there can
> be no write transactions, since writes happen only on the sync site and
> these calls are made only by the sync site and never to itself. Having done
> a snapshot and sent some new files, it must be possible to either commit the
> new files or discard them; recovery should only do the commit operation if
> it is still sync site.
the original intent of getfilediff was for some future use, not at this time.
sendfilediff is an optimization. just because you're recovering
doesn't mean the extant quorum can't continue taking writes. so i take
writes and when sendfile to you finishes, i stop taking writes, send
*only* a diff, and then commit and resume taking writes, not unlike a
volume release.
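
The sequence above can be sketched as a toy model in C. Everything here
is an assumption for illustration: the sync_site and replica structs,
the function names, and the idea that each write simply bumps the
counter within one unchanged epoch. No real transfer or locking is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for afs_int64; assumed to be a 64-bit signed integer. */
typedef int64_t afs_int64;

struct ubik_nversion {
    afs_int64 epoch;
    afs_int64 counter;
};

/* Toy model of the sync site (hypothetical, for illustration only). */
struct sync_site {
    struct ubik_nversion version;  /* latest committed version */
    bool accepting_writes;
};

/* Toy model of the replica being recovered (hypothetical). */
struct replica {
    struct ubik_nversion version;  /* version the replica holds */
};

/* A write on the sync site bumps the counter, if writes are allowed. */
static bool local_write(struct sync_site *s)
{
    if (!s->accepting_writes)
        return false;
    s->version.counter++;
    return true;
}

/* Step 1: note the version being shipped; writes keep flowing. */
static struct ubik_nversion begin_full_send(const struct sync_site *s)
{
    return s->version;
}

/* Step 2: the bulk transfer lands; the replica holds the noted version. */
static void finish_full_send(struct replica *r,
                             const struct ubik_nversion *sent)
{
    r->version = *sent;
}

/*
 * Step 3: stop writes, ship only the changes made since fromVersion,
 * commit, and resume writes. Returns how many counter steps the diff
 * covered.
 */
static afs_int64 send_diff_and_commit(struct sync_site *s, struct replica *r,
                                      const struct ubik_nversion *fromVersion)
{
    afs_int64 steps;

    /* The diff only applies on top of the version the replica holds. */
    assert(r->version.epoch == fromVersion->epoch &&
           r->version.counter == fromVersion->counter);

    s->accepting_writes = false;
    steps = s->version.counter - fromVersion->counter;
    r->version = s->version;       /* diff applied and committed */
    s->accepting_writes = true;
    return steps;
}
```

The point of the split is that writes are only blocked for the length of
the diff, not for the length of the full transfer.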
i could conceive of a version where a snapshot was taken automatically
when one site failed to accept a change but a quorum was still in
place, so that, on the assumption that the replica later came up with
the database intact from that point, a recovery could be done by *only*
sending a diff (or, if the master changes, by first getting only a diff).
-- 
Derrick