[OpenAFS] Need details of callback mechanism -- questions ...

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 01 Sep 2005 20:39:32 -0400

On Thursday, September 01, 2005 10:27:46 AM -0600 Dexter 'Kim' Kimball 
<dhk@ccre.com> wrote:

>  If the version numbers are the same then the
> client updates the callback state and no data transfer occurs.  OTOH if
> the file has been changed and the version numbers are different, the
> client receives data and then updates the callback state.

That's approximately correct.  For every vnode it knows about, the client 
maintains information indicating whether it has current metadata for that 
vnode and, if so, for how long.  A callback granted by the fileserver (for 
example, as part of a FetchStatus operation) extends the "for how long" 
timer; a callback broken causes the metadata to be invalidated immediately.

Whenever the client wants to use a vnode, it checks to see whether it has 
current, valid metadata.  If not, it does a new FetchStatus call to update 
its copy of the metadata; this also gets it a new callback.

If the operation involves reading or writing _data_, then the client must 
ensure it has an up-to-date copy of the data chunk it is working with.  To 
facilitate this, part of the metadata is a "data version" number, and each 
cache chunk is tagged with the DV of the file whose contents it contains. 
If the DV on a chunk does not match that in the file's metadata (which we 
previously validated), then the client needs to fetch a new version of the 
part it cares about.
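
To make the two-level check concrete, here is a minimal sketch in Python (not 
the cache manager's actual code).  ToyServer, ToyClient, the method names and 
the 3600-second callback lifetime are all made up for illustration; only the 
logic (validate metadata first, then compare the chunk's DV) follows the 
description above.

```python
import time
from dataclasses import dataclass


@dataclass
class VnodeStatus:
    data_version: int        # DV reported by the fileserver
    callback_expires: float  # metadata is trusted until this time


@dataclass
class Chunk:
    data_version: int        # DV of the file when this chunk was fetched
    data: bytes


class ToyServer:
    """Hypothetical stand-in for a fileserver; counts the calls it receives."""

    def __init__(self):
        self.files = {}      # vnode -> (dv, contents)
        self.status_calls = 0
        self.data_calls = 0

    def store(self, vnode, data):
        dv = self.files.get(vnode, (0, b""))[0] + 1
        self.files[vnode] = (dv, data)

    def fetch_status(self, vnode):
        self.status_calls += 1
        return self.files[vnode][0], 3600.0   # (DV, assumed callback lifetime)

    def fetch_data(self, vnode, index):
        self.data_calls += 1
        return self.files[vnode][1]


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.status = {}     # vnode -> VnodeStatus
        self.chunks = {}     # (vnode, chunk index) -> Chunk

    def callback_broken(self, vnode):
        # A broken callback invalidates the metadata immediately.
        self.status.pop(vnode, None)

    def _valid_status(self, vnode, now):
        st = self.status.get(vnode)
        if st is None or now >= st.callback_expires:
            # No current metadata: FetchStatus, which also grants a callback.
            dv, lifetime = self.server.fetch_status(vnode)
            st = VnodeStatus(dv, now + lifetime)
            self.status[vnode] = st
        return st

    def read_chunk(self, vnode, index, now=None):
        now = time.time() if now is None else now
        st = self._valid_status(vnode, now)
        chunk = self.chunks.get((vnode, index))
        if chunk is None or chunk.data_version != st.data_version:
            # DV mismatch: the cached chunk is stale, so fetch it again.
            data = self.server.fetch_data(vnode, index)
            chunk = Chunk(st.data_version, data)
            self.chunks[(vnode, index)] = chunk
        return chunk.data
```

Note that when the callback merely expires and the DV has not changed, the 
sketch does a FetchStatus but no data fetch, which is the case described in 
the quoted text at the top.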

> I'm looking for definitive answers to the following.  Assume RW volumes
> throughout.
> 1.  When a fileserver sends a BCB to a given client, does it wait for a
> response or does it send the BCB and handle responses asynchronously?  I
> believe it used to wait for a response and that it no longer does so.

On a normal operation, the fileserver will attempt to break callbacks 
synchronously; that means it waits for each client which has a callback on 
the file being updated.  The calls are sent out simultaneously (using 
rx_multi), so you only have to wait for a single call timeout no matter how 
many clients are holding callbacks.  And, the fileserver will not waste 
time trying to break a callback on a client it already knows is "down"; 
instead, it will add it to the delay queue.

On an operation such as a volume restore, the fileserver must break all 
callbacks held by any client on that volume.  In 1.2, this is done 
synchronously in the fssync thread, which means the volserver has to wait 
for it.  In 1.4, it will be done in a dedicated "callbacks later" thread, 
allowing the fileserver to respond to the fssync request immediately.

> 2.  When does the fileserver begin sending the BCBs?
>     a. When it begins to modify a given file -- i.e when it receives the
> write RPC and before (or simultaneously with) storing the first few bytes.
>     b. When it has written the first bytes to a given file -- i.e. after
> it has stored x bytes but before receiving a "close" from the client.
> c. When it receives the close file RPC.

There is no such thing as a "close file" RPC; the fileserver is essentially 
stateless and does not know which files are open on a client.  On directory 
operations, the callback on the directory is broken once the change has 
been made.  For file operations, the callback break happens after the vnode 
in question has been locked, but before the file is actually updated.  This 
means that any clients which try to access the file after the write begins 
are guaranteed to see the new version, because their cached metadata 
will be invalid, and the FetchStatus they do to update it will block until 
the store completes and releases the lock on that vnode.
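
The ordering can be illustrated with a small Python sketch.  This is a toy 
model, not fileserver code: ToyVnode, store_data, fetch_status and the 
before_write hook are hypothetical names, and a plain mutex stands in for the 
vnode lock.

```python
import threading


class ToyVnode:
    def __init__(self, data=b""):
        self.lock = threading.Lock()   # stands in for the vnode lock
        self.data = data
        self.data_version = 1


def store_data(vnode, new_data, break_callbacks, before_write=None):
    with vnode.lock:
        # Callbacks are broken after the vnode is locked, before the update.
        break_callbacks()
        if before_write is not None:
            before_write()             # demo hook: lets a caller pause here
        vnode.data = new_data
        vnode.data_version += 1
    # The lock is released only once the store has completed.


def fetch_status(vnode):
    # Blocks while a store holds the lock, so it always sees the new DV.
    with vnode.lock:
        return vnode.data_version
```

A client whose callback was broken mid-store calls fetch_status, waits on the 
lock, and comes back with the post-store data version rather than the old one.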

> 3.  If the fileserver attempts a BCB to client X and gets no response (BCB
> fails on X), does it:
>     a. Retry immediately.
>     b. Wait some period of time before attempting the BCB again.
>     c. (a) then (b)

The fileserver breaks callbacks by making a normal RPC (except that, as 
described above, when multiple clients are involved, the RPCs are made in 
parallel).  If this operation times out, the client host is marked as 
"down", and the callback is added to the delay queue.  Further callback 
breaks for this client will be shunted directly to the delay queue, until 
we hear from it again.  Once a callback is on the delay queue, the 
fileserver will not attempt to break it again until it believes the client 
is "up".

Once a client is marked "down", the fileserver will not waste any time 
trying to communicate with that client until it hears from it again.  The 
next time that client makes an RPC, the fileserver will immediately break 
any delayed callbacks it has queued for that host, before it processes the 
new RPC.  This ensures that the host is now "up to date", and that if the 
vnode on which it is making an RPC is one for which a callback was broken, 
the client will process the callback break _before_ recording a new 
callback as a result of the new RPC.
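
As a rough sketch of the bookkeeping just described, in Python: 
ToyFileserver, send_break and the other names are hypothetical, the breaks 
are sent one at a time rather than in parallel, and the sketch simply ignores 
the result of the delayed breaks it sends when a client comes back.

```python
class ToyFileserver:
    def __init__(self, send_break):
        # send_break(host, vnode) -> bool; False models an RPC timeout.
        self.send_break = send_break
        self.callbacks = {}   # vnode -> set of hosts holding a callback
        self.down = set()     # hosts currently marked "down"
        self.delayed = {}     # host -> vnodes whose breaks are queued

    def grant_callback(self, host, vnode):
        self.callbacks.setdefault(vnode, set()).add(host)

    def break_callbacks(self, vnode):
        for host in sorted(self.callbacks.pop(vnode, set())):
            if host in self.down:
                # Known-down client: go straight to the delay queue.
                self.delayed.setdefault(host, []).append(vnode)
            elif not self.send_break(host, vnode):
                # Timed out: mark the host down and queue the break.
                self.down.add(host)
                self.delayed.setdefault(host, []).append(vnode)

    def handle_rpc(self, host, vnode):
        if host in self.down:
            # We heard from the client again: flush delayed breaks first.
            self.down.discard(host)
            for queued in self.delayed.pop(host, []):
                self.send_break(host, queued)
        # Only then process the new RPC, which records a fresh callback.
        self.grant_callback(host, vnode)
```

There is no timer anywhere in this model; the only thing that moves a host 
out of the "down" set is an incoming RPC from that host.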

Periodically, the fileserver does a sweep of all "up" clients it knows 
about.  Any client which is holding active callbacks but has not been heard 
from in 15 minutes is probed, to verify that it still exists.  If the probe 
fails, the fileserver marks the client as "down", just as if a callback 
break had failed (except there is nothing to add to the delay queue).  This 
allows the fileserver to proactively discover "down" clients, instead of 
waiting to time them out when it is trying to break a callback.

In addition, "up" clients which have not been heard from in over two hours, 
whether or not they have outstanding callbacks, are deleted from the 
fileserver's client list.  If such a client is up, the fileserver 
immediately makes an InitCallBackState RPC, instructing the client to 
discard any callbacks it is holding from that fileserver.  If the client is 
not up, the RPC is skipped; if/when that client ever makes a call again, 
the fileserver will note it has no record of the client and will make the 
InitCallBackState call at that time.
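
The two thresholds can be put together in a short Python sketch.  This is 
one reading of the description, not the fileserver's actual sweep: the 
function and field names are made up, the two-hour rule is applied to every 
client with only the RPC depending on whether the client is marked up, and 
what a successful probe does to the "last heard" time is left out.

```python
PROBE_AFTER = 15 * 60        # seconds of silence before probing
DELETE_AFTER = 2 * 60 * 60   # seconds of silence before forgetting a client


def sweep(clients, now, probe, init_callback_state):
    """One pass over the client table.

    clients: dict mapping host -> {"last_heard", "up", "has_callbacks"}
    probe(host) -> bool, and init_callback_state(host), stand in for RPCs.
    """
    for host in list(clients):
        entry = clients[host]
        idle = now - entry["last_heard"]
        if idle > DELETE_AFTER:
            if entry["up"]:
                # Tell the client to discard callbacks from this server.
                init_callback_state(host)
            # Down clients get the call later, when they next show up.
            del clients[host]
        elif entry["up"] and entry["has_callbacks"] and idle > PROBE_AFTER:
            if not probe(host):
                # Same effect as a failed callback break, minus the queue.
                entry["up"] = False
```

A client without callbacks is never probed in this model; it just ages out 
after two hours of silence.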

> 4.  What is the current fileserver BCB retry scheme?

There is no retry scheme.  Once a callback break fails, the fileserver 
discontinues all attempts to contact that client unless and until the 
client makes another RPC.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA