[OpenAFS] Re: Afs User volume servers in VM's

Renata Maria Dart renata@slac.stanford.edu
Wed, 26 Oct 2011 15:32:19 -0700 (PDT)

Hi Jeff, 

>You should upgrade to 1.6 so you can increase -p to 255.  This
>will help but will not address the core problem.

I am working on that...I have a 1.6 server in test mode.

>>> Do these processes communicate with each other to synchronize their work
>>> and as such rely on the AFS cache coherency semantics?  If not, could
>>> these jobs be modified to open the files in a different manner that did
>>> not enforce those semantics?
>> [...]
>You answered a question but not the question I asked.

I guess I am not sure how they communicate.  We recommend that users
not write to a single file or directory in AFS, but instead write to
local scratch space on the batch client and copy the results over to
AFS once when done.  That works for the users who know about it.  I am
guessing that there is no communication between processes for the users
who unknowingly run a utility from each batch job that writes a file to
the same directory.
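
For what it is worth, the scratch-space pattern we recommend could be
sketched roughly as below.  This is only an illustration and not one of
our actual job wrappers: the function name, its parameters, and the
paths in the usage note are all made up for the example.

```python
import os
import shutil
import tempfile


def run_job_with_local_scratch(final_dir, filename, produce_lines, scratch_root=None):
    """Write job output on local disk, then copy it to final_dir exactly once.

    final_dir     -- destination directory (e.g. somewhere in AFS); hypothetical
    filename      -- name of the output file; ideally unique per job
    produce_lines -- callable returning an iterable of output lines
    scratch_root  -- local scratch area; None means the system temp directory
    """
    # All incremental writes go to a private directory on local disk.
    scratch_dir = tempfile.mkdtemp(prefix="job-", dir=scratch_root)
    try:
        scratch_path = os.path.join(scratch_dir, filename)
        with open(scratch_path, "w") as out:
            for line in produce_lines():
                out.write(line + "\n")

        # The destination is touched only once, after the job has finished.
        os.makedirs(final_dir, exist_ok=True)
        final_path = os.path.join(final_dir, filename)
        shutil.copyfile(scratch_path, final_path)
        return final_path
    finally:
        # Clean up the scratch area whether or not the job succeeded.
        shutil.rmtree(scratch_dir, ignore_errors=True)
```

A job would call it as, say, run_job_with_local_scratch("/afs/.../results",
"job-1234.out", my_generator), where the destination path and file name
are placeholders.  Giving each job its own file name (or its own
subdirectory) also keeps the jobs from all writing to the same place.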


>Implementing such a change is not impossible but would need to be done
>quite carefully.
>In any case, I asked the question I did because at the present time the
>file server cannot complete a call and free the associated worker thread
>until the callback processing is complete.  This is necessary to ensure
>that the data modification performed by the call is visible to all of
>the interested clients prior to its completion.  By doing so any
>out-of-band messages sent by one application to another working on the
>same data set will be guaranteed to be serialized.  If there is no
>requirement for serialized messaging between the clients it *might* be
>possible for us to add support for O_NONBLOCK that would instruct the
>file server to not wait for callback processing to complete but instead
>queue the callback processing for a background task and complete the
>call immediately.  Jobs that made use of such an option when it is safe
>to do so would not run into the problems that you are experiencing.
>I should note that at this point this is just an idea and has not been
>vetted in any way whatsoever.
>Jeffrey Altman
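
To check my own understanding of the difference Jeff describes, here is
a toy model of the two behaviours: a call that returns only after the
callbacks have been processed, versus one that queues them for a
background task and returns immediately.  This is not OpenAFS code; the
class, its method names, and the "defer" flag are invented purely for
illustration.

```python
import queue
import threading


class ToyFileServer:
    """Toy model only; every name here is hypothetical, not an OpenAFS API."""

    def __init__(self):
        self.events = []              # ordered log of what happened
        self.pending = queue.Queue()  # callback work deferred to the background
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def _break_callbacks(self, path, clients):
        # Stand-in for notifying each interested client that the file changed.
        for client in clients:
            self.events.append(("callback", path, client))

    def _drain(self):
        # Background task: process deferred callback work until told to stop.
        while True:
            item = self.pending.get()
            if item is None:
                self.pending.task_done()
                break
            self._break_callbacks(*item)
            self.pending.task_done()

    def store(self, path, clients, defer=False):
        self.events.append(("stored", path))
        if defer:
            # Idea from the thread: queue the callbacks and return at once.
            self.pending.put((path, clients))
        else:
            # Current behaviour: the call is held until callbacks are done.
            self._break_callbacks(path, clients)
        self.events.append(("returned", path))

    def shutdown(self):
        self.pending.put(None)
        self.worker.join()
```

In the default mode every "callback" event is logged before "returned",
which is the ordering guarantee Jeff describes.  With defer=True the
call can return while callbacks are still queued, so clients may learn
of the change after the writer's call has already completed.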