[OpenAFS] Trying to use OpenAFS-1.5.xx with Linux

Dale Pontius pontius@btv.ibm.com
Tue, 23 Feb 2010 14:02:51 -0500


Simon Wilkinson wrote:
> Hi,
>
> I've just got back online, and I'm going to try to reply to all of this in one message. Apologies if the attribution ends up being a little confused.
>   
Thanks for your patience and interest.
> Dale Pontius wrote:
>
>   
>> To be truthful, I'm not even THAT interested in disconnected mode.  I'm
>> more interested in the ability to "hold things over" while disconnected,
>> so I can reconnect.
>>     
>
> I suspect you do need disconnected for this - either that, or changes which mean that the filesystem never returns control when the network goes away (and for some, multithreaded, applications, this will probably have other issues).
>
> As I see it, your use case is:
> A) Start applications with data/configuration files stored in AFS
> B) Disconnect from network and suspend machine
> C) Walk down corridor
> D) Reconnect to network
> E) Continue working with applications as if nothing had happened.
>
> The problem today is that if C takes you a while, and the applications were in the middle of doing something when you disconnected, AFS will time out the file operations, and your applications will complain. So, you either need to allow applications to continue to access files while the network is down (hence disconnected), or to block all file access indefinitely whilst there is no network (which is a touch anti-social).
>   
There is another consideration: during Step C the machine is suspended,
so time will be standing still in application space.  The applications
will be blocked - they just won't know it.  You have hit my desired
usage scenario exactly, however.
>   
>> Today I'd have to take down the afs 1.4.x client if
>> I were about to lose my network connection, and that would mean taking
>> down my desktop session and all applications - just to suspend.
>>     
>
> Why do you need to take down the client? You should be able to start and stop the network under AFS without any major issues.
>   
I suspect the current limit isn't OpenAFS in this respect - it's the
Gentoo init scripts, which attempt to do dependency tracking and such. 
Those scripts think that OpenAFS requires a network, so if the network
goes down, they take OpenAFS down.  I need to look at tweaking the
OpenAFS init script.
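
As a rough sketch of the tweak I have in mind (not the actual Gentoo
script): assuming the client's init script declares a hard "need net" in
its depend() function, softening that to "use net" should stop the init
system from taking OpenAFS down when the interface goes away.  The file
name and the contents of depend() below are my assumptions; the real
script may list other dependencies.

```shell
# Hypothetical depend() for /etc/init.d/openafs-client (assumed name and content).
depend() {
    # "need net" would be a hard dependency: stopping the network stops this service.
    # "use net" is a soft dependency: start after net if net is being started,
    # but do not stop this service when the network goes down.
    use net
}
```
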
>   
>> Next I figured I'd grab a token for root, using my afsid as
>> "-principal", and I'm reconnected.  What seems odd here is that at first
>> root didn't need a token to reconnect, now it does.
>>     
>
> There's a couple of potential explanations here. Firstly, if your login session is in a PAG, and you use 'su' or 'sudo' to become root, then root will inherit your tokens. You can verify this by running the 'tokens' command. Secondly, we only need tokens to reconnect when you have changes to send to the server. If there are no local changes, then reconnection will succeed without any authentication.
>
>   
>> Another observation.  On my previous testing, some mount points were
>> missing, along with all of their data.  This time those mount points are
>> present - mostly.  My data seems to be all present, with the exception
>> of some "holes".  The overt symptom is that I have a symlink, and if you
>> try "ls -l" on the parent directory to look at that symlink you get
>> "linkname -> " with the target showing up as blank.  I can go to another
>> afs client and find that target of that symlink.  Back on the 1.5.72
>> client, in some cases I can go directly to the target of the symlink and
>> see that the data is present.  But in some (currently one, but not
>> exhaustive testing) I go directly to the symlink target, and the data is
>> missing.  In this one case, the missing data shares a volume with other
>> data that is properly present.
>>     
>
> As others have pointed out, the only data that is made available to you whilst you are offline is that which you held a valid callback for when you disconnected. These are typically files which
> a) Have been accessed recently
> b) Have not been modified by other clients
> c) Are not hosted on a server which is flushing its callback list
>   
I guess I wasn't being clear, and this is subtle enough that murkiness
is easy to achieve.

The "holes" in my data in the preceding paragraph are happening when I'm
online and connected.  They have nothing to do with disconnected
operation.  I've run two tests of the 1.5.72 client, and in neither test
was I able to see all of my data while connected.
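
To make the symptom easier to report, something like the following could
list the affected symlinks.  It is only a sketch: the function name is
made up, and it assumes the broken links still show up as symlinks but
read back with an empty target, which is what "ls -l" suggests.

```shell
# Print every symlink under a directory whose target reads back as empty.
# "find_blank_symlinks" is a made-up name for this sketch.
# Note: file names containing newlines are not handled.
find_blank_symlinks() {
    find "${1:-.}" -type l -print | while IFS= read -r link; do
        # Treat a failing readlink the same as an empty target.
        target=$(readlink "$link" 2>/dev/null) || target=""
        if [ -z "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

Running it from the top of the affected volume, on both the 1.5.72
client and a known-good client, should show whether the list differs.
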
> Cases which are not repeatable (such as your comments about ls -l either side of disconnection not yielding reproducible results) are likely to be down to one of these three factors.
>
> In addition, in order to be able to open() a file whilst disconnected, all of its chunks must be available in the disk cache. This typically means that the client must have read the entire file. There are also some further gotchas with the fact that the object containing the directory, as well as each file within it, must be in the cache. 
>   
I think I understand the rules for files while disconnected.  If I used
it while I was connected, I can use it while I'm disconnected, with the
same "level" (readonly vs read/write) access.  I'm not sure what "ls -l"
should say while disconnected.  In my first trial, "ls -l" gave the same
results disconnected as it had when previously connected.  In my
second trial, "ls -l" gave me a lot of question marks, except for files
that I had actually read the contents of.
> I populate my disk cache with:
> find . -type f | xargs cat > /dev/null
>
> As Jeffrey said, we're planning on implementing a user interface which will allow you to designate 'pinned' files which will always be available whilst disconnected.
>   
I'll get there eventually.  I'm not too worried about that at the
moment; right now I'm focused on the scenario at the top, and will work
from there.
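
In the meantime, a variant of the cache-priming one-liner quoted above
that tolerates spaces in file names might look like this.  It is a
sketch only; the function name is made up, and it does nothing beyond
reading each file once so that its chunks end up in the disk cache.

```shell
# Read every regular file under a directory so its chunks land in the disk cache.
# "prime_cache" is a made-up name for this sketch.
prime_cache() {
    # "-exec cat {} +" passes file names to cat in batches without word
    # splitting, so names with spaces are safe (unlike a plain "| xargs").
    find "${1:-.}" -type f -exec cat {} + > /dev/null
}
```

For example, "prime_cache /afs/mycell/user/me" (a placeholder path)
before disconnecting.
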
>   
>> Is this helping at all?
>>     
>
> It's hugely helpful - the more use disconnected gets, the better!
>   
I'll keep plugging, as time permits.

-- 
Dale Pontius
Senior Engineer
IBM Corporation
Phone: (802) 769-6850
Tie-Line: 446-6850
email: pontius@us.ibm.com
