[OpenAFS] File server allegedly unavailable
Kim Kimball
"Kim Kimball" <kim@ccre.com>
Tue, 5 Feb 2002 09:29:27 -0700
The behavior of "fs checks" is not as odd as it might appear.
This client-side command does _not_ check _all_ fileservers, despite what its
output claims.
Here's what happens:
1. Each client keeps track of fileservers it has contacted.
2. When "fs checks" is run on the client, the client goes through its list
of already-contacted fileservers.
3. For each fileserver in the list, the client makes a simple query. I
believe it asks the fileserver for the time.
4. If the client hears back from each fileserver in the already-contacted
list, it cheerfully reports that ALL fileservers are available.
This is frequently a prevarication, a lie, an untruth, a fabrication -- or
perhaps a symptom of the client-centric ego.
If a given client has not yet contacted a particular fileserver, you'll get
the behavior you describe.
Assume volume XYA.vol is a ReadWrite volume (so it can't be fetched from
anywhere else) on fileserver "NotOnTheList", and that the client has never
issued a request to NotOnTheList:
1. "fs checks" reports "All servers are running" -- assuming every fileserver
other than NotOnTheList responds to the time query.
2. I attempt to access volume XYA.vol -- which causes the client to put
NotOnTheList onto the already-contacted list.
3. When I run "fs checks" again, the client goes through the updated
already-contacted list, and this time it reports NotOnTheList as unavailable
if NotOnTheList doesn't answer.
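The steps and scenario above can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not OpenAFS code: `CacheManagerModel`, `access_volume`, `check_servers` and `report` are hypothetical names, the `probe` callback stands in for whatever query the client really sends (the time query is itself only a belief above), and the report strings merely imitate the messages quoted in this thread. What it illustrates is that a server the client has never contacted is never probed, so it cannot be reported as unavailable.

```python
from typing import Callable, List, Set


class CacheManagerModel:
    """Toy model of a client's list of already-contacted fileservers.

    Hypothetical: mirrors the behavior described in this post, not the
    actual OpenAFS cache manager.
    """

    def __init__(self, probe: Callable[[str], bool]) -> None:
        # probe(server) stands in for the simple query (possibly a time
        # request) and returns True if the server answers.
        self.probe = probe
        self.contacted: Set[str] = set()

    def access_volume(self, server: str) -> bool:
        # Any request to a fileserver adds it to the contacted list,
        # whether or not the server actually answers.
        self.contacted.add(server)
        return self.probe(server)

    def check_servers(self) -> List[str]:
        # Only servers already on the contacted list are probed; a server
        # the client has never talked to is silently skipped.
        return sorted(s for s in self.contacted if not self.probe(s))


def report(down: List[str]) -> str:
    # Imitates the messages quoted in this thread.
    if not down:
        return "All servers are running."
    return ("These servers unavailable due to network or server problems: "
            + " ".join(down) + ".")


# Replay of the scenario: NotOnTheList does not answer, but is not yet contacted.
up = {"fs1": True, "fs2": True, "NotOnTheList": False}
cm = CacheManagerModel(probe=lambda s: up[s])
cm.access_volume("fs1")
cm.access_volume("fs2")
print(report(cm.check_servers()))  # All servers are running.
cm.access_volume("NotOnTheList")   # e.g. "ls" on a directory in XYA.vol
print(report(cm.check_servers()))
# These servers unavailable due to network or server problems: NotOnTheList.
```

The model deliberately keeps the contacted list as a plain set that only grows, matching the belief above that it persists until reboot.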
I believe the "contact list" is kept until reboot -- but this may be
incorrect and if so I would appreciate any clarification.
BTW -- the only way I know of to check the status of all active
fileservers -- where "active" means that the fileserver houses at least one
AFS volume -- is to list the entire VLDB, something like "vos listvl |
grep -i server | awk .... | sort -u > /tmp/somefile" -- which will give a
list of all active fileservers. Then step through the list in a loop,
running "bos status fs" against each host, and checking the results by
awking out keywords.
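The VLDB procedure above might look roughly like the following Python sketch. Assumptions, not from the post: `vos` and `bos` are on the PATH; VLDB site lines have the form `server <host> partition /vicepX RW Site`, so this sketch matches on the first field (its own choice, not the elided awk step); the fileserver instance is named `fs`; and the phrase "running normally" in `bos status` output is the keyword being checked. The function names are hypothetical. As in the post, only servers that house at least one volume are found.

```python
import subprocess
from typing import Dict, List


def servers_from_vldb(listing: str) -> List[str]:
    """Extract unique fileserver names from "vos listvldb" output.

    Assumes site lines look like
    "   server fs1.example.com partition /vicepa RW Site".
    """
    hosts = set()
    for line in listing.splitlines():
        fields = line.split()
        # Match on the first field rather than grepping for "server"
        # anywhere, so header lines mentioning servers are not picked up.
        if len(fields) >= 2 and fields[0] == "server":
            hosts.add(fields[1])
    return sorted(hosts)  # like "sort -u"


def fs_is_running(status_output: str) -> bool:
    # Keyword heuristic (assumption): a healthy instance is reported as
    # "currently running normally" by "bos status".
    return "running normally" in status_output


def check_all_fileservers(cell: str = "") -> Dict[str, bool]:
    """Run "bos status <host> fs" against every server found in the VLDB."""
    cell_args = ["-cell", cell] if cell else []
    listing = subprocess.run(
        ["vos", "listvldb", *cell_args],
        capture_output=True, text=True, check=True,
    ).stdout
    results: Dict[str, bool] = {}
    for host in servers_from_vldb(listing):
        proc = subprocess.run(
            ["bos", "status", host, "fs", *cell_args],
            capture_output=True, text=True,
        )
        # A host counts as healthy only if bos answered and reported "fs"
        # as running normally.
        results[host] = proc.returncode == 0 and fs_is_running(proc.stdout)
    return results
```

Calling `check_all_fileservers()` (optionally with a cell name) returns a dict mapping each host to True or False. Parsing is kept separate from the subprocess calls so it can be tried against saved `vos listvldb` output first.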
Kim
-------------------------------------------------------
Dexter "Kim" Kimball
CCRE, Inc.
14421 N. County Rd. 25E (Swanson Ranch Road)
PO Box 209
Masonville, CO 80541
kim@ccre.com
-------------------------------------------------------
>
> Date: Tue, 5 Feb 2002 10:24:30 +0000
> From: Dr A V Le Blanc <LeBlanc@mcc.ac.uk>
> To: openafs-info@openafs.org
> Reply-To: Dr A V Le Blanc <LeBlanc@mcc.ac.uk>
> Subject: [OpenAFS] File server allegedly unavailable
>
> I'm having a peculiar problem, which has affected a single
> Linux system. (Debian potato, upgraded, with 2.4.17 kernel).
> I have over 20 identical systems, identical in the sense that
> they have virtually identical hardware, and that the software
> has been installed from a single common image, which includes
> the kernel and openafs; all that changes is the IP address.
>
> One machine is showing the following symptoms: when I start
> openafs, 'fs checks' reports that 'All servers are running'.
> Then I do ls on a directory in AFS. The response is
> 'Connection timed out'. After this, 'fs checks' reports that
> 'These servers unavailable due to network or server problems:
> ice.mcc.ac.uk.'
> These 'network or server problems' however are not affecting any
> of the other 20 machines, all of which (along with the server)
> are on the same segment.
>
> I started this with openafs 1.2.2. When openafs 1.2.3 came out, I
> upgraded the client and kernel module, hoping this would cure the
> problem. No such luck. I've edited the CellServDB file so that
> only my cell is in it. No change. I've stopped AFS, deleted all
> the cache files, and restarted. No change.
>
> I would appreciate any ideas about this one.
>
> -- Owen
> LeBlanc@mcc.ac.uk
>