[OpenAFS] File server allegedly unavailable

"Kim Kimball" <kim@ccre.com>
Tue, 5 Feb 2002 09:29:27 -0700


The behavior of "fs checks" is not as odd as it might appear.

This client-side call does _not_ check _all_ fileservers, despite its own
claims to the contrary.

Here's what happens:

1.  Each client keeps track of fileservers it has contacted.
2.  When "fs checks" is run on the client, the client goes through its list
of already-contacted fileservers.
3.  For each fileserver in the list, the client makes a simple query.  I
believe it asks the fileserver for the time.
4.  If the client hears back from each fileserver in the already-contacted
list, it cheerfully reports that ALL fileservers are available.
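Here is a toy Python model of the four steps above. It is only an illustration under assumptions. The names (CacheManagerModel, access_volume, check_servers, report) are hypothetical. The probe function stands in for the simple time query. The report strings are copied from the messages quoted in this thread. None of this is the actual OpenAFS cache manager code.

```python
from typing import Callable, List, Set


class CacheManagerModel:
    """Toy model (hypothetical names) of a client's list of fileservers.

    Only servers the client has already contacted are remembered, and
    only those are probed by check_servers().
    """

    def __init__(self, probe: Callable[[str], bool]) -> None:
        # probe(server) -> True if the server answers; stands in for the time query
        self.probe = probe
        self.contacted: Set[str] = set()

    def access_volume(self, server: str) -> None:
        # Touching a volume puts its fileserver on the already-contacted list,
        # whether or not the server actually answers.
        self.contacted.add(server)

    def check_servers(self) -> List[str]:
        """Probe every contacted server; return those that did not answer."""
        return sorted(s for s in self.contacted if not self.probe(s))


def report(down: List[str]) -> str:
    # An empty "down" list is reported as success, even if the client
    # has never heard of some other fileserver.
    if not down:
        return "All servers are running."
    return ("These servers unavailable due to network or server problems: "
            + " ".join(down) + ".")
```

Calling check_servers() before and after access_volume("NotOnTheList") reproduces the XYA.vol scenario below. A server the client has never touched cannot appear in the report, however dead it is.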

This is frequently a prevarication, a lie, an untruth, a fabrication -- or
perhaps a symptom of the client-centric ego.

If a given client has not yet contacted a particular fileserver, you'll get
the behavior you describe.

Assume volume XYA.vol is a ReadWrite volume (so you can't get it from
anywhere else) on fileserver "NotOnTheList", and that the client has never
issued a request to the fileserver NotOnTheList:

1.  "fs checks" returns "All servers are running." -- assuming all servers
other than NotOnTheList respond to the time query.
2.  I attempt to access volume XYA.vol -- which causes the client to put
NotOnTheList onto the already-contacted list.
3.  When I run "fs checks" again, the client goes through the updated
already-contacted list, and this time it reports NotOnTheList as
unavailable if NotOnTheList does not answer.

I believe the "contact list" is kept until reboot -- but this may be
incorrect, and if so, I would appreciate any clarification.

BTW -- the only way I know of to check the status of all active
fileservers (where "active" means that the fileserver houses at least one
AFS volume) is to list the entire VLDB with something like "vos listvl |
grep -i server | awk .... | sort -u > /tmp/somefile", which will give a
list of all active fileservers.  Then step through the list in a loop,
running "bos status <host> fs" against each host, and check the results by
awking out keywords.
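The VLDB-then-bos loop might look like this as a Python sketch. It rests on two assumptions, also noted in the code. First, vos listvldb prints each volume site on a line beginning "server HOST partition ...". Second, healthy bos status output for the fs instance contains the phrase "currently running normally". The function names are hypothetical. The parsing step does the job of the grep | awk | sort -u pipeline, not a copy of it.

```python
import subprocess
from typing import Callable, Dict, Iterable, List

# Assumption: healthy "bos status HOST fs" output contains this phrase.
HEALTHY_PHRASE = "currently running normally"


def active_fileservers(vldb_text: str) -> List[str]:
    """Sorted, de-duplicated fileserver names from "vos listvldb" output.

    Assumes each site is printed as "server HOST partition /vicepX RW Site".
    """
    hosts = set()
    for line in vldb_text.splitlines():
        fields = line.split()
        # Only lines whose first word is "server" name a fileserver.
        if len(fields) >= 2 and fields[0] == "server":
            hosts.add(fields[1])
    return sorted(hosts)


def bos_status(host: str, timeout: float = 60.0) -> str:
    """Run "bos status HOST fs" and return its output, or an error text."""
    try:
        result = subprocess.run(
            ["bos", "status", host, "fs"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # bos is missing, or the host never answered: treat as unhealthy.
        return f"error: {exc}"
    return result.stdout + result.stderr


def check_all(hosts: Iterable[str],
              run: Callable[[str], str] = bos_status) -> Dict[str, bool]:
    """Map each host to True if its fs instance looks healthy."""
    return {host: HEALTHY_PHRASE in run(host) for host in hosts}
```

To use it, pass in the output of vos listvldb, for example check_all(active_fileservers(subprocess.run(["vos", "listvldb"], capture_output=True, text=True).stdout)). The "run" parameter lets you swap in a fake for testing. Matching only lines whose first word is "server" is a little stricter than grep -i server, which would also match a volume name containing "server". A dead host can keep bos waiting for a while, so each call has a timeout.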

Kim

-------------------------------------------------------
Dexter "Kim" Kimball
CCRE, Inc.
14421 N. County Rd. 25E (Swanson Ranch Road)
PO Box 209
Masonville, CO 80541

        kim@ccre.com
-------------------------------------------------------

>
> Date: Tue, 5 Feb 2002 10:24:30 +0000
> From: Dr A V Le Blanc <LeBlanc@mcc.ac.uk>
> To: openafs-info@openafs.org
> Reply-To: Dr A V Le Blanc <LeBlanc@mcc.ac.uk>
> Subject: [OpenAFS] File server allegedly unavailable
>
> I'm having a peculiar problem, which has affected a single
> Linux system.  (Debian potato, upgraded, with 2.4.17 kernel).
> I have over 20 identical systems, identical in the sense that
> they have virtually identical hardware, and that the software
> has been installed from a single common image, which includes
> the kernel and openafs; all that changes is the IP address.
>
> One machine is showing the following symptoms:  when I start
> openafs, 'fs checks' reports that 'All servers are running'.
> Then I do ls on a directory in AFS.  The response is
> 'Connection timed out'.  After this, 'fs checks' reports that
> 'These servers unavailable due to network or server problems:
> ice.mcc.ac.uk.'
> These 'network or server problems' however are not affecting any
> of the other 20 machines, all of which (along with the server)
> are on the same segment.
>
> I started this with openafs 1.2.2.  When openafs 1.2.3 came out, I
> upgraded the client and kernel module, hoping this would cure the
> problem.  No such luck.  I've edited the CellServDB file so that
> only my cell is in it.  No change.  I've stopped AFS, deleted all
> the cache files, and restarted.  No change.
>
> I would appreciate any ideas about this one.
>
>      -- Owen
>      LeBlanc@mcc.ac.uk
>