[OpenAFS] 1.6.0pre2 Filelog CB: WhoAreYou failed for host

Gémes Géza geza@kzsdabas.hu
Mon, 21 Mar 2011 07:42:50 +0100


Hi,

I know this topic has been discussed before, but the conclusion then was
that it is caused by NAT.
That cannot be the cause in my case, as the OpenAFS servers are
firewalled from the outside world.
The fileserver has 3 Ethernet interfaces:
1: connected to the clients, two IP addresses, one active (the other is
in the NetRestrict file)
2: connected to a SAN, no IP addresses
3: connected to other cluster members, IP address in the NetRestrict file
vos listaddrs shows nothing but the right IP address for the vol and
fileserver.
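
To illustrate what I expect the NetRestrict entries to do, here is a
minimal Python sketch, not the actual OpenAFS code. The addresses are
made-up documentation addresses, the function names are hypothetical,
and treating 255 as a per-field wildcard is my reading of the
NetRestrict documentation:

```python
def is_restricted(addr, restrict_entries):
    """Return True if addr matches any NetRestrict-style entry.

    Assumption: an entry is a dotted-decimal IPv4 address, and a field
    of 255 matches any value in that position.
    """
    octets = addr.split(".")
    for entry in restrict_entries:
        pattern = entry.split(".")
        if len(pattern) == 4 and all(
            p == "255" or p == o for p, o in zip(pattern, octets)
        ):
            return True
    return False


def advertised(addrs, netrestrict_text):
    """Filter out the addresses that the NetRestrict text excludes."""
    entries = [ln.strip() for ln in netrestrict_text.splitlines() if ln.strip()]
    return [a for a in addrs if not is_restricted(a, entries)]


# Hypothetical layout mirroring the description above: two addresses on
# the client-facing interface (one restricted) and one cluster address
# (restricted via a wildcard entry).
interfaces = ["192.0.2.10", "192.0.2.11", "10.0.0.5"]
netrestrict = "192.0.2.11\n10.0.0.255\n"

print(advertised(interfaces, netrestrict))
```

With this layout only the single active client-facing address should
remain, which matches what vos listaddrs reports here.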
The FileLog is full of entries like:
CB: WhoAreYou failed for host FILESERVER
Besides that, all the clients (1.6.0pre2 on Linux, 1.5.78 and 1.6.0pre3
on Windows) are working as expected.
The one exception is a client (1.6.0pre2 on Linux) which has two
interfaces (one connected to the fileserver's network and the other in
its NetRestrict file). It has entries in its syslog like this:
Mar 21 07:31:36 ssh-gate kernel: [ 4879.645660] afs: WARM shutting down
of: CBCELLNAME afsCELLNAME BkGCELLNAME CTruncCELLNAME AFSDBCELLNAME
RxEventCELLNAME UnmaskRxkSignalsCELLNAME RxListenerCELLNAME  ALL
allocated tablesCELLNAME done
Mar 21 07:31:36 HOSTNAME kernel: [ 4880.229317] enabling dynamically
allocated vcaches
Mar 21 07:31:36 HOSTNAME kernel: [ 4880.229321] Starting AFS cache
scanCELLNAMEfound 70 non-empty cache files (2%).
Mar 21 07:32:51 HOSTNAME kernel: [ 4954.852104] afs: Lost contact with
file server FILESERVER in cell CELLNAME (all multi-homed ip addresses
down for the server)
Mar 21 07:32:51 HOSTNAME kernel: [ 4954.852115] afs: Lost contact with
file server FILESERVER in cell CELLNAME (all multi-homed ip addresses
down for the server)
Mar 21 07:32:51 HOSTNAME kernel: [ 4954.852119] RXAFS_GetCapabilities
failed with code -3
Mar 21 07:32:54 HOSTNAME kernel: [ 4958.304958] afs: file server
FILESERVER in cell CELLNAME is back up (multi-homed address; other
same-host interfaces may still be down)
Mar 21 07:32:54 HOSTNAME kernel: [ 4958.304965] afs: file server
FILESERVER in cell CELLNAME is back up (multi-homed address; other
same-host interfaces may still be down)
The first ls on /afs/CELLNAME/ always fails, and later on (after those
syslog entries have appeared) it starts working, until the next restart
of course.

Thanks for any idea!

Cheers

Geza