[OpenAFS] Non-functional fileserver
Stephan Wonczak
a0033@rrz.uni-koeln.de
Fri, 26 Jul 2024 10:44:03 +0200 (CEST)
Hi Jeffrey,
On Mon, 22 Jul 2024, Jeffrey Altman wrote:
>
> On Jul 18, 2024, at 6:56 AM, Stephan Wonczak
> <a0033@rrz.uni-koeln.de> wrote:
>
> I just noticed: There still seems to be something not working
> correctly. Although everything appears to work (at least -I- did
> not find anything amiss), I still get these messages in FileLog every
> five minutes:
>
> Thu Jul 18 12:36:59 2024 VL_RegisterAddrs rpc failed; will retry
> periodically (code=5376, err=0)
> Thu Jul 18 12:41:59 2024 VL_RegisterAddrs rpc failed; will retry
> periodically (code=5376, err=0)
> Thu Jul 18 12:46:59 2024 VL_RegisterAddrs rpc failed; will retry
> periodically (code=5376, err=0)
>
> Any ideas as to that?
>
>
> 5376 - no quorum elected
Strange.
> Earlier you mentioned that the cell consists of a single machine on which
> the DB and FILE services are co-located.
> In a single server configuration the UBIK services (vlserver, ptserver, …)
> should be operating in single server mode and there should never be an
> election. Since the vlserver is returning 5376 it indicates there might
> still be a problem with the contents of the server CellServDB and perhaps
> the NetInfo/NetRestrict configuration.
Really strange.
I do not have any NetInfo or NetRestrict files, so no problem there.
Here are the contents of /usr/afs/etc/CellServDB:
>afstest.uni-koeln.de #Cell name
134.95.13.39 #afstest.rrz.uni-koeln.de
(Yes, really only these two lines!)
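
For anyone who wants to check such a file by script rather than by eye, here is a minimal Python sketch. It assumes only the simple layout shown above (a ">cellname" line followed by one "address #hostname" line per database server); the function name parse_cellservdb is made up for this illustration and is not part of OpenAFS.

```python
def parse_cellservdb(text):
    """Return {cell_name: [(address, comment), ...]} for CellServDB-style text.

    Hypothetical helper: it only understands the simple layout quoted above.
    """
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Everything after '#' is treated as a free-form comment.
        body, _, comment = line.partition("#")
        body = body.strip()
        comment = comment.strip()
        if body.startswith(">"):
            parts = body[1:].split()
            if not parts:
                continue  # a bare '>' with no cell name; ignore it
            current = parts[0]
            cells[current] = []
        elif current is not None and body:
            cells[current].append((body.split()[0], comment))
    return cells


SAMPLE = """>afstest.uni-koeln.de #Cell name
134.95.13.39 #afstest.rrz.uni-koeln.de
"""

if __name__ == "__main__":
    for cell, servers in parse_cellservdb(SAMPLE).items():
        print(cell, servers)
```

With exactly one address listed for the cell, the database servers should run in single-server mode, which is the point Jeffrey made above.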
> What errors are logged to the VLLog?
None at all.
Thu Jul 11 14:57:30 2024 Starting AFS vlserver 4 (/usr/afs/bin/vlserver)
Thu Jul 11 14:57:30 2024 @(#)OpenAFS 1.8.11 2024-06-13
root@dialog8.rrz.uni-koeln.de
Thu Jul 11 14:58:45 2024 Ubik: I am the sync site
These are the last entries.
> What does 'udebug <host> 7003 -long' report?
This is where it gets really weird:
[root@afstest/usr/afs]$ udebug afstest.rrz.uni-koeln.de vl -long
Host's addresses are: 134.95.13.39
Host's 134.95.13.39 time is Fri Jul 26 10:26:26 2024
Local time is Fri Jul 26 10:26:26 2024 (time differential 0 secs)
Last yes vote for 134.95.13.39 was 13 secs ago (sync site);
Last vote started 13 secs ago (at Fri Jul 26 10:26:13 2024)
Local db version is 1610030433.14
I am sync site until 47 secs from now (at Fri Jul 26 10:27:13 2024) (2 servers)
Recovery state 1
The last trans I handled was 1720702725.17056
Sync site's db version is 1610030433.14
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
1279661 secs ago (at Thu Jul 11 14:58:45 2024)
Server (134.95.110.160): (db 0.0)
last vote never rcvd
last beacon never sent
dbcurrent=0, up=0 beaconSince=0
Where does this IP 134.95.110.160 come from?
Well, actually, this was the -old- IP of this machine before it was
moved into another network. But where did this come from? Hmmm...
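
The comparison I did by eye here (addresses reported by udebug versus addresses in the server CellServDB) could be scripted. The following is a rough Python sketch; it assumes the output layout pasted above ("Host's addresses are: ..." and "Server (...):" lines), and both function names are invented for this example.

```python
import re


def udebug_addresses(text):
    """Return (own_addresses, peer_addresses) found in 'udebug -long' output.

    Assumes the line formats seen in the output above; other udebug
    versions may print something different.
    """
    own, peers = [], []
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Host's addresses are:\s*(.+)", line)
        if m:
            own = m.group(1).split()
            continue
        m = re.match(r"Server \(([^)]+)\):", line)
        if m:
            peers.extend(m.group(1).split())
    return own, peers


def unexpected_addresses(text, configured):
    """List addresses udebug reports that are not in the configured set."""
    own, peers = udebug_addresses(text)
    return [ip for ip in own + peers if ip not in configured]


SAMPLE = """Host's addresses are: 134.95.13.39
I am sync site until 47 secs from now (at Fri Jul 26 10:27:13 2024) (2 servers)
Server (134.95.110.160): (db 0.0)
last vote never rcvd
"""

if __name__ == "__main__":
    # The configured set would come from /usr/afs/etc/CellServDB.
    print(unexpected_addresses(SAMPLE, {"134.95.13.39"}))
```

Any address in the result is a peer that the running server believes in but the configuration file does not mention, which is exactly the situation with 134.95.110.160 here.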
I got it.
After correcting the server CellServDB, I did not reboot the machine. I
just stopped and then restarted both openafs-server and openafs-client.
Apparently, the wrong IP remained in some kernel-resident lists. I tried
fixing the issue with "fs newcell", but no luck there.
One reboot later, however, things are looking fine now:
udebug afstest.rrz.uni-koeln.de vl -long
Host's addresses are: 134.95.13.39
Host's 134.95.13.39 time is Fri Jul 26 10:39:31 2024
Local time is Fri Jul 26 10:39:31 2024 (time differential 0 secs)
Last yes vote for 134.95.13.39 was 0 secs ago (sync site);
Last vote started 0 secs ago (at Fri Jul 26 10:39:31 2024)
Local db version is 1610030433.14
I am sync site forever (1 server)
Recovery state 1f
The last trans I handled was 1721983108.0
Sync site's db version is 1610030433.14
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
63 secs ago (at Fri Jul 26 10:38:28 2024)
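
The difference between "Recovery state 1" before and "Recovery state 1f" after the reboot can be decoded as a bit mask. The sketch below is Python; the flag names and values are quoted from memory of the UBIK_REC* definitions in the OpenAFS ubik headers, so treat them as an assumption and check them against the source of your version.

```python
# Assumed bit values for the ubik recovery state, as remembered from the
# UBIK_REC* definitions in the OpenAFS sources. Verify before relying on them.
RECOVERY_FLAGS = [
    (0x01, "UBIK_RECSYNCSITE"),  # this server is the sync site
    (0x02, "UBIK_RECFOUNDDB"),   # found an acceptable database from the quorum
    (0x04, "UBIK_RECHAVEDB"),    # fetched the best database
    (0x08, "UBIK_RECLABELDB"),   # relabelled the database
    (0x10, "UBIK_RECSENTDB"),    # sent the best database to everyone
]


def decode_recovery_state(hex_text):
    """Return the flag names set in a udebug 'Recovery state' hex value."""
    value = int(hex_text, 16)
    return [name for bit, name in RECOVERY_FLAGS if value & bit]


if __name__ == "__main__":
    print(decode_recovery_state("1"))   # state while the stale peer was listed
    print(decode_recovery_state("1f"))  # state after the reboot
```

Under these assumptions, state 1 means the server considered itself sync site but had not completed any further recovery step, while 1f has all five flags set.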
Thanks, Jeffrey, for pointing me in the right direction!
(and hopefully someone can learn from my bumbling here :-) )
Dipl. Chem. Dr. Stephan Wonczak
Regionales Rechenzentrum der Universitaet zu Koeln (RRZK)
Universitaet zu Koeln, Weyertal 121, 50931 Koeln
Tel: +49/(0)221/470-89583, Fax: +49/(0)221/470-89625