[OpenAFS] Non-functional fileserver

Stephan Wonczak a0033@rrz.uni-koeln.de
Fri, 26 Jul 2024 10:44:03 +0200 (CEST)


   Hi Jeffrey,

On Mon, 22 Jul 2024, Jeffrey Altman wrote:

>
>       On Jul 18, 2024, at 6:56 AM, Stephan Wonczak
>       <a0033@rrz.uni-koeln.de> wrote:
> 
>  I just noticed: There still seems to be something not working
> correctly. Although everything is working correctly (at least -I- did
> not find anything amiss), I still get these messages in FileLog every
> five minutes:
> 
> Thu Jul 18 12:36:59 2024 VL_RegisterAddrs rpc failed; will retry
> periodically (code=5376, err=0)
> Thu Jul 18 12:41:59 2024 VL_RegisterAddrs rpc failed; will retry
> periodically (code=5376, err=0)
> Thu Jul 18 12:46:59 2024 VL_RegisterAddrs rpc failed; will retry
> periodically (code=5376, err=0)
> 
>  Any ideas as to that?
> 
> 
> 5376 - no quorum elected

   Strange.

> Earlier you mentioned that the cell consists of a single machine on which
> the DB and FILE services are co-located.  
> In a single server configuration the UBIK services (vlserver, ptserver, …)
> should be operating in single server mode and there should never be an
> election.  Since the vlserver is returning 5376 it indicates there might
> still be a problem with the contents of the server CellServDB and perhaps
> the NetInfo/NetRestrict configuration.  

   Really strange.
   I do not have any NetInfo or NetRestrict files, so no problem there.
   Here are the contents of /usr/afs/etc/CellServDB:

>afstest.uni-koeln.de	#Cell name
134.95.13.39    #afstest.rrz.uni-koeln.de

   (Yes, really only these two lines!)
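
   For anyone who wants to check such a file mechanically, here is a
minimal sketch in Python. It assumes the usual server CellServDB layout
shown above (a ">cellname" line followed by one address per line, with
"#" starting a comment); the function name parse_cellservdb is made up
for this illustration and is not part of OpenAFS.

```python
def parse_cellservdb(text):
    """Map each cell name to the list of server addresses listed under it."""
    cells = {}
    current = None
    for raw in text.splitlines():
        # Everything after '#' is a comment (cell description or host name).
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith(">"):
            # A '>' line opens a new cell entry.
            current = line[1:].strip()
            cells[current] = []
        elif current is not None:
            # Any other non-empty line is a server address for that cell.
            cells[current].append(line.split()[0])
    return cells


sample = (
    ">afstest.uni-koeln.de\t#Cell name\n"
    "134.95.13.39    #afstest.rrz.uni-koeln.de\n"
)
print(parse_cellservdb(sample))
```

   With the two lines from above this yields one cell with exactly one
address, which is what a single-server setup should look like.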

> What errors are logged to the VLLog?

   None at all.

Thu Jul 11 14:57:30 2024 Starting AFS vlserver 4 (/usr/afs/bin/vlserver)
Thu Jul 11 14:57:30 2024 @(#)OpenAFS 1.8.11 2024-06-13 root@dialog8.rrz.uni-koeln.de
Thu Jul 11 14:58:45 2024 Ubik: I am the sync site

   These are the last entries.

> What does 'udebug <host> 7003 -long' report?

   This is where it gets really weird:

[root@afstest/usr/afs]$ udebug afstest.rrz.uni-koeln.de vl -long
Host's addresses are: 134.95.13.39
Host's 134.95.13.39 time is Fri Jul 26 10:26:26 2024
Local time is Fri Jul 26 10:26:26 2024 (time differential 0 secs)
Last yes vote for 134.95.13.39 was 13 secs ago (sync site);
Last vote started 13 secs ago (at Fri Jul 26 10:26:13 2024)
Local db version is 1610030433.14
I am sync site until 47 secs from now (at Fri Jul 26 10:27:13 2024) (2 servers)
Recovery state 1
The last trans I handled was 1720702725.17056
Sync site's db version is 1610030433.14
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
 	 1279661 secs ago (at Thu Jul 11 14:58:45 2024)

Server (134.95.110.160): (db 0.0)
     last vote never rcvd
     last beacon never sent
     dbcurrent=0, up=0 beaconSince=0

   Where does this IP 134.95.110.160 come from?
   Well, actually, this was the -old- IP of this machine before it was
moved into another network. But why was the vlserver still using it?
Hmmm...

   I got it.
   After correcting the server CellServDB, I did not reboot the machine. I
only stopped and then restarted both openafs-server and openafs-client.
Obviously, the wrong IP remained in some kernel-resident list. I tried
fixing the issue with "fs newcell", but no luck there.
   One reboot later, however, things are looking fine now:

  udebug afstest.rrz.uni-koeln.de vl -long
Host's addresses are: 134.95.13.39
Host's 134.95.13.39 time is Fri Jul 26 10:39:31 2024
Local time is Fri Jul 26 10:39:31 2024 (time differential 0 secs)
Last yes vote for 134.95.13.39 was 0 secs ago (sync site);
Last vote started 0 secs ago (at Fri Jul 26 10:39:31 2024)
Local db version is 1610030433.14
I am sync site forever (1 server)
Recovery state 1f
The last trans I handled was 1721983108.0
Sync site's db version is 1610030433.14
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
 	 63 secs ago (at Fri Jul 26 10:38:28 2024)
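
   In case it helps someone spot a stale peer like this faster, here is
a rough Python sketch that pulls the server count and the peer list out
of "udebug ... -long" output such as the two listings above. It only
relies on the "(N servers)" and "Server (IP):" patterns visible in
those listings; the function names are my own invention, and other
udebug versions may format things differently.

```python
import re


def summarize_udebug(output):
    """Return (server_count, peer_ips) from 'udebug -long' text."""
    # Collapse whitespace first so a count wrapped across lines by a
    # mail client or terminal ("(2 \nservers)") still matches.
    flat = " ".join(output.split())
    match = re.search(r"\((\d+) servers?\)", flat)
    count = int(match.group(1)) if match else None
    # Each remote database server shows up as "Server (a.b.c.d): ...".
    peers = re.findall(r"Server \(([0-9.]+)\)", flat)
    return count, peers


def unexpected_peers(output, expected_ips):
    """List peers reported by udebug that are not in expected_ips."""
    _, peers = summarize_udebug(output)
    return [ip for ip in peers if ip not in expected_ips]
```

   Feeding it the first listing together with the single address from
CellServDB would flag 134.95.110.160 as unexpected; the listing taken
after the reboot reports one server and no peers.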

   Thanks, Jeffrey, for pointing me in the right direction!
   (and hopefully someone can learn from my bumbling here :-) )


 	Dipl. Chem. Dr. Stephan Wonczak

         Regionales Rechenzentrum der Universitaet zu Koeln (RRZK)
         Universitaet zu Koeln, Weyertal 121, 50931 Koeln
         Tel: +49/(0)221/470-89583, Fax: +49/(0)221/470-89625