[OpenAFS] openafs-server does not recover from crash

Pascal Salet pascal.salet@wu.ac.at
Wed, 9 Mar 2022 17:26:51 +0100


Hi,

Our OpenAFS server stopped working after a crash.

"bos status" shows all services running on all fileservers and DB servers.

Running udebug against port 7003 works correctly from all fileservers and DB servers.

However, "strace -p $(pidof salvageserver)" shows an error:
connect(6, {sa_family=AF_UNIX, sun_path="/var/lib/openafs/local/fssync.sock"}, 110) = -1 ECONNREFUSED (Connection refused)
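For reference, the same check can be repeated without attaching strace. The following is a minimal sketch, not an OpenAFS tool: `probe_unix_socket` is a hypothetical helper, and it assumes Python 3 on the fileserver and that FSSYNC uses a Unix-domain stream socket at the path seen above. It opens and immediately closes a connection without sending any FSSYNC data, then classifies the errno. On Linux, ECONNREFUSED on a Unix socket usually means the socket file exists but no process is listening on it, while ENOENT means the file is absent.

```python
import errno
import socket
import sys

# Path taken from the strace output above (assumed default Debian layout).
FSSYNC_SOCK = "/var/lib/openafs/local/fssync.sock"


def probe_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Try to connect to a Unix-domain stream socket and classify the result.

    Hypothetical diagnostic helper: it only connects and disconnects,
    and sends no protocol data.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return "listening"  # a process accepted the connection
    except OSError as e:
        if e.errno == errno.ECONNREFUSED:
            # Socket file exists, but nothing is listening on it.
            return "refused"
        if e.errno == errno.ENOENT:
            # No socket file exists at this path.
            return "missing"
        return f"error: {errno.errorcode.get(e.errno, e.errno)}"
    finally:
        s.close()


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else FSSYNC_SOCK
    print(f"{path}: {probe_unix_socket(path)}")
```

A result of "refused" would match the strace output above; it confirms only that nothing is accepting FSSYNC connections, not why.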

SalsrvLog:
Wed Mar 09 16:09:31 2022 @(#)OpenAFS 1.8.2-1-debian 2018-09-12
Wed Mar 09 16:09:31 2022 Starting OpenAFS Online Salvage Server 2.4 (/usr/lib/openafs/salvageserver)
Wed Mar 09 16:10:57 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
Wed Mar 09 16:11:29 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
Wed Mar 09 16:12:09 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
SYNC_connect failed (giving up!): Connection refused
Wed Mar 09 16:12:57 2022 Unable to connect to file server; aborted

FileLog:
Wed Mar 09 15:57:41 2022 VL_RegisterAddrs rpc failed; will retry periodically (code=-1, err=0)
Wed Mar 09 16:03:31 2022 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Wed Mar 09 16:06:56 2022 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Wed Mar 09 16:10:21 2022 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.
Wed Mar 09 16:13:46 2022 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=-1.

BosLog:
Wed Mar  9 16:06:05 2022: dafs started pid 5912: /usr/lib/openafs/salvageserver
Wed Mar  9 16:09:31 2022: dafs:salsrv exited with code 1
Wed Mar  9 16:09:31 2022: dafs started pid 6757: /usr/lib/openafs/salvageserver
Wed Mar  9 16:12:57 2022: dafs:salsrv exited with code 1
Wed Mar  9 16:12:57 2022: dafs started pid 7635: /usr/lib/openafs/salvageserver
Wed Mar  9 16:16:23 2022: dafs:salsrv exited with code 1

VolserLog:
Wed Mar 09 15:50:22 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
Wed Mar 09 15:50:54 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
Wed Mar 09 15:51:34 2022 SYNC_connect: temporary failure on circuit 'FSSYNC' (will retry)
SYNC_connect failed (giving up!): Connection refused
Wed Mar 09 15:52:21 2022 Unable to connect to file server; will retry at need

I would be very grateful for any advice on this matter.

Pascal

-- 
Pascal Salet
IT-Services / Server Infrastructure
Wirtschaftsuniversität Wien / Vienna University of Economics and Business / Austria
pascal.salet@wu.ac.at / +43-676-8213-5375