[OpenAFS] After server reboot, all connections time out

Ryan C. Underwood nemesis@icequake.net
Sat, 17 Sep 2011 10:22:07 -0500



My RW server went bump in the night last night.  After rebooting, everything
came back up as normal but attempting to access either /afs/icequake.net or
/afs/.icequake.net would result in "connection timed out".

I have restarted all fileservers and all clients with only the following to
note: after the client is restarted, the first request to /afs will pause for a
few seconds before returning the timeout error, then subsequent requests return
timeout immediately.  fs checks/checkv had no effect except to introduce the
pause on the first request again.

Needless to say this is baffling.  There is nothing interesting in the logs or
udebug output, but maybe someone else might disagree. 10.0.1.230 is the ubik
master and 10.0.1.232 is the RW fileserver.


# udebug 10.0.1.230 7003
Host's addresses are: 10.0.1.230 65.38.17.159
Host's 10.0.1.230 time is Sat Sep 17 10:19:38 2011
Local time is Sat Sep 17 10:19:38 2011 (time differential 0 secs)
Last yes vote for 10.0.1.230 was 6 secs ago (sync site);
Last vote started 6 secs ago (at Sat Sep 17 10:19:32 2011)
Local db version is -1438751922.1777336322
I am sync site until 53 secs from now (at Sat Sep 17 10:20:31 2011) (3 servers)
Recovery state 1f
Sync site's db version is -1438751922.1777336322
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
         1145824 secs ago (at Sun Sep  4 04:02:34 2011)

Server (10.0.1.233 65.38.17.160): (db -1438751922.1777336322)
    last vote rcvd 7 secs ago (at Sat Sep 17 10:19:31 2011),
    last beacon sent 6 secs ago (at Sat Sep 17 10:19:32 2011), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

Server (10.0.1.232 65.38.17.158): (db -1438751922.1777336322)
    last vote rcvd 7 secs ago (at Sat Sep 17 10:19:31 2011),
    last beacon sent 6 secs ago (at Sat Sep 17 10:19:32 2011), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

# udebug 10.0.1.230 7002
Host's addresses are: 10.0.1.230 65.38.17.159
Host's 10.0.1.230 time is Sat Sep 17 10:19:37 2011
Local time is Sat Sep 17 10:19:39 2011 (time differential 2 secs)
Last yes vote for 10.0.1.230 was 7 secs ago (sync site);
Last vote started 7 secs ago (at Sat Sep 17 10:19:32 2011)
Local db version is 1313883291.5
I am sync site until 50 secs from now (at Sat Sep 17 10:20:29 2011) (3 servers)
Recovery state 1f
Sync site's db version is 1313883291.5
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
         2389486 secs ago (at Sat Aug 20 18:34:53 2011)

Server (10.0.1.233 65.38.17.160): (db 1313883291.5)
    last vote rcvd 8 secs ago (at Sat Sep 17 10:19:31 2011),
    last beacon sent 7 secs ago (at Sat Sep 17 10:19:32 2011), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

Server (10.0.1.232 65.38.17.158): (db 1313883291.5)
    last vote rcvd 10 secs ago (at Sat Sep 17 10:19:29 2011),
    last beacon sent 7 secs ago (at Sat Sep 17 10:19:32 2011), last vote was yes
    dbcurrent=1, up=1 beaconSince=1


# cat FileLog
Sat Sep 17 10:04:45 2011 File server starting (/usr/lib/openafs/dafileserver -p 123 -pctspare 20 -L -busyat 50 -rxpck 2000 -rxbind -cb 4000000 -vattachpar 128 -vlruthresh 1440 -vlrumax 8 -vhashsize 11)
Sat Sep 17 10:04:45 2011 afs_krb_get_lrealm failed, using icequake.net.
Sat Sep 17 10:04:46 2011 VLRU: starting scanner with the following configuration parameters:
Sat Sep 17 10:04:46 2011 VLRU:  offlining volumes after minimum of 86400 seconds of inactivity
Sat Sep 17 10:04:46 2011 VLRU:  running VLRU soft detach pass every 120 seconds
Sat Sep 17 10:04:46 2011 VLRU:  taking up to 8 volumes offline per pass
Sat Sep 17 10:04:46 2011 VLRU:  scanning generation 0 for inactive volumes every 10800 seconds
Sat Sep 17 10:04:46 2011 VLRU:  scanning for promotion/demotion between generations 0 and 1 every 172800 seconds
Sat Sep 17 10:04:46 2011 VLRU:  scanning for promotion/demotion between generations 1 and 2 every 345600 seconds
Sat Sep 17 10:04:46 2011 Set thread id 3 for FSYNC_sync
Sat Sep 17 10:04:46 2011 VInitVolumePackage: beginning parallel fileserver startup
Sat Sep 17 10:04:46 2011 VInitVolumePackage: using 1 threads to pre-attach volumes on 1 partitions
Sat Sep 17 10:04:46 2011 Scanning partitions on thread 1 of 1
Sat Sep 17 10:04:46 2011 Partition /vicepa: pre-attaching volumes
Sat Sep 17 10:04:46 2011 Partition scan thread 1 of 1 ended
Sat Sep 17 10:04:46 2011 fs_stateRestore: commencing fileserver state restore
Sat Sep 17 10:04:46 2011 fs_stateRestore: host table restored
Sat Sep 17 10:04:46 2011 fs_stateRestore: FileEntry and CallBack tables restored
Sat Sep 17 10:04:46 2011 fs_stateRestore: host table indices remapped
Sat Sep 17 10:04:46 2011 fs_stateRestore: FileEntry and CallBack indices remapped
Sat Sep 17 10:04:46 2011 fs_stateRestore: restore phase complete
Sat Sep 17 10:04:46 2011 fs_stateRestore: beginning state verification phase
Sat Sep 17 10:04:46 2011 h_stateVerifyUuidHash: warning: uuid hash entry points to different host struct (1, 0)
Sat Sep 17 10:04:46 2011 fs_stateRestore: fileserver state verification complete
Sat Sep 17 10:04:46 2011 fs_stateRestore: restore was successful
Sat Sep 17 10:04:46 2011 Set thread id 0000007E for 'FiveMinuteCheckLWP'
Sat Sep 17 10:04:46 2011 Getting FileServer name...
Sat Sep 17 10:04:46 2011 Set thread id 00000081 for 'HostCheckLWP'
Sat Sep 17 10:04:46 2011 FileServer host name is 'valhalla'
Sat Sep 17 10:04:46 2011 Getting FileServer address...
Sat Sep 17 10:04:46 2011 Set thread id 00000083 for 'FsyncCheckLWP'
Sat Sep 17 10:04:46 2011 FileServer valhalla has address 10.0.1.232 (0xe801000a or 0xa0001e8 in host byte order)
Sat Sep 17 10:04:46 2011 File Server started Sat Sep 17 10:04:46 2011
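
A side note on the two hex values in the "has address" line of the FileLog:
they look like the same four address bytes read as a 32-bit integer in
little-endian and big-endian order, so they do not point to a second address.
A minimal Python sketch that reproduces both values (the helper name
address_words is made up for illustration and is not part of OpenAFS):

```python
import socket
import struct


def address_words(dotted):
    """Return (big_endian, little_endian) 32-bit readings of an IPv4 address."""
    packed = socket.inet_aton(dotted)          # four raw bytes, network order
    big = struct.unpack("!I", packed)[0]       # bytes read most significant first
    little = struct.unpack("<I", packed)[0]    # same bytes read least significant first
    return big, little


big, little = address_words("10.0.1.232")
print(hex(little), hex(big))  # prints: 0xe801000a 0xa0001e8
```

Both numbers decode back to 10.0.1.232, which matches the RW fileserver
address given earlier in this message.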



-- 
Ryan C. Underwood, <nemesis@icequake.net>
