[OpenAFS] ptserver processes hanging
Hartmut Reuter
reuter@rzg.mpg.de
Mon, 28 Oct 2002 18:00:09 +0100
Dr A V Le Blanc wrote:
> I've now got the problem that the ptserver process is hanging at times
> on two of my DB servers; the process won't die even after kill -9,
> but remains in the process table for several hours, and the other
> machines are unable to form a quorum, so no changes can be made
> to the pt database.
To me this looks like a problem with the file system that holds
/usr/afs/db. The ptserver is pure userland code, so a process that
ignores kill -9 is probably stuck in a kernel call, most likely
file-system I/O.
In any case, I would separate the database servers from the fileservers
(keeping the IP addresses for the database servers, of course).
>
> The two problem servers are SGI Origins with 180 MHz IP27 processors,
> running IRIX 6.5 and OpenAFS 1.2.7. The third server, which is not
> showing this problem, is an i386 Linux box with a 1.8 GHz processor,
> running Debian woody and with OpenAFS 1.2.7 as well. The only way
> I know to solve the problem is to reboot the server with the
> hanging ptserver, which I'm usually reluctant to do, since
> the salvaging at boot time usually takes about 40 minutes on
> these machines, even after a clean shutdown. (After a power failure,
> salvaging sometimes takes about 4 hours.)
You should configure the build of your fileservers with
"--enable-fast-restart". This skips the salvage and lets your
fileservers come back immediately. If you really have a damaged volume,
it will probably go off-line by itself and you can salvage it later
without shutting down the fileserver. We have done this for years
without any bad experience.
Hartmut
>
> The SGI machines are rock and ice, and the Linux one is snow.
> Currently ice's ptserver is hung. Below are the results of udebug
> to the two running ptservers. Note that each server is voting for
> itself as lowest host, which is why no quorum results. The funny
> times for last vote and last beacon also seem to be part of the
> problem.
>
> -- Owen
> LeBlanc@mcc.ac.uk
>
> 'udebug rock 7002 -long' returns:
>
> Host's addresses are: 130.88.203.11
> Host's 130.88.203.11 time is Mon Oct 28 16:08:31 2002
> Local time is Mon Oct 28 16:08:31 2002 (time differential 0 secs)
> Last yes vote for 130.88.203.11 was 9 secs ago (not sync site);
> Last vote started 9 secs ago (at Mon Oct 28 16:08:22 2002)
> Local db version is 1035451571.4
> I am not sync site
> Lowest host 130.88.203.11 was set 5 secs ago
> Sync host 0.0.0.0 was set 32323 secs ago
> Sync site's db version is 1035451571.4
> 0 locked pages, 0 of them for write
> Last time a new db version was labelled was:
> 369740 secs ago (at Thu Oct 24 10:26:11 2002)
>
> Server (130.88.203.12): (db 1035451571.4)
> last vote rcvd 32369 secs ago (at Mon Oct 28 07:09:02 2002),
> last beacon sent 32338 secs ago (at Mon Oct 28 07:09:33 2002), last vote was yes
> dbcurrent=1, up=0 beaconSince=0
>
> Server (130.88.203.13): (db 1035451571.4)
> last vote rcvd 32353 secs ago (at Mon Oct 28 07:09:18 2002),
> last beacon sent 10 secs ago (at Mon Oct 28 16:08:21 2002), last vote was yes
> dbcurrent=1, up=0 beaconSince=0
>
> and 'udebug scree 7002 -long' returns:
>
> Host's addresses are: 130.88.203.13
> Host's 130.88.203.13 time is Mon Oct 28 16:08:42 2002
> Local time is Mon Oct 28 16:08:42 2002 (time differential 0 secs)
> Last yes vote for 130.88.203.13 was 1 secs ago (not sync site);
> Last vote started 1 secs ago (at Mon Oct 28 16:08:41 2002)
> Local db version is 1035451571.4
> I am not sync site
> Lowest host 130.88.203.13 was set 1 secs ago
> Sync host 0.0.0.0 was set 32364 secs ago
> Sync site's db version is 1035451571.4
> 0 locked pages, 0 of them for write
>
> Server (130.88.203.12): (db 0.0)
> last vote rcvd 441604 secs ago (at Wed Oct 23 14:28:38 2002),
> last beacon sent 32268 secs ago (at Mon Oct 28 07:10:54 2002), last vote was no
> dbcurrent=0, up=0 beaconSince=0
>
> Server (130.88.203.11): (db 0.0)
> last vote rcvd 1 secs ago (at Mon Oct 28 16:08:41 2002),
> last beacon sent 1 secs ago (at Mon Oct 28 16:08:41 2002), last vote was no
> dbcurrent=0, up=1 beaconSince=1
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
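For anyone who wants to compare several udebug reports mechanically rather than by eye, here is a minimal Python sketch. It only looks at line formats that appear in the output quoted above; the function names (summarize, diagnose, since) are made up for illustration and are not part of OpenAFS, and the two sample strings are trimmed copies of the quoted reports.

```python
import re
from datetime import datetime, timedelta

# Trimmed copies of the two reports quoted above (only the lines we parse).
ROCK = """\
Host's addresses are: 130.88.203.11
Host's 130.88.203.11 time is Mon Oct 28 16:08:31 2002
Lowest host 130.88.203.11 was set 5 secs ago
Sync host 0.0.0.0 was set 32323 secs ago
"""
SCREE = """\
Host's addresses are: 130.88.203.13
Host's 130.88.203.13 time is Mon Oct 28 16:08:42 2002
Lowest host 130.88.203.13 was set 1 secs ago
Sync host 0.0.0.0 was set 32364 secs ago
"""


def summarize(text):
    """Pull the quorum-related fields out of one udebug report."""
    def grab(pattern):
        m = re.search(pattern, text)
        return m.group(1) if m else None

    age = grab(r"Sync host \S+ was set (\d+) secs ago")
    return {
        "host": grab(r"Host's addresses are: (\S+)"),
        "time": grab(r"Host's \S+ time is (.+)"),
        "lowest": grab(r"Lowest host (\S+) was set"),
        "sync_host": grab(r"Sync host (\S+) was set"),
        "sync_age": int(age) if age is not None else None,
    }


def diagnose(reports):
    """List the symptoms visible in a set of summarized reports."""
    problems = []
    # Every server should name the same lowest host.
    if len({r["lowest"] for r in reports}) > 1:
        problems.append("servers disagree on lowest host")
    # A sync host of 0.0.0.0 means this server knows of no sync site.
    for r in reports:
        if r["sync_host"] == "0.0.0.0":
            problems.append(f"{r['host']} has no sync site")
    return problems


def since(now_str, secs):
    """Turn 'N secs ago' into a timestamp, given the reported host time."""
    now = datetime.strptime(now_str, "%a %b %d %H:%M:%S %Y")
    return now - timedelta(seconds=secs)


if __name__ == "__main__":
    reports = [summarize(ROCK), summarize(SCREE)]
    for line in diagnose(reports):
        print(line)
    for r in reports:
        print(r["host"], "lost its sync site at", since(r["time"], r["sync_age"]))
```

Run against the two reports, this flags the disagreement on lowest host and dates the loss of the sync site to shortly after 07:09 on both machines, which matches the "last vote rcvd" times in the reports.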
--
-----------------------------------------------------------------
Hartmut Reuter e-mail reuter@rzg.mpg.de
phone +49-89-3299-1328
RZG (Rechenzentrum Garching) fax +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------