[OpenAFS] Rotating log files and server probs

Thomas Mueller thomas.mueller@hrz.tu-chemnitz.de
Mon, 30 Sep 2002 07:37:35 +0200 (MEST)


On Fri, 27 Sep 2002, Derrick J Brashear wrote:

> On 27 Sep 2002, Mitchell D. Baker wrote:
>
>
> > Seems that at arbitrary times during the day.. the fileserver process
> > on one of the servers (same server each time) jumps to 95%+ CPU and the
> > load on the system starts to rise.. If we catch it in time, we MAY be
> > able to issue a bos restart and calm things down.. and the system will
> > run for a while longer.. if we don't get to it in time then all of the
> > AFS cell locks up and we have to just reboot one or more servers...
> >
> > Anyone got a clue as to why this would be happening? Where to look for
> > the prob?
>
> Perhaps one of the people whose servers were running out of callbacks
> could tell us if the "looping trying to free a callback slot" behaves like
> this. There's a fix in 1.2.7 but the right answer is to fix
> viced/callback.c to not use u_short for storage of the callback index. No
> one has gotten to it yet.

You can easily figure out if you have this callback problem.
If your fileserver starts to consume all CPU cycles you may raise the
debug level by

kill -TSTP <pid_of_fileserver>

(perhaps you should repeat this two or three times)

After that you will see thousands of messages like

... Delete longest inactive host ...

in your /usr/afs/logs/FileLog.
If you don't see such messages you must have a different problem.
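
A minimal sketch of that check, assuming a POSIX shell and that the
message text appears in the log exactly as quoted above. The function
name count_callback_messages is hypothetical (it is not an OpenAFS
tool); the default path is the FileLog location mentioned above.

```shell
# Count the FileLog lines that contain the callback message.
# count_callback_messages is a made-up helper name, not part of OpenAFS.
count_callback_messages() {
    # Use the log path given as first argument, else the default FileLog.
    log_file="${1:-/usr/afs/logs/FileLog}"
    # -c prints the number of matching lines, -F matches the text literally.
    # grep exits non-zero when nothing matches, so ignore its exit status.
    grep -c -F -- 'Delete longest inactive host' "$log_file" || true
}
```

After raising the debug level, run count_callback_messages with no
argument. A count in the thousands points to the callback problem; a
count of 0 suggests a different cause.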

Thomas.
-- 
-------------------------------------------------
Thomas Müller, TU Chemnitz, URZ, D-09107 Chemnitz
Tel: +49 (0)371 5311755   Fax: +49 (0)371 5311629
-------------------------------------------------