[OpenAFS-devel] fileserver problem

Thomas Mueller thomas.mueller@hrz.tu-chemnitz.de
Mon, 5 Nov 2001 17:29:05 +0100 (MET)


On Mon, 29 Oct 2001, Thomas Mueller wrote:

>
> Hi all,
>
> Today we had an outage of a fileserver (i386_linux22) running Redhat 6.2
> and OpenAFS-1.2.2.
>
> Suddenly the load increased to 15 or even more. Clients stopped working.
>

Some new information regarding our fileserver problem; perhaps it will
help to track it down:

Today we had another outage and this time I got all the dumps.

Here is what I found:

# cbd  callback.dump
The time of the dump was 1004973095 Mon Nov  5 16:11:35 2001
The last time cleanup ran was 1004973184 Mon Nov  5 16:13:04 2001
0 add CB, 3463103 break CB, 2155605 del CB, 396325 del FE, 2507664 CB's timed out, 17546 space reclaim, 40425 del host
65000 CBs, 65000 FEs, (130000 of total of 65000 16-byte blocks)

Note: We start the fileserver with "-cb 65000".
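
The two epoch values in the dump header can be cross-checked against the printed dates. This is a minimal sketch, assuming the server clock runs on MET as a fixed UTC+1 offset (as the mail header suggests); the helper name fmt_met is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: MET is treated as a fixed UTC+1 offset (no DST in November).
MET = timezone(timedelta(hours=1))

def fmt_met(epoch_seconds):
    """Render an epoch timestamp as local MET wall-clock time."""
    return datetime.fromtimestamp(epoch_seconds, MET).strftime("%Y-%m-%d %H:%M:%S")

dump_time = 1004973095      # "The time of the dump was ..."
cleanup_time = 1004973184   # "The last time cleanup ran was ..."

print(fmt_met(dump_time))        # 2001-11-05 16:11:35
print(fmt_met(cleanup_time))     # 2001-11-05 16:13:04
print(cleanup_time - dump_time)  # 89
```

Both values agree with the dates printed by cbd, and the recorded cleanup time is 89 seconds later than the recorded dump time.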

# grep cbid hosts.dump | grep -v "cbid:0"
ip:5b3c6d86 port:22811 hidx:471 cbid:10310 lock:ffffffff last:1004973093 active:1004973093 down:0 del:0 cons:2 cldel:0
ip:4ba06d86 port:22811 hidx:943 cbid:28517 lock:ffffffff last:1004973094 active:1004973094 down:0 del:0 cons:2 cldel:32
ip:7c846d86 port:22811 hidx:291 cbid:20260 lock:ffffffff last:1004973094 active:1004973094 down:0 del:0 cons:2 cldel:32
ip:85286d86 port:22811 hidx:1163 cbid:33656 lock:ffffffff last:1004973094 active:1004973094 down:0 del:0 cons:2 cldel:32
ip:1e846d86 port:22811 hidx:134 cbid:39075 lock:ffffffff last:1004973094 active:1004973094 down:0 del:0 cons:2 cldel:32
ip:3ec86d86 port:22811 hidx:121 cbid:23947 lock:ffffffff last:1004973094 active:1004973094 down:0 del:0 cons:2 cldel:32
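
The ip and port fields in these records are easier to read once decoded. This is a minimal sketch, assuming both values are network-byte-order numbers printed as plain integers on a little-endian machine, so their bytes appear reversed. The port supports that assumption: 22811 is 0x591b, and byte-swapped it is 0x1b59 = 7001, the usual AFS client callback port. The function names are made up for illustration.

```python
import ipaddress

def parse_host_line(line):
    """Split one 'key:value key:value ...' record from hosts.dump into a dict."""
    return dict(field.split(":", 1) for field in line.split())

def decode_ip(hex_ip):
    """Decode the hex 'ip' field, assuming its bytes are printed in reversed order."""
    return str(ipaddress.IPv4Address(bytes.fromhex(hex_ip)[::-1]))

def decode_port(port):
    """Byte-swap the decimal 'port' field (same byte-order assumption as decode_ip)."""
    return int.from_bytes(int(port).to_bytes(2, "little"), "big")

line = ("ip:5b3c6d86 port:22811 hidx:471 cbid:10310 lock:ffffffff "
        "last:1004973093 active:1004973093 down:0 del:0 cons:2 cldel:0")
host = parse_host_line(line)

print(decode_ip(host["ip"]), decode_port(host["port"]), host["cbid"])
# 134.109.60.91 7001 10310
```

If the byte-order assumption does not hold for a given dump, the address would instead be read left to right from the hex digits.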

# cbd  -host 10310 callback.dump | wc -l
  64991
# cbd  -host 28517 callback.dump | wc -l
      1
# cbd  -host 20260 callback.dump | wc -l
      2
# cbd  -host 33656 callback.dump | wc -l
      3
# cbd  -host 39075 callback.dump | wc -l
      1
# cbd  -host 23947 callback.dump | wc -l
      2
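
The per-host counts above can be put in proportion to the callback pool. This is a minimal sketch, assuming each line printed by "cbd -host" corresponds to one callback; the counts are copied from the wc -l output and the pool size is the 65000 CBs reported in the dump header.

```python
POOL_SIZE = 65000  # "65000 CBs" from the dump header

# cbid -> line count reported by `cbd -host <cbid> callback.dump | wc -l`
lines_per_cbid = {
    "10310": 64991,
    "28517": 1,
    "20260": 2,
    "33656": 3,
    "39075": 1,
    "23947": 2,
}

total = sum(lines_per_cbid.values())
top = max(lines_per_cbid, key=lines_per_cbid.get)
share = lines_per_cbid[top] / POOL_SIZE

print(total)                  # 65000
print(top, f"{share:.2%}")    # 10310 99.99%
```

The counts add up to exactly 65000, which is consistent with the one-line-per-callback assumption, and the host with cbid 10310 accounts for all but 9 of them.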

It seems to me that one client (5b3c6d86) was able to consume nearly all
resources.

I will try to find out what the client was doing.

Bye,
Thomas.
-- 
-----------------------------------------------------------------------
Thomas Müller, TU Chemnitz, Universitätsrechenzentrum, D-09107 Chemnitz
mail: Thomas.Mueller@hrz.tu-chemnitz.de
-----------------------------------------------------------------------