[OpenAFS] ProbeUuid for host failed

Ken Elkabany Ken@Elkabany.com
Mon, 2 Apr 2012 19:04:19 -0700


Hi,

We're seeing odd behavior on our AFS cluster, which has 2 fileservers and
200+ active clients. Each client generally reads a couple hundred of the
same files every minute, so those files should be cached. We have begun
seeing these messages in FileLog:

Tue Apr  3 00:32:04 2012 CB: ProbeUuid for host 0x--- (---:7001) failed -01

Over time these errors become more and more frequent. The problem is that
a client that hits this issue experiences a 5-10s delay in accessing a
file, which hurts performance significantly. The clients are 1.6pre1, and
the server is 1.4.14.
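
To put numbers on "more and more frequent", something like the following
rough sketch could bucket the failures by hour and by client address. It
assumes every relevant FileLog line looks like the sample above; the
function name probe_failures and the command-line handling are only
illustrative, and the path to FileLog is taken from the command line
because its location varies between installs.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Assumed line shape, taken from the single FileLog sample in this message:
#   Tue Apr  3 00:32:04 2012 CB: ProbeUuid for host 0x--- (---:7001) failed -01
PROBE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) CB: ProbeUuid for host "
    r"\S+ \((?P<addr>[^)]*)\) failed"
)


def probe_failures(lines):
    """Count ProbeUuid failures per hour and per client address."""
    per_hour = Counter()
    per_client = Counter()
    for line in lines:
        m = PROBE_RE.match(line)
        if not m:
            continue  # not a ProbeUuid failure line
        # Collapse the double space that pads single-digit days ("Apr  3").
        stamp = " ".join(m.group("ts").split())
        ts = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        per_hour[ts.strftime("%Y-%m-%d %H:00")] += 1
        per_client[m.group("addr")] += 1
    return per_hour, per_client


if __name__ == "__main__":
    # Usage (hypothetical script name): python probe_count.py /path/to/FileLog
    with open(sys.argv[1], errors="replace") as fh:
        hours, clients = probe_failures(fh)
    for hour in sorted(hours):
        print(hour, hours[hour])
    print("clients with the most failures:")
    for addr, count in clients.most_common(10):
        print(addr, count)
```

The per-hour counts show whether the rate really grows over time, and the
per-client counts show whether a few clients or most of them are affected.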

Using afsmonitor, I do see that one of the clients hitting this issue (I
haven't checked whether all clients have the problem, but many seem to) has
17M callbacks allocated. Could that be suspect? Are there any other
statistics I can provide to get to the bottom of this?

Here are the fileserver parameters in BosConfig:

parm /usr/lib/openafs/fileserver -L -p 200 -busyat 600 -rxpck 1000 -s 3000 -l 3000 -cb 1000000 -b 500 -vc 4800 -pctspare 5

Here are the client options:

OPTIONS="-cachedir /mnt/cache/openafs -daemons 16 -stat 50000 "

Best,
Ken
