Jose Calhariz jose.calhariz@tagus.ist.utl.pt
Mon, 13 Nov 2006 16:29:42 +0000

On Sun, Nov 12, 2006 at 03:46:35PM -0500, Derrick J Brashear wrote:
> On Sun, 12 Nov 2006, Jose Calhariz wrote:
> >I can confirm that it can happen with Linux and a recent client.
> >
> >I am testing the stability of openafs 1.4.2 on Linux, both in the
> >client and in the server.  So I have a Linux client endlessly compiling
> >the Linux kernel using make -j 35 over an afs volume on a Linux
> >fileserver.  In the past I have run this test for days without
> >problems.  This time the compilation aborts with errors, which is normally
> >an indication of hardware problems or kernel/software problems.
> >
> >The last error I have seen was:
> >
> >ar: sound/drivers/opl4/built-in.o: File format not recognized
> >
> >On the client there is no indication of a problem with AFS, but on the
> >fileserver I saw the following error some time before the compilation
> >aborted:
> >
> >Sun Nov 12 14:05:05 2006 FindClient: stillborn client 8185ef0(785cb670);conn 8149220 (host had client 8185f98(785cb670)
> >
> >So, in my view, this kind of error can result in data corruption.
> Except that code path can't cause a corrupted file. It may be related, but
> that error message (in the fileserver) is not the cause of that client
> problem.

In my tests the compilation sometimes aborts because of a timeout
communicating with the fileserver; this usually happens during a vos
backupsys of all volumes.

When looking for errors on the fileserver, I saw "FindClient: stillborn
client" in some of these cases.  Is it possible for this error to happen
when a client is hitting a fileserver very hard with reads and writes?

What can I do to pinpoint the cause of the problem?

I worry that this problem could hit my production clients and servers if I
upgrade to 1.4.2; they currently run 1.3.81, 1.4.0, and 1.4.1.

> Derrick

    José Calhariz


If you remember that a problem exists, you will certainly be put in charge of

--Isu Fang

