[OpenAFS] Crash testing OpenAFS

ted creedon tcreedon@easystreet.com
Mon, 15 Aug 2005 10:36:34 -0700


General comment:

At the Workshop the MIT presenter stated that AFS worked fine until it was
"placed under load".

In the process of copying files from a 1.2.11 box to a 1.3.87 box I ran into
the same problem with Linux-to-Linux copying. That's how all this started.

tedc

-----Original Message-----
From: chas williams - CONTRACTOR [mailto:chas@cmf.nrl.navy.mil] 
Sent: Monday, August 15, 2005 10:24 AM
To: ted creedon
Cc: 'Jeffrey Altman'; openafs-info@openafs.org
Subject: Re: [OpenAFS] Crash testing OpenAFS 

In message <20050815171201.52EC0C62A@smtpauth.easystreet.com>, "ted creedon"
writes:
>3. Copying the test set with empty files works fine. Files with data 
>crashes the destination 1.3.87 Linux box.

by "destination" you are referring to the afs fileserver containing the
destination afs volume?

>>Yes. 

>5. Crash means the Linux operating system crashes. Other xterm windows 
>do not respond, the system won't soft reboot and usually won't respond to
>ping. Hardware reset is required.

if my assertion is true, then you should not be running anything on the
console, this includes x-windows or any other pretty graphical gui.  if the
server crashes, then you will be able to see the panic/oops unless the
machine wedges in which case we have to try something else.
try the simple things first though.

>>Can do. I'll keep at runlevel 5 but kill X.

you keep saying your cache gets corrupted.  this leads me to think that your
afs client machine is crashing and not the afs file server.

>>I suspect the Linux AFS 1.3.87 client crashes on the 1.3.87 server but I
don't have the expertise to tell. The cache corruption may be caused by the
hard reset?

are the fileserver and client the same machine?
>>There are 2 client/fileservers: a 1.2.11 and a 1.3.87. The 1.3.87 box is
the destination that crashes.

how about that network diagram?
>>You should have it by now.
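
For anyone wanting to repeat the comparison from point 3 (empty files copy
fine, files with data crash the destination), a rough sketch of a reproducer
follows. It is not the test set used in this thread: the function names, file
counts and sizes are made up for illustration, and the demo copies between
temporary directories. In a real run the destination would be a directory
inside the AFS volume on the 1.3.87 box. Progress is flushed after every file
so the last line printed shows how far the copy got before a hang.

```python
import os
import shutil
import tempfile


def make_test_set(root, count, size):
    """Create `count` files of `size` bytes under `root` (size 0 = empty files)."""
    os.makedirs(root, exist_ok=True)
    for i in range(count):
        with open(os.path.join(root, "file%04d.dat" % i), "wb") as f:
            f.write(os.urandom(size))
    return root


def copy_and_verify(src, dst, verbose=False):
    """Copy every file from src to dst; return how many arrived with matching size."""
    os.makedirs(dst, exist_ok=True)
    ok = 0
    for name in sorted(os.listdir(src)):
        s = os.path.join(src, name)
        d = os.path.join(dst, name)
        shutil.copyfile(s, d)
        if os.path.getsize(s) == os.path.getsize(d):
            ok += 1
        if verbose:
            # Flush so the last file attempted is visible if the machine wedges.
            print("copied", name, flush=True)
    return ok


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    # Hypothetical sizes: 0 bytes for the "empty" run, 64 KB for the "data" run.
    for label, size in (("empty", 0), ("data", 64 * 1024)):
        src = make_test_set(os.path.join(base, label + "-src"), 20, size)
        # Assumption: replace this with a directory in the destination AFS volume.
        dst = os.path.join(base, label + "-dst")
        print(label, "files verified:", copy_and_verify(src, dst, verbose=True))
```

Running the empty pass first and the data pass second, from a text console
with X stopped, should leave any panic/oops visible on the screen.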