FW: [OpenAFS] Crash testing OpenAFS

ted creedon tcreedon@easystreet.com
Mon, 15 Aug 2005 10:41:08 -0700

This is what was sent from my "sent files".

There are no >>>'s! >> is my reply to your >.


-----Original Message-----
From: ted creedon [mailto:tcreedon@easystreet.com] 
Sent: Monday, August 15, 2005 10:18 AM
To: 'chas williams - CONTRACTOR'
Cc: 'openafs-info@openafs.org'
Subject: RE: [OpenAFS] Crash testing OpenAFS 


-----Original Message-----
From: openafs-info-admin@openafs.org [mailto:openafs-info-admin@openafs.org]
On Behalf Of chas williams - CONTRACTOR
Sent: Monday, August 15, 2005 9:35 AM
To: ted creedon
Cc: openafs-info@openafs.org
Subject: Re: [OpenAFS] Crash testing OpenAFS 

ted, please answer my questions.

what is the network configuration?

>>hiawatha.home.ted-doris.fam: AFS 1.2.11 client and server, Linux 2.6.11.
>>  Hosts cell "bigcell".
>>nome.home.ted-doris.fam: AFS 1.3.87 client and server, Linux 2.6.11.
>>  Hosts cell "home.ted-doris.fam".
>>denali.home.ted-doris.fam: AFS Windows Client Debug 1.3.84,
>>  Windows Server 2003.
>>nome and hiawatha are connected through a Linksys EZXS55W generic
>>100BASE-T switch, full duplex.
>>Internal class C network with no firewalling.
>>Running "cp -rpvf /afs/.bigcell/bar2 /afs/.home.ted-doris.fam/bar2" on
>>nome crashes nome consistently. bar2 is a volume, not a local directory.

what do you mean by crash?  this has never been clear to me.  does the
openafs client box (the one running the cp -rpv) lock up?

>>"Crash on nome" means a Linux operating system crash on nome, i.e. a
>>lockup: no response to keyboard, mouse or ssh. Sometimes ping gets a
>>reply, sometimes not.
>>"Crash on denali" means the Windows server does not respond (this is not an
>>issue with 1.3.87, just a comment that whatever caused it in 1.3.84 has been

does it simply stop copying?
>>Yes. There are no rx packets from hiawatha, and hiawatha itself is
>>unaffected.

if it locks up, are there messages on the console that say something
about an "oops"?

>>No messages on xterm or in /var/log/messages.

what happens when you copy the generated directory tree from one local
volume to another local volume (both source and destination volumes
located on your local afs fileserver running 1.3.87 on your local network)?
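A hypothetical sketch of the local-to-local test suggested above, as far as the thread implies: it keeps hiawatha's 1.2.11 server out of the copy path. Everything not named in the thread is an assumption: the partition /vicepa, the volume names test.src (assumed already created, filled with the generated tree and mounted) and test.dst, and that the caller holds AFS admin tokens. It uses the standard OpenAFS `vos create` and `fs mkmount` commands, and defaults to a dry run that only prints them.

```shell
#!/bin/bash
# Hypothetical sketch of the local-volume-to-local-volume copy test.
# ASSUMPTIONS (not from the thread): partition /vicepa, volume test.src
# already holding the generated tree and mounted at $CELLRW/test.src,
# volume test.dst created here, and AFS admin tokens already obtained.
# Defaults to a dry run that only prints commands; set DRY_RUN=0 to execute.
set -euo pipefail

SERVER=${SERVER:-nome.home.ted-doris.fam}    # the local 1.3.87 fileserver
PART=${PART:-/vicepa}                        # hypothetical partition
CELLRW=${CELLRW:-/afs/.home.ted-doris.fam}   # read-write path of the cell

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Create an empty destination volume on the same local fileserver.
run vos create -server "$SERVER" -partition "$PART" -name test.dst
# 2. Mount it in the cell's read-write tree.
run fs mkmount -dir "$CELLRW/test.dst" -vol test.dst
# 3. Copy volume to volume; no 1.2.11 server is involved.
run cp -rpv "$CELLRW/test.src/." "$CELLRW/test.dst/"
```

Running it with DRY_RUN=0 on nome performs the copy; if nome still locks up, the 1.3.87 client and server on nome are the remaining suspects.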

In message <20050815161440.3390EC2C7@smtpauth.easystreet.com>,"ted creedon"
>You are correct; originally I created 1 meg files and reduced them in size.
>Not only does this directory crash Linux, but 2 others do too.
>Linux does not crash when copying from the 1.2.11 fileserver to
>/root/filename on the 1.3.87 client.
>Suggest using a loopback filesystem or manual mount for /usr/vice/cache 
>to prevent problems on reboot.
>-----Original Message-----
>From: openafs-info-admin@openafs.org
>On Behalf Of chas williams - CONTRACTOR
>Sent: Monday, August 15, 2005 8:55 AM
>To: ted creedon
>Cc: openafs-info@openafs.org
>Subject: Re: [OpenAFS] Crash testing OpenAFS
>In message <20050815150504.4AAF3BF44@smtpauth.easystreet.com>,"ted creedon"
>i recreated your test directory tree locally.  i am puzzled about a few 
>things though.  for instance:
>	#!/bin/bash
>	#set -x #if one is curious..
>	dd if=/dev/zero of=1meg bs=256K count=1
>	cp 1meg "./TESTDIR.TMP"
>	cp 1meg "./ADAPTEC/ACMWrapperServer.A021.dll"
>	cp 1meg "./ADAPTEC/ACMWrapperServer.A884.dll"
>	cp 1meg "./ADAPTEC/CdCopier.A021.exe"
>i would hazard that this is creating 256k files, not 1M files.
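A minimal sketch that checks the block-size arithmetic: `bs=256K count=1` writes a single 256 KiB block (262144 bytes), and keeping the same block size while raising count to 4 gives 1 MiB. The block size is spelled out in bytes so the sketch does not depend on dd's suffix handling; the file names are illustrative only.

```shell
#!/bin/bash
# Sketch: compare the file the quoted script writes with a true 1 MiB file.
# The file names "quarter" and "1meg" are illustrative only.
set -euo pipefail

# Equivalent to the quoted "bs=256K count=1": one 256 KiB block.
dd if=/dev/zero of=quarter bs=262144 count=1 2>/dev/null

# Four 256 KiB blocks = 4 * 262144 = 1048576 bytes = 1 MiB.
dd if=/dev/zero of=1meg bs=262144 count=4 2>/dev/null

# wc -c reports the byte count; tr strips the padding some wc versions add.
quarter_size=$(wc -c < quarter | tr -d ' ')
size=$(wc -c < 1meg | tr -d ' ')
echo "quarter: $quarter_size bytes, 1meg: $size bytes"
```

Since ted later says the files were deliberately reduced from 1 MB, the 256 KiB size may be intended; the sketch only confirms what the command produces.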
>the total volume size, after running ./mkdirs, ./mkfiles, ./mk1megfiles
>was about 5.6G.  is this correct?
>i was able to copy this tree from one volume to another on a different 
>server (within our local afs cell).  the servers are amd64_solaris10 
>running openafs 1.3.81.  the afs client machine which did the create 
>and subsequent copy, was i386_2.6.13-rc3 running openafs 1.3.87.
>your tcpdump leads me to believe that at least part of these tests is
>behind a NAT.  is this true?  further, the tcpdump from run1 looks 
>incomplete.  the end of the dump still seems to show data transfer.
>the fstrace output from run0 is useless.  you need to install the 
>afszcm.cat in order to get something human readable.
>cmdebug from run0 looks unremarkable.  the client doesn't appear to
>be wedged in any way.
>conclusions:  i would guess that the 1.3.87 openafs client is stable.
>perhaps you could try building and running an older set of afs
>server binaries, say 1.3.81.
>OpenAFS-info mailing list