[OpenAFS] Crash testing OpenAFS

chas williams - CONTRACTOR chas@cmf.nrl.navy.mil
Mon, 15 Aug 2005 13:25:07 -0400


learn how to do a quoted reply.  i can't read this gibberish.

In message <20050815171752.691B0C633@smtpauth.easystreet.com>,"ted creedon" writes:
> 
>
>-----Original Message-----
>From: openafs-info-admin@openafs.org [mailto:openafs-info-admin@openafs.org]
>On Behalf Of chas williams - CONTRACTOR
>Sent: Monday, August 15, 2005 9:35 AM
>To: ted creedon
>Cc: openafs-info@openafs.org
>Subject: Re: [OpenAFS] Crash testing OpenAFS 
>
>ted, please answer my questions.
>
>what is the network configuration?
>
>>>hiawatha.home.ted-doris.fam 10.1.1.190 running AFS 1.2.11 client and
>server on Linux 2.6.11. Hosting cell "bigcell"
>>>nome.home.ted-doris.fam 10.1.1.193 running AFS 1.3.87 client and server
>on Linux 2.6.11. Hosting cell "home.ted-doris.fam"
>>>denali.home.ted-doris.fam 10.1.1.100 running AFS Windows Client Debug
>1.3.84 on Windows Server 2003
>>>nome and hiawatha are connected through a Linksys EZXS55W generic 100BASE-T
>switch. Full duplex.
>>>Internal class C network with no firewalling.
>>> cp -rpvf /afs/.bigcell/bar2 /afs/.home.ted-doris.fam/bar2 run on nome
>crashes nome consistently. Here bar2 is a volume, not a local directory.
>
>what do you mean by crash?  this has never been clear to me.  does the
>openafs client box (the one running the cp -rpv) lock up? 
>
>>>"Crash on nome" means a Linux operating system crash on "nome", i.e. a
>"lockup": no response to keyboard, mouse or ssh. Sometimes ping 10.1.1.193
>produces a reply, sometimes not.
>>>"Crash on denali" means the Windows server does not respond (this is not an
>issue with 1.3.87, just a comment that whatever caused it in 1.3.84 has been
>corrected).
>
> does it simply stop copying? 
>>>Yes. And there are no rx packets from hiawatha; hiawatha itself is unaffected.
>
> if it locks up are there messages on the console that say something about
>an "oops"?
>
>>>No messages on xterm or in /var/log/messages.
>
>what happens when you copy the generated directory tree from one local
>volume to another local volume (both source and destination volumes are
>located on your local afs fileserver running 1.3.87 on your local network).
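The local volume-to-volume copy asked about above can be scripted so that a copy that silently stops is distinguishable from one that completed. A minimal sketch, assuming bash and GNU coreutils; `copy_and_verify` and the demo paths are hypothetical, and on nome SRC and DST would instead be two volumes served by the local 1.3.87 fileserver:

```shell
#!/bin/bash
# Sketch of the volume-to-volume copy test. SRC/DST are placeholders: on
# ted's setup they would be two volumes under /afs/.home.ted-doris.fam,
# both served by the local 1.3.87 fileserver.
set -euo pipefail

# copy_and_verify is a hypothetical helper, not part of OpenAFS.
copy_and_verify() {
    local src=$1 dst=$2
    mkdir -p "$dst"
    # Same flags as the failing run minus -v; "src/." copies the contents
    # of src into dst rather than creating dst/src.
    cp -rpf "$src"/. "$dst"/
    # diff -r exits 0 only if both trees hold the same files and contents.
    if diff -r "$src" "$dst" >/dev/null; then
        echo "copy complete: $(( $(find "$dst" -type f | wc -l) )) files"
    else
        echo "copy differs: $src vs $dst" >&2
        return 1
    fi
}

# Demo on a throwaway local tree (not AFS), reusing one name from the test set.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/src/ADAPTEC"
dd if=/dev/zero of="$demo/src/ADAPTEC/CdCopier.A021.exe" bs=256K count=1 2>/dev/null
copy_and_verify "$demo/src" "$demo/dst"
```

Comparing the trees with `diff -r`, rather than trusting cp's exit status alone, also shows which files are missing if a run stops partway.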
>
>In message <20050815161440.3390EC2C7@smtpauth.easystreet.com>,"ted creedon" writes:
>>You are correct; originally I created 1 meg files and then reduced them in size.
>>
>>Not only does this directory crash Linux but 2 others do too.
>>
>>Linux does not crash when copying from the 1.2.11 fileserver to 
>>/root/filename on the 1.3.87 client.
>>
>>Suggest using a loopback filesystem or manual mount for /usr/vice/cache 
>>to prevent problems on reboot.
>>
>>tedc
>>
>>-----Original Message-----
>>From: openafs-info-admin@openafs.org [mailto:openafs-info-admin@openafs.org]
>>On Behalf Of chas williams - CONTRACTOR
>>Sent: Monday, August 15, 2005 8:55 AM
>>To: ted creedon
>>Cc: openafs-info@openafs.org
>>Subject: Re: [OpenAFS] Crash testing OpenAFS
>>
>>In message <20050815150504.4AAF3BF44@smtpauth.easystreet.com>,"ted creedon" writes:
>>>ftp://creedon.dhs.org/afs_stress_test/run0/
>>>ftp://creedon.dhs.org/afs_stress_test/run1
>>
>>i recreated your test directory tree locally.  i am puzzled about a few 
>>things though.  for instance:
>>
>>	#!/bin/bash
>>	#set -x #if one is curious..
>>	dd if=/dev/zero of=1meg bs=256K count=1
>>	cp 1meg "./TESTDIR.TMP"
>>	cp 1meg "./ADAPTEC/ACMWrapperServer.A021.dll"
>>	cp 1meg "./ADAPTEC/ACMWrapperServer.A884.dll"
>>	cp 1meg "./ADAPTEC/CdCopier.A021.exe"
>>
>>i would hazard that this is creating 256k files, not 1M files.
>>the total volume size, after running ./mkdirs, ./mkfiles, ./mk1megfiles 
>>was about 5.6G.  is this correct?
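The point about the dd line can be checked directly: dd writes `count` blocks of `bs` bytes, so the output size is bs × count. A minimal sketch, assuming GNU dd (where K is 1024 bytes and M is 1048576 bytes); the file names `256k` and `1meg_alt` are hypothetical:

```shell
#!/bin/bash
# dd writes `count` blocks of `bs` bytes, so the file size is bs * count.
# With GNU dd, K means 1024 bytes and M means 1048576 bytes.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# As in the quoted script: 1 block of 256 KiB -> 262144 bytes.
dd if=/dev/zero of=256k bs=256K count=1 2>/dev/null

# Two ways to get a real 1 MiB (1048576-byte) file.
dd if=/dev/zero of=1meg bs=1M count=1 2>/dev/null
dd if=/dev/zero of=1meg_alt bs=256K count=4 2>/dev/null

for f in 256k 1meg 1meg_alt; do
    # $(( )) strips any padding that some wc implementations add.
    printf '%s %d\n' "$f" "$(( $(wc -c < "$f") ))"
done
```

On Linux this prints 262144 for `256k` and 1048576 for the other two, so either `bs=1M count=1` or `bs=256K count=4` would produce the 1 MiB files the script's name suggests.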
>>
>>i was able to copy this tree from one volume to another on a different 
>>server (within our local afs cell).  the servers are amd64_solaris10 
>>running openafs 1.3.81.  the afs client machine which did the create 
>>and subsequent copy was i386_2.6.13-rc3 running openafs 1.3.87.
>>
>>your tcpdump leads me to believe that at least part of these tests is 
>>behind a NAT.  is this true?  further, the tcpdump from run1 looks 
>>incomplete.  the end of the dump still seems to show data transfer.
>>
>>the fstrace output from run0 is useless.  you need to install the 
>>afszcm.cat in order to get something human readable.
>>
>>cmdebug from run0 looks unremarkable.  the client does not appear to 
>>be wedged in any way.
>>
>>conclusions:  i would guess that the 1.3.87 openafs client is stable.
>>perhaps you could try building and running an older set of afs 
>>server binaries, say 1.3.81.
>>_______________________________________________
>>OpenAFS-info mailing list
>>OpenAFS-info@openafs.org
>>https://lists.openafs.org/mailman/listinfo/openafs-info