[OpenAFS] Crash testing OpenAFS
chas williams - CONTRACTOR
chas@cmf.nrl.navy.mil
Mon, 15 Aug 2005 13:25:07 -0400
learn how to do a quoted reply. i can't read this gibberish.
In message <20050815171752.691B0C633@smtpauth.easystreet.com>,"ted creedon" writes:
>
>
>-----Original Message-----
>From: openafs-info-admin@openafs.org [mailto:openafs-info-admin@openafs.org]
>On Behalf Of chas williams - CONTRACTOR
>Sent: Monday, August 15, 2005 9:35 AM
>To: ted creedon
>Cc: openafs-info@openafs.org
>Subject: Re: [OpenAFS] Crash testing OpenAFS
>
>ted, please answer my questions.
>
>what is the network configuration?
>
>>>hiawatha.home.ted-doris.fam 10.1.1.190 running AFS 1.2.11 client and
>server Linux 2.6.11.. Hosting cell "bigcell"
>>>nome.home.ted-doris.fam 10.1.1.193 running AFS 1.3.87 client and server
>Linux 2.6.11..Hosting cell "home.ted-doris.fam"
>>>denali.home.ted-doris.fam 10.1.1.100 running AFS Windows Client Debug
>1.3.84 Win Server 2003
>>>nome and hiawatha are connected thru a Linksys EZXS55W generic 100base T
>switch. Full duplex.
>>>Internal class C network with no firewalling.
>>> cp -rpvf /afs/.bigcell/bar2 /afs/.home.ted-doris.fam/bar2 run on nome
>crashes nome consistently. Where bar2 is a volume not a local directory.
>
>what do you mean by crash? this has never been clear to me. does the
>openafs client box (the one running the cp -rpv) lockup?
>
>>>"Crash on nome " means Linux Operating system crash on "nome". I.e.
>"lockup". No response to keyboard, mouse or ssh. Sometimes ping 10.1.1.193
>produces a reply, sometimes not.
>>>"Crash on denali" means Windows server does not respond (this is not an issue
>with 1.3.87, just a comment that whatever caused it in 1.3.84 has been
>corrected).
>
> does it simply stop copying?
>>>Yes. And there are no rx packets from hiawatha, hiawatha is unaffected.
>
> if it locks up are there messages on the console that say something about
>an "oops"?
>
>>>No messages on xterm or in /var/log/messages.
>
>what happens when you copy the generated directory tree from one local
>volume to another local volume (both source and destination volumes are
>located on your local afs fileserver running 1.3.87 on your local network)?
>
>In message <20050815161440.3390EC2C7@smtpauth.easystreet.com>,"ted creedon" writes:
>>You are correct, originally I created 1 meg files and reduced them in size.
>>
>>Not only does this directory crash Linux but 2 others do too.
>>
>>Linux does not crash when copying from the 1.2.11 fileserver to
>>/root/filename on the 1.3.87 client.
>>
>>Suggest using a loopback filesystem or manual mount for /usr/vice/cache
>>to prevent problems on reboot.
>>
>>tedc
>>
>>-----Original Message-----
>>From: openafs-info-admin@openafs.org
>>[mailto:openafs-info-admin@openafs.org]
>>On Behalf Of chas williams - CONTRACTOR
>>Sent: Monday, August 15, 2005 8:55 AM
>>To: ted creedon
>>Cc: openafs-info@openafs.org
>>Subject: Re: [OpenAFS] Crash testing OpenAFS
>>
>>In message <20050815150504.4AAF3BF44@smtpauth.easystreet.com>,"ted creedon" writes:
>>>ftp://creedon.dhs.org/afs_stress_test/run0/
>>>ftp://creedon.dhs.org/afs_stress_test/run1
>>
>>i recreated your test directory tree locally. i am puzzled about a few
>>things though. for instance:
>>
>> #!/bin/bash
>> #set -x #if one is curious..
>> dd if=/dev/zero of=1meg bs=256K count=1
>> cp 1meg "./TESTDIR.TMP"
>> cp 1meg "./ADAPTEC/ACMWrapperServer.A021.dll"
>> cp 1meg "./ADAPTEC/ACMWrapperServer.A884.dll"
>> cp 1meg "./ADAPTEC/CdCopier.A021.exe"
>>
>>i would hazard that this is creating 256k files, not 1M files.
>>the total volume size, after running ./mkdirs, ./mkfiles, ./mk1megfiles
>>was about 5.6G. is this correct?
>>
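The size question above is easy to check with dd itself. What follows is a minimal sketch, assuming GNU dd (where the K and M suffixes mean 1024 and 1024*1024 bytes); the scratch directory and the file names quarter, one_a and one_b are made up for illustration and are not part of the original test scripts.

```shell
#!/bin/bash
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d)

# Same invocation as the quoted script: a single 256K block.
dd if=/dev/zero of="$tmp/quarter" bs=256K count=1 2>/dev/null

# Two ways to get a true 1M file: one 1M block, or four 256K blocks.
dd if=/dev/zero of="$tmp/one_a" bs=1M count=1 2>/dev/null
dd if=/dev/zero of="$tmp/one_b" bs=256K count=4 2>/dev/null

# wc -c reports each file's size in bytes.
size_quarter=$(wc -c < "$tmp/quarter")
size_one_a=$(wc -c < "$tmp/one_a")
size_one_b=$(wc -c < "$tmp/one_b")

echo "bs=256K count=1: $size_quarter bytes"
echo "bs=1M   count=1: $size_one_a bytes"
echo "bs=256K count=4: $size_one_b bytes"

# Clean up the scratch directory.
rm -rf "$tmp"
```

The file size is bs multiplied by count, so the quoted command writes 262144 bytes, a quarter of the 1048576 bytes that the name "1meg" suggests.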
>>i was able to copy this tree from one volume to another on a different
>>server (within our local afs cell). the servers are amd64_solaris10
>>running openafs 1.3.81. the afs client machine which did the create
>>and subsequent copy, was i386_2.6.13-rc3 running openafs 1.3.87.
>>
>>your tcpdump leads me to believe that at least part of these tests is
>>behind a NAT. is this true? further, the tcpdump from run1 looks
>>incomplete. the end of the dump still seems to show data transfer.
>>
>>the fstrace output from run0 is useless. you need to install the
>>afszcm.cat in order to get something human readable.
>>
>>cmdebug from run0 looks unremarkable. the client does not appear to
>>be wedged in any way.
>>
>>conclusions: i would guess that the 1.3.87 openafs client is stable.
>>perhaps you could try building and running an older set of afs
>>server binaries, say 1.3.81.
>>_______________________________________________
>>OpenAFS-info mailing list
>>OpenAFS-info@openafs.org
>>https://lists.openafs.org/mailman/listinfo/openafs-info