[OpenAFS-devel] OpenAFS 1.2.3 client hangs on Linux - kernels 2.4.2 and 2.4.9

Touretsky, Gregory gregory.touretsky@intel.com
Tue, 5 Mar 2002 08:32:54 +0200


We run afsd with the following parameters:
-files 30000 -dcache 10000 -stat 8000 -daemons 5 -volumes 256

As for the blocked requests - I agree that new requests might be blocked,
but in this case AFS access hangs completely and never resumes.
Another thing - other commands that don't reside in AFS, such as strace and
ps, also hang once AFS becomes "frozen".
And there is no such problem when running IBM AFS 3.6 patch 4 in the same
configuration.

-- Gregory

-----Original Message-----
From: Derek Atkins [mailto:warlord@MIT.EDU]
Sent: Tuesday, March 05, 2002 12:07 AM
To: Touretsky, Gregory
Cc: 'openafs-devel@openafs.org'; Broughton, Travis V; Ervin, Douglas C;
Shamir, Yuval
Subject: Re: [OpenAFS-devel] OpenAFS 1.2.3 client hangs on Linux - kernels
2.4.2 and 2.4.9


There are multiple issues going on here.  The AFS client has a finite
number of channels that it can use to contact servers, and a finite
number of callback worker threads that can work on requests.  If all
the background daemons are busy, then future requests will block until
one becomes available.

What do you have for your -daemons setting to afsd?  You might try
increasing that number.
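
As a rough illustration of the blocking behaviour described above, here is
a toy Python model. It is not OpenAFS code: NUM_DAEMONS, REQUEST_TIME,
handle_request and run are names made up for this sketch, and the thread
pool only stands in for a fixed set of background daemons. It shows
requests queueing until a worker is free; it does not model a hang that
never resumes.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NUM_DAEMONS = 2      # hypothetical stand-in for a small, fixed worker pool
REQUEST_TIME = 0.2   # seconds each simulated request keeps a worker busy


def handle_request(req_id, started):
    """Simulate one request that occupies a worker for REQUEST_TIME seconds."""
    wait = time.monotonic() - started  # time spent queued before a worker was free
    time.sleep(REQUEST_TIME)
    return req_id, wait


def run(num_requests, num_workers=NUM_DAEMONS):
    """Submit all requests at once and return (request id, queueing delay) pairs."""
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(handle_request, i, started)
                   for i in range(num_requests)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for req_id, wait in run(6):
        print(f"request {req_id} waited {wait:.2f}s for a free worker")
```

With more requests than workers, the later requests report a queueing delay
of roughly one or more multiples of REQUEST_TIME. Raising num_workers
shortens that delay, which is the intuition behind trying a larger
-daemons value.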

-derek

"Touretsky, Gregory" <gregory.touretsky@intel.com> writes:

> Hi,
> 
>   While configuring a Linux machine as an NIS server, we found a strange
> problem - AFS hangs if there are several (4) instances of "pwck -r"
> running simultaneously. pwck verifies the integrity of /etc/passwd, and
> it stats all user home dirs. We have ~3000 unix accounts with home dirs
> in AFS (each home directory is a volume).
> I was able to reproduce this problem by running several instances (10+)
> of the following short script:
> #!/bin/tcsh
> # Usage: <command> <file with a long list of AFS mount points>
> foreach i (`cat $1`)
>     /bin/ls -ld $i
> end
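
For readers who would rather drive the concurrency from a single process,
the loop above can be approximated in Python. This is only a sketch under
assumptions: stat_paths, run_parallel and main are names invented here,
os.lstat stands in for "/bin/ls -ld", and threads stand in for the separate
script instances. Whether that produces the same load on the AFS client as
separate processes has not been verified.

```python
import os
import sys
from concurrent.futures import ThreadPoolExecutor


def stat_paths(paths):
    """lstat every path in order and return (ok, failed) counts."""
    ok = failed = 0
    for path in paths:
        try:
            os.lstat(path)
            ok += 1
        except OSError:
            failed += 1
    return ok, failed


def run_parallel(paths, instances=10):
    """Run `instances` concurrent passes over the same list of paths."""
    with ThreadPoolExecutor(max_workers=instances) as pool:
        return list(pool.map(stat_paths, [paths] * instances))


def main(argv):
    if len(argv) < 2:
        print("usage: <command> <file with list of AFS mount points> [instances]")
        return
    with open(argv[1]) as fh:
        paths = [line.strip() for line in fh if line.strip()]
    instances = int(argv[2]) if len(argv) > 2 else 10
    for i, (ok, failed) in enumerate(run_parallel(paths, instances)):
        print(f"instance {i}: {ok} ok, {failed} failed")


if __name__ == "__main__":
    main(sys.argv)
```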
> 
> The problem is reproducible on Linux 2.4.2 and 2.4.9 kernels with OpenAFS
> 1.2.3. I couldn't reproduce it on 2.4.2 with IBM AFS 3.6 patch 4.
> Here are the last lines from fstrace output:
> time 206.474904, pid 1262: Access vp 0xe0aa0000 mode 0x40 len 0x800 
> time 206.474904, pid 1262: Access vp 0xe0aa0000 mode 0x40 len 0x800 
> time 206.474904, pid 1262: Access vp 0xe0aa05b8 mode 0x40 len 0x1000 
> time 206.474904, pid 1262: Access vp 0xe0aa05b8 mode 0x40 len 0x1000 
> time 206.474904, pid 1262: Access vp 0xe0ad33b0 mode 0x40 len 0x800 
> time 206.474904, pid 1262: Access vp 0xe0ad3598 mode 0x40 len 0x800 
> time 206.484904, pid 1256: Analyze RPC op -1 conn 0xd3f5e6c0 code 0x0 user 0x0
> time 206.484904, pid 1256: Mount point is to vp 0xe0bb8938 fid (1:537094601.42.831)
> time 206.504904, pid 1265: Access vp 0xe0aa0000 mode 0x40 len 0x800 
> time 206.504904, pid 1265: Access vp 0xe0aa0000 mode 0x40 len 0
> 
> You can see that the last line is incomplete.
> 
> Any thoughts?
> 
> Gregory Touretsky
> Israel Engineering Computing
> Unix Server Platforms
> gregory.touretsky@intel.com
> (+) 972-4-865-6377, Fax: 04-865-5999
> iNET: 465-6377, M/S: IDC-1B
> 
> 
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord@MIT.EDU                        PGP key available