[OpenAFS-devel] Misbehaviour of afs client on 4-processor server, Kernel 2.4.0

Herbert Huber Herbert.Huber@lrz-muenchen.de
Fri, 06 Apr 2001 14:48:58 +0200


Karl Lehnberger wrote:

> Hello everyone,
>
> I'm not able to get the afs client working properly on a 4-processor
> server with 16GB memory, kernel 2.4.0 (RedHat), openafs-1.0.3.
> Loading the module, starting the afsd's and mounting /afs seem to be
> o.k. Also roaming through afs directories (cd, ls) is possible.
> But trying to access files from /afs fails:
> 12:54> /afs/ipp/@sys/bin/tokens
> /afs/ipp/@sys/bin/tokens: Bad address
>
> fstrace shows the following:
>
> AFS Trace Dump -
> ...
>
> time 818.372711, pid 1718: Access vp 0xf8922000 mode 0x40 len 0x2800
> time 818.372711, pid 1718: Access vp 0xf8922000 mode 0x40 len 0x2800
> time 818.372711, pid 1718: Lookup adp 0xf8922000 name ipp-garching.mpg.de fid (1:536870913.2.202), code=0
> time 818.372711, pid 1718: Mount point is to vp 0xf89223b8 fid (1:536870913.2.202)
> time 818.372711, pid 1718: Access vp 0xf8922594 mode 0x40 len 0x1800
> time 818.372711, pid 1718: Lookup adp 0xf8922594 name i386_linux24 fid (1:536870916.274.83895), code=0
> time 818.372711, pid 1718: Mount point is to vp 0xf8922770 fid (1:536870916.274.83895)
> time 818.372711, pid 1718: Access vp 0xf892294c mode 0x40 len 0x800
> time 818.372711, pid 1718: Access vp 0xf892294c mode 0x40 len 0x800
> time 818.372711, pid 1718: Access vp 0xf8922594 mode 0x40 len 0x1800
> time 818.372711, pid 1718: Lookup adp 0xf8922594 name i386_linux22 fid (1:536870916.182.77612), code=0
> time 818.372711, pid 1718: Mount point is to vp 0xf8922d04 fid (1:536870916.182.77612)
> time 818.372711, pid 1718: Access vp 0xf8922ee0 mode 0x40 len 0x800
> time 818.372711, pid 1718: Access vp 0xf8925d5c mode 0x40 len 0x6000
> time 818.372711, pid 1718: Lookup adp 0xf8925d5c name tokens fid (1:536923523.1482.1352), code=0
> time 818.372711, pid 1718: Access vp 0xf8925d5c mode 0x40 len 0x6000
> time 818.372711, pid 1718: Lookup adp 0xf8925d5c name AFSWS fid (1:536923523.50.278), code=0
> time 818.372711, pid 1718: Access vp 0xf8925d5c mode 0x40 len 0x6000
> time 818.372711, pid 1718: Access vp 0xf8925d5c mode 0x40 len 0x6000
> time 818.372711, pid 1718: Access vp 0xf8922ee0 mode 0x40 len 0x800
> time 818.372711, pid 1718: Access vp 0xf8928db4 mode 0x40 len 0x800
> time 818.372711, pid 1718: Lookup adp 0xf8928db4 name afsws fid (1:536923523.10.9694), code=0
> time 818.372711, pid 1718: Access vp 0xf8928db4 mode 0x40 len 0x800
> time 818.372711, pid 1718: Lookup adp 0xf8928db4 name afsws.open fid (1:536923523.4092.9692), code=0
> time 818.372711, pid 1718: Mount point is to vp 0xf892916c fid (1:536923523.4092.9692)
> time 818.372711, pid 1718: Access vp 0xf8929348 mode 0x40 len 0x800
> time 818.372711, pid 1718: Lookup adp 0xf8929348 name bin fid (1:536951524.3.4), code=0
> time 818.372711, pid 1718: Access vp 0xf8929524 mode 0x40 len 0x800
> time 818.372711, pid 1718: Lookup adp 0xf8929524 name tokens fid (1:536951524.798.5422), code=0
> time 818.372711, pid 1718: Access vp 0xf8929700 mode 0x40 len 0x165b0
> time 818.372711, pid 1718: Open 0xf8929700 flags 0x0
> time 818.372711, pid 1718: Open 0xf8929700 flags 0xf423f
> time 818.372711, pid 1718: Iread ip 0xf8929700 pos 0x0 count 0x80 code 1869f
> time 818.372711, pid 1718: Ireadpage ip 0xf8929700 pp 0xd1e7c750 count 0x5 code 1869f
> time 818.372711, pid 1718: Read vp 0xf8929700 off 0x0 resid 0x1000 file length 0x165b0
> time 818.372711, pid 1718: Returning code 14 from 13
> time 818.372711, pid 1718: Ireadpage ip 0xf8929700 pp 0xd1e7c750 count 0x5 code e
> time 818.372711, pid 1718: Iread ip 0xf8929700 pos 0x0 count 0x80 code fffffff2
> time 818.372711, pid 1718: Close 0xf8929700 flags 0x0
> time 834.602711, pid 1721: Access vp 0xf8922000 mode 0x40 len 0x2800
> time 834.602711, pid 1721: Access vp 0xf8922000 mode 0x40 len 0x2800
> time 834.602711, pid 1721: Lookup adp 0xf8922000 name ipp-garching.mpg.de fid (1:536870913.2.202), code=0
> ...
>
> The same kernel, same afs client version on a single cpu machine doesn't
> cause this behaviour.
> Perhaps somebody has faced a similar problem and knows a solution.
>
> ---
> Thank you in advance,
> Karl
>
> -----------------------------------------------------------------
> Karl Lehnberger                      e-mail lehnberger@rzg.mpg.de
>                                            phone +49-89-3299-2556
> RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301
> Computing Center of the Max-Planck-Gesellschaft (MPG) and the
> Institut fuer Plasmaphysik (IPP)
> -----------------------------------------------------------------
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo.cgi/openafs-devel

Yes, we had the same problem, and it still does not seem to be fixed in the stable OpenAFS release.

The source of your problem is the different memory mapping algorithm of Linux kernels with High Memory
support enabled.
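
As a side note, the last codes in the quoted trace ("Returning code 14", "code e" and
"code fffffff2") all appear to be the same errno, which matches the "Bad address" message
printed by the shell. The following is only a small sketch for decoding such values; it
assumes Linux errno numbering and 32-bit return codes, and the helper name
decode_trace_code is made up for this illustration, not part of OpenAFS or fstrace.

```python
import errno
import os


def decode_trace_code(hex_code: str) -> int:
    """Turn a hex code from the trace into a positive errno number.

    Assumption: the value is a 32-bit quantity, so values with the top
    bit set are negative kernel return codes (-errno).
    """
    value = int(hex_code, 16)
    if value >= 0x80000000:      # reinterpret as signed 32-bit
        value -= 0x100000000
    return abs(value)


if __name__ == "__main__":
    # "e" and "fffffff2" are taken from the Ireadpage/Iread lines above.
    for raw in ("e", "fffffff2"):
        num = decode_trace_code(raw)
        name = errno.errorcode.get(num, "unknown")
        print(f"{raw}: errno {num} ({name}): {os.strerror(num)}")
```

Both values decode to errno 14 (EFAULT), i.e. the read path is handing back a bad-address
error rather than failing on the network or on permissions.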

Chas Williams has kindly provided us with binaries which have been running very stably at our site for
weeks now. If you want to give these binaries a try, please contact me.

Regards

Herbert

--
Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften
Abteilung Rechensysteme, Gruppe Hochleistungssysteme
Dr. Herbert Huber (Tel. +49 89 289 28833, Fax +49 89 2809460)
Barer Strasse 21
D-80333 Muenchen