[OpenAFS] segmentation faults ... Yes cache-stuff seems duff

Mathias Feiler <feiler@uni-hohenheim.de>
Thu, 10 Jan 2002 00:08:54 +0100 (MET)


>We are using an OpenAFS server with 3 Transarc AFS DB servers. When we
>try to compile something on the OpenAFS server (which is Intel with RH
>7.2 and kernel 2.4.9-13), we get segmentation faults. Does anyone
>have any ideas on what is causing this?
>
>--Mike
>


Well, YES, basically I think we are facing the same problem.

I posted this to the list a couple of weeks ago... but have had no answer so far.
Maybe it was the wrong list, maybe people are busy, maybe they don't like
Germans... anyway, the subject was:
   " Strange behavior from Cachemanager (RH 7.2 / openafs 1.2.2) "

Let's check whether we have the same problem...
In the syslog file /var/log/messages you will probably find something like
the following. If so, we are in the same boat!!!
 
--------------------8<----------------------8<--------------------8<---------
....
Unable to handle kernel NULL pointer dereference at virtual address 00000000   
....
....
Call Trace: [do_truncate+62/128] do_truncate [kernel] 0x3e
Call Trace: [<c013049e>] do_truncate [kernel] 0x3e 
[jbd:__insmod_jbd_S.bss_L24+519239/81089333] afs_linux_permission
	[libafs-2.4.9-13afs] 0x4b
[<c68b5f8b>] afs_linux_permission [libafs-2.4.9-13afs] 0x4b
[open_namei+1095/1432] open_namei [kernel] 0x447
[<c013b783>] open_namei [kernel] 0x447
[jbd:__insmod_jbd_S.bss_L24+511471/81097101] afs_cleanup
	[libafs-2.4.9-13afs] 0xff
[<c68b4133>] afs_cleanup [libafs-2.4.9-13afs] 0xff
[jbd:__insmod_jbd_S.bss_L24+628240/80980332]
	__insmod_libafs-2.4.9-13afs_S.bss_L37792 [libafs-2.4.9-13afs] 0x4774
......
Code: 89 01 8b 44 24 0c 85 c0 74 16 8d 76 00 e8 53 e5 f2 ff c7 03

--------------------8<----------------------8<--------------------8<---------
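If you have a lot of machines to check, a small script can do the looking for you. The sketch below scans a syslog file for the NULL pointer oops shown above and keeps only those followed by AFS lines. It assumes the oops text and the module/function names from the trace above (libafs, afs_linux_permission, afs_cleanup); the 40-line window and the function name find_afs_oopses are arbitrary choices for illustration, not part of any OpenAFS tool.

```python
import re
import sys
from typing import Iterable, List

OOPS_MARKER = "Unable to handle kernel NULL pointer dereference"
# Assumption: these names (taken from the trace above) identify AFS frames.
AFS_PATTERN = re.compile(r"libafs|afs_linux_permission|afs_cleanup")


def find_afs_oopses(lines: Iterable[str], window: int = 40) -> List[List[str]]:
    """Return the AFS-related lines for each NULL pointer oops found.

    An oops is reported only if an AFS-related line appears within
    `window` lines after the oops marker (window size is a guess).
    """
    all_lines = list(lines)
    reports: List[List[str]] = []
    for i, line in enumerate(all_lines):
        if OOPS_MARKER in line:
            following = all_lines[i + 1 : i + 1 + window]
            afs_lines = [entry.rstrip() for entry in following if AFS_PATTERN.search(entry)]
            if afs_lines:
                reports.append(afs_lines)
    return reports


def main(argv: List[str]) -> None:
    path = argv[1] if len(argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as f:
            reports = find_afs_oopses(f)
    except OSError as exc:  # missing file or no permission (usually needs root)
        print(f"Cannot read {path}: {exc}")
        return
    if not reports:
        print("No AFS-related NULL pointer oops found.")
    for n, report in enumerate(reports, 1):
        print(f"--- AFS-related oops {n} ---")
        print("\n".join(report))


if __name__ == "__main__":
    main(sys.argv)
```

Run it as root with an optional log path; any output under "AFS-related oops" suggests you are seeing the same crash, though it is only a pattern match, not a diagnosis.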

What I found out is this:

  (1) Boot the machine and log in as root - OK
  (2) Go to an AFS directory and get a token for it - OK
  (3) Start an additional (sub)shell, e.g. bash, and set PS1="# " - OK
  (4) Run ' echo X > NewFile ' - OK
  (5) Run it AGAIN: ' echo X > NewFile ' - E R R O R !!!
  (6) The prompt shows you are back in the first shell, i.e. the subshell has died
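The steps above can be scripted, which makes it easier to try them on several machines. In this sketch a forked child process stands in for the subshell of step (3), and both writes open the file the way the shell's `>` does (O_WRONLY | O_CREAT | O_TRUNC). One observation, not a confirmed cause: the first write creates NewFile, while the second opens an existing file and truncates it, and do_truncate does appear in the trace above. The names shell_style_write and reproduce are hypothetical; fork makes this POSIX-only.

```python
import os
import sys
import tempfile


def shell_style_write(path: str, data: bytes = b"X\n") -> None:
    """Mimic `echo X > path`: open with create + truncate, then write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def reproduce(directory: str, name: str = "NewFile") -> str:
    """Run steps (4) and (5) in a child process standing in for the subshell.

    Returns "ok", "error <code>" or "killed by signal <n>".
    """
    path = os.path.join(directory, name)
    pid = os.fork()
    if pid == 0:  # child: exit status reports how the two writes went
        code = 1
        try:
            shell_style_write(path)  # step (4): file is created
            shell_style_write(path)  # step (5): existing file is truncated
            code = 0
        finally:
            os._exit(code)  # never return into the parent's code
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    code = os.WEXITSTATUS(status)
    return "ok" if code == 0 else f"error {code}"


if __name__ == "__main__":
    # Pass an AFS directory (with a valid token); defaults to a local scratch dir.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    print(reproduce(target))
```

On a healthy machine this prints "ok"; on an affected one the child would be expected to die during the second write, as the subshell does.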

So from an abstract point of view, it seems to be the cache manager that is
unable to write a file the second time.

The more interesting and surprising thing is:
* On one of the machines with that problem I recompiled the kernel,
* recompiled AFS,
* ran a make install in the AFS tree,
all with no success. But then, after I
* copied the AFS stuff by hand into the so-called target location,
* reformatted the cache partition as ext2fs, and
* did a few other minor things (which, I regret, I can't remember),
all the problems were gone.
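Since reformatting the cache partition as ext2fs was one of the steps that coincided with the fix, it may be worth checking which filesystem the cache lives on on the other affected machines. This sketch reads /proc/mounts and picks the longest mount point containing the cache directory. It assumes the cache is at /usr/vice/cache (the usual location named in cacheinfo; adjust if yours differs) and does not handle mount points containing escaped spaces. The function name fs_type_for is hypothetical.

```python
import os
import sys
from typing import Optional

# Assumption: usual OpenAFS cache location; check your cacheinfo file.
DEFAULT_CACHE_DIR = "/usr/vice/cache"


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount containing `path`.

    `mounts_text` uses the /proc/mounts format:
    "device mountpoint fstype options dump pass" on each line.
    The longest matching mount point wins; on a tie the later entry wins,
    since a later mount on the same directory hides the earlier one.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        contains = path == mount_point or path.startswith(prefix)
        if contains and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    cache_dir = os.path.realpath(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_CACHE_DIR)
    try:
        with open("/proc/mounts") as f:
            fs_type = fs_type_for(cache_dir, f.read())
    except OSError as exc:  # e.g. not running on Linux
        print(f"Cannot read /proc/mounts: {exc}")
    else:
        print(f"{cache_dir} is on a {fs_type or 'unknown'} filesystem")
```

Comparing this output between the machine that now works and the ones that still crash would at least rule the cache filesystem in or out.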

Since then I have been trying to do the same on the other machines with that
problem, but so far I haven't found the step that makes the difference.

As soon as I can reproduce an action that helps, I'll let you know.

Sincerely, Mathias Feiler


 ---
 Respectfully and with kind regards
   xxxxx                                               MATHIAS
  X __ \        ____/  ___/   /   /      ___/   __  
 X C  O-O      /      /      /   /      /      /  \       _\||/_
 X     _\     __/    __/    /   /      __/    _   ___/     o  o 
  X  _@      /      /      /   /      /      / \           (_)  
  |  |    __/    _____/ __/ _____/ _____/ __/   \__        ===

 --- 
 M. Feiler Roßbergstr.1 72649 Wolfschlugen 0049 (0)7022 560965 (Privat)
 feiler@uni-hohenheim.de     RZ (620)   0049 (0)711/459-3949   (Uni)   
 PGP public key &  Homepage   :  http://www.uni-hohenheim.de/~feiler
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -