[OpenAFS] openafs 1.4: kaserver crashes every 5 minutes on AIX 5.2

Horst Birthelmer horst@riback.net
Thu, 27 Oct 2005 11:44:25 +0200


On Oct 27, 2005, at 11:08 AM, Ernst Jeschek wrote:
> Hello,
>
> I've migrated our AFS DB servers (Cell: wu-wien.ac.at) from Transarc
> to OpenAFS on AIX 5.2. Ever since one of the new servers became the
> sync site, the kaserver on this machine has been crashing every
> 5 minutes and is restarted by the bosserver.
>
> It happens with 1.4.0 RC6 and RC8. It seems to have something to
> do with lwp (dbx where output below). The two other openafs db
> servers are running without problems.
>
> What have I done wrong?
>
> openafs 1.4.0 RC8, configured with --enable-transarc-paths,
> --enable-debug and --enable-debug-lwp, compiled with vac.C 5.0.2.8.
>
> oslevel -r: 5200-07
>
> | root> dbx /usr/afs/bin/kaserver /usr/afs/logs/corekaserver
> | Type 'help' for help.
> | [using memory image in corekaserver]
> | reading symbolic information ...
> |
> | Illegal instruction (illegal opcode) in . at 0x2002e820
> | warning: Unable to access address 0x2002e820 from core
> | (dbx) where
> | warning: Unable to access address 0x2002e820 from core
> | warning: Unable to access address 0x2002e820 from core
> | warning: Unable to access address 0x2002e81c from core
> | warning: Unable to access address 0x2002e81c from core
> | warning: Unable to access address 0x2002e820 from core
> | warning: Unable to access address 0x2002e820 from core
> | warning: Unable to access address 0x2002e81c from core
> | warning: Unable to access address 0x2002e81c from core
> | warning: Unable to access address 0x2002e820 from core
> | .() at 0x2002e820
> | warning: Unable to access address 0xfcfdff07 from core
> | FiveMinuteCheckLWP() at 0x100508cc
> | warning: Unable to access address 0xfcfdfeff from core
> | warning: Unable to access address 0xfcfdfeff from core
> | Create_Process_Part2(), line 778 in "lwp.c"
>
> Any help would be highly appreciated.

I'm running db servers on AIX 5.2, too, and they're working. I can't
think of a reason why FiveMinuteCheckLWP would cause a crash.
What I'm trying to say is that this is pretty weird, but then all
bugs are :-)

Are you sure it works until the first five-minute check?
Does anything show up in the logs during startup?
What does udebug say about the quorum during those 5 minutes?
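
For the udebug question, here is a minimal sketch of how the quorum state
could be watched across one check interval. It assumes udebug is on the
PATH and that the kaserver listens on its usual port, 7004. The helper
names udebug_cmd and poll are made up for illustration and are not part
of OpenAFS.

```python
import subprocess
import time

KASERVER_PORT = 7004  # assumption: kaserver runs on its usual port


def udebug_cmd(host, port=KASERVER_PORT):
    """Build the udebug command line for one database server."""
    return ["udebug", "-server", host, "-port", str(port)]


def poll(host, rounds=6, interval=60):
    """Print udebug output once per interval (defaults span five minutes)."""
    for _ in range(rounds):
        result = subprocess.run(udebug_cmd(host), capture_output=True, text=True)
        print(time.strftime("%H:%M:%S"))
        # Show whatever udebug reported, or its error text if it failed.
        print(result.stdout or result.stderr)
        time.sleep(interval)
```

Calling poll() with the hostname of the crashing server should show
whether it still considers itself sync site right up to the crash.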


Horst