[OpenAFS] fileserver coredumping
Horst Birthelmer
horst@riback.net
Mon, 19 Apr 2004 16:27:51 +0200
On Monday, April 19, 2004, at 04:10 PM, J S wrote:
>
>
>>
>> On Monday, April 19, 2004 12:41:49 +0000 J S <vervoom@hotmail.com>
>> wrote:
>>
>>> # dbx /usr/afs/bin/fileserver corefile.fs
>>> Type 'help' for help.
>>> reading symbolic information ...warning: no source compiled with -g
>>>
>>> [using memory image in corefile.fs]
>>>
>>> IOT/Abort trap in pthread_kill at 0xd0014af8 ($t16)
>>> 0xd0014af8 (pthread_kill+0x80) 80410014 lwz r2,0x14(r1)
>>> (dbx) where
>>> pthread_kill(??, ??) at 0xd0014af8
>>> _p_raise(??) at 0xd0013eac
>>> raise.raise(??) at 0xd018792c
>>> abort() at 0xd0180400
>>> AssertionFailed() at 0x1000594c
>>> FSYNC_sync() at 0x1004499c
>>> _pthread_body(??) at 0xd00080c8
>>> (dbx)
>>
>>
>> This is a little odd...
>>
>> This backtrace suggests that an assertion failed in FSYNC_sync().
>> The only assert in FSYNC_sync occurs if the fileserver is unable to
>> bind the fssync port after trying for about 25 seconds. If you see
>> this assert, you should also see 5 messages in the log about failing
>> to bind the port; these should include an error code that may point
>> you in the right direction.
>>
>> Is it possible you already have another fileserver running, or
>> something else bound or connected to port 2040 ?
>>
Jeffrey was right. Somehow he's always right :-)
>
> No I don't think it's that:
>
> # netstat -a | grep 2040
> # ps -ef | grep fileserver
> root 101358 54276 1 15:05:12 pts/6 0:00 grep fileserver
> root 101398 94564 0 15:03:58 - 0:00 /usr/afs/bin/fileserver
>
> But... cat FileLog shows:
>
> Mon Apr 19 15:05:08 2004 Getting FileServer name...
> Mon Apr 19 15:05:08 2004 FileServer host name is 'bspc1n11'
> Mon Apr 19 15:05:08 2004 Getting FileServer address...
> Mon Apr 19 15:05:08 2004 FileServer bspc1n11 has address 172.30.4.11
> (0xac1e040b or 0xac1e040b in host byte order)
> Mon Apr 19 15:05:08 2004 File Server started Mon Apr 19 15:05:08 2004
> Mon Apr 19 15:05:13 2004 FSYNC_sync: bind failed with (68), will sleep
> and retry
> Mon Apr 19 15:05:18 2004 FSYNC_sync: bind failed with (68), will sleep
> and retry
> Mon Apr 19 15:05:23 2004 FSYNC_sync: bind failed with (68), will sleep
> and retry
> Mon Apr 19 15:05:28 2004 FSYNC_sync: bind failed with (68), will sleep
> and retry
>
>
> It should be connecting to bspc1n11e (which is on a different IP
> address), not bspc1n11. Do you know how I can fix this? If I do vos
> listaddrs it shows both bspc1n11 and bspc1n11e. Should I do vos
> changeaddr -remove bspc1n11?
>
> Thanks for your help. By the way how do I check the FileServer version?
>
rxdebug <servername> 7000 -version
If you have some kind of 'virtual' IP addresses, make sure the
fileserver binds itself to the right one. I had a lot of trouble with
that.
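
To see which address actually refuses the bind, a small probe can help.
What follows is only a minimal sketch, not what the fileserver itself
does: try_bind is a made-up helper name, and I am assuming the fssync
port is a plain TCP socket on 2040 and that 127.0.0.1 and 172.30.4.11
(the address from your FileLog) are the candidates worth testing. As
far as I know, errno 68 on AIX is EADDRNOTAVAIL, but please check
/usr/include/sys/errno.h on your machine rather than trusting me.

```python
import errno
import socket


def try_bind(host, port):
    """Try to bind a TCP socket to (host, port).

    Returns (True, None) on success, or (False, errno_name) on failure.
    Hypothetical helper for illustration; not part of OpenAFS.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True, None
    except OSError as exc:
        # Map the numeric errno to its symbolic name where one is known.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    # Assumed candidates: loopback and the address reported in FileLog.
    for host in ("127.0.0.1", "172.30.4.11"):
        print(host, try_bind(host, 2040))
```

Run it on the server while the fileserver is stopped. An address that
is not configured on any local interface should fail with
EADDRNOTAVAIL, while a port that is already taken should fail with
EADDRINUSE.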
Horst