[OpenAFS] fileserver coredumping

Horst Birthelmer horst@riback.net
Mon, 19 Apr 2004 16:27:51 +0200


On Monday, April 19, 2004, at 04:10 PM, J S wrote:

>
>
>>
>> On Monday, April 19, 2004 12:41:49 +0000 J S <vervoom@hotmail.com> 
>> wrote:
>>
>>> # dbx /usr/afs/bin/fileserver corefile.fs
>>> Type 'help' for help.
>>> reading symbolic information ...warning: no source compiled with -g
>>>
>>> [using memory image in corefile.fs]
>>>
>>> IOT/Abort trap in pthread_kill at 0xd0014af8 ($t16)
>>> 0xd0014af8 (pthread_kill+0x80) 80410014        lwz   r2,0x14(r1)
>>> (dbx) where
>>> pthread_kill(??, ??) at 0xd0014af8
>>> _p_raise(??) at 0xd0013eac
>>> raise.raise(??) at 0xd018792c
>>> abort() at 0xd0180400
>>> AssertionFailed() at 0x1000594c
>>> FSYNC_sync() at 0x1004499c
>>> _pthread_body(??) at 0xd00080c8
>>> (dbx)
>>
>>
>> This is a little odd...
>>
>> This backtrace suggests that an assertion failed in FSYNC_sync().  
>> The only assert in FSYNC_sync occurs if the fileserver is unable to 
>> bind the fssync port after trying for about 25 seconds.  If you see 
>> this assert, you should also see 5 messages in the log about failing 
>> to bind the port; these should include an error code that may point 
>> you in the right direction.
>>
>> Is it possible you already have another fileserver running, or 
>> something else bound or connected to port 2040 ?
>>


Jeffrey was right. Somehow he's always right :-)

>
> No I don't think it's that:
>
> # netstat -a | grep 2040
> # ps -ef | grep fileserver
>    root 101358  54276   1 15:05:12  pts/6  0:00 grep fileserver
>    root 101398  94564   0 15:03:58      -  0:00 /usr/afs/bin/fileserver
>
> But... cat FileLog shows:
>
> Mon Apr 19 15:05:08 2004 Getting FileServer name...
> Mon Apr 19 15:05:08 2004 FileServer host name is 'bspc1n11'
> Mon Apr 19 15:05:08 2004 Getting FileServer address...
> Mon Apr 19 15:05:08 2004 FileServer bspc1n11 has address 172.30.4.11 
> (0xac1e040b or 0xac1e040b in host byte order)
> Mon Apr 19 15:05:08 2004 File Server started Mon Apr 19 15:05:08 2004
> Mon Apr 19 15:05:13 2004 FSYNC_sync: bind failed with (68), will sleep 
> and retry
> Mon Apr 19 15:05:18 2004 FSYNC_sync: bind failed with (68), will sleep 
> and retry
> Mon Apr 19 15:05:23 2004 FSYNC_sync: bind failed with (68), will sleep 
> and retry
> Mon Apr 19 15:05:28 2004 FSYNC_sync: bind failed with (68), will sleep 
> and retry
>
>
> It should be connecting to bspc1n11e (which is on a different IP 
> address) not bspc1n11. Do you know how I can fix this? If I do vos 
> listaddrs it shows both bspc1n11 and bspc1n11e. Should I do vos 
> changeAddr -remove bspc1n11 ?
>
> Thanks for your help. By the way how do I check the FileServer version?
>

rxdebug <servername> 7000 -version

If you have some kind of 'virtual' IP addresses, make sure the 
fileserver binds itself to the right one. I had a lot of trouble with 
that.
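
To see whether a given address/port pair can be bound at all on the 
box, something like the following rough sketch may help. It is not 
what the fileserver does internally; it only retries a plain TCP 
bind() and prints the errno, similar in spirit to the "bind failed 
with (68)" lines in the log. The function name try_bind and its 
parameters are made up for this example, and the use of TCP is an 
assumption. The number in parentheses is the errno returned by bind(); 
errno values differ between platforms, so look 68 up on the server 
itself rather than on another OS.

```python
import errno
import os
import socket
import time


def try_bind(host, port, attempts=5, delay=5.0):
    """Try to bind a TCP socket to (host, port), retrying on failure.

    Hypothetical helper for illustration only.
    Returns 0 on success, otherwise the errno of the last failed bind.
    """
    last_err = 0
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return 0
        except OSError as exc:
            last_err = exc.errno or 0
            # Print both the number and the symbolic name for this platform.
            name = errno.errorcode.get(last_err, "unknown")
            print("bind failed with (%d) %s: %s, will sleep and retry"
                  % (last_err, name, os.strerror(last_err)))
            time.sleep(delay)
        finally:
            sock.close()
    return last_err
```

For example, try_bind("172.30.4.11", 2040) checks the address from the 
log above; run it once for each address the host is known by and 
compare the results.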


Horst