[OpenAFS-devel] Re: afs prevents linux-2.6.13-rc6-git9 from reboot

Martin MOKREJŠ mmokrejs@ribosome.natur.cuni.cz
Fri, 19 Aug 2005 21:54:29 +0200


This problem is reproducible when you delete the instances run under bosserver
and then start afsd. Simply use the "bos delete" command to delete the ptserver, vlserver
and fileserver instances, but keep your "fileserver" hosts in CellServDB.
After you start afsd, it will complain that contact is lost.
Then sniff the packets and look for those which tcpdump reports
as having a wrong checksum. I know this is a misconfiguration, but even
in this scenario the system should be defensive against misconfigurations - one should
be able to reboot the system. Additionally, this might lead someone
to figure out why these messages occur under "normal" circumstances as well? ;)
M.
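
For anyone who wants to check by hand what tcpdump means by a "wrong checksum"
on a UDP packet, here is a minimal Python sketch of the standard UDP checksum
over the IPv4 pseudo-header. It is only an illustration, not anything taken
from afsd or tcpdump: the function names are made up, and it assumes you have
already extracted the source/destination addresses and the raw UDP segment
(header plus payload) from a capture.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum for a UDP segment whose checksum field is already zeroed."""
    # IPv4 pseudo-header: source, destination, zero byte, protocol 17, UDP length.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, len(segment)))
    csum = ~ones_complement_sum(pseudo + segment) & 0xFFFF
    # A computed value of 0 is transmitted as 0xFFFF; 0 means "no checksum".
    return csum or 0xFFFF


def udp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """True if the checksum stored in the segment matches the computed one."""
    stored = struct.unpack("!H", segment[6:8])[0]
    if stored == 0:
        return True  # sender did not compute a checksum (allowed over IPv4)
    zeroed = segment[:6] + b"\x00\x00" + segment[8:]
    return udp_checksum(src_ip, dst_ip, zeroed) == stored
```

A packet whose stored checksum does not match this computation is what would
be flagged as bad; the addresses and segment bytes have to come from your own
capture.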


Martin MOKREJŠ wrote:
> Hi,
>   I saw a similar problem a few days ago and asked one of the
> lkml people. He didn't find anything obvious in the stack trace attached
> to the email at that time. I have just cleaned out all my Trash and Sent
> folders, so don't ask me for that ... ;)
> 
>   A few minutes ago I started my box and my init script tried to insmod
> the wrong kernel module. It failed because of the version mismatch but then started
> bosserver while the kernel module was not loaded. But that shouldn't be a problem,
> right? Then I manually insmod-ed the right module and started afsd.
> Although it printed the usual message that the "AFS daemons are started",
> or whatever the message is, I haven't seen /afs mounted. Nothing in the logs
> except the ordinary message "Lost contact with .... 192.168.0.11".
> 
>   tcpdump showed that some udp packets with a wrong checksum were coming
> back along with "cannot contact fileserver" ... I concluded this was caused
> by the fact that only bosserver and afsd were running, but no fileserver,
> ptserver etc. I have no clue why it happened but ok, it's getting late here.
> 
>   I went for a reboot, but that got stuck, probably because afsd was
> somehow hanging. Attached is what I got from the remote console. Does this help?
> Martin

-- 
Martin Mokrejs
Email: 'bW9rcmVqc21Acmlib3NvbWUubmF0dXIuY3VuaS5jeg==\n'.decode('base64')
GPG key is at http://www.natur.cuni.cz/~mmokrejs