[OpenAFS-devel] A few questions about the current Linux implementation of the AFS client

21 Jan 2002 12:57:02 -0500

Derek Atkins <warlord@MIT.EDU> writes:
[snip]
> There are known screw cases in the current code.  In particular, if a
> client starts with network "configured" but not working, or without
> being able to contact their AFS servers, it will get into a state
> where the only recovery option is to reboot.  I don't know of any
> others off the top of my head, but this is a kind of major one.

We actually work around that at our site by using a modified afs startup
script.  Before loading the kernel module, it tries pinging the router
(though this could probably easily be changed to be one of the vldb
servers) for 120 seconds.  If it hasn't gotten a successful ping by
the end of that time, the startup script bails.

The long timeout was needed to work around a 3c59x bug: the card comes
up, but doesn't really start passing packets for upwards of a minute.

The relevant script bit looks something like...

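    # Default gateway: the route whose destination is 0.0.0.0.
    router="$(netstat -rn | awk '$1 == "0.0.0.0" {print $2}')"
    count=0
    # Up to 130 attempts; -w 1 caps each ping at about one second.
    while [ "$count" -lt 130 ]; do
        ping -c 1 -i 1 -n -q -w 1 "$router" >/dev/null 2>&1 && break
        count=$((count + 1))
    done
    # Every attempt failed: report it and abort before loading the module.
    if [ "$count" -eq 130 ]; then
        echo -n " no network?"
        echo -e "$rc_failed"
        exit 1
    fi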
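
For the vldb variant mentioned above, a rough, untested sketch might look like the following. The function names first_db_server and wait_for_host are made up for illustration, and the CellServDB location is assumed to be the traditional /usr/vice/etc one; adjust for your site. It only relies on the usual CellServDB layout (a ">cellname" header line followed by one "address #hostname" line per database server).

```shell
#!/bin/sh
# Sketch only: function names and the default path are assumptions,
# not part of the original startup script.

# Print the first database server address listed for a cell in a
# CellServDB-format file (">cellname" header, then "address #host" lines).
first_db_server() {
    cell=$1
    cellservdb=${2:-/usr/vice/etc/CellServDB}
    awk -v cell="$cell" '
        /^>/ { in_cell = (substr($1, 2) == cell); next }
        in_cell && NF > 0 { print $1; exit }
    ' "$cellservdb"
}

# Ping a host once per attempt, up to $2 attempts (default 130).
# Returns 0 on the first reply, 1 if every attempt fails or no host is given.
wait_for_host() {
    host=$1
    tries=${2:-130}
    [ -n "$host" ] || return 1
    count=0
    while [ "$count" -lt "$tries" ]; do
        ping -c 1 -n -q -w 1 "$host" >/dev/null 2>&1 && return 0
        count=$((count + 1))
    done
    return 1
}
```

In the startup script this could replace the router lookup with something like `server="$(first_db_server "$(cat /usr/vice/etc/ThisCell)")"` followed by `wait_for_host "$server" || exit 1`, again assuming ThisCell lives in the same directory.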

-- 
Tom Maher, ECE Systems Administrator