[OpenAFS] /afs does not exist

J S vervoom@hotmail.com
Wed, 14 Apr 2004 09:30:16 +0000


More Info:

I logged on to rsl155 and the BosLog is reporting the following:

FSYNC_clientInit temporary failure (will retry): Connection refused

Do you know what this means?

>Hi Kim,
>
>Rebooted some of the bad clients last night to no avail. I have added some 
>answers to your questions below.
>I think that the clients may have stopped working on 11 April actually (not 
>10 April).
>
>Thanks for your help on this.
>
>JS.
>>
>> >
>> > >Is /afs available on other AFS clients?  That rules out some
>> > possibilities.
>> > >
>> >
>> > Yes, on some of them. 50% have this problem though.
>>
>>That's interesting.  One I can understand.  50% gives me pause ...
>>unless of course you only have 2 clients :)
>
>I've discovered at least 6 clients with this problem so far :(
>>
>> >
>> > >I'd go to a well-behaved client, cd /afs, fs flushv, cd /afs and see if
>> > >/afs is still available on that client.
>> >
>> > Yes, tried this and /afs was still available.
>> >
>> > >
>>
>> > vos exa root.afs gave (on well behaved client):
>> >
>> >
>> > vos exa root.afs.readonly gave:
>> > >
>>
>>These are both fine.
>>
>>I do suggest creating an RO on "server rsl155 partition /vicepa RW Site"
>>-- same server, same partition as the RW. It doesn't cost much, since such
>>an RO is a COW clone, and it will allow rsl155 to be used as an RO failover
>>site.
>>(Client won't fail over to the RW if it's supposed to get an RO, and when
>>root.afs is replicated the client will be looking for an RO.)
>Mmm not sure how to do this? I'm not exactly an AFS expert as you prob 
>guessed!
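
The suggestion above (an RO for root.afs on the same server and partition as the RW) would usually be done with `vos addsite` followed by `vos release`. What follows is a minimal sketch, not a tested procedure for this cell: the server, partition and volume names are taken from the thread, the commands need AFS administrator rights, and `run` and `add_ro_site` are hypothetical helper names. By default it only prints the commands; setting DRY_RUN=0 would execute them.

```shell
#!/bin/sh
# Sketch only: add a read-only site for a volume next to its RW copy, then
# release it. run() is a hypothetical helper that prints each command unless
# DRY_RUN=0 is set, in which case it executes the command.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# add_ro_site SERVER PARTITION VOLUME (hypothetical helper name)
add_ro_site() {
    server=$1
    partition=$2
    volume=$3
    # Define the new RO site in the VLDB.
    run vos addsite -server "$server" -partition "$partition" -id "$volume" &&
    # Push the RW contents out to every RO site, including the new one.
    run vos release -id "$volume" &&
    # Show the resulting site list for the volume.
    run vos listvldb -name "$volume"
}

add_ro_site rsl155 /vicepa root.afs
```

After a real run, the site list for root.afs should show an RO Site line for rsl155 alongside the existing ones.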
>
>>
>>Are you able to "vos listvl root.afs" from a good client/bad client?
>
>Bad client:
>$ vos listvl root.afs
>vsu_ClientInit: Could not process files in configuration directory 
>(/usr/vice/etc).
>could not initialize VLDB library (code=4294967295)
>
>Good client:
>$ vos listvl root.afs
>
>root.afs
>    RWrite: 536870915     ROnly: 536870916
>    number of sites -> 3
>       server rsl155 partition /vicepa RW Site
>       server rsl156 partition /vicepa RO Site
>       server rsl59 partition /vicepa RO Site
>
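
The bad client's vsu_ClientInit error points at the client configuration directory, /usr/vice/etc. A quick check worth trying is sketched below, on the assumption that the cause is a ThisCell or CellServDB file that is missing, unreadable by the invoking user, or lacking an entry for the local cell. That cause is a guess, not something the thread confirms, and `check_afs_conf` is a hypothetical helper name.

```shell
#!/bin/sh
# check_afs_conf [DIR]: hypothetical helper. Reports whether DIR (default
# /usr/vice/etc) holds a readable ThisCell and a readable CellServDB that
# contains an entry for the cell named in ThisCell.
check_afs_conf() {
    dir=${1:-/usr/vice/etc}
    for f in ThisCell CellServDB; do
        if [ ! -r "$dir/$f" ]; then
            echo "unreadable: $dir/$f"
            return 1
        fi
    done
    # ThisCell holds just the cell name; strip whitespace and newlines.
    cell=$(tr -d ' \t\r\n' < "$dir/ThisCell")
    # Cell entries in CellServDB start with '>' followed by the cell name.
    if awk -v c="$cell" '$1 == (">" c) { found = 1 } END { exit !found }' \
        "$dir/CellServDB"; then
        echo "ok: $cell"
    else
        echo "no CellServDB entry for cell: $cell"
        return 1
    fi
}
```

Running `check_afs_conf` as the same user that ran vos on the bad client, and again as root, would show whether file permissions differ between the two.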
>>
>> > >
>> > >rxdebug <hostname> 7001    will give you some info about activity on
>> > >the AFS client's callback port
>> >
>> > On a working client:
>>
>>Nothing conclusive here, but nothing unexpected either.  rsl55 has talked
>>to a fileserver (apparently itself) and still has an open connection.
>>
>> >
>> > rsl55:/afs/.uk.baplc.com# rxdebug rsl55 7001
>> > Trying 167.156.154.55 (port 7001):
>> > Free packets: 130, packet reclaims: 0, calls: 101338, used FDs: 64
>> > not waiting for packets.
>> > 0 calls waiting for a thread
>> > 1 threads are idle
>> > Connection from host 167.156.154.55, port 7000, Cuid 9915c0ac/1817cfe8
>> >   serial 128760,  natMTU 1444, flags pktCksum, security index 2,
>> > client conn
>> >   rxkad: level clear, flags pktCksum
>> >   Received 271944 bytes in 2518 packets
>> >   Sent 180088696 bytes in 128706 packets
>> >     call 0: # 2518, state dally, mode: receiving, flags: receive_done
>> >     call 1: # 0, state not initialized
>> >     call 2: # 0, state not initialized
>> >     call 3: # 0, state not initialized
>> > Done.
>>
>>
>>While rsl56 either hasn't talked to a fileserver recently or at all -- in
>>any case there's no connection.
>>
>>?????  Someone correct me if I'm wrong ... IIRC the connections to 7001
>>from a given fileserver will time out after a period of non-use???
>>
>> >
>> > On a broken client:
>> >
>> > rsl56:/# rxdebug rsl56 7001
>> > Trying 167.156.154.56 (port 7001):
>> > Free packets: 130, packet reclaims: 0, calls: 79437, used FDs: 64
>> > not waiting for packets.
>> > 0 calls waiting for a thread
>> > 1 threads are idle
>> > Done.
>> >
>> >
>> > >
>> > >df to see if /afs still appears in the output
>> >
>> > /dev/logsarc     9469952   6327480   34%      244     1% /logs/archive
>> > AFS
>> > df: /afs: No such file or directory
>> >
>>
>>Expected from broken client.
>>
>>
>>Is there anything (like a reboot) that happened .... oh, I see, it looks
>>like you're doing the Sunday default fileserver restarts ... judging from
>>the dates ...
>>
>>rsl57:/usr/afs/local# ps -ef | grep afs
>> > > >     root 17314 40486   0   11 Apr      -  0:00 /usr/afs/bin/fileserver
>> > > >     root 17686 40486   0   11 Apr      -  0:00 /usr/afs/bin/volserver
>> > > >     root 20134     1   0   08 May      - 17:24 /usr/vice/etc/afsd -stat 2800 -dcache 2400 -daemons 5 -volumes 128
>> > > >     root 20384     1   0   08 May      - 17:23 /usr/vice/etc/afsd -stat 2800
>>
>>
>>How about sending the output from bos status <fileserver> -long ... for
>>each fileserver -- or at least telling me when the last restart times were
>>for each.
>
>$ bos status -server rsl155 -long
>Bosserver reports inappropriate access on server directories
>Instance fs, (type is fs) currently running normally.
>    Auxiliary status is: file server running.
>    Process last started at Sun Apr 11 04:01:10 2004 (2 proc starts)
>    Command 1 is '/usr/afs/bin/fileserver'
>    Command 2 is '/usr/afs/bin/volserver'
>    Command 3 is '/usr/afs/bin/salvager'
>
>Instance kaserver, (type is simple) currently running normally.
>    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
>    Command 1 is '/usr/afs/bin/kaserver'
>
>Instance buserver, (type is simple) currently running normally.
>    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
>    Command 1 is '/usr/afs/bin/buserver'
>
>Instance ptserver, (type is simple) currently running normally.
>    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
>    Command 1 is '/usr/afs/bin/ptserver'
>
>
>>
>>Is it possible that all your fileservers restarted at the same time, or
>>that the two fileservers with root.afs.readonly restarted at the same time,
>>or were unavailable at the same time?
>>
>>If so, what were the broken clients doing during the restart?  Might be
>>worth checking uptime on the broken clients to see if they were restarted
>>while the fileservers were restarting.
>All of these are possible but it's unlikely 6 clients would have been 
>restarted at the same time.
>>
>>?????  Do any of the fs commands (fs checkv or fs checks, e.g.) work on
>>the client?  I don't know if those are blocked when /afs won't mount or not.
>>Anyone???
>
>These ones worked:
># fs checkv
>All volumeID/name mappings checked.
># fs checks
>All servers are running.
>
>>
>>How about fs setcache?  (Gotta be root, and I'd try this on a broken
>>client you don't care about hurting ... but it might be worth resetting the
>>cache size to 1, waiting for it to purge, then resetting to 0 -- which
>>restores the original size.  fs checkv tells the client to refetch info from
>>the VLDB instead of trusting its cache, but is irrelevant if the unmounted
>>/afs breaks all the fs commands -- not sure because afsd is running even
>>though /afs is broken)
>
># fs setcache 1
>New cache size set.
># fs setcache 0
>New cache size set.
># fs checkv
>All volumeID/name mappings checked.
># cd /afs
>ksh: /afs:  not found.
>
>
>>
>>Possibility ... I'm thinking that the broken clients may have tried to
>>mount /afs (root.afs.readonly) while the fileservers were restarting.  They
>>couldn't find any root.afs.readonly, so they failed to mount /afs.  If so,
>>that argues for not restarting all the FS at the same time, and for having
>>root.afs.readonly on each of your X (how many do you have) fileservers.
>>
>>
>>Looks like afsd has been up for almost a year?  While AFS fileserver procs
>>were restarted 11 Apr?  Clients were good on Apr 10 ...
>
>I'm not sure but it looks as if it was OK until the 11th April.
>
>>
>>While we're at it, has anything changed with your AFS DB servers?  Are the
>>CellServDB files correct on clients (/usr/vice/etc) and servers
>>(/usr/afs/etc)?  What does "bos listhosts" report from each of the
>>fileservers?  "fs listc" from the clients?
>
># bos listhosts -server rsl155
>Cell name is uk.dd.com
>    Host 1 is rsl155
>    Host 2 is rsl156
>
>Although if I run this on rsl155 it gives:
>$ bos listhosts -server rsl155
>bos: can't open cell database (/usr/vice/etc)
>even though /usr/vice/etc/CellServDB is present.
>
># fs listc
>Cell uk.dd.com on hosts rsl155.dd.com.
>
># cat /usr/vice/etc/CellServDB
>>uk.dd.com   #Cell name
>161.2.249.91    #rsl155.dd.com
>
>>
>>Kim
>>
>
>> > > > Hi,
>> > > >
>> > > > I'm having problems getting in to the /afs directory on an AIX
>> > > > box and I'm
>> > > > not sure how to fix it:
>> > > >
>> > > > # cd /afs
>> > > > ksh: /afs:  not found.
>> > > >
>> > > > This has been running fine until now though. The processes are still
>> > > > running:
>> > > >
>> > > > rsl57:/usr/afs/local# ps -ef | grep afs
>> > > >     root 17314 40486   0   11 Apr      -  0:00 /usr/afs/bin/fileserver
>> > > >     root 17686 40486   0   11 Apr      -  0:00 /usr/afs/bin/volserver
>> > > >     root 20134     1   0   08 May      - 17:24 /usr/vice/etc/afsd -stat 2800 -dcache 2400 -daemons 5 -volumes 128
>> > > >     root 20384     1   0   08 May      - 17:23 /usr/vice/etc/afsd -stat 2800 -dcache 2400 -daemons 5 -volumes 128
>> > > >     ..
>> > > >     ..
>> > > >     root 40486     1   0   11 Apr      -  0:00 /usr/afs/bin/bosserver
>> > > >
>> > > > And the logs look normal. I can ping the AFS server also.
>> > > >
>> > > > Is there anything else I can try?
>> > > >
>> > > > Thanks for any help.
>> > > >
>> > > > JS.
>