[OpenAFS] /afs does not exist

J S vervoom@hotmail.com
Wed, 14 Apr 2004 13:10:29 +0000


I have a bit more information on this now.
The AFS DB server (rsl155) was patched to AIX 4.3 ML 11 last week, and as a 
result of this /etc/services got overwritten. We set kerberos to port 88 
(but the default was 750). I have corrected that now, and have been told to 
wait 30 minutes to see whether afsd re-reads /etc/services, and otherwise to 
reboot the box.
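
For what it's worth (this is just my reading of the standard port 
assignments, so please check against a known-good box), the AFS-related 
entries I would expect to find in /etc/services are roughly:

   afs3-fileserver  7000/udp
   afs3-callback    7001/udp
   afs3-prserver    7002/udp
   afs3-vlserver    7003/udp
   afs3-kaserver    7004/udp
   afs3-volser      7005/udp
   afs3-errors      7006/udp
   afs3-bos         7007/udp
   afs3-update      7008/udp
   afs3-rmtsys      7009/udp

plus the kerberos line that got changed (88 is the Kerberos 5 port, 750 the 
old Kerberos 4 one). A quick way to see what is actually there now:

   # egrep 'kerberos|afs3' /etc/services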

>
>
>Hi,
>
>How long is the bosserver reporting that??
>
>If it's just a few times, don't worry; that happens when your volserver 
>starts before your fileserver is around (or was it the other way 
>around??). I'm not quite sure.
>
>If the bosserver does this for a rather long time (let's say a few 
>hours ;-) ) you might have some other problem, maybe a name resolution 
>problem with DNS or the hosts file.
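>
>(For example, on one of the bad clients you could check that the DB server 
>still resolves to the address you expect, and that it matches what is in 
>the client's CellServDB:
>
>   $ nslookup rsl155
>   $ grep -i rsl155 /etc/hosts
>
>Just the obvious checks, assuming rsl155 is your DB server.)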
>
>Horst
>
>On Wednesday, April 14, 2004, at 11:30 AM, J S wrote:
>
>>More Info:
>>
>>I logged on to rsl155 and the BosLog is reporting the following:
>>
>>FSYNC_clientInit temporary failure (will retry): Connection refused
>>
>>Do you know what this means?
>>
>>>Hi Kim,
>>>
>>>Rebooted some of the bad clients last night to no avail. I have added 
>>>some answers to your questions below.
>>>I think that the clients may have stopped working on 11 April actually 
>>>(not 10 April).
>>>
>>>Thanks for your help on this.
>>>
>>>JS.
>>>>
>>>> >
>>>> > >Is /afs available on other AFS clients?  That rules out some
>>>> > >possibilities.
>>>> > >
>>>> >
>>>> > Yes, on some of them. 50% have this problem though.
>>>>
>>>>That's interesting.  One I can understand.  50% gives me pause ... 
>>>>unless of course you only have 2 clients :)
>>>
>>>I've discovered at least 6 clients with this problem so far :(
>>>>
>>>> >
>>>> > >I'd go to a well-behaved client, cd /afs, fs flushv, cd /afs and see
>>>> > >if /afs is still available on that client.
>>>> >
>>>> > Yes, tried this and /afs was still available.
>>>> >
>>>> > >
>>>>
>>>> > vos exa root.afs gave (on well behaved client):
>>>> >
>>>> >
>>>> > vos exa root.afs.readonly gave:
>>>> > >
>>>>
>>>>These are both fine.
>>>>
>>>>I do suggest creating an RO on "server rsl155 partition /vicepa RW 
>>>>Site" -- same server, same partition as the RW -- it doesn't cost much, 
>>>>since such an RO is a COW clone, and it will allow rsl155 to be used as 
>>>>an RO failover site.  (A client won't fail over to the RW if it's 
>>>>supposed to get an RO, and once root.afs is replicated the client will 
>>>>be looking for an RO.)
>>>Mmm, not sure how to do this? I'm not exactly an AFS expert, as you 
>>>probably guessed!
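>>>
>>>(If I've read the vos documentation right, the commands would be roughly 
>>>the following, assuming rsl155 /vicepa really is the RW site and that I 
>>>have admin tokens:
>>>
>>># vos addsite rsl155 /vicepa root.afs
>>># vos release root.afs
>>>
>>>Please correct me if that's wrong.)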
>>>
>>>>
>>>>Are you able to "vos listvl root.afs" from a good client/bad client?
>>>
>>>Bad client:
>>>$ vos listvl root.afs
>>>vsu_ClientInit: Could not process files in configuration directory 
>>>(/usr/vice/etc).
>>>could not initialize VLDB library (code=4294967295)
>>>
>>>Good client:
>>>$ vos listvl root.afs
>>>
>>>root.afs
>>>    RWrite: 536870915     ROnly: 536870916
>>>    number of sites -> 3
>>>       server rsl155 partition /vicepa RW Site
>>>       server rsl156 partition /vicepa RO Site
>>>       server rsl59 partition /vicepa RO Site
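>>>
>>>(If I understand the vsu_ClientInit error on the bad client, vos simply 
>>>cannot read the client configuration there. As a quick check I'll compare
>>>
>>>$ ls -l /usr/vice/etc/ThisCell /usr/vice/etc/CellServDB
>>>
>>>on a good and a bad client.)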
>>>
>>>>
>>>> > >
>>>> > >rxdebug <hostname> 7001    will give you some info about activity on
>>>> > >the AFS client's callback port
>>>> >
>>>> > On a working client:
>>>>
>>>>Nothing conclusive here, but nothing unexpected either.  rs155 has 
>>>>talked to a fileserver (apparently itself) and still has an open 
>>>>connection.
>>>>
>>>> >
>>>> > rsl55:/afs/.uk.baplc.com# rxdebug rsl55 7001
>>>> > Trying 167.156.154.55 (port 7001):
>>>> > Free packets: 130, packet reclaims: 0, calls: 101338, used FDs: 64
>>>> > not waiting for packets.
>>>> > 0 calls waiting for a thread
>>>> > 1 threads are idle
>>>> > Connection from host 167.156.154.55, port 7000, Cuid 
>>>>9915c0ac/1817cfe8
>>>> >   serial 128760,  natMTU 1444, flags pktCksum, security index 2,
>>>> > client conn
>>>> >   rxkad: level clear, flags pktCksum
>>>> >   Received 271944 bytes in 2518 packets
>>>> >   Sent 180088696 bytes in 128706 packets
>>>> >     call 0: # 2518, state dally, mode: receiving, flags: receive_done
>>>> >     call 1: # 0, state not initialized
>>>> >     call 2: # 0, state not initialized
>>>> >     call 3: # 0, state not initialized
>>>> > Done.
>>>>
>>>>
>>>>While rs156 either hasn't talked to a fileserver recently or at all -- 
>>>>in any case there's no connection.
>>>>
>>>>?????  Someone correct me if I'm wrong ... IIRC the connections to 7001 
>>>>from a given fileserver will time out after a period of non-use???
>>>>
>>>> >
>>>> > On a broken client:
>>>> >
>>>> > rsl56:/# rxdebug rsl56 7001
>>>> > Trying 167.156.154.56 (port 7001):
>>>> > Free packets: 130, packet reclaims: 0, calls: 79437, used FDs: 64
>>>> > not waiting for packets.
>>>> > 0 calls waiting for a thread
>>>> > 1 threads are idle
>>>> > Done.
>>>> >
>>>> >
>>>> > >
>>>> > >df to see if /afs still appears in the output
>>>> >
>>>> > /dev/logsarc     9469952   6327480   34%      244     1% 
>>>>/logs/archive
>>>> > AFS
>>>> > df: /afs: No such file or directory
>>>> >
>>>>
>>>>Expected from broken client.
>>>>
>>>>
>>>>Is there anything (like a reboot) that happened .... oh, I see, it looks
>>>>like you're doing the Sunday default fileserver restarts ... judging 
>>>>from the dates ...
>>>>
>>>>rsl57:/usr/afs/local# ps -ef | grep afs
>>>> > > >     root 17314 40486   0   11 Apr      -  0:00 
>>>>/usr/afs/bin/fileserver
>>>> > > >     root 17686 40486   0   11 Apr      -  0:00 
>>>>/usr/afs/bin/volserver
>>>> > > >     root 20134     1   0   08 May      - 17:24 /usr/vice/etc/afsd
>>>> > > > -stat 2800
>>>> > > > -dcache 2400 -daemons 5 -volumes 128
>>>> > > >     root 20384     1   0   08 May      - 17:23 /usr/vice/etc/afsd
>>>> > > > -stat 2800
>>>>
>>>>
>>>>How about sending the output from bos status <fileserver> -long ... for 
>>>>each fileserver -- or at least telling me when the last restart times 
>>>>were for each.
>>>
>>>$ bos status -server rsl155 -long
>>>Bosserver reports inappropriate access on server directories
>>>Instance fs, (type is fs) currently running normally.
>>>    Auxiliary status is: file server running.
>>>    Process last started at Sun Apr 11 04:01:10 2004 (2 proc starts)
>>>    Command 1 is '/usr/afs/bin/fileserver'
>>>    Command 2 is '/usr/afs/bin/volserver'
>>>    Command 3 is '/usr/afs/bin/salvager'
>>>
>>>Instance kaserver, (type is simple) currently running normally.
>>>    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
>>>    Command 1 is '/usr/afs/bin/kaserver'
>>>
>>>Instance buserver, (type is simple) currently running normally.
>>>    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
>>>    Command 1 is '/usr/afs/bin/buserver'
>>>
>>>Instance ptserver, (type is simple) currently running normally.
>>>    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
>>>    Command 1 is '/usr/afs/bin/ptserver'
>>>
>>>
>>>>
>>>>Is it possible that all your fileservers restarted at the same time, or 
>>>>that the two fileservers with root.afs.readonly restarted at the same 
>>>>time, or were unavailable at the same time?
>>>>
>>>>If so, what were the broken clients doing during the restart?  Might be
>>>>worth checking uptime on the broken clients to see if they were 
>>>>restarted while the fileservers were restarting.
>>>All of these are possible but it's unlikely 6 clients would have been 
>>>restarted at the same time.
>>>>
>>>>?????  Do any of the fs commands (fs checkv or fs checks, e.g.) work on 
>>>>the client?  I don't know if those are blocked when /afs won't mount or 
>>>>not.  Anyone???
>>>
>>>These ones worked:
>>># fs checkv
>>>All volumeID/name mappings checked.
>>># fs checks
>>>All servers are running.
>>>
>>>>
>>>>How about fs setcache?  (Gotta be root, and I'd try this on a broken 
>>>>client you don't care about hurting ... but it might be worth resetting 
>>>>the cache size to 1, waiting for it to purge, then resetting to 0 -- 
>>>>which restores the original size.  fs checkv tells the client to refetch 
>>>>info from the VLDB instead of trusting its cache, but is irrelevant if 
>>>>the unmounted /afs breaks all the fs commands -- not sure, because afsd 
>>>>is running even though /afs is broken.)
>>>
>>># fs setcache 1
>>>New cache size set.
>>># fs setcache 0
>>>New cache size set.
>>># fs checkv
>>>All volumeID/name mappings checked.
>>># cd /afs
>>>ksh: /afs:  not found.
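>>>
>>>(As another data point, I'll also check whether AFS still shows up in 
>>>the mount table on the broken client, e.g.:
>>>
>>># mount | grep -i afs
>>>
>>>since df showed an AFS line but then complained that /afs does not 
>>>exist.)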
>>>
>>>
>>>>
>>>>Possibility ... I'm thinking that the broken clients may have tried to 
>>>>mount /afs (root.afs.readonly) while the fileservers were restarting.  
>>>>Couldn't find any root.afs.readonly, so they failed to mount /afs.  If 
>>>>so, that argues for not restarting all the FS at the same time, and for 
>>>>having root.afs.readonly on each of your X (how many do you have) 
>>>>fileservers.
>>>>
>>>>
>>>>Looks like afsd has been up for almost a year?  While AFS fileserver 
>>>>procs were restarted 11 Apr?  Clients were good on Apr 10 ...
>>>
>>>I'm not sure, but it looks as if it was OK until 11 April.
>>>
>>>>
>>>>While we're at it, has anything changed with your AFS DB servers?  Are 
>>>>the CellServDB files correct on clients (/usr/vice/etc) and servers
>>>>(/usr/afs/etc)?  What does "bos listhosts" report from each of the
>>>>fileservers?  "fs listc" from the clients?
>>>
>>># bos listhosts -server rsl155
>>>Cell name is uk.dd.com
>>>    Host 1 is rsl155
>>>    Host 2 is rsl156
>>>
>>>Although if I run this on rsl155 it gives:
>>>$ bos listhosts -server rsl155
>>>bos: can't open cell database (/usr/vice/etc)
>>>even though /usr/vice/etc/CellServDB is present.
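>>>
>>>(Presumably bos also needs to be able to read ThisCell there, so I'll 
>>>have a look at what is actually in that directory on rsl155:
>>>
>>>$ ls -l /usr/vice/etc
>>>
>>>to see whether anything is missing or unreadable.)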
>>>
>>># fs listc
>>>Cell uk.dd.com on hosts rsl155.dd.com.
>>>
>>># cat /usr/vice/etc/CellServDB
>>>>uk.dd.com   #Cell name
>>>161.2.249.91    #rsl155.dd.com
>>>
>>>>
>>>>Kim
>>>>
>>>
>>>> > > > Hi,
>>>> > > >
>>>> > > > I'm having problems getting into the /afs directory on an AIX
>>>> > > > box and I'm
>>>> > > > not sure how to fix it:
>>>> > > >
>>>> > > > # cd /afs
>>>> > > > ksh: /afs:  not found.
>>>> > > >
>>>> > > > This has been running fine until now though. The processes are
>>>> > > > still running:
>>>> > > >
>>>> > > > rsl57:/usr/afs/local# ps -ef | grep afs
>>>> > > >     root 17314 40486   0   11 Apr      -  0:00 
>>>>/usr/afs/bin/fileserver
>>>> > > >     root 17686 40486   0   11 Apr      -  0:00 
>>>>/usr/afs/bin/volserver
>>>> > > >     root 20134     1   0   08 May      - 17:24 /usr/vice/etc/afsd
>>>> > > > -stat 2800
>>>> > > > -dcache 2400 -daemons 5 -volumes 128
>>>> > > >     root 20384     1   0   08 May      - 17:23 /usr/vice/etc/afsd
>>>> > > > -stat 2800
>>>> > > > -dcache 2400 -daemons 5 -volumes 128
>>>> > > >     ..
>>>> > > >     ..
>>>> > > >     root 40486     1   0   11 Apr      -  0:00 
>>>>/usr/afs/bin/bosserver
>>>> > > >
>>>> > > > And the logs look normal. I can ping the AFS server also.
>>>> > > >
>>>> > > > Is there anything else I can try?
>>>> > > >
>>>> > > > Thanks for any help.
>>>> > > >
>>>> > > > JS.
>>>
>>
