[OpenAFS] /afs does not exist

Dexter 'Kim' Kimball dhk@ccre.com
Wed, 14 Apr 2004 08:22:04 -0600


Hi JS,

> FSYNC_clientInit temporary failure (will retry): Connection refused

The fileserver "attaches" volumes when it starts.

(Anyone, please correct me: I believe this means the fileserver reads each
volume header and makes sure its contents make sense.)

Anyway, the volserver waits for the fileserver to complete this "attachment"
process and then "syncs" with the fileserver -- getting, IIRC, a list of
attached volumes from the fileserver.

Until the fileserver has finished attaching volumes, the volserver (which is
already running) keeps logging the FSYNC_clientInit message.

IOW, not to worry: this is a normal message unless it keeps appearing well
after the fileserver has started.  If my description is correct, the
fileserver attach time will be proportional to the number of volumes to be
attached.
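
A rough way to check whether the message has actually gone away is to look at
what the log says after the last retry.  This is only a Python sketch: the
function name and the idea of feeding it log lines are my own illustration,
not anything that ships with OpenAFS, and which log file to read is up to you.

```python
# Hypothetical helper: not part of OpenAFS.
FSYNC_MSG = "FSYNC_clientInit temporary failure"


def fsync_retry_status(log_lines):
    """Return (retry_count, still_retrying) for an iterable of log lines.

    still_retrying is True when the last non-blank line is itself a retry
    message, i.e. nothing has been logged since the most recent failure.
    """
    retries = 0
    last_was_retry = False
    for line in log_lines:
        if not line.strip():
            continue  # ignore blank lines
        if FSYNC_MSG in line:
            retries += 1
            last_was_retry = True
        else:
            last_was_retry = False
    return retries, last_was_retry
```

If still_retrying stays True long after the fileserver was started, that is
the case worth worrying about.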

Kim


> -----Original Message-----
> From: openafs-info-admin@openafs.org
> [mailto:openafs-info-admin@openafs.org]On Behalf Of J S
> Sent: Wednesday, April 14, 2004 4:30 AM
> To: openafs-info@openafs.org
> Subject: RE: [OpenAFS] /afs does not exist
>
>
> More Info:
>
> I logged on to rsl155 and the BosLog is reporting the following:
>
> FSYNC_clientInit temporary failure (will retry): Connection refused
>
> Do you know what this means?
>
> >Hi Kim,
> >
> >Rebooted some of the bad clients last night to no avail. I have added some
> >answers to your questions below.
> >I think that the clients may have stopped working on 11 April actually (not
> >10 April).
> >
> >Thanks for your help on this.
> >
> >JS.
> >>
> >> >
> >> > >Is /afs available on other AFS clients?  That rules out some possibilities.
> >> > >
> >> >
> >> > Yes, on some of them. 50% have this problem though.
> >>
> >>That's interesting.  One I can understand.  50% gives me pause ... unless
> >>of course you only have 2 clients :)
> >
> >I've discovered at least 6 clients with this problem so far :(
> >>
> >> >
> >> > >I'd go to a well-behaved client, cd /afs, fs flushv, cd /afs and see if
> >> > >/afs is still available on that client.
> >> >
> >> > Yes, tried this and /afs was still available.
> >> >
> >> > >
> >>
> >> > vos exa root.afs gave (on well behaved client):
> >> >
> >> >
> >> > vos exa root.afs.readonly gave:
> >> > >
> >>
> >>These are both fine.
> >>
> >>I do suggest creating an RO on   "server rsl155 partition /vicepa RW
> >>Site"  -- same server, same partition as RW -- doesn't cost much since such
> >>a RO is COW/clone and will allow rsl155 to be used as an RO failover site.
> >>(Client won't fail over to the RW if it's supposed to get an RO, and when
> >>root.afs is replicated the client will be looking for an RO.)
> >Mmm not sure how to do this? I'm not exactly an AFS expert as you prob
> >guessed!
> >
> >>
> >>Are you able to "vos listvl root.afs" from a good client/bad client?
> >
> >Bad client:
> >$ vos listvl root.afs
> >vsu_ClientInit: Could not process files in configuration directory
> >(/usr/vice/etc).
> >could not initialize VLDB library (code=4294967295)
> >
> >Good client:
> >$ vos listvl root.afs
> >
> >root.afs
> >    RWrite: 536870915     ROnly: 536870916
> >    number of sites -> 3
> >       server rsl155 partition /vicepa RW Site
> >       server rsl156 partition /vicepa RO Site
> >       server rsl59 partition /vicepa RO Site
> >
> >>
> >> > >
> >> > >rxdebug <hostname> 7001    will give you some info about activity on the
> >> > >AFS client's callback port
> >> >
> >> > On a working client:
> >>
> >>Nothing conclusive here, but nothing unexpected either.  rs155 has talked to
> >>a fileserver (apparently itself) and still has an open connection.
> >>
> >> >
> >> > rsl55:/afs/.uk.baplc.com# rxdebug rsl55 7001
> >> > Trying 167.156.154.55 (port 7001):
> >> > Free packets: 130, packet reclaims: 0, calls: 101338, used FDs: 64
> >> > not waiting for packets.
> >> > 0 calls waiting for a thread
> >> > 1 threads are idle
> >> > Connection from host 167.156.154.55, port 7000, Cuid 9915c0ac/1817cfe8
> >> >   serial 128760,  natMTU 1444, flags pktCksum, security index 2, client conn
> >> >   rxkad: level clear, flags pktCksum
> >> >   Received 271944 bytes in 2518 packets
> >> >   Sent 180088696 bytes in 128706 packets
> >> >     call 0: # 2518, state dally, mode: receiving, flags: receive_done
> >> >     call 1: # 0, state not initialized
> >> >     call 2: # 0, state not initialized
> >> >     call 3: # 0, state not initialized
> >> > Done.
> >>
> >>
> >>While rs156 either hasn't talked to a fileserver recently or at all -- in
> >>any case there's no connection.
> >>
> >>?????  Someone correct me if I'm wrong ... IIRC the connections to 7001 from
> >>a given fileserver will time out after a period of non use???
> >>
> >> >
> >> > On a broken client:
> >> >
> >> > rsl56:/# rxdebug rsl56 7001
> >> > Trying 167.156.154.56 (port 7001):
> >> > Free packets: 130, packet reclaims: 0, calls: 79437, used FDs: 64
> >> > not waiting for packets.
> >> > 0 calls waiting for a thread
> >> > 1 threads are idle
> >> > Done.
> >> >
> >> >
> >> > >
> >> > >df to see if /afs still appears in the output
> >> >
> >> > /dev/logsarc     9469952   6327480   34%      244     1% /logs/archive
> >> > AFS
> >> > df: /afs: No such file or directory
> >> >
> >>
> >>Expected from broken client.
> >>
> >>
> >>Is there anything (like a reboot) that happened .... oh, I see, it looks
> >>like you're doing the Sunday default fileserver restarts ... judging from
> >>the dates ...
> >>
> >>rsl57:/usr/afs/local# ps -ef | grep afs
> >> > > >     root 17314 40486   0   11 Apr      -  0:00 /usr/afs/bin/fileserver
> >> > > >     root 17686 40486   0   11 Apr      -  0:00 /usr/afs/bin/volserver
> >> > > >     root 20134     1   0   08 May      - 17:24 /usr/vice/etc/afsd -stat 2800 -dcache 2400 -daemons 5 -volumes 128
> >> > > >     root 20384     1   0   08 May      - 17:23 /usr/vice/etc/afsd -stat 2800
> >>
> >>
> >>How about sending the output from bos status <fileserver> -long ... for each
> >>fileserver -- or at least telling me when the last restart times were for
> >>each.
> >
> >$ bos status -server rsl155 -long
> >Bosserver reports inappropriate access on server directories
> >Instance fs, (type is fs) currently running normally.
> >    Auxiliary status is: file server running.
> >    Process last started at Sun Apr 11 04:01:10 2004 (2 proc starts)
> >    Command 1 is '/usr/afs/bin/fileserver'
> >    Command 2 is '/usr/afs/bin/volserver'
> >    Command 3 is '/usr/afs/bin/salvager'
> >
> >Instance kaserver, (type is simple) currently running normally.
> >    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
> >    Command 1 is '/usr/afs/bin/kaserver'
> >
> >Instance buserver, (type is simple) currently running normally.
> >    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
> >    Command 1 is '/usr/afs/bin/buserver'
> >
> >Instance ptserver, (type is simple) currently running normally.
> >    Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
> >    Command 1 is '/usr/afs/bin/ptserver'
> >
> >
> >>
> >>Is it possible that all your fileservers restarted at the same time, or that
> >>the two fileservers with root.afs.readonly restarted at the same time, or
> >>were unavailable at the same time?
> >>
> >>If so, what were the broken clients doing during the restart?  Might be
> >>worth checking uptime on the broken clients to see if they were restarted
> >>while the fileservers were restarting.
> >All of these are possible but it's unlikely 6 clients would have been
> >restarted at the same time.
> >>
> >>?????  Do any of the fs commands (fs checkv or fs checks, e.g.) work on the
> >>client?  I don't know if those are blocked when /afs won't mount or not.
> >>Anyone???
> >
> >These ones worked:
> ># fs checkv
> >All volumeID/name mappings checked.
> ># fs checks
> >All servers are running.
> >
> >>
> >>How about fs setcache?  (Gotta be root, and I'd try this on a broken client
> >>you don't care about hurting ... but it might be worth resetting the cache
> >>size to 1, waiting for it to purge, then resetting to 0 -- which restores
> >>the original size.  fs checkv tells the client to refetch info from the VLDB
> >>instead of trusting its cache, but is irrelevant if the unmounted /afs
> >>breaks all the fs commands -- not sure because afsd is running even tho /afs
> >>is broken)
> >
> ># fs setcache 1
> >New cache size set.
> ># fs setcache 0
> >New cache size set.
> ># fs checkv
> >All volumeID/name mappings checked.
> ># cd /afs
> >ksh: /afs:  not found.
> >
> >
> >>
> >>Possibility ... I'm thinking that the broken clients may have tried to mount
> >>/afs (root.afs.readonly) while the fileservers were restarting.  Couldn't
> >>find any root.afs.readonly so failed to mount /afs.  If so, argues for not
> >>restarting all the FS at the same time, and for having root.afs.readonly on
> >>each of your X (how many do you have) fileservers.
> >>
> >>
> >>Looks like afsd has been up for almost a year?  While AFS fileserver procs
> >>were restarted 11 Apr?  Clients were good on Apr 10 ...
> >
> >I'm not sure but it looks as if it was OK until the 11th April.
> >
> >>
> >>While we're at it, has anything changed with your AFS DB servers?  Are the
> >>CellServDB files correct on clients (/usr/vice/etc) and servers
> >>(/usr/afs/etc)?  What does "bos listhosts" report from each of the
> >>fileservers?  "fs listc" from the clients?
> >
> ># bos listhosts -server rsl155
> >Cell name is uk.dd.com
> >    Host 1 is rsl155
> >    Host 2 is rsl156
> >
> >Although if I run this on rsl155 it gives:
> >$ bos listhosts -server rsl155
> >bos: can't open cell database (/usr/vice/etc)
> >even though /usr/vice/etc/CellServDB is present.
> >
> ># fs listc
> >Cell uk.dd.com on hosts rsl155.dd.com.
> >
> ># cat /usr/vice/etc/CellServDB
> >>uk.dd.com   #Cell name
> >161.2.249.91    #rsl155.dd.com
> >
> >>
> >>Kim
> >>
> >
> >> > > > Hi,
> >> > > >
> >> > > > I'm having problems getting in to the /afs directory on an AIX box and
> >> > > > I'm not sure how to fix it:
> >> > > >
> >> > > > # cd /afs
> >> > > > ksh: /afs:  not found.
> >> > > >
> >> > > > This has been running fine until now though. The processes are still
> >> > > > running:
> >> > > >
> >> > > > rsl57:/usr/afs/local# ps -ef | grep afs
> >> > > >     root 17314 40486   0   11 Apr      -  0:00 /usr/afs/bin/fileserver
> >> > > >     root 17686 40486   0   11 Apr      -  0:00 /usr/afs/bin/volserver
> >> > > >     root 20134     1   0   08 May      - 17:24 /usr/vice/etc/afsd -stat 2800 -dcache 2400 -daemons 5 -volumes 128
> >> > > >     root 20384     1   0   08 May      - 17:23 /usr/vice/etc/afsd -stat 2800 -dcache 2400 -daemons 5 -volumes 128
> >> > > >     ..
> >> > > >     ..
> >> > > >     root 40486     1   0   11 Apr      -  0:00 /usr/afs/bin/bosserver
> >> > > >
> >> > > > And the logs look normal. I can ping the AFS server also.
> >> > > >
> >> > > > Is there anything else I can try?
> >> > > >
> >> > > > Thanks for any help.
> >> > > >
> >> > > > JS.
> >
> >_______________________________________________
> >OpenAFS-info mailing list
> >OpenAFS-info@openafs.org
> >https://lists.openafs.org/mailman/listinfo/openafs-info