[OpenAFS] /afs does not exist
J S
vervoom@hotmail.com
Wed, 14 Apr 2004 09:13:50 +0000
Hi Kim,
Rebooted some of the bad clients last night to no avail. I have added some
answers to your questions below.
I think that the clients may have stopped working on 11 April actually (not
10 April).
Thanks for your help on this.
JS.
>
> >
> > >Is /afs available on other AFS clients? That rules out some
> > >possibilities.
> > >
> >
> > Yes, on some of them. 50% have this problem though.
>
>That's interesting. One I can understand. 50% gives me a pause ... unless
>of course you only have 2 clients :)
I've discovered at least 6 clients with this problem so far :(
>
> >
> > >I'd go to a well-behaved client, cd /afs, fs flushv, cd /afs and see if
> > >/afs is still available on that client.
> >
> > Yes, tried this and /afs was still available.
> >
> > >
>
> > vos exa root.afs gave (on well behaved client):
> >
> >
> > vos exa root.afs.readonly gave:
> > >
>
>These are both fine.
>
>I do suggest creating an RO on "server rsl155 partition /vicepa RW
>Site" -- same server, same partition as RW -- doesn't cost much since such
>a RO is COW/clone and will allow rsl155 to be used as an RO failover site.
>(Client won't fail over to the RW if it's supposed to get an RO, and when
>root.afs is replicated the client will be looking for an RO.)
Mmm not sure how to do this? I'm not exactly an AFS expert as you prob
guessed!
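
For reference, a minimal sketch of what the suggestion above usually amounts to: define an RO site next to the RW volume with vos addsite, then push the clone out with vos release. The helper name ro_site_commands is hypothetical, and the script only prints the commands; it does not run them. Running them for real would need AFS administrator rights in the cell.

```python
import shlex


def ro_site_commands(volume, server, partition):
    """Build the two vos invocations as argument lists (nothing is executed)."""
    return [
        # Define a read-only site on the given server/partition.
        ["vos", "addsite", "-server", server, "-partition", partition, "-id", volume],
        # Release the volume so every defined RO site gets a current copy.
        ["vos", "release", "-id", volume],
    ]


if __name__ == "__main__":
    # Values taken from the thread: RW root.afs lives on rsl155 /vicepa.
    for argv in ro_site_commands("root.afs", "rsl155", "/vicepa"):
        print(shlex.join(argv))
```

Afterwards, vos listvl root.afs from a working client should list an RO Site on rsl155 as well.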
>
>Are you able to "vos listvl root.afs" from a good client/bad client?
Bad client:
$ vos listvl root.afs
vsu_ClientInit: Could not process files in configuration directory (/usr/vice/etc).
could not initialize VLDB library (code=4294967295)
Good client:
$ vos listvl root.afs
root.afs
RWrite: 536870915 ROnly: 536870916
number of sites -> 3
server rsl155 partition /vicepa RW Site
server rsl156 partition /vicepa RO Site
server rsl59 partition /vicepa RO Site
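
The good-client listing above shows the RW site on rsl155 with no RO site beside it, which is the gap Kim's suggestion addresses. A rough sketch of checking that from the listing text follows; the function names are made up for illustration, and the regular expression only assumes the "server ... partition ... RW/RO Site" line shape seen in this output.

```python
import re

# Matches lines such as: "server rsl155 partition /vicepa RW Site"
SITE_RE = re.compile(r"server\s+(\S+)\s+partition\s+(\S+)\s+(RW|RO)\s+Site")


def parse_sites(listvl_output):
    """Return (server, partition, kind) tuples found in vos listvl output."""
    return [m.groups() for m in SITE_RE.finditer(listvl_output)]


def rw_locations_without_ro(sites):
    """Return RW (server, partition) pairs that have no RO site alongside."""
    rw = {(s, p) for s, p, kind in sites if kind == "RW"}
    ro = {(s, p) for s, p, kind in sites if kind == "RO"}
    return sorted(rw - ro)


SAMPLE = """root.afs
RWrite: 536870915 ROnly: 536870916
number of sites -> 3
server rsl155 partition /vicepa RW Site
server rsl156 partition /vicepa RO Site
server rsl59 partition /vicepa RO Site
"""

if __name__ == "__main__":
    print(rw_locations_without_ro(parse_sites(SAMPLE)))
```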
>
> > >
> > >rxdebug <hostname> 7001 will give you some info about activity on the
> > >AFS client's callback port
> >
> > On a working client:
>
>Nothing conclusive here, but nothing unexpected either. rs155 has talked to
>a fileserver (apparently itself) and still has an open connection.
>
> >
> > rsl55:/afs/.uk.baplc.com# rxdebug rsl55 7001
> > Trying 167.156.154.55 (port 7001):
> > Free packets: 130, packet reclaims: 0, calls: 101338, used FDs: 64
> > not waiting for packets.
> > 0 calls waiting for a thread
> > 1 threads are idle
> > Connection from host 167.156.154.55, port 7000, Cuid 9915c0ac/1817cfe8
> > serial 128760, natMTU 1444, flags pktCksum, security index 2, client conn
> > rxkad: level clear, flags pktCksum
> > Received 271944 bytes in 2518 packets
> > Sent 180088696 bytes in 128706 packets
> > call 0: # 2518, state dally, mode: receiving, flags: receive_done
> > call 1: # 0, state not initialized
> > call 2: # 0, state not initialized
> > call 3: # 0, state not initialized
> > Done.
>
>
>While rs156 either hasn't talked to a fileserver recently or at all -- in
>any case there's no connection.
>
>????? Someone correct me if I'm wrong ... IIRC the connections to 7001 from
>a given fileserver will time out after a period of non use???
>
> >
> > On a broken client:
> >
> > rsl56:/# rxdebug rsl56 7001
> > Trying 167.156.154.56 (port 7001):
> > Free packets: 130, packet reclaims: 0, calls: 79437, used FDs: 64
> > not waiting for packets.
> > 0 calls waiting for a thread
> > 1 threads are idle
> > Done.
> >
> >
> > >
> > >df to see if /afs still appears in the output
> >
> > /dev/logsarc 9469952 6327480 34% 244 1% /logs/archive
> > AFS
> > df: /afs: No such file or directory
> >
>
>Expected from broken client.
>
>
>Is there anything (like a reboot) that happened .... oh, I see, it looks
>like you're doing the Sunday default fileserver restarts ... judging from
>the dates ...
>
>rsl57:/usr/afs/local# ps -ef | grep afs
> > > > root 17314 40486 0 11 Apr - 0:00 /usr/afs/bin/fileserver
> > > > root 17686 40486 0 11 Apr - 0:00 /usr/afs/bin/volserver
> > > > root 20134 1 0 08 May - 17:24 /usr/vice/etc/afsd -stat 2800 -dcache 2400 -daemons 5 -volumes 128
> > > > root 20384 1 0 08 May - 17:23 /usr/vice/etc/afsd -stat 2800
>
>
>How about sending the output from bos status <fileserver> -long ... for each
>fileserver -- or at least telling me when the last restart times were for
>each.
$ bos status -server rsl155 -long
Bosserver reports inappropriate access on server directories
Instance fs, (type is fs) currently running normally.
Auxiliary status is: file server running.
Process last started at Sun Apr 11 04:01:10 2004 (2 proc starts)
Command 1 is '/usr/afs/bin/fileserver'
Command 2 is '/usr/afs/bin/volserver'
Command 3 is '/usr/afs/bin/salvager'
Instance kaserver, (type is simple) currently running normally.
Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
Command 1 is '/usr/afs/bin/kaserver'
Instance buserver, (type is simple) currently running normally.
Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
Command 1 is '/usr/afs/bin/buserver'
Instance ptserver, (type is simple) currently running normally.
Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
Command 1 is '/usr/afs/bin/ptserver'
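
Since the question is really about restart times, here is a small sketch that pulls the "Process last started at" stamp for each instance out of bos status -long text, so outputs from several servers can be compared. The function name last_starts is hypothetical, and the parsing assumes only the line shapes shown in the output above.

```python
import re
from datetime import datetime

INSTANCE_RE = re.compile(r"^Instance (\S+),")
STARTED_RE = re.compile(r"Process last started at (.+?) \(")


def last_starts(bos_output):
    """Map each bos instance name to its last start time."""
    starts = {}
    current = None
    for raw in bos_output.splitlines():
        line = raw.strip()
        m = INSTANCE_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = STARTED_RE.search(line)
        if m and current:
            # Timestamp format as printed by bos, e.g. "Sun Apr 11 04:01:10 2004".
            starts[current] = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
    return starts


SAMPLE = """Instance fs, (type is fs) currently running normally.
Auxiliary status is: file server running.
Process last started at Sun Apr 11 04:01:10 2004 (2 proc starts)
Instance kaserver, (type is simple) currently running normally.
Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
Instance buserver, (type is simple) currently running normally.
Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
Instance ptserver, (type is simple) currently running normally.
Process last started at Sun Apr 11 04:01:10 2004 (1 proc starts)
"""

if __name__ == "__main__":
    for name, when in last_starts(SAMPLE).items():
        print(name, when.isoformat())
```

On rsl155 every instance shows the same start time, which fits the Sunday restart mentioned earlier in the thread.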
>
>Is it possible that all your fileservers restarted at the same time, or that
>the two fileservers with root.afs.readonly restarted at the same time, or
>were unavailable at the same time?
>
>If so, what were the broken clients doing during the restart? Might be
>worth checking uptime on the broken clients to see if they were restarted
>while the fileservers were restarting.
All of these are possible but it's unlikely 6 clients would have been
restarted at the same time.
>
>????? Do any of the fs commands (fs checkv or fs checks, e.g.) work on the
>client? I don't know if those are blocked when /afs won't mount or not.
>Anyone???
These ones worked:
# fs checkv
All volumeID/name mappings checked.
# fs checks
All servers are running.
>
>How about fs setcache? (Gotta be root, and I'd try this on a broken client
>you don't care about hurting ... but it might be worth resetting the cache
>size to 1, waiting for it to purge, then resetting to 0 -- which restores
>the original size. fs checkv tells the client to refetch info from the VLDB
>instead of trusting its cache, but is irrelevant if the unmounted /afs
>breaks all the fs commands -- not sure because afsd is running even tho /afs
>is broken)
# fs setcache 1
New cache size set.
# fs setcache 0
New cache size set.
# fs checkv
All volumeID/name mappings checked.
# cd /afs
ksh: /afs: not found.
>
>Possibility ... I'm thinking that the broken clients may have tried to mount
>/afs (root.afs.readonly) while the fileservers were restarting. Couldn't
>find any root.afs.readonly so failed to mount /afs. If so, argues for not
>restarting all the FS at the same time, and for having root.afs.readonly on
>each of your X (how many do you have) fileservers.
>
>
>Looks like afsd has been up for almost a year? While AFS fileserver procs
>were restarted 11 Apr? Clients were good on Apr 10 ...
I'm not sure but it looks as if it was OK until the 11th April.
>
>While we're at it, has anything changed with your AFS DB servers? Are the
>CellServDB files correct on clients (/usr/vice/etc) and servers
>(/usr/afs/etc)? What does "bos listhosts" report from each of the
>fileservers? "fs listc" from the clients?
# bos listhosts -server rsl155
Cell name is uk.dd.com
Host 1 is rsl155
Host 2 is rsl156
Although if I run this on rsl155 it gives:
$ bos listhosts -server rsl155
bos: can't open cell database (/usr/vice/etc)
even though /usr/vice/etc/CellServDB is present.
# fs listc
Cell uk.dd.com on hosts rsl155.dd.com.
# cat /usr/vice/etc/CellServDB
>uk.dd.com #Cell name
161.2.249.91 #rsl155.dd.com
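
One thing that stands out in the output above: bos listhosts reports two hosts while this client's CellServDB lists only one. Whether that is related to the /afs problem is not established here. A rough sketch of making the comparison follows; the function names are invented for illustration, and it assumes the usual CellServDB layout of a ">cell" line followed by "address #hostname" lines.

```python
def parse_cellservdb(text):
    """Return {cell_name: [(address, hostname), ...]} from CellServDB text."""
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Cell header: ">cellname #optional comment"
            current = line[1:].split("#", 1)[0].strip()
            cells[current] = []
        elif current is not None:
            addr, _, comment = line.partition("#")
            cells[current].append((addr.strip(), comment.strip()))
    return cells


def hosts_missing_from_client(cells, cell, server_hosts):
    """Short host names reported by the servers but absent from the client file."""
    known = {host.split(".")[0] for _, host in cells.get(cell, [])}
    return [h for h in server_hosts if h.split(".")[0] not in known]


SAMPLE = """>uk.dd.com #Cell name
161.2.249.91 #rsl155.dd.com
"""

if __name__ == "__main__":
    cells = parse_cellservdb(SAMPLE)
    # Host list as reported by "bos listhosts -server rsl155" above.
    print(hosts_missing_from_client(cells, "uk.dd.com", ["rsl155", "rsl156"]))
```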
>
>Kim
>
> > > > Hi,
> > > >
> > > > I'm having problems getting in to the /afs directory on an AIX box and
> > > > I'm not sure how to fix it:
> > > >
> > > > # cd /afs
> > > > ksh: /afs: not found.
> > > >
> > > > This has been running fine until now though. The processes are still
> > > > running:
> > > >
> > > > rsl57:/usr/afs/local# ps -ef | grep afs
> > > > root 17314 40486 0 11 Apr - 0:00 /usr/afs/bin/fileserver
> > > > root 17686 40486 0 11 Apr - 0:00 /usr/afs/bin/volserver
> > > > root 20134 1 0 08 May - 17:24 /usr/vice/etc/afsd -stat 2800 -dcache 2400 -daemons 5 -volumes 128
> > > > root 20384 1 0 08 May - 17:23 /usr/vice/etc/afsd -stat 2800 -dcache 2400 -daemons 5 -volumes 128
> > > > ..
> > > > ..
> > > > root 40486 1 0 11 Apr - 0:00 /usr/afs/bin/bosserver
> > > >
> > > > And the logs look normal. I can ping the AFS server also.
> > > >
> > > > Is there anything else I can try?
> > > >
> > > > Thanks for any help.
> > > >
> > > > JS.