[OpenAFS] /afs does not exist

Dexter 'Kim' Kimball dhk@ccre.com
Tue, 13 Apr 2004 11:07:08 -0600




>
> >Is /afs available on other AFS clients?  That rules out some possibilities.
> >
>
> Yes, on some of them. 50% have this problem though.

That's interesting.  One I can understand.  50% gives me pause ... unless,
of course, you only have 2 clients :)

>
> >I'd go to a well-behaved client, cd /afs, fs flushv, cd /afs and see if
> >/afs is still available on that client.
>
> Yes, tried this and /afs was still available.
>
> >

> vos exa root.afs gave (on well-behaved client):
>
>
> vos exa root.afs.readonly gave:
> >

These are both fine.

I do suggest creating an RO on "server rsl155 partition /vicepa RW Site"
(same server, same partition as the RW).  It doesn't cost much, since such an
RO is a copy-on-write clone, and it will allow rsl155 to be used as an RO
failover site.
(Client won't fail over to the RW if it's supposed to get an RO, and when
root.afs is replicated the client will be looking for an RO.)
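If you want to check this mechanically across volumes, here's a minimal
Python sketch that reads the site list at the end of "vos examine" output and
reports any RW site with no RO clone on the same server and partition.  It
assumes the "server X partition /vicepY RW Site" / "RO Site" line format; the
sample text is hypothetical (only the rsl155 RW line comes from this thread,
and fs-a / fs-b are made-up RO hosts), and parse_sites and
rw_sites_without_local_ro are illustrative names, not part of any AFS tool.

```python
import re

# Hypothetical `vos examine root.afs` excerpt. Only the rsl155 RW line is from
# this thread; fs-a and fs-b are made-up RO hosts for illustration.
SAMPLE = """\
   number of sites -> 3
      server rsl155 partition /vicepa RW Site
      server fs-a partition /vicepa RO Site
      server fs-b partition /vicepa RO Site
"""

# Matches site lines such as "server X partition /vicepa RO Site".
SITE_RE = re.compile(r"^\s*server\s+(\S+)\s+partition\s+(\S+)\s+(RW|RO)\s+Site",
                     re.MULTILINE)


def parse_sites(text):
    """Return (server, partition, kind) tuples from a vos examine site list."""
    return [m.groups() for m in SITE_RE.finditer(text)]


def rw_sites_without_local_ro(sites):
    """(server, partition) pairs holding the RW but no RO clone alongside it."""
    rw = {(srv, part) for srv, part, kind in sites if kind == "RW"}
    ro = {(srv, part) for srv, part, kind in sites if kind == "RO"}
    return sorted(rw - ro)


if __name__ == "__main__":
    sites = parse_sites(SAMPLE)
    print(sites)
    print(rw_sites_without_local_ro(sites))  # [('rsl155', '/vicepa')]
```

An empty list from rw_sites_without_local_ro means every RW site already has
a local RO clone that clients can fail over to.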

Are you able to run "vos listvl root.afs" from both a good client and a bad
client?

> >
> >rxdebug <hostname> 7001    will give you some info about activity on the
> >AFS client's callback port
>
> On a working client:

Nothing conclusive here, but nothing unexpected either.  rsl55 has talked to
a fileserver (apparently itself: host 167.156.154.55, port 7000) and still
has an open connection.

>
> rsl55:/afs/.uk.baplc.com# rxdebug rsl55 7001
> Trying 167.156.154.55 (port 7001):
> Free packets: 130, packet reclaims: 0, calls: 101338, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 1 threads are idle
> Connection from host 167.156.154.55, port 7000, Cuid 9915c0ac/1817cfe8
>   serial 128760,  natMTU 1444, flags pktCksum, security index 2, client conn
>   rxkad: level clear, flags pktCksum
>   Received 271944 bytes in 2518 packets
>   Sent 180088696 bytes in 128706 packets
>     call 0: # 2518, state dally, mode: receiving, flags: receive_done
>     call 1: # 0, state not initialized
>     call 2: # 0, state not initialized
>     call 3: # 0, state not initialized
> Done.


rsl56, on the other hand, either hasn't talked to a fileserver recently or
hasn't talked to one at all; in any case there's no connection.
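To compare the two clients side by side, here's a minimal Python sketch that
pulls the "Connection from host ..., port ..." lines out of rxdebug output.
The two samples are the rsl55 and rsl56 outputs from this thread, trimmed of
the per-connection detail lines; rx_connections is an illustrative name, and
the sketch only assumes that each connection line starts with "Connection
from host".

```python
import re

# rxdebug <host> 7001 output from this thread (trimmed).
WORKING = """\
Trying 167.156.154.55 (port 7001):
Free packets: 130, packet reclaims: 0, calls: 101338, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
1 threads are idle
Connection from host 167.156.154.55, port 7000, Cuid 9915c0ac/1817cfe8
  serial 128760,  natMTU 1444, flags pktCksum, security index 2, client conn
Done.
"""

BROKEN = """\
Trying 167.156.154.56 (port 7001):
Free packets: 130, packet reclaims: 0, calls: 79437, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
1 threads are idle
Done.
"""

CONN_RE = re.compile(r"^Connection from host ([^,\s]+), port (\d+)", re.MULTILINE)


def rx_connections(output):
    """Return (host, port) for every 'Connection from host' line."""
    return [(host, int(port)) for host, port in CONN_RE.findall(output)]


for name, text in (("rsl55", WORKING), ("rsl56", BROKEN)):
    conns = rx_connections(text)
    print(name, conns if conns else "no connections")
```

Port 7000 is the fileserver port, so a connection from port 7000 on the
callback port is a fileserver talking to the client.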

?????  Someone correct me if I'm wrong ... IIRC, connections to port 7001
from a given fileserver time out after a period of non-use???

>
> On a broken client:
>
> rsl56:/# rxdebug rsl56 7001
> Trying 167.156.154.56 (port 7001):
> Free packets: 130, packet reclaims: 0, calls: 79437, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 1 threads are idle
> Done.
>
>
> >
> >df to see if /afs still appears in the output
>
> /dev/logsarc     9469952   6327480   34%      244     1% /logs/archive
> AFS
> df: /afs: No such file or directory
>

Expected from broken client.

>
> >

> >When was the client last confirmed working correctly?
>
> It was working until 10th April.
>

Did anything (like a reboot) happen then? ... Oh, I see: judging from the
dates, it looks like you're running the default Sunday fileserver restarts ...

> > > rsl57:/usr/afs/local# ps -ef | grep afs
> > >     root 17314 40486   0   11 Apr      -  0:00 /usr/afs/bin/fileserver
> > >     root 17686 40486   0   11 Apr      -  0:00 /usr/afs/bin/volserver
> > >     root 20134     1   0   08 May      - 17:24 /usr/vice/etc/afsd -stat 2800 -dcache 2400 -daemons 5 -volumes 128
> > >     root 20384     1   0   08 May      - 17:23 /usr/vice/etc/afsd -stat 2800


How about sending the output from "bos status <fileserver> -long" for each
fileserver, or at least telling me the last restart time for each?

Is it possible that all your fileservers restarted at the same time, or that
the two fileservers with root.afs.readonly restarted at the same time, or
were unavailable at the same time?

If so, what were the broken clients doing during the restart?  Might be
worth checking uptime on the broken clients to see if they were restarted
while the fileservers were restarting.

?????  Do any of the fs commands (fs checkv or fs checks, e.g.) work on the
client?  I don't know whether those are blocked when /afs won't mount.
Anyone???

How about fs setcache?  (You have to be root, and I'd try this on a broken
client you don't care about hurting.)  It might be worth setting the cache
size to 1, waiting for the cache to purge, then setting it to 0, which
restores the original size.  fs checkv tells the client to refetch volume
info from the VLDB instead of trusting its cache, but that's irrelevant if the
unmounted /afs breaks all the fs commands.  I'm not sure it does, because afsd
is running even though /afs is broken.

Possibility ... I'm thinking that the broken clients may have tried to mount
/afs (root.afs.readonly) while the fileservers were restarting.  If they
couldn't find any copy of root.afs.readonly, they would have failed to mount
/afs.  If so, that argues for not restarting all the fileservers at the same
time, and for having root.afs.readonly on each of your X (how many do you
have?) fileservers.
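The failure window in that theory is just the overlap of the RO sites'
outages.  Here's a minimal Python sketch of the check, assuming you've pulled
one (start, end) outage window per root.afs.readonly site from bos status
-long or the FileLog; the times below are hypothetical, and all_sites_down and
describe are illustrative names.

```python
from datetime import datetime


def all_sites_down(windows):
    """Return the (start, end) interval during which every site in `windows`
    was down at once, or None if the outages never all overlapped.

    `windows` is a list of (start, end) datetime pairs, one per RO site.
    """
    start = max(s for s, _ in windows)
    end = min(e for _, e in windows)
    return (start, end) if start < end else None


def describe(window):
    if window is None:
        return "never all down at once"
    return f"all down {window[0]:%H:%M}-{window[1]:%H:%M}"


def at(hh, mm):
    # Hypothetical restart times on Sunday 11 Apr 2004.
    return datetime(2004, 4, 11, hh, mm)


same_time = [(at(4, 0), at(4, 3)), (at(4, 1), at(4, 4))]
staggered = [(at(4, 0), at(4, 3)), (at(5, 0), at(5, 3))]

print(describe(all_sites_down(same_time)))  # all down 04:01-04:03
print(describe(all_sites_down(staggered)))  # never all down at once
```

A client that tried to mount /afs only inside a non-None window would find no
RO site available; staggering the restarts makes the window empty.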


Looks like afsd has been up for almost a year (since 08 May), while the AFS
fileserver processes were restarted on 11 Apr?  And the clients were good on
Apr 10 ...

While we're at it, has anything changed with your AFS DB servers?  Are the
CellServDB files correct on clients (/usr/vice/etc) and servers
(/usr/afs/etc)?  What does "bos listhosts" report from each of the
fileservers?  "fs listc" from the clients?

Kim

> >
> >Kim
> >
> >
> >=================================
> >Kim (Dexter) Kimball
> >CCRE, Inc.
> >afsinfo at ccre dot com
> >
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: openafs-info-admin@openafs.org
> > > [mailto:openafs-info-admin@openafs.org]On Behalf Of J S
> > > Sent: Tuesday, April 13, 2004 4:53 AM
> > > To: openafs-info@openafs.org
> > > Subject: [OpenAFS] /afs does not exist
> > >
> > >
> > > Hi,
> > >
> > > I'm having problems getting in to the /afs directory on an AIX
> > > box and I'm
> > > not sure how to fix it:
> > >
> > > # cd /afs
> > > ksh: /afs:  not found.
> > >
> > > This has been running fine until now though. The processes are still
> > > running:
> > >
> > > rsl57:/usr/afs/local# ps -ef | grep afs
> > >     root 17314 40486   0   11 Apr      -  0:00 /usr/afs/bin/fileserver
> > >     root 17686 40486   0   11 Apr      -  0:00 /usr/afs/bin/volserver
> > >     root 20134     1   0   08 May      - 17:24 /usr/vice/etc/afsd
> > > -stat 2800
> > > -dcache 2400 -daemons 5 -volumes 128
> > >     root 20384     1   0   08 May      - 17:23 /usr/vice/etc/afsd
> > > -stat 2800
> > > -dcache 2400 -daemons 5 -volumes 128
> > >     ..
> > >     ..
> > >     root 40486     1   0   11 Apr      -  0:00 /usr/afs/bin/bosserver
> > >
> > > And the logs look normal. I can ping the AFS server also.
> > >
> > > Is there anything else I can try?
> > >
> > > Thanks for any help.
> > >
> > > JS.
> > >
> > > _________________________________________________________________
> > > It's fast, it's easy and it's free. Get MSN Messenger today!
> > > http://www.msn.co.uk/messenger
> > >
> > > _______________________________________________
> > > OpenAFS-info mailing list
> > > OpenAFS-info@openafs.org
> > > https://lists.openafs.org/mailman/listinfo/openafs-info
> > >
> >