[OpenAFS] one solaris client suddenly can't mount

Derek Atkins warlord@MIT.EDU
31 May 2001 15:04:52 -0400


Oh, right...  I think I know what's going on now.  I'll see if I can
spend some time to come up with a patch.

In short: you're right, the fileserver caches the fact that a client is
down, and it becomes challenging or near-impossible to recover from
that (at least until the AFS server restarts).  I had supplied a
patch that fixed certain cases of this condition, but I bet there
are other cases, too.

Unfortunately I don't know how to reliably reproduce this particular
case, which makes it challenging to test.

-derek

"Stotler, John" <jstotler@quelsys.com> writes:

> Well, the client is up and running again.
> 
> I rebooted the AFS server. It seems that it had somehow (?) cached the fact
> that it could not contact 192.168.0.14 from a hub outage earlier in the day,
> even though all indications through command-line utilities showed that the
> connection was fine.
> 
> I had already deleted all of the cache files, and when the server came back
> up, the solaris client crashed, dumped, and rebooted. When it came back up
> everything was fine.
> 
> Many thanks to all who offered assistance.
> 
> -----Original Message-----
> From: Derek Atkins [mailto:warlord@MIT.EDU]
> Sent: Thursday, May 31, 2001 1:36 PM
> To: Stotler, John
> Cc: openafs-info@openafs.org
> Subject: Re: [OpenAFS] one solaris client suddenly can't mount
> 
> 
> "Stotler, John" <jstotler@quelsys.com> writes:
> 
> > Sorry for the personal reply, it was intended for the list.
> > 
> > This is what I've got in the FileLog:
> > 
> > Thu May 31 09:18:59 2001 CB: RCallBackConnectBack failed for e00a8c0.22811
> > Thu May 31 11:05:23 2001 ProbeUuid failed for host e00a8c0.22811
> 
> Is your client's IP Address 192.168.0.14?  If not, then what machine
> has that IP Address?
> 
> Basically, this log entry is saying that the RCallBackConnectBack RPC
> failed, as did the ProbeUuid callback.  This generally means that the
> server cannot contact your client, for one reason or another.  The
> "22811" is "7001", but byte-order-swapped.
> 
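For reference, the decoding described above can be sketched in a few lines
of Python. This is only an illustration, assuming the FileLog prints the
host as a 32-bit address in hex and the port as a 16-bit decimal value, both
byte-swapped relative to network order. The helper name
decode_filelog_host is hypothetical and not part of OpenAFS.

```python
import socket
import struct


def decode_filelog_host(token):
    """Decode a 'hexaddr.port' token such as 'e00a8c0.22811'.

    Assumption: both fields were logged byte-swapped, so each one has to
    be reversed to get the dotted-quad address and the real port.
    """
    hex_addr, port_str = token.split(".")

    # Packing the value little-endian yields the address bytes in
    # network order, which inet_ntoa turns into a dotted quad.
    raw = struct.pack("<I", int(hex_addr, 16))
    address = socket.inet_ntoa(raw)

    # Swap the two bytes of the 16-bit port value.
    swapped = int(port_str)
    port = ((swapped & 0xFF) << 8) | (swapped >> 8)

    return address, port


print(decode_filelog_host("e00a8c0.22811"))  # ('192.168.0.14', 7001)
```

The result matches the address asked about above and port 7001, the port on
which the client's cache manager listens for callbacks.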
> > This remains greek to me.
> > 
> > I'll have to build the utilities for Solaris, at the moment I have only
> > built them for Linux.
> 
> The OpenAFS build process should have built all the utilities
> automatically.  Look in .../dest/bin/
> 
> -derek
> 
> > -----Original Message-----
> > From: Derek Atkins [mailto:warlord@MIT.EDU]
> > Sent: Thursday, May 31, 2001 1:23 PM
> > To: Stotler, John
> > Cc: openafs-info@openafs.org
> > Subject: Re: [OpenAFS] one solaris client suddenly can't mount
> > 
> > 
> > You should have responded to openafs-info, not to me directly.
> > 
> > Having bos/vos/pts/fs on clients is usually a Good Thing.  If nothing
> > else they can help you debug situations like this.  If there are no
> > messages in the FileLog on the fileserver 192.168.0.59, then I don't
> > know what to say.
> > 
> > You can try clearing your AFS Cache, but I don't know why that would
> > help.  Is socratease.com your home cell?
> > 
> > -derek
> > 
> > "Stotler, John" <jstotler@quelsys.com> writes:
> > 
> > > > The server is definitely running, other clients can mount the shared
> > > > folders without any problems.
> > > 
> > > > I don't have bos installed on the clients, they're all just running the
> > > > default installation. I'm not sure which utility programs will run on
> > > > the clients....
> > > 
> > > -----Original Message-----
> > > From: Derek Atkins [mailto:warlord@MIT.EDU]
> > > Sent: Thursday, May 31, 2001 1:06 PM
> > > To: Stotler, John
> > > Cc: 'openafs-info@openafs.org'
> > > Subject: Re: [OpenAFS] one solaris client suddenly can't mount
> > > 
> > > 
> > > What happens if you 'bos status <server> -no -long'... Is the fileserver
> > > actually running?
> > > 
> > > -derek
> > > 
> > > "Stotler, John" <jstotler@quelsys.com> writes:
> > > 
> > > > I'm stumped on this one.....
> > > > 
> > > > > One of my solaris clients (5.8 on sparc) has been working fine for
> > > > > months now. We rebooted the box this morning, and now we're getting:
> > > > 
> > > > > May 31 10:45:03 db1 afs: [ID 888289 kern.notice] Starting AFS cache scan...
> > > > > May 31 10:45:11 db1 afs: [ID 215355 kern.notice] found 4730 non-empty cache files (94%).
> > > > > May 31 10:45:12 db1 afs: [ID 446265 kern.notice] afs: Lost contact with file server 192.168.0.59 in cell socratease.com (all multi-homed ip addresses down for the server)
> > > > > May 31 10:45:12 db1 last message repeated 1 time
> > > > 
> > > > > I can't get the afs mount to pick up. The AFSLog on the client is
> > > > > empty, and nothing seems to be logged on the server side either.
> > > > > 
> > > > > The client machine in question can ping, traceroute and DNS resolve
> > > > > the server with no lost packets whatsoever.
> > > > 
> > > > Any idea what I need to do?
> > > > _______________________________________________
> > > > OpenAFS-info mailing list
> > > > OpenAFS-info@openafs.org
> > > > https://lists.openafs.org/mailman/listinfo/openafs-info
> > > 
> > > -- 
> > >        Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
> > >        Member, MIT Student Information Processing Board  (SIPB)
> > >        URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
> > >        warlord@MIT.EDU                        PGP key available
> > 
> 

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord@MIT.EDU                        PGP key available