[OpenAFS] connection timeout errors
Elliot Peele
ebpeele2@pams.ncsu.edu
26 Jun 2003 15:52:08 -0400
On Fri, 2003-06-06 at 10:37, Todd DeSantis wrote:
>
> Hi -
>
> Here are some things that you might try/check if you
> suspect that the clients are somehow losing their
> NAT mappings.
>
> - set the NAT mapping timeouts to 30 minutes.
> some customers have seen success when going to
> timeouts longer than 15 minutes. Try
> 30 minutes and then work your way down until
> you see problems.
Currently the timeouts are set at 30 min. It really isn't viable for me
to keep adjusting the timeouts because I can't keep rebooting my
firewall all the time. As it is now I have to reboot every one to three
weeks to fix the current problem.
> There are RPCs going between the fileserver and
> client every so often even if the client does
> not need to get information on any data. The client
> might send a check every 10 minutes and the same
> with the fileserver. This should be enough to
> keep the NAT mapping around as long as the timeout
> is greater than this number. If a fileserver has
> lots of hosts connecting to it and some drop off the
> network, this 10-minute cycle can actually be longer
> than 10 minutes, so that is why we increased the time
> to 30 minutes to see if this helps and then work our
> way back down.
> - you can always take a callback dump from the
> fileserver via
This doesn't work for me, because I don't have access to the fileserver
other than through AFS.
> kill -XCPU <fileserver pid>
>
> This will create 3 files in /usr/afs/local
>   callback.dump   data file - don't need
>   clients.dump    ascii file for client/user connections
>   hosts.dump      ascii file with host connection data
>                   this is the file we would be interested in
>
> In the hosts.dump file, you can search for the hex equivalent of
> your NATed client IPs and look to see what the fileserver thinks
> about this machine. Are there multiple entries for this IP?
> Did the real IP of the client show up in this list? It shouldn't
> be there. What is the port associated with this entry, 7001 or
> something else?
>
> *** The -XCPU signal will block calls to the fileserver while these
> 3 files are created. If the number of connections/hosts hitting
> this fileserver is large, this can take many minutes to
> complete. You might see clients getting "waiting for busy volume"
> messages when sending this signal. Just a warning here.
>
> - Also, you might want to look at the messages in the FileLog regarding
> the hex IP of the clients in question. They might mention RCallback
> or "possible network or routing" problems. When did these messages
> start showing up in the FileLog? Does that coincide with anything on the
> client machine ==> a reboot, a NAT firewall reboot, etc.
>
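The hex lookups suggested above (in hosts.dump and in the FileLog) could be sketched roughly as follows. This is only a sketch under assumptions: it assumes the address appears as hex digits somewhere on a line, and because the byte order the fileserver uses when printing is not stated in this thread, it tries both orders, with and without leading zeros. The function names, the sample lines, and the address 192.0.2.10 (a documentation address, not one of the clients here) are made up for illustration.

```python
import ipaddress
from typing import Dict, Iterable, List


def hex_forms(ip: str) -> Dict[str, str]:
    """Return the IPv4 address as 8 hex digits in both byte orders."""
    packed = ipaddress.IPv4Address(ip).packed  # 4 bytes, network order
    return {
        "network_order": packed.hex(),
        "byte_swapped": packed[::-1].hex(),
    }


def matching_lines(lines: Iterable[str], ip: str) -> List[str]:
    """Return the lines that mention the address in any assumed hex form."""
    forms = list(hex_forms(ip).values())
    # Also try without leading zeros, in case the value was printed unpadded.
    needles = set(forms) | {f.lstrip("0") or "0" for f in forms}
    return [ln.rstrip("\n") for ln in lines
            if any(n in ln.lower() for n in needles)]


if __name__ == "__main__":
    print(hex_forms("192.0.2.10"))
    # Hypothetical sample text; real hosts.dump and FileLog layouts may differ.
    sample = ["ip:0a0200c0 port:7001", "ip:7f000001 port:7001"]
    print(matching_lines(sample, "192.0.2.10"))
```

To use it on real files, pass an open file object as `lines`, for example `matching_lines(open(path), natted_ip)` with the path to a copy of hosts.dump or FileLog. A hit on a port other than 7001, or several hits for one address, is the kind of thing the questions above are asking about.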
> Thanks
>
> Todd
Thanks
Elliot
>
> Derek Atkins <warlord@MIT.EDU>
> Sent by: openafs-info-admin@openafs.org
> 06/05/2003 11:09 AM
> To: Elliot Peele <ebpeele2@pams.ncsu.edu>
> cc: openafs-info@openafs.org
> Subject: Re: [OpenAFS] connection timeout errors
>
> Well, the bug that I was thinking about would occur when the IP/port
> changes (from the vantage point of the fileserver). So,
> rebooting the NAT box would effectively cause this bug (as would any
> other NAT mapping lossage). Is it possible that the affected machines
> are somehow losing their NAT mappings?
>
> Without seeing a packet trace it's hard to know what's going on. :(
>
> -derek
>
> Elliot Peele <ebpeele2@pams.ncsu.edu> writes:
>
> > I haven't tried sniffing the traffic to see what exactly is happening
> > yet. If I can get the connection timeouts to reproduce themselves, I'll
> > try it tomorrow.
> >
> > I've noticed that if I reboot the firewall and delete the afs cache on
> > the client machine the problem goes away, but this is not a viable option.
> >
> > Elliot
> >
> > On Wed, 2003-06-04 at 18:08, Derek Atkins wrote:
> > > Hmm, then I don't know what to suggest to you... AFS behind a NAT is
> > > just... weird. It usually works, but it can get into strange states
> > > sometimes. There were a few bugs in the fileserver where it would
> > > try to callback to the wrong address and fail to get a WhoAreYou
> > > response.
> > >
> > > Have you tried running a network sniffer on both sides of the NAT
> > > box to see what's going on with the failed connections?
> > >
> > > -derek
> > >
> > > Elliot Peele <ebpeele2@pams.ncsu.edu> writes:
> > >
> > > > > These are desktops that are 100% of the time behind the NAT.
> > > >
> > > > Elliot
> > > >
> > > > On Wed, 2003-06-04 at 17:30, Derek Atkins wrote:
> > > > > > Are these users on laptops or are they _ALWAYS_, 100% behind the NAT?
> > > > >
> > > > > -derek
> > > > >
> > > > > Elliot Peele <ebpeele2@pams.ncsu.edu> writes:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > I thought I'd try this again worded a bit differently and with
> > > > > > > a different subject. I have several users that keep getting
> > > > > > > connection timeout errors when trying to access their volumes
> > > > > > > from behind a firewall. I believe this may be a problem with the
> > > > > > > UDP timeouts. They are OpenAFS clients connecting to a Transarc
> > > > > > > AFS server through an iptables NATing firewall running on Red
> > > > > > > Hat Linux 7.3, currently with kernel 2.4.18-24.7.x.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Elliot
>
> --
> Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
> Member, MIT Student Information Processing Board (SIPB)
> URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
> warlord@MIT.EDU PGP key available
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info