[OpenAFS-devel] "Lost contact with file server" problems
Lyle
lws@o-o.yi.org
Sun, 21 Aug 2005 03:16:05 -0400
What Derrick said.
You have to leave a packet capture running continuously on the off chance
that this happens. You need not just the first error packet, but also the
last couple of RPCs just before it. So you really want:
1. a network monitor that implements stop triggers. These used to be
rather expensive, but maybe ethereal finally implemented them? I don't
know.
2. a true broadcast network or the ability to tap your switch, so you don't
have to run monitor software directly on the fileserver. Running software
on the client is not likely to be useful unless you can reliably predict
which system will be affected.
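To make the "stop trigger" idea in item 1 concrete, here is a minimal sketch in Python. It is not tied to any real capture tool: the packets are simulated tuples, and `capture_until_trigger`, `is_error`, and `pre` are names I made up for illustration. The point is only that the monitor keeps a small ring buffer of recent packets and freezes it the moment the first error packet shows up.

```python
from collections import deque


def capture_until_trigger(packets, is_error, pre=3):
    """Keep the last `pre` packets in a ring buffer and stop at the first
    packet for which is_error(packet) is true.

    Returns the buffered packets followed by the error packet, or None if
    the trigger never fires. All names here are hypothetical.
    """
    ring = deque(maxlen=pre)  # older packets fall off the far end
    for pkt in packets:
        if is_error(pkt):
            return list(ring) + [pkt]
        ring.append(pkt)
    return None


# Simulated traffic as (sequence number, kind) tuples; purely illustrative.
traffic = [
    (1, "call"), (2, "reply"), (3, "call"), (4, "reply"),
    (5, "call"), (6, "abort"), (7, "call"),
]
window = capture_until_trigger(traffic, lambda p: p[1] == "abort", pre=2)
print(window)  # [(4, 'reply'), (5, 'call'), (6, 'abort')]
```

A real monitor would apply the same logic to live frames off the tap; how large `pre` needs to be depends on how many RPCs back the interesting traffic starts, which I don't know.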
Wait a sec. At this point, you're thinking you know which system will be
affected: it's this one at 192.168.18.34, right? But what I'm saying is:
after you reboot that machine, and it comes back up and runs normally
for a while, which client will be next to experience this bug? Is it always
the same one? Even after reboots? That would be new, useful, and surprising
information.
My experience was that the affected client would vary and not be
particularly reproducible, which means that you have to monitor a whole lot
of connections simultaneously, hence a tap on the switch.
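As a rough sketch of what "monitor a whole lot of connections simultaneously" means in practice: since you can't predict the victim, the monitor has to keep a short history per client and hand back whichever client's history trips the trigger. The Python below is only an illustration under that assumption. `PerClientHistory`, `observe`, and `depth` are hypothetical names, the packets are plain strings, and 192.168.18.35 is an invented second client.

```python
from collections import defaultdict, deque


class PerClientHistory:
    """Keep the last `depth` packets seen for every client address."""

    def __init__(self, depth=4):
        self.recent = defaultdict(lambda: deque(maxlen=depth))

    def observe(self, client, packet, is_error):
        """Record one packet. If it is an error packet, return
        (client, recent packets including the error); otherwise None."""
        buf = self.recent[client]
        buf.append(packet)
        if is_error(packet):
            return client, list(buf)
        return None


# Interleaved traffic from two clients, as it would appear on a switch tap.
stream = [
    ("192.168.18.34", "fetch-1"),
    ("192.168.18.35", "fetch-1"),
    ("192.168.18.34", "fetch-2"),
    ("192.168.18.35", "fetch-2"),
    ("192.168.18.35", "fetch-3"),
    ("192.168.18.35", "abort"),
]

history = PerClientHistory(depth=3)
hits = []
for client, packet in stream:
    hit = history.observe(client, packet, lambda p: p == "abort")
    if hit is not None:
        hits.append(hit)

print(hits)  # [('192.168.18.35', ['fetch-2', 'fetch-3', 'abort'])]
```

Note that the client that fails here is not the one you were watching, which is the whole reason for tapping the switch rather than a single host.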
Make sense?
-----Original Message-----
From: openafs-devel-admin@openafs.org
[mailto:openafs-devel-admin@openafs.org] On Behalf Of Derrick J Brashear
Sent: Sunday, August 21, 2005 1:42 AM
To: openafs-devel@openafs.org
Subject: Re: [OpenAFS-devel] "Lost contact with file server" problems
it needs to include the first error packet, e.g. the window where it loses
contact, to be useful
once it's down, that's not interesting
Derrick