[OpenAFS] BreakDelayedCallbacks FAILED still an issue
Christopher D. Clausen
cclausen@acm.org
Thu, 27 Apr 2006 14:44:33 -0500
Jeffrey Altman <jaltman@secure-endpoints.com> wrote:
> Jim Rees wrote:
>> The openafs client is not to blame. Something is blocking the
>> callbacks. It's not a nat, because the client is at port 7001. My
>> guess is the Windows firewall. If not, then some other firewall.
>
> one of the bugs that has been fixed in 1.4.1 was that the server would
> continue to attempt to break callbacks on port 7001 even if the client
> moved to a different port number.
>
> If there is no NAT involved in this picture, then as Jim says it
> probably is the Windows firewall. This can be fixed either by the
> user manually adjusting the firewall rules or by installing 1.4.0 or
> 1.4.1 (final) on the machine.
I'm not sure whether this is related to the original post, but it has
happened twice now, so I thought I'd better ask about it:
Client (flexo.acm.uiuc.edu) is Mac OS X 10.3 running the 1.4.1 binary
release from openafs.org (the previous time this happened it was
1.4.1-rc8, also from openafs.org).
Server (alnitak.acm.uiuc.edu) is Solaris 10 SPARC running 1.4.1-rc10
(previous time it was running 1.4.1-rc8, I think) that I compiled from
source.
The client has a hardcoded IP of 128.174.251.23, which is on the same
non-firewalled subnet as the server. The server apparently thinks that
the client has changed IPs (to 69.112.249.245), probes to find it, and
can't. The client then marks the server down, which makes all volumes
on that server inaccessible.
Restarting the client had no effect. I had to restart the fs process on
the server to remove the error condition.
Anyone else seen this happen? Or have a better solution than restarting
the fs process if it happens again? FileLog is below:
Thu Apr 27 13:59:13 2006 MultiProbe failed to find new address for host 69.112.249.245:7001
Thu Apr 27 13:59:20 2006 CB: Call back connect back failed (in break delayed) for Host 69.112.249.245:7001
Thu Apr 27 13:59:20 2006 BreakDelayedCallbacks FAILED for host 69.112.249.245:7001 which IS UP. Connection from 128.174.251.23:7001. Possible network or routing failure.
Thu Apr 27 13:59:20 2006 MultiProbe failed to find new address for host 69.112.249.245:7001
Thu Apr 27 14:02:20 2006 CB: Call back connect back failed (in break delayed) for Host 69.112.249.245:7001
Thu Apr 27 14:02:20 2006 BreakDelayedCallbacks FAILED for host 69.112.249.245:7001 which IS UP. Connection from 128.174.251.23:7001. Possible network or routing failure.
Thu Apr 27 14:02:20 2006 MultiProbe failed to find new address for host 69.112.249.245:7001
Thu Apr 27 14:06:56 2006 CB: WhoAreYou failed for 69.112.249.245:7001, error -01
Thu Apr 27 14:07:03 2006 CB: Call back connect back failed (in break delayed) for Host 69.112.249.245:7001
Thu Apr 27 14:07:03 2006 BreakDelayedCallbacks FAILED for host 69.112.249.245:7001 which IS UP. Connection from 128.174.251.23:7001. Possible network or routing failure.
Thu Apr 27 14:07:03 2006 MultiProbe failed to find new address for host 69.112.249.245:7001
Thu Apr 27 14:09:05 2006 CB: WhoAreYou failed for 69.112.249.245:7001, error -01
Thu Apr 27 14:09:12 2006 CB: Call back connect back failed (in break delayed) for Host 69.112.249.245:7001
Thu Apr 27 14:09:12 2006 BreakDelayedCallbacks FAILED for host 69.112.249.245:7001 which IS UP. Connection from 128.174.251.23:7001. Possible network or routing failure.
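
For anyone who wants to check their own FileLog for the same pattern,
here is a rough Python sketch. It is not part of OpenAFS; the function
name and the regular expression are my own assumptions, based only on
the "BreakDelayedCallbacks FAILED" lines above. It reports every entry
where the address the fileserver has on record for a host differs from
the address the connection actually came from.

```python
import re

# Assumed format, taken from the FileLog lines quoted in this message.
PATTERN = re.compile(
    r"BreakDelayedCallbacks FAILED for host (?P<host>[\d.]+):(?P<hport>\d+)"
    r" which IS UP\. Connection from (?P<peer>[\d.]+):(?P<pport>\d+)\."
)


def find_address_mismatches(log_text):
    """Return (recorded_address, connecting_address) pairs that differ."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    mismatches = []
    for m in PATTERN.finditer(flat):
        if m.group("host") != m.group("peer"):
            mismatches.append((m.group("host"), m.group("peer")))
    return mismatches


# One entry copied from the log above, wrapped as it appeared in the mail.
SAMPLE = """\
Thu Apr 27 13:59:20 2006 BreakDelayedCallbacks FAILED for host
69.112.249.245:7001 which IS UP. Connection from 128.174.251.23:7001.
Possible network or routing failure.
"""

if __name__ == "__main__":
    for recorded, actual in find_address_mismatches(SAMPLE):
        print(f"server has {recorded} on record, "
              f"but the client connected from {actual}")
```

This only spots the symptom (the server tracking the client under an
address it is not using); it says nothing about why the server picked
up the wrong address in the first place.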
And yes, I am in the process of upgrading to the 1.4.1 release right now
on our servers.
<<CDC
--
Christopher D. Clausen
ACM@UIUC SysAdmin