[OpenAFS] BreakDelayedCallbacks FAILED still an issue

Christopher D. Clausen cclausen@acm.org
Thu, 27 Apr 2006 14:44:33 -0500


Jeffrey Altman <jaltman@secure-endpoints.com> wrote:
> Jim Rees wrote:
>> The openafs client is not to blame.  Something is blocking the
>> callbacks. It's not a nat, because the client is at port 7001.  My
>> guess is the Windows firewall.  If not, then some other firewall.
>
> one of the bugs that has been fixed in 1.4.1 was that the server would
> continue to attempt to break callbacks on port 7001 even if the client
> moved to a different port number.
>
> If there is no NAT involved in this picture, then as Jim says it
> probably is the Windows firewall.    This can be fixed either by the
> user manually adjusting the firewall rules or by installing 1.4.0 or
> 1.4.1 (final) on the machine.

I'm not sure whether this is related to the original post, but it has
happened twice now, so I thought I'd better ask about it:

Client (flexo.acm.uiuc.edu) is Mac OS X 10.3 running the 1.4.1 binary
release from openafs.org (the previous time this happened, it was
1.4.1-rc8 from openafs.org).

Server (alnitak.acm.uiuc.edu) is Solaris 10 SPARC running 1.4.1-rc10
that I compiled from source (the previous time, it was running
1.4.1-rc8, I think).

The client has a hardcoded IP of 128.174.251.23, which is on the same
non-firewalled subnet as the server. The server apparently thinks that
the client has changed its IP to 69.112.249.245, probes to find it,
fails, and then the client marks the server down and makes all volumes
on that server inaccessible.

Restarting the client had no effect.  I had to restart the fs process on 
the server to remove the error condition.

Has anyone else seen this happen, or does anyone have a better solution
than restarting the fs process if it happens again? The FileLog is below:

Thu Apr 27 13:59:13 2006 MultiProbe failed to find new address for host 69.112.249.245:7001
Thu Apr 27 13:59:20 2006 CB: Call back connect back failed (in break delayed) for Host 69.112.249.245:7001
Thu Apr 27 13:59:20 2006 BreakDelayedCallbacks FAILED for host 69.112.249.245:7001 which IS UP.  Connection from 128.174.251.23:7001. Possible network or routing failure.
Thu Apr 27 13:59:20 2006 MultiProbe failed to find new address for host 69.112.249.245:7001
Thu Apr 27 14:02:20 2006 CB: Call back connect back failed (in break delayed) for Host 69.112.249.245:7001
Thu Apr 27 14:02:20 2006 BreakDelayedCallbacks FAILED for host 69.112.249.245:7001 which IS UP.  Connection from 128.174.251.23:7001. Possible network or routing failure.
Thu Apr 27 14:02:20 2006 MultiProbe failed to find new address for host 69.112.249.245:7001
Thu Apr 27 14:06:56 2006 CB: WhoAreYou failed for 69.112.249.245:7001, error -01
Thu Apr 27 14:07:03 2006 CB: Call back connect back failed (in break delayed) for Host 69.112.249.245:7001
Thu Apr 27 14:07:03 2006 BreakDelayedCallbacks FAILED for host 69.112.249.245:7001 which IS UP.  Connection from 128.174.251.23:7001. Possible network or routing failure.
Thu Apr 27 14:07:03 2006 MultiProbe failed to find new address for host 69.112.249.245:7001
Thu Apr 27 14:09:05 2006 CB: WhoAreYou failed for 69.112.249.245:7001, error -01
Thu Apr 27 14:09:12 2006 CB: Call back connect back failed (in break delayed) for Host 69.112.249.245:7001
Thu Apr 27 14:09:12 2006 BreakDelayedCallbacks FAILED for host 69.112.249.245:7001 which IS UP.  Connection from 128.174.251.23:7001. Possible network or routing failure.
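If anyone wants to check their own FileLog for the same symptom, here is
a quick sketch in Python. It is a hypothetical helper (summarize_failures
is my own name, not anything shipped with OpenAFS), and it assumes the
"BreakDelayedCallbacks FAILED" entry format shown in the log above. It
tallies the host address the server has on record against the address
the connection actually came from; in my case every such entry pairs
69.112.249.245 with 128.174.251.23.

```python
import re
from collections import Counter

# Assumes the entry format seen in the FileLog excerpt above.
FAILED_RE = re.compile(
    r"BreakDelayedCallbacks FAILED for host (?P<host>[\d.]+:\d+) which IS UP\.\s+"
    r"Connection from (?P<conn>[\d.]+:\d+)\."
)


def summarize_failures(log_text):
    """Count (recorded host, connecting address) pairs for failed callback breaks."""
    # Collapse all whitespace so entries line-wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    return Counter((m["host"], m["conn"]) for m in FAILED_RE.finditer(flat))


sample = """Thu Apr 27 13:59:20 2006 BreakDelayedCallbacks FAILED for host
69.112.249.245:7001 which IS UP.  Connection from 128.174.251.23:7001.
Possible network or routing failure.
Thu Apr 27 14:02:20 2006 BreakDelayedCallbacks FAILED for host
69.112.249.245:7001 which IS UP.  Connection from 128.174.251.23:7001.
Possible network or routing failure."""

for (host, conn), n in summarize_failures(sample).items():
    # Flag entries where the recorded IP differs from the connecting IP.
    flag = "MISMATCH" if host.split(":")[0] != conn.split(":")[0] else "same"
    print(f"{n} x host={host} conn={conn} [{flag}]")
# prints: 2 x host=69.112.249.245:7001 conn=128.174.251.23:7001 [MISMATCH]
```

It only reads the log, so it doesn't fix anything; it just makes the
address mismatch obvious when scanning a long FileLog.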

And yes, I am in the process of upgrading to the 1.4.1 release right now 
on our servers.

<<CDC
-- 
Christopher D. Clausen
ACM@UIUC SysAdmin