[OpenAFS] Windows client stop working after changing network adapters.

Anders Magnusson ragge@ltu.se
Wed, 25 Feb 2009 18:38:15 +0100


Ok, thanks.  Hm, this was on a Dell Inspiron 8600. 

So the AFS problem is that Windows is resetting its network adapters
each 6 second,
and each time afsd needs to rebind to the MLA?  Ok, strange in that case.

It's also interesting that everything works fine until the first
disconnect (like changing from
wireless to cable), then it starts bailing out.

But OK,  we have some more things to go on now.  Thanks!

(will try Lenovo tomorrow, we usually buy those machines now...)

-- Ragge

Jeffrey Altman wrote:
> As a follow up to my own post.  The problem is not that this sequence of
> events is occurring.  It is supposed to.  The problem is that it is
> occurring as frequently as it is when there is no apparent reason for
> the network stack to be resetting its IP address.
>
> On one of the Lenovo laptops the Intel drivers for the wired adapter
> attempts to hide the status of the network from Windows in order to
> prevent XP from displaying the "not connected balloon" from the
> notification area.  The side effect of this is that the connection
> management software sees both the wireless and wired connections, turns
> off the wireless, then realizes the wired doesn't really exist, turns
> back on the wireless, thinks the wired is there, and repeats ...
>
> >From the user perspective the network performance sucks because there
> are a huge number of retries due to packet loss on the wireless
> interface.  From the perspective of afsd_service, the IP address list is
> constantly changing.
>
> The fix for this machine was to disable the wired adapter entirely
> because it was never being used.
>
> Jeffrey Altman
>
>
> Jeffrey Altman wrote:
>   
>> Anders:
>>
>> The NRC_BRIDGE error means that the IP address the netbios name "AFS" is
>> bound to is no longer available.  The afsd_service smb_Listener thread
>> therefore attempts to rebind the name to the existing adapter.  The
>> NRC_WILD error is a failure to be able to perform the re-bind.  It
>> therefore resets the adapter at the Netbios layer which breaks all
>> communication across the adapter and then rebinds.
>>
>> The smb_LanAdapterChange log message indicates that Windows has reported
>> a change in the IP address list (either an addition or removal of an IP
>> address) via the NotifyAddrChange() API.
>>
>> So the question you need to answer is "why is Windows modifying the IP
>> address list so frequently?"
>>
>> Jeffrey Altman
>>
>>
>> Anders Magnusson wrote:
>>     
>>> Hi,
>>>
>>> I got a report of this strange problem with the Windows client on a
>>> laptop. Client is 1.5.57, laptop is XP SP2:
>>>
>>> Logging in via VPN to the wireless network -> AFS works fine.
>>> Disconnect VPN, use wireless network directly (AFS is allowed) -> AFS
>>> stop working.
>>> Reconnecting via VPN does not help.
>>>
>>> Writing tokens says:
>>>
>>> C:\>tokens
>>> Tokens held by the Cache Manager:
>>> AFS device may not have started
>>>
>>> To me it sounds like the SMB gateway or AFS interface stop working, but
>>> looking at the
>>> configuration of the interface it seems OK.
>>>
>>> Note that this might happen after any change of network interfaces (like
>>> up/down), not
>>> necessarily an IP number change.  Also, it is not dependent of any VPN
>>> client to encounter
>>> the problem, using just plain wireless (or connecting/disconnecting
>>> cable) gives this result.
>>>
>>> Any hints?  I'll add the afsd_init.log below.
>>>
>>> The entries:
>>> 2009-02-25 09:57:10: smb_LanAdapterChange
>>> 2009-02-25 09:57:10: NCBLISTEN lana=10 failed with NRC_BRIDGE, retrying ...
>>> 2009-02-25 09:57:10: NCBLISTEN lana=10 failed with NRC_NOWILD, retrying ...
>>> always shows up when it stops talking.
>>>
>>> What can we do to debug this?
>>>
>>> -- Ragge
>>>
>>> 09:53:40: Create log file
>>> 09:53:40: Created log file
>>> PATH=C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program
>>> Files\ATI Technologies\ATI Control
>>> Panel;C:\WINDOWS\system32\WindowsPowerShell\v1.0;C:\Program
>>> Files\MIT\Kerberos\bin;C:\Program Files\OpenAFS\Common;C:\Program
>>> Files\OpenAFS\Client\Program
>>> 2009-02-25 09:53:41: OEM Code Page = 850
>>> 2009-02-25 09:53:41: locale =  C
>>> 2009-02-25 09:53:41: running on 2000+ - using RegisterServiceCtrlHandlerEx
>>> 2009-02-25 09:53:41: C:\Program
>>> Files\OpenAFS\Client\Program\afsd_service.exe version 1.5.5700
>>> 2009-02-25 09:53:41: Num of Process Modules: 46
>>> 2009-02-25 09:53:41: C:\Program Files\OpenAFS\Client\Program\libosi.dll
>>> version 1.5.5700
>>> 2009-02-25 09:53:43: C:\Program Files\OpenAFS\Common\afsrpc.dll version
>>> 1.5.5700
>>> 2009-02-25 09:53:44: C:\Program Files\OpenAFS\Common\afspthread.dll
>>> version 1.5.5700
>>> 2009-02-25 09:53:44: C:\Program Files\OpenAFS\Common\afsauthent.dll
>>> version 1.5.5700
>>> 2009-02-25 09:53:44: C:\Program
>>> Files\OpenAFS\Client\Program\libafsconf.dll version 1.5.5700
>>> 2009-02-25 09:53:44: osi_InitDebug code 0
>>> 2009-02-25 09:53:44: gethostname fdc-l017
>>> 2009-02-25 09:53:45: Lock Order Validation Off
>>> 2009-02-25 09:53:45: Trace Options = 0
>>> 2009-02-25 09:53:45: Default trace buffer size 10000
>>> 2009-02-25 09:53:45: osi_LogCreate log addr 9958c0
>>> 2009-02-25 09:53:45: Cache size 500000
>>> 2009-02-25 09:53:45: Chunk size 262144 (18)
>>> 2009-02-25 09:53:45: Block size 4096
>>> 2009-02-25 09:53:45: Defaulting to 4 background daemons
>>> 2009-02-25 09:53:46: Defaulting to 25 server threads
>>> 2009-02-25 09:53:46: Status cache entries: 20000
>>> 2009-02-25 09:53:46: Default volume cache entries: 3333
>>> 2009-02-25 09:53:46: Default cell cache entries: 1024
>>> 2009-02-25 09:53:46: Logoff token transfer on
>>> 2009-02-25 09:53:46: Logoff token transfer timeout 120 seconds
>>> 2009-02-25 09:53:46: Default root volume name root.afs
>>> 2009-02-25 09:53:46: Mount root /afs
>>> 2009-02-25 09:53:47: Default cache path C:\WINDOWS\TEMP\AFSCache
>>> 2009-02-25 09:53:47: Cache type is FILE
>>> 2009-02-25 09:53:47: Cache Validation on Startup
>>> 2009-02-25 09:53:47: Set to trap on panic
>>> 2009-02-25 09:53:47: Sys name x86_win32 i386_w2k i386_nt40
>>> 2009-02-25 09:53:47: SecurityLevel is crypt
>>> 2009-02-25 09:53:47: CM ForceAnonVLDB is off
>>> 2009-02-25 09:53:47: DNS will be used to find AFS cell servers
>>> 2009-02-25 09:53:47: Freelance client feature is activated
>>> 2009-02-25 09:53:47: SMB Server Unicode Support is enabled
>>> 2009-02-25 09:53:47: Dot files/dirs will be marked hidden
>>> 2009-02-25 09:53:47: Maximum number of multiplexed sessions is 50
>>> 2009-02-25 09:53:47: Maximum number of VCs per server is 100
>>> 2009-02-25 09:53:47: SMB authentication type is EXTENDED
>>> 2009-02-25 09:53:47: RX Jumbograms are disabled
>>> 2009-02-25 09:53:47: RX extraPackets is 2176
>>> 2009-02-25 09:53:47: RX udpbufsize is 262144
>>> 2009-02-25 09:53:47: RX maximum MTU is 1260
>>> 2009-02-25 09:53:47: RX Peer Statistics gathering is enabled
>>> 2009-02-25 09:53:47: RX Process Statistics gathering is enabled
>>> 2009-02-25 09:53:47: RX Hot Thread is enabled
>>> 2009-02-25 09:53:47: CM CallBackPort is 7001
>>> 2009-02-25 09:53:47: EnableServerLocks: server requested
>>> 2009-02-25 09:53:47: CM DeleteReadOnly is 0
>>> 2009-02-25 09:53:47: CM BPlusTrees is 1
>>> 2009-02-25 09:53:47: No PrefetchExecutableExtensions
>>> 2009-02-25 09:53:47: CM OfflineReadOnlyIsValid is 0
>>> 2009-02-25 09:53:47: CM GiveUpAllCallBacks is 0
>>> 2009-02-25 09:53:47: CM FollowBackupPath is 0
>>> 2009-02-25 09:53:48: First Network address 82f062eb SubnetMask fffffc00
>>> 2009-02-25 09:53:48: lanmanworkstation : SessTimeout 45
>>> 2009-02-25 09:53:48: ConnDeadTimeout is 22
>>> 2009-02-25 09:53:48: HardDeadTimeout is 45
>>> 2009-02-25 09:53:48: Cache File "C:\WINDOWS\TEMP\AFSCache" already exists
>>> 2009-02-25 09:53:48: Existing File Size: 00000000:217747D4
>>> 2009-02-25 09:53:48: Granularity - 10000
>>> 2009-02-25 09:53:48: Reusing existing AFS Cache data:
>>> 2009-02-25 09:53:49:   Base Address   = 202D0000
>>> 2009-02-25 09:53:49:   stats          = 20000
>>> 2009-02-25 09:53:49:   chunkSize      = 262144
>>> 2009-02-25 09:53:49:   blockSize      = 4096
>>> 2009-02-25 09:53:49:   bufferSize     = 561465300
>>> 2009-02-25 09:53:49:   cacheType      = 1
>>> 2009-02-25 09:53:49:   volumeHashTableSize  = 467
>>> 2009-02-25 09:53:49:   currentVolumes = 6
>>> 2009-02-25 09:53:49:   maxVolumes     = 3333
>>> 2009-02-25 09:53:49:   cellHashTableSize = 139
>>> 2009-02-25 09:53:49:   currentCells   = 3
>>> 2009-02-25 09:53:49:   maxCells       = 1024
>>> 2009-02-25 09:53:49:   scacheHashTableSize  = 9973
>>> 2009-02-25 09:53:49:   currentSCaches = 6861
>>> 2009-02-25 09:53:49:   maxSCaches     = 20000
>>> 2009-02-25 09:53:49: Validating Cache Contents
>>> 2009-02-25 09:53:55: Volume Serial Number: 0x288b5350
>>> 2009-02-25 09:53:55: Machine SID: S-1-5-21-602162358-507921405-839522115
>>> 2009-02-25 09:53:55: Initializing Uuid to
>>> 7895e3f7-a31a-414d-afa9-f7c543c2f251
>>> 2009-02-25 09:53:55: Initializing Volume Data
>>> 2009-02-25 09:53:55: Initializing Cell Data
>>> 2009-02-25 09:53:55: Initializing ACL Data
>>> 2009-02-25 09:53:55: Initializing Stat Data
>>> 2009-02-25 09:53:55: CM UseDNLC = 1
>>> 2009-02-25 09:53:55: CM DebugDNLC = 0
>>> 2009-02-25 09:53:55: Initializing Data Buffers
>>> 2009-02-25 09:53:55: Cache Initialization Complete
>>> 2009-02-25 09:53:55: cm_InitMappedMemory code 0
>>> 2009-02-25 09:53:56: rx_SetNoJumbo successful
>>> 2009-02-25 09:53:56: rx_SetMaxMTU 1260 successful
>>> 2009-02-25 09:53:56: rx_SetUdpBufSize 262144
>>> 2009-02-25 09:53:56: rx_Init code 0
>>> 2009-02-25 09:53:57: rx_NewService addr 16fff68
>>> 2009-02-25 09:53:57: rx_NewService addr 16fffb0
>>> 2009-02-25 09:53:57: rx_StartServer
>>> 2009-02-25 09:53:57: cm_GetRootCellName code 0, cm_freelanceEnabled= 1,
>>> rcn= ltu.se
>>> 2009-02-25 09:53:58: Mountpoint[0] = ltu.se#ltu.se:root.cell.
>>> 2009-02-25 09:53:58: Mountpoint[1] = .ltu.se%ltu.se:root.cell.
>>> 2009-02-25 09:53:58: Mountpoint[2] = .root%ltu.se:root.afs.
>>> 2009-02-25 09:53:58: Mountpoint[3] = kth.se#kth.se:root.cell.
>>> 2009-02-25 09:53:59: Mountpoint[4] =
>>> athena.mit.edu#athena.mit.edu:root.cell.
>>> 2009-02-25 09:53:59: Mountpoint[5] = openafs.org#openafs.org:root.cell.
>>> 2009-02-25 09:53:59: cm_GetSCache code 0 scache 2056b808
>>> 2009-02-25 09:53:59: MpsSvc Service could not be opened for query: 0x424
>>> 2009-02-25 09:53:59: cm_InitDaemon complete
>>> 2009-02-25 09:53:59: RPC server listening
>>> 2009-02-25 09:53:59: AutoStart 0x2
>>> 2009-02-25 09:53:59: StoreAnsiFilenames = 0
>>> 2009-02-25 09:53:59: daemonCheckDownInterval is 180
>>> 2009-02-25 09:53:59: EnableSMBAsyncStore = 1
>>> 2009-02-25 09:53:59: daemonCheckUpInterval is 240
>>> 2009-02-25 09:53:59: SMBAsyncStoreSize = 131072
>>> 2009-02-25 09:53:59: daemonCheckVolInterval is 3600
>>> 2009-02-25 09:53:59: daemonCheckCBInterval is 60
>>> 2009-02-25 09:53:59: daemonCheckVolCBInterval is 0
>>> 2009-02-25 09:53:59: daemonCheckLockInterval is 60
>>> 2009-02-25 09:53:59: daemonCheckTokenInterval is 180
>>> 2009-02-25 09:53:59: daemonCheckOfflineVolInterval is 600
>>> 2009-02-25 09:53:59: daemonPerformanceTuningInterval is 0
>>> 2009-02-25 09:53:59: LAN adapter number 10
>>> 2009-02-25 09:54:00: Using >AFS< as SMB server name
>>> 2009-02-25 09:54:00: smb_localNamep is >AFS<
>>> 2009-02-25 09:54:00: Netbios NCBRESET lana 10 succeeded
>>> 2009-02-25 09:54:00: lana_list.length 1
>>> 2009-02-25 09:54:03: Netbios NCBADDNAME lana=10 code=0 retcode=0 complete=0
>>> 2009-02-25 09:54:03: Netbios NCBADDNAME added new name >AFS             <
>>> 2009-02-25 09:54:03: Netbios NCBADDNAME succeeded on lana 10
>>> 2009-02-25 09:54:03: smb_NetbiosInit smb_LANadapter=10
>>> 2009-02-25 09:54:03: MsV1_0SetProcessOption success
>>> 2009-02-25 09:54:03: Setting SMB server domain name to [FDC-L017]
>>> 2009-02-25 09:54:03: smb_StartListeners
>>> 2009-02-25 09:54:03: smb_Init complete
>>> 2009-02-25 09:54:03: Unable to open Windows Firewall Profile
>>> 2009-02-25 09:54:03: GlobalAutoMap thread completed
>>> 2009-02-25 09:55:07: Windows Firewall Configuration succeeded
>>> 2009-02-25 09:56:47: smb_LanAdapterChange
>>> 2009-02-25 09:56:47: NCBLISTEN lana=10 failed with NRC_BRIDGE, retrying ...
>>> 2009-02-25 09:56:47: NCBLISTEN lana=10 failed with NRC_NOWILD, retrying ...
>>> 2009-02-25 09:56:53: Mountpoint[0] = ltu.se#ltu.se:root.cell.
>>> 2009-02-25 09:56:53: Mountpoint[1] = .ltu.se%ltu.se:root.cell.
>>> 2009-02-25 09:56:53: Mountpoint[2] = .root%ltu.se:root.afs.
>>> 2009-02-25 09:56:53: Mountpoint[3] = kth.se#kth.se:root.cell.
>>> 2009-02-25 09:56:53: Mountpoint[4] =
>>> athena.mit.edu#athena.mit.edu:root.cell.
>>> 2009-02-25 09:56:53: Mountpoint[5] = openafs.org#openafs.org:root.cell.
>>> 2009-02-25 09:57:10: smb_LanAdapterChange
>>> 2009-02-25 09:57:10: NCBLISTEN lana=10 failed with NRC_BRIDGE, retrying ...
>>> 2009-02-25 09:57:10: NCBLISTEN lana=10 failed with NRC_NOWILD, retrying ...
>>> 2009-02-25 10:00:57: smb_LanAdapterChange
>>> 2009-02-25 10:01:16: NCBLISTEN lana=10 failed with NRC_BRIDGE, retrying ...
>>> 2009-02-25 10:01:16: NCBLISTEN lana=10 failed with NRC_NOWILD, retrying ...
>>> 2009-02-25 10:01:19: smb_LanAdapterChange
>>> 2009-02-25 10:01:19: NCBLISTEN lana=10 failed with NRC_BRIDGE, retrying ...
>>> 2009-02-25 10:01:19: NCBLISTEN lana=10 failed with NRC_NOWILD, retrying ...
>>> 2009-02-25 10:01:23: smb_LanAdapterChange
>>> 2009-02-25 10:01:23: NCBLISTEN lana=10 failed with NRC_BRIDGE, retrying ...
>>> 2009-02-25 10:01:23: NCBLISTEN lana=10 failed with NRC_NOWILD, retrying ...
>>> 2009-02-25 10:10:01: smb_LanAdapterChange
>>> 2009-02-25 10:10:01: NCBLISTEN lana=10 failed with NRC_BRIDGE, retrying ...
>>> 2009-02-25 10:10:01: NCBLISTEN lana=10 failed with NRC_NOWILD, retrying ...
>>> 2009-02-25 10:10:29: smb_LanAdapterChange
>>> 2009-02-25 10:10:29: NCBLISTEN lana=10 failed with NRC_BRIDGE, retrying ...
>>> 2009-02-25 10:10:29: NCBLISTEN lana=10 failed with NRC_NOWILD, retrying ...
>>>
>>>
>>> _______________________________________________
>>> OpenAFS-info mailing list
>>> OpenAFS-info@openafs.org
>>> https://lists.openafs.org/mailman/listinfo/openafs-info
>>>
>>>