[OpenAFS] OpenAFS / Explorer hang when disabling/enabling NIC

Matt Renzelmann mattp281@renzelmann.com
Thu, 8 Jul 2010 08:31:45 -0500


Hello,

 

I've observed the following issue with OpenAFS.  Platform is Windows 7 x64
"Ultimate" with all the latest Windows Update patches.  The behavior occurs
with the last three stable releases of OpenAFS recommended for Windows:
1.5.75, 1.5.74, and 1.5.73.  I'm using Network Identity Manager 2.0.0.304
(per Help -> About), which is the latest.

 

Details of the behavior:

- If I disable and then reenable the main network adapter--the one that AFS
is ultimately using to access my AFS data--I observe that Windows Explorer
gets "stuck."  It appears to be stuck in some kind of busy live-lock state.

- I suspect that if I lose my Internet connection on the same adapter for
any reason, I get a similar symptom, but I've not confirmed this.

- Attempting to terminate the explorer process once it's in this state
fails.  It will not terminate, even when I use Task Manager or Process
Explorer with administrative elevation.

- All applications that use Explorer functionality, e.g. file open/save
windows, will hang as soon as they invoke said functionality.

- Rebooting resolves the problem, though I often have some difficulty
rebooting cleanly in this scenario.

 

More background:

- I'm using the DEBUG version of AFS currently in an effort to resolve this.
I've had the problem with 1.5.74/73 using the standard "release" version.

- I have Process Explorer set up with symbols for AFS and Windows enabled so
I can see full stack traces with all function names.  Let me know if you
want anything.

- The tail of the afsd_init.log when the problem occurs:

7/8/2010 6:54:15 AM: Mountpoint[0] = openafs.org#openafs.org:root.cell.
7/8/2010 6:54:15 AM: Mountpoint[1] = .openafs.org%openafs.org:root.cell.
7/8/2010 6:54:15 AM: Mountpoint[2] = .root%openafs.org:root.afs.
7/8/2010 6:54:15 AM: Mountpoint[3] = cs.wisc.edu#cs.wisc.edu:root.cell.
7/8/2010 7:35:15 AM: smb_LanAdapterChange
7/8/2010 7:35:15 AM: NCBLISTEN lana=4 failed with NRC_BRIDGE, retrying ...
7/8/2010 7:35:15 AM: NCBLISTEN lana=4 failed with NRC_NOWILD, retrying ...
7/8/2010 7:35:35 AM: smb_LanAdapterChange
7/8/2010 7:35:35 AM: NCBLISTEN lana=4 failed with NRC_BRIDGE, retrying ...
7/8/2010 7:35:35 AM: NCBLISTEN lana=4 failed with NRC_NOWILD, retrying ...
7/8/2010 7:35:35 AM: smb_LanAdapterChange
7/8/2010 7:35:38 AM: NCBLISTEN lana=4 failed with NRC_BRIDGE, retrying ...
7/8/2010 7:35:38 AM: NCBLISTEN lana=4 failed with NRC_NOWILD, retrying ...
7/8/2010 7:35:58 AM: smb_LanAdapterChange
7/8/2010 7:35:58 AM: NCBLISTEN lana=4 failed with NRC_BRIDGE, retrying ...
7/8/2010 7:35:58 AM: NCBLISTEN lana=4 failed with NRC_NOWILD, retrying ...
7/8/2010 7:36:03 AM: smb_LanAdapterChange
7/8/2010 7:36:03 AM: NCBLISTEN lana=4 failed with NRC_BRIDGE, retrying ...
7/8/2010 7:36:03 AM: NCBLISTEN lana=4 failed with NRC_NOWILD, retrying ...

 

- The log clearly shows me disabling/enabling the main network adapter.
Note that I disabled it once, then re-enabled it once a few seconds later.

- Let me know if you'd like more of the log--I've saved a copy.

- Example of the Explorer process after I've attempted to terminate it:

http://www.renzelmann.com/temp/explorer.png

 

It hangs with these threads running indefinitely.  Note that they are doing
something as they are consuming CPU, but they will not terminate.  Explorer
normally contains many additional threads--these have exited cleanly in this
screenshot.

 

- System configuration includes:

  * A wireless adapter.  The Wireless adapter is enabled but not in use or
connected.

  * A wired adapter.  The wired adapter is used for network/Internet.

  * Several VMware Workstation 7 Virtual NICs.

  * A virtual Hamachi VPN NIC.  The VPN adapter is in use, but I doubt it is
the cause, as I had this issue before I installed Hamachi.

  * The OpenAFS Loopback adapter.

- I can reproduce the problem easily by disabling the wired adapter,
reenabling it, and then attempting to access a mapped AFS drive in Windows
Explorer.

- I never have any problems if I leave the OpenAFS service disabled and have
no drives mapped, so I am certain that an important part of the problem is
something OpenAFS is doing--perhaps it's conflicting with something else?
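
For anyone who wants to tally these events in a longer afsd_init.log, here
is a minimal Python sketch.  It assumes every line has the
"M/D/YYYY H:MM:SS AM: message" shape seen in the log tail quoted earlier in
this message.  The summarize() function, the regular expressions, and the
SAMPLE constant are illustrative names of my own, not part of OpenAFS; the
sample text is copied from that log tail.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, based only on the afsd_init.log tail quoted above.
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M): (.*)$"
)
NCB_RE = re.compile(r"NCBLISTEN lana=(\d+) failed with (NRC_\w+)")

SAMPLE = """\
7/8/2010 7:35:15 AM: smb_LanAdapterChange
7/8/2010 7:35:15 AM: NCBLISTEN lana=4 failed with NRC_BRIDGE, retrying ...
7/8/2010 7:35:15 AM: NCBLISTEN lana=4 failed with NRC_NOWILD, retrying ...
7/8/2010 7:35:35 AM: smb_LanAdapterChange
7/8/2010 7:35:35 AM: NCBLISTEN lana=4 failed with NRC_BRIDGE, retrying ...
7/8/2010 7:35:35 AM: NCBLISTEN lana=4 failed with NRC_NOWILD, retrying ...
7/8/2010 7:35:35 AM: smb_LanAdapterChange
7/8/2010 7:35:38 AM: NCBLISTEN lana=4 failed with NRC_BRIDGE, retrying ...
7/8/2010 7:35:38 AM: NCBLISTEN lana=4 failed with NRC_NOWILD, retrying ...
7/8/2010 7:35:58 AM: smb_LanAdapterChange
7/8/2010 7:35:58 AM: NCBLISTEN lana=4 failed with NRC_BRIDGE, retrying ...
7/8/2010 7:35:58 AM: NCBLISTEN lana=4 failed with NRC_NOWILD, retrying ...
7/8/2010 7:36:03 AM: smb_LanAdapterChange
7/8/2010 7:36:03 AM: NCBLISTEN lana=4 failed with NRC_BRIDGE, retrying ...
7/8/2010 7:36:03 AM: NCBLISTEN lana=4 failed with NRC_NOWILD, retrying ...
"""


def summarize(log_text):
    """Return (adapter-change timestamps, Counter of (lana, code) failures)."""
    changes = []
    failures = Counter()
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        stamp = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
        message = match.group(2)
        if message.startswith("smb_LanAdapterChange"):
            changes.append(stamp)
            continue
        failed = NCB_RE.search(message)
        if failed:
            failures[(int(failed.group(1)), failed.group(2))] += 1
    return changes, failures


if __name__ == "__main__":
    changes, failures = summarize(SAMPLE)
    print("adapter change events:", len(changes))
    for (lana, code), count in sorted(failures.items()):
        print(f"lana={lana} {code}: {count}")
```

To run it against a saved copy of the log, read the file and pass its text
to summarize() in place of SAMPLE.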

 

Does anyone have any recommendations on how to proceed to get OpenAFS
working reliably with this setup?  Do you need any additional information?

Thanks and regards,

Matt

