[OpenAFS-devel] Re: OpenAFS Master Repository branch, master, updated. BP--openafs-stable-1_6_x-32-g2ea508e

Benjamin Kaduk kaduk@MIT.EDU
Wed, 25 Aug 2010 20:32:16 -0400 (EDT)


On Wed, 25 Aug 2010, Matt W. Benjamin wrote:

> Hi Ben,
>
> I have tested both unmounting /afs, and rebooting with AFS mounted.  I 
> haven't attempted to load and start AFS again after unmounting.  So I 
> don't think this could be a regression.

Would you mind testing the mount-unmount-unload-reload-mount cycle when 
you get a chance?  My test box is having some issues at the moment (it's 
unclear if this is FreeBSD HEAD making it harder for us to hook the 
syscall table or an afsd regression or me doing something stupid).

>
> More importantly, consider two points.  First, the race between 
> osi_NetReceive and osi_StopListener with immediate soclose typically 
> results in the osi_NetReceive thread attempting illegal accesses to 
> various objects associated with rx_socket.  Second, we can

I do believe I've seen panics from that race, yes.

> only perform one legitimate soclose on rx_socket, and we do perform it, 
> conditionally, after the wait.  The conditionally part may point to an 
> issue--I don't actually remember who introduced the notion of making 
> soclose conditional on so_is_disconn--if in fact we do skip the soclose,
> I'd suggest removing the condition (and supporting macro).

That is probably reasonable, though of course finding out whether we
actually omit the call to soclose depends on testing it, which we
apparently haven't done recently.
Given that the comment above soclose() says:

  * soclose() destroys a socket after possibly waiting for it to disconnect.
  * This is a public interface that socket consumers should use to close and
  * release a socket when done with it.

it would seem that the so_is_disconn check is probably bogus.
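For anyone who wants to reason about the ordering outside the kernel, here
is a minimal userspace sketch (plain C with POSIX threads, not FreeBSD kernel
code) of the shutdown pattern Matt describes.  The stop path signals the
receive thread, waits for it to exit, and only then closes the socket,
exactly once and with no so_is_disconn-style condition.  The names
struct listener, receive_loop, listener_start and listener_stop are
hypothetical; close(2) stands in for soclose(), and the 50 ms SO_RCVTIMEO
polling is an assumption made so the receiver can notice the stop flag, not
a claim about how osi_NetReceive is actually woken.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in for the state shared by the receive thread and the
 * stop path; this is NOT the OpenAFS rx_socket code. */
struct listener {
    int fd;               /* the socket, loosely analogous to rx_socket */
    int peer;             /* other end of the socketpair, for the demo */
    atomic_bool stopping; /* set by the stop path */
    pthread_t thread;
    int close_count;      /* how many times fd has been closed */
};

/* Loosely analogous to osi_NetReceive: receive with a short timeout so the
 * thread notices the stop flag without the socket being destroyed under it. */
static void *receive_loop(void *arg)
{
    struct listener *l = arg;
    char buf[64];

    while (!atomic_load(&l->stopping)) {
        /* Timeouts (EAGAIN) and received data are both fine here. */
        (void)recv(l->fd, buf, sizeof(buf), 0);
    }
    return NULL;
}

static int listener_start(struct listener *l)
{
    int sv[2];
    struct timeval tv = { .tv_sec = 0, .tv_usec = 50000 }; /* 50 ms */

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    l->fd = sv[0];
    l->peer = sv[1];
    l->close_count = 0;
    atomic_init(&l->stopping, false);
    if (setsockopt(l->fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0 ||
        pthread_create(&l->thread, NULL, receive_loop, l) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
}

/* Loosely analogous to osi_StopListener: signal, wait for the receiver to
 * exit, then close the socket exactly once and unconditionally. */
static int listener_stop(struct listener *l)
{
    atomic_store(&l->stopping, true);
    if (pthread_join(l->thread, NULL) != 0)
        return -1;
    /* The receiver has exited, so nothing else can touch fd now. */
    assert(l->close_count == 0); /* a second close would be a bug */
    close(l->fd);
    l->close_count++;
    close(l->peer);
    return 0;
}
```

Because the close happens only after pthread_join() returns, the receive
thread can never dereference a socket that has already been torn down, which
is the kind of illegal access the osi_NetReceive/osi_StopListener race produces.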

-Ben