[Port-solaris] Network availability during shutdown

Andrew Deason adeason@sinenomine.net
Wed, 9 Mar 2011 17:36:08 -0600


Hi,

Recently I've become aware that Solaris does not seem to like it when
OpenAFS tries to access the network during reboot/halt/poweroff. That
is, when the fs is unmounted in the uadmin() -> kadmin() ->
vfs_unmountall() codepath, the AFS kernel code cannot access the net.

As of right now, I don't think any release of OpenAFS tries to hit the
net on shutdown on Solaris (prereleases excepted), but we will start
doing that soon. The main reason is that we want to notify fileservers
that we are going away, so they don't try to contact us.

When shutting down a Solaris box running a bleeding-edge development
version of OpenAFS with, say, 'reboot', I notice that it takes quite a
long time. The reason is that we are timing out while trying to contact
the fileservers. This could be a bug in OpenAFS's network handling code,
but I haven't seen anything that points to that yet.

So first of all, is it intentional that the network is not available at
this point in the shutdown process? I do not even see any errors; I can
see that we are calling sosendmsg(), and it returns with no error code
and a uio_resid of zero. However, I have not been able to see any of the
packets we're trying to send on the wire.

Assuming that's all intended and correct, is there any way for us to be
able to run something before the net is shut down? In the OpenSolaris
codebase, I see callbacks registered with the CB_CL_UADMIN_PRE_VFS class
are fired right before this, but I assume that is not helpful, since
that's right before the vfs_unmountall() call, so the net is probably
still not available then.

Assuming there is no way to do that, is there a good way to detect if
the network is available at this level? I can work around this by
preventing network access if the sys_shutdown global is nonzero, but I
don't know if that's the best way. I also assume that sys_shutdown is
not considered a public interface at all, since I can find no
documentation on it.
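
As a rough illustration of that workaround, here is a user-space
mock-up, not actual OpenAFS or Solaris code. The sys_shutdown variable
below is a local stand-in for the kernel global (a real module would
presumably reference the kernel's own variable instead), and
afs_notify_fileservers(), afs_net_send_stub() and packets_sent are
hypothetical names invented for the example. The choice of ENETDOWN as
the return value is likewise just an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's sys_shutdown global (assumption: nonzero
 * once the system has started shutting down). */
static volatile int sys_shutdown = 0;

/* Counts simulated sends, so the guard's effect can be observed. */
static int packets_sent = 0;

/* Hypothetical placeholder for the real network send path. */
static int afs_net_send_stub(const char *what)
{
    assert(what != NULL);
    packets_sent++;
    return 0;
}

/* Hypothetical "we are going away" notification. If the system is
 * already shutting down, skip the network entirely rather than wait
 * for a timeout on packets that never reach the wire. */
static int afs_notify_fileservers(void)
{
    if (sys_shutdown != 0) {
        return ENETDOWN;
    }
    return afs_net_send_stub("going-away notice");
}
```

With a guard like this, an unmount during 'reboot' would return
immediately without notifying the fileservers, while a plain
'umount /afs' on a running system would still send the notification.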

Also, keep in mind this scenario I'm talking about is when someone turns
off or reboots the machine directly via reboot/halt/poweroff, and not
'shutdown'. So, it is not possible to prevent this via SMF/initscripts,
but it's also probably okay if whatever solution/workaround is
suboptimal compared to e.g. 'umount /afs'. I'd just like to avoid the
long delays, since if someone is running 'reboot', I'd expect they want
the machine to reboot quickly, even if it means some things shutting
down uncleanly.

-- 
Andrew Deason
adeason@sinenomine.net