[OpenAFS-devel] Solaris afs.rc file damage

Dean Anderson dean@av8.com
Wed, 11 Apr 2007 17:42:36 -0400 (EDT)


On Tue, 10 Apr 2007, Jeffrey Hutzelman wrote:

> On Tuesday, April 10, 2007 11:56:45 AM -0400 Dean Anderson <dean@av8.com> 
> wrote:
> 
> > On Mon, 9 Apr 2007, Robert Banz wrote:
> >
> >>
> >> One could go to the OpenSolaris folks and see if you can't get AFS
> >> officially allocated a syscall table entry that can be published in
> >> name_to_sysnum in future versions.
> >
> > I'll see if I can't make the contacts to do this. I have some other
> > kernel stuff (RFC1788, RFC4620 support) I'm hoping to get into solaris.
> 
> We have the contacts to do this; we just haven't done anything about it. 
> Unfortunately, part of the problem is that Sun doesn't consider the 
> user/kernel boundary to be a committed interface.  The committed interface 
> is the ABI between applications and the syscall stubs in libc; your libc 
> must match your kernel, and cross-version compatibility is present only for 
> dynamically-linked programs.

I just talked to some people, who suggested that it was the loadable
syscalls that 'a group within Sun' wasn't committed to.  I recall that
certain Linux folks also argue that loadable system calls are a bad
thing. I'm not sure why they argue that, but their reasons perhaps merit
some more investigation.

Syscalls are nothing more than an ancient form of shared library which
once had implicit locking.  I think the same behavior can usually be
obtained by a driver with only ioctls.  I suppose the advantage is that
a driver has a notion of state from open file descriptors, and you can
tell if it is in-use and what is using it.  I suppose this makes things
easier when unloading and reloading.  ioctls can also be unique so that
hitting the wrong driver is less damaging.  On the downside, there are a
lot of potentially unnecessary open/close calls and file descriptors
which are held as nothing but status indicators.  There is a lot less
overhead in a dynamically loaded syscall.

So, I suspect the question should be: Could the afs kernel module be
turned into a driver with an ioctl?  There's a lot in there, and if
anything breaks the general premise that a system call can be cast as an
ioctl, this would probably be it...
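
To make the syscall-as-ioctl idea concrete, here is a toy user-space
model in Python. It is not kernel code and not how OpenAFS works:
ToyDriver, AFS_MAGIC, make_cmd and call_via_ioctl are all made-up names,
and the four-integer argument layout is an assumption for illustration.
It shows the two properties mentioned above: open handles tell you
whether the driver is in use, and a driver-specific command code makes a
call to the wrong driver fail cleanly.

```python
import errno
import struct

AFS_MAGIC = 0x4146   # hypothetical per-driver tag ("AF"), not a real constant
ARG_FMT = "=4q"      # assumed layout: call number plus three integer args


def make_cmd(magic, op):
    """Fold the driver tag into the command so commands are driver-unique."""
    return (magic << 16) | op


AFS_CALL = make_cmd(AFS_MAGIC, 1)


class ToyDriver:
    """User-space stand-in for a driver with a single ioctl entry point."""

    def __init__(self, magic, handler):
        self.magic = magic
        self.handler = handler
        self.open_count = 0   # state that a bare syscall slot does not have

    def open(self):
        self.open_count += 1
        return self

    def close(self):
        self.open_count -= 1

    def can_unload(self):
        # Safe to unload only when nobody holds the device open.
        return self.open_count == 0

    def ioctl(self, cmd, arg):
        if cmd >> 16 != self.magic:
            # Wrong driver: reject instead of misinterpreting the request.
            raise OSError(errno.ENOTTY, "command not for this driver")
        return self.handler(cmd & 0xFFFF, arg)


def afs_handler(op, arg):
    """Placeholder for the real work: just unpack and echo the arguments."""
    return struct.unpack(ARG_FMT, arg)


def call_via_ioctl(handle, callno, a=0, b=0, c=0):
    """What a libc-style stub might do: pack the syscall args, issue ioctl."""
    return handle.ioctl(AFS_CALL, struct.pack(ARG_FMT, callno, a, b, c))
```

The cost shows up in open_count: every caller has to hold the device
open just so the driver can be tracked, which is the overhead a
dynamically loaded syscall avoids.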


> > [I'm not convinced a reboot is really necessary] Looking at the solaris
> > source, I can see that there is a modctl MODREADSYSBIND to read the
> > name_to_sysnum file. Unfortunately, I don't see any scriptable utility
> > in the solaris distribution to do this...a utility program will be
> > necessary.  There are some other alternatives: maybe modload should
> > always do MODREADSYSBIND before loading a module.  The kernel could also
> > do this all entirely by itself, just by stat'ing the file to see if it
> > needs to be re-read when searching for a free syscall entry, which
> > checking only happens if the name isn't found.
> 
> Hrm.  Doing that to a running system seems really dangerous, and having the 
> kernel do so automagically especially so.

As long as the numbers for in-use systems don't change unexpectedly,
there shouldn't be anything dangerous about this.  The point of this
table is to assign syscall numbers to specific systems; numbers which
stay the same once assigned.  I agree that if specific system/syscall
bindings did change while in use, that would definitely be a problem.
There may also need to be locks to prevent access during update, but a
missing lock there would merely be a bug.
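
As a rough illustration of both points (re-read the file only when a
lookup misses and the file has changed, and never let an in-use binding
move), here is a user-space sketch in Python. It is not the Solaris
implementation; the names SysnumTable, parse_sysnum and mark_in_use, and
the assumed 'name number' line format with '#' comments, are mine for
illustration only.

```python
import os


def parse_sysnum(path):
    """Parse assumed 'name number' lines, ignoring blanks and '#' comments."""
    table = {}
    with open(path) as f:
        for line in f:
            fields = line.split("#", 1)[0].split()
            if len(fields) == 2:
                table[fields[0]] = int(fields[1])
    return table


class SysnumTable:
    def __init__(self, path):
        self.path = path
        self.stamp = self._stamp()
        self.table = parse_sysnum(path)
        self.in_use = set()

    def _stamp(self):
        # Cheap change detector: modification time plus size.
        st = os.stat(self.path)
        return (st.st_mtime_ns, st.st_size)

    def mark_in_use(self, name):
        self.in_use.add(name)

    def _conflicts(self, new):
        """List names whose new binding would disturb an in-use system."""
        # An in-use name whose number changed or vanished.
        bad = [n for n in sorted(self.in_use)
               if n in self.table and new.get(n) != self.table[n]]
        # A different name claiming a number held by an in-use system.
        held = {self.table[n]: n for n in self.in_use if n in self.table}
        bad += [n for n, num in sorted(new.items())
                if num in held and held[num] != n]
        return bad

    def lookup(self, name):
        if name in self.table:
            return self.table[name]
        # Only on a miss: stat the file and re-read it if it changed.
        stamp = self._stamp()
        if stamp != self.stamp:
            new = parse_sysnum(self.path)
            if self._conflicts(new):
                raise RuntimeError("refusing reload: in-use binding changed")
            self.table, self.stamp = new, stamp
        return self.table.get(name)
```

A real implementation would also need the locking mentioned above; the
sketch leaves it out.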

> > What to do with the other systems?  Do they really need reboots?  POSIX
> > extension for probing syscalls?
> 
> It really is necessary to reboot when package tells you to, 

It is rarely necessary to reboot 'now'. It _may_ be necessary to reboot
before using the software, if the software can't run until a reboot. 
Most software isn't in this category. About the only time you need to 
reboot is if you have to statically link something into the kernel.
 
If the installer is going to make changes that will abruptly make the
system unrunnable (changing libc.so, for example), the script should
notify the admin of that fact before making the changes and ask whether
it should continue, as a reboot will be necessary momentarily.  But
this is also rarely the case.

But the point of my question here is whether the packager was actually
correct about needing a reboot.  Obviously, one does not need to reboot
when installing, say, GNU tar.

> the admin who wrote the package.proto wants, then the configuration should 
> be obeyed.  

My question is whether the admin was correct in their 'want', not
whether they should be obeyed if they are correct.

> If you don't ever want package to trigger a reboot, just don't 
> use the 'Q' flag.

What Q flag?  You've lost me.  The reboot wasn't triggered by a pkg or
rpm installer. It was in the init/rc startup script. There is no 'Q'
flag.

> As for your pontification on why automatic reboots are a bad idea, please 
> try to remember that there are as many ways to manage a large distributed 
> computing environment as there are large distributed computing environments. 
> Your belief that automatic reboots during startup are dangerous does not 
> mean that they cannot be used as part of a robust, successful 
> infrastructure for managing large numbers of systems without large numbers 
> of sysadmins.  As proof, I offer the example of the Andrew system, which 
> has used that approach since at least the late 1980's with great success.

As empirical proof that reboots on startup are a bad idea, I cite the
lack of 'reboot' in other /etc/init.d files in either Solaris or Linux.

And I cite the number of angry users when a system reboots unexpectedly;  
anger which is usually intensified when it is learned that the reboot
wasn't actually necessary at all.

And I cite the disdain that many people have for the 'reboot, reinstall'
mentality often associated with Microsoft.  A reboot is frequently no
cure for the ills of any computer.  An unscheduled reboot is even worse,
especially if it's not really the case that the computer can't continue
to function.  But I concede that some people disagree, and that we will
have to agree to disagree. But I will just choose not to run software
that imposes that mentality on my sites, and will label such software as
'unstable' and 'unreliable'. Sometimes it's not the software, but the
system administrators.  Some people have no choice but to accept system
administrators who are just short of being the BOFH. Others do not.  
Given the choice, I've found that most people prefer reasonable, careful
administrators who don't reboot systems unexpectedly, and who don't
reboot things without cause for the sheer hell of it.

		--Dean


-- 
Av8 Internet   Prepared to pay a premium for better service?
www.av8.net         faster, more reliable, better service
617 344 9000