[OpenAFS-devel] Solaris fixes for 1.4.x / AFS_SUN510_ENV

Frank Batschulat (Home) Frank.Batschulat@Sun.COM
Mon, 11 Feb 2008 19:50:10 +0100


On Mon, 11 Feb 2008 19:28:51 +0100, Derrick Brashear <shadow@gmail.com> wrote:

>> >> >> 1. SSYS process exiting considered harmful
>> >> >>
>> >> >>   The first problem is that setting process flag SSYS on a process that
>> >> >>   exits, as the afs_osi_Invisible routine on Solaris 10 does, causes the
>> >> >>   system not to clean up the contract state of the process.  This leaves
>> >> >>   a dangling kernel-memory pointer in the contract table which used to
>> >> >>   point to the process struct.
>> >> >>
>> >> >>   Any user can corrupt kernel memory and cause a panic with the 'ctstat'
>> >> >>   command and the system cannot shut down without either panicking or
>> >> >>   going into an infinite loop as svc.startd repeatedly tries to kill the
>> >> >>   non-existent process.
>> >> >>
>> >> >> I really don't know why the code would set SSYS on a userland process
>> >> >> that's about to exit in the first place.  Can anyone shed any light?
>> >> >
>> >> > Threads that call afs_osi_Invisible are not about to exit; they're about to
>> >> > become long-lived AFS kernel threads.  Setting SSYS is correct; we just
>> >>
>> >> Actually it is not appropriate for an arbitrary thread/proc to set SSYS.
>> >>
>> >> Only system processes [they exist only in kernel, i.e. p_as is set to kas]
>> >> created with newproc() are eligible for SSYS, and that happens automatically in newproc().
>> >
>> > This is a system process, just not one created by newproc().
>>
>> actually there are only a few 'system processes' and these are sched, init, pageout, fsflush,
>> zsched and the cluster_wrapper. there are no other 'system processes' in that term.
>>
>> refer to main() in
>> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/main.c
>>
>> regular kernel threads are parented to sched (p0) while zone specific kernel threads
>> created by zthread_create() are parented to zsched.
>>
>> > Presumably we need to do something analogous to the linux
>> > kernel_thread code, calling newproc.
>>
>> nope, we've been there before:
>>
>> http://www.openafs.org/pipermail/openafs-devel/2002-April/007896.html
>>
>> I wonder what you are trying to accomplish by setting SSYS, and I'm still
>> unclear whether you are doing this to a kernel thread or a userland process.
>
> afsd->afs syscall() and then SSYS is set. Before the syscall returns,
> SSYS is cleared. I don't have notes handy but I assume this was "we
> really aren't interested in being signalled while we're in the
> kernel". I guess then (if that's really it) lwp_sigmask, or switch to
> real (not newproc) kernel threads.

Ah, so it is the AFS daemon userland process issuing the AFS syscall that does this, thanks.

If that's the intent, i.e. to block all signals while the AFS syscall executes in the kernel,
afsd could possibly use sigfillset(3C) and thr_sigsetmask(3C), e.g.:

sigset_t sgset;

/* Block all signals */

(void) sigfillset(&sgset);
(void) thr_sigsetmask(SIG_BLOCK, &sgset, NULL);

/* execute the AFS syscall here */

/* Open for signals again */

(void) thr_sigsetmask(SIG_UNBLOCK, &sgset, NULL);

I can't comment on the real kernel threads though because I'm not familiar
enough with how the syscall is currently implemented.

---
frankB