[OpenAFS] bosserver - lwp stack overflow

Marcus Watts mdw@spam.ifs.umich.edu
Tue, 05 Jun 2007 18:56:49 -0400


Karen L Eldredge <keldredg@us.ibm.com> writes:
> Date:    Tue, 05 Jun 2007 11:50:24 MDT
> To:      openafs-info@openafs.org
> From:    Karen L Eldredge <keldredg@us.ibm.com>
> Subject: [OpenAFS] bosserver - lwp stack overflow
> 
> I'm trying to configure an OpenAFS server on a PPC with SLES 10 installed, 
> and have compiled successfully.  When trying to setup the initial server 
> machine I get the following error when trying to run bosserver -noauth: 
> 
> topstack = 0xf7fc604c: stackptr = 0xf7f96008: stacksize = 0x30000
> Tue Jun  5 11:15:06 2007 LWP: stack overflow in process IO MANAGER!
> 
> Any help will be appreciated.

You didn't include the line before this, which should read something
roughly like:
	stackcheck = %u: stack = %u
Did your version not print it?

You might try this:
find the second place in lwp.c that allocates space for stackmemory.
It should look something like this:
	#else /* !AFS_DARWIN_ENV */
		if ((stackmemory = (char *)malloc(stacksize + 7)) == NULL)
	#endif /* !AFS_DARWIN_ENV */

Change the 7 to something larger, 512 maybe, then rebuild and try
the result.  This allocates extra slop at the "top" of the stack.
If the problem is that the initial call saves state "past" the top of
the stack and clobbers whatever is there, this may fix it.
If this works, the correct fix belongs late in LWP_CreateProcess, in the
call to savecontext() that passes Create_Process_Part2; it will need
another special case for ppc64.  If you can figure out how much slop is
actually needed or, even better, why it's needed, that would be helpful.

There's also a variable in lwp.c:
	int lwp_MinStackSize = 0;
You might try setting this to something larger than the 0x30000
in your error message.  Try double that, 0x60000, for starters.
This will help if the problem is that it really does need more stack
(at the bottom of the stack).

If you define DEBUG in lwp.c, rebuild, run something under gdb, and set
lwp_debug, you will get interesting debugging messages.  This is probably
not useful unless you are prepared to debug the problem.

				-Marcus