[OpenAFS] Errors: Fileserver freezes, Volumes contains orphans

Srikanth Vishwanathan vsrikanth@in.ibm.com
Thu, 30 Jan 2003 21:54:38 -0500


> for starters, the LWP fileserver (if you build openafs, find it in
> src/viced/fileserver) will actually leave a core, unlike the pthreaded
> fileserver, if it dies. if as i'm guessing you have one pthread dying
> now this may help.

The LWP fileserver might drop a core, but it usually doesn't show the
stack trace of the LWP that caused the problem.

There's also another trick for getting a multithreaded application
to dump core, which I read about in a Linux newsgroup. The trick is to
register a handler for signals like SIGABRT, SIGSEGV and SIGBUS and have
that handler kill all the other threads before attempting to dump
core. This doesn't always work, but it has helped me debug some Linux
problems.

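#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

void error_handler(int num)
{
        struct sigaction new_action;

        /* LinuxThreads extension: kill every thread except this one */
        pthread_kill_other_threads_np();

        sleep(1);

        /* Restore default handler for abort */
        sigemptyset(&new_action.sa_mask);
        new_action.sa_handler = SIG_DFL;
        new_action.sa_flags = 0;
        sigaction(SIGABRT, &new_action, NULL);

        /* Raise SIGABRT with the default action, which dumps core */
        abort();
}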

And in main():

int main()
{
.
.
        struct sigaction new_action;

        sigemptyset(&new_action.sa_mask);
        new_action.sa_handler = error_handler;
        new_action.sa_flags = 0;

        sigaction(SIGABRT, &new_action, NULL);
        sigaction(SIGSEGV, &new_action, NULL);
        sigaction(SIGBUS, &new_action, NULL);

        pthread_create(... 
}
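
For anyone who wants to try the registration step on its own, here is a minimal sketch. The names crash_handler and install_crash_handlers are made up for illustration and are not part of OpenAFS. It leaves out pthread_kill_other_threads_np(), since that call comes from LinuxThreads and may not be available with other thread libraries, so it only shows the "install the handler, reset SIGABRT, abort" part of the trick.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler: put SIGABRT back to its default action and
 * abort, so the process terminates with a core dump (if core dumps
 * are enabled for the process). */
static void crash_handler(int signum)
{
    struct sigaction dfl;

    (void)signum;

    memset(&dfl, 0, sizeof dfl);
    sigemptyset(&dfl.sa_mask);
    dfl.sa_handler = SIG_DFL;
    sigaction(SIGABRT, &dfl, NULL);

    abort();
}

/* Hypothetical helper: install crash_handler for SIGABRT, SIGSEGV
 * and SIGBUS. Returns 0 on success, -1 if any sigaction() call fails. */
int install_crash_handlers(void)
{
    static const int signals[] = { SIGABRT, SIGSEGV, SIGBUS };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = crash_handler;

    for (i = 0; i < sizeof signals / sizeof signals[0]; i++) {
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}
```

Calling install_crash_handlers() once in main(), before the first pthread_create(), should be enough, because signal dispositions are shared by all threads in a process.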
