FW: [OpenAFS] Errors: Fileserver freezes, Volumes contains orphans

Rubino Geiß kb44@rz.uni-karlsruhe.de
Fri, 31 Jan 2003 10:44:42 +0100


-----Original Message-----
From: Rubino Geiß [mailto:kb44@rz.uni-karlsruhe.de]
Sent: Friday, January 31, 2003 8:48 AM
To: 'Srikanth Vishwanathan'
Subject: RE: [OpenAFS] Errors: Fileserver freezes, Volumes contains orphans


Yes, this might work, but probably not: the fileserver did not die on
kill -KILL, so why should it react to some other signal? Anyway, I will
give it a try.

Bye, Ruby

-----Original Message-----
From: openafs-info-admin@openafs.org [mailto:openafs-info-admin@openafs.org]
On Behalf Of Srikanth Vishwanathan
Sent: Friday, January 31, 2003 3:55 AM
To: openafs-info@openafs.org
Subject: RE: [OpenAFS] Errors: Fileserver freezes, Volumes contains orphans



> for starters, the LWP fileserver (if you build openafs, find it in
> src/viced/fileserver) will actually leave a core, unlike the pthreaded
> fileserver, if it dies. if as i'm guessing you have one pthread dying now
> this may help.

The LWP fileserver might drop a core, but it usually doesn't show the
stack trace of the LWP that caused the problem.

There's also this other trick for getting a multithreaded application
to dump core that I read about in some Linux newsgroup. The trick is to
register signal handlers for signals like ABRT, SEGV and BUS and have
the signal handler kill all the other threads before attempting to dump
core. This doesn't always work, but has helped me debug some Linux
problems.

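#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Handler for fatal signals: stop the other threads, then abort so
 * that the one remaining thread dumps core. */
void error_handler(int num)
{
        struct sigaction new_action;

        (void)num;      /* unused */

        /* LinuxThreads-specific: terminates every thread but the caller */
        pthread_kill_other_threads_np();

        sleep(1);

        /* Restore default handler for abort */
        sigemptyset(&new_action.sa_mask);
        new_action.sa_handler = SIG_DFL;
        new_action.sa_flags = 0;
        sigaction(SIGABRT, &new_action, NULL);

        abort();
}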

And in main():

int main()
{
.
.
        struct sigaction new_action;

        sigemptyset(&new_action.sa_mask);
        new_action.sa_handler = error_handler;
        new_action.sa_flags = 0;

        sigaction(SIGABRT, &new_action, NULL);
        sigaction(SIGSEGV, &new_action, NULL);
        sigaction(SIGBUS, &new_action, NULL);

        pthread_create(...
}


