[OpenAFS-devel] fileserver deadlocks after internal error in callback.c
Rainer Toebbicke
rtb@pclella.cern.ch
Mon, 14 Apr 2003 14:34:11 +0200
If the (pthreaded) fileserver encounters an internal error in the callback
structures, it calls ShutDown() (callback.c). The same happens in certain
situations in host.c.
In most cases ShutDown() is called with H_LOCK held and eventually calls
PrintCounters(). This in turn calls routines that acquire all sorts of locks,
in particular H_LOCK in h_GetWorkStats(), at which point a deadlock arises.
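
To illustrate the pattern, here is a minimal sketch, not fileserver code. All
names in it (host_lock, get_work_stats, shutdown_with_lock_held) are
hypothetical stand-ins. It assumes a POSIX error-checking mutex so that the
second acquisition reports EDEADLK instead of hanging, which is what a default
mutex would typically do in the same situation.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 so PTHREAD_MUTEX_ERRORCHECK is declared */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the host lock; not the real H_LOCK macro. */
static pthread_mutex_t host_lock;

/* Create host_lock as an error-checking mutex. Returns 0 on success. */
static int init_host_lock(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&host_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Stand-in for a statistics routine that takes the host lock itself. */
static int get_work_stats(void)
{
    int rc = pthread_mutex_lock(&host_lock);
    if (rc != 0)
        return rc;          /* EDEADLK if this thread already holds the lock */
    /* ... counters would be read here ... */
    pthread_mutex_unlock(&host_lock);
    return 0;
}

/* Stand-in for an error path that gathers statistics with the lock held. */
static int shutdown_with_lock_held(void)
{
    int rc;

    pthread_mutex_lock(&host_lock);
    rc = get_work_stats();  /* second acquisition by the same thread */
    pthread_mutex_unlock(&host_lock);
    return rc;
}
```

After init_host_lock(), calling get_work_stats() on its own succeeds, while
shutdown_with_lock_held() gets EDEADLK back from the nested call. With a
normal (non-error-checking) mutex the nested pthread_mutex_lock() would most
likely block forever instead.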
Generally speaking, relying on the correctness of too many internal structures
is unhealthy once it has become obvious that something is wrong to the point
that you are ready to give up completely.
The attached patch modifies callback.c to call ShutDownAndCore(PANIC) instead
of simply ShutDown(), *and* changes viced.c to skip the PrintCounters() call
when 'dopanic' is set.
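
The intended control flow can be sketched roughly as follows. This is an
illustration only: the function names carry a _sketch suffix because they are
hypothetical, the value given to PANIC is an assumption, and the real routine
does considerably more than what is shown.

```c
#include <stdio.h>

#define PANIC 1                 /* assumed value, for illustration only */

static int counters_printed;    /* how many times the statistics routine ran */

/* Stand-in for PrintCounters(); the real one acquires several locks. */
static void print_counters_sketch(void)
{
    counters_printed++;
    printf("counters printed\n");
}

/* Stand-in for ShutDownAndCore(): on panic, skip the lock-taking statistics. */
static void shutdown_and_core_sketch(int dopanic)
{
    if (!dopanic)
        print_counters_sketch();
    /* The real routine would go on to refuse new requests and terminate;
       that part is omitted so the sketch can return to its caller. */
}
```

Calling shutdown_and_core_sketch(PANIC) leaves counters_printed untouched,
whereas shutdown_and_core_sketch(0) runs the statistics routine as before, so
an orderly shutdown still gets its counters.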
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke http://cern.ch/~rtb rtb@mail.cern.ch O__
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland > |
Phone: +41 22 767 8985 Fax: +41 22 767 7155 ( )\( )
Attachment: p_PanicNoDeadlock
*** openafs/src/viced/viced.c.org Mon Feb 10 10:31:51 2003
--- openafs/src/viced/viced.c Fri Apr 11 10:21:36 2003
***************
*** 874,880 ****
}
#endif
DFlush();
! PrintCounters();
/* do not allows new reqests to be served from now on, all new requests
are returned with an error code of RX_RESTARTING ( transient failure ) */
--- 874,880 ----
}
#endif
DFlush();
! if (!dopanic) PrintCounters();
/* do not allows new reqests to be served from now on, all new requests
are returned with an error code of RX_RESTARTING ( transient failure ) */
*** openafs/src/viced/callback.c.org Thu Mar 27 14:30:05 2003
--- openafs/src/viced/callback.c Fri Apr 11 10:26:49 2003
***************
*** 443,449 ****
assert(0);
ViceLog(0,("CDel: Internal Error -- shutting down: wanted %d from %d, now at %d\n",cbi,fe->firstcb,*cbp));
DumpCallBackState();
! ShutDown();
}
}
CDelPtr(fe, cbp);
--- 443,449 ----
assert(0);
ViceLog(0,("CDel: Internal Error -- shutting down: wanted %d from %d, now at %d\n",cbi,fe->firstcb,*cbp));
DumpCallBackState();
! ShutDownAndCore(PANIC);
}
}
CDelPtr(fe, cbp);
***************
*** 493,499 ****
if (safety > cbstuff.nblks) {
ViceLog(0,("FindCBPtr: Internal Error -- shutting down.\n"));
DumpCallBackState();
! ShutDown();
}
cb = itocb(*cbp);
if (cb->hhead == hostindex)
--- 493,499 ----
if (safety > cbstuff.nblks) {
ViceLog(0,("FindCBPtr: Internal Error -- shutting down.\n"));
DumpCallBackState();
! ShutDownAndCore(PANIC);
}
cb = itocb(*cbp);
if (cb->hhead == hostindex)
***************
*** 696,702 ****
if (safety > cbstuff.nblks) {
ViceLog(0,("AddCallBack1: Internal Error -- shutting down.\n"));
DumpCallBackState();
! ShutDown();
}
if (cb->hhead == h_htoi(host))
break;
--- 696,702 ----
if (safety > cbstuff.nblks) {
ViceLog(0,("AddCallBack1: Internal Error -- shutting down.\n"));
DumpCallBackState();
! ShutDownAndCore(PANIC);
}
if (cb->hhead == h_htoi(host))
break;
***************
*** 1443,1449 ****
if (ntimedout > cbstuff.nblks) {
ViceLog(0,("CCB: Internal Error -- shutting down...\n"));
DumpCallBackState();
! ShutDown();
}
} while (cbi != *thead);
*thead = 0;
--- 1443,1449 ----
if (ntimedout > cbstuff.nblks) {
ViceLog(0,("CCB: Internal Error -- shutting down...\n"));
DumpCallBackState();
! ShutDownAndCore(PANIC);
}
} while (cbi != *thead);
*thead = 0;