[OpenAFS-devel] series of 1.4.2 fileserver crashes in rxi_FreeDataBufsTSFPQ (rx_packet.c)
Tom Keiser
tkeiser@gmail.com
Tue, 5 Dec 2006 06:44:35 -0500
On 12/5/06, Rainer Toebbicke <rtb@pclella.cern.ch> wrote:
> Jeffrey Altman wrote:
>
> >
> > Do you have any file server logs you can share?
> >
> > I'm interested in the interactions with the 3.4 clients.
> >
> > Did you save the tracebacks? Can you put them somewhere we can see them?
> >
> > Jeffrey Altman
> >
> >
>
> Just the last one:
>
> Tue Dec 5 01:41:28 2006 CheckHost: Probe failed for host 137.78.30.25:7001, code -01
> Tue Dec 5 01:42:24 2006 CheckHost: Probing all interfaces of host 130.199.48.51:7001 failed, code -01
> Tue Dec 5 01:44:24 2006 CB: WhoAreYou failed for 128.141.2.30:7001, error -01
> Tue Dec 5 01:45:11 2006 fssync: volume 537561738 restored; breaking all call backs
> Fatal Rx error: rx packet already free
>
>
> traceback:
> #1 0x0000003fc732fa1e in abort () from /lib64/tls/libc.so.6
> #2 0x0000000000470da0 in osi_Panic (msg=Could not find the frame base
> for "osi_Panic".
> ) at ./../rx/rx_user.c:222
> #3 0x0000000000486323 in rxi_FreeDataBufsTSFPQ (p=0x88f460, first=1,
> flush_global=0) at ./../rx/rx_packet.c:873
> #4 0x0000000000484783 in rxi_FreePackets (num_pkts=1, q=0x4820ba70)
> at ./../rx/rx_packet.c:397
> #5 0x000000000048a2af in rxi_PrepareSendPacket (call=0x2a98d618a0,
> p=0x2a97c25790, last=0)
> at ./../rx/rx_packet.c:2618
> #6 0x000000000048c622 in rxi_WritevProc (call=0x2a98d618a0,
> iov=0x4820bb90, nio=3, nbytes=68)
> at ./../rx/rx_rdwr.c:1086
> #7 0x000000000048cab7 in rx_WritevProc (call=0x2a98d618a0,
> iov=0x4820bb90, nio=3, nbytes=1540)
> at ./../rx/rx_rdwr.c:1178
> #8 0x000000000041a70d in FetchData_RXStyle (volptr=0x9a9940,
> targetptr=0x2a989c0de0, Call=0x2a98d618a0,
> Pos=589824, Len=1540, Int64Mode=0, a_bytesToFetchP=0x4820bd68,
> a_bytesFetchedP=0x4820bd60)
> at ../viced/afsfileprocs.c:7230
> #9 0x000000000040c794 in common_FetchData64 (acall=0x2a98d618a0,
> Fid=0x4820c0a0, Pos=589824, Len=65536,
> OutStatus=0x4820c030, CallBack=0x4820c020, Sync=0x4820c000,
> type=0) at ../viced/afsfileprocs.c:2444
> #10 0x000000000040d171 in SRXAFS_FetchData (acall=0x2a98d618a0,
> Fid=0x4820c0a0, Pos=589824, Len=65536,
> OutStatus=0x4820c030, CallBack=0x4820c020, Sync=0x4820c000) at
> ../viced/afsfileprocs.c:2571
> #11 0x000000000044e8de in _RXAFS_FetchData (z_call=0x2a98d618a0,
> z_xdrs=0x4820c110) at ../fsint/afsint.ss.c:69
> #12 0x00000000004552c9 in RXAFS_ExecuteRequest (z_call=0x2a98d618a0)
> at ../fsint/afsint.ss.c:1872
> #13 0x0000000000473ac3 in rxi_ServerProc (threadID=62, newcall=0x0,
> socketp=0x4820c1ac) at ./../rx/rx.c:1413
> #14 0x0000000000458bce in rx_ServerProc () at ../rx/rx_pthread.c:302
> #15 0x0000000000458629 in server_entry (argp=0x458af3) at
> ../rx/rx_pthread.c:100
>
>
Is this server running binaries built from pristine OpenAFS 1.4.2
source code, or were any patches applied? If patches were applied,
could you provide them? I suppose there could be a race between the
worker thread servicing this request and the current network receive
thread...
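
To make the suspected failure mode concrete, here is a minimal sketch in
C. It is not the rx code: struct pkt, PKT_FLAG_FREE, pkt_release() and
demo_double_release() are hypothetical names invented for illustration.
It assumes that the "rx packet already free" panic comes from a check of
a per-packet free flag, and it replays sequentially the interleaving in
which two owners (say, a worker and the receive thread) each release the
same packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: set while the packet sits on a free queue. */
#define PKT_FLAG_FREE 0x1u

/* Hypothetical, stripped-down packet; the real rx packet is far larger. */
struct pkt {
    unsigned flags;
};

/*
 * Release a packet to the (imaginary) free queue.
 * Returns 0 on success, -1 if the packet was already marked free.
 * A real server would presumably panic at this point instead.
 */
static int pkt_release(struct pkt *p)
{
    assert(p != NULL);
    if (p->flags & PKT_FLAG_FREE)
        return -1;              /* second release of the same packet */
    p->flags |= PKT_FLAG_FREE;
    return 0;
}

/*
 * Replay the suspected interleaving one step at a time: both owners
 * believe they hold the packet, so both release it.
 * Returns the number of double releases detected.
 */
static int demo_double_release(void)
{
    struct pkt p = { 0 };
    int detected = 0;

    if (pkt_release(&p) != 0)   /* first owner, e.g. the worker */
        detected++;
    if (pkt_release(&p) != 0)   /* second owner, e.g. the receive thread */
        detected++;
    return detected;
}
```

In this sketch the second release is the one that trips the check. If
something like this is happening, the thread that panics is not
necessarily the one at fault, so the traceback alone would not tell us
which side released the packet first.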
--
Tom Keiser
tkeiser@gmail.com