[OpenAFS-devel] series of 1.4.2 fileserver crashes in rxi_FreeDataBufsTSFPQ (rx_packet.c)

Tom Keiser tkeiser@gmail.com
Tue, 5 Dec 2006 06:44:35 -0500


On 12/5/06, Rainer Toebbicke <rtb@pclella.cern.ch> wrote:
> Jeffrey Altman wrote:
>
> >
> > Do you have any file server logs you can share?
> >
> > I'm interested in the interactions with the 3.4 clients.
> >
> > Did you save the tracebacks?  Can you put them somewhere we can see them?
> >
> > Jeffrey Altman
> >
> >
>
> Just the last one:
>
> Tue Dec  5 01:41:28 2006 CheckHost: Probe failed for host
> 137.78.30.25:7001, code -01
> Tue Dec  5 01:42:24 2006 CheckHost: Probing all interfaces of host
> 130.199.48.51:7001 failed, code -01
> Tue Dec  5 01:44:24 2006 CB: WhoAreYou failed for 128.141.2.30:7001,
> error -01
> Tue Dec  5 01:45:11 2006 fssync: volume 537561738 restored; breaking
> all call backs
> Fatal Rx error: rx packet already free
>
>
> traceback:
> #1  0x0000003fc732fa1e in abort () from /lib64/tls/libc.so.6
> #2  0x0000000000470da0 in osi_Panic (msg=Could not find the frame base
> for "osi_Panic".
> ) at ./../rx/rx_user.c:222
> #3  0x0000000000486323 in rxi_FreeDataBufsTSFPQ (p=0x88f460, first=1,
> flush_global=0) at ./../rx/rx_packet.c:873
> #4  0x0000000000484783 in rxi_FreePackets (num_pkts=1, q=0x4820ba70)
> at ./../rx/rx_packet.c:397
> #5  0x000000000048a2af in rxi_PrepareSendPacket (call=0x2a98d618a0,
> p=0x2a97c25790, last=0)
>      at ./../rx/rx_packet.c:2618
> #6  0x000000000048c622 in rxi_WritevProc (call=0x2a98d618a0,
> iov=0x4820bb90, nio=3, nbytes=68)
>      at ./../rx/rx_rdwr.c:1086
> #7  0x000000000048cab7 in rx_WritevProc (call=0x2a98d618a0,
> iov=0x4820bb90, nio=3, nbytes=1540)
>      at ./../rx/rx_rdwr.c:1178
> #8  0x000000000041a70d in FetchData_RXStyle (volptr=0x9a9940,
> targetptr=0x2a989c0de0, Call=0x2a98d618a0,
>      Pos=589824, Len=1540, Int64Mode=0, a_bytesToFetchP=0x4820bd68,
> a_bytesFetchedP=0x4820bd60)
>      at ../viced/afsfileprocs.c:7230
> #9  0x000000000040c794 in common_FetchData64 (acall=0x2a98d618a0,
> Fid=0x4820c0a0, Pos=589824, Len=65536,
>      OutStatus=0x4820c030, CallBack=0x4820c020, Sync=0x4820c000,
> type=0) at ../viced/afsfileprocs.c:2444
> #10 0x000000000040d171 in SRXAFS_FetchData (acall=0x2a98d618a0,
> Fid=0x4820c0a0, Pos=589824, Len=65536,
>      OutStatus=0x4820c030, CallBack=0x4820c020, Sync=0x4820c000) at
> ../viced/afsfileprocs.c:2571
> #11 0x000000000044e8de in _RXAFS_FetchData (z_call=0x2a98d618a0,
> z_xdrs=0x4820c110) at ../fsint/afsint.ss.c:69
> #12 0x00000000004552c9 in RXAFS_ExecuteRequest (z_call=0x2a98d618a0)
> at ../fsint/afsint.ss.c:1872
> #13 0x0000000000473ac3 in rxi_ServerProc (threadID=62, newcall=0x0,
> socketp=0x4820c1ac) at ./../rx/rx.c:1413
> #14 0x0000000000458bce in rx_ServerProc () at ../rx/rx_pthread.c:302
> #15 0x0000000000458629 in server_entry (argp=0x458af3) at
> ../rx/rx_pthread.c:100
>
>

Is this server running binaries built from pristine OpenAFS 1.4.2
source, or were any patches applied?  If patches were applied, could
you post them?  I suppose there could be a race between the worker
thread servicing this request and the current network receive thread,
with both trying to free the same packet, which would explain the
"rx packet already free" panic...
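
For illustration only, here is a minimal sketch in C of the kind of
"already free" check that frame #3 of the traceback appears to trip.
The struct and function names are hypothetical and are not taken from
rx_packet.c. The sketch returns an error where the real code, per the
traceback, ends up in osi_Panic.

```c
#include <stdio.h>

/* Hypothetical stand-in for an rx packet; NOT the real struct rx_packet. */
struct sketch_packet {
    int is_free;    /* nonzero while the packet sits on a free list */
};

/*
 * Hypothetical free routine. Returns 0 on success and -1 if the packet
 * was already marked free. If two threads each believe they own the
 * packet and both call this, the second caller hits the -1 branch.
 */
static int sketch_free_packet(struct sketch_packet *p)
{
    if (p->is_free) {
        fprintf(stderr, "sketch: packet already free\n");
        return -1;
    }
    p->is_free = 1;
    return 0;
}
```

In this sketch a second call on the same packet is what fails, which
is the pattern a race between two threads freeing one packet would
produce. Note that without a lock around the check and the update,
the flag test itself would also be racy.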

-- 
Tom Keiser
tkeiser@gmail.com