[OpenAFS] fileserver runs constantly; stops answering

emoy@apple.com
Fri, 7 Feb 2003 18:47:28 -0800


With /usr/bin/sample:

# sample 535 3 10

This samples process 535 every 10 milliseconds for 3 seconds (300 samples
total).
--------------------------------------------------------------------------
Edward Moy
Apple Computer, Inc.
emoy@apple.com

(This message is from me as a reader of this list, and not a statement
from Apple.)

On Friday, February 7, 2003, at 06:07 PM, Brent Johnson wrote:

> How did you create the call stack?
>
> emoy@apple.com wrote:
>
>> I've been pushing thousands of files to the file server (copying the
>> Mac OS X boot drive to AFS).  All of a sudden, /afs goes away, and the
>> fileserver process is in constant run:
>>
>>   535  ??  R<     8:37.80 /usr/afs/bin/fileserver
>>
>> bos can't stop it, so I have to kill -INT it.  This has happened
>> several times.
>>
>> Sampling the process, I get the following, though it complains that
>> the stack is in an inconsistent state, so it truncates the trace:
>>
>> Analysis of sampling pid 535 every 10 milliseconds
>> Call graph:
>>     300 Thread_0e03
>>       300 Create_Process_Part2
>>         300 rx_ListenerProc
>>           298 rxi_ListenerProc
>>             153 select
>>               153 select [STACK TOP]
>>             80 rxevent_RaiseEvents
>>               79 clock_UpdateTime
>>                 77 getitimer
>>                   77 getitimer [STACK TOP]
>>                 2 clock_UpdateTime [STACK TOP]
>>               1 rxevent_RaiseEvents [STACK TOP]
>>             56 rxi_ReadPacket
>>               49 recvmsg
>>                 49 recvmsg [STACK TOP]
>>               3 rxi_ReadPacket [STACK TOP]
>>               2 bzero
>>                 2 bzero [STACK TOP]
>>               1 __error
>>                 1 __error [STACK TOP]
>>               1 rxi_ReadPacket
>>                 1 cerror
>>                   1 cthread_set_errno_self
>>                     1 cthread_set_errno_self [STACK TOP]
>>             3 __eprintf
>>               3 __eprintf [STACK TOP]
>>             3 rxi_RestoreDataBufs
>>               3 rxi_RestoreDataBufs [STACK TOP]
>>             2 rxi_ListenerProc [STACK TOP]
>>             1 recvmsg
>>               1 recvmsg
>>                 1 recvmsg [STACK TOP]
>>           2 rxevent_RaiseEvents
>>             2 rxevent_RaiseEvents [STACK TOP]
>>
>> Total number in stack (recursive counted multiple, when >=5):
>>
>> Sort by top of stack, same collapsed (when >= 5):
>>         select [STACK TOP]        153
>>         getitimer [STACK TOP]        77
>>         recvmsg [STACK TOP]        50
>>
>> Anyone seen this and/or know what it is about?  The server is Mac OS X
>> running OpenAFS 1.2.8, with various patches, though nothing in the lwp
>> area, where the sample is indicating.
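
For anyone who wants to cross-check the "Sort by top of stack" summary
against the call graph quoted above, here is a minimal Python sketch. It
assumes the call graph has been pasted in with the ">> " quote markers
removed and that leaf frames are the lines ending in "[STACK TOP]". The
function name top_of_stack_totals and the regular expression are
illustrative only; they are not part of sample or OpenAFS.

```python
import re
from collections import Counter

# Call graph from the quoted message, with the ">> " quote markers removed.
CALL_GRAPH = """\
    300 Thread_0e03
      300 Create_Process_Part2
        300 rx_ListenerProc
          298 rxi_ListenerProc
            153 select
              153 select [STACK TOP]
            80 rxevent_RaiseEvents
              79 clock_UpdateTime
                77 getitimer
                  77 getitimer [STACK TOP]
                2 clock_UpdateTime [STACK TOP]
              1 rxevent_RaiseEvents [STACK TOP]
            56 rxi_ReadPacket
              49 recvmsg
                49 recvmsg [STACK TOP]
              3 rxi_ReadPacket [STACK TOP]
              2 bzero
                2 bzero [STACK TOP]
              1 __error
                1 __error [STACK TOP]
              1 rxi_ReadPacket
                1 cerror
                  1 cthread_set_errno_self
                    1 cthread_set_errno_self [STACK TOP]
            3 __eprintf
              3 __eprintf [STACK TOP]
            3 rxi_RestoreDataBufs
              3 rxi_RestoreDataBufs [STACK TOP]
            2 rxi_ListenerProc [STACK TOP]
            1 recvmsg
              1 recvmsg
                1 recvmsg [STACK TOP]
          2 rxevent_RaiseEvents
            2 rxevent_RaiseEvents [STACK TOP]
"""

# A leaf line looks like: "<indent><count> <symbol> [STACK TOP]".
LEAF = re.compile(r"^\s*(\d+)\s+(\S+)\s+\[STACK TOP\]\s*$")


def top_of_stack_totals(text):
    """Sum the sample counts of all leaf frames, grouped by symbol name."""
    totals = Counter()
    for line in text.splitlines():
        match = LEAF.match(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals


totals = top_of_stack_totals(CALL_GRAPH)

# Mirror the report's cutoff: only list symbols with at least 5 samples.
for name, count in totals.most_common():
    if count >= 5:
        print(name, count)
```

The leaf counts add up to the 300 samples taken, and the three entries at
or above the cutoff (select, getitimer, recvmsg) agree with the summary in
the quoted message, with recvmsg combining its two leaf entries (49 + 1).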