[OpenAFS] fileserver runs constantly; stops answering

emoy@apple.com
Fri, 7 Feb 2003 17:38:57 -0800


I've been pushing thousands of files to the file server (copying the
Mac OS X boot drive to AFS).  Without warning, /afs stops responding,
and the fileserver process sits permanently in the run state:

   535  ??  R<     8:37.80 /usr/afs/bin/fileserver

bos can't stop it, so I have to kill -INT it.  This has happened  
several times.

Sampling the process gives the following.  The sampler complains that
the stack is in an inconsistent state, so the trace is truncated:

Analysis of sampling pid 535 every 10 milliseconds
Call graph:
     300 Thread_0e03
       300 Create_Process_Part2
         300 rx_ListenerProc
           298 rxi_ListenerProc
             153 select
               153 select [STACK TOP]
             80 rxevent_RaiseEvents
               79 clock_UpdateTime
                 77 getitimer
                   77 getitimer [STACK TOP]
                 2 clock_UpdateTime [STACK TOP]
               1 rxevent_RaiseEvents [STACK TOP]
             56 rxi_ReadPacket
               49 recvmsg
                 49 recvmsg [STACK TOP]
               3 rxi_ReadPacket [STACK TOP]
               2 bzero
                 2 bzero [STACK TOP]
               1 __error
                 1 __error [STACK TOP]
               1 rxi_ReadPacket
                 1 cerror
                   1 cthread_set_errno_self
                     1 cthread_set_errno_self [STACK TOP]
             3 __eprintf
               3 __eprintf [STACK TOP]
             3 rxi_RestoreDataBufs
               3 rxi_RestoreDataBufs [STACK TOP]
             2 rxi_ListenerProc [STACK TOP]
             1 recvmsg
               1 recvmsg
                 1 recvmsg [STACK TOP]
           2 rxevent_RaiseEvents
             2 rxevent_RaiseEvents [STACK TOP]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
         select [STACK TOP]        153
         getitimer [STACK TOP]        77
         recvmsg [STACK TOP]        50
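
To put the summary above in proportion, here is a rough Python sketch
that turns the three top-of-stack counts into fractions of the 300
samples taken on Thread_0e03.  The helper name top_of_stack_shares and
the regular expression are made up for illustration; they are not part
of OpenAFS or of the sampling tool.  The numbers only show where the
sampler found the thread, not why it was there.

```python
import re

# Summary lines copied from the "Sort by top of stack" section of the sample.
SUMMARY = """\
         select [STACK TOP]        153
         getitimer [STACK TOP]        77
         recvmsg [STACK TOP]        50
"""

# Sample count on the Thread_0e03 root line of the call graph.
TOTAL_SAMPLES = 300

# Hypothetical pattern for one summary line: "<function> [STACK TOP] <count>".
LINE_RE = re.compile(r"^\s*(\S+) \[STACK TOP\]\s+(\d+)\s*$")


def top_of_stack_shares(summary, total):
    """Return {function: fraction of all samples} for each summary line."""
    shares = {}
    for line in summary.splitlines():
        m = LINE_RE.match(line)
        if m:
            shares[m.group(1)] = int(m.group(2)) / total
    return shares


shares = top_of_stack_shares(SUMMARY, TOTAL_SAMPLES)
for name, frac in shares.items():
    print(f"{name:10s} {frac:6.1%}")
print(f"{'combined':10s} {sum(shares.values()):6.1%}")
```

The combined line is the share of samples in which the listener
thread was inside one of those three system calls.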

Has anyone seen this, or does anyone know what causes it?  The server
is Mac OS X running OpenAFS 1.2.8 with various patches, though none of
them touch the lwp area, which is where the sample points.
--------------------------------------------------------------------------
Edward Moy
Apple Computer, Inc.
emoy@apple.com

(This message is from me as a reader of this list, and not a statement
from Apple.)