[OpenAFS-port-darwin] 10.4.11 troubles

Jonas Maebe jonas.maebe@elis.ugent.be
Mon, 10 Dec 2007 12:00:45 +0100


Hello,

Since upgrading to Mac OS X 10.4.11, I have twice run into the  
following problem: suddenly, both the kernel and mds go full blast  
(each using close to 100% CPU on my dual G5), and the system log is  
deluged by a barrage of the following message:

Dec 10 11:31:38 bigmac kernel[0]: fs_events: add_event: event queue is full! dropping events.
Dec 10 11:31:38 bigmac kernel[0]: fs_events: add_event: event queue is full! dropping events.
Dec 10 11:31:38 bigmac kernel[0]: fs_events: add_event: event queue is full! drog events.
Dec 10 11:31:38 bigmac kernel[0]: fs_events: add_event: event queue is full! dropping events.
Dec 10 11:31:38 bigmac kernel[0]: fs_events: add_event: event queue is full! dropping events.
Dec 10 11:31:38 bigmac kernel[0]: fs_events: add_event: event queue is full! dropping events.
Dec 10 11:31:38 bigmac kernel[0]: fs_eve add_event: event queue is full! dropping events.
Dec 10 11:31:38 bigmac kernel[0]: fs_events: add_event: event queue is full! dropping events.
Dec 10 11:31:38 bigmac kernel[0]: fs_events: add_event: event queue is full! dropping events.
[etc]

(as you can see, syslogd sometimes can't even keep up)

The first time this happened, I was running a test program (on an  
AFS volume) that first opens and then closes as many files as  
possible (I believe with an upper limit of 100 files, but I'm not  
certain and can no longer find the program). The second time, it  
happened at the very end of an "svn update" (again on an AFS volume).  
The update had apparently completed fully, since after the reboot  
"svn cleanup" did not report anything that needed cleaning up. AFAIK,  
svn also closes a lot of lock files at the end of an update.

I cannot reproduce the problem reliably, though.

Once the kernel and mds are in that cycle, there is no way to kill  
the triggering process, not even with kill -9, and killing mds  
doesn't help either. A normal reboot doesn't work either (it hangs,  
presumably while trying to kill the stuck process), so a forced  
reboot is required.

One more thing: both cases occurred with a prerelease of OpenAFS 1.4.5  
(the CVS version in which all the panics had been fixed). I don't  
know whether anything committed after that could solve this problem,  
so I have now installed the official 1.4.5 release.

Is there anything I can do to further debug this problem?


Jonas