[OpenAFS] cache manager locked under heavy load?

Jason Edgecombe jason@rampaginggeek.com
Sat, 06 Feb 2010 15:54:19 -0500

Alena Manova wrote:
> Hello,
> we have Apache web servers (with fairly high traffic) reading their content from AFS. Normally the system runs fine, but at a certain point (probably related to I/O load) AFS stops responding and the system load rises massively, with all of the Apache processes stuck in the "sending reply" state. Restarting Apache recovers the system.
> At that time, cmdebug shows messages similar to:
> Lock afs_xvcache status: (writer_waitingupgrade_waiting, upgrade_locked(pid:18571 at:5), 1 read_locks(pid:16782), 954 waiters)
> Lock afs_xvcache status: (writer_waitingupgrade_waiting, upgrade_locked(pid:16639 at:5), 713 waiters)
> The cache manager has a 1GB cache (we tried even more, with no improvement). The AFS file servers are fine at that time and other clients can access them.
> Does anyone have any advice on how to sort out this issue, please?
I'm not familiar with the error, but the fact that "vcache" is in the 
message suggests that your vnode/stat cache is not big enough.

Look at the options that you pass to afsd on startup, and consider 
increasing the stat/vcache number using the "-stat" option. The stat 
cache size is independent of the data cache size, so enlarging the 
data cache does not enlarge the stat cache. The default number of 
cached stat entries is 300, and that is too small for a web server.

For reference, I'm using "-stat 50000" on my web server with 150 
concurrent apache processes and it got 406,526 hits this past Thursday.

To check whether your stat value is large enough, do the following:
1. Run "xstat_cm_test web-server-hostname 2 -onceonly" where 
web-server-hostname is the hostname of your web server.
2. Look at the "vcacheHits" and "vcacheMisses" fields. These show the 
cumulative stat cache hits and misses since the AFS client was started.
3. Compute (vcacheMisses/(vcacheMisses+vcacheHits)) and if that is more 
than 0.01 (1 percent), then increase your stat cache value.

A quicker and easier metric is that if your hits aren't at least 100 
times your misses, then increase the stat size.
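
As a rough sketch, the arithmetic in step 3 and the quicker rule of thumb 
can be written out in Python. The function names and the sample counter 
values are made up for illustration; the script does not run or parse 
xstat_cm_test, so you would copy the vcacheHits and vcacheMisses numbers 
in by hand.

```python
def stat_cache_miss_ratio(vcache_hits: int, vcache_misses: int) -> float:
    """Return vcacheMisses / (vcacheMisses + vcacheHits)."""
    total = vcache_hits + vcache_misses
    if total == 0:
        # No lookups recorded yet, so there is nothing to judge.
        return 0.0
    return vcache_misses / total


def should_increase_stat(vcache_hits: int, vcache_misses: int,
                         threshold: float = 0.01) -> bool:
    """Step 3: True if the miss ratio is above 1 percent."""
    return stat_cache_miss_ratio(vcache_hits, vcache_misses) > threshold


def quick_check(vcache_hits: int, vcache_misses: int) -> bool:
    """Rule of thumb: True if hits are less than 100 times misses."""
    return vcache_hits < 100 * vcache_misses


if __name__ == "__main__":
    # Hypothetical counters, not taken from a real client.
    hits, misses = 50000, 1000
    print("miss ratio: %.4f" % stat_cache_miss_ratio(hits, misses))
    print("increase -stat (step 3):", should_increase_stat(hits, misses))
    print("increase -stat (quick check):", quick_check(hits, misses))
```

The two tests are not exactly identical (a miss ratio above 1 percent 
corresponds to hits below 99 times misses), but they should agree for 
any cache that is clearly too small or clearly large enough.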

I hope this helps.