[OpenAFS-devel] [grand.central.org #87977] MAX_FILESERVER_THREAD vs FD_HANDLE_SETASIDE

Dan Hyde <drh@umich.edu>
Fri, 29 Feb 2008 11:06:09 -0500


I submitted a bug report last night, but wanted to go into more
detail and also work out possible solutions.

Back in '99, Transarc introduced a least recently used (LRU) cache of
file descriptors to the fileserver.  The idea was that when it came
time to close a file, it wasn't actually closed but was instead put on
an LRU cache (removing and closing the oldest entry if the cache was
full).  When opening a file, the cache was searched first, and a
matching descriptor, if found, was removed from the cache and reused.
The size of this cache was chosen so that it would not exceed the
operating system's maximum number of open files AND would leave room
for each running thread to open one more file.  Both
FD_HANDLE_SETASIDE and MAX_FILESERVER_THREAD were 64.
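
To make the close/open behaviour described above concrete, here is a
minimal sketch of such a cache.  It is not the ihandle code: the names
(fd_cache, cache_release, cache_acquire), the tiny CACHE_MAX, and the
use of plain integers in place of real descriptors are all assumptions
made for illustration.

```c
#include <string.h>

#define CACHE_MAX 4   /* illustrative only; the real size derives from the fd limit */

struct fd_entry {
    int inode;        /* stand-in for whatever identifies the file */
    int fd;           /* the descriptor being kept open */
};

struct fd_cache {
    struct fd_entry slot[CACHE_MAX];  /* slot[0] is the oldest entry */
    int count;
};

/* "Close": park the descriptor in the cache instead of closing it.
 * Returns the descriptor that had to be evicted (and would really be
 * closed) to make room, or -1 if nothing was evicted. */
int cache_release(struct fd_cache *c, int inode, int fd)
{
    int evicted = -1;

    if (c->count == CACHE_MAX) {
        evicted = c->slot[0].fd;
        memmove(&c->slot[0], &c->slot[1],
                (CACHE_MAX - 1) * sizeof c->slot[0]);
        c->count--;
    }
    c->slot[c->count].inode = inode;
    c->slot[c->count].fd = fd;
    c->count++;
    return evicted;
}

/* "Open": search the cache; on a hit, remove the entry and hand back
 * its descriptor.  Returns -1 on a miss, in which case the caller
 * would have to open the file for real. */
int cache_acquire(struct fd_cache *c, int inode)
{
    int i;

    for (i = 0; i < c->count; i++) {
        if (c->slot[i].inode == inode) {
            int fd = c->slot[i].fd;
            memmove(&c->slot[i], &c->slot[i + 1],
                    (c->count - i - 1) * sizeof c->slot[0]);
            c->count--;
            return fd;
        }
    }
    return -1;
}
```

The point to notice is that a miss in cache_acquire means a real open,
which needs a free descriptor outside the cache.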

Unfortunately, AFS 3.5 changed MAX_FILESERVER_THREAD to 128 without
changing FD_HANDLE_SETASIDE to match.  This means the cache size no
longer leaves room for every thread to open an additional file (up to
64 threads are at risk, depending on how many threads you are running).
If fileserver load gets high enough, an open can fail and cause an
Abort("VPutVnode: can't open index file!\n").  We've had this happen.
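
The arithmetic behind the "up to 64" figure can be sketched as follows.
This is a simplified model, not the fileserver's actual sizing code:
the helper names are hypothetical, and any open-file limit passed in
(for example 1024) is an assumed value.

```c
#include <assert.h>

/* Values quoted in this message. */
#define FD_HANDLE_SETASIDE    64    /* descriptors held back from the cache */
#define MAX_FILESERVER_THREAD 128   /* thread limit since AFS 3.5 */

/* Hypothetical helper: how many descriptors the LRU cache may hold,
 * given an assumed per-process limit on open files. */
int fd_cache_size(int max_open_files)
{
    assert(max_open_files > FD_HANDLE_SETASIDE);
    return max_open_files - FD_HANDLE_SETASIDE;
}

/* Hypothetical helper: with the cache full and every thread wanting
 * one more file, how many threads find no descriptor left. */
int threads_at_risk(int nthreads)
{
    int shortfall = nthreads - FD_HANDLE_SETASIDE;
    return shortfall > 0 ? shortfall : 0;
}
```

With 64 or fewer threads threads_at_risk() is 0; with the full 128
threads it is 64.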

I'd like to discuss possible solutions.  One is simply to document the
dependency so that the two values don't drift apart again, but that
might not be the best solution.  Below is what I submitted.
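
Ahead of the patch itself, one stronger alternative to a comment would
be a compile-time check.  This is only a sketch and is not part of what
I submitted: it assumes both macros are visible in one translation
unit, whereas in the tree they live in vol/ihandle.h and
viced/viced.h, and setaside_covers_threads() is a hypothetical helper
added just to make the condition testable.

```c
/* Values as they would be after the patch. */
#define MAX_FILESERVER_THREAD 128
#define FD_HANDLE_SETASIDE    128

/* Refuse to build if the set-aside cannot cover one extra
 * descriptor per fileserver thread. */
#if FD_HANDLE_SETASIDE < MAX_FILESERVER_THREAD
#error "FD_HANDLE_SETASIDE must be at least MAX_FILESERVER_THREAD"
#endif

/* Hypothetical runtime form of the same condition. */
int setaside_covers_threads(void)
{
    return FD_HANDLE_SETASIDE >= MAX_FILESERVER_THREAD;
}
```

With the old value of 64 this would fail to compile rather than fail
under load.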

Comments?

========
diff -u vol/ihandle.h vol/ihandle.h
--- vol/ihandle.h	Thu Dec 20 10:50:41 2007
+++ vol/ihandle.h	Thu Feb 28 16:37:17 2008
@@ -193,7 +193,8 @@
 #define STREAM_HANDLE_MALLOCSIZE 1
 
 /* Number of file descriptors needed for non-cached I/O */
-#define FD_HANDLE_SETASIDE	64
+/* This number is related to MAX_FILESERVER_THREAD in viced/viced.h */
+#define FD_HANDLE_SETASIDE	128
 
 /* Don't try to have more than 256 files open at once if you are planning
  * to use fopen or fdopen. The FILE structure has an eight bit field for