[OpenAFS] namei fileserver runs into "too many open files" with plenty of threads

Rainer Toebbicke rtb@pclella.cern.ch
Fri, 16 Nov 2007 12:19:00 +0100



The namei fileserver can run into "too many open files" because ih_open
assumes that it can always open a new file, and only afterwards decides
whether another file should be closed once the threshold is exceeded.
The resulting EMFILE error is simply returned to the caller.
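
For illustration, here is a minimal sketch of that failure mode. It is not
OpenAFS code: open_then_maybe_evict() and fill_until_error() are
hypothetical helpers, and the demo assumes a POSIX system with /dev/null
where an unprivileged process may lower its soft RLIMIT_NOFILE. Because the
open comes first and the eviction decision only afterwards, a full
descriptor table surfaces as EMFILE instead of causing a cached descriptor
to be closed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define MAX_HELD 256

/* Hypothetical stand-in for the pre-patch ordering in ih_open: open first,
 * and only afterwards decide whether to close a cached descriptor. If the
 * open itself fails with EMFILE, the eviction step is never reached. */
static int open_then_maybe_evict(const char *path, int *err)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        *err = errno;   /* EMFILE is handed straight back to the caller */
        return -1;
    }
    /* ... threshold check and LRU eviction would run here ... */
    *err = 0;
    return fd;
}

/* Lower the soft descriptor limit to `limit`, keep opening until an open
 * fails, then clean up and return the errno of the failing open
 * (or -1 if the limit could not be changed). */
static int fill_until_error(int limit)
{
    struct rlimit saved, rl;
    int held[MAX_HELD];
    int n = 0, err = 0;

    assert(limit > 0 && limit <= MAX_HELD);
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;
    rl = saved;
    rl.rlim_cur = (rlim_t)limit;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    while (n < MAX_HELD) {
        int fd = open_then_maybe_evict("/dev/null", &err);
        if (fd < 0)
            break;
        held[n++] = fd;
    }

    while (n > 0)
        close(held[--n]);
    (void)setrlimit(RLIMIT_NOFILE, &saved);
    return err;
}
```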

The "ad hoc" reserve of 64 free file descriptors can be depleted when
more than 64 threads try to open a file simultaneously (the lock is
dropped around the open). Simply making the reserve bigger does not
solve the problem, since the number of threads can grow as well.
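
A sketch of the depletion, assuming Linux/POSIX descriptor semantics.
burst_failures() and DEMO_LIMIT are hypothetical names, and rather than
spawning real threads the sketch replays the worst-case interleaving
sequentially: every caller has already dropped the lock and opens before
any caller reaches the eviction step. With R free slots and N such callers,
N - R opens fail, so a larger reserve only raises the number of threads
needed to hit the error.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define DEMO_LIMIT 64   /* hypothetical soft RLIMIT_NOFILE for the demo */

/* The descriptor table is filled except for `reserve` free slots (the
 * cache sits at its threshold), then `callers` opens are issued back to
 * back, as if that many threads were inside the unlocked window at once.
 * Returns how many of those opens fail with EMFILE, or -1 on setup error. */
static int burst_failures(int reserve, int callers)
{
    struct rlimit saved, rl;
    int held[DEMO_LIMIT], burst[DEMO_LIMIT];
    int n = 0, opened = 0, failures = 0;

    assert(reserve >= 0 && callers >= 0 && callers <= DEMO_LIMIT);
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;
    rl = saved;
    rl.rlim_cur = DEMO_LIMIT;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    /* Fill every remaining slot: these stand for already-cached files. */
    while (n < DEMO_LIMIT) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0)
            break;
        held[n++] = fd;
    }
    assert(n >= reserve);

    /* Free exactly `reserve` slots: the ad hoc reserve. */
    for (int i = 0; i < reserve; i++)
        close(held[--n]);

    /* The burst: no caller has reached the eviction step yet. */
    for (int i = 0; i < callers; i++) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            assert(errno == EMFILE);
            failures++;
        } else {
            burst[opened++] = fd;
        }
    }

    while (opened > 0)
        close(burst[--opened]);
    while (n > 0)
        close(held[--n]);
    (void)setrlimit(RLIMIT_NOFILE, &saved);
    return failures;
}
```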

The attached patch now recovers from EMFILE by closing the least recently
used cached descriptor and retrying the open, and it lowers the threshold
so that recovery does not become the norm. It does not alter the
"reserve", although setting it to 0 now works fine as well.
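
A simplified sketch of the recovery idea, again not the patch itself:
struct fd_cache, cache_open(), evict_lru() and demo_recovery() are
hypothetical, every cached descriptor is assumed idle and evictable, and
the guard that keeps the threshold at 1 or above is an addition. On EMFILE
it closes the least recently used descriptor, decrements the threshold (as
the patch does with fdCacheSize), and retries. It also saves errno right
after the failed open(), since POSIX leaves errno unspecified after
successful calls to most other functions, such as a lock acquisition.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define MAX_CACHED 256

/* Hypothetical cache: fds[0] is the least recently used entry. Unlike
 * ihandle.c, every cached descriptor here is treated as evictable. */
struct fd_cache {
    int fds[MAX_CACHED];
    int count;
    int cache_size;     /* eviction threshold, analogous to fdCacheSize */
    int recoveries;     /* how often EMFILE recovery was needed */
};

static void evict_lru(struct fd_cache *c)
{
    assert(c->count > 0);
    close(c->fds[0]);
    for (int i = 1; i < c->count; i++)
        c->fds[i - 1] = c->fds[i];
    c->count--;
}

/* Open `path`; on EMFILE, close the LRU descriptor, lower the threshold
 * so recovery does not become the common path, and retry. */
static int cache_open(struct fd_cache *c, const char *path)
{
    for (;;) {
        int fd = open(path, O_RDONLY);
        int saved_errno = errno;    /* save before anything can clobber it */

        if (fd >= 0) {
            if (c->count >= c->cache_size)
                evict_lru(c);       /* normal path: over the threshold */
            assert(c->count < MAX_CACHED);
            c->fds[c->count++] = fd;
            return fd;
        }
        if (saved_errno != EMFILE || c->count == 0) {
            errno = saved_errno;
            return -1;              /* nothing to recycle: give up */
        }
        if (c->cache_size > 1)      /* guard is an addition, not in the patch */
            c->cache_size--;
        c->recoveries++;
        evict_lru(c);               /* free one slot, then retry the open */
    }
}

/* Runs `calls` opens under a soft limit of `limit` descriptors, starting
 * with threshold `initial_size`; returns how many opens succeeded. */
static int demo_recovery(int limit, int calls, int initial_size,
                         struct fd_cache *c)
{
    struct rlimit saved, rl;
    int ok = 0;

    assert(initial_size >= 1 && initial_size <= MAX_CACHED);
    c->count = 0;
    c->cache_size = initial_size;
    c->recoveries = 0;
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;
    rl = saved;
    rl.rlim_cur = (rlim_t)limit;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    for (int i = 0; i < calls; i++)
        if (cache_open(c, "/dev/null") >= 0)
            ok++;

    while (c->count > 0)
        evict_lru(c);
    (void)setrlimit(RLIMIT_NOFILE, &saved);
    return ok;
}
```

With an initial threshold above the descriptor limit, early calls need
recovery, but each recovery lowers the threshold. Once it has dropped far
enough below what the limit allows, the ordinary path (open, then evict
when over the threshold) handles every call and recoveries stop.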

Bcc'ed to openafs-bugs.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics (CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155

Attachment: p_namei_emfile

--- openafs/src/vol/ihandle.c.o144	2007-03-20 14:36:37.000000000 +0100
+++ openafs/src/vol/ihandle.c	2007-10-16 14:51:25.000000000 +0200
@@ -325,9 +325,10 @@
      */
     fdInUseCount += 1;
     IH_UNLOCK;
+ih_open_retry:
     fd = OS_IOPEN(ihP);
     IH_LOCK;
-    if (fd == INVALID_FD) {
+    if (fd == INVALID_FD && (errno != EMFILE || fdLruHead == NULL) ) {
 	fdInUseCount -= 1;
 	IH_UNLOCK;
 	return NULL;
@@ -337,13 +338,23 @@
      * we permit the number of open files to exceed fdCacheSize.
      * We only recycle open file descriptors when the number
      * of open files reaches the size of the cache */
-    if (fdInUseCount > fdCacheSize && fdLruHead != NULL) {
+    if ((fdInUseCount > fdCacheSize || fd == INVALID_FD)  && fdLruHead != NULL) {
 	fdP = fdLruHead;
 	assert(fdP->fd_status == FD_HANDLE_OPEN);
 	DLL_DELETE(fdP, fdLruHead, fdLruTail, fd_next, fd_prev);
 	DLL_DELETE(fdP, fdP->fd_ih->ih_fdhead, fdP->fd_ih->ih_fdtail,
 		   fd_ihnext, fd_ihprev);
 	closeFd = fdP->fd_fd;
+	if (fd == INVALID_FD) {
+	    fdCacheSize--;          /* reduce in order to not run into here too often */
+	    DLL_INSERT_TAIL(fdP, fdAvailHead, fdAvailTail, fd_next, fd_prev);
+	    fdP->fd_status = FD_HANDLE_AVAIL;
+	    fdP->fd_ih = NULL;
+	    fdP->fd_fd = INVALID_FD;
+	    IH_UNLOCK;
+	    OS_CLOSE(closeFd);
+	    goto ih_open_retry;
+	}
     } else {
 	if (fdAvailHead == NULL) {
 	    fdHandleAllocateChunk();
