[OpenAFS-devel] Re: 1.2.10 linux kernel hang during AFS backups

Joe Buehler jbuehler@hekimian.com
Fri, 10 Sep 2004 08:17:05 -0400


OK, I recompiled everything for debugging and I have a good crash dump.
I am soliciting some expert help at this point.

The trivial program attached below will reliably hang my dual-processor
RedHat 9 machine running 1.2.10.  Make a 1 GB file in AFS then run
the program passing the name of the file as the only argument.  All
the program does is mmap() the file and then write it with memset().

Looking at the state of the various tasks, it looks like kswapd is
deadlocked on an afs_lock structure.  Here is the stack for kswapd:

crash> bt 7
PID: 7      TASK: c252c000  CPU: 1   COMMAND: "kswapd"
  #0 [c252dd8c] schedule at c011cdd5
  #1 [c252ddac] afs_osi_Sleep at e0973bdf
  #2 [c252ddfc] Afs_Lock_Obtain at e092f387
  #3 [c252de3c] afs_UFSWrite at e094d92d
  #4 [c252df04] flush_tlb_others at c0117c11
  #5 [c252df1c] flush_tlb_page at c0117d54
  #6 [c252df44] try_to_unmap_one at c014b0c4
  #7 [c252df70] try_to_free_buffers at c0151c2a
  #8 [c252df8c] launder_page at c01420ed
  #9 [c252dff0] kernel_thread_helper at c0107603
crash>

Looking at the lock that frame 2 is trying to get, I see this:

crash> print *(struct afs_lock *) 0xe0a64ac8
$7 = {
   wait_states = 2 '\002',
   excl_locked = 4 '\004',
   readers_reading = 0,
   num_waiting = 1,
   spare = 0,
   time_waiting = {
     tv_sec = 2,
     tv_usec = 50134
   },
   pid_last_reader = 0,
   pid_writer = 760,
   src_indicator = 66
}
crash>

If I read this correctly, this is a "shared" lock currently owned
by pid 760.  Looking at that pid I see:

crash> bt 760
PID: 760    TASK: da4ee000  CPU: 0   COMMAND: "afs_background"
  #0 [da4efc98] schedule at c011cdd5
  #1 [da4efcb8] wakeup_kswapd at c0143e33
  #2 [da4efce8] __alloc_pages at c014593b
  #3 [da4efd20] generic_file_write at c013b426
  #4 [da4efda0] osi_rdwr at e0972dc7
  #5 [da4efdd0] afs_osi_Write at e0972985
  #6 [da4efe10] afs_UFSCacheFetchProc at e092961d
  #7 [da4efe50] afs_GetDCache at e092a7b0
  #8 [da4eff90] afs_BackgroundDaemon at e09276a5
  #9 [da4efff0] kernel_thread_helper at c0107603
crash>

Which would appear to be a deadlock situation -- it is waiting
for kswapd to do its thing.

The src_indicator says that the lock was obtained in afs_GetDCache
at line 1783 in this bit of code:

	if (hcmp(avc->flushDV, avc->m.DataVersion) < 0) {
	    /*
	     * By here, the cache entry is always write-locked.  We can
	     * deadlock if we call osi_Flush with the cache entry locked...
	     */
	    ReleaseWriteLock(&avc->lock);
	    osi_FlushText(avc);
	    /*
	     * Call osi_FlushPages in open, read/write, and map, since it
	     * is too hard here to figure out if we should lock the
	     * pvnLock.
	     */
	    ObtainWriteLock(&avc->lock,66);
	}

This is converted to a shared lock later in the same routine shortly before
afs_CacheFetchProc is called.

If I read it all correctly, the AFS client appears to be getting rx data and putting
it into the AFS cache.  During this process it needs memory.  There isn't enough,
so kswapd gets woken up to flush some pages.  kswapd unfortunately deadlocks
on a lock held by the AFS client.

What is being locked I don't know -- is kswapd trying to swap out a page
that the client is trying to read in?  Help!
-- 
Joe Buehler

// program to cause a machine hang

#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <memory.h>

int
main(int argc, char *argv[])
{
	int fd = open(argv[1] ? argv[1] : "temp.test", O_RDWR, 0666);
	void *p = mmap(0, 1024 * 1024 * 1024, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if ((int)p == -1) {
		perror("mmap failed");
		return(1);
	}
	memset(p, 0, 1024 * 1024 * 1024);
//	for (int i=0; i<1024 * 1024 * 1024; ++i) {
//		*((char *)p + i) = 1;
//	}
	munmap(p, 1024 * 1024 * 1024);
	close(fd);
	return(0);
}