[OpenAFS-devel] locking fairness on Linux?

John S. Bucy bucy-openafs-devel@gloop.org
Thu, 5 May 2005 22:17:16 -0400


--X1bOJ3K7DJ5YkBrT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, May 05, 2005 at 05:55:19PM -0400, Chaskiel M Grundman wrote:
> --On Thursday, May 05, 2005 16:36:11 -0400 "John S. Bucy" 
> <bucy-openafs-devel@gloop.org> wrote:
> 
> >I don't think the struct vcache lock is the source of this behavior;
> >even if the tasks access different sets of files in the same
> >directory, I still see the same thing.
> >
> >I also notice that this extends to e.g. trying to run ls on the
> >directory that the test is running in, trying to chdir through it,
> >etc.  All such tasks are blocked in D until the busy one stops.
> 
> That would appear to be the i_sem on the directory inode, which is held 
> while linux's real_lookup calls the lookup op, and not anything that afs is 
> doing. the afs part of the lookup codepath should not be holding any 
> exclusive locks across rpc's other than the vcache rwlock on the object 
> being stat'd.

It is curious that real_lookup locks i_sem exclusively, but I'm not
sure that this is it either: I still see the same behavior if I set
the working set size to fit within the dnlc (say, 250 files).  If I
understand that code correctly, a lookup shouldn't find its way back
into afs_lookup unless it misses the dnlc.

Furthermore, Linux semaphores are supposed to be fair; there was
some discussion of this on LKML a while ago (~2.4.10).  If I could
always miss the dentry cache for a local filesystem (I think I can
adjust /proc/something to minimize it), I should be able to get the
same behavior from a large directory if i_sem is the culprit.
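
One rough way to look for that kind of starvation in a local directory
is to run several stat loops at once and compare how many calls each
one completes in the same interval.  The sketch below is only an
illustration under assumptions: the names now, stat_for and
compare_stat_rates are invented here and are not part of the attached
program, the workers are started one after another rather than at
exactly the same instant, and nothing here controls the dentry cache.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names throughout; none of this is in statnr-rate.c. */

/* Wall-clock time in seconds. */
static double now(void) {
  struct timeval tv;
  gettimeofday(&tv, 0);
  return tv.tv_sec + tv.tv_usec / 1000000.0;
}

/* stat() files "0" .. "nfiles-1" in the current directory, round-robin,
 * for about secs seconds.  Returns the number of successful calls. */
static long stat_for(int nfiles, double secs) {
  struct stat s;
  char name[32];
  long ok = 0;
  int i = 0;
  double end = now() + secs;

  while (now() < end) {
    snprintf(name, sizeof(name), "%d", i);
    if (stat(name, &s) == 0)
      ok++;
    i = (i + 1) % nfiles;
  }
  return ok;
}

/* Run nworkers children concurrently and store each child's count in
 * counts[0 .. nworkers-1].  Returns 0 on success, -1 on failure. */
int compare_stat_rates(int nworkers, int nfiles, double secs, long *counts) {
  struct { int idx; long n; } msg;
  int p[2], w, started = 0, got = 0;

  if (nworkers <= 0 || nfiles <= 0 || pipe(p) == -1)
    return -1;

  for (w = 0; w < nworkers; w++) {
    pid_t pid = fork();
    if (pid == -1)
      break;
    if (pid == 0) {
      /* Child: measure, report one record to the parent, exit. */
      close(p[0]);
      msg.idx = w;
      msg.n = stat_for(nfiles, secs);
      _exit(write(p[1], &msg, sizeof(msg)) == (ssize_t)sizeof(msg) ? 0 : 1);
    }
    started++;
  }

  /* Parent: close the write end so read() sees EOF once all children exit. */
  close(p[1]);
  while (read(p[0], &msg, sizeof(msg)) == (ssize_t)sizeof(msg)) {
    counts[msg.idx] = msg.n;
    got++;
  }
  close(p[0]);
  while (wait(NULL) > 0)
    ;
  return (started == nworkers && got == nworkers) ? 0 : -1;
}
```

Calling compare_stat_rates(2, 250, 10.0, counts) from the test directory
would fill counts[0] and counts[1]; if one worker were being starved,
its count should come out far lower than the other's.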

I've attached my test program.  Run it in a directory containing files
named 0, 1, 2, etc.; argv[1] is the number of files to loop over and
argv[2] is the number of times to run the outer loop.
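
The numbered files have to exist before the test is run.  A minimal
sketch of one way to create them follows; create_numbered_files is a
hypothetical helper written for this note, not part of the attachment,
and it assumes empty files are good enough since the test only calls
stat on them.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Create empty files named "0" .. "count-1" inside dir.
 * Hypothetical helper, not part of statnr-rate.c.
 * Returns 0 on success, -1 on the first failure (errno is set by open). */
int create_numbered_files(const char *dir, int count) {
  char path[4096];
  int i, fd;

  for (i = 0; i < count; i++) {
    snprintf(path, sizeof(path), "%s/%d", dir, i);
    /* O_CREAT without O_TRUNC leaves an existing file untouched. */
    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
      return -1;
    close(fd);
  }
  return 0;
}
```

For example, create_numbered_files(".", 250) would set up the 250-file
working set mentioned above in the current directory.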



john

--X1bOJ3K7DJ5YkBrT
Content-Type: text/x-csrc; charset=us-ascii
Content-Disposition: attachment; filename="statnr-rate.c"


#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>

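/* Fill a[0..len-1] with 0..len-1, then permute it with 2*len random
 * swaps.  The generator is seeded from the pid. */
void shuffle(int *a, int len) {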
  int i, j, k;
  int tmp;

  srand(getpid());

  for(i = 0; i < len; i++) {
    a[i] = i;
  }

  for(i = 0; i < len *2; i++) {
    j = rand() % len;
    k = rand() % len;
    tmp = a[j];
    a[j] = a[k];
    a[k] = tmp;
  }

  //  for(i = 0; i < len; i++) {
  //    printf("%d\n", a[i]);
  //  }
}

int main(int argc, char **argv) {
  int i, j;
  int fd;
  int epoch;
  int bucket = 0;
  struct timeval tv0, tv1;
  double t0, t1, ttot = 0.0;
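  int count, reps;
  int *nums;

  if(argc < 3) {
    printf("usage: %s count reps\n", argv[0]);
    exit(1);
  }
  count = atoi(argv[1]);
  reps = atoi(argv[2]);

  nums = calloc(count, sizeof(int));
  if(nums == 0) {
    perror("calloc");
    exit(1);
  }
  shuffle(nums, count);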

  gettimeofday(&tv0, 0);
  epoch = tv0.tv_sec;

  for(j = 0; j < reps; j++) {
    for(i = 0; i < count; i++) {
      struct stat s;
      char name[128];
      snprintf(name, sizeof(name), "%d", nums[i]);

      gettimeofday(&tv0, 0);
      //      fd = open(name, O_RDWR);
      fd = stat(name, &s);
      gettimeofday(&tv1, 0);
      
      if(fd == -1) {
	perror("stat");
	exit(1);
      }

      if(tv1.tv_sec > epoch) {
	printf("%d %d\n", epoch, bucket);
	epoch = tv1.tv_sec;
	bucket = 1;
      }
      else {
	bucket++;
      }

      t0 = tv0.tv_sec;
      t0 += tv0.tv_usec / 1000000.0;
      
      t1 = tv1.tv_sec;
      t1 += tv1.tv_usec / 1000000.0;
      ttot += (t1 - t0);
     

      //      sched_yield();
    }
    
  }
  printf("%d*%d in %f -> %f sec/1\n", reps, count, ttot, ttot/(count * reps));
  return 0;
}


--X1bOJ3K7DJ5YkBrT--