[OpenAFS-devel] locking fairness on Linux?
John S. Bucy
bucy-openafs-devel@gloop.org
Thu, 5 May 2005 22:17:16 -0400
--X1bOJ3K7DJ5YkBrT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Thu, May 05, 2005 at 05:55:19PM -0400, Chaskiel M Grundman wrote:
> --On Thursday, May 05, 2005 16:36:11 -0400 "John S. Bucy"
> <bucy-openafs-devel@gloop.org> wrote:
>
> >I don't think the struct vcache lock is the source of this behavior;
> >even if the tasks access different sets of files in the same
> >directory, I still see the same thing.
> >
> >I also notice that this extends to e.g. trying to run ls on the
> >directory that the test is running in, trying to chdir through it,
> >etc. All such tasks are blocked in D until the busy one stops.
>
> That would appear to be the i_sem on the directory inode, which is held
> while linux's real_lookup calls the lookup op, and not anything that afs is
> doing. the afs part of the lookup codepath should not be holding any
> exclusive locks across rpc's other than the vcache rwlock on the object
> being stat'd.
It is curious that real_lookup locks i_sem exclusively, but I'm not
sure that this is it either: I still see the same behavior if I set
the working set size to fit within the dnlc (say, 250 files). If I
understand that code correctly, it shouldn't find its way back into
afs_lookup unless it misses the dnlc.
Furthermore, Linux semaphores are supposed to be fair; there was
some discussion of this on LKML a while ago (~2.4.10). If I could
always miss the dentry cache for a local filesystem (I think I can
adjust /proc/something to minimize it), I should be able to get the
same behavior from a large directory if i_sem is the culprit.
I've attached my test program. Run it in a directory with files
numbered 0, 1, 2, etc. argv[1] is the number of files to loop over and
argv[2] is the number of times to run the outer loop.
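For anyone who wants to reproduce this, the numbered files have to exist
first. A minimal sketch of a helper that creates them might look like the
following. Note that make_numbered_files is a hypothetical name and is not
part of the attached program; it assumes empty files are enough, since the
test only calls stat() on them.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of statnr-rate.c: create empty files
 * named 0 .. count-1 inside dir. Returns 0 on success, -1 on the
 * first error. */
int make_numbered_files(const char *dir, int count) {
  char path[4096];
  int i, fd, n;

  for(i = 0; i < count; i++) {
    n = snprintf(path, sizeof(path), "%s/%d", dir, i);
    if(n < 0 || (size_t)n >= sizeof(path)) {
      return -1;               /* path would not fit in the buffer */
    }
    /* O_CREAT without O_TRUNC leaves any existing file untouched. */
    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if(fd == -1) {
      perror(path);
      return -1;
    }
    close(fd);
  }
  return 0;
}
```

Calling make_numbered_files(".", 250) from a small main() in the test
directory would set up the 250-file case mentioned above.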
john
--X1bOJ3K7DJ5YkBrT
Content-Type: text/x-csrc; charset=us-ascii
Content-Disposition: attachment; filename="statnr-rate.c"
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>

/* Fill a[0..len-1] with 0..len-1, then permute it with random swaps. */
void shuffle(int *a, int len) {
  int i, j, k;
  int tmp;

  /* Seed from the pid so concurrent instances get different orders. */
  srand(getpid());

  for(i = 0; i < len; i++) {
    a[i] = i;
  }

  for(i = 0; i < len * 2; i++) {
    j = rand() % len;
    k = rand() % len;
    tmp = a[j];
    a[j] = a[k];
    a[k] = tmp;
  }

  //  for(i = 0; i < len; i++) {
  //    printf("%d\n", a[i]);
  //  }
}

int main(int argc, char **argv) {
  int i, j;
  int fd;
  int epoch;
  int bucket = 0;
  struct timeval tv0, tv1;
  double t0, t1, ttot = 0.0;
  int count, reps;
  int *nums;

  if(argc < 3) {
    fprintf(stderr, "usage: %s <nfiles> <reps>\n", argv[0]);
    exit(1);
  }

  count = atoi(argv[1]);
  reps = atoi(argv[2]);
  if(count <= 0 || reps <= 0) {
    fprintf(stderr, "nfiles and reps must be positive\n");
    exit(1);
  }

  nums = calloc(count, sizeof(int));
  if(nums == NULL) {
    perror("calloc");
    exit(1);
  }

  shuffle(nums, count);

  gettimeofday(&tv0, 0);
  epoch = tv0.tv_sec;

  for(j = 0; j < reps; j++) {
    for(i = 0; i < count; i++) {
      struct stat s;
      char name[128];
      snprintf(name, sizeof(name), "%d", nums[i]);

      /* Time a single call; fd holds the return value of stat(). */
      gettimeofday(&tv0, 0);
      //      fd = open(name, O_RDWR);
      fd = stat(name, &s);
      gettimeofday(&tv1, 0);
      if(fd == -1) {
        perror("stat");
        exit(1);
      }

      /* Each time the wall-clock second changes, print the second that
         just ended and the number of calls completed in it. */
      if(tv1.tv_sec > epoch) {
        printf("%d %d\n", epoch, bucket);
        epoch = tv1.tv_sec;
        bucket = 1;
      }
      else {
        bucket++;
      }

      t0 = tv0.tv_sec;
      t0 += tv0.tv_usec / 1000000.0;
      t1 = tv1.tv_sec;
      t1 += tv1.tv_usec / 1000000.0;
      ttot += (t1 - t0);

      //      sched_yield();
    }
  }

  printf("%d*%d in %f -> %f sec/1\n", reps, count, ttot,
         ttot / ((double)count * reps));

  free(nums);
  return 0;
}
--X1bOJ3K7DJ5YkBrT--