[OpenAFS-devel] Cache inconsistency in client 1.4.8 and above

Felix Frank Felix.Frank@Desy.de
Wed, 15 Apr 2009 11:44:50 +0200 (CEST)


On a hunch, I applied this to 1.4.8:

--- src/afs/LINUX/osi_vm.c.orig 2009-04-15 11:37:49.000000000 +0200
+++ src/afs/LINUX/osi_vm.c      2009-04-15 11:38:56.000000000 +0200
@@ -102,11 +102,6 @@ osi_VM_StoreAllSegments(struct vcache *a
  {
      struct inode *ip = AFSTOV(avc);

-    if (!avc->states & CPageWrite)
-       avc->states |= CPageWrite;
-    else 
-       return; /* someone already writing */
-
  #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,4,5)
      /* filemap_fdatasync() only exported in 2.4.5 and above */
      ReleaseWriteLock(&avc->lock);
@@ -120,7 +115,6 @@ osi_VM_StoreAllSegments(struct vcache *a
      AFS_GLOCK();
      ObtainWriteLock(&avc->lock, 121);
  #endif
-    avc->states &= ~CPageWrite;
  }

  /* Purge VM for a file when its callback is revoked.


This apparently solves the problem for 1.4.8 with disk cache. I will try 1.4.10
as well. BCC'ing openafs-bugs now.

On Wed, 15 Apr 2009, Felix Frank wrote:

> Dear all,
>
> I managed to reproduce a cache inconsistency between two amd_rhel50 nodes
> running kernel 2.6.18-128.1.6.el5, using the short program
> /afs/ifh.de/user/f/ffrank/public/afs/misbehave.c.
> The problem is triggered by changing an mmap'ed file after closing it.
>
> Cache problems are evident with client version 1.4.10 using disk cache.
> Related tests suggest that memory cache is affected as well, and that the
> same holds true for 1.4.8.
> For 1.4.7, only memory cache appears to suffer from this problem.
>
> What's more, this is not merely an inconsistency between different machines:
> restarting the cache manager on the local host also makes the file appear
> in an older state, even though the data version
> (as reported by cmdebug -long) is correct.
>
> Some time after the offending access, the kernel module issues a
> WARNING: afs_ufswr vcp=... exOrW=0
>
> I suspect that the problem has existed on Linux for longer (some code
> comments hint at tricky Linux behaviour), but did not affect disk cache
> before 1.4.8. I will start digging through the code in that direction and
> post something hopefully more definite to openafs-bugs soon.
>
> Are there any further suggestions in the meantime?
>
> Sincerely
> Felix
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel
>