[OpenAFS] fakestat-all weirdness

Jack Neely jjneely@pams.ncsu.edu
Mon, 17 Mar 2008 17:50:04 -0400


--vEao7xgI/oilGqZ+
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Folks,

With my RHEL 5 deployment I've begun using the -fakestat-all flag,
which I'd hoped would make things faster and put less load on our AFS
servers.

However, we have discovered that if AFS client B alters files that are
also used by AFS client A, and client A is running with -fakestat-all,
client A misbehaves.  We noticed this when folks would update their web
content and the web server would either not see the updated files or
get read errors on them.  When we removed the -fakestat-all option and
rebooted, normal behavior resumed.

We have also found that our Solaris 10 machines running OpenAFS 1.4.6
with the -fakestat-all option do not exhibit the broken behavior.

To my understanding the fakestat option only affects mount points.
Anyone know what's going on here?  I've included the tests that we've
been using to reproduce.

Jack Neely
-- 
Jack Neely <jjneely@ncsu.edu>
Linux Czar, OIT Campus Linux Services
Office of Information Technology, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4  EA6B 213B 765F 3B6A 5B89

--vEao7xgI/oilGqZ+
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="afstests.txt"


Logged in to my home directory: /afs/xxxxxxx/brabec
users.brabec                      538563526 RW     759295 K  On-line
    uni10f.unity.ncsu.edu /vicepa 
    RWrite  538563526 ROnly          0 Backup  538563561 
    MaxQuota    1500000 K 
    Creation    Wed Jan  8 15:33:44 2003
    Last Update Thu Mar 13 17:18:56 2008
    15582 accesses in the past day (i.e., vnode references)

    RWrite: 538563526     Backup: 538563561 
    number of sites -> 1
       server uni10f.unity.ncsu.edu partition /vicepa RW Site 

    SunOS uni10f 5.8 Generic_117350-04 sun4u sparc SUNW,Sun-Fire-280R
    @(#) OpenAFS 1.2.13 built  2004-11-03 


on these hosts:
    - mosa = my desktop rh9 machine with working AFS
        openafs-1.2.10-3.9.1
        openafs-client-1.2.10-3.9.1
        openafs-devel-1.2.10-3.9.1
        openafs-kernel-1.2.10-3.9.1
        openafs-kernel-source-1.2.10-3.9.1

    - web03rmw = a rhel5 server exhibiting the problem
        openafs-1.4.6-2.EL5
        openafs-client-1.4.6-2.EL5

    - also tried uni42ws, which is a rhel3 box without this problem
        openafs-1.2.11-20.EL
        openafs-client-1.2.11-20.EL
        openafs-kernel-1.2.11-20.EL



# creating a file works
mosa% vi afstest1
    # add some text content
mosa% ls -al afs*
-rw-r--r--    1 brabec   ncsu           31 Mar 13 16:48 afstest1
web03rmw% ls -al afs*
-rw-r--r-- 1 brabec ncsu 31 Mar 13 16:48 afstest1

# editing a file on mosa does not: web03rmw keeps showing the old version
mosa% vi afstest1
    # add more content
mosa% ls -al afs*
-rw-r--r--    1 brabec   ncsu           60 Mar 13 16:50 afstest1
web03rmw% ls -al afs*
-rw-r--r-- 1 brabec ncsu 31 Mar 13 16:48 afstest1
    # note the old timestamp and unchanged file size
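
# The side-by-side ls comparison above could be scripted. This is only a
# minimal sketch, assuming GNU coreutils stat (the -c option) on both
# clients; the function name snap is made up for illustration.

```shell
#!/bin/sh
# Print one line per file: size in bytes, mtime in epoch seconds,
# inode number, and file name.
# Assumes GNU coreutils stat; BSD and Solaris stat use different flags.
snap() {
    for f in "$@"; do
        stat -c '%s %Y %i %n' "$f"
    done
}
```

# Running 'snap afstest*' on each client and diffing the two outputs
# shows at a glance which client is looking at stale metadata.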

# attempted to edit same file on web03rmw... this time I got
# a read error, and an empty file in vim. ls now shows the correct
# file size and subsequent reads work correctly.

# on other tries, I have gotten the cached copy of the file, and I could 
# make changes and overwrite the file in AFS, quietly losing the
# changes made on mosa.

# creating a file on rhel5 works
web03rmw% vi afstest2
    # add some content
web03rmw% ls -al afs*
-rw-r--r-- 1 brabec ncsu 60 Mar 13 16:50 afstest1
-rw-r--r-- 1 brabec ncsu 24 Mar 13 16:52 afstest2
mosa% ls -al afs*
-rw-r--r--    1 brabec   ncsu           60 Mar 13 16:50 afstest1
-rw-r--r--    1 brabec   ncsu           24 Mar 13 16:52 afstest2

# editing the file on rhel5 works
web03rmw% vi afstest2
    # add some more content
web03rmw% ls -al afs*
-rw-r--r-- 1 brabec ncsu 60 Mar 13 16:50 afstest1
-rw-r--r-- 1 brabec ncsu 36 Mar 13 16:55 afstest2
mosa% ls -al afs*
-rw-r--r--    1 brabec   ncsu           60 Mar 13 16:50 afstest1
-rw-r--r--    1 brabec   ncsu           36 Mar 13 16:55 afstest2

# edit the second file on mosa first, web03rmw second
mosa% vi afstest2
    # found 2 comments, added a third
web03rmw% vi afstest2
    # found only 2 comments, added a different third
web03rmw% ls -al afs*
-rw-r--r-- 1 brabec ncsu 60 Mar 13 16:50 afstest1
-rw-r--r-- 1 brabec ncsu 53 Mar 13 16:57 afstest2
mosa% ls -al afs*
-rw-r--r--    1 brabec   ncsu           60 Mar 13 16:50 afstest1
-rw-r--r--    1 brabec   ncsu           53 Mar 13 16:57 afstest2
# cat on both machines shows the same content: the version saved second,
# by the rhel5/rmw host. the changes made on mosa were lost.
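
# Rather than eyeballing cat output, the content could be compared with a
# checksum. A small sketch using POSIX cksum; the helper name content_sum
# is hypothetical.

```shell
#!/bin/sh
# Print "<crc> <bytes>" for a file. Reading from stdin keeps the file
# name out of the output, so results from two hosts compare directly.
content_sum() {
    cksum < "$1" | awk '{ print $1, $2 }'
}
```

# If 'content_sum afstest2' prints the same line on both hosts, both
# clients are reading identical bytes.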

# timing...
# made edits to afstest2 and afstest3 on mosa, the ls looks like this
-rw-r--r--    1 brabec   ncsu           60 Mar 13 16:50 afstest1
-rw-r--r--    1 brabec   ncsu           79 Mar 13 16:59 afstest2
-rw-r--r--    1 brabec   ncsu           31 Mar 13 17:01 afstest3
# and the second edit was made at precisely Thu Mar 13 17:01:27 EDT 2008

# on web03rmw...
Thu Mar 13 17:07:46 EDT 2008
-rw-r--r-- 1 brabec ncsu 60 Mar 13 16:50 afstest1
-rw-r--r-- 1 brabec ncsu 53 Mar 13 16:57 afstest2
-rw-r--r-- 1 brabec ncsu 12 Mar 13 17:01 afstest3
% cat afstest3
cat: afstest3: No such file or directory
Thu Mar 13 17:08:01 EDT 2008
-rw-r--r-- 1 brabec ncsu 60 Mar 13 16:50 afstest1
-rw-r--r-- 1 brabec ncsu 53 Mar 13 16:57 afstest2
-rw-r--r-- 1 brabec ncsu 31 Mar 13 17:01 afstest3
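
# The timing check above was done by hand. A polling loop along these
# lines could record how long the stale size persists. This is a sketch
# only, assuming GNU coreutils stat; wait_for_size and its arguments are
# names invented here.

```shell
#!/bin/sh
# Poll a file until its reported size equals the expected value.
# Prints a timestamped size on every poll. Returns 0 once the size
# matches, or 1 if the timeout (in seconds) elapses first.
wait_for_size() {
    file=$1 want=$2 timeout=$3
    elapsed=0
    while :; do
        # A failed stat is logged as "missing" instead of aborting.
        size=$(stat -c '%s' "$file" 2>/dev/null) || size=missing
        echo "$(date '+%H:%M:%S') $file size=$size"
        [ "$size" = "$want" ] && return 0
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

# For the case above, 'wait_for_size afstest3 31 600' on web03rmw would
# log one line per second until the new size shows up or ten minutes pass.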

# fs flush doesn't help, touch seems to (at least sometimes)

mosa% vi afstest1
    # add another line
mosa% stat afstest1
  File: `afstest1'
  Size: 73              Blocks: 2          IO Block: 4096   Regular File
Device: ah/10d  Inode: 1405489178  Links: 1    
Access: (0644/-rw-r--r--)  Uid: (27926/  brabec)   Gid: (  108/    ncsu)
Access: 2008-03-13 17:12:46.000000000 -0400
Modify: 2008-03-13 17:12:46.000000000 -0400
Change: 2008-03-13 17:12:46.000000000 -0400
web03rmw% stat afstest1
  File: `afstest1'
  Size: 60              Blocks: 2          IO Block: 4096   regular file
Device: 15h/21d Inode: 1405489186  Links: 1
Access: (0644/-rw-r--r--)  Uid: (27926/  brabec)   Gid: (  108/    ncsu)
Access: 2008-03-13 16:50:15.000000000 -0400
Modify: 2008-03-13 16:50:15.000000000 -0400
Change: 2008-03-13 16:50:15.000000000 -0400

# run 'fs flush .' on both machines... no changes

web03rmw% touch afstest1
web03rmw% stat afstest1
  File: `afstest1'
  Size: 73              Blocks: 2          IO Block: 4096   regular file
Device: 15h/21d Inode: 1405489178  Links: 1
Access: (0644/-rw-r--r--)  Uid: (27926/  brabec)   Gid: (  108/    ncsu)
Access: 2008-03-13 17:13:17.000000000 -0400
Modify: 2008-03-13 17:13:17.000000000 -0400
Change: 2008-03-13 17:13:17.000000000 -0400

# note the inode change, from the old value to the new one found on mosa
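
# An inode change like this is what you would see if the editor saved by
# writing a new file and renaming it over the old one, rather than
# rewriting in place. Whether vi on mosa actually saved that way is an
# assumption. The sketch below (GNU coreutils stat assumed, helper names
# made up) shows the difference on any local filesystem.

```shell
#!/bin/sh
# Print the inode number of a file (assumes GNU coreutils stat).
inode_of() {
    stat -c '%i' "$1"
}

# Save by writing a temporary file and renaming it over the target.
# The target name ends up pointing at a new inode.
replace_by_rename() {
    target=$1 content=$2
    tmp="$target.tmp.$$"
    printf '%s\n' "$content" > "$tmp" && mv -f "$tmp" "$target"
}

# Save by appending to the existing file. The inode stays the same.
append_in_place() {
    printf '%s\n' "$2" >> "$1"
}
```

# Comparing inode_of before and after each kind of save shows that only
# the rename path changes the inode, which means the directory entry
# itself was updated and not just the file's data.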


--vEao7xgI/oilGqZ+--