[OpenAFS-devel] OpenAFS, Linux and truncate_inode_pages()

chas williams - CONTRACTOR chas@cmf.nrl.navy.mil
Tue, 28 Feb 2006 10:14:10 -0500


In message <440410F9.1030704@pclella.cern.ch>, Rainer Toebbicke writes:
>Actually, I ran into this when told that running a 'find 
>/usr/vice/cache ...' has been suspected to hang AFS. Luckily 
>osi_UFSTruncate is about the only place where the i_sem is 
>downed/upped correctly, so that wasn't it. But it illustrates that 
>certain conventions should be taken seriously.

maybe.  there is some doubt in my mind about the ordering of i_alloc_sem,
i_sem and the BKL.  but it seems to be working, so it's best not to
change it unless doing so solves a problem.

>Oh? Easy! You need a farm of about 10 clients on Gb Ethernet, 4-5 
>small servers on Gb as well with decently performing RAIDs and plenty 
>of time:
>
>set up a directory /afs/.../$hostname for each client, about 10 2GB 
>volumes per client mounted at /afs/.../$hostname/[0-9], and then run
>/afs/cern.ch/user/r/rtb/common/bin/disk_stress -rN500 \
>	/afs/.../$hostname/?
>on each client and wait. The problem usually manifests itself after a 
>few days, sometimes 1-2 weeks. Survival after > 3 weeks on all clients 
>is exceptional.

not a problem.  we are not a small shop.  btw, you said this test
"fails" but didn't indicate what the failure mode is.

>Anyway, I'm knocking at various portions of the client code and listen 
>if it sounds hollow. The setup described seems to look artificial but 
>we've got enough traffic and reported oddities to suspect that it is 
>also triggered by normal use. At what frequency - no idea.

have any changes to i_sem proven useful in fixing this problem?
btw, addressing some of your other concerns in the previous message:

>One of the prominent occasions where this looks particular careless is 
>in afs_linux_read() (osi_FlushPages) prior to calling
>generic_file_read(). With a printf() in osi_VM_FlushPages and a little 
>mickey-mousing you can show that at least through this code path 
>truncate_inode_pages() is called without the i_sem lock. My local 

it should be safe to add an i_sem down/up around the truncate_inode_pages()
in osi_VM_FlushPages().  there is only one path to osi_VM_FlushPages(),
via osi_FlushPages().
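roughly, that change would look like the sketch below.  this is not a
tested patch: it assumes the 2.6-era kernel API (semaphore i_sem,
down()/up()), and AFSTOV() as the vcache-to-inode accessor is a guess;
the real signature of osi_VM_FlushPages() may differ.

```c
/* sketch only: 2.6-era kernel API assumed; AFSTOV() is a guessed
 * accessor from the AFS vcache to the Linux inode. */
void
osi_VM_FlushPages(struct vcache *avc, cred_t *credp)
{
    struct inode *ip = AFSTOV(avc);

    down(&ip->i_sem);                     /* serialize with other truncators */
    truncate_inode_pages(&ip->i_data, 0); /* drop all cached pages */
    up(&ip->i_sem);
}
```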

>... Similar suspects like osi_VM_Truncate and 
>osi_VM_FlushVCache have the same problem - a fast growing tree to 
>trace back.

osi_VM_FlushVCache() is called as part of recycling an inode for
reuse, so it's typically called in the afs_lookup/afs_create/afs_mkdir
code paths.  it also happens to get called during inode and dentry
revalidation.  the parent dir's i_sem is held during these operations,
so the new inode's i_sem can probably be taken safely.
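the nesting being relied on here can be pictured as below (a sketch of
the 2.6-era lookup path, not real code): the VFS holds the parent
directory's i_sem across ->lookup(), so taking the child inode's i_sem
inside nests parent -> child, which is the usual safe order.

```c
/* sketch, not real code: 2.6-era VFS lookup path */
down(&dir->i_sem);            /* taken by the VFS before ->lookup() runs */
        /* ... filesystem lookup, possibly recycling an inode ... */
        down(&inode->i_sem);  /* proposed: safe, it's a different inode */
        /* osi_VM_FlushVCache() work here */
        up(&inode->i_sem);
up(&dir->i_sem);
```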

osi_VM_Truncate() typically happens when an inode's size changes.
again, very likely safe to just take i_sem here.

it would be pretty easy to use down_trylock() to catch any paths
that might already hold the lock (a double lock) and then fix those
broken code paths.