[OpenAFS] Craziness with cache, Input/Output error, Linux

Derrick Brashear shadow@gmail.com
Mon, 21 Apr 2008 12:46:24 -0400


On Mon, Apr 21, 2008 at 12:40 PM, Jeff Blaine <jblaine@kickflop.net> wrote:
> Derrick et al,
>
>  ~:maverick> uname -a
>  Linux maverick 2.4.21-53.ELsmp #1 SMP Wed Nov 14 03:46:35 EST 2007 x86_64
> x86_64 x86_64 GNU/Linux
>  ~:maverick> strings /usr/vice/etc/afsd | grep OpenAFS
>  @(#) OpenAFS 1.4.6 built  2008-03-04
>  ~:maverick> pwd
>  /afs/rcf/user/jblaine
>  ~:maverick> tar xf /mtc/raid8/ictools/xilinx/download/10.1i/ise_SFD.tar
>  tar: ise/ise/idata/drop0369_iSE_K31wIP_4.zip.xz: Wrote only 4608 of 10240
> bytes
>  tar: ise/ise/idata/drop0378_iSE_K31wIP_4.zip.xz: Cannot write: No space
> left on device

Some background: Kris Van Hees looked into this at the time, and he said:

"In essence, it seems to be possible that as AFS is flushing data to the
server, and deletes the content of cache blocks, it tries to write new data
immediately after deleting old data, and that the delete has not yet been
committed to disk.  In that case, the data blocks taken up by the deleted
data are not released for reallocation yet, and thus while AFS thinks there
is enough room left in the cache, there actually isn't.
Note that if e.g. jbd-debug is enabled (in my case, at a level of 2 or more),
enough extra work is being done by the kernel that the kjournald processing
actually manages to do the commits before AFS demands the space, and thus it
won't trigger the ENOSPC condition.
How is that for a nice issue :)

[...]

Good news on that one is that Chas believes that just setting
S_SYNC on the inode before truncation is probably a good thing on *any*
filesystem on Linux, not just ext3, making the patch trivial.  Testing
scheduled for that (and potential impact, if any) later today.
"
That was in the February 2005 timeframe.

The change we made at the time instead was:
(in osi_file.c on Linux)
+    filp->f_mode = FMODE_READ|FMODE_WRITE;

Obviously we need to revisit this. For the record, I have never
reproduced it on my own test hardware.
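
For anyone who wants to poke at the truncate-then-reuse pattern outside
the kernel, here is a rough userspace sketch of the "commit the
truncation before writing again" idea. It is only an analogy to the
S_SYNC suggestion above, not the OpenAFS code: write_all() and
recycle_chunk() are hypothetical names, and it is an assumption that
fsync() after ftruncate() gets the freed blocks released on a given
journaled filesystem.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf, retrying on EINTR and short
 * writes. Returns 0 on success or a negative errno (e.g. -ENOSPC). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        if (n == 0)
            return -EIO;    /* no progress; avoid spinning forever */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: empty a cache chunk file, ask the kernel to flush
 * the truncation, and only then write the new contents. */
int recycle_chunk(const char *path, const char *data, size_t len)
{
    int rc;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -errno;

    if (ftruncate(fd, 0) != 0)
        rc = -errno;
    else if (fsync(fd) != 0)    /* assumption: forces the pending commit */
        rc = -errno;
    else
        rc = write_all(fd, data, len);

    if (close(fd) != 0 && rc == 0)
        rc = -errno;
    return rc;
}
```

Calling recycle_chunk() in a loop on a nearly full partition, with and
without the fsync() step, might be one way to see whether the ENOSPC
shows up. The return value is a negative errno, so a short write like
the one tar reported surfaces as an error rather than being silently
dropped.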