[OpenAFS-devel] Re: Problems on 8-way Itanium2 system

Martin MOKREJŠ mmokrejs@ribosome.natur.cuni.cz
Wed, 02 Mar 2005 23:12:51 +0100


Kris Van Hees wrote:
> On Wed, Mar 02, 2005 at 12:30:59PM -0800, Alf Wachsmann wrote:
> 
>>On Fri, 28 Jan 2005, Alf Wachsmann wrote:
>>
>>>we have problems with the stability of AFS on our 72 CPU SGI Altix
>>>system running a 2.4.21-sgi302r24 kernel with OpenAFS-1.2.10.
>>
>>Someone (I think Chas Williams, in an email to me) suggested also trying
>>ext3 as the filesystem for the AFS cache (we were using ext2).
>>
>>Last weekend, I had the opportunity to test this and it turns out that
>>OpenAFS 1.2.13 works just fine on the full machine with an ext3 cache!
> 
> 
> Pending a patch I am still working on, you *may* run into an issue with an
> ext3 cache.  Basically, if you are writing a file to an AFS volume, and that
> file is larger than the cache, and the cache is its own ext3 partition, it
> is possible that a race condition pops up between the journal commit process
> and the AFS client.  The journal commit process can at times be too slow in
> writing truncate commit records to disk, which causes cache blocks to be
> considered available by the AFS client while the ext3 fs will not be able to
> allocate those blocks yet.
> 
> Result: partition full error from the fs layer, and thus the write aborts.
> 
> Patch is being worked on (actually, I have a patch - I just need to test it
> more to see what (if any) the performance impact is).
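
To make sure I read the description above correctly, here is a toy model of
the race as I understand it. It is only a sketch under my own assumptions:
the class and function names (ToyJournaledFs, ToyCacheClient, demo) are
hypothetical, the block counts are made up, and none of this is OpenAFS or
ext3 code. The filesystem side treats truncated blocks as unusable until the
truncate record is committed, while the cache client counts them as free
immediately.

```python
import errno


class ToyJournaledFs:
    """Toy model: blocks freed by truncate become allocatable only after commit."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.used = 0
        self.pending_free = 0  # truncated, but truncate record not yet committed

    def truncate(self, blocks):
        blocks = min(blocks, self.used)
        self.used -= blocks
        self.pending_free += blocks

    def commit(self):
        # The journal commit finally releases the truncated blocks.
        self.pending_free = 0

    def allocatable(self):
        return self.total_blocks - self.used - self.pending_free

    def allocate(self, blocks):
        if blocks > self.allocatable():
            raise OSError(errno.ENOSPC, "No space left on device")
        self.used += blocks


class ToyCacheClient:
    """Keeps its own free-block count and trusts truncate to free space at once."""

    def __init__(self, fs):
        self.fs = fs
        self.believed_free = fs.total_blocks

    def evict(self, blocks):
        self.fs.truncate(blocks)
        self.believed_free += blocks  # assumption: space is available right away

    def write_chunk(self, blocks):
        if blocks > self.believed_free:
            raise RuntimeError("client would evict more chunks first")
        self.fs.allocate(blocks)
        self.believed_free -= blocks


def demo(commit_before_write):
    """Fill the cache, evict 40 blocks, then try to write 40 new blocks."""
    fs = ToyJournaledFs(total_blocks=100)
    client = ToyCacheClient(fs)
    client.write_chunk(100)   # cache partition is now full
    client.evict(40)          # client thinks 40 blocks are free again
    if commit_before_write:
        fs.commit()           # journal keeps up: blocks really are free
    try:
        client.write_chunk(40)
        return "ok"
    except OSError as exc:
        return errno.errorcode[exc.errno]
```

In this model demo(True) succeeds, while demo(False) fails with ENOSPC even
though the client's own accounting says there is room, which matches the
"partition full" symptom described above.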

Do you think this could also be the cause of bug #17740, even though that
one is on ext2?

Martin