[OpenAFS-devel] Re: Problems on 8-way Itanium2 system
Kris Van Hees
aedil-afs@alchar.org
Wed, 2 Mar 2005 15:46:37 -0500
On Wed, Mar 02, 2005 at 12:30:59PM -0800, Alf Wachsmann wrote:
> On Fri, 28 Jan 2005, Alf Wachsmann wrote:
> > we have problems with the stability of AFS on our 72 CPU SGI Altix
> > system running a 2.4.21-sgi302r24 kernel with OpenAFS-1.2.10.
>
> Someone (I think Chas Williams in an email to me) suggested to also try
> using ext3 as filesystem for the AFS cache (we were using ext2).
>
> Last weekend, I had the opportunity to test this and it turns out that
> OpenAFS 1.2.13 works just fine on the full machine with an ext3 cache!
Pending a patch I am still working on, you *may* run into an issue with an
ext3 cache. If you write a file to an AFS volume, the file is larger than the
cache, and the cache is on its own ext3 partition, a race condition can occur
between the journal commit process and the AFS client. The journal commit
process can at times be too slow in writing truncate commit records to disk,
so the AFS client considers cache blocks available while the ext3 fs is not
yet able to allocate them.
Result: partition full error from the fs layer, and thus the write aborts.
I already have a patch; I just need to test it more to see what performance
impact (if any) it has.
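
To make the race above easier to picture, here is a toy model in Python. It is
not OpenAFS or ext3 code, and it is not the patch mentioned above. The class
ToyJournaledFs, the function write_large_file, and the commit_before_reuse
switch are all made up for illustration. The only assumption it encodes is the
one described in this mail: blocks released by a truncate cannot be allocated
again until the journal commit has gone through, while the cache client counts
them as free right away.

```python
import errno


class ToyJournaledFs:
    """Toy stand-in for a journaled fs (hypothetical, not ext3 itself)."""

    def __init__(self, total_blocks):
        self.free_blocks = total_blocks
        self.pending_free = 0  # truncated, but truncate not yet committed

    def allocate(self, n):
        # Only committed free blocks can be handed out.
        if n > self.free_blocks:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.free_blocks -= n

    def truncate(self, n):
        # Blocks are released logically, but stay unavailable until commit.
        self.pending_free += n

    def commit_journal(self):
        self.free_blocks += self.pending_free
        self.pending_free = 0


def write_large_file(cache_blocks, chunk_blocks, num_chunks, commit_before_reuse):
    """Write num_chunks chunks through a cache of cache_blocks blocks.

    Returns 0 on success, or the errno of the failed allocation.
    commit_before_reuse is a hypothetical knob for this sketch only.
    """
    fs = ToyJournaledFs(cache_blocks)
    client_used = 0  # the cache client's own accounting
    for _ in range(num_chunks):
        if client_used + chunk_blocks > cache_blocks:
            # Cache is full: recycle an old chunk by truncating it.
            fs.truncate(chunk_blocks)
            client_used -= chunk_blocks  # client now believes the space is free
            if commit_before_reuse:
                fs.commit_journal()  # the commit wins the race
        try:
            fs.allocate(chunk_blocks)
        except OSError as exc:
            return exc.errno  # the write aborts
        client_used += chunk_blocks
    return 0


if __name__ == "__main__":
    # File of 8 chunks, cache of 4 chunks: the file is larger than the cache.
    print(write_large_file(4, 1, 8, commit_before_reuse=False) == errno.ENOSPC)
    print(write_large_file(4, 1, 8, commit_before_reuse=True) == 0)
```

In this model the write fails as soon as the client reuses a block whose
truncate has not been committed, and succeeds when the commit lands first. A
real system is timing dependent, so the failure would show up only some of the
time.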
Kris