[OpenAFS-devel] Are readdir() results in afs sorted?

Martin MOKREJŠ mmokrejs@ribosome.natur.cuni.cz
Fri, 29 Oct 2004 01:51:33 +0200

Hi Jeff,

Jeffrey Hutzelman wrote:
> On Friday, October 22, 2004 11:53:51 +0200 Martin MOKREJŠ
> <mmokrejs@ribosome.natur.cuni.cz> wrote:
>>   while thinking about which underlying filesystem to pick for an AFS
>> server partition, I came across an ext2/3 email thread where it is
>> suggested to take advantage of tune2fs -O dir_index /dev/xyz
>> using htrees, one needs first to do readdir(), then sort the
>> files by inode number before trying to open or stat them.
> I think there are several possible things you could be asking here, and 
> it's not clear which it is, so I'll try to answer them all.  It is 
> important to remember through this discussion that the structure of the 
> contents of AFS is not reflected in the structure of the data stored on 
> vice partitions.  AFS internally tracks files by a file ID (FID), which 
> consists of a volume ID, vnode number, and uniqifier.  The cache manager 
> refers to files and directories by their FID's, and the data stored on a 
> vice partition is arranged to make it easy for the fileserver to find a 
> file by its FID.
> (1) Will using htree directories on a vice partition help to
>    improve the performance of the fileserver?
> Probably, though to what extent is unclear.  The fileserver arranges 
> data on vice partitions in a tree-like structure designed to limit the 
> number of files in any one directory, to keep the lookup times down 
> (because most traditional filesystems store directories as a list of 
> entries, lookups and insertions require a linear search and can be quite 
> slow on directories with many entries).  Using htrees may or may not 
> have a noticeable effect on directories of the size used by the fileserver.

Well, the threads I referred to mention that the htree index sometimes
helped, but sometimes it was reported not to help.
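
For reference, the access pattern suggested in the ext2/3 thread (do the
readdir() pass first, then sort the entries by inode number before calling
stat or open on them) can be sketched roughly as follows. This is only an
illustration in Python, assuming a Linux host; the function name
stat_in_inode_order is made up for this example and is not part of any
AFS or e2fsprogs tool.

```python
import os


def stat_in_inode_order(directory):
    """Return (name, stat_result) pairs, visiting entries in inode order.

    Hypothetical helper for illustration only.
    """
    # Step 1: read the whole directory first (the readdir() pass).
    # DirEntry.inode() reports the inode number of each entry.
    with os.scandir(directory) as it:
        entries = [(entry.inode(), entry.name) for entry in it]

    # Step 2: sort by inode number, so that the following stat calls walk
    # the inode table in ascending order rather than in hash order.
    entries.sort()

    # Step 3: only now stat each entry (lstat, so that dangling symlinks
    # do not raise an error).
    return [(name, os.lstat(os.path.join(directory, name)))
            for _, name in entries]
```

Calling stat_in_inode_order("/some/dir") returns the entries ordered by
inode number rather than in the order readdir() happened to return them.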

Thank you for the great explanation. So I won't worry about directory indexes
in the case of AFS. I ran some benchmarks using bonnie++; maybe I should have
used more test files for some tests ... anyway, the results are here:

I'm not sure what type of performance one should prefer. On one hand,
there are mostly reads on /vicepX partitions, so we should prefer
fast read I/O. But should that be sequential or random?

On the other hand, we have client caches, so maybe someone would propose
tuning for write performance, as reading is mostly handled by the caches ...
But I don't believe in this.

I think, more importantly, the fileserver shouldn't be overloaded, and the
filesystem shouldn't require too much CPU. In that case, I'd go for xfs. ;)

I think the fast results of the random file create and random file delete
tests with reiserfs aren't so important for AFS, as we delete data rarely,
right? ;) Reiserfs3 takes a lot of CPU in some tests where the other
filesystems don't need much.

"mke2fs -T largefile" or "-T largefile4" results in bad performance in the
"Sequential create /Create" test in bonnie++(1). Small blocks seem to be
best, as "-T news" wins here. It might be related to the RAID5
nature with a 128kB block size.

In short, I have no idea whether I should prefer better sequential or random
performance. Any clues?