[OpenAFS] Re: Afs User volume servers in VM's

Jeffrey Altman jaltman@your-file-system.com
Wed, 26 Oct 2011 13:43:03 -0400

Instead of discussing infinite possibilities, it would be useful to
discuss particular use cases.  As much as I dislike Andrei
Maslennikov's CMS and ATLAS results, the biggest struggle those in the
OpenAFS community encounter when trying to address them is finding out
what the parameters of the CMS and ATLAS jobs are.  Earlier this month
at DESY we finally found out enough of the details to understand some of
the issues.

What are the batch jobs that are bringing a file server to a halt?  We
know that they all take place within a single volume but there is a lot
we don't know:

. how are the file servers configured?

. how many client machines are issuing requests against the volume?

. how many client processes?

. how many client PAGs are involved since that affects the maximum
number of outstanding RPCs in parallel?

. how are the client machines configured?

. are the jobs more like CMS (large sequential reads) or like ATLAS
(small random seeks with very large files) or something different?

. are the jobs read only or read/write?

. if read/write, are the jobs creating and removing large numbers of
files in a common set of directories?

. if read/write, are the jobs competing for access to a common set of f=

. does the data once read or written get used again on the same client?

Separate and apart from the bottlenecks in the file server, it is really
important to understand the requirements of the job that is executing
and how both the configuration of the client cache manager and the file
server will impact it.  It is also critical to understand how the AFS
cache coherency model is going to impact these jobs.

AFS is a caching file system.  It therefore must ensure that cache
coherency is maintained.  The most critical aspect of this is data
visibility.  A file system is an inter-process communication path that
can be used in parallel with other inter-process communication
mechanisms.  It is critical that a client that performs a file creation
or a data store not be told that the operation has completed until such
time as all of the other clients accessing that directory or file are
told that their cached data is invalid.  Otherwise, a message
transmitted from process A that performed the data changing RPC could be
received by process B before the cache invalidation message was
received.  This race would cause process B to read the wrong data.
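
To make the race concrete, here is a minimal sketch in Python.  It is a
toy model only, not OpenAFS code: the names ToyServer, Client,
invalidate_before_reply and run are all hypothetical, and the
"out-of-band message" from A to B is represented simply by B reading
right after A's store returns.

```python
class ToyServer:
    """Toy file server (hypothetical, not the OpenAFS implementation)."""

    def __init__(self, invalidate_before_reply):
        self.data = {}
        self.clients = []
        self.invalidate_before_reply = invalidate_before_reply
        self.pending = []  # invalidations queued but not yet delivered

    def store(self, writer, path, value):
        self.data[path] = value
        others = [c for c in self.clients if c is not writer]
        if self.invalidate_before_reply:
            # Coherent ordering: every other client is told its cached
            # copy is invalid before the writer's call returns.
            for c in others:
                c.invalidate(path)
        else:
            # Broken ordering: reply first, invalidate some time later.
            self.pending.extend((c, path) for c in others)

    def deliver_pending(self):
        for c, path in self.pending:
            c.invalidate(path)
        self.pending.clear()


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # path -> locally cached data
        server.clients.append(self)

    def read(self, path):
        # Use the cached copy while it is still considered valid.
        if path not in self.cache:
            self.cache[path] = self.server.data[path]
        return self.cache[path]

    def invalidate(self, path):
        self.cache.pop(path, None)

    def store(self, path, value):
        self.server.store(self, path, value)
        self.cache[path] = value


def run(invalidate_before_reply):
    server = ToyServer(invalidate_before_reply)
    server.data["f"] = "old"
    a, b = Client(server), Client(server)
    b.read("f")          # B caches the old contents
    a.store("f", "new")  # A's store completes
    # A now messages B out of band; B reacts by reading the file.
    seen = b.read("f")
    server.deliver_pending()  # late invalidations arrive after B's read
    return seen
```

In this model run(True) has B see the new data, while run(False) lets
B read its stale cached copy because the invalidation arrives after A's
message.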

In exactly the same way that CMS and ATLAS jobs tune their datasets to
optimize for certain conditions, it is appropriate to organize the
datasets stored in AFS with the cache coherency requirements in mind.

I fear that the problem that is being faced here is primarily due to the
enforcement of cache coherency.  In particular, are these jobs designed
in such a manner that there are more clients accessing and modifying a
given directory or subset of directories than there are threads in the
file server?

Is the file server grinding to a halt because each store operation
requires that every client taking part in the job be notified of the
change before the next one can begin to execute?
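
As a rough illustration of why this question matters, the following
Python sketch counts notifications under a deliberately pessimistic
assumption: every client holds a cached copy of the shared directory at
the time of every store, so each store must notify every other client.
The function name and the job sizes are hypothetical, not measurements
of any real job.

```python
def callback_breaks(num_clients, stores_per_client):
    """Notifications sent if each store must notify every other client.

    Assumes (pessimistically) that all clients re-cache the shared
    directory between stores, so no notification is ever skipped.
    """
    total_stores = num_clients * stores_per_client
    return total_stores * (num_clients - 1)


# Hypothetical job sizes: the count grows roughly with the square of
# the number of clients sharing the directory.
small_job = callback_breaks(10, 100)
large_job = callback_breaks(100, 100)
```

Under that assumption, ten times as many clients produce roughly one
hundred times as many notifications, which is why the number of clients
sharing a directory is one of the details worth knowing.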

More information about the jobs would really be helpful.

Jeffrey Altman
