[OpenAFS] Puzzler: lack of access to AFS files

Jeffrey Altman jaltman@secure-endpoints.com
Mon, 17 Dec 2007 14:28:08 -0500


Rodney M. Dyer wrote:
> At 10:46 AM 12/17/2007, Jeffrey Altman wrote:
>> While AFS on UNIX is limited by the performance of Rx/UDP, on Windows
>> we are actually limited by the CIFS/SMB implementation.  A native
>> redirector will be a big win here.
> 
> Am I wrong here in thinking that the code for CIFS/SMB access is already
> faster than network + raw disk read access, and that any more
> improvements are simply ram cache read related improvements?  I mean a
> 7200 rpm disk can only spin so fast, and these disks don't have 100 MB
> ram caches.  Once you've read out 8-16 MB of the file, you're basically
> waiting on the disk.

The Windows Cache Manager cache is Page File backed.  If you have the
memory, it will be loaded in RAM.

But you are missing the point.  CIFS access is not faster than disk I/O.
You are creating a CIFS request, queuing it for network transfer on a
loopback adapter, waiting for the network request to be read by the
network stack, waiting for the SMB server code to process it, waiting
for the request to be delivered to the AFS Client Service, waiting for
the AFS Client Service to process the request and generate the response,
waiting for the response to be queued, transferred, received, processed,
etc. before the next request can be sent, because CIFS (prior to SMB 2.0
in Vista) does not support chaining.

For each file operation CIFS sends anywhere from three to five requests.
As a result there is significant overhead that increases the round trip
time and limits the overall throughput.
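
To make the arithmetic concrete, here is a rough sketch of how serialized
requests cap throughput.  The function name and every number in it are
hypothetical, chosen only for illustration; they are not measurements of
the Windows client.  The model assumes each request must wait for the
previous response, so the time per file operation is simply the number
of requests multiplied by the round trip time.

```python
def serialized_throughput(bytes_per_op, requests_per_op, round_trip_s):
    """Upper bound on bytes/second when every request must wait for the
    previous response (no chaining, no pipelining)."""
    return bytes_per_op / (requests_per_op * round_trip_s)


# Hypothetical figures, for illustration only.
BYTES_PER_OP = 64 * 1024   # payload moved by one file operation
ROUND_TRIP_S = 0.0005      # one loopback request/response cycle

for requests in (1, 3, 5):
    rate = serialized_throughput(BYTES_PER_OP, requests, ROUND_TRIP_S)
    print(f"{requests} request(s) per operation: {rate / 1e6:.1f} MB/s")
```

Under this model, going from one request per operation to five cuts the
ceiling to one fifth, no matter how fast the disk or the RAM cache is.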

> On a related note, I've always wondered why the AFS file servers do not
> have "read cache" on the server side, so when one person requests a
> file, the second person sees the same file from the server's cache.  Or
> is that handled by the server OS cache (via the swap)?

I would hope your file server's operating system provides file caching.

>> Rodney's complaint about reliability was not about the file servers
>> but about the clients.
> 
> How do I know?  If I request a 100 MB file from the file server and get
> the following error on random machines at random times...
> 
>    From the Visual Studio 2005 install log:
> 
>      "Error 1335.The cabinet file
>       '_14314_VC80_PDB_WINSXS_ATLMFC_x64.msm'
>      required for this installation is corrupt and cannot be
>      used. This could indicate a network error, an error
>      reading from the CD-ROM, or a problem with this
>      package."
> 
> ... then how do I know where the problem lies?  This is similar to other
> application install errors that we get from time to time.

And when you copy the MSM or MSI to local disk it works, or when you run
the install from a .readonly volume it works, or from a volume on which
the client user has only 'rl' privileges.  If you try installing from a
read/write volume and the application can't obtain the lock it wants,
you get that error.

We don't have byte range locking on the file servers.  When we do, this
will get better.

> In the above instance, the AFS client/server environment can cause us
> grief as IT administrators in performing change-management of our PC
> client software.  To debug issues like this can take hours.  That's a
> lot of money and productivity down the tubes.

In the meantime, don't use read/write volumes for distribution of
.MSI/.MSM packages.

> I'm willing to put up with issues like this because of my
> "expectations", however new adopters to the AFS world are not.  This is
> why I'm sometimes reluctant to advocate AFS.

This is why I don't describe OpenAFS as a first class file system for
MacOS X or Windows.