[OpenAFS] OpenAFS future ramblings

Nickolai Zeldovich kolya@MIT.EDU
Thu, 12 Dec 2002 23:27:25 -0500


Derrick J Brashear <shadow@dementia.org> wrote:

> When we have some prayer of doing so. The same problem would affect an nfs
> mount of the local host from the local host, apparently; kolya can offer a
> better summary. We'd need to detect deadlock and unroll, I'm guessing, as
> the only way of avoidance.

Right; to support non-dedicated UFS logging cache partitions we need to
somehow detect deadlock in doing a UFS write.  I talked to some Sun guys
at Usenix OSDI earlier this week, and though they didn't know how to fix our
problem, they seemed to agree that our problem stems from using UFS write
in VOP_GETPAGE.  They suggested that perhaps we can return data from our
afs_getpage right away, and write to the cache in the background.  This
seems like a good thing to do, except that it requires a considerable
change to the client architecture.
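
To make the suggestion concrete, here is a rough user-space sketch in
Python of "return the data right away, write to the cache in the
background".  It is only a toy model, not client code: the names
fetch_from_server, BackgroundCacheWriter, getpage and flush are all
made up for illustration, a dict stands in for the cache partition,
and a queue plus one worker thread stands in for whatever background
mechanism the real client would need.

```python
import queue
import threading


def fetch_from_server(path):
    # Hypothetical stand-in for the network fetch done on a page fault.
    return ("data for " + path).encode()


class BackgroundCacheWriter:
    """Toy model: page requests return at once; cache writes happen later."""

    def __init__(self):
        self.cache = {}               # stands in for the UFS cache partition
        self.pending = queue.Queue()  # cache writes not yet applied
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def getpage(self, path):
        data = fetch_from_server(path)
        self.pending.put((path, data))  # defer the cache write
        return data                     # caller never waits on the cache

    def _drain(self):
        # Background thread: the only code that writes to the cache.
        while True:
            path, data = self.pending.get()
            self.cache[path] = data
            self.pending.task_done()

    def flush(self):
        # Block until every queued cache write has been applied.
        self.pending.join()
```

The point of the model is just that getpage never blocks on the cache
write, so a stalled write can no longer hold up the page fault.  It
says nothing about how hard that split is to make in the real client.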

For the record, here's the problem:

  * Client has shared UFS-logging cache partition; for example, consider
    the case of a common / partition which is UFS-logging.

  * Client runs "cp /afs/some/file /tmp"

  * cp mmap()'s /afs/some/file, creates /tmp/file and issues a write()
    to write the mmap'ed file to /tmp/file.

  * This creates an async UFS transaction to write the mmap'ed data (that
    hasn't been page-faulted yet) to /tmp/file.

  * Something triggers a sync UFS transaction on the same filesystem.
    As a result, UFS tries to flush the above async transaction.

  * Trying to flush the async transaction calls afs_getpage to page-fault
    the mmap'ed file.

  * afs_getpage fetches the file over the network and tries to write it
    to the cache partition before returning it to the user.

  * afs_getpage's UFS write hangs because it's waiting for the sync UFS
    transaction to clear before starting another async transaction.

and at this point we have the deadlock.
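
The steps above can be written down as a wait-for graph, where each
entry says "this is blocked until that finishes".  The Python sketch
below is purely illustrative: the node names are my own labels for the
steps in the list, find_cycle is a hypothetical helper, and it assumes
each party waits on at most one other.  It is not a proposal for how
the kernel would actually detect the deadlock.

```python
# Each key is blocked until its value completes (labels are illustrative).
WAITS_FOR = {
    "sync UFS transaction": "flush of async transaction",
    "flush of async transaction": "afs_getpage",
    "afs_getpage": "UFS cache write",
    "UFS cache write": "sync UFS transaction",
}


def find_cycle(waits_for):
    """Return the nodes forming a cycle in the wait-for graph, or None."""
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node in seen:
                # Walked back onto our own path: that suffix is the cycle.
                return path[path.index(node):]
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


cycle = find_cycle(WAITS_FOR)
print(" -> ".join(cycle + [cycle[0]]))
```

If the cache write does not wait on the sync transaction (for example
because the cache lives on its own dedicated partition), the last edge
goes away and find_cycle returns None.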

-- kolya