[AFS3-std] byte-range locking draft (split from delegation, updated)

Matt Benjamin matt@linuxbox.com
Fri, 23 Jan 2009 10:03:59 -0500


Hi List,

Based on recent feedback, I'm proposing to split byte-range locking from
delegation.

(Delegation should return here in future, as an updated extension draft.)

Other changes include:

* 64-bit Uniq attribute
* Opaque Txid and Token attributes added for future use
* Discussion of expected assert-extend lock interactions, periodically
and across server restarts
* Improved language in a number of areas (we hope)

Regards,


Matt

--

Matt Benjamin

The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309

AFS Byte-Range Locking

Matt Benjamin <matt@linuxbox.com>

01/22/2009

Status of this Memo

This document specifies a standards track protocol extension for
the OpenAFS community, and requests discussion and suggestions
for improvements.

Key Words

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
Internet Engineering Task Force RFC 2119.

Abstract

The AFS-3 protocol supports file locks, but only on whole files,
only in advisory mode, and using an inefficient protocol.
Efficient support for byte-range file locking, together with the
stronger semantics associated with it, is required
to improve the suitability of AFS as a LAN file-sharing protocol
for both Unix and Windows clients. Applications on the Windows
platform in particular (e.g., Microsoft Office) actually
require byte-range locking to function correctly. Emulation in
the client has alleviated the most serious problems, albeit with
reduced semantics. We propose protocol enhancements facilitating
server-coordinated byte-range locks, atomic lock up/down-grade
support, improved semantics for files under byte-range lock
control, protocol support for wait-on-lock with fairness, and
mandatory lock enforcement for clients on request. The delegation
proposal, included within this document in previous drafts, has
been split out into a separate proposal, based on feedback from
reviewers.

Table of Contents

Status of this Memo
Key Words
Abstract
    1 AFS-3 File Locking
    2 Byte-Range Locking Interfaces
        2.1 Dependencies
        2.2 Backward Compatibility
        2.3 Concepts
            2.3.1 General
            2.3.2 Lock Management
            2.3.3 Deferred Locks
            2.3.4 Server Restarts
        2.4 Constants
            2.4.1 Lock Flags
                AFSLock_Flag_Mand
                AFSLock_Flag_Wait
            2.4.2 Lock Status
                AFSLock_Flag_Extend
                AFSLock_Flag_Discard
            2.4.3 Extended Callback Constants
            2.4.4 Extended Callback Extra Flags
                AFSCB_Lock_Flag_All
            2.4.5 Callback Result Constants
                AFSCB_Cancel_ExtendLocks
                AFSCB_Cancel_RevokeLocks
                AFSCB_Flag_ExtendLocks
                AFSCB_Flag_RevokeLocks
        2.5 Data Types
            2.5.1 AFSByteRangeLock
                Fid
                Type
                Owner
                Uniq
                Offset
                Length
                ExpirationTime
                Txid
                Token
            2.5.2 AFSByteRangeLockSeq
            2.5.3 AFSLockFlagsSeq
            2.5.4 HostIdentifierSeq
            2.5.5 AFSCB_ResultData Redefinition
                AFSCB_Result_ReturnLocks
                AFSCB_Result_ResponseDeferred
        2.6 Procedures
            2.6.1 SetByteRangeLock
                Notes
            Error Codes
                EACCES
                EWOULDBLOCK
                EDEADLK
                EINVAL
                ENOLCK
            2.6.2 ReleaseByteRangeLock
            Notes
            Error Codes
                EINVAL
            2.6.3 UpgradeByteRangeLock
            Error Codes
                EINVAL
                EWOULDBLOCK
                EDEADLK
            2.6.4 DowngradeByteRangeLock
            Notes
            Error Codes
                EINVAL
            2.6.5 AssertExtendLocks
            Error Codes
                EACCES
            2.6.6 CancelByteRangeLock
        2.7 Windows & Unix Lock Semantics
            2.7.1 Byte-Range Locking
            2.7.2 Read/Write vs. Shared/Exclusive
            2.7.3 Atomic Lock Open
        2.8 Mandatory Enforcement
            2.8.1 Governing Ideas
            2.8.2 Enforcement Rules
    3 Appendix A: XDR Grammar (afsint.xg)
    4 Appendix A: XDR Grammar (afscbint.xg)


1 AFS-3 File Locking

While AFS-3 does support file locking, it permits locking of
whole files only, and provides this support inefficiently. AFS
clients can take locks on any file object, with the granularity
of an entire file, using the RXAFS_SetLock procedure, and release
them with the RXAFS_ReleaseLock procedure. AFS uses a poll-based
locking model. AFS file locks, once issued, are considered to
persist only for 5 minutes, unless extended by the requesting
client using the RXAFS_ExtendLock procedure. This simplifies the
AFS file server, but complicates clients and wastes network
capacity. The OpenAFS file server implementation, based on the
original Transarc AFS file server, tracks locks directly in its
on-disk volume structures. Considering the 5-minute duration
asserted for file locks, the reason for this decision is clearly
not to support lock persistence for long periods, although it may
have been intended to allow locks to persist through server
restarts (or crashes). The disk package tracks lock type
(LockRead or LockWrite), numbers of clients holding locks, and a
timestamp. Lock ownership, which in many cases may be reliably
inferred, is not recorded. Hence, a broken or malicious client
might release locks it never set (i.e., locks set by other
clients). The AFS protocol also does not permit atomic lock
upgrades (or downgrades).
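
The poll-based model described above can be pictured with a small
sketch. The Python below is an illustrative model only, not OpenAFS
code: the class LegacyFileLock and its methods are hypothetical, and
only the 5-minute lifetime is taken from the text.

```python
LOCK_LIFETIME_SECONDS = 5 * 60  # lifetime of an AFS-3 file lock, per the text


class LegacyFileLock:
    """Hypothetical model of a whole-file AFS-3 lock kept alive by polling."""

    def __init__(self, now):
        self.expires_at = now + LOCK_LIFETIME_SECONDS

    def extend(self, now):
        # Models the effect of RXAFS_ExtendLock: restart the 5-minute clock.
        self.expires_at = now + LOCK_LIFETIME_SECONDS

    def is_held(self, now):
        # The lock silently lapses once the lifetime has elapsed.
        return now < self.expires_at


lock = LegacyFileLock(now=0)
lock.extend(now=200)  # the client must keep doing this to retain the lock
```

Every lock held costs one extension RPC per lifetime, which is the
network overhead the paragraph above refers to.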

2 Byte-Range Locking Interfaces

2.1 Dependencies

The byte-range lock feature depends on support for extended
callback notifications and extended host tracking support in
client and server.

2.2 Backward Compatibility

AFS clients and servers will indicate their support for
byte-range locking through new client and file server capability
flags:

const CLIENT_CAPABILITY_BYTE_RANGE_LOCK = 0x0008;

const VICED_CAPABILITY_BYTE_RANGE_LOCK = 0x0010;

2.3 Concepts

2.3.1 General

An AFS file server is responsible for coordinating byte-range
locking requests and, optionally, enforcing mandatory locking
semantics relative to file operations initiated at different
clients. By contrast with the traditional AFS file locking
protocol, the proposed byte-range locking protocol attempts to
associate locks with a unique subject, specifically, a
ViceID and unique identifier which could correspond to a unique
session or process executing on the client machine.

Clients (cache-manager processes not co-located in memory)
request and release byte-range locks through a pair of interfaces
(SetByteRangeLock, ReleaseByteRangeLock) similar to those
provided by the traditional AFS locking implementation. The same
lock types (read and write, generally regarded as "shared" and
"exclusive" locks) are defined as in traditional AFS locking.
Additional arguments and flags are provided to permit selection
of desired lock ranges, intention to "wait" on the lock (i.e.,
willingness to accept a deferred issue of the lock at such time as
the file server can grant the lock, if it cannot be granted
immediately), and desired special semantics--currently, the
client may request mandatory enforcement. Clients already holding
a read or write lock on a range may atomically upgrade or
downgrade the lock to the other type, i.e., they need not
release a lock of one type before requesting the other type,
avoiding the race condition present in the traditional AFS
locking protocol.

Byte-range locks are permanently associated with an owner, the
client which requested the lock. A lock may not be released by a
client which never owned it.

A file server may revoke locks granted to any client, for any
reason. The file server may also request clients to re-assert
their interest in outstanding locks, at any time--in particular,
if a client holding locks has not been heard from for a long
period (e.g., 10 minutes). Provision is made for re-establishment
of state after server restarts or other service interruptions.

Administrative users may under various circumstances have need to
identify the owner and state of locks on a locked file, and to
revoke file locks administratively. This proposal includes RPCs
allowing administrative users to perform these operations, and
suggests exposure through new AFS pioctls and the fs command.

2.3.2 Lock Management

Lock management in the proposed interface is completely redefined
relative to the file locking in AFS-3. Concepts are borrowed from
AFS cache management, including the callback concept. A
byte-range lock may be regarded as a special-purpose callback. A
file server may use the ExtendedCallBack interface to request
re-assertion of existing locks or revoke (cancel) locks
completely. These indications re-use the existing
AFSCB_Event_Cancel extended callback notification, adding new
cancellation types defined below.

2.3.3 Deferred Locks

Where possible, locks are granted immediately with the completion
of the SetByteRangeLock request. A file server MAY, on explicit
request and subject to client capability, agree to prospectively
issue a lock to an interested client at a future time, when the
requested lock becomes available. Such deferred locks constitute
a promise to issue the lock with best-effort consideration of
fairness. A new procedure in the client RPC interface
(AsyncIssueByteRangeLock) is provided to effect asynchronous
issue of a deferred lock to a waiting client. Deferred locks may
themselves be canceled.

2.3.4 Server Restarts

When a byte-range locking capable client receives one of the
InitCallBackState RPCs from a byte-range locking capable file
server, it must assume that any byte-range locks it held prior to
receipt must be re-asserted or bulk-released at the file server,
using the server's AssertExtendLocks RPC. A conformant file
server may, but need not, be prepared to validate locks
previously issued to clients, across server restarts. Certainly,
it is not expected that all issued locks be committed to stable
storage, so the server's ability to do so is presumed to be
limited. In future revisions, the Token attribute of
AFSByteRangeLock may allow file servers to reliably recognize
locks they issued in these circumstances, using cryptographic or
other mechanisms.

2.4 Constants

2.4.1 Lock Flags

The following flag constants are defined for use in the Flags
member of the AFSByteRangeLock structure and equivalently in the
Flags argument of the SetByteRangeLock procedure, with the same
semantics:

const AFSLock_Flag_Mand = 1; /* req. enforcement */

const AFSLock_Flag_Wait = 2; /* req. async wait on lock */

  AFSLock_Flag_Mand

Requests mandatory enforcement when sent with a SetByteRangeLock
request or in a deferred AFSByteRangeLock instance. Asserts
mandatory enforcement in an AFSByteRangeLock instance.

  AFSLock_Flag_Wait

Requests deferred lock if immediate lock cannot be granted when
sent with a SetByteRangeLock request. Indicates deferred lock in
an AFSByteRangeLock instance. The SetByteRangeLock procedure may
return locks in this state, subject to client capability and if
so requested in the Flags argument.
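
Because the two lock flags are distinct bits, they can be combined in
one request and tested independently in the returned lock. The sketch
below is illustrative Python, not part of the proposed interface; the
constant values are those given above, while the helper names
lock_flag_names and is_deferred are hypothetical.

```python
AFSLock_Flag_Mand = 1  # req. enforcement
AFSLock_Flag_Wait = 2  # req. async wait on lock


def lock_flag_names(flags):
    """Return the names of the lock flags set in `flags` (hypothetical helper)."""
    names = []
    if flags & AFSLock_Flag_Mand:
        names.append("AFSLock_Flag_Mand")
    if flags & AFSLock_Flag_Wait:
        names.append("AFSLock_Flag_Wait")
    return names


def is_deferred(lock_flags):
    """True if a returned AFSByteRangeLock is only a deferred-lock promise."""
    return bool(lock_flags & AFSLock_Flag_Wait)


# A client asking for a mandatory lock and willing to wait for it:
request_flags = AFSLock_Flag_Mand | AFSLock_Flag_Wait
```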

2.4.2 Lock Status

The following flag constants are provided to coordinate advanced
lock-management operations:

const AFSLock_Flag_Extend = 4; /* request extension, or server
ack extended */

const AFSLock_Flag_Discard = 8; /* discard lock, or server ack
discarded */

  AFSLock_Flag_Extend

Sent with AssertExtendLocks, indicates a request to assert/extend
the corresponding lock. Returned from AssertExtendLocks in the
OutResult array, indicates lock confirmation.

  AFSLock_Flag_Discard

Sent with AssertExtendLocks, indicates intention to discard the
corresponding lock. Returned from AssertExtendLocks in the
OutResult array, acknowledges lock discard.

2.4.3 Extended Callback Constants

The following extended callback cancellation types and flags are
provided, to facilitate lock management through the
ExtendedCallback interface:

const AFSCB_Cancel_ExtendLocks = 7; /* re-assert locks, or lose
them */

const AFSCB_Cancel_RevokeLocks = 8; /* locks on Fid revoked */

These cancellation types are intended to be sent with
notifications of the existing AFSCB_Event_Cancel type.

2.4.4 Extended Callback Extra Flags

  AFSCB_Lock_Flag_All

When sent as the value of ExtraFlags with a notification of type
AFSCB_Cancel_ExtendLocks or AFSCB_Cancel_RevokeLocks, indicates
that the notification applies to all eligible objects. In this
case a 0 value is also set for one or more of Volume, Fid, and
Uniq in the corresponding callback, with the following
interpretation:

• If Volume is non-zero, and is published from the sending file
  server, while Fid and Uniq are 0, then all outstanding locks on
  files in the volume are requested to be re-asserted or revoked,
  depending on the type of the corresponding notification

  – If the notification type is AFSCB_Cancel_ExtendLocks, all
    corresponding locks are requested to be extended

  – If the notification type is AFSCB_Cancel_RevokeLocks, all
    corresponding locks are revoked

• If all of Volume, Fid, and Uniq are 0, then all outstanding
  locks on files published from this server are requested to be
  re-asserted or revoked, depending on the type of the
  corresponding notification

  – If the notification type is AFSCB_Cancel_ExtendLocks, all
    corresponding locks are requested to be extended

  – If the notification type is AFSCB_Cancel_RevokeLocks, all
    corresponding locks are revoked
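
The two cases above amount to treating zero fields as wildcards. A
minimal sketch of how a client might classify the scope of such a
notification follows. It is illustrative Python only; the function
name notification_scope and the returned labels are hypothetical, and
the final fall-through case (a single named file) is an assumption
that the text does not spell out.

```python
def notification_scope(volume, fid, uniq):
    """Classify which locks an AFSCB_Lock_Flag_All notification covers."""
    if volume == 0 and fid == 0 and uniq == 0:
        # All outstanding locks on files published from the sending server.
        return "server"
    if volume != 0 and fid == 0 and uniq == 0:
        # All outstanding locks on files in the named volume.
        return "volume"
    # Assumption: anything else names one specific file.
    return "file"
```

The same scope applies whether the notification type asks for the
locks to be extended or announces that they are revoked.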

2.4.5 Callback Result Constants

The following constant is provided as a discriminator for the
AFSCB_ResultData member of AFSCBExtendedCallbackResult allowing
clients to indicate their intention to defer returning locks
until a subsequent RPC, within the time limit provided by the
server with the notification:

const AFSCB_Result_ResponseDeferred = 2;

The following constant is provided as a discriminator for the
AFSCB_ResultData member of AFSCBExtendedCallbackResult allowing
clients to indicate their intention to return locks in the
CallBack_Result_Array OUT parameter:

const AFSCB_Result_ReturnLocks = 3;

  AFSCB_Cancel_ExtendLocks

When sent as the reason for cancellation in an ExtendedCallback
notification, indicates the server requires re-assertion of all
locks on FID using the file server's AssertExtendLocks procedure.
The client MUST execute the procedure for all locks it asserts on
FID prior to the ExpirationTime in the callback, else it MUST
consider any locks it held on FID to be canceled.

  AFSCB_Cancel_RevokeLocks

When sent as the reason for cancellation in an ExtendedCallback
notification, indicates administrative cancellation of all locks
on FID.

const AFSCB_Flag_ExtendLocks = 4; /* request ExtendLock */

const AFSCB_Flag_RevokeLocks = 8; /* locks cancelled */

  AFSCB_Flag_ExtendLocks

Has the same meaning and effect as AFSCB_Cancel_ExtendLocks, but
may be sent with an arbitrary extended callback message.

  AFSCB_Flag_RevokeLocks

Has the same meaning and effect as AFSCB_Cancel_RevokeLocks, but
may be sent with an arbitrary extended callback message.

2.5 Data Types

2.5.1 AFSByteRangeLock

The AFSByteRangeLock data type represents a byte-range lock
issued by an AFS file server:

struct AFSByteRangeLock {
  AFSFid Fid;
  afs_uint32 Type;
  afs_uint32 Owner;
  afs_uint64 Uniq;
  afs_uint32 Flags;
  afs_uint64 Offset;
  afs_uint64 Length;
  afs_uint64 ExpirationTime;
  AFSOpaque Txid;
  AFSOpaque Token;
};

  Fid

The Fid on which the lock is held.

  Type

The type of lock requested, LockRead or LockWrite. A byte-range
read lock is a non-exclusive read assertion on the stated range,
which may be shared by any number of readers and no writers. A
byte-range write lock is an exclusive write assertion on the
stated range.

  Owner

The ViceID in use by the client requesting the lock.

  Uniq

Value uniquely identifying a session or process context at the
client. The representation of Uniq is intended to be able to
uniquely represent the most relevant process or thread context on
modern platforms.

  Offset

The distance in bytes from beginning-of-file to the start of the
locked range.

  Length

Length in bytes of the locked range.

  ExpirationTime

AFSByteRangeLock instances may be regarded as a special-purpose
callback. Instances persist until canceled, or until
ExpirationTime is reached.

  Txid

An arbitrary counted bytestring originating at the client with
the original request for the lock. Defined for this revision
of the specification to have a maximum length of 0.

  Token

An arbitrary counted bytestring originating at the server when
the lock is issued. Defined for this revision of the
specification to have a maximum length of 0. In future revisions
it may be used to store an "irrefutable" object which may be used
to re-assert locks after server restart, or similar scenarios.
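
The Type, Owner, Uniq, Offset and Length attributes together determine
whether two locks conflict. The following Python sketch is an
illustrative model, not the file server's algorithm. The Lock class
and the overlaps and conflicts helpers are hypothetical names, lock
types are represented by their names rather than wire values, Length
is assumed to be greater than 0, and treating two locks with the same
(Owner, Uniq) pair as non-conflicting is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    owner: int    # ViceID
    uniq: int     # session or process context at the client
    type: str     # "LockRead" or "LockWrite"
    offset: int
    length: int


def overlaps(a, b):
    # Half-open ranges [offset, offset+length) intersect.
    return a.offset < b.offset + b.length and b.offset < a.offset + a.length


def conflicts(held, wanted):
    """True if `wanted` cannot coexist with the already granted `held`."""
    if not overlaps(held, wanted):
        return False
    if (held.owner, held.uniq) == (wanted.owner, wanted.uniq):
        return False  # assumption: a subject never conflicts with itself
    # Any number of readers may share a range; a writer excludes everyone.
    return held.type == "LockWrite" or wanted.type == "LockWrite"
```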

2.5.2 AFSByteRangeLockSeq

A variable-length array of type AFSByteRangeLock used in bulk
calls that assert or report locks.

const AFS_LOCK_SEQ_MAX = 10000;

typedef AFSByteRangeLock AFSByteRangeLockSeq <AFS_LOCK_SEQ_MAX>;

2.5.3 AFSLockFlagsSeq

An array of flags used in parallel with AFSByteRangeLockSeq,
above.

const AFS_LOCK_SEQ_MAX = 10000;

typedef afs_int32 AFSLockFlagsSeq <AFS_LOCK_SEQ_MAX>;

2.5.4 HostIdentifierSeq

const AFS_LOCK_SEQ_MAX = 10000;

typedef HostIdentifier AFSLockHostIdentifierSeq <AFS_LOCK_SEQ_MAX>;

An array of HostIdentifier structures used by the
GetByteRangeLockStatus procedure to report client machines
holding locks.

2.5.5 AFSCB_ResultData Redefinition

The AFSCB_ResultData union defined in the Callback Extended
Information draft is redefined (upward compatibly), as the
following:

union AFSCB_ResultData switch (afs_uint32 Result_Type) {
case AFSCB_Result_NoResult:
    void;
case AFSCB_Result_ResponseDeferred:
    void;
case AFSCB_Result_ReturnLocks:
    AFSByteRangeLockSeq AssertedLocks_Array;
};

  AFSCB_Result_ReturnLocks

The result is used to return (synchronously, in the
ExtendedCallBack RPC) a list of byte-range locks being extended
in response to an extended callback notification of type
AFSCB_Cancel_ExtendLocks, or one carrying the
AFSCB_Flag_ExtendLocks flag.

  AFSCB_Result_ResponseDeferred

The result is used to indicate that the client will not assert or
return locks synchronously in the ExtendedCallBack RPC (and will
instead assert or return locks using the asynchronous RPCs
provided).

2.6 Procedures

2.6.1 SetByteRangeLock

Requests a lock of type Type on Fid, on the range [Offset,
Offset+Length). Type must be one of LockRead or LockWrite. Owner
shall be set to the ViceID corresponding to the requesting
process or equivalent, or to 0 if this is not known. Uniq shall
be set to a value uniquely identifying the requesting process or
equivalent. On Unix-like systems, Uniq could be set to the PID of
the requesting process. Txid shall be a counted bytestring
corresponding to the AFSByteRangeLock attribute of the same name.
Txid is defined at this revision to have length 0.

proc SetByteRangeLock(
    IN  AFSFid *Fid,
        afs_uint32 Type,
        afs_uint32 Flags,
        afs_uint32 Owner,
        afs_uint64 Uniq,
        afs_uint64 Offset,
        afs_uint64 Length,
        AFSOpaque Txid,
    OUT AFSByteRangeLock *Lock
) = 65601;

  Notes

On successful return the file server has granted the requested
lock, and Lock points to the server's asserted AFSByteRangeLock
structure. If the client has requested and the server agrees to
issue a deferred lock, Lock points to the server's asserted
deferred AFSByteRangeLock structure. The client may safely
determine if it has been granted a deferred lock by inspecting
the value of Lock->Flags.

The returned Lock structure MUST NOT differ from the request with
respect to range, except in the case where the requested lock
would overlap with a lock of the same type already held by the
same client, in which case the locks are merged and the merged
range returned in Lock. The returned Lock structure MAY differ
from the request with respect to Flags.

The value of the Flags argument may alter the semantics and/or
processing of the call:

• if (Flags & AFSLock_Flag_Mand), the file server is requested to
  provide mandatory locking semantics as defined below--if the
  file server is willing to provide mandatory enforcement, it MAY
  set the corresponding flag in Lock, and if so MUST restrict
  writes on the asserted range to the holding client for the
  duration of the lock

• if (Flags & AFSLock_Flag_Wait), the file server is requested to
  issue a deferred lock if the requested lock cannot be
  immediately granted--the file server MAY grant a deferred lock
  in response to this request, indicating its agreement by
  setting the corresponding flag in Lock. Lock is in this
  instance an indicator only of the deferred lock promise
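
The merge rule in the Notes above (an overlapping lock of the same
type held by the same client is merged with the request) can be
sketched as follows. This is illustrative Python, not the file
server's implementation. The function merge_ranges is hypothetical,
ranges are (offset, length) tuples for one client and one lock type,
and the held ranges are assumed to be pairwise disjoint on entry.
Ranges that merely touch are left separate, since the text speaks
only of overlap.

```python
def merge_ranges(held, requested):
    """Merge `requested` into `held`; return (new_held, range_returned_in_Lock)."""
    start = requested[0]
    end = requested[0] + requested[1]
    kept = []
    for off, length in held:
        if off < end and start < off + length:
            # Overlaps the request: absorb it into the merged range.
            start = min(start, off)
            end = max(end, off + length)
        else:
            kept.append((off, length))
    merged = (start, end - start)
    return sorted(kept + [merged]), merged


held_after, granted = merge_ranges([(0, 10), (20, 10), (50, 5)], (5, 20))
```

Here the request for [5, 25) overlaps two held ranges, so the range
returned to the client is wider than the one it asked for.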

  Error Codes

  EACCES

The caller does not have the necessary rights.

  EWOULDBLOCK

The server is unable to grant the request due to conflicting
locks. If a deferred lock was requested, a Flags value of
AFSLock_Flag_Wait indicates the deferred lock is granted.

  EDEADLK

The server declines to grant the requested lock (or deferred
lock) because granting it would cause a deadlock.

  EINVAL

An illegal lock type was specified.

  ENOLCK

The server has insufficient resources to grant the lock, or the
requesting client or file has too many locks outstanding. (No
specific limits are mandated or suggested by this document.)

2.6.2 ReleaseByteRangeLock

Releases the byte-range lock represented in Lock, asserted to be
held by the calling client.

proc ReleaseByteRangeLock(
    IN AFSByteRangeLock *Lock
) = 65602;

  Notes

When an AFS client intends to release a byte-range write lock, it
MUST ensure that any changed data in the affected range has been
sent to the file server with the appropriate StoreData RPC, and
that the RPC completed successfully. This requirement is based on
an implied assertion that holding a lock on some region of a file
implies, invariantly, an up-to-date view on the locked region.

  Error Codes

  EINVAL

The caller does not own the corresponding lock.
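
Unlike the traditional AFS locking protocol, a release is checked
against recorded ownership. The following is a minimal sketch of
that check in Python, under stated assumptions: the lock table is a
plain dictionary keyed by a hypothetical (fid, offset, length) tuple
whose value is the owning (Owner, Uniq) pair, and release_lock is an
illustrative name rather than a server function.

```python
import errno


def release_lock(table, caller, lock_key):
    """Release `lock_key` for `caller`; return 0 on success or errno.EINVAL."""
    if table.get(lock_key) != caller:
        # Lock unknown, or held by a different (Owner, Uniq) subject.
        return errno.EINVAL
    del table[lock_key]
    return 0


table = {("fid-1", 0, 100): (1001, 4242)}
```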

2.6.3 UpgradeByteRangeLock

Upgrades the byte-range lock represented in Lock, asserted to be
held by the calling client, from its current type (which should
be LockRead) to LockWrite. The upgrade is executed atomically (no
opportunity exists for another client to set a conflicting lock
in the upgraded range while the upgrade is being executed).

proc UpgradeByteRangeLock(
    IN AFSByteRangeLock *Lock,
       afs_uint32 Type
) = 65603;

  Error Codes

  EINVAL

The caller does not own the corresponding lock or it is not of
the correct type.

  EWOULDBLOCK

The lock could not be granted due to conflicting locks.

  EDEADLK

The lock could not be granted because granting it, with deferral,
would cause deadlock.
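
The point of the atomic upgrade is that the read lock is never
dropped while the write lock is being decided. The following Python
sketch illustrates this under stated assumptions: locks are plain
dictionaries, upgrade_lock is a hypothetical helper, ownership is
reduced to a single "owner" value, and deferral and deadlock
detection are left out.

```python
import errno


def upgrade_lock(locks, index):
    """Upgrade locks[index] from LockRead to LockWrite in place."""
    mine = locks[index]
    if mine["type"] != "LockRead":
        return errno.EINVAL
    mine_end = mine["offset"] + mine["length"]
    for i, other in enumerate(locks):
        if i == index or other["owner"] == mine["owner"]:
            continue
        other_end = other["offset"] + other["length"]
        if other["offset"] < mine_end and mine["offset"] < other_end:
            # Another client shares the range; the read lock is retained.
            return errno.EWOULDBLOCK
    mine["type"] = "LockWrite"  # no window in which the range is unlocked
    return 0


shared = [
    {"owner": 1, "type": "LockRead", "offset": 0, "length": 10},
    {"owner": 2, "type": "LockRead", "offset": 5, "length": 10},
]
alone = [
    {"owner": 1, "type": "LockRead", "offset": 0, "length": 10},
    {"owner": 2, "type": "LockRead", "offset": 10, "length": 10},
]
```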

2.6.4 DowngradeByteRangeLock

Downgrades the byte-range lock represented in Lock, asserted to
be held by the calling client, from its current type (which
should be LockWrite) to LockRead. The downgrade is executed
atomically (no opportunity exists for another client to set a
conflicting lock in the downgraded range while the downgrade is
being executed).

proc DowngradeByteRangeLock(
    IN AFSByteRangeLock *Lock,
       afs_uint32 Type
) = 65604;

  Notes

When an AFS client intends to downgrade a byte-range write lock,
it MUST ensure that any changed data in the affected range has
been sent to the file server with the appropriate StoreData RPC,
and that the RPC completed successfully. This requirement is
based on an implied assertion that holding a lock on some region
of a file implies, invariantly, an up-to-date view on the locked
region.

  Error Codes

  EINVAL

The caller does not own the corresponding lock or it is not of
the correct type.

2.6.5 AssertExtendLocks

A file server may, at any time, request a client to re-assert its
interest in outstanding locks, or revoke those locks altogether.
It is expected that clients not heard from for a long period
(e.g., 10 minutes) would be requested to re-assert any
outstanding locks they hold. To request re-assertion of
outstanding locks, the file server may send the client an
extended callback notification on the corresponding Fids of type
AFSCB_Cancel_ExtendLocks, or it may set the flag
AFSCB_Flag_ExtendLocks on a notification of another type it was
already intending to send.

On receipt of an AFSCB_Cancel_ExtendLocks or
AFSCB_Flag_ExtendLocks notification through the extended callback
interface, a client MUST either:

• return any locks it asserts in AssertedLocks_Array, the
  AFSCB_Result_ReturnLocks member of union AFSCB_ResultData

  – if the server rejects any locks asserted by the client, it
    will so notify the client in a subsequent cancellation message

• set a result of AFSCB_Result_ResponseDeferred, and execute the
  AssertExtendLocks bulk call before the ExpirationTime in the
  AFSExtendedCallback structure sent with the callback

Fid is the file for which locks are being extended. Flags
contains indication of special semantics (e.g., mandatory
enforcement) being asserted, if any. AssertedLocks_Array points
to a variable length array of AFSByteRangeLock structures the
client asserts to hold. At the completion of the call, the
parallel array OutResult indicates the server's confirmation (or
refusal) to extend each asserted lock--a value of (Flags &
AFSLock_Flag_Extend) indicates confirmation.

/* Assert locks on Fid, on request */

proc AssertExtendLocks(
    IN  AFSFid Fid,
        afs_uint32 Flags,
        AFSByteRangeLockSeq *AssertedLocks_Array,
    OUT AFSLockFlagsSeq *OutResult
) = 65607;
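
Since OutResult is parallel to AssertedLocks_Array, a client can walk
the two arrays together to learn which locks it still holds. The
sketch below is illustrative Python, assuming that the
AFSLock_Flag_Extend bit from section 2.4.2 marks confirmation; the
helper name surviving_locks is hypothetical, and the locks are
stand-in strings rather than AFSByteRangeLock structures.

```python
AFSLock_Flag_Extend = 4   # request extension, or server ack extended
AFSLock_Flag_Discard = 8  # discard lock, or server ack discarded


def surviving_locks(asserted, out_result):
    """Return the asserted locks that the server confirmed as extended."""
    if len(asserted) != len(out_result):
        raise ValueError("OutResult must parallel AssertedLocks_Array")
    return [lock for lock, flags in zip(asserted, out_result)
            if flags & AFSLock_Flag_Extend]


confirmed = surviving_locks(
    ["lock-a", "lock-b", "lock-c"],
    [AFSLock_Flag_Extend, 0, AFSLock_Flag_Discard],
)
```

Any lock not confirmed in this way has to be treated by the client as
no longer held.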

GetByteRangeLockStatus

Diagnostic procedure provided to permit system administrators to
identify client machines and software running on those clients
that are currently holding locks on a file. Fid is the file to
report on. The call returns parallel variable-length arrays of
locks and their associated hosts. The procedure may only be
executed by the AFS super user or members of the
system:administrators group.

proc GetByteRangeLockStatus(
    IN  AFSFid *Fid,
    OUT AFSByteRangeLockSeq *AssertedLocks_Array,
        AFSLockHostIdentifierSeq *Clients_Array
) = 65605;

  Error Codes

  EACCES

The caller does not have the necessary rights.

2.6.6 CancelByteRangeLock

The CancelByteRangeLock procedure permits system administrators
to revoke active locks that may be obstructing normal operations,
perhaps due to a system or network problem. Fid is the file on
which to revoke locks. If successful, all locks in range [Offset,
Offset+Length) are canceled. If a value of 0 is given for both
Offset and Length, the range is taken to span the entire file. The
procedure may only be executed by the AFS super user or members
of the system:administrators group.

proc CancelByteRangeLock(
    IN AFSFid *Fid,
       afs_uint64 Offset,
       afs_uint64 Length
) = 65606;
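
The special case of Offset and Length both being 0 can be sketched as
follows. This is illustrative Python with a hypothetical helper name,
is_canceled. It assumes that a lock is affected when it overlaps the
given range at all; the text does not say whether partial overlap or
full containment is intended.

```python
def is_canceled(lock_offset, lock_length, offset, length):
    """True if a lock on [lock_offset, lock_offset+lock_length) is canceled."""
    if offset == 0 and length == 0:
        return True  # the range spans the entire file
    # Assumption: any overlap with [offset, offset+length) cancels the lock.
    return lock_offset < offset + length and offset < lock_offset + lock_length
```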

2.7 Windows & Unix Lock Semantics

Implementation of interoperable locking behavior presents
challenges for a distributed file system like AFS, which must
support clients on platforms which do not agree precisely on the
semantics desirable or possible to enforce.

2.7.1 Byte-Range Locking

As byte-range locking is effectively required for correct
behavior of Windows applications, the OpenAFS for Windows client
has been forced to implement a locally-enforced byte-range
locking mechanism. In the Windows client today, local byte-range
locks are shadowed by a whole-file lock in AFS. With the
introduction of server-coordinated byte-range locking, the
Windows client is expected to use server byte-range locks when
possible.

2.7.2 Read/Write vs. Shared/Exclusive

In the current OpenAFS for Windows client, Windows shared
(whole-file) locks are mapped to AFS read locks, and Windows
exclusive (whole-file) locks are mapped to AFS write locks. This
mapping applies equally for byte-range locks.

2.7.3 Atomic Lock Open

Windows provides the ability to open and lock a file in a single
operation, and key Windows applications such as Microsoft Office
rely on this behavior. Although this behavior has no direct
equivalent in the AFS protocol (which does not provide an OPEN
file operation), the correct behavior from the point of view of
Windows applications is already emulated by the Windows client.

2.8 Mandatory Enforcement

Mandatory enforcement of file locks is considered a requirement=20
for Windows interoperation. The rules proposed here reflect some=20
consideration and discussion of unique features in AFS, and also=20
compromises made in competing systems intended to support mixed=20
Windows and Unix clients, particularly NFSv4.

2.8.1 Governing Ideas

=E2=80=A2 Byte-range locks may be taken out on a file under the same=20
  circumstances under which a whole file might be taken out in=20
  traditional AFS

=E2=80=A2 Clients asserting advisory locks on a file by definition do not=
=20
  expect any special semantics from the file system; however, it=20
  seems logically reasonable that advisory and mandatory locks=20
  should interact equivalently as locks, and so where this=20
  document asserts that in a given scenario, a lock by a client A=20
  would conflict with a lock held by a client B, it is is not=20
  considered relevant whether either client's lock is advisory or=20
  mandatory

=E2=80=A2 The mechanism of lock enforcement is to fail the operation=20
  being attempted, a hint shall be sent in the return code of the=20
  reason for failure

• An operation which fails due to conflict with an existing lock
  fails completely

• Mandatory enforcement is taken to mean enforcement, generally,
  of write denial in any locked range, including by clients not
  observing any locking protocol

• Attempts to write outside any conflicting locked range on a
  file with at least one mandatory locked range, considering the
  view of locks on the file at the fileserver when the write
  request is processed, are considered valid (this is the
  documented behavior on Windows platforms)

• Since applications exist, particularly for the command line
  (e.g., tar), which know nothing about locks and may have
  legitimate reason to read (though not write) data protected by
  mandatory locks, relaxed semantics are enforced for reads by
  clients reading outside any range they have themselves locked:
  such reads never conflict with lock enforcement, and the view
  of data provided to such a client shall be whatever is
  available, conforming to regular AFS semantics

• Mandatory enforcement of a read or write lock is asserted to
  govern only the StoreData operation (by other clients), and
  not, e.g., the various directory change operations or FetchData
  [footnote: Mandatory read lock enforcement is silly, Eisler
  2006. More importantly, it causes difficulties for the AFS
  cache consistency model.]
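
The second governing idea (advisory and mandatory locks interact
equivalently as locks) can be illustrated with a small sketch.
It is not part of the proposed protocol. The type codes
LOCK_READ and LOCK_WRITE, the ByteRangeLock class, and the rules
that read locks are shared and that one owner's locks never
conflict with each other are assumptions made for illustration;
the draft fixes none of them in this section. Ranges are treated
as half-open, [offset, offset + length), with length > 0.

```python
from dataclasses import dataclass

# Hypothetical type codes (not defined by the draft).
LOCK_READ = 1
LOCK_WRITE = 2

AFSLOCK_FLAG_MAND = 1  # AFSLock_Flag_Mand from Appendix A


@dataclass(frozen=True)
class ByteRangeLock:
    """Illustrative stand-in for a held lock; not the wire structure."""
    owner: int
    type: int
    flags: int
    offset: int
    length: int  # assumed > 0 in this sketch


def ranges_overlap(off_a, len_a, off_b, len_b):
    """True if [off_a, off_a+len_a) and [off_b, off_b+len_b) share a byte."""
    return off_a < off_b + len_b and off_b < off_a + len_a


def locks_conflict(a, b):
    """Conflict test between two locks.

    The flags field is deliberately never consulted: whether either
    lock is advisory or mandatory does not affect the result.
    """
    if a.owner == b.owner:          # assumption: same owner never conflicts
        return False
    if not ranges_overlap(a.offset, a.length, b.offset, b.length):
        return False
    # Assumption: read locks are shared, write locks are exclusive.
    return LOCK_WRITE in (a.type, b.type)
```

Under these assumptions, a write lock held by one client on
bytes 0-99 conflicts with another client's read lock on bytes
50-59 whether or not either lock carries AFSLock_Flag_Mand.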

2.8.2 Enforcement Rules

• If a client A has a mandatory lock of any type on a range R in
  a file F, then any StoreData operation by another client B
  which would alter data in an overlapping range, or truncate F
  so as to reduce or eliminate R, fails
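
The rule above can be sketched as a server-side check. This is
an illustration under stated assumptions, not the fileserver
implementation: the HeldLock class and the store_is_denied
function are hypothetical names, ranges are half-open, and a
truncation is taken to mean a new file length smaller than the
current one. Advisory locks and the writer's own locks are
skipped, since the rule concerns mandatory locks held by other
clients.

```python
from dataclasses import dataclass

AFSLOCK_FLAG_MAND = 1  # AFSLock_Flag_Mand from Appendix A


@dataclass(frozen=True)
class HeldLock:
    """Hypothetical record of a lock known to the fileserver."""
    owner: int
    flags: int
    offset: int
    length: int


def store_is_denied(locks, client, pos, length, new_file_length,
                    cur_file_length):
    """Return True if a StoreData-style write by `client` must fail.

    pos/length describe the bytes written; new_file_length is the
    file length requested by the operation (hypothetical parameter
    names chosen for this sketch).
    """
    for lk in locks:
        # Only mandatory locks held by *other* clients are enforced.
        if lk.owner == client or not (lk.flags & AFSLOCK_FLAG_MAND):
            continue
        lock_end = lk.offset + lk.length
        # The write alters data in a range overlapping R.
        overlaps = length > 0 and pos < lock_end and lk.offset < pos + length
        # The write truncates F so as to reduce or eliminate R.
        truncates = (new_file_length < cur_file_length
                     and new_file_length < lock_end)
        if overlaps or truncates:
            return True
    return False
```

With a mandatory lock held by client 1 on bytes 100-149 of a
1000-byte file, a write by client 2 to bytes 0-99 is allowed,
while a write to bytes 120-129 or a truncation to 120 bytes is
denied.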

3 Appendix A: XDR Grammar (afsint.xg)

const VICED_CAPABILITY_BYTE_RANGE_LOCK = 0x0010;

const AFSLock_Flag_Mand = 1; /* req. enforcement */
const AFSLock_Flag_Wait = 2; /* req. wait on lock */



struct AFSByteRangeLock {
  AFSFid Fid;
  afs_uint32 Type;
  afs_uint32 Flags;
  afs_uint32 Owner;
  afs_uint64 Uniq;
  afs_uint64 Offset;
  afs_uint64 Length;
  afs_uint64 ExpirationTime;
};



/* Request byte-range file lock */
proc SetByteRangeLock(
    IN AFSFid *Fid,
        afs_uint32 Type,
        afs_uint32 Flags,
        afs_uint32 Owner,
        afs_uint64 Uniq,
        afs_uint64 Offset,
        afs_uint64 Length,
    OUT AFSByteRangeLock *Lock
) = 65601;



/* Release byte-range file lock */
proc ReleaseByteRangeLock(
    IN AFSByteRangeLock *Lock
) = 65602;



/* Upgrade byte-range file lock (i.e., from Read to Write) */
proc UpgradeByteRangeLock(
    IN AFSByteRangeLock *Lock,
    afs_uint32 Type
) = 65603;



/* Downgrade byte-range file lock (i.e., from Write to Read) */
proc DowngradeByteRangeLock(
    IN AFSByteRangeLock *Lock,
    afs_uint32 Type
) = 65604;



/* Request lock status report (system:administrators) */
proc GetByteRangeLockStatus(
    IN AFSFid *Fid,
    OUT AFSByteRangeLockSeq *AssertedLocks_Array,
        AFSLockHostIdentifierSeq *Clients_Array
) = 65605;



/* Administratively cancel locks (system:administrators) */
proc CancelByteRangeLocks(
    IN AFSFid *Fid,
       afs_uint64 Offset,
       afs_uint64 Length
) = 65606;



const AFS_LOCK_SEQ_MAX = 10000;

typedef AFSByteRangeLock AFSByteRangeLockSeq<AFS_LOCK_SEQ_MAX>;
/* element type assumed to be afs_uint32, matching the Flags fields */
typedef afs_uint32 AFSLockFlagsSeq<AFS_LOCK_SEQ_MAX>;



const AFSLock_Flag_Extend = 4;  /* client request extend, server
                                   ack extended */
const AFSLock_Flag_Discard = 8; /* client request discard, server
                                   ack discarded */



/* Assert locks on Fid, on request */
proc AssertExtendLocks(
    IN AFSFid *Fid,
       afs_uint32 Flags,
       AFSByteRangeLockSeq *AssertedLocks_Array,
    OUT AFSLockFlagsSeq *OutResult
) = 65607;
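
The four AFSLock_Flag_* constants above are distinct powers of
two, so a Flags word can carry any combination of them. The
following sketch shows one way a client or a debugging tool
might decode such a word. The describe_flags function and the
AFSLOCK_FLAGS table are hypothetical helpers, not part of the
grammar; only the numeric values come from this appendix.

```python
# Bit values copied from the constants in this appendix.
AFSLOCK_FLAGS = {
    "Mand": 1,     # AFSLock_Flag_Mand: request enforcement
    "Wait": 2,     # AFSLock_Flag_Wait: request wait on lock
    "Extend": 4,   # AFSLock_Flag_Extend: extend requested / acknowledged
    "Discard": 8,  # AFSLock_Flag_Discard: discard requested / acknowledged
}


def describe_flags(flags):
    """Return the names of the flag bits set in `flags`, in table order."""
    return [name for name, bit in AFSLOCK_FLAGS.items() if flags & bit]


def make_flags(*names):
    """Combine named flags into a single Flags word."""
    word = 0
    for name in names:
        word |= AFSLOCK_FLAGS[name]
    return word
```

For instance, make_flags("Mand", "Wait") yields 3, which
describes a request for a mandatory lock that waits when the
range is unavailable.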

4 Appendix B: XDR Grammar (afscbint.xg)

const CLIENT_CAPABILITY_BYTE_RANGE_LOCK = 0x0008;

/* Byte-Range Locking Cancellation Types */
const AFSCB_Cancel_ExtendLocks = 7; /* re-assert locks, or lose
                                       them */
const AFSCB_Cancel_RevokeLocks = 8; /* locks on Fid revoked */

/* Cancellation Flags */
const AFSCB_Flag_AssertLocks = 4; /* request ExtendLock */
const AFSCB_Flag_RevokeLocks = 8; /* locks cancelled, sorry */



/* Confirm issue of deferred lock requests */
proc AsyncIssueByteRangeLock(
    IN HostIdentifier *Server,
       AFSByteRangeLockSeq <AFS_LOCK_SEQ_MAX>
) = 65540;


--------------020109030903030104050204--