[AFS3-std] Re: [OpenAFS-devel] convergence of RxOSD, Extended
Call Backs, Byte Range Locking, etc.
Hartmut Reuter
reuter@rzg.mpg.de
Mon, 27 Jul 2009 14:44:08 +0200
Jeffrey:
thank you very much for your detailed advice. I will follow it. The only
change I would propose is to name the new RPCs differently because we
will have asynchronous I/O not only with OSDs, but also with direct
access to visible fileserver partitions (what I called embedded
filesystems). So I think we need six new RPCs:
RXAFS_StartAsyncFetch(...)
RXAFS_ExtendAsyncFetch(...)
RXAFS_EndAsyncFetch(...)
RXAFS_StartAsyncStore(...)
RXAFS_ExtendAsyncStore(...)
RXAFS_EndAsyncStore(...)
These RPCs would replace RXAFS_GetOSDlocation and RXAFS_Serverpath and,
in some cases, storeMini.
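
A minimal sketch of how the Start/Extend/End triple could carry an
identifier, so that it does not inherit the SetLock problem described in
the quoted message. This is only a model in Python under stated
assumptions, not the RPC definitions: the class name AsyncTransfers, the
opaque handle, the explicit `now` argument, and the 300-second lifetime
(borrowed from the five-minute lock lifetime) are all hypothetical.

```python
import secrets

# Hypothetical lifetime in seconds, borrowed from the five-minute lock lifetime.
LIFETIME = 300


class AsyncTransfers:
    """Toy model of server-side state for Start/Extend/End async I/O calls."""

    def __init__(self):
        # handle -> (fid, kind, expires_at)
        self.active = {}

    def start(self, fid, kind, now):
        """Model of Start*: register a transfer and return an opaque handle."""
        handle = secrets.token_hex(8)
        self.active[handle] = (fid, kind, now + LIFETIME)
        return handle

    def _live(self, handle, fid, now):
        """True only if the handle exists, matches the FID, and has not expired."""
        entry = self.active.get(handle)
        if entry is None or entry[0] != fid:
            return False
        if entry[2] <= now:
            # Expired: drop the state so a late caller cannot revive it.
            del self.active[handle]
            return False
        return True

    def extend(self, handle, fid, now):
        """Model of Extend*: only the holder of a live handle can extend."""
        if not self._live(handle, fid, now):
            return False
        _, kind, _ = self.active[handle]
        self.active[handle] = (fid, kind, now + LIFETIME)
        return True

    def end(self, handle, fid, now):
        """Model of End*: only the holder of a live handle can end the transfer."""
        if not self._live(handle, fid, now):
            return False
        del self.active[handle]
        return True
```

With this shape, an Extend or End from a client that was suspended past
the lifetime is rejected instead of silently acting on someone else's
transfer.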
Hartmut
Jeffrey Altman wrote:
>
> Hartmut:
>
> The issue to which Jeff Hutzelman is referring is RXAFS_SetLock,
> RXAFS_ReleaseLock, and RXAFS_ExtendLock. As you know, these RPCs are
> used to manage the CM-FS transactions for file locks. A CM requests a
> lock with SetLock and then proceeds to extend the lifetime of the lock
> every five minutes with ExtendLock and releases the lock with ReleaseLock.
>
> The problem is that there is no magic cookie or lockId or transactionId
> returned as part of the SetLock call. Therefore, when the FS receives a
> ExtendLock or ReleaseLock call it does not know if the request came from
> the CM that issued the original SetLock or not.
>
> An ExtendLock can be issued and will succeed as long as the lock count
> is non-zero. If a client issues ExtendLock calls on a FID for which no
> lock is held, those calls will fail until another client obtains a read
> lock, at which point the extension will succeed even though the
> extending client was never issued a lock.
>
> Similarly, a ReleaseLock can be issued and will succeed on a
> FID even when there is no outstanding lock issued to the CM performing
> the release.
>
> We have seen these problems in practice. A CM is issued a lock and
> then gets disconnected from the network for longer than five minutes
> (perhaps due to a suspend). The lock for that CM should have been
> dropped, but the CM is unaware of this; when it wakes it attempts to
> ExtendLock and eventually ReleaseLock, causing the lock counts to get
> out of sync.
> We have also seen buggy clients that issue ExtendLocks and never stop
> even after the client has issued a ReleaseLock.
>
> Now that we have UUIDs for most clients (UUIDs are not required), we can
> mitigate the problem by tracking which clients currently hold locks and
> when those locks will expire. However, it cannot be fixed entirely.
>
> The proper way to address this is for SetLock to return some identifier
> for the lock that can be used to ensure that when an ExtendLock or
> ReleaseLock is sent, it applies only to the one instance of a lock that
> was issued and not to any others.
>
> The
> RXAFS_OSD_StartFetchData/RXAFS_OSD_ExtendFetchData/RXAFS_OSD_EndFetchData
> and
> RXAFS_OSD_StartStoreData/RXAFS_OSD_ExtendStoreData/RXAFS_OSD_EndStoreData
> RPCs are going to have exactly the same issue as the
> SetLock/ExtendLock/ReleaseLock RPCs. Jeff's point is that we must not
> repeat the same mistakes from our past.
>
> Jeffrey Altman
>
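
The difference between the current lock behavior and the proposed fix
can be illustrated with a small model. This is a sketch in Python, not
the fileserver code: both class names are hypothetical, the count-only
table is a simplification of the behavior described in the quoted
message, and the sequential lock IDs are for readability only (a real
identifier would presumably need to be hard to guess).

```python
import itertools


class CountOnlyLocks:
    """Simplified model of the described behavior: only a per-FID lock count is kept."""

    def __init__(self):
        self.count = {}  # fid -> number of outstanding locks

    def set_lock(self, fid):
        self.count[fid] = self.count.get(fid, 0) + 1

    def extend_lock(self, fid):
        # Succeeds for ANY caller as long as the count is non-zero.
        return self.count.get(fid, 0) > 0

    def release_lock(self, fid):
        # Also succeeds for any caller, so counts can get out of sync.
        if self.count.get(fid, 0) == 0:
            return False
        self.count[fid] -= 1
        return True


class IdentifiedLocks:
    """Model of the proposed fix: SetLock returns an identifier for that one lock."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.locks = {}  # lock_id -> fid

    def set_lock(self, fid):
        lock_id = next(self._next_id)
        self.locks[lock_id] = fid
        return lock_id

    def extend_lock(self, fid, lock_id):
        # Applies only to the one lock instance that was actually issued.
        return self.locks.get(lock_id) == fid

    def release_lock(self, fid, lock_id):
        if self.locks.get(lock_id) != fid:
            return False
        del self.locks[lock_id]
        return True
```

In the count-only model, a second client that never called set_lock can
still extend and release the first client's lock on the same FID. In the
identified model, the same calls fail unless the caller presents the
identifier returned by set_lock.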
--
-----------------------------------------------------------------
Hartmut Reuter e-mail reuter@rzg.mpg.de
phone +49-89-3299-1328
fax +49-89-3299-1301
RZG (Rechenzentrum Garching) web http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------