[AFS3-std] Re: [OpenAFS-devel] convergence of RxOSD, Extended Call Backs, Byte Range Locking, etc.

Mon, 27 Jul 2009 14:44:08 +0200

Jeffrey:

Thank you very much for your detailed advice. I will follow it. The only
change I would propose is to name the new RPCs differently because we
will have asynchronous I/O not only with OSDs, but also with direct
access to visible fileserver partitions (what I called embedded
filesystems). So I think we need six new RPCs:

RXAFS_StartAsyncFetch(...)
RXAFS_ExtendAsyncFetch(...)
RXAFS_EndAsyncFetch(...)
RXAFS_StartAsyncStore(...)
RXAFS_ExtendAsyncStore(...)
RXAFS_EndAsyncStore(...)

These RPCs would replace RXAFS_GetOSDlocation and RXAFS_Serverpath and,
in some cases, storeMini.
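
A minimal sketch of how such a Start/Extend/End triple could carry a
per-transaction handle, which is the point Jeffrey raises below. This is
an illustrative Python model only, not an RPC definition: the class
name, the transaction_id handle, the 300-second lifetime and the error
types are all assumptions and not part of any agreed interface.

```python
import itertools


class AsyncFetchServer:
    """Hypothetical server-side bookkeeping for asynchronous fetches."""

    LIFETIME = 300  # seconds; assumed value, not taken from any spec

    def __init__(self):
        self._ids = itertools.count(1)
        self._open = {}  # transaction_id -> (fid, expiry time)

    def start_async_fetch(self, fid, now):
        """Open a transaction and return the handle the client must keep."""
        tid = next(self._ids)
        self._open[tid] = (fid, now + self.LIFETIME)
        return tid

    def _check(self, fid, tid, now):
        """Reject handles that are unknown, for another FID, or expired."""
        entry = self._open.get(tid)
        if entry is None or entry[0] != fid:
            raise KeyError("unknown transaction")
        if now > entry[1]:
            del self._open[tid]
            raise TimeoutError("transaction expired")

    def extend_async_fetch(self, fid, tid, now):
        """Prolong exactly the transaction named by tid."""
        self._check(fid, tid, now)
        self._open[tid] = (fid, now + self.LIFETIME)

    def end_async_fetch(self, fid, tid, now):
        """Close exactly the transaction named by tid."""
        self._check(fid, tid, now)
        del self._open[tid]
```

Because Extend and End name one specific transaction, a late or
duplicated call from another client cannot prolong or close it. The
store side would follow the same pattern.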

Hartmut

Jeffrey Altman wrote:
> 
> Hartmut:
> 
> The issue to which Jeff Hutzelman is referring is RXAFS_SetLock,
> RXAFS_ReleaseLock, and RXAFS_ExtendLock.   As you know, these RPCs are
> used to manage the CM-FS transactions for file locks.  A CM requests a
> lock with SetLock and then proceeds to extend the lifetime of the lock
> every five minutes with ExtendLock and releases the lock with ReleaseLock.
> 
> The problem is that there is no magic cookie or lockId or transactionId
> returned as part of the SetLock call.  Therefore, when the FS receives an
> ExtendLock or ReleaseLock call it does not know if the request came from
> the CM that issued the original SetLock or not.
> 
> An ExtendLock can be issued and will succeed as long as the lock count
> is non-zero.  If a client that holds no lock is issuing ExtendLock calls
> on a FID, those will fail until such time as another client obtains a
> read lock, at which point the lock will be successfully extended even
> though no lock was ever issued to the first client.
> 
> In the same way, a ReleaseLock can be issued and will succeed on a
> FID even when there is no outstanding lock issued to the CM performing
> the release.
> 
> We have seen these problems in practice.  A CM was issued a lock and
> then gets disconnected from the network for longer than five minutes
> (perhaps due to a suspend).  The lock for that CM should have been
> dropped but the CM is unaware and when it wakes attempts to ExtendLock
> and eventually ReleaseLock causing the lock counts to get out of sync.
> We have also seen buggy clients that issue ExtendLocks and never stop
> even after the client has issued a ReleaseLock.
> 
> Now that we have UUIDs for most clients (UUIDs are not required) we can
> mitigate the problem by tracking the clients that are actively issued
> locks and when they will expire.  However, it cannot be fixed entirely.
> 
> The proper way to address this is for SetLock to return some identifier
> for the lock that can be used to ensure that when an ExtendLock or
> ReleaseLock is sent, it applies only to the one instance of a lock that
> was issued and not to any others.
> 
> The
> RXAFS_OSD_StartFetchData/RXAFS_OSD_ExtendFetchData/RXAFS_OSD_EndFetchData
> and
> RXAFS_OSD_StartStoreData/RXAFS_OSD_ExtendStoreData/RXAFS_OSD_EndStoreData
> RPCs are going to have exactly the same issue as the
> SetLock/ExtendLock/ReleaseLock RPCs.   Jeff's point is that we must not
> repeat the same mistakes from our past.
> 
> Jeffrey Altman
> 
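
To make the problem described above concrete, here is a small Python
model of it. It is a deliberately simplified sketch and not the OpenAFS
fileserver code: it covers read locks only, ignores expiry, and the
class names and the lock_id value are hypothetical. The first class
keeps only a per-FID count, as described for SetLock/ExtendLock/
ReleaseLock; the second has SetLock return an identifier.

```python
class CountOnlyLocks:
    """Model of the current behaviour: only a per-FID count is kept."""

    def __init__(self):
        self.count = {}  # fid -> number of outstanding locks

    def set_lock(self, fid):
        self.count[fid] = self.count.get(fid, 0) + 1

    def extend_lock(self, fid):
        # Succeeds for any caller as long as the count is non-zero.
        return self.count.get(fid, 0) > 0

    def release_lock(self, fid):
        # The server cannot tell whether the caller ever held a lock.
        if self.count.get(fid, 0) > 0:
            self.count[fid] -= 1
            return True
        return False


class IdentifiedLocks:
    """Model of the proposed fix: set_lock returns a lock identifier."""

    def __init__(self):
        self._next_id = 1
        self.held = {}  # fid -> set of lock ids currently issued

    def set_lock(self, fid):
        lock_id = self._next_id
        self._next_id += 1
        self.held.setdefault(fid, set()).add(lock_id)
        return lock_id

    def extend_lock(self, fid, lock_id):
        return lock_id in self.held.get(fid, set())

    def release_lock(self, fid, lock_id):
        ids = self.held.get(fid, set())
        if lock_id in ids:
            ids.remove(lock_id)
            return True
        return False


old = CountOnlyLocks()
old.set_lock("fid")                    # client A takes a read lock
b_extended = old.extend_lock("fid")    # client B holds nothing: succeeds
b_released = old.release_lock("fid")   # B's release drops A's lock
a_extended = old.extend_lock("fid")    # A's own extend now fails

new = IdentifiedLocks()
a_id = new.set_lock("fid")             # client A gets an identifier
bogus_release = new.release_lock("fid", a_id + 1)  # rejected
a_still_held = new.extend_lock("fid", a_id)        # A is unaffected
```

In the count-only model client B can extend and then release a lock it
never held, so A's later ExtendLock fails. With an identifier, a call
that does not name an issued lock is simply rejected.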

-- 
-----------------------------------------------------------------
Hartmut Reuter                  e-mail  reuter@rzg.mpg.de
                                phone   +49-89-3299-1328
                                fax     +49-89-3299-1301
RZG (Rechenzentrum Garching)    web     http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------