[AFS3-std] Re: [OpenAFS-devel] convergence of RxOSD, Extended Call Backs, Byte Range Locking, etc.

Jeffrey Altman jaltman@secure-endpoints.com
Mon, 27 Jul 2009 09:56:22 -0400


Hartmut:

You are welcome for the advice. I would be happy to provide much more
of it once I am able to read a protocol specification.

Thank you.

Jeffrey Altman


Hartmut Reuter wrote:
> Jeffrey:
> 
> Thank you very much for your lengthy advice. I will follow it. The only
> change I would propose is to name the new RPCs differently because we
> will have asynchronous I/O not only with OSDs, but also with direct
> access to visible fileserver partitions (what I called embedded
> filesystems). So I think we need six new RPCs:
> 
> RXAFS_StartAsyncFetch(...)
> RXAFS_ExtendAsyncFetch(...)
> RXAFS_EndAsyncFetch(...)
> RXAFS_StartAsyncStore(...)
> RXAFS_ExtendAsyncStore(...)
> RXAFS_EndAsyncStore(...)
> 
> These RPCs would replace RXAFS_GetOSDlocation and RXAFS_Serverpath and,
> in some cases, storeMini.
> 
> Hartmut
> 
> 
> 
> Jeffrey Altman wrote:
>> Hartmut:
>>
>> The issue to which Jeff Hutzelman is referring is RXAFS_SetLock,
>> RXAFS_ReleaseLock, and RXAFS_ExtendLock.   As you know, these RPCs are
>> used to manage the CM-FS transactions for file locks.  A CM requests a
>> lock with SetLock, extends the lifetime of the lock every five minutes
>> with ExtendLock, and finally releases it with ReleaseLock.
>>
>> The problem is that there is no magic cookie or lockId or transactionId
>> returned as part of the SetLock call.  Therefore, when the FS receives an
>> ExtendLock or ReleaseLock call, it does not know if the request came from
>> the CM that issued the original SetLock or not.
>>
>> An ExtendLock can be issued and will succeed as long as the lock count
>> is non-zero.  If a client is issuing ExtendLock calls on a FID for a
>> lock it does not hold, those calls will fail until another client
>> obtains a read lock, at which point the lock will be successfully
>> extended even though it was never issued to the requesting client.
>>
>> Similarly, a ReleaseLock can be issued and will succeed on a FID even
>> when no lock is outstanding for the CM performing the release.
>>
>> We have seen these problems in practice.  A CM is issued a lock and
>> then gets disconnected from the network for longer than five minutes
>> (perhaps due to a suspend).  The lock for that CM should have been
>> dropped, but the CM is unaware of this; when it wakes, it attempts to
>> ExtendLock and eventually ReleaseLock, causing the lock counts to get
>> out of sync.  We have also seen buggy clients that keep issuing
>> ExtendLock calls even after they have issued a ReleaseLock.
>>
>> Now that we have UUIDs for most clients (UUIDs are not required), we
>> can mitigate the problem by tracking which clients have been issued
>> locks and when those locks will expire.  However, it cannot be fixed
>> entirely.
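
The UUID-based mitigation could look like the sketch below. It assumes a per-FID table of lock holders keyed by client UUID, where each entry has an expiration time that ExtendLock pushes back. The structure names, the 36-character textual UUID, and the five-minute `LOCK_LIFETIME` are assumptions for illustration.

The sketch also shows why this is only a mitigation:

* A client without a UUID has no key to be tracked by.
* The UUID names the CM rather than a particular lock, so two locks held by the same CM on one FID cannot be told apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_HOLDERS 16
#define LOCK_LIFETIME 300 /* seconds; assumed five-minute lifetime */

/* Hypothetical per-FID table of lock holders keyed by client UUID. */
struct holder {
    char uuid[37];  /* textual UUID: 36 characters plus the terminator */
    time_t expires; /* the lock lapses at this time unless extended */
    bool in_use;
};

struct fid_holders {
    struct holder h[MAX_HOLDERS];
};

/* Forget any holder whose lock has lapsed (e.g. a CM that was suspended). */
static void drop_expired(struct fid_holders *f, time_t now)
{
    for (int i = 0; i < MAX_HOLDERS; i++)
        if (f->h[i].in_use && f->h[i].expires <= now)
            f->h[i].in_use = false;
}

/* Returns some live entry for this UUID. If the same CM holds several locks
 * on the FID, they are indistinguishable: this is the residual weakness. */
static struct holder *find_holder(struct fid_holders *f, const char *uuid)
{
    for (int i = 0; i < MAX_HOLDERS; i++)
        if (f->h[i].in_use && strcmp(f->h[i].uuid, uuid) == 0)
            return &f->h[i];
    return NULL;
}

/* SetLock: records the holder's UUID and starts its lifetime. */
bool uuid_set_lock(struct fid_holders *f, const char *uuid, time_t now)
{
    assert(f != NULL && uuid != NULL);
    drop_expired(f, now);
    for (int i = 0; i < MAX_HOLDERS; i++) {
        if (!f->h[i].in_use) {
            snprintf(f->h[i].uuid, sizeof f->h[i].uuid, "%s", uuid);
            f->h[i].expires = now + LOCK_LIFETIME;
            f->h[i].in_use = true;
            return true;
        }
    }
    return false; /* table full */
}

/* ExtendLock: honored only for a UUID that holds a live lock. */
bool uuid_extend_lock(struct fid_holders *f, const char *uuid, time_t now)
{
    drop_expired(f, now);
    struct holder *e = find_holder(f, uuid);
    if (e == NULL)
        return false;
    e->expires = now + LOCK_LIFETIME;
    return true;
}

/* ReleaseLock: honored only for a UUID that holds a live lock. */
bool uuid_release_lock(struct fid_holders *f, const char *uuid, time_t now)
{
    drop_expired(f, now);
    struct holder *e = find_holder(f, uuid);
    if (e == NULL)
        return false;
    e->in_use = false;
    return true;
}
```

In this sketch, a stale ExtendLock from a CM that slept past `LOCK_LIFETIME` is refused. So is any ExtendLock or ReleaseLock from a UUID with no live entry.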
>>
>> The proper way to address this is for SetLock to return some identifier
>> for the lock that can be used to ensure that when an ExtendLock or
>> ReleaseLock is sent, it applies only to the one instance of a lock that
>> was issued and not to any others.
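
The proper fix could be sketched as follows. Assume SetLock returns a 64-bit lock id that ExtendLock and ReleaseLock must echo back. Ids come from a counter and are never reused. As a result, an id kept by a CM that slept past its lock's expiration cannot match a lock issued later. All names and the five-minute lifetime are hypothetical, and this is not a proposed wire format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define MAX_LOCKS 16
#define LOCK_LIFETIME 300 /* seconds; assumed five-minute lifetime */

/* Hypothetical per-FID lock table; not OpenAFS code. */
struct lock_entry {
    uint64_t id;    /* id returned by SetLock; 0 means the slot is free */
    time_t expires; /* the lock lapses at this time unless extended */
};

struct lock_table {
    struct lock_entry entries[MAX_LOCKS];
    uint64_t next_id; /* monotonically increasing, never reused */
};

/* SetLock: returns a fresh lock id, or 0 if the table is full.
 * Slots whose lock has lapsed are reused, but their old ids are not. */
uint64_t id_set_lock(struct lock_table *t, time_t now)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_LOCKS; i++) {
        struct lock_entry *e = &t->entries[i];
        if (e->id == 0 || e->expires <= now) {
            e->id = ++t->next_id;
            e->expires = now + LOCK_LIFETIME;
            return e->id;
        }
    }
    return 0;
}

/* Finds the live lock with this id; expired or unknown ids yield NULL. */
static struct lock_entry *find_lock(struct lock_table *t, uint64_t id,
                                    time_t now)
{
    if (id == 0)
        return NULL;
    for (int i = 0; i < MAX_LOCKS; i++) {
        struct lock_entry *e = &t->entries[i];
        if (e->id == id)
            return e->expires > now ? e : NULL;
    }
    return NULL;
}

/* ExtendLock: applies only to the one lock named by the id. */
bool id_extend_lock(struct lock_table *t, uint64_t id, time_t now)
{
    struct lock_entry *e = find_lock(t, id, now);
    if (e == NULL)
        return false;
    e->expires = now + LOCK_LIFETIME;
    return true;
}

/* ReleaseLock: frees only the one lock named by the id. */
bool id_release_lock(struct lock_table *t, uint64_t id, time_t now)
{
    struct lock_entry *e = find_lock(t, id, now);
    if (e == NULL)
        return false;
    e->id = 0;
    return true;
}
```

Releasing one lock by its id leaves every other lock on the FID intact. A forged, double-released, or expired id is rejected.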
>>
>> The
>> RXAFS_OSD_StartFetchData/RXAFS_OSD_ExtendFetchData/RXAFS_OSD_EndFetchData
>> and
>> RXAFS_OSD_StartStoreData/RXAFS_OSD_ExtendStoreData/RXAFS_OSD_EndStoreData
>> RPCs will have exactly the same problem as the
>> SetLock/ExtendLock/ReleaseLock RPCs.  Jeff's point is that we must not
>> repeat the mistakes of our past.
>>
>> Jeffrey Altman
>>
> 
>