[AFS3-std] [RPC Request] 64-bit volume IDs, quotas, and block usages

Mon, 22 Jun 2009 02:20:37 -0400

Jeffrey Altman wrote:
> Modernization does not have to require breaking backward compatibility. 
> What it does require is that as any new functionality is added to the
> protocol that the designers consider how the new data forms will be
> represented using the older RPC variants and where they cannot be, how
> the transition will take place and how those that deploy the new
> functionality will be able to manage the fact that some data may only be
> visible to a subset of the deployed clients for some extended period of
> time. 
>
> Increasing the volumeId field size to 64-bits is fine provided that the
> transition model for older clients is documented and well understood. 
> If the need of the community is that we support the older RPCs for a
> minimum of ten years from the time the last client ships with just the
> older RPCs, then rolling out RPCs that require new clients to be able to
> handle 64-bit time and volumeIds this year will provide the community 29
> years to upgrade software or retire old clients before the existing RPCs
> will simply fail.  Given that some of the NASA deployments have hardware
> deployed with frozen software that is active for more than 20 years on
> some missions, we know that some portions of the community will require
> that much time.  For OpenAFS, our commitment to backward compatibility
> may result in restricting volumeIds to 32-bits for a decade or more. 
> There are implementation specific changes that can more efficiently use
> the existing volumeId space in the meantime.  That should not stop some
> other implementation of the protocol suite from starting from scratch
> with the new protocols and ignoring the installed base.
>
> Jeffrey Altman
Here is what I would like to see addressed in any proposal that has
backward compatibility implications.

   1. What is the problem that is being addressed by the extension?
   2. What is the impact on older clients that do not support the
      extended functionality?
   3. What is the impact on older servers that do not support the
      extended functionality?
   4. How should the lack of support for the extended functionality be
      communicated to older clients and servers?
   5. Are there implementation suggestions that can be used to mitigate
      the problem for existing clients once all of the servers are upgraded?

For 64-bit volumeIds I would answer these questions as follows:

   1. Some sites that do not deploy AFS today would do so if they could
      represent more than 2^32 volumes.  Other sites that deploy AFS
      today are worried about exhausting the existing volumeId space,
      given the implementation-specific allocation algorithms.
   2. Clients that support only the existing RPCs will not be able to
      access volumes whose volumeId does not fit in 32 bits.
   3. For servers, it will not be safe to deploy extended volumeIds
      until all servers are upgraded to support them.  Extended
      volumeIds should be allocated only when the new RPCs are
      available on all affected service instances and a local policy
      permits the use of extended volumeIds.
   4. A determination will need to be made as to whether a volume group
      that contains any extended volumeIds can be reported to a client
      using the existing RPCs.  In other words, can a volume group for
      which one or more volumes cannot be identified be reported as
      existing at all?  If not, the proper response might be VL_NOENT. 
      It will not be possible to allocate a new error code for this case
      because the existing clients will not be aware of it.
   5. Assuming that servers are upgraded to support extended volumeIds
      and that the clients cannot be, there are several things that can
      be done by an implementation to mitigate the volumeId space
      exhaustion concerns:
         1. Once all services are upgraded within the cell, an
            implementation may choose to allocate all temporary
            volumeIds from the extended volumeId space.
         2. The Transarc and OpenAFS implementations allocate the
            initial volumeId in the middle of the available range and
            then increment the volumeId with each created volume
            (temporary or otherwise).  If the initial volumeId is known,
            the implementation can choose to map the range 2^32 to
            2^32+initial_volumeId-1 onto the range 0 to initial_volumeId-1
            when the existing RPCs are used.  If such a mapping is
            performed, then the value 2^32 should be reserved so that
            nothing maps to the volumeId 0, which clients may use to
            indicate an unknown volumeId.
         3. Implementations can choose to implement algorithms that
            permit unused volumeIds to be recaptured.

There are certainly other considerations that will have to be addressed,
as well as other implementation-specific suggestions that could be made
to improve the transition.  This is just a start.

Jeffrey Altman