[OpenAFS-devel] RE: Yuck... largefile support really fouled things up bad...

Neulinger, Nathan nneul@umr.edu
Fri, 21 Mar 2003 13:17:15 -0600


The current situation is that there are a couple of severe problems.

The current cvs trunk code is badly broken, independent of the issues
related to 1-4 below. The file server is completely unusable on linux
right now. With the last patch I sent (and Derrick committed), the
volserver is usable. The problem is that the changes for large file
support made some fundamental data structure changes, and those changes
were not followed through in way too many places, resulting in code that
sort of looks like it will work, and maybe does work properly on some
platforms, but is completely broken on others. Basically, all over the
place, things changed into afs_int64's, and the code was not changed to
be aware of the fact that it might not be passing around afs_int32's any
more.

I've done some limited testing changing the afs_int64 data types to be a
structure instead of a real integer. Based on the HUGE number of compile
problems (cascading as well, every fix shows up more), the number of
places in the code where this isn't safe is quite dramatic.
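
To illustrate the kind of test described above, here is a minimal sketch
of a struct-wrapped 64-bit type. The names (audit_int64 and the audit_*
helpers) are hypothetical and are not the OpenAFS definitions; the point
is only that a struct is not an arithmetic type, so every place that
silently treats the value as a 32-bit integer stops compiling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit size type during an audit: a
 * one-member struct. Because it is not an arithmetic type, implicit
 * conversion to or from a 32-bit integer is a compile error rather
 * than a silent truncation. */
typedef struct {
    int64_t v;
} audit_int64;

/* All traffic in and out of the type goes through explicit helpers. */
audit_int64 audit_from_i64(int64_t x)
{
    audit_int64 r = { x };
    return r;
}

int64_t audit_to_i64(audit_int64 a)
{
    return a.v;
}

audit_int64 audit_add(audit_int64 a, audit_int64 b)
{
    return audit_from_i64(a.v + b.v);
}

/* Narrowing has to be requested by name, and it reports whether the
 * value actually fits in 32 bits. Returns 0 on success, -1 otherwise. */
int audit_to_i32(audit_int64 a, int32_t *out)
{
    assert(out != NULL);
    if (a.v < INT32_MIN || a.v > INT32_MAX)
        return -1;
    *out = (int32_t)a.v;
    return 0;
}

/* int32_t n = some_audit_int64;    <- rejected by the compiler */
```

With a wrapper like this, each compile error marks one spot that still
assumes a 32-bit value, which is roughly how the cascade of errors shows
up.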

My current recommendation is that the largefile support be backed out,
and the following done:

	Make the afs_int64 type always a structure until the code is
stabilized. For that matter, I'm not sure leaving it as a structure will
really cause that bad of a performance hit since the structure can be a
single element that is the real 64bit int in most cases.

	Start changing the underlying server code to be largefile safe -
i.e. basically make ALL of the sizes/offsets/inodes/etc. be stored in
afs_int64/afs_uint64's. This is the vast majority of the work. Note -
LEAVE the server as 32 bit limited still, but get all of the code to
where it uses the 64bit types for its tracking. Should be able to do
this in stages - start with the inodes.

	Add back in the code to start dealing with 64bit files to the
file server.
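
As a rough sketch of the middle stage above (64bit types for tracking,
server still 32 bit limited), the check below carries everything in
int64_t but refuses anything that would cross the old limit. The names
STAGE_MAX_FILE_SIZE and stage_check_write are made up for illustration
and do not exist in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit for the transitional server: sizes are carried in
 * 64-bit variables but may not yet exceed what 32 bits could hold. */
#define STAGE_MAX_FILE_SIZE ((int64_t)INT32_MAX)

/* Returns 0 and stores the resulting file length in *new_len if a write
 * of `count` bytes at `offset` stays within the limit, or -1 if the
 * arguments are negative or the write would cross the limit. */
int stage_check_write(int64_t cur_len, int64_t offset, int64_t count,
                      int64_t *new_len)
{
    int64_t end;

    assert(new_len != NULL);
    if (cur_len < 0 || offset < 0 || count < 0)
        return -1;
    /* Compare by subtraction so that offset + count cannot overflow. */
    if (offset > STAGE_MAX_FILE_SIZE || count > STAGE_MAX_FILE_SIZE - offset)
        return -1;
    end = offset + count;
    *new_len = end > cur_len ? end : cur_len;
    return 0;
}
```

Once all callers pass 64-bit values through a check like this, lifting
the limit later should come down to changing the one constant rather
than revisiting every call site.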

I believe that:

	a. Any platform should be capable of dealing with the fileserver
code that handles 64 bit files, but on platforms not capable of doing 64
bit ops - the code that actually depends on it should cause failures.

	b. Structures for dealing with the large files should be made so
that they will be functional with either a real largefile inode, or with
namei largefile inodes.

	c. Requiring a fsconv_ or similar to convert to a server with
large file support should not be a tragedy. AFS was designed to make
moving volumes easy, and adding largefile support is such a dramatic new
feature that this doesn't seem like a big deal. Easy way out is to clear
a server at a time, and just move the volumes. Providing a tool to
convert the filesystem on upgrade (which I believe should only be
necessary for namei installations) should be the way to go for allowing
an in-place upgrade.

	d. The large file support should probably check the current on
disk contents in the case of a namei server for a flag or trigger. If
not present and data is found, should error out indicating that the
on-disk contents are not largefile aware.

	e. It might take some extra code, but I think it should be
possible to have the VN_GET_INO() and friends be able to check to see if
the current partition is largefile-aware/capable, and process the
contents of the vnode accordingly.

	f. We have an easy way out - SMALL64VNODEMAGIC and
LARGE64VNODEMAGIC. If the vnode has the old magic, it is not 64bit
aware, so handle the content accordingly. This should make (e) fairly
easy to implement. This should also allow a transparent in-place upgrade
that will remain mostly backwards compatible. If the write taking place
is going to require largefile support, update the vnode to use 64bit
magic, otherwise mark it using the 32bit. Doing that will allow you to
modify content in-place.
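
A minimal sketch of what (e) and (f) could look like, under loud
assumptions: the struct is a simplified stand-in and not the real vnode
disk layout, the magic values are placeholders (the real ones are not
given here), and DEMO_MAGIC_32, DEMO_MAGIC_64, demo_get_length and
demo_set_length are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values; the real magic numbers are not given in this mail. */
#define DEMO_MAGIC_32 0x11111111u   /* hypothetical: vnode is not 64bit aware */
#define DEMO_MAGIC_64 0x22222222u   /* hypothetical: vnode is 64bit aware */

/* Simplified, hypothetical on-disk vnode; not the OpenAFS layout. */
struct demo_vnode {
    uint32_t magic;
    uint32_t length;     /* low 32 bits of the file length */
    uint32_t length_hi;  /* high 32 bits; only trusted with DEMO_MAGIC_64 */
};

/* Read the length. With the old magic, length_hi is ignored because it
 * may hold unrelated data on a vnode written by older code. */
uint64_t demo_get_length(const struct demo_vnode *vn)
{
    assert(vn != NULL);
    if (vn->magic == DEMO_MAGIC_64)
        return ((uint64_t)vn->length_hi << 32) | vn->length;
    return vn->length;
}

/* Store a length. The vnode is marked 64bit only when the value needs
 * more than 32 bits; otherwise it is marked with the 32bit magic. */
void demo_set_length(struct demo_vnode *vn, uint64_t len)
{
    assert(vn != NULL);
    vn->length = (uint32_t)(len & 0xffffffffu);
    if (len > UINT32_MAX) {
        vn->magic = DEMO_MAGIC_64;
        vn->length_hi = (uint32_t)(len >> 32);
    } else {
        /* length_hi is left alone; demo_get_length ignores it here. */
        vn->magic = DEMO_MAGIC_32;
    }
}
```

The same magic check could also drive the accessor macros in (e), so
callers never look at the high field directly.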

I will be trying to put together a patch for Derrick to back out the
current code to where the trunk fileserver is at least usable. Seems
like no matter what we do, it'll be a lot easier to apply stuff a small
chunk at a time starting from a working base. Right now, things are in a
bad enough state that I would hate to see how long it will take to get
us back to a fully working system.

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216


> -----Original Message-----
> From: Hartmut Reuter [mailto:reuter@rzg.mpg.de]
> Sent: Thursday, March 20, 2003 3:27 AM
> To: R. Lindsay Todd
> Cc: Neulinger, Nathan; openafs-devel@openafs.org
> Subject: Re: Yuck... largefile support really fouled things up bad...
>
> Sorry for my late reply, but I broke my leg and was in the hospital for
> some days.
>
> The point is:
> 1.) You can have large file support only with the namei-interface.
> 2.) The namei-interface needs AFS_64BIT_IOPS_ENV to be set because the
> inode number used in namei_ops.c is 64 bit long.
> 3.) The field vn_ino_hi used to store redundantly the same contents as
> the field uniquifier. Therefore I thought it would be better to modify
> VNDISK_GET_INO and VNDISK_SET_INO in order to use uniquifier directly
> and to use vn_ino_hi for the high order 32 bits of the file length.
> 4.) You never will be able to change an old fileserver to a large file
> supporting fileserver by just installing the new binaries. You will have
> to move the volumes to the new server. During the move process the old
> contents of vn_ino_hi are not transferred and the field is cleared.
>
> Hartmut
>
> R. Lindsay Todd wrote:
> > I can live with this change, and certainly agree that it is better to
> > make this change now than later.
> >
> > /Lindsay
> >
> > Neulinger, Nathan wrote:
> >
> >> How do y'all feel about maintaining on-disk compatibility with adding
> >> the largefile fileserver support to openafs? There are some issues with
> >> the current trunk code that make assumptions that are not correct (such
> >> as any 64bit_env build MUST HAVE largefile_env defined, or it doesn't
> >> work right). Also assumes that we will never be able to do both 64bit
> >> iops and large files in the same binary.
> >>
> >> I'd like to use the reserved6 field in the vnode disk structure to add a
> >> length_hi, and start with the attached patch, plus other sanity checking
> >> code will need added. Since this code hasn't ever been in a real release
> >> of openafs, now is the time to decide how to do it.
> >> Current implementation forces some dead ends, seems like using the
> >> reserved slot is the way to go.
> >> Derrick wanted me to talk to both of you first before he started
> >> applying these changes. He did apply one I sent that at least gets rid
> >> of the failure I talked about on -devel yesterday.
> >> Do either of you object to using reserved6 as length_hi, and eliminating
> >> the field re-use that is currently in place?
> >>
> >> -- Nathan
> >>
> >> ------------------------------------------------------------
> >> Nathan Neulinger                       EMail:  nneul@umr.edu
> >> University of Missouri - Rolla         Phone: (573) 341-4841
> >> Computing Services                       Fax: (573) 341-4216
> >>
> >>> -----Original Message-----
> >>> From: Derrick J Brashear [mailto:shadow@dementia.org]
> >>> Sent: Thursday, March 13, 2003 12:54 PM
> >>> To: Neulinger, Nathan
> >>> Subject: RE: Yuck... largefile support really fouled things up bad...
> >>>
> >>> On Thu, 13 Mar 2003, Neulinger, Nathan wrote:
> >>>
> >>>> If largefile_env is defined, it forces namei.
> >>>
> >>> that doesn't conflict with my statement.
> >>>
> >>>> largefile_env is currently incompatible with 64bit_iops (it's reusing
> >>>> vn_ino_hi, which iops uses), but the only reason it doesn't clash/fail
> >>>> is that it is forcing namei.
> >>>
> >>> ok, so talk to lindsay or hartmut and see if they care if we switch.
> >>>
> >>
> >
>
> --
> -----------------------------------------------------------------
> Hartmut Reuter                           e-mail reuter@rzg.mpg.de
> 					   phone +49-89-3299-1328
> RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301
> Computing Center of the Max-Planck-Gesellschaft (MPG) and the
> Institut fuer Plasmaphysik (IPP)
> -----------------------------------------------------------------
>