[OpenAFS-devel] Q's Java API (JAFS) in the 1.4.x build branch

Jim Doyle rockymtnmagic@yahoo.com
Fri, 11 Jul 2008 13:52:21 -0700 (PDT)


I think the next best step is for me to throw together some UML for you
guys so we have pretty pictures (aka an object model) to look at.  I can
do that today and this weekend, and forward a ZIP of the JPGs to
people individually, or to the mailing list (though I doubt the list
will accept attachments).

It's been A LONG TIME since I've been an <<3l33t hex0r>> C systems
programmer; gosh forbid, I now live squarely in the land of IDEs, UML
modeling tools and trying to adhere to sensible design patterns - I think
I can actually remember what "extern" means. :)   But what I propose is
massively salvaging and reusing the IBM native-method C glue code, as well
as the Makefile build targets, and essentially grafting the vital organs
of this into a new body.  What I'm also proposing is a completely NEW Java
API with zero backwards compatibility with circa-2001 JAFS.  I think I can
name design forces that justify a clean break from that past.  Believe it
or not, the Java side is EASY; the scrap-yard work of repurposing the
existing C code is not too complex, but would take the MOST TIME.  However,
NONE of the C code work should even begin unless people agree to a new
design.

So here's a redux of issues with JAFS: 

MAJOR
=====

1.  In JAFS, classes mix facts with behavior.  A big problem with this
tight coupling can be seen in the internal refresh() methods, whose job
is to keep the JAFS 'value-object' side attributes in tune with Ubik
database records or volume headers off the volserver. In Volume, for
instance, there are multiple refresh() methods to fetch properties of
dependent objects.  This is insanely complex!  The fix to the problem:
more finely grained POJOs are needed, plus more service interfaces that
make fetching dependent associations easy. POJOs are snapshots of the
state of a domain object at a point in time. The Service interfaces are
where you produce or transform them.
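
To make (1) a bit more concrete, here is a minimal sketch of the split
between snapshot POJOs and a service interface. Every name in it
(PartitionSnapshot, VolumeSnapshot, VolumeQueryService, and the in-memory
stand-in) is hypothetical and not taken from JAFS; the stand-in exists only
so the sketch runs without a cell.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical snapshot types: facts only, no refresh(), no native calls.
record PartitionSnapshot(String server, String name) {}

record VolumeSnapshot(long id, String name, PartitionSnapshot partition,
                      Instant fetchedAt) {}

// Hypothetical service interface: the place where snapshots are produced.
interface VolumeQueryService {
    VolumeSnapshot fetchVolume(long volumeId);
    List<VolumeSnapshot> volumesOn(PartitionSnapshot partition);
}

// In-memory stand-in for a real implementation (assumption for the sketch).
final class InMemoryVolumeQueryService implements VolumeQueryService {
    private final Map<Long, VolumeSnapshot> volumes;

    InMemoryVolumeQueryService(Map<Long, VolumeSnapshot> volumes) {
        this.volumes = Map.copyOf(volumes);
    }

    @Override
    public VolumeSnapshot fetchVolume(long volumeId) {
        VolumeSnapshot v = volumes.get(volumeId);
        if (v == null) {
            throw new IllegalArgumentException("no such volume: " + volumeId);
        }
        return v;
    }

    @Override
    public List<VolumeSnapshot> volumesOn(PartitionSnapshot partition) {
        // Dependent associations are fetched by asking the service,
        // not by calling a refresh() method on the value object.
        List<VolumeSnapshot> out = new ArrayList<>();
        for (VolumeSnapshot v : volumes.values()) {
            if (v.partition().equals(partition)) {
                out.add(v);
            }
        }
        return out;
    }
}
```

A caller that wants fresher data asks the service again and gets a new
snapshot; nothing on the snapshot itself ever reaches back to the servers.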

2.  In JAFS, there are consequences to tightly coupling the native methods
to the domain classes:   What if the native method signatures need to
change?  What if the Java internal signatures need to change?  How will
this impact upstream code?   Horrifically, most of the time.   Decouple the
native methods and the native method support code from the POJOs.   Move
the code that produces POJOs from Ubik/Volserver Rx calls into a DEDICATED
factory class where ALL of the Rx/native-method magic happens. Have the
factory implement a Service interface.   Now, if the native side has to
change, it can, and the change will not ripple through to the POJOs or the
interfaces.
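
A rough sketch of what (2) could look like follows. The names VolumeFacts,
VolumeLookupService, NativeVolumeCalls and NativeVolumeFactory are all
hypothetical, and the raw String[] shape returned by the native seam is
purely an assumption for illustration; the existing glue code may look
nothing like this.

```java
// Hypothetical value object handed to callers; it knows nothing about JNI.
record VolumeFacts(long id, String name, long quotaKb) {}

// Hypothetical service interface that upstream code depends on.
interface VolumeLookupService {
    VolumeFacts lookup(String cell, long volumeId);
}

// Hypothetical seam over the JNI glue. In a real build, the one class
// implementing this would declare the `native` methods and load the library.
interface NativeVolumeCalls {
    // Assumed raw shape for this sketch: { name, quota in KB as decimal text }.
    String[] getVolumeHeader(String cell, long volumeId);
}

// The dedicated factory: the only code that understands the raw native shape.
final class NativeVolumeFactory implements VolumeLookupService {
    private final NativeVolumeCalls calls;

    NativeVolumeFactory(NativeVolumeCalls calls) {
        this.calls = calls;
    }

    @Override
    public VolumeFacts lookup(String cell, long volumeId) {
        String[] raw = calls.getVolumeHeader(cell, volumeId);
        // If the native signature changes, only this translation changes.
        return new VolumeFacts(volumeId, raw[0], Long.parseLong(raw[1]));
    }
}
```

Because the seam is an interface, the factory can be exercised with a fake
(for example a lambda returning a canned array) on a machine that has no
libjafs.so at all.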

3. In JAFS, the objects are declared Serializable but they really, really
should not be remoted to anything given their fragile state!  The tight
coupling issues of (1) and (2) make remoting difficult. Why?  Let's say my
serializable class jumps the wire on an RMI call and I call something that
calls a protected method which then calls a native method to do something.
KABOOM!  Major exceptions thrown by the JVM.

4. Remoting JAFS over RMI makes tons and tons of sense.... What remoting
will do is encapsulate away from the client programmer all of the
configuration issues that JAFS currently requires:  Where is my CellServDB?
How do I get a token?  Do I need krb5.conf/aklog or am I still using
kaserver?   Where is libafs.so, and where is libjafs.so?  And what sick
combination of -Dxxx flags do I need to pass to java when I want to run my
program??   To remote, I need Serializable value objects with no behaviour
methods *AND* I need a Service Locator object and a set of interfaces to
remote proxies that will let me talk to the Ubik databases or the Volserver
using native method glue code that I DON'T NEED in my classpath!  The
remote server needs its classpath, as well as all the other startup and
initialization information - but the client doesn't.  All the client
programmer NEEDS is a JAR file with the POJOs, the interfaces, the stubs
and a URL for where to attach to the RMI service... An SSL key will also
help. :)
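
As a sketch of the client-side contract described in (4): a Serializable
value object with no behaviour, plus a remote interface. RemotableVolume,
RemoteVolumeService and WireCheck are hypothetical names; only
java.rmi.Remote, java.rmi.RemoteException and the java.io serialization
classes are real JDK API. The round-trip helper is just a local way to
check that the value object survives serialization without touching any
native code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical value object: pure data, safe to send over the wire.
record RemotableVolume(long id, String name) implements Serializable {}

// Hypothetical remote interface; the implementation (and all native glue)
// would live only on the server side.
interface RemoteVolumeService extends Remote {
    RemotableVolume fetchVolume(long volumeId) throws RemoteException;
}

final class WireCheck {
    private WireCheck() {}

    // Serializes and deserializes a value in memory, roughly what RMI
    // marshalling does to arguments and return values.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buf.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(
                "value object did not survive serialization", e);
        }
    }
}
```

A client would then obtain a RemoteVolumeService proxy from an RMI registry
(for example via java.rmi.Naming.lookup with the service URL) and never
needs CellServDB, tokens, or the native libraries on its own machine.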

The current implementation is not kosher for remoting and never will be.
That's because the tight coupling not only complicates the maintainability
of the code, but also FORCES onto the client programmer all of the
configuration and setup dependencies I've mentioned above. A consequence
of this complexity is that it's stifling for potential Java programmers who
MIGHT want to contribute at the UI/Admin app/Tool level...  If it's not
reasonably easy to set up and manage, no one's gonna step up to the
plate, and that means all that marvelous AFS stuff in C and Rx land will
never see the light of day of reuse and extension. :(

5. Side effects of the refactoring create fertile ground for others; namely
Web Services and, God Have Mercy on us All, CORBA IDL support.  You just
need the interfaces, the native method factories, the POJOs and the
Exceptions to do this.  Leave behind the RMI adapter and RMI support glue
code...



MINOR
=====
Other smaller issues with JAFS:

6.  Better modelling of the Volume object.   My opinion is that the
current design mixes facts from the VLDB too freely with facts from the
Volserver.   The reality is that most of these domain objects exist
WITHOUT the presence of the VLDB. In other words, Servers, Partitions and
Volumes are present regardless of what the VLDB says.  So, using the VLDB
as a de facto source of truth to navigate the object graph of
Cell -> Server -> Partition -> Volume is poorly informed.  There are TWO
ways to find a partition or volume:
    -  Find it from the VLDB and then use that info to contact volserver
    -  Explicitly connect to a volserver using a hostname and learn
       partitions and contained volumes.

So, the Volume object needs refinement to break the tight coupling between
VLDB and Volume header info.
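
The two lookup paths in (6) can be sketched with two separate record types
and two separate interfaces. VldbRecord, VolumeHeaderInfo, VldbDirectory,
VolserverDirectory and the two fakes are hypothetical names, and the fields
shown are only a guess at a useful minimum.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// What the VLDB claims: where a volume is supposed to live.
record VldbRecord(long volumeId, String name, String server, String partition) {}

// What a volserver reports from the volume header actually on disk.
record VolumeHeaderInfo(long volumeId, String name, boolean online) {}

interface VldbDirectory {
    Optional<VldbRecord> find(String volumeName);
}

interface VolserverDirectory {
    List<VolumeHeaderInfo> list(String server, String partition);
}

// In-memory fakes so the sketch runs without a cell (assumption).
final class FakeVldb implements VldbDirectory {
    private final Map<String, VldbRecord> byName;

    FakeVldb(Map<String, VldbRecord> byName) {
        this.byName = Map.copyOf(byName);
    }

    @Override
    public Optional<VldbRecord> find(String volumeName) {
        return Optional.ofNullable(byName.get(volumeName));
    }
}

final class FakeVolserver implements VolserverDirectory {
    private final Map<String, List<VolumeHeaderInfo>> bySite; // key "server:partition"

    FakeVolserver(Map<String, List<VolumeHeaderInfo>> bySite) {
        this.bySite = Map.copyOf(bySite);
    }

    @Override
    public List<VolumeHeaderInfo> list(String server, String partition) {
        return bySite.getOrDefault(server + ":" + partition, List.of());
    }
}
```

With this split, a volume that sits on a partition but has no VLDB entry
still shows up through VolserverDirectory, and a stale VLDB entry is just a
VldbRecord whose site no longer lists that volume.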

7. Administrative functions (aka vos command equivalents) work by
COMPOSITION of actions on the VLDB and Volserver interfaces... This
composition should NOT be enforced by the revised API.  Rather, utility
classes that mirror "vos commands" can be implemented that act by
composition on the Volserver and VLDB interfaces.  (no tight coupling!)
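
A sketch of the composition idea in (7), using a "vos remove"-style helper
as the example. VolumeSite, VldbOps, VolserverOps and VosCommands are
hypothetical names, and the order of the steps is an assumption made for
illustration rather than a description of what vos itself does.

```java
// Hypothetical location of a volume as reported by the VLDB.
record VolumeSite(String server, String partition) {}

// Hypothetical primitive operations on the VLDB.
interface VldbOps {
    VolumeSite locate(String volumeName);
    void deleteEntry(String volumeName);
}

// Hypothetical primitive operations on a volserver.
interface VolserverOps {
    void deleteVolume(String server, String partition, String volumeName);
}

// Utility class layered on top of the primitives; the core API does not
// know or care that this combination exists.
final class VosCommands {
    private final VldbOps vldb;
    private final VolserverOps volserver;

    VosCommands(VldbOps vldb, VolserverOps volserver) {
        this.vldb = vldb;
        this.volserver = volserver;
    }

    // Composes three primitive calls into one administrative action.
    void remove(String volumeName) {
        VolumeSite site = vldb.locate(volumeName);
        volserver.deleteVolume(site.server(), site.partition(), volumeName);
        vldb.deleteEntry(volumeName);
    }
}
```

Someone who needs a different policy (say, touching only the VLDB entry)
writes a different utility against the same two interfaces.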


8.  Marcus mentioned PTS and Ka objects are too tightly coupled. Agreed!
The PTService is nothing more than a name-to-ID lookup, and the difference
between a user and a group is whether that ID is >0 or <0. It's the job of
"something else" to make sure that every PTS user name maps onto some
Kerberos name (possibly in a Kerberos realm that is NOT THE SAME as the
AFS cell name!).  Further, it's possible for an ACL to have entries for
users and groups that DON'T EXIST in the current ptserver database.   The
objects need to continue to behave properly in that particular case,
because that case occurs when people dump and restore volumes across
cells, operate in disaster recovery mode and deal with archival backups.
So, the User and Group objects need to SUPPORT the case where there is no
backing PTS data, and they must also be divorced from the kaserver.
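
One possible shape for (8), as a sketch only: a principal is identified by
its numeric ID, its kind follows from the sign, and the name is optional so
that an ACL entry with no backing PTS record still behaves. PrincipalKind,
PtsNameLookup and AclPrincipal are hypothetical names. Rejecting ID 0 is an
assumption of this sketch, since the text above only covers >0 and <0.

```java
import java.util.Optional;

enum PrincipalKind { USER, GROUP }

// Hypothetical name lookup; returns empty when the ptserver has no entry.
interface PtsNameLookup {
    Optional<String> nameOf(int id);
}

// Hypothetical principal as it appears on an ACL. No Kerberos or kaserver
// knowledge lives here.
record AclPrincipal(int id, Optional<String> name) {
    AclPrincipal {
        if (id == 0) {
            // Assumption: the sketch does not define a meaning for ID 0.
            throw new IllegalArgumentException("id 0 is not handled in this sketch");
        }
    }

    // Negative IDs are groups, positive IDs are users.
    PrincipalKind kind() {
        return id < 0 ? PrincipalKind.GROUP : PrincipalKind.USER;
    }

    boolean hasPtsEntry() {
        return name.isPresent();
    }

    // Falls back to the numeric ID when there is no PTS name.
    String display() {
        return name.orElseGet(() -> Integer.toString(id));
    }

    static AclPrincipal resolve(int id, PtsNameLookup pts) {
        return new AclPrincipal(id, pts.nameOf(id));
    }
}
```

An entry restored from another cell's dump would simply resolve to a
principal with no name, and code that walks the ACL keeps working.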

9.  Keep administrative tools OUT OF the API.  As people have mentioned,
in the Land of Big Cells, walking the entire VLDB or ptserver database in
an Admin UI is expensive and slow. That's just not a problem I want to
solve in JAFS... People can build these tools on their own with the
primitives in JAFS, and they can choose design tactics that mitigate these
costs - for instance, they can build their own LRU cache of Volume objects,
they can build their own finder methods to search through a cache of VLDB
objects, etc. With Java, these things are very easy; with C++ they are
also doable; C, Perl, others - good luck!
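
As an illustration of how little code the LRU cache mentioned in (9) takes
on the tool side, here is a minimal sketch built on java.util.LinkedHashMap
in access-order mode. LruCache is a hypothetical name and is not proposed
as part of the API; it is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Small LRU cache: once capacity is exceeded, the least recently
// accessed entry is evicted on the next insertion.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true makes get() move an entry to the "newest" end.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

An admin tool could key such a cache by volume ID and hold whatever volume
snapshot object it likes as the value, refilling entries on a miss.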

Really the goal is to expose fine-grained objects, do it right, and
keep the scope narrow enough to deliver something...

-- Jim