Friday, December 21, 2007

Current CMA Documentation Available

Coinciding with the availability of Fedora 3.0 Beta 1 this week, the first round of semi-official CMA (formerly called CMDA) documentation is now available: The Fedora Content Model Architecture. As Dan points out, we'll be doing some name changes before it's all said and done, but so far this is the most up-to-date diagram of the supporting object-object relationships:

As implemented, the BDef and BMech objects are basically unchanged. Here's what the new CModel control object looks like:

The DS-COMPOSITE-MODEL datastream specifies the structural requirements of member objects. The dsCompositeModel.xsd schema describes the expected format. For example, here's the DS-COMPOSITE-MODEL of info:fedora/fedora-system:ContentModel:

<dsCompositeModel xmlns="...">
  <dsTypeModel ID="RELS-EXT">
    <form MIME="text/xml"/>
  </dsTypeModel>
  <dsTypeModel ID="DC">
    <form MIME="text/xml"/>
  </dsTypeModel>
  <dsTypeModel ID="DS-COMPOSITE-MODEL">
    <form MIME="text/xml"/>
  </dsTypeModel>
</dsCompositeModel>

Pretty simple. It says that member objects must have at least these datastreams, each in the form specified. If multiple forms are listed in a single dsTypeModel, the datastream may be in any of those forms.
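The rule is simple enough to check mechanically. What follows is a minimal Python sketch of such a check. It is not Fedora's own validator. It assumes the member object's datastreams are supplied as a dict of datastream ID to MIME type (a hypothetical input shape), and it matches elements by local name because the schema namespace is elided above; the urn:example namespace in the sample is a placeholder.

```python
import xml.etree.ElementTree as ET

# Sample DS-COMPOSITE-MODEL; the namespace URI is a placeholder assumption.
MODEL_XML = """
<dsCompositeModel xmlns="urn:example:dsCompositeModel">
  <dsTypeModel ID="RELS-EXT"><form MIME="text/xml"/></dsTypeModel>
  <dsTypeModel ID="DC"><form MIME="text/xml"/></dsTypeModel>
  <dsTypeModel ID="DS-COMPOSITE-MODEL"><form MIME="text/xml"/></dsTypeModel>
</dsCompositeModel>
"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def required_forms(model_xml):
    """Map each required datastream ID to the set of MIME types it may use."""
    root = ET.fromstring(model_xml.strip())
    forms = {}
    for ds in root:
        if local_name(ds.tag) != "dsTypeModel":
            continue
        forms[ds.get("ID")] = {f.get("MIME") for f in ds if local_name(f.tag) == "form"}
    return forms


def check_compliance(model_xml, datastreams):
    """Return a list of problems; an empty list means the object complies.

    Datastreams not named in the model are allowed ("at least these").
    Any one of several listed forms is acceptable.
    """
    problems = []
    for ds_id, mimes in sorted(required_forms(model_xml).items()):
        if ds_id not in datastreams:
            problems.append(f"missing datastream {ds_id}")
        elif mimes and datastreams[ds_id] not in mimes:
            problems.append(
                f"{ds_id} has MIME {datastreams[ds_id]}, expected one of {sorted(mimes)}"
            )
    return problems


member = {"RELS-EXT": "text/xml", "DC": "text/xml",
          "DS-COMPOSITE-MODEL": "text/xml", "THUMB": "image/png"}
print(check_compliance(MODEL_XML, member))  # []
```

Extra datastreams such as THUMB pass untouched, matching the "at least these datastreams" reading of the model.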

Sunday, December 16, 2007

Fedora 3.0 - Where's the Binding Map?

Okay, I'm excited.

After several months of effort, Fedora Commons 3.0 Beta 1 should go live sometime this week. For most Fedora users, this Beta will be their first real exposure to the Content Model Dissemination Architecture, or CMDA (this name is subject to change before 3.0-final).

Among other things, the CMDA allows people to attach runtime behaviors to digital objects at a class level. This architectural change has been a long time coming for Fedora, and we've worked hard to get the design right. Dan is working on the official design doc for publication with the software, but here's a simple overview of how it works:

The Fedora-defined CMDA relationships are expressed in RDF in the RELS-EXT datastream of each referring object. As long as all the necessary relationships exist, Fedora will use them to provide the desired behaviors for each data object. By design, the Resource Index does not need to be enabled for this to work.
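A rough Python sketch of how such a lookup can work using only the RELS-EXT relationships of the objects involved, with no triplestore query. The predicate names (hasModel, hasBDef, isDeploymentOf, isContractorOf) and all PIDs are illustrative assumptions, not necessarily the exact 3.0b1 vocabulary. Finding the BMech for a content model needs a reverse lookup, so the sketch also assumes the repository keeps its own (content model, BDef) to BMech map, built by scanning BMech objects.

```python
# Hypothetical RELS-EXT contents, reduced to (subject, predicate, object)
# triples. Predicate names and PIDs are illustrative assumptions.
RELS_EXT = {
    "demo:obj1": [("demo:obj1", "hasModel", "demo:ImageModel")],
    "demo:ImageModel": [("demo:ImageModel", "hasBDef", "demo:ImageBDef")],
    "demo:ImageBMech": [
        ("demo:ImageBMech", "isDeploymentOf", "demo:ImageBDef"),
        ("demo:ImageBMech", "isContractorOf", "demo:ImageModel"),
    ],
}


def objects_of(pid, predicate):
    """Read one object's RELS-EXT and return the targets of `predicate`."""
    return [o for s, p, o in RELS_EXT.get(pid, []) if s == pid and p == predicate]


def build_deployment_map():
    """Index (content model, BDef) -> BMech by scanning each object's RELS-EXT.

    Assumption: the repository maintains a map like this itself, so no
    Resource Index query is needed at dissemination time.
    """
    deployments = {}
    for pid in RELS_EXT:
        for bdef in objects_of(pid, "isDeploymentOf"):
            for cmodel in objects_of(pid, "isContractorOf"):
                deployments[(cmodel, bdef)] = pid
    return deployments


def resolve_behaviors(pid, deployments):
    """Return {BDef: BMech} for every behavior the object gets via its models."""
    resolved = {}
    for cmodel in objects_of(pid, "hasModel"):
        for bdef in objects_of(cmodel, "hasBDef"):
            bmech = deployments.get((cmodel, bdef))
            if bmech is not None:  # only when every needed relationship exists
                resolved[bdef] = bmech
    return resolved


print(resolve_behaviors("demo:obj1", build_deployment_map()))
# {'demo:ImageBDef': 'demo:ImageBMech'}
```

Because demo:obj1 points only at its content model, replacing the BMech that serves demo:ImageModel would change the behavior of every member object without editing any of them.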

One question that will inevitably arise for those familiar with Fedora's traditional disseminators is, "Where's the Binding Map?" The short answer is that binding maps no longer exist. For the long answer, continue reading.

Background

To support extensible "views" or "behaviors" on digital objects, prior versions of Fedora required each object to include a special piece of metadata called a disseminator. The disseminator included a reference to a "Behavior Definition" (an object that defines the behaviors), a "Behavior Mechanism" (an object that grounds the behaviors to a specific implementation), and lastly, a "Datastream Binding Map". The binding map's purpose was to map the datastream IDs in the object to specific input requirements of the BMech.

CMDA Implementation of Behaviors

With the CMDA, behavior subscription is now done at the content model level. Among other useful properties, this design allows people to significantly change behaviors for whole classes of objects without making changes to (or visiting) every single one.

Since the content model object would now appear to occupy the role of the old per-object disseminator, if a datastream-to-BMech-input mapping existed, it would go in the content model, right?

Actually, I don't think so. In general, a content model is intended to be a sharable object that survives through time. It a) describes a class of objects by their structure, and b) indicates which operations/behaviors they should have within a repository. In order for it to be as sharable and survivable as possible, the content model must not dictate *how* the operations are to be executed. That's the job of the BMech.

Part of the "how" is deciding which (if any) of the datastreams defined by the content model actually need to be given as input to the code that executes the behavior. At a high level, BMechs are bound to content models, and not vice-versa. The direction of the relation is important. It's the BMech's job to pick apart the content model it works with and decide how it's going to fulfill the contract with the given pattern of data.

Therefore the mapping, if necessary, is really a BMech implementation detail. But if a BMech is the contractor for only one content model, there's really no point in having the extra indirection...just make the part names in the BMech match the datastream IDs and be done with it. That's the simplest approach, and the one that I think will get people "up and running" with the CMDA the quickest.
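A small Python sketch of that "no indirection" approach: each BMech input part is fed by the datastream whose ID equals the part name, so resolving inputs is a direct lookup. The part names and datastream IDs are hypothetical, and the optional `part_to_ds` argument is an assumed stand-in for a mapping the BMech might keep privately; the content model never carries it.

```python
def resolve_inputs(part_names, datastream_ids, part_to_ds=None):
    """Pick the datastream that feeds each BMech input part.

    By default a part named "THUMB" is fed by the datastream with ID "THUMB".
    `part_to_ds` is a hypothetical BMech-internal override for the case
    where names differ; it is an implementation detail of the BMech.
    """
    part_to_ds = part_to_ds or {}
    inputs = {}
    for part in part_names:
        ds_id = part_to_ds.get(part, part)  # identity mapping unless overridden
        if ds_id not in datastream_ids:
            raise KeyError(f"no datastream {ds_id!r} for input part {part!r}")
        inputs[part] = ds_id
    return inputs


datastreams = {"DC", "RELS-EXT", "THUMB", "MEDIUM"}
print(resolve_inputs(["THUMB", "MEDIUM"], datastreams))
# {'THUMB': 'THUMB', 'MEDIUM': 'MEDIUM'}
```

When the part names already match the datastream IDs, the override is never consulted, which is the point: for a BMech serving one content model, the mapping collapses to the identity.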

But, you ask, what if you want to use the same BMech for content models that differ only in their datastream IDs? First, if possible, consider merging those content models. It'll make life easier for you in the long run. If that's impractical or doesn't make sense for your use case, then create one BMech per content model, each differing only in the part names used.

For Fedora 3.0b1, what this means in a practical sense is that people who have lots of variance in their datastream IDs will either need to "bring them in line" (which is a very practical thing to do in its own right, for ease of management), or will need to define different content models for them, which use different BMechs, even if they formerly used the same BMechs.

The migration tools (which I'm writing the docs for now) will do the latter automatically, creating Content Models and BMech copies with appropriate IDs. If people want a "cleaner" upgrade, they need to invest some sweat in making their datastream IDs consistent before running the analysis (the first of three phases of migration), so they don't end up with an unmanageable set of BMech copies.
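The analysis step can be pictured roughly as grouping objects by the disseminators they use. The following Python sketch is hypothetical and is not the actual migration tool. It assumes each object is described by its old (BDef, BMech, binding map) triple, with the binding map going from BMech part name to datastream ID. Each distinct combination becomes one content model; the choice to reuse the original BMech when its part names already equal the datastream IDs, and to plan a renamed copy otherwise, is an assumption of the sketch. Inconsistent datastream IDs therefore show up directly as extra copies.

```python
from collections import defaultdict


def plan_migration(objects):
    """Group objects by (BDef, BMech, binding map) and plan new objects.

    `objects` maps PID -> (bdef, bmech, binding_map), where binding_map is a
    dict of BMech part name -> datastream ID (hypothetical input shape).
    Returns one plan entry per distinct combination.
    """
    groups = defaultdict(list)
    for pid, (bdef, bmech, binding_map) in objects.items():
        key = (bdef, bmech, tuple(sorted(binding_map.items())))
        groups[key].append(pid)

    plan = []
    for n, ((bdef, bmech, bindings), members) in enumerate(sorted(groups.items()), start=1):
        identity = all(part == ds_id for part, ds_id in bindings)
        plan.append({
            "cmodel": f"migrated:CModel{n}",  # hypothetical PID scheme
            "bdef": bdef,
            # Reuse the BMech when part names already equal datastream IDs;
            # otherwise plan a copy whose parts are renamed to those IDs.
            "bmech": bmech if identity else f"{bmech}-copy{n}",
            "parts": [ds_id for _, ds_id in bindings],
            "members": sorted(members),
        })
    return plan


objects = {
    "demo:1": ("demo:ImageBDef", "demo:ImageBMech", {"THUMB": "THUMB"}),
    "demo:2": ("demo:ImageBDef", "demo:ImageBMech", {"THUMB": "TN"}),
    "demo:3": ("demo:ImageBDef", "demo:ImageBMech", {"THUMB": "THUMB"}),
}
for entry in plan_migration(objects):
    print(entry["cmodel"], entry["bmech"], entry["members"])
# migrated:CModel1 demo:ImageBMech ['demo:1', 'demo:3']
# migrated:CModel2 demo:ImageBMech-copy2 ['demo:2']
```

Renaming the TN datastream of demo:2 to THUMB before the analysis would collapse the plan to a single content model and no BMech copy.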

3.0-final and Beyond

Two things absent from the Beta 1 release that should be present in 3.0-final are 1) the ability to assert object-object relationship constraints as part of the formal definition of a content model, and 2) a basic validator that can take a content model and an object that claims to adhere to it, and tell whether it actually complies.

For 3.0b1, we've kept the "Fedora Object Type" idea around. Viewed through this old lens, there are only four basic kinds of Fedora digital objects. We know that there is some overlap with the "typing" introduced by the CMDA. As the CMDA takes hold, I think the idea of "Fedora Object Type" can be gracefully subsumed by the content model.

In future releases, the BMech will also evolve into something more flexible. We know people have gotten a lot of mileage out of simple web service HTTP GET bindings, but other methods, protocols, and even in-VM code bindings are definitely called for. With the CMDA, we are now in a much better position to do these things.

Another idea that keeps popping up in CMDA discussions is: can an object be its own content model? Or, from a slightly different angle: can a content model play the role of a Data Object, and thus act as a template? Also, what about multiple content models per object? Inheritance?

Blue Skies - CC Licensed - by Sybren Stüvel - http://www.flickr.com/photos/sybrenstuvel/520362534/
These questions hit on design, implementation, and best practices issues, all of which we are now in a much better position to discuss with the release of 3.0b1. I'm looking forward to it.