Monday, May 04, 2009

That's Classy

Here's a simple program to report on Java .class versions. I'm sure some variant of this has been written a thousand times, but Google wouldn't give me what I wanted right away, so here it is again :)

The program takes one argument: a path to a .class file, a .jar file, or a directory containing a mixture of both. It produces a report of each class file's major .class format version (50 for Java 6, 49 for Java 5, and so on). Handy if you want to track down those newfangled classes and avoid the dreaded java.lang.UnsupportedClassVersionError.

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;

public abstract class ThatsClassy {

    // Recurse into directories; report on .jar and .class files, ignore the rest
    static void classyFile(File file) throws IOException {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) return; // directory could not be read
            for (File child : children)
                classyFile(child);
        } else if (file.getName().endsWith(".jar")) {
            classyJar(file);
        } else if (file.getName().endsWith(".class")) {
            InputStream in = new FileInputStream(file);
            try {
                classyClass(file.getPath(), in);
            } finally {
                in.close();
            }
        }
    }

    // Report on every .class entry in a jar
    static void classyJar(File jarFile) throws IOException {
        JarInputStream jarStream = new JarInputStream(new FileInputStream(jarFile));
        try {
            JarEntry entry;
            while ((entry = jarStream.getNextJarEntry()) != null) {
                if (entry.getName().endsWith(".class"))
                    classyClass(jarFile.getName() + "#" + entry.getName(), jarStream);
            }
        } finally {
            jarStream.close();
        }
    }

    // Class file header: u4 magic, u2 minor_version, u2 major_version.
    // The stream is left open; the caller owns it.
    static void classyClass(String id, InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        data.readInt();            // magic (0xCAFEBABE)
        data.readUnsignedShort();  // minor version
        int majorClassVersion = data.readUnsignedShort();
        System.out.println(id + " " + majorClassVersion);
    }

    public static void main(String[] args) throws IOException {
        classyFile(new File(args[0]));
    }
}
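
The major numbers in the report map onto Java releases in a regular way: each release bumps the major version by one, so 49 is Java 5 and 50 is Java 6. Here's a minimal sketch of that lookup. The class and method names are made up for illustration and are not part of ThatsClassy.

```java
// Hypothetical helper (not part of the original program): turns a class file
// major version into a human-readable Java release name.
final class ClassVersionNames {

    static String releaseFor(int major) {
        if (major < 45) {
            return "unknown (major " + major + ")";
        }
        int release = major - 44;   // 45 -> 1, 49 -> 5, 50 -> 6
        // Releases before Java 5 were branded 1.1 through 1.4
        // (major 45 was also used by JDK 1.0.2).
        return release < 5 ? "JDK 1." + release : "Java " + release;
    }

    public static void main(String[] args) {
        for (int major : new int[] {48, 49, 50}) {
            System.out.println(major + " = " + releaseFor(major));
        }
    }
}
```

Printing the release name next to the raw number would be a one-line change in classyClass, at the cost of a slightly wider report.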

Monday, February 16, 2009

dev8D Tweet Cloud

Wordle: Dev8D Tweets

Here's my abbreviated trip report: dev8D was a big success -- any conference that gets developers together and avoids long monologues is a winner in my book. As a side note, I think Twitter works pretty well as a backchannel. I noticed at least one person created a separate account to avoid spamming their regular followers. Not a bad idea.

Saturday, October 04, 2008

Multi-project Subversion Commit Notification

I recently had to set up commit notification for a repository hosting multiple projects and thought I'd write up my experience here.

There are several ways to set up commit notification in Subversion, and each involves the post-commit hook. Here's how it works: after the repository successfully commits a change, it looks for an executable file at /path/to/svn/hooks/post-commit. If the file exists, Subversion invokes it with two arguments: the path to the repository and the revision number of the commit.

The content of post-commit can be whatever you want. In practice, most people make it a shell script that just invokes a utility like svnnotify to get things done.

Since the repository I was working on hosts multiple projects (à la Apache), each top-level project has its own codewatch mailing list. I didn't want to spam each project with every change to unrelated projects in the repository. So, based on the arguments passed to post-commit, I had to start by determining which project the change was relevant to. I used the svnlook utility for this, like so:
# Get the first top-level directory changed by the commit.
# Note: svnlook's dirs-changed output is multi-line, and
# each line looks like "projname/trunk/etc"
PROJ=$(/usr/bin/svnlook dirs-changed -r "$2" "$1" | head -n 1 | sed -e 's/\/.*//')
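
For comparison, the same "first top-level directory" step can be sketched in Java. This only illustrates the string handling, assuming the dirs-changed output has already been captured as text. The class and method names are hypothetical; the hook itself does this with head and sed.

```java
// Hypothetical sketch: extract the project name from captured
// `svnlook dirs-changed` output, where each line looks like "projname/trunk/etc".
final class ChangedProject {

    // Returns the first path segment of the first non-empty line,
    // or null if the output contains no such line.
    static String firstTopLevelDir(String dirsChanged) {
        for (String line : dirsChanged.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int slash = trimmed.indexOf('/');
            return slash < 0 ? trimmed : trimmed.substring(0, slash);
        }
        return null;
    }
}
```

Like the shell version, this looks only at the first line, so a commit that touches several projects is attributed to just one of them.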
Once I had that information, the rest was straightforward. Here's the whole script.

Saturday, July 12, 2008

Fedora Commons Repository - Lines of Code

We're wrapping up our last branches before the 3.0 final code freeze. I got curious last night about how the maintenance branch (2.2.x line) and the trunk (3.0 line) compared in terms of lines of code.

So I decided to pull up the archives of past releases and do a per-release comparison of everything under src/java/fedora. Here's what LocMetrics and Gnuplot told me:




It's hard to draw any definitive conclusions about the SLOC metric, but it's safe to say it's directly related to maintenance cost. And it's interesting to see how certain features / architectural changes affect it.

Saturday, June 07, 2008

Installing Fedora in Two Minutes

Want to get a Fedora repository up and running as quickly as possible?

This screencast uses the installer's "quick" option to skip all the hard questions.


The "quick" option is useful if you've never installed Fedora before and just want to get acquainted. For more serious use, you'll want the "custom" option. And the installation guide :)

Friday, December 21, 2007

Current CMA Documentation Available

Coinciding with the availability of Fedora 3.0 Beta 1 this week, the first round of semi-official CMA (formerly called CMDA) documentation is now available: The Fedora Content Model Architecture. As Dan points out, we'll be doing some name changes before it's all said and done, but so far this is the most up-to-date diagram of the supporting object-object relationships:

As implemented, the BDef and BMech objects are basically unchanged. Here's what the new CModel control object looks like:

The DS-COMPOSITE-MODEL datastream specifies the structural requirements of member objects. The dsCompositeModel.xsd schema describes the expected format. For example, here's the DS-COMPOSITE-MODEL of info:fedora/fedora-system:ContentModel:

<dsCompositeModel xmlns="...">
  <dsTypeModel ID="RELS-EXT">
    <form MIME="text/xml"/>
  </dsTypeModel>
  <dsTypeModel ID="DC">
    <form MIME="text/xml"/>
  </dsTypeModel>
  <dsTypeModel ID="DS-COMPOSITE-MODEL">
    <form MIME="text/xml"/>
  </dsTypeModel>
</dsCompositeModel>

Pretty simple. It says that member objects must have at least these datastreams, and that each must be in the form specified. If multiple forms are listed in a single dsTypeModel, the datastream may be in any of those forms.
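
To make the rule concrete, here's a rough sketch of the check it implies. This is not Fedora's validator; the class name, method name, and map-based representation are all assumptions made for illustration, and it assumes every dsTypeModel lists at least one form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the DS-COMPOSITE-MODEL rule; not Fedora code.
final class CompositeModelCheck {

    // model:  datastream ID -> MIME types allowed by that dsTypeModel's form elements
    // object: datastream ID -> MIME type of the datastream actually present
    static List<String> violations(Map<String, Set<String>> model,
                                   Map<String, String> object) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Set<String>> required : model.entrySet()) {
            String id = required.getKey();
            String actualMime = object.get(id);
            if (actualMime == null) {
                problems.add("missing datastream " + id);
            } else if (!required.getValue().contains(actualMime)) {
                problems.add("datastream " + id + " has unexpected form " + actualMime);
            }
        }
        // Datastreams not named in the model are ignored:
        // the model lists minimum requirements only.
        return problems;
    }
}
```

An empty result means the object satisfies the model; extra datastreams never count against it.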

Sunday, December 16, 2007

Fedora 3.0 - Where's the Binding Map?

Okay, I'm excited.

After several months of effort, Fedora Commons 3.0 Beta 1 should go live sometime this week. For most Fedora users, this Beta will be their first real exposure to the Content Model Dissemination Architecture, or CMDA. (this name is subject to change before 3.0-final)

Among other things, the CMDA allows people to attach runtime behaviors to digital objects at a class level. This architectural change has been a long time coming for Fedora, and we've worked hard to get the design right. Dan is working on the official design doc for publication with the software, but here's a simple overview of how it works:

The Fedora-defined CMDA relationships are expressed in RDF in the RELS-EXT datastream of each referring object. As long as all the necessary relationships exist, Fedora will use them to provide the desired behaviors for each data object. By design, the Resource Index does not need to be enabled for this to work.

One question that will inevitably arise for those familiar with Fedora's traditional disseminators is, "Where's the Binding Map?" The short answer is that it no longer exists. For the long answer, continue reading.

Background
To support extensible "views" or "behaviors" on digital objects, prior versions of Fedora required each object to include a special piece of metadata called a disseminator. The disseminator included a reference to a "Behavior Definition" (an object that defines the behaviors), a "Behavior Mechanism" (an object that grounds the behaviors to a specific implementation), and lastly, a "Datastream Binding Map". The binding map's purpose was to map the datastream IDs in the object to specific input requirements of the BMech.

CMDA Implementation of Behaviors

With the CMDA, behavior subscription is now done at the content model level. Among other useful properties, this design allows people to significantly change behaviors for whole classes of objects without making changes to (or visiting) every single one.

Since the content model object would now appear to occupy the role of the old per-object disseminator, if a datastream-to-BMech-input mapping existed, it would go in the content model, right?

Actually, I don't think so. In general, a content model is intended to be a sharable object that survives through time. It a) describes a class of objects by their structure, and b) indicates which operations/behaviors they should have within a repository. In order for it to be as sharable and survivable as possible, the content model must not dictate *how* the operations are to be executed. That's the job of the BMech.

Part of the "how" is deciding which (if any) of the datastreams defined by the content model actually need to be given as input to the code that executes the behavior. At a high level, BMechs are bound to content models, and not vice-versa. The direction of the relation is important. It's the BMech's job to pick apart the content model it works with and decide how it's going to fulfill the contract with the given pattern of data.

Therefore the mapping, if necessary, is really a BMech implementation detail. But if a BMech only isContractor for one content model, then there's really no point to having the extra indirection...just make the part names in the BMech match the datastream IDs and be done with it. That's the simplest approach, and the one that I think will get people "up and running" with the CMDA the quickest.

But, you ask, what if you want to use the same BMech for content models that differ only in their datastream IDs? First, if possible, consider merging those content models. It'll make life easier for you in the long run. If that's impractical or doesn't make sense for your use case, then just create a BMech for each -- one that only differs in the part names used.

For Fedora 3.0b1, what this means in a practical sense is that people who have lots of variance in their datastream IDs will either need to "bring them in line" (which is a very practical thing to do in its own right, for ease of management), or will need to define different content models for them, which use different BMechs, even if they formerly used the same BMechs.

The migration tools (which I'm writing the docs for now) will do the latter automatically, creating Content Models and BMech copies with appropriate IDs. If people want a "cleaner" upgrade, they need to invest some sweat in getting their datastream IDs consistent prior to running the analysis (the first of three phases of migration) so they don't end up with an unmanageable set of BMech copies.
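
To get a rough feel for how much variance there is before migrating, one could group objects by the exact set of datastream IDs they carry. Each distinct set is a candidate for its own content model and BMech copy. The sketch below is not the migration tool's analysis phase, just an illustration of the idea, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: estimate datastream ID variance across a set of objects.
final class DatastreamIdVariance {

    // objects: object identifier -> datastream IDs present in that object
    // result:  distinct set of datastream IDs -> identifiers of objects that have exactly that set
    static Map<Set<String>, List<String>> groupByIdSet(Map<String, List<String>> objects) {
        Map<Set<String>, List<String>> groups = new HashMap<>();
        for (Map.Entry<String, List<String>> e : objects.entrySet()) {
            // Sets compare by content, so the order of IDs within an object does not matter
            Set<String> ids = new TreeSet<>(e.getValue());
            groups.computeIfAbsent(ids, k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }
}
```

The number of keys in the result is the number of distinct ID patterns. If it's much larger than the number of content models you expected, that's a hint that renaming datastreams first will pay off.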

3.0-final and Beyond
Two things absent from the Beta 1 release, which should be present in 3.0-final, are 1) the ability to assert object-object relationship constraints as part of the formal definition of a content model, and 2) a basic validator that can take a content model and an object that claims to adhere to it, and tell whether it actually complies or not.

For 3.0b1, we've kept the "Fedora Object Type" idea around. Viewed through this old lens, there are only four basic kinds of Fedora digital objects. We know that there is some overlap with the "typing" introduced by the CMDA. As the CMDA takes hold, I think the idea of "Fedora Object Type" can be gracefully subsumed by content model.

In future releases, the BMech will also evolve to something more flexible. We know people have got a lot of mileage out of simple web service HTTP GET bindings, but other methods, protocols, and even in-VM code bindings are definitely called for. With the CMDA, we are now in a much better position to do these things.

Another idea that keeps popping up in CMDA discussions is, can an object be its own content model? Or from a slightly different angle: Can a content model play the role of a Data Object, and thus act as a template? Also, what about multiple content models per object? Inheritance?

Blue Skies - CC Licensed - by Sybren Stüvel - http://www.flickr.com/photos/sybrenstuvel/520362534/
These questions hit on design, implementation, and best practices issues, all of which we are now in a much better position to discuss with the release of 3.0b1. I'm looking forward to it.