Your Metadata Sucks

Thursday, December 03, 2009

Dot Plan from 1995

Before the inter-twitter-facebook-blogweb, or whatever you kids call it, there was Finger. Finger was cool because only geeks knew about it. You'd post your status to your .plan file and people anywhere in the world could type "finger some-obscure-userid@some-obscure-host.edu" to see it.

It was like blogging, but with an even slimmer chance of having an audience. Great stuff.

Anyway, I was rooting around my old account at csh tonight and found this in my .plan:

class CS2

creation
    brain_washing

feature -- Global variables

    student: STUDENT
    clean: INTEGER is unique
    warped: INTEGER is unique

feature -- Main program

    brain_washing is
    do
        from
            !!student.make
            student.mind := clean
        until
            student.mind = warped or world.end_of
        loop
            student.io.putstring( "EIFFEL is Good%N" )
            student.io.putstring( "Don't worry that your executables " )
            student.io.putstring( "are usually over 20,000 times larger " )
            student.io.putstring( "than the source code.%N" )
            if student.resists then
                student.attend_lecture
                student.attend_lecture
                student.attend_lecture
                student.attend_lab
            end
        end -- loop
    end -- brain_washing

end -- CS2

Clearly, this is an important digital artifact to preserve.

By posting it here, I feel I have played an important role in format migration for future generations. Thank you.

Friday, September 04, 2009

An extra cent?

It often happens that my flight price goes up while I'm in the process of booking. I thought it was pretty shady the first few times it happened. Now I just accept it and move on. But I thought this one was a little bizarre today:

I can't help but wonder if Peter Gibbons is behind this in some way.

Monday, August 31, 2009

Discovery of content metadata on the web

A thought experiment...

I recently read an entertaining old article on various things people have been shoving into http response headers. Some for utility (X-XRDS-Location), and some for fun (slashdot's random X-Fry and X-Bender quotes). One site actually put a bunch of DC.title, DC.etc headers in their responses. Not that anyone's looking for them there, but *just* in case...

This got me thinking (again) about ways to provide richer metadata, especially RDF, about resources on the web. We have RDFa now, which is a big step forward, but there are a couple key problems we still don't have worked out:

ISSUE 1: How do we discover publisher-sanctioned resource descriptions for arbitrary resources on the web? (e.g., non-XHTML)

I think the http Link: response header is the right way forward on this: An isDescribedBy link, pointing to a resource whose representation encodes an RDF graph describing this resource.
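Here's a rough sketch of what the client side of that might look like: pull the targets of a given relation type out of a Link header value. The relation name "isDescribedBy" just follows the wording above, and the example URLs are made up. The parser is deliberately simplified: it assumes parameter values contain no commas or semicolons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkHeader {

    // One link-value: <target> followed by zero or more ";param" segments.
    // Simplification: parameter values must not contain ',' or ';'.
    private static final Pattern LINK =
        Pattern.compile("<([^>]*)>((?:\\s*;[^,;]*)*)");

    // rel="a b" or rel=a; the value may hold several space-separated types.
    private static final Pattern REL =
        Pattern.compile(";\\s*rel\\s*=\\s*\"?([^\";,]*)\"?", Pattern.CASE_INSENSITIVE);

    /** Returns the targets of all links in headerValue whose rel includes relType. */
    static List<String> targets(String headerValue, String relType) {
        List<String> result = new ArrayList<>();
        Matcher link = LINK.matcher(headerValue);
        while (link.find()) {
            Matcher rel = REL.matcher(link.group(2));
            if (!rel.find()) continue; // link has no rel parameter
            for (String type : rel.group(1).trim().split("\\s+")) {
                if (type.equalsIgnoreCase(relType)) {
                    result.add(link.group(1));
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical header value, as a server might send it for Picture1.
        String header = "<http://example.org/Picture1.meta>; rel=\"isDescribedBy\", "
                      + "<http://example.org/style.css>; rel=stylesheet";
        System.out.println(targets(header, "isDescribedBy"));
    }
}
```

A user-agent would then GET each returned target and parse whatever RDF serialization comes back.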

ISSUE 2: Given that a resource and the content of a representation of that resource are distinct things, how do we make statements about the latter on the web?

This one deserves more explanation.

If I access http://example.org/Picture1, and my browser uses content negotiation to request the image/jpeg representation, and gets it, I want to be able to discover this kind of info:

@prefix    : <http://dear.lazyweb/please/write/this/ontology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix  dc: <http://purl.org/dc/elements/1.1/> .

# The file is a JPEG and here's some basic info about it

_:myFile  a            :OctetStream;
        :name        "Picture1.jpg";
        :mediaType   "image/jpeg";
        :format      <info:pronom/fmt/42>;
        :length      105124;
        :md5sum      "7846df5ced300e9543a267a856c4ab6e";
        :sha1sum     "e3b5112b24e793f41fc5b843a505a83a80aaf776";
        :created     "2009-08-31T10:12.342Z"^^xsd:dateTime;
        :modified    "2009-08-31T16:28.921Z"^^xsd:dateTime;
        :renditionOf <http://example.org/Picture1>.

# The file is one of any number of renditions of a picture

<http://example.org/Picture1>
        dc:title       "Best Picture Ever";
        dc:description "This is a picture of my cat, Lucky";
        dc:creator     "Bob Dobbs".

What would be cool is if my browser knew about the http Link response header, and the metadata was just a click away, in an RDFa document.

The trick would be for user-agents to associate the particular rendition I got by GETting the resource with the appropriate resource in this graph. Notice it's a bNode in the example above. It might have a URI, it might not; either way, the user-agent doesn't know the rendition's URI when it retrieves this graph, and the relation expressed by the http Link header is to be interpreted as "(the resource identified by this URI) isDescribedBy (the graph resource over there)".

So, absent some additional information, in the general case, the user-agent is going to have to do the association via some distinctive property matching: Did the response of the original GET request on the picture include a Content-MD5 header? If so, that's a good clue. Hmmm.
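One wrinkle with that matching: Content-MD5 carries the base64 encoding of the raw digest, while a property like md5sum in the example graph holds hex. A minimal sketch of the comparison, assuming the user-agent already has both strings in hand (the class and method names here are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class RenditionMatcher {

    /** Converts a Content-MD5 header value (base64 of the raw digest) to lowercase hex. */
    static String contentMd5ToHex(String headerValue) {
        byte[] digest = Base64.getDecoder().decode(headerValue.trim());
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** True if the header names the same digest as the hex md5sum from the graph. */
    static boolean matches(String contentMd5Header, String md5sumFromGraph) {
        return contentMd5ToHex(contentMd5Header).equalsIgnoreCase(md5sumFromGraph.trim());
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Stand-in for the bytes of the representation the server returned.
        byte[] body = "not really a JPEG".getBytes(StandardCharsets.UTF_8);
        byte[] digest = MessageDigest.getInstance("MD5").digest(body);

        // What a server would send in Content-MD5 for that body.
        String header = Base64.getEncoder().encodeToString(digest);
        String hex = contentMd5ToHex(header);
        System.out.println(header + " -> " + hex + " match=" + matches(header, hex));
    }
}
```

A user-agent could run matches() against every node in the graph that has an md5sum and treat a hit as "this is the rendition I just received".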

Monday, May 04, 2009

That's Classy

Here's a simple program to report on Java .class versions. I'm sure some variant of this has been written a thousand times, but Google wouldn't give me what I wanted right away, so here it is again :)

The program takes one argument: a path to a .class file, .jar file, or directory containing a mixture of both, and produces a report of each class file's major .class format version (50 for Java 6, 49 for Java 5, and so on). Handy if you want to track down those newfangled classes and avoid the dreaded java.lang.UnsupportedClassVersionError.


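import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;

public abstract class ThatsClassy {

  // Recurse through directories, dispatching on file extension
  static void classyFile(File file) throws Exception {
    if (file.isDirectory()) {
      File[] children = file.listFiles();
      if (children == null) return; // unreadable directory
      for (File child: children)
        classyFile(child);
    } else if (file.getName().endsWith(".jar")) {
      classyJar(file);
    } else if (file.getName().endsWith(".class")) {
      InputStream in = new FileInputStream(file);
      try {
        classyClass(file.getPath(), in);
      } finally {
        in.close();
      }
    }
  }

  // Report on each .class entry in a jar. After getNextJarEntry(), the
  // jar stream is positioned at the start of that entry's data.
  static void classyJar(File jarFile) throws Exception {
    JarInputStream jarStream = new JarInputStream(new FileInputStream(jarFile));
    try {
      JarEntry entry = jarStream.getNextJarEntry();
      while (entry != null) {
        if (entry.getName().endsWith(".class"))
          classyClass(jarFile.getName() + "#" + entry.getName(), jarStream);
        entry = jarStream.getNextJarEntry();
      }
    } finally {
      jarStream.close();
    }
  }

  // A class file starts with a 4-byte magic number (0xCAFEBABE),
  // a 2-byte minor version, then a 2-byte major version.
  // The caller owns the stream and is responsible for closing it.
  static void classyClass(String id, InputStream in) throws Exception {
    DataInputStream data = new DataInputStream(in);
    data.readInt();           // magic
    data.readUnsignedShort(); // minor version
    int majorClassVersion = data.readUnsignedShort();
    System.out.println(id + " " + majorClassVersion);
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: java ThatsClassy <class, jar, or directory>");
      System.exit(1);
    }
    classyFile(new File(args[0]));
  }
}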
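
The report prints bare version numbers. If you'd rather see release names, a small lookup like the one below could be bolted on. This is a sketch, not part of the program above; the class and method names are made up. It relies on the fact that from major version 49 (Java 5) onward, each release so far has bumped the major version by one.

```java
class ClassVersionNames {

    /** Maps a class file major version to a Java release label. */
    static String release(int major) {
        switch (major) {
            case 45: return "JDK 1.1";   // also used by JDK 1.0.2
            case 46: return "J2SE 1.2";
            case 47: return "J2SE 1.3";
            case 48: return "J2SE 1.4";
            default:
                // 49 is Java 5, 50 is Java 6, and so on.
                return major >= 49 ? "Java " + (major - 44) : "unknown";
        }
    }

    public static void main(String[] args) {
        for (int major : new int[] {45, 48, 49, 50}) {
            System.out.println(major + " = " + release(major));
        }
    }
}
```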

Monday, February 16, 2009

dev8D Tweet Cloud

Here's my abbreviated trip report: dev8D was a big success -- any conference that gets developers together and avoids long monologues is a winner in my book. As a side note, I think twitter works pretty well as a backchannel. I noticed at least one person created a separate account to avoid spamming their regular followers. Not a bad idea.

Saturday, October 04, 2008

Multi-project Subversion Commit Notification

I recently had to set up commit notification for a repository hosting multiple projects and thought I'd write up my experience here.

There are several ways to set up commit notification in subversion. Each involves the use of the post-commit hook. Here's how it works: After the subversion repository successfully commits a change, if it finds an executable file, /path/to/svn/hooks/post-commit, it will be invoked with two arguments. The first is the path to the repository, and the second is the revision number of the commit.

The content of post-commit can be whatever you want. In practice, most people make it a shell script that just invokes a utility like svnnotify to get things done.

Since the repository I was working on is hosting multiple projects (à la Apache), each top-level project has its own codewatch mailing list. I don't want to spam each project with every change to unrelated projects in the repository. So, based on arguments passed to post-commit, I had to start by determining which project the change was relevant to. I used the svnlook utility for this, like so:

# Get the first top-level directory changed by the commit
# Note: svnlook's dirs-changed output is multi-line, and
#       each line looks like "projname/trunk/etc"
PROJ=$(/usr/bin/svnlook dirs-changed -r "$2" "$1" | head -n 1 | sed -e 's/\/.*//')
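
Taking only the first line means a commit that touches several projects notifies just one of them. A possible variation, sketched here in Java rather than shell, collects every distinct top-level directory from the dirs-changed output. It assumes the output has already been captured as a string; the class name and the sample project names are made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class ChangedProjects {

    /** Returns the distinct top-level directories in dirs-changed output, in first-seen order. */
    static Set<String> fromDirsChanged(String dirsChangedOutput) {
        Set<String> projects = new LinkedHashSet<>();
        for (String line : dirsChangedOutput.split("\\r?\\n")) {
            String path = line.trim();
            if (path.startsWith("/")) path = path.substring(1);
            if (path.isEmpty()) continue; // blank line or the repository root
            int slash = path.indexOf('/');
            projects.add(slash < 0 ? path : path.substring(0, slash));
        }
        return projects;
    }

    public static void main(String[] args) {
        // Hypothetical output for a commit touching two projects.
        String output = "alpha/trunk/src/\nbeta/trunk/\nalpha/tags/\n";
        System.out.println(fromDirsChanged(output));
    }
}
```

Each name in the result would then map to its own mailing list.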

Once I had that information, the rest was straightforward. Here's the whole script.

Saturday, July 12, 2008

Fedora Commons Repository - Lines of Code

We're wrapping up our last branches before the 3.0 final code freeze. I got curious last night about how the maintenance branch (2.2.x line) and the trunk (3.0 line) compared in terms of lines of code.

So I decided to pull up the archives of past releases and do a per-release comparison of everything under src/java/fedora. Here's what LocMetrics and Gnuplot told me:

It's hard to draw any definitive conclusions about the SLOC metric, but it's safe to say it's directly related to maintenance cost. And it's interesting to see how certain features / architectural changes affect it.