Your Metadata Sucks: Discovery of *content* metadata on the web

A thought experiment...

I recently read an entertaining old article on various things people have been shoving into http response headers. Some for utility (X-XRDS-Location), and some for fun (slashdot's random X-Fry and X-Bender quotes). One site actually put a bunch of DC.title, DC.etc headers in their responses. Not that anyone's looking for them there, but *just* in case...

This got me thinking (again) about ways to provide richer metadata, especially RDF, about resources on the web. We have RDFa now, which is a big step forward, but there are a couple key problems we still don't have worked out:

ISSUE 1: How do we discover publisher-sanctioned resource descriptions for arbitrary resources on the web? (e.g., non-XHTML)

I think the http Link: response header is the right way forward on this: An isDescribedBy link, pointing to a resource whose representation encodes an RDF graph describing this resource.

ISSUE 2: Given that a resource and the content of a representation of that resource are distinct things, how do we make statements about the latter on the web?

This one deserves more explanation.

If I access http://example.org/Picture1, and my browser uses content negotiation to request the image/jpeg representation, and gets it, I want to be able to discover this kind of info:

@prefix    : <http://dear.lazyweb/please/write/this/ontology/>
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>

# The file is a JPEG and here's some basic info about it

_:myFile  a            :OctetStream;
        :name        "Picture1.jpg";
        :mediaType   "image/jpeg";
        :format      <info:pronom/fmt/42>
        :length      105124;
        :md5sum      "7846df5ced300e9543a267a856c4ab6e";
        :sha1sum     "e3b5112b24e793f41fc5b843a505a83a80aaf776";
        :created     "2009-08-31T10:12.342Z"^^xsd:dateTime;
        :modified    "2009-08-31T16:28.921Z"^^xsd:dateTime;
        :renditionOf <http://example.org/someImage>

# The file is one of any number of renditions of a picture

<http://example.org/Picture1>
        dc:title       "Best Picture Ever";
        dc:description "This is a picture of my cat, Lucky"
        dc:creator     "Bob Dobbs".

What would be cool is if my browser knew about the http Link response header, and the metadata was just a click away, in an RDFa document.

The trick would be for user-agents to be able to associate the particular rendition I got by GETting the resource with the appropriate resource in this graph. Notice it's a bNode in the example above. It might have a URI, it might not; but the URI of the rendition isn't known by the user-agent when it retrieves this graph....and the relation expressed by the http Link header is to be interpreted as "(the resource identified by this URI) isDescribedBy (the graph resource over there)"

So, absent some additional information, in the general case, the user-agent is going to have to do the association via some distinctive property matching: Did the response of the original GET request on the picture include a Content-MD5 header? If so, that's a good clue. Hmmm.

Your Metadata Sucks

Monday, August 31, 2009

Discovery of content metadata on the web

No comments:

About Me

Me Elsewhere

My GPG Fingerprints

History