Sunday, January 28, 2007

Resources, Representations, Repositories, and RDF

Last week, Carl Lagoze gave an update on the OAI-ORE work at Open Repositories '07. ORE is a new project that intends to specify how heterogeneous repositories can exchange information about the digital objects they hold. Although the project isn't necessarily going after a new protocol, I still think of it as taking OAI-PMH to the next level. It's not just about metadata anymore.

For me, the most interesting parts of the talk were webarch-related. It all started with the statement (to paraphrase) "we must build on the web architecture". Carl then pointed out how representations are essentially second-class citizens on the web.

That got me thinking. At the most basic level, repositories are all about managing bitstreams (whether they're considered data or metadata). In webarch, bitstreams seem to equate to what the W3C calls "representation data". And a representation is defined by how it relates to a resource:
"A representation is data that encodes information about resource state."
So, in W3C-speak, a repository manages representation data. Okay, that's just a terminology change. But what about this statement:
"For robustness, Web architecture promotes independence between an identifier and the state of the identified resource."
That makes a whole lot of sense for the web when you consider how often web pages change. But what does it mean for repositories? How do we manage bitstreams if we can't identify them? The answer must be one of the following:
  • Indirect identification. Identify the associated "resource" in order to work with the bitstream(s).
  • Reification. Elevate the bitstream to a "resource" so we can talk about it.
What if we want to model the repository as an RDF graph? Well, we know that representations can have metadata in addition to the payload. Every RDF statement needs a subject, so to say anything about a representation we have to give it a name of its own. In other words, we need to reify. Internal to a repository, representation triples might look something like:

representationA represents urn:example:someTextFile
representationA contentType "text/plain"
representationA payloadLocation "/path/to/someTextFile.txt"
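
To make that a bit more concrete, here is a rough Python sketch of the same three triples held as plain tuples, with two small lookups over them. This is only an illustration: the helper names (objects, representations_of) are made up for this post, the predicates are bare strings rather than real RDF URIs, and none of it comes from the ORE work.

```python
# Hypothetical predicate names, copied from the example triples above.
REPRESENTS = "represents"
CONTENT_TYPE = "contentType"
PAYLOAD_LOCATION = "payloadLocation"

# The graph is just a set of (subject, predicate, object) tuples.
# "representationA" is the reified bitstream: it has its own name,
# so we can attach metadata to it.
triples = {
    ("representationA", REPRESENTS, "urn:example:someTextFile"),
    ("representationA", CONTENT_TYPE, "text/plain"),
    ("representationA", PAYLOAD_LOCATION, "/path/to/someTextFile.txt"),
}


def objects(graph, subject, predicate):
    """Return every object o where (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def representations_of(graph, resource):
    """Return the names of the representations that point at a resource."""
    return sorted(s for s, p, o in graph if p == REPRESENTS and o == resource)


# Start from the resource, find its representations, then read their metadata.
for rep in representations_of(triples, "urn:example:someTextFile"):
    print(rep,
          objects(triples, rep, CONTENT_TYPE),
          objects(triples, rep, PAYLOAD_LOCATION))
```

Note that the lookup starts from the resource identifier and only then reaches the bitstream, so the two options in the list above aren't mutually exclusive: once the representation has a name, you can get to it either way.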

I think the OAI-ORE work is going to attempt something like the above: a model (and maybe a format?) for expressing resource-representation information in a repository-neutral way. It will be interesting to see what pops out.