A crucial component of the app I’m developing for Science Hack Day 2010 involves annotation of XML+MathML documents directly in the browser. The ultimate goal of annotation is to provide enough metadata in the document to fully and accurately describe its contents. In particular, this means each document should have metadata about the various sections within it, the types of section (definitions, lemmas, theorems, proofs, &c.), the proof technique(s) used and a statement about whether or not the conclusion of the proof actually holds given the arguments used. It’s important to be clear about the level of detail I’m aiming to capture: I just want to mark up technique names and not their constituent steps.
There are actually two types of annotation I will need to support:
- RDF Metadata Annotations
- OWL Ontology Building Annotations
In the first case, a user marks up a document with RDF metadata to describe the proof techniques the author has used, based on a currently known description of each technique from the proof techniques ontology. In the latter case, the user marks up the document to indicate the salient parts of a technique an author has used, usually when a new technique is encountered for which there is no adequate description in the current ontology.
In both cases it’s important that a user should be able to select any arbitrary subset of the document, i.e. not just those segmented by the pre-existing tags supplied by the original document author. For example, the document might contain a large paragraph of text containing several distinct lines of argument but without any XHTML markup within it.
Trying to get my head around annotations consumed a large proportion of my time at HackCamp last week, and so over the last few days I’ve done quite a bit of research about various frameworks that might be useful for this project. A good overview of semantic web annotations can be found here. An important note from that article is that annotation is the most labour-intensive part of the process, and so without an effective collaboration space and the tools to make it work it can become a bottleneck in the system. For this reason I feel it’s crucial that users should be able to do annotations in the browser without needing to install any third-party program or add-on.
Another issue raised in that article is about storage of annotations. Given the end goal of annotations stated at the start of this blog post, there are several ways of achieving this. This means I should design my system in such a way that it will accommodate any other (hopefully more effective) way of adding metadata as the system evolves. For now, however, I have made the design decision that the system will not be automated but will crowdsource these annotations by allowing per-user annotations for each document in the system. The alternative would be a wiki-style set of annotations, i.e. each page would have a single (current) set of annotations which would be version-controlled. The advantage of my proposed scheme is that one can determine whether or not a particular theorem or lemma was validly proved by a majority vote rather than by a single continuously flip-flopping flag in the current annotation for that section.
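To make the majority-vote idea concrete, here is a minimal sketch in Python. The record shape (`ValidityVote`), the section identifiers and the tie-handling rule are all my own assumptions for illustration, not part of any existing system; in particular, I assume each user's most recent vote on a section replaces their earlier ones.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidityVote:
    user: str      # who made the annotation
    section: str   # hypothetical section identifier, e.g. "thm-2.1"
    valid: bool    # does this user think the proof holds?


def majority_verdict(votes, section):
    """Return True or False for the majority view, or None on a tie or no votes."""
    # Keep only each user's latest vote for this section (later entries win).
    latest = {}
    for vote in votes:
        if vote.section == section:
            latest[vote.user] = vote.valid
    tally = Counter(latest.values())
    if tally[True] == tally[False]:
        return None
    return tally[True] > tally[False]


votes = [
    ValidityVote("alice", "thm-2.1", True),
    ValidityVote("bob", "thm-2.1", False),
    ValidityVote("carol", "thm-2.1", True),
    ValidityVote("bob", "lemma-1", False),
]
print(majority_verdict(votes, "thm-2.1"))
print(majority_verdict(votes, "lemma-1"))
```

Because every user's opinion is kept, the verdict can be recomputed at any time, and a single dissenting user cannot flip the displayed status on their own.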
This implies a separation of the annotations from the document, although certain views of the document (such as the set of annotations from a specific user) will be served from the database with inline RDFa by applying various queries and style sheets to the underlying data. From this point of view the Annotea Protocol is ideal as a basis for storage of annotations in this project.
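As a rough illustration of keeping annotations separate from the document and building a per-user view by query, the following sketch uses Python's built-in `sqlite3` module. The table layout, column names and example URIs are hypothetical; a real Annotea-style store would hold RDF rather than flat rows, and the RDFa rendering step is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE annotation (
           id INTEGER PRIMARY KEY,
           author TEXT NOT NULL,     -- the user who made the annotation
           document TEXT NOT NULL,   -- URI of the annotated document
           context TEXT NOT NULL,    -- pointer into the document
           body TEXT NOT NULL        -- the annotation content
       )"""
)
conn.executemany(
    "INSERT INTO annotation (author, document, context, body) VALUES (?, ?, ?, ?)",
    [
        ("alice", "http://example.org/paper.xml", "xpointer(/article[1]/section[1]/p[2])", "induction"),
        ("bob", "http://example.org/paper.xml", "xpointer(/article[1]/section[1]/p[2])", "contradiction"),
        ("alice", "http://example.org/paper.xml", "xpointer(/article[1]/section[2]/p[1])", "diagonalisation"),
    ],
)


def annotations_for(conn, document, author):
    """Return (context, body) pairs for one user's view of one document."""
    cursor = conn.execute(
        "SELECT context, body FROM annotation "
        "WHERE document = ? AND author = ? ORDER BY id",
        (document, author),
    )
    return cursor.fetchall()


for context, body in annotations_for(conn, "http://example.org/paper.xml", "alice"):
    print(context, "->", body)
```

The document itself is never modified; each view is assembled on demand from the stored rows.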
I have yet to come across an implementation of an Annotea-compliant annotation server I can deploy locally (though there is the W3C test server), so this is potentially something I’ll have to build myself. In contrast, there are many examples of semantic web annotation clients. A list of these implementations can be found here and I shall describe below some of the more interesting ones.
At the heart of the Annotea protocol are the annotations themselves, which are described using RDF and XPointer. XPointer is a crucial technology for the sort of free-form annotations I’m looking to build into my app since it captures such positional specifications very well. (For instance, compare with the JISC-funded RDF-based annotation system Caboto, which appears to allow only Delicious-style whole-page annotations.) Thus a client must have some form of access to information that allows it to specify positions using XPointer.
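To give a feel for what such an annotation might look like, here is a small Python sketch that assembles an RDF/XML description with the standard library. The namespace URIs and property names (`annotates`, `context`, `body`) are written from my reading of the Annotea schema and should be checked against the specification; the function name, document URI, XPointer and body URI are purely illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ANN = "http://www.w3.org/2000/10/annotation-ns#"  # assumed Annotea namespace
DC = "http://purl.org/dc/elements/1.1/"

for prefix, uri in (("rdf", RDF), ("a", ANN), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def build_annotation(document_uri, xpointer, creator, body_uri):
    """Serialise one annotation as RDF/XML (a sketch, not a validated message)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description")
    ET.SubElement(desc, f"{{{RDF}}}type", {f"{{{RDF}}}resource": ANN + "Annotation"})
    # The document being annotated.
    ET.SubElement(desc, f"{{{ANN}}}annotates", {f"{{{RDF}}}resource": document_uri})
    # Where in the document the annotation applies: document URI plus XPointer.
    context = ET.SubElement(desc, f"{{{ANN}}}context")
    context.text = f"{document_uri}#{xpointer}"
    author = ET.SubElement(desc, f"{{{DC}}}creator")
    author.text = creator
    # The annotation content lives at its own URI.
    ET.SubElement(desc, f"{{{ANN}}}body", {f"{{{RDF}}}resource": body_uri})
    return ET.tostring(root, encoding="unicode")


print(build_annotation(
    "http://example.org/paper.xml",
    'xpointer(id("thm1")/p[2])',
    "alice",
    "http://example.org/notes/1",
))
```

The important part is the `context` value: the XPointer fragment is what lets the annotation target a position inside the document rather than the page as a whole.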
I will now briefly describe some of the systems that didn’t make the cut. Amaya has native support for Annotea-compliant annotation servers and XPointer but is a stand-alone package. Similarly, OntoMat-Annotizer is a Java-based web browser that provides annotation support focussed specifically on ontologies. One framework whose aims are very close to mine for this project is sClippy. However, their implementation is an Eclipse plugin for which no source has been released. Also, their annotations are inline with the text being annotated and not in a separate Annotea-compliant database.
- COHSE Annotator is for Microsoft Internet Explorer. It includes an example annotations server, although it’s not clear to what extent it supports Annotea.
- Annozilla Firefox add-on supports the Annotea protocol and builds upon XPointerLib.
- Yawas (Yet Another Web Annotations System) is an add-on for Firefox/Chrome that uses Google Bookmarks as an annotation server.
- The Semantic Web Widget Library includes an annotator example. The author’s (Jim Hollenbach’s) approach to the annotator is described in his thesis, which was recently submitted. As such the code is rather difficult to work with at present, although there is some documentation available. Annotation storage is to a custom annotation server since the aim was to try out certain security features.
One other possibility is to fake XPointers by being less precise, given a user selection, about the range selected. This could be implemented in several browsers using the DOM Range capability together with browser interaction objects.
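One way to read "less precise" is to point at the smallest element containing the selection rather than at exact character offsets. The sketch below shows that logic in Python on a parsed XML tree; in the browser the starting element would presumably come from the selection's DOM Range (for example its `commonAncestorContainer`). The function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def coarse_pointer(root, target):
    """Return an xpointer() expression addressing `target`, or None if absent."""
    def walk(node, path):
        if node is target:
            return path
        counts = {}  # position of each child among siblings with the same tag
        for child in node:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            found = walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
            if found is not None:
                return found
        return None

    steps = walk(root, f"/{root.tag}[1]")
    return None if steps is None else f"xpointer({steps})"


doc = ET.fromstring(
    "<article><section>"
    "<p>First.</p><p>Several lines of argument, no markup.</p>"
    "</section></article>"
)
# Stand-in for the element that contains the user's selection.
selected = doc.find("section")[1]
print(coarse_pointer(doc, selected))  # xpointer(/article[1]/section[1]/p[2])
```

The trade-off is that two distinct arguments inside the same paragraph would share one pointer, so this only approximates the free-form selection described earlier.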
The survey article I mentioned above goes on to discuss ontology evolution (i.e. version control for ontologies). As I’ve indicated above, the ability to identify new proof techniques and add them to the ontology is a fundamental part of the system, so in a later blog post I shall describe ways of doing version control for ontologies.