Jena2 reification API proposal


1 introduction
  1.1 status
  1.2 context
2 presentation API
  2.1 retrieval
  2.2 creation
  2.3 equality
  2.4 isReified
  2.5 fetching
  2.6 listing
  2.7 removal
  2.8 input and output
3 performance

1 introduction

1.1 status

This document describes the reification API in Jena2, following discussions based on the 0.5a document. The essential decision made during those discussions is that reification triples are captured and dealt with by the Model transparently and appropriately.

1.2 context

The first Jena implementation made some attempt to optimise the representation of reification. In particular it tried to avoid so-called 'triple bloat', i.e. requiring four triples to represent the reification of a statement. The approach taken was to make Statement a subclass of Resource so that properties could be directly attached to statement objects.
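
To make 'triple bloat' concrete, here is a minimal sketch of the four triples that the standard RDF reification vocabulary needs for a single statement. The class ReificationQuad and its expand method are hypothetical illustrations, not part of Jena; triples are modelled as plain string arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jena: expands one triple into the four
// triples of its RDF reification quad.
final class ReificationQuad {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Each triple is a String[3] of {subject, predicate, object}.
    // 'node' is the resource (URI or blank-node label) that reifies (s, p, o).
    static List<String[]> expand(String node, String s, String p, String o) {
        List<String[]> quad = new ArrayList<String[]>();
        quad.add(new String[] { node, RDF + "type", RDF + "Statement" });
        quad.add(new String[] { node, RDF + "subject", s });
        quad.add(new String[] { node, RDF + "predicate", p });
        quad.add(new String[] { node, RDF + "object", o });
        return quad;
    }
}
```

One reified statement therefore costs four extra triples if nothing is done to optimise its storage.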

There are a number of defects in the Jena 1 approach.

However, there are some supporters of the approach. They liked:

Since Jena was first written the RDFCore WG have clarified the meaning of a reified statement. Whilst Jena 1 took a reified statement to denote a statement, RDFCore have decided that a reified statement denotes an occurrence of a statement, otherwise called a stating. The Jena 1 .equals() method for Statements is thus inappropriate for comparing reified statements.

The goals of reification support in the Jena 2 implementation are:

2 presentation API

Statement will no longer be a subclass of Resource. Thus a statement may not be used where a resource is expected. Instead, a new interface ReifiedStatement will be defined:

public interface ReifiedStatement extends Resource
    {
    public Statement getStatement();
    // could call it a day at that or could duplicate convenience
    // methods from Statement, eg getSubject(), getInt().
    ...
    }

The Statement interface will be extended with the following methods:

public interface Statement
    ...
    public ReifiedStatement createReifiedStatement();
    public ReifiedStatement createReifiedStatement(String URI);
/* */
    public boolean isReified();
    public ReifiedStatement getAnyReifiedStatement();
/* */
    public RSIterator listReifiedStatements();
/* */
    public void removeAllReifications();
    ...

RSIterator is a new iterator that returns ReifiedStatements. It is an extension of ResourceIterator.

The Model interface will be extended with the following methods:

public interface Model
    ...
    public ReifiedStatement createReifiedStatement(Statement stmt);
    public ReifiedStatement createReifiedStatement(String URI, Statement stmt);
/* */
    public boolean isReified(Statement st);
    public ReifiedStatement getAnyReifiedStatement(Statement stmt);
/* */
    public RSIterator listReifiedStatements();
    public RSIterator listReifiedStatements(Statement stmt);
/* */
    public void removeReifiedStatement(ReifiedStatement rs);
    public void removeAllReifications(Statement st);
    ...

The methods in Statement are defined to be the obvious calls of methods in Model. The interaction of those methods is expressed below. Reification operates over statements in the model which use predicates rdf:subject, rdf:predicate, rdf:object, and rdf:type with object rdf:Statement.

Statements with those predicates are, by default, invisible. They do not appear in calls of listStatements, contains, or uses of the Query mechanism. Adding them to the model will not affect size(). Models that do not hide reification quads will also be available.
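
The hiding rule can be sketched with a toy model. HidingModel below is a hypothetical stand-in, not Jena's implementation: triples are string arrays with prefixed names, and only listStatements and size() are modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a model that hides reification triples (not Jena code).
// Triples are String[3] of {subject, predicate, object}.
final class HidingModel {
    private final List<String[]> triples = new ArrayList<String[]>();

    void add(String s, String p, String o) {
        triples.add(new String[] { s, p, o });
    }

    // A triple is a reification triple if it uses rdf:subject, rdf:predicate
    // or rdf:object, or if it is rdf:type with object rdf:Statement.
    static boolean isReificationTriple(String[] t) {
        String p = t[1];
        return p.equals("rdf:subject")
            || p.equals("rdf:predicate")
            || p.equals("rdf:object")
            || (p.equals("rdf:type") && t[2].equals("rdf:Statement"));
    }

    // Only the non-reification triples are listed ...
    List<String[]> listStatements() {
        List<String[]> visible = new ArrayList<String[]>();
        for (String[] t : triples) {
            if (!isReificationTriple(t)) {
                visible.add(t);
            }
        }
        return visible;
    }

    // ... and only they count towards the size.
    int size() {
        return listStatements().size();
    }
}
```

Note that an ordinary rdf:type triple, whose object is something other than rdf:Statement, stays visible in this sketch.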

2.1 retrieval

The Model::as() mechanism will allow the retrieval of reified statements.

someResource.as( ReifiedStatement.class )

If someResource has an associated reification quad, then this will deliver an instance rs of ReifiedStatement such that rs.getStatement() will be the statement rs reifies. Otherwise a DoesNotReifyException will be thrown. (Use the predicate canAs() to test if the conversion is possible.)

It does not matter how the quad components have arrived in the model; explicitly asserted or by the create mechanisms described below. If quad components are removed from the model, existing ReifiedStatement objects will continue to function, but conversions using as() will fail.
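
The conversion rule can be illustrated with a small self-contained sketch. QuadDetector, canReify and asReified are hypothetical names standing in for the canAs()/as() behaviour described above, and the nested DoesNotReifyException is a local stand-in for the exception the proposal names. The sketch assumes each resource carries at most one value per quad component and simply takes the first it finds.

```java
import java.util.ArrayList;
import java.util.List;

// Toy detector for reification quads (not Jena code).
final class QuadDetector {
    // Local stand-in for the DoesNotReifyException named in the proposal.
    static final class DoesNotReifyException extends RuntimeException {
        DoesNotReifyException(String node) {
            super(node + " has no complete reification quad");
        }
    }

    private final List<String[]> triples = new ArrayList<String[]>();

    // It does not matter how the components arrive; they are just triples.
    void add(String s, String p, String o) {
        triples.add(new String[] { s, p, o });
    }

    // Object of the first (node, predicate, ?) triple, or null if there is
    // none. If 'wanted' is non-null the object must equal it.
    private String objectOf(String node, String predicate, String wanted) {
        for (String[] t : triples) {
            if (t[0].equals(node) && t[1].equals(predicate)
                    && (wanted == null || t[2].equals(wanted))) {
                return t[2];
            }
        }
        return null;
    }

    // Plays the role of canAs(): true iff all four components are present.
    boolean canReify(String node) {
        return objectOf(node, "rdf:type", "rdf:Statement") != null
            && objectOf(node, "rdf:subject", null) != null
            && objectOf(node, "rdf:predicate", null) != null
            && objectOf(node, "rdf:object", null) != null;
    }

    // Plays the role of as(): returns the reified triple or throws.
    String[] asReified(String node) {
        if (!canReify(node)) {
            throw new DoesNotReifyException(node);
        }
        return new String[] {
            objectOf(node, "rdf:subject", null),
            objectOf(node, "rdf:predicate", null),
            objectOf(node, "rdf:object", null)
        };
    }
}
```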

2.2 creation

createReifiedStatement(Statement stmt) creates a new ReifiedStatement object that reifies stmt; the appropriate quads are inserted into the model. The resulting resource is a blank node.

createReifiedStatement(String URI, Statement stmt) creates a new ReifiedStatement object that reifies stmt; the appropriate quads are inserted into the model. The resulting resource is a Resource with the URI given.

2.3 equality

Two reified statements are .equals() iff they reify the same statement and have .equals() resources. Thus it is possible for equal Statements to have unequal reifications.
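
A minimal sketch of this equality rule, using a hypothetical ToyReifiedStatement class in which both the reifying resource and the reified statement are plain strings:

```java
import java.util.Objects;

// Toy reified statement (not Jena code): equal iff both the reifying
// resource and the reified statement are equal.
final class ToyReifiedStatement {
    private final String resource;   // URI or blank-node label
    private final String statement;  // the reified statement, as text

    ToyReifiedStatement(String resource, String statement) {
        this.resource = resource;
        this.statement = statement;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ToyReifiedStatement)) {
            return false;
        }
        ToyReifiedStatement that = (ToyReifiedStatement) other;
        return resource.equals(that.resource) && statement.equals(that.statement);
    }

    @Override
    public int hashCode() {
        return Objects.hash(resource, statement);
    }
}
```

Two reifications of the same statement under different resources compare unequal here, which matches the reading of a reified statement as a stating.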

2.4 isReified

isReified(Statement st) is true iff in the Model of this Statement there is a reification quad for this Statement. It does not matter if the quad was inserted piece-by-piece or all at once using a create method.

2.5 fetching

getAnyReifiedStatement(Statement st) delivers an existing ReifiedStatement object that reifies st, if there is one; otherwise it creates a new one. If there are multiple reifications for st, it is not specified which one will be returned.

2.6 listing

listReifiedStatements() will return an RSIterator which will deliver all the reified statements in the model.

listReifiedStatements( Statement st ) will return an RSIterator which will deliver all the reified statements in the model that reify st.

2.7 removal

removeReifiedStatement(ReifiedStatement rs) will remove the reification rs from the model by removing the reification quad. Other reified statements with different resources will remain.

removeAllReifications(Statement st) will remove all the reifications in this model which reify st.
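
The behaviour described in sections 2.2 and 2.4 to 2.7 can be summarised with a toy registry. ToyReifier is a hypothetical illustration, not Jena's Model: statements and resources are strings, the methods return resource labels instead of ReifiedStatement objects and a list instead of an RSIterator, and the blank-node labelling scheme is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy registry (not Jena code): maps each reifying resource to the
// statement it reifies.
final class ToyReifier {
    private final Map<String, String> reifications = new LinkedHashMap<String, String>();
    private int blankCounter = 0;

    // Reify stmt with a fresh blank node; returns the node's label.
    String createReifiedStatement(String stmt) {
        return createReifiedStatement("_:b" + (blankCounter++), stmt);
    }

    // Reify stmt with the given resource. This toy overwrites any earlier
    // reification held by that resource; the proposal does not cover that case.
    String createReifiedStatement(String resource, String stmt) {
        reifications.put(resource, stmt);
        return resource;
    }

    boolean isReified(String stmt) {
        return reifications.containsValue(stmt);
    }

    // All resources that reify stmt.
    List<String> listReifiedStatements(String stmt) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, String> e : reifications.entrySet()) {
            if (e.getValue().equals(stmt)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // An existing reification of stmt if there is one, otherwise a new one.
    String getAnyReifiedStatement(String stmt) {
        List<String> existing = listReifiedStatements(stmt);
        return existing.isEmpty() ? createReifiedStatement(stmt) : existing.get(0);
    }

    // Removes one reification; others of the same statement remain.
    void removeReifiedStatement(String resource) {
        reifications.remove(resource);
    }

    // Removes every reification of stmt.
    void removeAllReifications(String stmt) {
        Iterator<Map.Entry<String, String>> it = reifications.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().equals(stmt)) {
                it.remove();
            }
        }
    }
}
```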

2.8 input and output

The writers will have access to the complete set of Statements and will be able to write out the quad components.

The readers need have no special machinery, but it would be efficient for them to be able to call createReifiedStatement when detecting a reification.

3 performance

Jena1's "statements as resources" approach avoided triple bloat by not storing the reification quads. How, then, do we avoid triple bloat in Jena2?

The underlying machinery is intended to capture the reification quad components and store them in a form optimised for reification. In particular, in the case where a statement is completely reified, it is expected to store only the implementation representation of the Statement.
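
One way to picture such a store, purely as an assumption and not the design of the companion document, is a map with one entry per reifying resource that collects the quad components as they arrive. FragmentStore and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a store that captures quad components instead of
// keeping them as four ordinary triples. Not the actual Jena2 design.
final class FragmentStore {
    // One slot array per reifying node: {subject, predicate, object, type}.
    // A later value for the same slot overwrites the earlier one (toy limit).
    private final Map<String, String[]> slots = new HashMap<String, String[]>();

    // Captures a reification triple. Returns false if the triple is not a
    // quad component, in which case the caller would store it normally.
    boolean capture(String node, String predicate, String object) {
        int index;
        if (predicate.equals("rdf:subject")) {
            index = 0;
        } else if (predicate.equals("rdf:predicate")) {
            index = 1;
        } else if (predicate.equals("rdf:object")) {
            index = 2;
        } else if (predicate.equals("rdf:type") && object.equals("rdf:Statement")) {
            index = 3;
        } else {
            return false;
        }
        String[] s = slots.get(node);
        if (s == null) {
            s = new String[4];
            slots.put(node, s);
        }
        s[index] = object;
        return true;
    }

    // True once all four components of node's quad have been captured.
    boolean isComplete(String node) {
        String[] s = slots.get(node);
        return s != null && s[0] != null && s[1] != null && s[2] != null && s[3] != null;
    }

    // One entry per reifying node, however many components have arrived.
    int entryCount() {
        return slots.size();
    }
}
```

In this sketch a completely reified statement occupies a single entry rather than four triples, which is the effect the paragraph above aims for.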

createReifiedStatement is expected to bypass the construction and detection of the quad components, so that in the "usual case" they will never come into existence.

The details of this are described in a companion document.