
Index

General

RDF model API

Reasoner and inference models

Ontology API

Database and persistence

XML serialisation (reading and writing)

SPARQL and query processing

See the ARQ documentation and ARQ FAQ. See also the ARQ web site for new versions of ARQ.

Answers

General

Q: Why do I get a java.lang.NoClassDefFoundError when I run a Jena application?
A: This means that one or more of the libraries that Jena depends on is not on your classpath. Typically, all of the libraries (.jar files) in $JENA/lib, where $JENA refers to the directory in which you installed Jena, should be on your classpath. Consult the documentation for your JDK for details on setting the classpath on your system; there are also a number of on-line tutorials on setting the Java classpath.

Q: What does the error 'java.lang.NoSuchFieldError: actualValueType' mean?
A: This is almost always due to using the wrong version of the Xerces library. Jena makes use of XML Schema support that changed at Xerces 2.6.0 and is not compatible with earlier versions. At the time of writing, Jena ships with Xerces 2.6.1.
In some situations your runtime environment may be picking up an earlier version of Xerces from an "endorsed" directory, and you will need to either disable use of that endorsed library or replace it with a more up-to-date version of Xerces. This occurs with Tomcat 5.x and certain configurations of JDK 1.4.1.

Q: What versions of library jars does Jena require?
A: Jena makes use of several third-party Java libraries. A copy of each of these is included in the $JENA/lib area of the distribution, and we recommend including all of these jars in your classpath. In some circumstances an application already makes use of a specific version of one of these libraries (e.g. Xerces), in which case you should check whether the version you are using is compatible with the one shipped with Jena. The current library versions used by Jena are given here.

Q: How do I get the most up-to-date version of Jena?
A: Released versions of Jena are available from the downloads page on SourceForge. However, there may be changes and bug fixes that have been added to the Jena codebase that are not yet available as part of a release. To get the most up-to-date version of Jena, download the source code from CVS (instructions for this are available here). To compile the source code and generate a new jena.jar file, Ant must be installed and on the path. The command to build Jena is the default Ant target, so it is only necessary to cd to the Jena root directory, which should contain the file build.xml, and issue the command ant.

To test that the newly compiled version of Jena is working correctly, run the script test.bat (Windows) or test.sh (Linux or Cygwin), which runs the full suite of Jena regression tests. There should be no test failures. Note, however, that non-release versions of Jena may not be as fully tested and stable as the formal releases.

Q: How do I reduce the space needed to deploy Jena (give Jena a smaller footprint)?
A: Jena attempts to follow the various relevant specifications very closely, which means that both Jena itself, and the libraries (.jar files) it depends on, are quite large. For example, the icu4j library assists with the correct interpretation of URIs encoded in international character sets. For many applications the size of the Jena deployment is not a problem. There are some circumstances, however, where reducing the size of the installed libraries is important - for example, when installing a semantic web application on a mobile device. To reduce the storage space required for Jena itself you can build a different version of jena.jar, using the command:

    ant jar-optimised

This builds a version of Jena with no symbols or other debugging information (see above for instructions on using ant). Note that this will make error stacktraces less informative.

The library jenatest.jar contains Jena's unit test suite; it does not have to be included when deploying a Jena application. Other libraries from the Jena lib directory may be left out, with caution, if the functionality of that library is not required. For example, if the application is only going to handle ASCII text, it should not be necessary to install the icu4j library. However, users should be aware that the only supported configuration of Jena is to deploy with all of the .jar files from the lib/ directory, optionally omitting jenatest.jar and/or substituting the optimised jena.jar. Other configurations may well work, but are at the user's own risk.

RDF model API

Q. Why does the localname part of my URI look wrong?
A: In Jena it is possible to retrieve the localname part of a Resource or Property URI. Sometimes developers create Resources with a full URI reference but find that the result of a getLocalName call is not quite what they expected. This is usually because the URI is ill-formed or cannot be split in the way you expected. The only reason for separating namespace and local name is to support the XML serialisation, in which qnames are used for properties and classes. Thus the main requirement of the split is that the localname component must be a legal XML NCName. This means it must start with a letter or _ character and can only contain limited punctuation. In particular, it can't contain spaces, but then spaces are not legal in URI references anyway. In general, it is best not to use the localname split to encode any information; you should only be concerned with it if you are coding a parser or writer.
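
For example, here is a minimal sketch (the URIs are invented for illustration) showing where Jena chooses the split point; a local part that would not be a legal NCName, such as one starting with a digit, causes the split to land somewhere other than the '#':

  import com.hp.hpl.jena.rdf.model.Model;
  import com.hp.hpl.jena.rdf.model.ModelFactory;
  import com.hp.hpl.jena.rdf.model.Resource;

  public class LocalNameExample {
      public static void main(String[] args) {
          Model m = ModelFactory.createDefaultModel();

          // The local part is a legal NCName, so the split falls at the '#'
          Resource r1 = m.createResource("http://example.org/vocab#item42");
          System.out.println(r1.getNameSpace() + " | " + r1.getLocalName());

          // A local part starting with a digit is not a legal NCName, so the
          // split point moves and getLocalName() may not return what you expect
          Resource r2 = m.createResource("http://example.org/vocab#42item");
          System.out.println(r2.getNameSpace() + " | " + r2.getLocalName());
      }
  }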

Q. How do I change the URI or localName of a Resource?
A: In Jena, the URI of a resource is invariant. So there is no setLocalName() or setURI() method, and there will never be one.
The only way to "rename" a resource is to remove all of the statements that mention resource R and add new statements with R replaced by R'. A utility for doing this is provided:
  com.hp.hpl.jena.util.ResourceUtils.renameResource()
If you are working with inference or ontology models, you need to be careful to do this on the base model, not the entailment (aka inference) model.
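
A minimal sketch of using the utility (the URIs are invented; renameResource copies every statement mentioning the old resource onto the new URI and removes the originals):

  import com.hp.hpl.jena.rdf.model.Model;
  import com.hp.hpl.jena.rdf.model.ModelFactory;
  import com.hp.hpl.jena.rdf.model.Resource;
  import com.hp.hpl.jena.util.ResourceUtils;
  import com.hp.hpl.jena.vocabulary.RDFS;

  public class RenameExample {
      public static void main(String[] args) {
          Model m = ModelFactory.createDefaultModel();
          Resource original = m.createResource("http://example.org/old#Thing");
          original.addProperty(RDFS.label, "a thing");

          // All statements mentioning the old URI now use the new one
          Resource renamed =
              ResourceUtils.renameResource(original, "http://example.org/new#Thing");

          m.write(System.out, "N-TRIPLE");
      }
  }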

Q: Why do I see the warning message 'Detected a NaN anomaly believed to be due to use of JDK 1.4.1'?
A: You're using an obsolete version of Jena with JDK 1.4.1. As a side effect of some changes post-Jena 2.1 we started seeing random error messages of the form 'Illegal load factor: NaN' when creating a HashMap. This appears to be a JDK bug, in that the call is perfectly legal; the error is seen frequently under JDK 1.4.1 but has never been seen under 1.3.1 or 1.4.2. We provided a workaround which simply tries again to create the HashMap, and this log message indicates that the workaround has been triggered. Later versions of Jena ceased using that code.

To prevent it occurring, switch to JDK 1.4.2 or later and upgrade your copy of Jena. If you continue to see the message, please let us know.

Reasoner and inference models

Q. I want to develop my own rules, how do I get started?
A: The GenericRuleReasoner is the place to start. You can create instances of this reasoner by supplying either an explicit set of Rule objects or a configuration description (as a Jena Model) that points to a local rule file. See the inference documentation for more details:
inference/index.html#rules
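
As a starting point, a sketch along these lines should work (the rule file name is an example; see the inference documentation for the rule syntax and the full GenericRuleReasoner options):

  import java.util.List;

  import com.hp.hpl.jena.rdf.model.InfModel;
  import com.hp.hpl.jena.rdf.model.Model;
  import com.hp.hpl.jena.rdf.model.ModelFactory;
  import com.hp.hpl.jena.reasoner.rulesys.GenericRuleReasoner;
  import com.hp.hpl.jena.reasoner.rulesys.Rule;

  public class MyRulesExample {
      public static void main(String[] args) {
          // Parse the rules from a local file (file name is illustrative only)
          List rules = Rule.rulesFromURL("file:myrules.rules");
          GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
          reasoner.setMode(GenericRuleReasoner.HYBRID);   // or FORWARD / BACKWARD

          Model data = ModelFactory.createDefaultModel();
          // ... read or build your data model here ...

          // Query inf as an ordinary Model; deductions appear alongside the raw data
          InfModel inf = ModelFactory.createInfModel(reasoner, data);
      }
  }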

Q. Why are there two different arrows ( -> and <- ) in the rule syntax?
A: As explained in the documentation, there are two rule systems available - a forward chainer and a backward chainer. You can choose to use either, or use the two together in a hybrid mode.
So if we use Ti as shorthand for triple patterns like (?x rdf:type ?C), and if we ignore functors and procedural call-outs for now, then the syntax:
     T1, T2, ... TN -> T0 .
means that if the triple patterns T1 to TN match in the data set then the triple T0 can be deduced as a consequence. Similarly
     T0 <- T1, T2, ... TN .
means the same thing - the consequence is always on the "pointy" end of the arrow.
Now if you are just using pure forward or backward rules then you could choose to use either syntax interchangeably. This allows you to write a rule set and use it in either mode, though in practice "->" is the more conventional direction in forward systems and "<-" is the more conventional one in backward systems.
The hybrid configuration allows you to create new backward rules as a result of forward rules firing, so the syntax:
     T1, T2 -> [T0 <- T3, T4] .
says that if both T1 and T2 match in the dataset, then the backward rule "[T0 <- T3, T4]" is added, after instantiating any bound variables.

Q. The domain and range inferences look wrong, is that a bug?
A: The way RDFS range and domain declarations work is completely alien to anyone who thinks of RDFS and OWL as being a bit like a type system for a programming language, especially an object-oriented language. Whilst there may be bugs in the inference rule sets, the most common explanation for surprising results when listing inferred domains and ranges is this mismatch in expectations.

Suppose we have three classes such that eg:Man rdfs:subClassOf eg:Person and eg:Person rdfs:subClassOf eg:Animal. Suppose also that we have a property eg:personalName which is declared to have rdfs:domain eg:Person. Now the question is: what other values can be inferred for the rdfs:domain of eg:personalName?

In pure RDFS no additional conclusions can be made. The definition of domain and range is intensional, not extensional: it only works forwards. Declaring <eg:personalName rdfs:domain eg:Person> means that anything to which eg:personalName is applied can be concluded to be of type eg:Person. It does not work backwards - if you somehow knew that all things to which eg:personalName applied were also Foos, you could not conclude that <eg:personalName rdfs:domain Foo>.

However, RDFS permits systems to strengthen the meaning of domain and range to be extensional, so that valid domain and range deductions can be made. OWL makes use of this option. So in OWL, in our example we can also deduce that <eg:personalName rdfs:domain eg:Animal>. If you are used to object-oriented programming this may look wrong. It is tempting, but incorrect, to think of rdfs:domain as meaning "this is the class of objects to which this property can be applied". With that mindset you might expect to find that <eg:personalName rdfs:domain eg:Man>; after all, every eg:Man is an eg:Person, so it is always "legal" to apply eg:personalName to an eg:Man. That is true - it is legal, any eg:Man is allowed to have an eg:personalName - but rdfs:domain does not describe what is legal. The statement <P rdfs:domain C> just means that all things to which P is applied can be inferred to have class C. You can see that if we tried to infer <eg:personalName rdfs:domain eg:Man> then we would start concluding that anything with a name was a man, which is not right: every Man can have a name, but non-Man Persons are also allowed to have names in this example.
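
The forward direction is easy to check with a small experiment. The following sketch (namespace and names invented) asserts only the domain declaration and one personalName statement, and an RDFS reasoner then infers the rdf:type of the subject:

  import com.hp.hpl.jena.rdf.model.InfModel;
  import com.hp.hpl.jena.rdf.model.Model;
  import com.hp.hpl.jena.rdf.model.ModelFactory;
  import com.hp.hpl.jena.rdf.model.Property;
  import com.hp.hpl.jena.rdf.model.Resource;
  import com.hp.hpl.jena.vocabulary.RDF;
  import com.hp.hpl.jena.vocabulary.RDFS;

  public class DomainExample {
      static final String NS = "http://example.org/eg#";   // example namespace

      public static void main(String[] args) {
          Model base = ModelFactory.createDefaultModel();
          Resource person = base.createResource(NS + "Person");
          Property personalName = base.createProperty(NS + "personalName");
          base.add(personalName, RDFS.domain, person);

          Resource alice = base.createResource(NS + "alice");
          alice.addProperty(personalName, "Alice");

          // Domain works forwards: because alice has a personalName,
          // the RDFS reasoner concludes that alice is a Person
          InfModel inf = ModelFactory.createRDFSModel(base);
          System.out.println(inf.contains(alice, RDF.type, person));   // expect true
      }
  }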

Q: Why do I get a warning: Creating OWL rule reasoner working over another OWL rule reasoner
A: If you create an inference graph explicitly from an OWL reasoner or implicitly (by using OntModelSpec.OWL_*_RULE) then it is best if the argument models (data and schema) are plain models. It is easy to accidentally misuse the API and create an inference model working over the results of another inference model. This is a redundancy which significantly affects performance to no useful effect. To help detect this situation we have added a warning message. The best way to stop the message is to change your model construction code so that only the final InfModel/OntModel is specified to use OWL inference. If this is not appropriate for some reason you can disable the check and warning messages using the global flag com.hp.hpl.jena.shared.impl.JenaParameters.enableOWLRuleOverOWLRuleWarnings.
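
For illustration, a sketch of the two constructions (the redundant one is only shown to make the warning's cause concrete):

  import com.hp.hpl.jena.ontology.OntModel;
  import com.hp.hpl.jena.ontology.OntModelSpec;
  import com.hp.hpl.jena.rdf.model.Model;
  import com.hp.hpl.jena.rdf.model.ModelFactory;

  public class LayeringExample {
      public static void main(String[] args) {
          Model schema = ModelFactory.createDefaultModel();   // plain model
          Model data   = ModelFactory.createDefaultModel();   // plain model
          // ... read your schema and data here ...

          // Recommended: only the final OntModel asks for OWL inference
          OntModel good =
              ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_RULE_INF, data);
          good.addSubModel(schema);

          // Triggers the warning: the base model is itself an OWL inference model
          OntModel inner =
              ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_RULE_INF, data);
          OntModel redundant =
              ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_RULE_INF, inner);
      }
  }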

Q: Why do I get out of memory errors when working with the wine ontology?
A: The wine/food ontology is specifically designed to exercise all OWL/DL constructs. The Jena rule-based reasoner only supports the OWL/Lite subset of OWL/Full and has scaling problems with some of the constructs used in the wine ontology. If you need full reasoning support for the wine (or similar) ontologies then use a full DL reasoner such as Pellet, which can be accessed via the DIG interface or directly using the Pellet-provided OntModelSpec. If you only need to do things like traverse the class hierarchy, and inference over RDFS plus the OWL property relations is enough for you, then the OWL Micro reasoner may be an option.

Q: What causes the error "java.lang.UnsupportedOperationException: this is not a URI node" in the DIG reasoner?
A: This is a known problem with the Jena 2.1 release. Please get the latest version of Jena (a later release, if there is one, or get the sources from CVS and build a new copy of jena.jar.)

Q: I want to use my own custom rules to extend an existing RDFS or OWL Schema, what do I do?
A: The easiest way to do this is to define your own rule set. You can use the @include directive at the top of your rules to include the RDFS (or OWL) rules first. Then create a GenericRuleReasoner, which you can use to build an InfModel such as an OntModel (by attaching the reasoner to your OntModelSpec); see the sketch after the guidelines below. See GenericRuleReasoner configuration for an example of how to parse custom rules.

Some important guidelines:

Only use backward rules for your own rules in this configuration, because the RDFS and OWL rule sets use a mix of forward and backward chaining and the rule system architecture is a pure dataflow - the forward rules don't call the backward rules. Thus any forward rules will only see those parts of the RDFS/OWL inferences which are computed forwards. Rather than having to be familiar with those details, it is easiest to simply write your own rules as backward ones.

An alternative is to use a layered architecture - build your generic rule InfModel on top of a separate RDFS/OWL InfModel. That has higher overhead but then your own rules are unrestricted.
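
Here is the sketch referred to above. The rule file name is an example, and it is assumed to begin with an @include directive that pulls in the RDFS (or OWL) rules, followed by your own backward rules:

  import java.util.List;

  import com.hp.hpl.jena.ontology.OntModel;
  import com.hp.hpl.jena.ontology.OntModelSpec;
  import com.hp.hpl.jena.rdf.model.ModelFactory;
  import com.hp.hpl.jena.reasoner.rulesys.GenericRuleReasoner;
  import com.hp.hpl.jena.reasoner.rulesys.Rule;

  public class CustomRuleOntModel {
      public static void main(String[] args) {
          // myrules.rules (name is illustrative) starts with an @include of the
          // RDFS or OWL rule set, followed by your own backward rules
          List rules = Rule.rulesFromURL("file:myrules.rules");
          GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);

          // Attach the reasoner to a copy of a built-in spec, then build the OntModel
          OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);
          spec.setReasoner(reasoner);
          OntModel m = ModelFactory.createOntologyModel(spec, null);
      }
  }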

Ontology API

Q: Why doesn't listClasses() (or listProperties()/listIndividuals(), etc) work?
A: It does work. Extensive unit tests are used to check the correctness of Jena, and are included in the downloaded source code for your reference. If listClasses(), or a similar method, is not producing the answers you expect, or no answers at all, you should first check that your model is correctly defined. Print a copy of your model as a debug step, to see if the URIs match up (e.g., if you are expecting resource x to be an individual of class Y, check that the rdf:type of x is the same as the URI of the class declaration for Y). A common problem is that relative URIs resolve differently depending on where you read the model from. Try adding an xml:base declaration to the document to ensure that URIs are correctly specified.
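
For example, assuming yourOntModel is the model in question, dumping it in a line-per-statement syntax makes the full URIs easy to inspect:

  yourOntModel.write( System.out, "N-TRIPLE" );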

Q: Why doesn't listProperties() return any results when listObjectProperties() (or listDatatypeProperties()) does?
A: The general property-listing method, OntModel.listOntProperties(), returns those resources from the OntModel with rdf:type rdf:Property. Under the OWL semantic theory this is true of owl:ObjectProperty, since ObjectProperty is a sub-class of rdf:Property. However, unless an OWL reasoner is used with an OWL model (or a DAML reasoner with a DAML model, etc), this inferred rdf:type statement is not visible. Therefore, with no reasoner, the OntModel cannot tell that an ObjectProperty is a Property. The solution is to construct the OntModel with an appropriate reasoner. If, for some application reason, using a reasoner is not possible, then users should be prepared to list the various property types separately. Note also the next question.
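
A sketch of the difference (the ontology file name is an example only):

  import com.hp.hpl.jena.ontology.OntModel;
  import com.hp.hpl.jena.ontology.OntModelSpec;
  import com.hp.hpl.jena.rdf.model.ModelFactory;

  public class ListPropertiesExample {
      public static void main(String[] args) {
          // With no reasoner, only explicitly typed rdf:Property resources are seen
          OntModel plain = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
          plain.read("file:myontology.owl");
          System.out.println(plain.listOntProperties().toList().size());

          // With a rule reasoner, the inferred rdf:type rdf:Property statements are
          // visible, so object and datatype properties are listed as well
          OntModel inf =
              ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
          inf.read("file:myontology.owl");
          System.out.println(inf.listOntProperties().toList().size());
      }
  }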

Q: Why are my transitive properties (or symmetric properties or inverse functional properties) missing when I call listObjectProperties()?
A: This is essentially the same problem as the previous FAQ. Without an OWL reasoner, the model cannot tell that an owl:TransitiveProperty is also an owl:ObjectProperty and an RDF property. The same solution advice applies as with the previous question.

Q: Why does .as( OntProperty.class ) fail with ConversionException on SymmetricProperty (or other property types)?
A: This is a slightly tricky issue. Internally, .as() calls the supports check, which tests whether the node that is being converted is a common flavour of property. Strictly, the only necessary test should be 'has rdf:type rdf:Property', because that is entailed by all of the other property types. However, that requires the user to use a model with a reasoner, and some don't want to (for good reasons, e.g. building an editor). The other position is to test for all the possible variants of property: object property, datatype property, annotation, ontology, transitive, functional, inverse functional, etc. The problem with this is that it duplicates the work of the reasoner, and my expectation was that most people would be running with a reasoner, so my code would be duplicating the functionality of the reasoner, which is bad design. The compromise solution was to make the supports check test for the common (top-level) property types. Users who aren't using the reasoner can either test explicitly for the other property types they expect to encounter (e.g. SymmetricProperty), or turn off the supports check by setting strict mode to false on the model.
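
A sketch of the two work-arounds, using an invented ontology file name and property URI:

  import com.hp.hpl.jena.ontology.OntModel;
  import com.hp.hpl.jena.ontology.OntModelSpec;
  import com.hp.hpl.jena.ontology.OntProperty;
  import com.hp.hpl.jena.ontology.SymmetricProperty;
  import com.hp.hpl.jena.rdf.model.ModelFactory;
  import com.hp.hpl.jena.rdf.model.Resource;

  public class StrictModeExample {
      public static void main(String[] args) {
          OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
          m.read("file:myontology.owl");   // example file name

          Resource r = m.getResource("http://example.org/vocab#near");   // example URI

          // Option 1: test explicitly for the flavour you expect to encounter
          if (r.canAs(SymmetricProperty.class)) {
              SymmetricProperty sp = (SymmetricProperty) r.as(SymmetricProperty.class);
          }

          // Option 2: turn off the strict conversion check for this model
          m.setStrictMode(false);
          OntProperty p = (OntProperty) r.as(OntProperty.class);
      }
  }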

Q: Why doesn't the ontology API handle sub-class (or sub-property, domain, range, etc) relationships in a DAML model?
A: These relationships are handled correctly, but the results you see are dependent on the model configuration. The DAML specification includes a number of aliases for RDFS constructs to copy them into the DAML+OIL namespace. This means that, for a DAML processor, daml:subClassOf and rdfs:subClassOf are equivalent. This is declared by means of a daml:samePropertyAs in the daml+oil.daml specification document. Without a reasoner attached to the model, the ontology API will not recognise the equivalence with rdfs: properties. Thus, if you are not seeing the expected results when processing a DAML ontology, it is likely that your ontology file contains, for example,
<daml:Class rdf:ID="A"> <rdfs:subClassOf rdf:resource="B" /> ...
To fix this, either ensure that the ontology consistently uses daml: relationships, or declare the ontology model with the DAML micro rule-reasoner:
OntModel m = ModelFactory.createOntologyModel( OntModelSpec.DAML_MEM_RULE_INF, null );

Q: I don't understand very clearly the difference between the various OntModel model profiles.
A: OK, here's how it works. The ontology API is designed to provide a single set of convenient programming abstractions for a Jena model that contains an ontology in RDFS, DAML or (the various flavours of) OWL. These languages are structurally similar, but differ in detail. So, a class is declared variously as owl:Class, rdfs:Class or daml:Class. Hence one role of the ontology model profiles (i.e. OntModelSpec objects) is to specify which syntax is being used.

Second, ontology models can be composed of many sub-models when an ontology imports another ontology. These sub-models have to be stored somewhere, perhaps in memory or in a database. The profile contains a ModelMaker, which provides the OntModel with new sub-models on demand, to contain the imported ontology documents.

Third, ontologies can be made richer by including the entailments of the ontology assertions, given the semantics of the language. To do this, you need a reasoner. Since Jena provides an open, extensible architecture for adding reasoners, and some built-in pre-defined reasoners, the model profile specifies which reasoner, if any, that model will use.

These are the main components of an OntModelSpec. You can construct each of these elements independently, programmatically or with RDF, but we have anticipated some common choices, so we provide some built-in standard profiles. These have names like OWL_MEM or RDFS_MEM_RDFS_INF. The first component of the name is the syntax (OWL, RDFS, etc). The second component is the model-maker strategy (MEM means in-memory models). The third component, which may be absent, specifies the reasoner: OWL_MEM has no reasoner, while RDFS_MEM_TRANS_INF uses a simple reasoner that computes the transitive closure of the class and property hierarchies, but nothing else.
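
If none of the built-in profiles fits, you can copy one and change a component. For example (swapping in the RDFS rule reasoner purely as an illustration of mixing the components independently):

  import com.hp.hpl.jena.ontology.OntModel;
  import com.hp.hpl.jena.ontology.OntModelSpec;
  import com.hp.hpl.jena.rdf.model.ModelFactory;
  import com.hp.hpl.jena.reasoner.ReasonerRegistry;

  public class ProfileExample {
      public static void main(String[] args) {
          // Start from a built-in profile: OWL syntax, in-memory models, no reasoner...
          OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);

          // ...then choose the reasoner component independently
          spec.setReasoner(ReasonerRegistry.getRDFSReasoner());

          OntModel m = ModelFactory.createOntologyModel(spec, null);
      }
  }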

Database and persistence

Q: Why do I get an exception when trying to create a new persistent model?
A: If the exception has to do with the database lock, see the question on locking below. Otherwise, assuming that your program uses the correct methods to create the model (see the examples in the database how-to on creating persistent models), it may be that your database files are corrupted. Jena2 does not do a good job of checking the validity of the database: it makes a cursory check that some required tables exist but does not check that the tables contain valid data. If you suspect your database has been corrupted, you may invoke cleanDB() on a DBConnection object prior to creating your model. This removes all Jena2 tables from the database. Warning: this removes any other existing Jena2 models from the database, so make sure that this is what you want to do.
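
A sketch of the clean-up (the connection details are placeholders; substitute your own JDBC URL, credentials, driver class and database type):

  import com.hp.hpl.jena.db.DBConnection;
  import com.hp.hpl.jena.db.IDBConnection;

  public class CleanDBExample {
      public static void main(String[] args) throws Exception {
          Class.forName("com.mysql.jdbc.Driver");   // example JDBC driver class

          IDBConnection conn = new DBConnection(
              "jdbc:mysql://localhost/jena", "user", "password", "MySQL");

          // WARNING: removes ALL Jena tables, i.e. every Jena model in this database
          conn.cleanDB();
          conn.close();
      }
  }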

Q: Why do I run out of memory when trying to list statements in a persistent model?
A: Jena2 uses the JDBC interface for accessing databases. The JDBC specification has no cursors. Consequently, when a query is processed by JDBC, the entire result set is returned from the database at once and the application program then iterates over the in-memory result set. If the result set is large, as is often the case when listing all statements of a large model, it may exceed the heap size of the Java virtual machine. If you suspect this is happening, you might try to increase the heap size of the Java virtual machine (-vmargs -Xmx500M for a 500 MB heap size). If this does not help, there is no other work-around and the program should be recoded.

Q: Has Jena2 persistence been ported to other database engines and platforms besides those officially supported?
A: The Jena team supports Jena2 persistence on the databases and operating systems listed in the Database documentation. These include MySQL, HSQLDB, PostgreSQL, Oracle, SQL Server. Other users have had success porting Jena2 to other databases and platforms. Jena2 has been ported to IBM's DB2 database. Contact Liang-Jie Zhang for details.

Q: Is there a limit on the number of models in a database?
A: The limit depends on the Jena database (schema) configuration and the database engine (MySQL, PostgreSQL, Oracle, etc). Recall that a Jena model may either be stored separately in its own database tables (the default) or, alternatively, in tables that are shared with other models (see StoreWithModel in the options for persistent models). Also, a Jena model is identified internally by a 32-bit integer. Consequently the maximum number of models is limited either by the maximum number of tables allowed in a database (which depends on the database engine) or by the maximum value of a 32-bit integer, i.e. approximately 2 billion.

Q: Why am I getting an exception on failure to lock or unlock the database?
A: The Jena2 storage subsystem uses a lock internally to implement a critical section for operations that modify the database structure (create/delete tables). The lock is implemented as a database table, i.e., if the table exists in the database, the lock is held. Normally, this lock should be transparent to applications. But if an application has an exception while in a critical section, the database may remain locked for subsequent applications. In this case, a user must manually unlock the database either by calling DriverRDB.unlockDB() or by deleting the table (Jena_Mutex) from the database.
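
A sketch of the programmatic route, with heavy caveats: it assumes the driver object is reachable via getDriver() and castable to DriverRDB, which may differ between Jena versions; if in doubt, simply drop the Jena_Mutex table with your normal database tools. The connection details are placeholders.

  import com.hp.hpl.jena.db.DBConnection;
  import com.hp.hpl.jena.db.IDBConnection;
  import com.hp.hpl.jena.db.impl.DriverRDB;

  public class UnlockExample {
      public static void main(String[] args) throws Exception {
          Class.forName("com.mysql.jdbc.Driver");   // example JDBC driver class

          IDBConnection conn = new DBConnection(
              "jdbc:mysql://localhost/jena", "user", "password", "MySQL");

          // Release a lock left behind by a crashed application
          // (assumes getDriver() exposes the underlying DriverRDB instance)
          ((DriverRDB) conn.getDriver()).unlockDB();
          conn.close();
      }
  }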

Q: How do I access the Jena database tables?
A: The Jena2 database tables are not intended for direct access by Jena users or applications. The database tables are created, deleted and manipulated through the Jena API methods. For example, creating a database model may cause tables to be added to the database. So, the user need not directly view or access the Jena database. Also, Jena encodes RDF statements, resources and literals in a way that makes them difficult to view or query using conventional (SQL) database tools. Users who are interested in the Jena2 database structure and value encoding can find details in the layout documentation.

XML serialisation (reading and writing)

Q: Why does my output use <rdf:Description ...> when I want output like <owl:Class ...>?
A: This is the raw form of the RDF serialisation into XML. In terms of RDF's information model, it expresses the same semantics as the compressed form. So the following fragments are equivalent (in RDF terms):

  <rdf:Description rdf:about="#foo">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class" />
  </rdf:Description>

  <owl:Class rdf:about="#foo" />

However, the second is considerably easier for human readers, and tends to be the form most people come across when reading OWL or DAML ontologies, for example. The default output format is RDF/XML; the abbreviated format is RDF/XML-ABBREV. To change from the default output style, pass the required output format to the Model.write method:

  yourModel.write( yourOutputStream, "RDF/XML-ABBREV" );

Note that the abbreviated form requires the writer to do much more work (multiple passes are needed over the RDF model, to see which abbreviation rules can apply). Hence it may be inappropriate for large models. In particular, the abbreviated form is not recommended for serialising large models from a persistent database to RDF XML. More details on controlling the precise behaviour of the writer, including turning on and off abbreviation rules, are in the I/O howto.

Q: Why does my XML output contain strange prefixes j.0, j.1, etc?
A: XML's namespace mechanism is used in serialised RDF/XML to make legal XML element names from URIs. XML element names are not permitted to contain certain characters, many of which commonly occur in URIs. For example, http://example.com/test#SomeClass is not a legal element name. We can make the name XML-legal by ensuring that all of the non-NCName characters (an NCName is the form of name that can be used as an XML element name) appear in the XML namespace prefix. So <http://example.com/test#SomeClass> is not legal, but <ns:SomeClass xmlns:ns="http://example.com/test#"> is. Jena's XML writer will add xmlns prefixes as necessary to make your XML output conform to the rules of correct XML. This may mean creating new prefix names; Jena's convention is to name these new prefixes j.0, j.1, etc. If you want these prefixes to have more meaningful names, call setNsPrefix on your model before you write it, to assign your preferred prefix to the namespace URI.
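
For example (the namespace is the one used above; the prefix name is your choice):

  import com.hp.hpl.jena.rdf.model.Model;
  import com.hp.hpl.jena.rdf.model.ModelFactory;
  import com.hp.hpl.jena.rdf.model.Resource;
  import com.hp.hpl.jena.vocabulary.RDF;
  import com.hp.hpl.jena.vocabulary.RDFS;

  public class PrefixExample {
      public static void main(String[] args) {
          Model m = ModelFactory.createDefaultModel();
          Resource c = m.createResource("http://example.com/test#SomeClass");
          c.addProperty(RDF.type, RDFS.Class);

          // Register a readable prefix before writing; otherwise the writer
          // invents j.0, j.1, ... for any namespace it has to create
          m.setNsPrefix("test", "http://example.com/test#");
          m.write(System.out, "RDF/XML-ABBREV");
      }
  }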

SPARQL and query processing

See the SPARQL Tutorial

Q: How do I test substrings of literals?
A: SPARQL provides REGEX. See the ARQ FAQ for details.
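
For example, a minimal sketch using ARQ from Java (ARQ must be on the classpath; the data, property and pattern are invented for illustration):

  import com.hp.hpl.jena.query.Query;
  import com.hp.hpl.jena.query.QueryExecution;
  import com.hp.hpl.jena.query.QueryExecutionFactory;
  import com.hp.hpl.jena.query.QueryFactory;
  import com.hp.hpl.jena.query.ResultSet;
  import com.hp.hpl.jena.rdf.model.Model;
  import com.hp.hpl.jena.rdf.model.ModelFactory;

  public class RegexExample {
      public static void main(String[] args) {
          Model m = ModelFactory.createDefaultModel();
          // ... read your data into m here ...

          String queryString =
              "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
              "SELECT ?x WHERE { ?x rdfs:label ?label . " +
              "  FILTER regex(?label, \"semantic\", \"i\") }";

          Query query = QueryFactory.create(queryString);
          QueryExecution qe = QueryExecutionFactory.create(query, m);
          ResultSet results = qe.execSelect();
          while (results.hasNext()) {
              System.out.println(results.nextSolution());
          }
          qe.close();
      }
  }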