An RDF Schema for Thesauri (June 2003)

Background

The Resource Description Framework (RDF) defines an abstract language for describing semantics in a machine-understandable way. It carefully distinguishes the resource being described ("subject"), a property of that resource ("predicate"), and the value of that property ("object"). A sample RDF triple is 'Das Kapital' 'hasAuthor' 'Karl Marx'. Because RDF is an abstract language, it does not come with its own vocabulary: subjects and objects (together also referred to as classes) and properties have to be defined by the application. However, the RDF vocabulary description language (RDFS) specifies how a vocabulary that uses the RDF grammar has to be defined. One of the most popular vocabularies used with RDF is the well-known Dublin Core metadata set.
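The subject/predicate/object structure can be illustrated with a minimal Python sketch. The `Triple` type and its field names are assumptions made for illustration; they are not part of RDF or of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Illustrative stand-in for an RDF statement (hypothetical type)."""
    subject: str    # the resource being described
    predicate: str  # a property of that resource
    obj: str        # the value of that property


# The sample triple from the text above.
statement = Triple("Das Kapital", "hasAuthor", "Karl Marx")

print(statement.subject, statement.predicate, statement.obj)
```

In real RDF, subjects and predicates are identified by URIs rather than plain strings; plain strings are used here only to keep the sketch short.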

Research Questions and Methods

We assess the applicability of RDFS for describing another well-known kind of vocabulary that has been used for some time in information science: the thesaurus. We do so by describing the relationships of a thesaurus against the background of RDF's grammar ('Marxism' 'usedFor' 'Historical Materialism'). We then analyze the widely used representation of thesauri in XML/DTD and point out its drawbacks. Finally, we use RDFS to represent thesauri and compare our result with that of the CERES/NBII project of 1998. Our thesis results in an RDF schema for thesauri.
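As a rough sketch of how thesaurus relationships map onto RDF's grammar, each relationship can be written as one (subject, predicate, object) triple. Only the 'usedFor' statement comes from the text; the `broaderTerm` property, the term 'Political Theory', and the helper `objects_of` are hypothetical names introduced for illustration.

```python
# Thesaurus relationships written as (subject, predicate, object) tuples.
thesaurus = [
    ("Marxism", "usedFor", "Historical Materialism"),  # example from the text
    ("Marxism", "broaderTerm", "Political Theory"),    # hypothetical entry
]


def objects_of(triples, subject, predicate):
    """Return all objects related to `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


non_preferred = objects_of(thesaurus, "Marxism", "usedFor")
print(non_preferred)
```

Because every relationship is a plain binary statement, looking up the non-preferred terms of a descriptor reduces to filtering on subject and predicate.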

Results

RDFS can be used to describe thesauri, with some limitations: because RDF intrinsically supports only binary relationships, it is not possible to represent the relationship between compound terms and single-word descriptors. The crucial advantage of RDFS is that classes and properties can be defined independently of each other. Instead of being confined to a single vocabulary, which is the major drawback of XML/DTD, one can combine classes and properties from various vocabularies to describe the required semantics. This turns out to be extremely useful when integrating various thesauri into a single thesaurus. In particular, our RDF Schema could contribute to the integration of bibliographic data that were described by different thesauri.
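The idea of combining vocabularies can be sketched as follows. This is an illustration under stated assumptions, not the schema developed in the thesis: the two thesaurus namespaces, the `urn:example:` identifier, the `broaderTerm` property, and the `merge` helper are all hypothetical. Only the Dublin Core element namespace and its `creator` and `subject` elements are taken from an existing vocabulary.

```python
DC = "http://purl.org/dc/elements/1.1/"     # Dublin Core element namespace
THES_A = "http://example.org/thesaurus-a#"  # hypothetical thesaurus namespace
THES_B = "http://example.org/thesaurus-b#"  # hypothetical thesaurus namespace

# Two thesauri, each expressed as a set of binary statements.
thesaurus_a = {
    (THES_A + "Marxism", THES_A + "usedFor", THES_A + "HistoricalMaterialism"),
}
thesaurus_b = {
    (THES_B + "Marxism", THES_B + "broaderTerm", THES_B + "PoliticalTheory"),
}

# Bibliographic data mixing Dublin Core properties with a thesaurus term.
bibliographic = {
    ("urn:example:das-kapital", DC + "creator", "Karl Marx"),
    ("urn:example:das-kapital", DC + "subject", THES_A + "Marxism"),
}


def merge(*graphs):
    """Union several sets of triples into one set."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


combined = merge(thesaurus_a, thesaurus_b, bibliographic)
predicates = {p for _, p, _ in combined}
print(len(combined), "statements using", len(predicates), "properties")
```

Since properties are identified independently of the resources they describe, statements from different vocabularies can sit side by side in one set without either vocabulary having to anticipate the other.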

Discussion

There are well-known alternatives to RDF, such as richer ontology languages. Nevertheless, RDF deserves more attention when it comes to thesaurus integration, because its approach can be considered the semantic counterpart to the revolutionary hyperlink concept of the Web. Alternative approaches have to be assessed against the background of RDF's advantages.

Full Text and RDF Schema