Processing RDF with SPARQL

Summary

This chapter introduces the sparql-maven-plugin.

Purpose

  • The plugin enables the generation, transformation, enrichment, and repair of any type of RDF data from within a pom.xml file.
  • What makes the combination of SPARQL and Maven especially powerful is the interplay of Maven properties with SPARQL:
    • Maven metadata can be interpolated into SPARQL query strings.
    • Conversely, SPARQL function extensions make it possible to query the pom.xml model using XPath.

Use Cases

  • The sparql-maven-plugin can be used for many purposes. A typical use case is generating metadata in the following models:
    • Vocabulary of Interlinked Datasets (VoID): Captures statistical information about an RDF dataset, such as the frequencies of classes and properties.
    • Provenance Ontology (PROV-O): Can be used to capture the plan that was used to produce a dataset. With the maven4data paradigm, this plan is the pom.xml file, which can be referenced by its Maven coordinates.
    • Data Catalog Vocabulary (DCAT): Captures versions, publishers, and distributions (= means of access) of a dataset.
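
As an illustration of the first two models, the fragment below sketches two inline CONSTRUCT queries in the same `<arg>` style as the example in the Basic Approach section. It is a hedged sketch, not a tested configuration. It assumes that inline queries are accepted as shown in that example and that `${project.*}` references are interpolated by Maven before evaluation. The `#build` IRI is a hypothetical name chosen for illustration. Prefixes are declared explicitly because it is not stated whether the plugin predefines any.

```xml
<args>
  <!-- VoID: one class partition per class, with the number of distinct instances -->
  <arg><![CDATA[
PREFIX void: <http://rdfs.org/ns/void#>
CONSTRUCT {
  <urn:mvn:${project.groupId}:${project.artifactId}:${project.version}#dataset>
    a void:Dataset ;
    void:classPartition [
      void:class ?class ;
      void:entities ?entities
    ] .
}
WHERE {
  {
    SELECT ?class (COUNT(DISTINCT ?s) AS ?entities) {
      ?s a ?class
    }
    GROUP BY ?class
  }
}
]]></arg>

  <!-- PROV-O: the build (hypothetical #build IRI) follows a plan identified by the Maven coordinate -->
  <arg><![CDATA[
PREFIX prov: <http://www.w3.org/ns/prov#>
CONSTRUCT {
  <urn:mvn:${project.groupId}:${project.artifactId}:${project.version}#dataset>
    a prov:Entity ;
    prov:wasGeneratedBy <urn:mvn:${project.groupId}:${project.artifactId}:${project.version}#build> .

  <urn:mvn:${project.groupId}:${project.artifactId}:${project.version}#build>
    a prov:Activity ;
    prov:qualifiedAssociation [
      a prov:Association ;
      prov:hadPlan <urn:mvn:${project.groupId}:${project.artifactId}:${project.version}>
    ] .

  <urn:mvn:${project.groupId}:${project.artifactId}:${project.version}>
    a prov:Plan .
}
WHERE {
}
]]></arg>
</args>
```

The VoID query only produces partitions for classes that actually occur in the loaded data. The PROV-O query has an empty WHERE clause, so it emits its triples once regardless of the input.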

Limitations

  • Currently the SPARQL functions do not support generating the effective pom.xml. This is future work and a contribution would be welcome.

Basic Approach

TODO Adapt and test example.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.aksw.data.config</groupId>
  <artifactId>dcat-generator</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <input.url>pom.xml</input.url>

    <output.type>ttl.bz2</output.type>
    <output.classifier>dcat</output.classifier>
    <output.path>${project.build.directory}/${project.artifactId}-${output.classifier}.${output.type}</output.path>
    
    <description>This description is shared between maven and SPARQL.</description>
  </properties>

  <description>${description}</description>

  <build>
    <plugins>
      <plugin>
        <groupId>org.aksw.maven.plugins</groupId>
        <artifactId>sparql-maven-plugin</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <executions>
          <execution>
            <id>generate-metadata</id>
            <phase>process-resources</phase>
            <goals>
              <goal>run</goal>
            </goals>
            <configuration>
              <!-- TDB2 is a disk-based engine. If <engine> is omitted, the in-memory engine is used. -->
              <engine>tdb2</engine>
              <outputFile>${output.path}</outputFile>
              <outputFormat>${output.type}</outputFormat>
              <env>
                <DATASET>${input.urn.dataset}</DATASET>
                <BASE>${input.urn.base}#</BASE>
                <POM>${input.pom.path}</POM>
              </env>
              <args>
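                <!-- Note: input.data.path and the input.* properties referenced in <env> are not defined in the <properties> section of this example -->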
                <arg>${input.data.path}</arg>

                <arg>void/sportal/compact/qb2.rq</arg>

<!-- A construct query string whose property references are interpolated before the query is evaluated -->
<arg><![CDATA[
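PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {
  <urn:mvn:${project.groupId}:${project.artifactId}:${project.version}#dataset>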
    a dcat:Dataset ;
    rdfs:comment """${description}"""@en ;
    .
}
WHERE {
}
]]></arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>