SPARQL extensions for heterogeneous data
This module enhances Jena’s SPARQL engine with additional datatypes, functions and property functions for accessing and processing JSON, CSV and XML data.
Simply by including the following maven dependency will automatically make all features available. This works because the JAR file provides appropriate metadata which gets picked up by Jena’s plugin system.
<dependency>
<artifactId>org.aksw.jenax</artifactId>
<groupId>jenax-arq-plugins-bundle</groupId>
</dependency>
Data formats
JSON
The following table lists the featured extensions for JSON processing within SPARQL. Note, that while there is streaming support for the CSV format, at present there is no streaming support for JSON arrays.
Category | Synopsis | Description |
---|---|---|
Namespaces | json: <http://jsa.aksw.org/fn/json/> | Namespace for json (property) function extensions |
Datatypes | xsd:json | The URI for JSON literals. Hijacks the xsd namespace for convenience. Note, that the datatype can be used as a cast expression in order to parse strings to json literals. |
Functions | literal json:path(xsd:json json, string literal queryString) | Evaluates a JSON path expression on the given json argument. The type of the returned literal depends on the evaluation result. See the mapping table below. |
Property Functions | ?jsonArray json:unnest ?jsonItem | Yields bindings where the variable in the object position is consecutively bound to the items of the json array in subject position. Subject must evaluate to a xsd:json literal. |
?jsonArray json:unnest (?jsonItem ?index) | Same as above but in addition binds ?index to the zero-based index of each item. | |
?jsonObject json:bind () | Make all top-level key-value pairs of the given json object available as a SPARQL binding. I.e. all keys can be accessed via SPARQL variables. |
Json Path Expressions
Please refer to the JsonPath project GitHub page for path expression examples and the specification.
JSON <-> SPARQL datatype mapping
JSON | RDF |
---|---|
string | xsd:string |
number | xsd:double |
boolean | xsd:boolean |
json object | xsd:json |
json array | xsd:json |
null | (unbound) |
Example
PREFIX ex: <http://example.org/>
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
CONSTRUCT {
?s
a ex:Pub ;
rdfs:label ?l ;
wgs:long ?x ;
wgs:lat ?y ;
.
}
{
"[{\"id\": \"pub1\", \"name\": \"MyFavoritePub\", \"lon\": \"50\", \"lat\": \"10\" }]"^^xsd:json json:unnest ?item .
BIND("http://example.org/" AS ?ns) .
BIND(URI(CONCAT(?ns, ENCODE_FOR_URI(json:path(?item, "$.id")))) AS ?s) .
BIND(json:path(?item, "$.name") AS ?l) .
BIND(STR(json:path(?item, "$.lon")) AS ?x) .
BIND(STR(json:path(?item, "$.lat")) AS ?y) .
}
Output:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/> .
@prefix wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
ex:pub1 a ex:Pub ;
rdfs:label "MyFavoritePub" ;
wgs:lat "10" ;
wgs:long "50" .
CSV
The main principle of the CSV extensions is to parse each row of the input data into a corresponding JSON object. This means, that all JSON extensions can be used for further processing, and no dedicated CSV datatype is necessary.
Category | Synopsis | Description |
---|---|---|
Namespaces | csv: <http://jsa.aksw.org/fn/csv/> | Namespace for CSV (property) function extensions |
Functions | xsd:json csv:parse(string literal, "optionsString") | Parse CSV data provided as a string literal into a xsd:json literal holding the corresponding JSON array of rows. |
Property Functions | ?csvUrl csv:parse (?rowJson "optionsString") | Parse the data at the URL in the subject position as a CSV resource with the given options. Streams each row and binds it as a JSON element to ?rowJson . The options control whether rows are represented as JSON arrays (default) or objects. |
?csvUrl csv:parse ?rowJsonObj | Equivalent to ?csvUrl csv:parse (?rowJsonObj "excel") |
CSV option strings
[base format] [-s] [-d] [-e] [-h]
- Base format: One of the predefined CSV Formats defined by Apache Commons CSV. Defaults to
excel
. -d
: Field delimiter, such as,
-q
: Optional field quote character, such as"
.-e
: Optional escape charater, such as\
-h
: User first row as headers, defaults tofalse
.-o
: Enables object mode: rows are then returned as JSON objects rather than arrays.
Notes:
- The defaults are determined by the base format and can be overridden using the options.
- Without headers, the row json object’s keys are labeled
col0
tocoln
. - Using the options
-q
or-e
without argument sets the corresponding character in the base format to NULL.
XML
At present there is XPath integration.
Example
BIND(xml:path(?o, "//html/text()") AS ?x)
Related projects:
- TARQL
- JARQL
- SPARQL Generate: This project introduces its own SPARQL-based syntax for facilitating data integration.