Anzo 3.1 includes the Glitter engine for executing SPARQL queries against the Anzo RDF repository or other underlying data sources.

Glitter supports nearly the full SPARQL specification, lacking only some of the mandated datatype cast functions. In addition, in Anzo 3.1 Glitter provides the following SPARQL extensions:

Functional predicates

A functional predicate (also known in some circles as a property function or "magic predicate") allows a predicate within a triple pattern to have a special meaning. Glitter supports two property functions:

textlike

The predicate http://openanzo.org/ontologies/2008/07/Anzo#textlike can be used within a SPARQL query to find literals that match a certain wildcard pattern. This is similar to SQL's LIKE operator and uses the same syntax, with % as the wildcard. For example, the following query finds articles whose title contains the word Anzo:

PREFIX fp: <http://openanzo.org/ontologies/2008/07/Anzo#>
...
SELECT ?title {
  ?article dc:title ?title .
  ?title fp:textlike "%Anzo%" .
}

The same query could be written using SPARQL's standard regex filter function:

SELECT ?title {
  ?article dc:title ?title .
  FILTER(regex(str(?title), "Anzo")) .
}

In Anzo 3.0, however, regex acts on an in-memory result set whereas the textlike functional predicate is executed directly against the underlying relational database. As such, most simple text-search queries are more efficient when using textlike.
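
To make the correspondence between the two queries concrete, the following sketch translates a SQL-style wildcard pattern into an equivalent regular expression. It is written in Python purely for illustration and is not part of Anzo. The helper name like_to_regex is hypothetical, and both case-sensitive matching and the treatment of _ as a single-character wildcard are assumptions carried over from SQL's LIKE rather than documented textlike behavior.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE-style pattern into an anchored regular expression.

    Assumption: '%' matches any run of characters and '_' matches exactly one
    character, as in SQL. Whether textlike honors '_' is not confirmed here.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    # LIKE matches the whole value, so anchor both ends.
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)

# Hypothetical titles, used only to exercise the helper.
titles = ["Getting started with Anzo", "SPARQL basics", "Anzo"]
matcher = like_to_regex("%Anzo%")
matching = [t for t in titles if matcher.match(t)]
print(matching)
```

The pattern "%Anzo%" becomes the anchored expression ^.*Anzo.*$, which accepts the same strings as the unanchored regex "Anzo" used in the FILTER version of the query.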

textmatch

The predicate http://openanzo.org/ontologies/2008/07/Anzo#textmatch allows a SPARQL query access to Anzo's text-indexing capabilities. textmatch is similar to textlike in that it relates a literal value to a search string; however, textmatch uses specialized text indexes that allow a broader range of search options. Further, because textmatch is executed against a specialized text index that also contains 'nearby' subject and property data, queries using textmatch can often be executed without going to the underlying database at all.

@@ example of more complex search using textmatch

SELECT expressions

In addition to selecting the values of variables bound in a SPARQL query, Glitter supports selecting calculated values. Glitter extends SPARQL's SELECT clause to take parenthesized expressions that specify how to compute a projected value. An expression is followed by the AS keyword and then a variable that will be used to store the computed values in the query's result set.

For example, SELECT expressions can be used to calculate the cost of the line items of a particular order:

PREFIX ex: <...>
SELECT ?item ?quantity (?quantity * ?unitprice AS ?cost)
{
    ?order a        ex:Order ;
           ex:item  ?item ;
           ex:qty   ?quantity .
    ?item  ex:price ?unitprice .
}

Each result from this query would contain bindings for three variables: ?item (the URI of the item being ordered), ?quantity (the number of units of the item being ordered), and ?cost (the total cost of this item in this order, calculated as the quantity multiplied by the unit cost of the item).
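
The projection step can be pictured as a function applied to each solution of the query pattern. The Python sketch below is only a model of that behavior, not Glitter code; the sample solutions and the helper name project_with_cost are made up for illustration.

```python
# Hypothetical solutions produced by the query pattern, before projection.
solutions = [
    {"item": "ex:widget", "quantity": 3, "unitprice": 2.50},
    {"item": "ex:gadget", "quantity": 1, "unitprice": 10.00},
]

def project_with_cost(solution):
    """Mimic SELECT ?item ?quantity (?quantity * ?unitprice AS ?cost)."""
    return {
        "item": solution["item"],
        "quantity": solution["quantity"],
        # The expression is evaluated once per solution and stored under ?cost.
        "cost": solution["quantity"] * solution["unitprice"],
    }

results = [project_with_cost(s) for s in solutions]
for row in results:
    print(row)
```

Note that ?unitprice is used by the expression but is not itself projected, so it does not appear in the result rows.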

SPARQL functions (including extension functions) can also be used. The following query finds all literals in the dataset and breaks them into their lexical form, datatype and language:

SELECT DISTINCT (str(?literal) AS ?lexical) (datatype(?literal) AS ?datatype) (lang(?literal) AS ?lang)
{
  ?s ?p ?literal .
}

Aggregates and GROUP BY

Glitter supports aggregate queries: queries that group sets of results together and return one row for each group of results. Such queries can also project the results of aggregate functions, functions that act on an entire group of results to produce a value. Glitter treats a query as an aggregate query if either:

  • It contains an explicit GROUP BY clause
  • Its SELECT clause contains at least one expression that involves an aggregate function

If an aggregate query does not contain a GROUP BY clause then the entire (pre-aggregated) result set is treated as a single group of results. In this case, the aggregate query's result set will have a single row representing the aggregated results of the entire query. When there is no GROUP BY clause, each element in the SELECT clause must involve an aggregate function.

Alternatively, the GROUP BY clause can list one or more variables that are used to break a (pre-aggregated) result set into multiple groups of results. There will be one group of results for each distinct combination of bindings for the variables in the GROUP BY clause. The GROUP BY clause is added after the query pattern (the WHERE clause) and before any ORDER BY clause. Apart from aggregate expressions, a query with a GROUP BY clause may project only variables (and expressions involving variables) that are mentioned in the GROUP BY clause.

Glitter supports the following aggregate functions:

COUNT

COUNT returns the number of solutions in a group.

For example, if:

SELECT ?dept ?title
WHERE {
 ?dept foaf:member ?person .
 ?person foaf:title ?title .
}

returns

{
 {?dept='engineering', ?title="manager"},
 {?dept='engineering', ?title="manager"},
 {?dept='engineering', ?title="engineer"},
 {?dept='engineering', ?title="engineer"}
}

then

SELECT ?dept ?title (COUNT(*) AS ?count)
WHERE {
 ?dept foaf:member ?person .
 ?person foaf:title ?title .
}
GROUP BY ?dept ?title

would return

{
 {?dept='engineering', ?title="manager", ?count=2},
 {?dept='engineering', ?title="engineer", ?count=2}
}

There are a few different versions of COUNT:

  • COUNT(*) (as above) returns the number of solutions in each group
  • COUNT(?var) returns the number of solutions in the group in which ?var is bound
  • COUNT(DISTINCT *) returns the number of distinct solutions in each group
  • COUNT(DISTINCT ?var1 ?var2 ?var3) returns the number of distinct combinations of the three variables in each group, excluding any solutions in which all three variables are unbound
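
The differences between these versions are easiest to see side by side. The following Python sketch models a single group as a list of solutions, with None standing in for an unbound variable. The function names and sample data are hypothetical, and the code illustrates the rules listed above rather than Glitter's implementation.

```python
# Solutions in one group; None stands for an unbound variable.
group = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "x"},      # duplicate of the first solution
    {"a": None, "b": "y"},   # ?a is unbound
    {"a": None, "b": None},  # both variables are unbound
]

def count_all(solutions):
    """COUNT(*): every solution in the group."""
    return len(solutions)

def count_var(solutions, var):
    """COUNT(?var): solutions in which ?var is bound."""
    return sum(1 for s in solutions if s.get(var) is not None)

def count_distinct_all(solutions):
    """COUNT(DISTINCT *): distinct solutions in the group."""
    # Sort by variable name only, so None values are never compared.
    return len({tuple(sorted(s.items(), key=lambda kv: kv[0])) for s in solutions})

def count_distinct_vars(solutions, variables):
    """COUNT(DISTINCT ?v1 ?v2 ...): distinct combinations, skipping all-unbound ones."""
    combos = {tuple(s.get(v) for v in variables) for s in solutions}
    return len({c for c in combos if any(value is not None for value in c)})

print(count_all(group))
print(count_var(group, "a"))
print(count_distinct_all(group))
print(count_distinct_vars(group, ["a", "b"]))
```

For this sample group the four calls give 4, 2, 3 and 2 respectively: the duplicate solution is collapsed by DISTINCT, and the solution in which both variables are unbound is dropped only by the last version.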

MAX

Returns the largest value for the given variable in the group, as compared with the > operator.

MIN

Returns the smallest value for the given variable in the group, as compared with the < operator.

AVG

Returns the arithmetic mean of the numeric values of the given variable in the group.

SAMPLE

Returns an arbitrary value of the given variable from within the group.
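
As a rough model of MAX, MIN, AVG and SAMPLE, the Python sketch below groups a handful of made-up solutions by one variable and computes each aggregate per group. The data, the ?salary variable, and the helper name aggregate_by are all hypothetical. The sketch assumes every value is bound and numeric, and it picks the first value as the sample, which is just one of the arbitrary choices SAMPLE is allowed to make.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pre-aggregated solutions.
solutions = [
    {"dept": "engineering", "salary": 100},
    {"dept": "engineering", "salary": 140},
    {"dept": "sales", "salary": 90},
]

def aggregate_by(solutions, group_var, value_var):
    """Group solutions by group_var and aggregate value_var within each group."""
    groups = defaultdict(list)
    for s in solutions:
        groups[s[group_var]].append(s[value_var])
    return {
        key: {
            "max": max(values),      # largest value, compared with >
            "min": min(values),      # smallest value, compared with <
            "avg": mean(values),     # arithmetic mean
            "sample": values[0],     # any value from the group would do
        }
        for key, values in groups.items()
    }

stats = aggregate_by(solutions, "dept", "salary")
for dept, row in stats.items():
    print(dept, row)
```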

GROUP_CONCAT

Concatenates all of the values for a given variable within the group into a single string.

The following example returns one row for every state in which a person lives. Along with the state, it returns a single string (?residents) which is a newline-separated concatenation of the names of everyone who lives in the state.

PREFIX : <http://example.org/>
SELECT ?state (GROUP_CONCAT(?name SEPARATOR '\n') AS ?residents) 
{
  ?person :name ?name .
  ?person :lives_in ?state
}
GROUP BY ?state
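
The effect of this query can be approximated in a few lines of Python. The sketch below uses made-up names and states and a hypothetical helper, group_concat. It joins values in input order, which is an assumption, since nothing above says in what order Glitter concatenates the values.

```python
from collections import defaultdict

# Hypothetical solutions of the query pattern.
solutions = [
    {"name": "Alice", "state": "MA"},
    {"name": "Bob", "state": "MA"},
    {"name": "Carol", "state": "NY"},
]

def group_concat(solutions, group_var, value_var, separator="\n"):
    """Return one concatenated string of value_var per distinct group_var."""
    groups = defaultdict(list)
    for s in solutions:
        groups[s[group_var]].append(s[value_var])
    return {key: separator.join(values) for key, values in groups.items()}

residents = group_concat(solutions, "state", "name")
for state, names in residents.items():
    print(state, repr(names))
```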

Functions

Glitter provides a variety of utility functions in addition to the standard SPARQL functions. They are all in the http://openanzo.org/glitter/builtin/functions# namespace:

  • datePart - returns the date part of an xsd:date or xsd:dateTime as an xsd:date
  • timePart - returns the time part of an xsd:time or xsd:dateTime as an xsd:time
  • partitionIndex - given a target value, a start value, and an interval, returns the "bucket" index in which the target value falls, given partitions beginning at the start value and increasing by the interval for each bucket
  • unboundAsMaxValue - converts unbound values to an internal symbol that compares greater than all other values; acts as the identity function otherwise. This is useful for achieving sort orders that, e.g., always sort unbound values at the end of the list, whether using ascending or descending sort.
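
The last two functions are easier to follow with a small model. The Python sketch below is an interpretation of the descriptions above, not Glitter's implementation. It assumes that bucket indexes are zero-based and that each bucket includes its lower bound, neither of which is stated above, and it uses None to stand for an unbound value.

```python
import math

def partition_index(target, start, interval):
    """Index of the bucket containing target.

    Assumption: buckets are zero-based, so [start, start + interval) is bucket 0.
    """
    return math.floor((target - start) / interval)

class _MaxValue:
    """Internal symbol that compares greater than every other value."""
    def __lt__(self, other):
        return False
    def __gt__(self, other):
        return not isinstance(other, _MaxValue)

MAX_VALUE = _MaxValue()

def unbound_as_max_value(value):
    """Map an unbound value (None) to MAX_VALUE; otherwise return it unchanged."""
    return MAX_VALUE if value is None else value

print(partition_index(25, 0, 10))

# Ascending sort with unbound values pushed to the end of the list.
values = [3, None, 1]
ascending = sorted(values, key=unbound_as_max_value)
print(ascending)
```

Without the conversion, an ascending sort in SPARQL places unbound values first, so wrapping the sort key in this way is what moves them to the end.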

Subqueries

Anzo extends the core SPARQL language with support for subqueries. A subquery goes within a graph pattern and is enclosed by curly braces. Any variables in a subquery that are not projected out of it are completely local to the subquery. The results of the subquery are joined with the results of evaluating other graph patterns, as normal.

Currently, Anzo supports only SELECT subqueries; future versions will include support for ASK queries within FILTERs.

The following query returns the top five newspapers by circulation and, for each newspaper, all of the cities that it serves.

PREFIX : <http://example.org/>
SELECT ?newspaper ?city
WHERE {
  ?newspaper :serves_city ?city .
  { 
    SELECT ?newspaper { 
      ?newspaper a :Newspaper .
      ?newspaper :circulation ?circulation .
    } ORDER BY DESC(?circulation) LIMIT 5
  }
}

It might return a result set such as:

{
 {?newspaper=:USAToday, ?city="Los Angeles"},
 {?newspaper=:USAToday, ?city="Washington DC"},
 {?newspaper=:USAToday, ?city="Chicago"},
 {?newspaper=:WashingtonPost, ?city="Washington DC"},
 {?newspaper=:WashingtonPost, ?city="Baltimore"},
 {?newspaper=:NewYorkTimes, ?city="New York"},
 {?newspaper=:ChicagoTribune, ?city="Chicago"},
 {?newspaper=:WallStreetJournal, ?city="New York"},
 {?newspaper=:WallStreetJournal, ?city="Chicago"},
 {?newspaper=:WallStreetJournal, ?city="San Francisco"},
 {?newspaper=:WallStreetJournal, ?city="Miami"}
}

(Note that what is unique about this query is that it limits the results to five newspapers but still retrieves multiple rows per newspaper, one per city served.)
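
The evaluation order is what makes this work: the subquery is limited first, and only then are its results joined with the outer pattern. The Python sketch below models those two steps with made-up data and a limit of two instead of five. It is an illustration of the semantics described above, not of how Glitter executes the query.

```python
# Hypothetical data: circulation figures and (newspaper, city) pairs.
circulation = {":DailyA": 900, ":DailyB": 500, ":DailyC": 100}
serves_city = [
    (":DailyA", "Chicago"),
    (":DailyA", "Miami"),
    (":DailyB", "Chicago"),
    (":DailyC", "Baltimore"),
]
LIMIT = 2  # the query above uses LIMIT 5

# Step 1: the subquery. Order by circulation, keep the top LIMIT, and project
# only ?newspaper (the circulation value stays local to the subquery).
top = sorted(circulation, key=circulation.get, reverse=True)[:LIMIT]

# Step 2: join the subquery results with the outer pattern on ?newspaper.
top_set = set(top)
results = [
    {"newspaper": paper, "city": city}
    for paper, city in serves_city
    if paper in top_set
]

for row in results:
    print(row)
```

Applying the limit to the joined rows instead would cap the total number of rows rather than the number of newspapers.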

Query validity

Glitter rejects some queries that are legal according to the SPARQL specification but are likely to be incorrectly written or would have unexpected results.

The following query is rejected because the variable ?x does not appear anywhere in the query pattern:

SELECT ?x ?y ?z 
{
  ?y a ?z .
}

The following query is rejected because it is an aggregate query that attempts to project a variable that is not part of the result grouping:

PREFIX ex: <...>
SELECT ?customer ?order (COUNT(*) AS ?count)
{
  ?customer ex:order ?order .
} GROUP BY ?customer

The following query is rejected because the double use of ?s is likely an error by the query writer:

SELECT (str(?o) AS ?s)
{
  ?s rdfs:seeAlso ?o .
}

Assignments

@@

Named datasets