Anzo 3.1 includes the Glitter engine for executing SPARQL queries against the Anzo RDF repository or other underlying data sources. Glitter supports nearly the full [http://www.w3.org/TR/rdf-sparql-query/ SPARQL specification], lacking only some of the mandated datatype cast functions. In addition, in Anzo 3.1 Glitter provides the following SPARQL extensions:

== Functional predicates ==

A functional predicate (also known in some circles as [http://jena.sourceforge.net/ARQ/extension.html#propertyFunctions property functions or "magic predicates"]) allows a predicate within a triple pattern to have a special meaning. Glitter supports two functional predicates:

=== textlike ===

The predicate `http://openanzo.org/ontologies/2008/07/Anzo#textlike` can be used within a SPARQL query to find literals that match a wildcard pattern. It behaves like SQL's `LIKE` operator and uses the same `%` wildcard syntax. For example, the following query finds articles whose title contains the word `Anzo`:

{{{
PREFIX fp: ...
SELECT ?title {
  ?article dc:title ?title .
  ?title fp:textlike "%Anzo%" .
}
}}}

The same query could be written using SPARQL's standard `regex` filter function:

{{{
SELECT ?title {
  ?article dc:title ?title .
  FILTER(regex(str(?title), "Anzo")) .
}
}}}

In Anzo 3.0, however, `regex` acts on an in-memory result set, whereas the `textlike` functional predicate is executed directly against the underlying relational database. As such, most simple text-search queries are more efficient when using `textlike`.

=== textmatch ===

The predicate `http://openanzo.org/ontologies/2008/07/Anzo#textmatch` gives a SPARQL query access to Anzo's text-indexing capabilities. `textmatch` is similar to `textlike` in that it relates a literal value to a search string; however, `textmatch` uses specialized text indexes that allow a broader range of search options.
Further, because `textmatch` is executed against a specialized text index that also contains 'nearby' subject and property data, queries using `textmatch` can often be executed without going to the underlying database at all.

@@ example of more complex search using textmatch

== SELECT expressions ==

In addition to selecting the values of variables bound in a SPARQL query, Glitter supports selecting calculated values. Glitter extends SPARQL's `SELECT` clause to take parenthesized expressions that specify how to compute a projected value. An expression is followed by the `AS` keyword and then a variable that will be used to store the computed values in the query's result set.

For example, `SELECT` expressions can be used to calculate the cost of the line items of a particular order:

{{{
PREFIX ex: <...>
SELECT ?item ?quantity (?quantity * ?unitprice AS ?cost) {
  ?order a ex:Order ;
         ex:item ?item ;
         ex:qty ?quantity .
  ?item ex:price ?unitprice .
}
}}}

Each result from this query would contain bindings for three variables: `?item` (the URI of the item being ordered), `?quantity` (the number of units of the item being ordered), and `?cost` (the total cost of this item in this order, calculated as the quantity multiplied by the unit cost of the item).

SPARQL functions (including extension functions) can also be used. The following query finds all literals in the dataset and breaks them into their lexical form, datatype and language:

{{{
SELECT DISTINCT (str(?literal) AS ?lexical) (datatype(?literal) AS ?datatype) (lang(?literal) AS ?lang) {
  ?s ?p ?literal .
}
}}}

== Aggregates and GROUP BY ==

Glitter supports aggregate queries: queries that group sets of results together and return one row for each group of results. Such queries can also project the results of ''aggregate functions'', functions that act on an entire group of results to produce a value.
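
As a rough sketch of what such a query can look like, the following reuses the placeholder `ex:` vocabulary from the line-item example above and computes, for each order, the number of line items and the average unit price. The property names and the result variables `?items` and `?avgprice` are illustrative only, and the call form `AVG(?unitprice)` is assumed by analogy with `COUNT(?var)`; the grouping rules and the individual aggregate functions are described in the rest of this section.

{{{
PREFIX ex: <...>
SELECT ?order (COUNT(*) AS ?items) (AVG(?unitprice) AS ?avgprice) {
  ?order a ex:Order ;
         ex:item ?item .
  ?item ex:price ?unitprice .
}
GROUP BY ?order
}}}

Each row of the result describes one order rather than one line item.
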
Glitter treats a query as an aggregate query if either of the following holds:
 * It contains an explicit `GROUP BY` clause
 * Its `SELECT` clause contains at least one expression that involves an aggregate function

If an aggregate query does not contain a `GROUP BY` clause, then the entire (pre-aggregated) result set is treated as a single group of results. In this case, the aggregate query's result set will have a single row representing the aggregated results of the entire query. When there is no `GROUP BY` clause, each element in the `SELECT` clause must involve an aggregate function.

Alternatively, the `GROUP BY` clause can list one or more variables that are used to break a (pre-aggregated) result set into multiple groups of results. There will be one group of results for each distinct combination of bindings for the variables in the `GROUP BY` clause. The `GROUP BY` clause is added after the query pattern (the `WHERE` clause) and before any `ORDER BY` clause. A query with a `GROUP BY` clause may project out only those variables (and expressions involving variables) that are mentioned in the `GROUP BY` clause, along with expressions that involve aggregate functions.

Glitter supports the following aggregate functions:

=== COUNT ===

`COUNT` returns the number of solutions in a group. For example, if:

{{{
SELECT ?dept ?title
WHERE {
  ?dept foaf:member ?person .
  ?person foaf:title ?title .
}
}}}

returns

{{{
{
  {?dept='engineering', ?title="manager"},
  {?dept='engineering', ?title="manager"},
  {?dept='engineering', ?title="engineer"},
  {?dept='engineering', ?title="engineer"}
}
}}}

then

{{{
SELECT ?dept ?title (COUNT(*) AS ?count)
WHERE {
  ?dept foaf:member ?person .
  ?person foaf:title ?title .
}
GROUP BY ?dept ?title
}}}

would return

{{{
{
  {?dept='engineering', ?title="manager", ?count=2},
  {?dept='engineering', ?title="engineer", ?count=2}
}
}}}

There are a few different versions of `COUNT`:
 * `COUNT(*)` (as above) returns the number of solutions in each group
 * `COUNT(?var)` returns the number of solutions in the group in which `?var` is bound
 * `COUNT(DISTINCT *)` returns the number of distinct solutions in each group
 * `COUNT(DISTINCT ?var1 ?var2 ?var3)` returns the number of distinct combinations of the three variables in each group, excluding any solutions in which all three variables are unbound

=== MAX ===

Returns the largest value for the given variable in the group, as compared with `>`.

=== MIN ===

Returns the smallest value for the given variable in the group, as compared with `<`.

=== AVG ===

Returns the arithmetic mean of the numeric values of the given variable in the group.

=== SAMPLE ===

Returns an arbitrary value of the given variable from within the group.

=== GROUP_CONCAT ===

Concatenates all of the values for a given variable within the group. The example below returns one row for every state in which a person lives. Along with the state, it returns a single string ({{{?residents}}}) that is a newline-separated concatenation of the names of everyone who lives in the state.

{{{
PREFIX : <...>
SELECT ?state (GROUP_CONCAT(?name SEPARATOR '\n') AS ?residents) {
  ?person :name ?name .
  ?person :lives_in ?state
}
GROUP BY ?state
}}}

== Functions ==

Glitter supports a variety of utility functions.
They are all in the `http://openanzo.org/glitter/builtin/functions#` namespace:
 * `datePart` - returns the date part of an `xsd:date` or `xsd:dateTime` as an `xsd:date`
 * `timePart` - returns the time part of an `xsd:time` or `xsd:dateTime` as an `xsd:time`
 * `partitionIndex` - given a target value, a start value, and an interval, returns the "bucket" index in which the target value falls, given partitions beginning at the start value and increasing by the interval for each bucket
 * `unboundAsMaxValue` - converts unbound values to an internal symbol that compares greater than all other values; acts as the identity function otherwise. This is useful for achieving sort orders that, e.g., always sort unbound values at the end of the list, whether using ascending or descending sort.

== Subqueries ==

Anzo extends the core SPARQL language with support for subqueries. A subquery goes within a graph pattern and is enclosed by curly braces. Any variables in a subquery that are not projected out of it are completely local to the subquery. The results of the subquery are joined with the results of evaluating other graph patterns, as normal.

Currently, Anzo supports only {{{SELECT}}} subqueries; future versions will include support for {{{ASK}}} queries within {{{FILTER}}}s.

The following query returns the top five newspapers by circulation and, for each newspaper, all of the cities that it serves.

{{{
PREFIX : <...>
SELECT ?newspaper ?city
WHERE {
  ?newspaper :serves_city ?city .
  {
    SELECT ?newspaper {
      ?newspaper a :Newspaper .
      ?newspaper :circulation ?circulation .
    }
    ORDER BY DESC(?circulation)
    LIMIT 5
  }
}
}}}

It might return a result set such as:

{{{
{
  {?newspaper=:USAToday, ?city="Los Angeles"},
  {?newspaper=:USAToday, ?city="Washington DC"},
  {?newspaper=:USAToday, ?city="Chicago"},
  {?newspaper=:WashingtonPost, ?city="Washington DC"},
  {?newspaper=:WashingtonPost, ?city="Baltimore"},
  {?newspaper=:NewYorkTimes, ?city="NewYork"},
  {?newspaper=:ChicagoTribune, ?city="Chicago"},
  {?newspaper=:WallStreetJournal, ?city="NewYork"},
  {?newspaper=:WallStreetJournal, ?city="Chicago"},
  {?newspaper=:WallStreetJournal, ?city="SanFrancisco"},
  {?newspaper=:WallStreetJournal, ?city="Miami"}
}
}}}

(Note that what is unique about this query is that the `LIMIT` restricts the result to five newspapers while the query still retrieves multiple rows per newspaper, one per city served.)

== Query validity ==

Glitter rejects some queries that are legal according to the SPARQL specification but are likely to be incorrectly written or would have unexpected results.

The following query is rejected because the variable `?x` does not appear anywhere in the query pattern:

{{{
SELECT ?x ?y ?z {
  ?y a ?z .
}
}}}

The following query is rejected because it is an aggregate query that attempts to project a variable (`?order`) that is not part of the result grouping:

{{{
PREFIX ex: <...>
SELECT ?customer ?order (COUNT(*) AS ?count) {
  ?customer ex:order ?order .
}
GROUP BY ?customer
}}}

The following query is rejected because the double use of `?s` is likely an error by the query writer:

{{{
SELECT (str(?o) AS ?s) {
  ?s rdfs:seeAlso ?o .
}
}}}

== Assignments ==

@@

== Named datasets ==