The Open Anzo Project

Semantic Application Middleware

Open Anzo Overview

Open Anzo is a library and a server that provide convenient, powerful access to semantic web data. Semantic web data is typically modeled using the RDF data model. In that model, data takes the form of statements about resources, and those statements combine into a graph. Open Anzo gives developers an API for manipulating resources, statements, and graphs.

The Open Anzo library does more than add statements to and remove statements from graphs in memory. It lets you group changes to graphs into transactions, save graphs to a server for storage, retrieve local replicas of graphs, and receive notifications of changes to graphs. It lets you query graphs with SPARQL, the powerful RDF query language. In addition, each graph in Open Anzo can be protected with an access control list for security.

The Many Languages of Open Anzo

There are Open Anzo client libraries available for multiple programming languages.

  • Java
  • JavaScript
  • C# (.NET)

The libraries are very similar: the main concepts and behavior are the same across languages. Not all features are supported by all libraries, and each library may differ slightly to fit the conventions of its language.

This document describes the Open Anzo client library mostly using examples in JavaScript. However, the same concepts can be applied to the other languages.

Basic Manipulation of Graphs

Open Anzo at its heart is a library for manipulating sets of statements, or graphs. As such, the most important object the library provides is the INamedGraph interface. The simplest implementation of the interface is the NamedGraph object.

  var myGraph = new anzo.rdf.NamedGraph("http://example.org/graphs/people");
  var statement = anzo.createStatement("http://example.org/people/bob", 
                                       "http://xmlns.com/foaf/0.1/phone",
                                       "tel:+1-555-555-5555");
  myGraph.add( statement );

That snippet creates a NamedGraph object and adds a statement. Note that the NamedGraph constructor takes a URI. That URI is the name of the graph. All graphs in Open Anzo have names. The names are later useful when saving graphs to a server or replicating them locally.

The anzo.createStatement function is a convenient way to create a Statement object by supplying the subject, predicate, and object of the statement. The example adds a statement that describes a person's phone number using the FOAF vocabulary.

The INamedGraph interface in Open Anzo has methods to add, remove, and find statements in the graph. These are some of the main methods used to manipulate graphs in Open Anzo:

  • find
  • contains
  • add
  • remove
  • clear
  • isEmpty
  • size

The find and contains methods allow the use of patterns to search for information in the graph. Passing null for any argument to find, contains, or remove treats that part of the pattern as a wildcard.

// Lookup all of the people that Bob 'knows'
var peopleBobKnows = myGraph.find("http://example.org/people/bob",
                                  "http://xmlns.com/foaf/0.1/knows", 
                                  null);

// Find all statements about Bob.
var allAboutBob = myGraph.find("http://example.org/people/bob",
                               null, 
                               null);

// Get all statements in the graph
var everything = myGraph.find(null, null, null);
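
To make the wildcard behavior concrete, here is a minimal in-memory sketch of pattern matching. It is not the Open Anzo implementation: TinyGraph and matches are hypothetical names, and statements are plain objects holding strings rather than Anzo node objects.

```javascript
// Hypothetical stand-in for a graph; NOT the Open Anzo NamedGraph.
// Statements are plain {subject, predicate, object} objects with string values.
function matches(statement, subject, predicate, object) {
  // A null pattern part acts as a wildcard and matches anything.
  return (subject === null || statement.subject === subject) &&
         (predicate === null || statement.predicate === predicate) &&
         (object === null || statement.object === object);
}

function TinyGraph() {
  this.statements = [];
}

TinyGraph.prototype.add = function (subject, predicate, object) {
  // A graph is a set, so adding the same statement twice has no effect.
  if (!this.contains(subject, predicate, object)) {
    this.statements.push({ subject: subject, predicate: predicate, object: object });
  }
};

TinyGraph.prototype.find = function (subject, predicate, object) {
  return this.statements.filter(function (s) {
    return matches(s, subject, predicate, object);
  });
};

TinyGraph.prototype.contains = function (subject, predicate, object) {
  return this.find(subject, predicate, object).length > 0;
};

TinyGraph.prototype.remove = function (subject, predicate, object) {
  // Keep only the statements that do NOT match the pattern.
  this.statements = this.statements.filter(function (s) {
    return !matches(s, subject, predicate, object);
  });
};

var BOB = "http://example.org/people/bob";
var KNOWS = "http://xmlns.com/foaf/0.1/knows";
var PHONE = "http://xmlns.com/foaf/0.1/phone";

var tiny = new TinyGraph();
tiny.add(BOB, KNOWS, "http://example.org/people/alice");
tiny.add(BOB, KNOWS, "http://example.org/people/carol");
tiny.add(BOB, PHONE, "tel:+1-555-555-5555");

var known = tiny.find(BOB, KNOWS, null); // both 'knows' statements
tiny.remove(BOB, KNOWS, null);           // wildcard remove: drops both of them
```

The same matches helper backs find, contains, and remove in this sketch, which is why the wildcard rule applies uniformly to all three.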

Anatomy of a Statement

Open Anzo has a very specific idea of the term Statement. A Statement is an object with four parts:

  • subject - a URI or BlankNode
  • predicate - a URI
  • object - either a URI, BlankNode or a Literal
  • namedGraphUri - a URI

These statement parts, often called nodes, are part of a carefully designed Anzo RDF Node API.

Those familiar with RDF may be surprised by the fourth part in an Open Anzo statement. RDF tends to think of graphs as sets of triples rather than quads. The two views are closely related, however. Recall that every graph in Open Anzo has a name. You can think of an Anzo statement as simply a regular RDF triple plus the name of the graph in which that triple lives. For this reason, Open Anzo and other systems like it are sometimes informally referred to as quad stores.

Since a statement is a quad, two statements from different graphs are not equal to each other even if they have the same subject, predicate, and object.

var subject = "http://example.org/people/bob";
var predicate = "http://xmlns.com/foaf/0.1/phone";
var object = "tel:+1-555-555-5555";

// graphA and graphB are graphs with different namedGraphURIs
var statementA = graphA.find(subject, predicate, object)[0];
var statementB = graphB.find(subject, predicate, object)[0];

statementA.equals(statementB); // This is false since the statements came from two graphs with different names.
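
The comparison rule can be sketched with plain objects. This is an illustration only: quadEquals is a hypothetical helper, not the library's Statement.equals, and the graph URIs are made up for the example.

```javascript
// Hypothetical helper showing quad equality: all FOUR parts must match.
function quadEquals(a, b) {
  return a.subject === b.subject &&
         a.predicate === b.predicate &&
         a.object === b.object &&
         a.namedGraphUri === b.namedGraphUri;
}

function makeQuad(namedGraphUri) {
  // Same triple every time; only the graph name varies.
  return {
    subject: "http://example.org/people/bob",
    predicate: "http://xmlns.com/foaf/0.1/phone",
    object: "tel:+1-555-555-5555",
    namedGraphUri: namedGraphUri
  };
}

var quadInA = makeQuad("http://example.org/graphs/a");
var quadInB = makeQuad("http://example.org/graphs/b");
var anotherInA = makeQuad("http://example.org/graphs/a");

var differentGraphs = quadEquals(quadInA, quadInB);    // false: graph names differ
var sameGraph = quadEquals(quadInA, anotherInA);       // true: all four parts match
```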

Bring on the Server

Open Anzo supports storing and retrieving graphs in a server repository. When working with an Open Anzo repository, the most important object in the library is the DatasetService. A DatasetService represents the server. It is initialized with connectivity information like host name and port. It provides access to graphs on the server, allows creation of replicas of graphs and even allows registering for notifications of changes that happen to graphs.

var configuration = {
  "org.openanzo.modelService.host"     : "anzo.example.org",
  "org.openanzo.modelService.port"     : 61616,
  "org.openanzo.modelService.user"     : "myuser",
  "org.openanzo.modelService.password" : "p455w0rd"
};
var myDatasetService = new anzo.client.DatasetService(configuration);

var localGraph = myDatasetService.getLocalGraph("http://example.org/graphs/people", true);
localGraph.add(anzo.createStatement("http://example.org/people/bob", 
                                    "http://xmlns.com/foaf/0.1/phone",
                                    "tel:+1-555-555-5555"));
localGraph.close();

The example starts by creating a configuration object which is passed to the constructor of the DatasetService. Server connectivity information is the most important information among the full list of configuration properties. The getLocalGraph method of the DatasetService creates a client-side representation of the graph stored in the repository called http://example.org/graphs/people. Remember that all graphs in Anzo are named graphs. To grab a client-side representation of a particular graph, you simply need to ask for it by name. The boolean parameter to getLocalGraph signifies that the graph should be created if it doesn't already exist.

A LocalGraph, the kind of graph returned by DatasetService.getLocalGraph(), has all the methods from the INamedGraph interface described earlier, such as add, remove, and find, and adds one important method: getMetadataGraph(). Since a LocalGraph is a client-side representation of a graph stored in an Open Anzo server repository, the graph has extra information such as access control lists, revision, creation date, etc. getMetadataGraph() returns a graph which is filled with information about the LocalGraph such as:

  • the graph's revision number - an integer that is incremented each time the server graph is modified.
  • creation date
  • creator username
  • date of last modification
  • user who last modified the graph
  • access control lists - what roles and users are allowed to read and write to this graph

Just as a file in a filesystem has metadata associated with it, a graph stored in an Open Anzo repository has metadata about that graph.

Transactions

The DatasetService also provides transaction support. It maintains a queue shared by all graphs it has created. Transactions help protect data integrity in the repository by ensuring that either an entire group of changes is written to the server or, if there is a failure, none of the changes are written. Transactions in Open Anzo are very similar to transactions that a relational database system might supply.

var localGraph = myDatasetService.getLocalGraph("http://example.org/graphs/people", true);

myDatasetService.begin();
try {
  localGraph.add(anzo.createStatement("http://example.org/people/bob",
                                      "http://xmlns.com/foaf/0.1/phone",
                                      "tel:+1-555-555-5555"));

  localGraph.remove("http://example.org/people/bob",
                    "http://xmlns.com/foaf/0.1/knows",
                    null);

  // Commit only if every change above succeeded.
  myDatasetService.commit();
} catch(e) {
  // On any failure, discard the whole group of changes.
  myDatasetService.abort();
}

localGraph.close();

Using the begin, commit, and abort methods, you can group graph changes into a single transaction.

In fact, every change made to a graph created by a DatasetService goes into a transaction on the queue. If begin hasn't been called, then the graph internally just creates a transaction with one modification and adds it to the queue.

The transaction queue serves as both a way to group server changes into atomic units as well as a holding area to batch changes that will later be written to the server.
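
The queueing behavior described above can be sketched in a few lines. This is a simplified illustration, not the DatasetService implementation: TxQueue and record are hypothetical names, changes are plain objects, and the sketch assumes transactions are not nested, which the text above does not specify.

```javascript
// Hypothetical sketch of a transaction queue; NOT the Open Anzo DatasetService.
function TxQueue() {
  this.queue = [];     // committed transactions waiting to be sent to the server
  this.current = null; // the open transaction, or null if begin() was not called
}

TxQueue.prototype.begin = function () {
  // Assumption: nesting is not allowed in this sketch.
  if (this.current !== null) { throw new Error("transaction already open"); }
  this.current = [];
};

TxQueue.prototype.record = function (change) {
  if (this.current !== null) {
    this.current.push(change);  // becomes part of the open transaction
  } else {
    this.queue.push([change]);  // implicit transaction with one modification
  }
};

TxQueue.prototype.commit = function () {
  if (this.current === null) { throw new Error("no open transaction"); }
  this.queue.push(this.current);
  this.current = null;
};

TxQueue.prototype.abort = function () {
  this.current = null; // discard every change recorded since begin()
};

var txq = new TxQueue();
txq.record({ op: "add", statement: "s1" });    // no begin(): queued on its own

txq.begin();
txq.record({ op: "add", statement: "s2" });
txq.record({ op: "remove", statement: "s3" });
txq.commit();                                  // queued as one two-change unit

txq.begin();
txq.record({ op: "add", statement: "s4" });
txq.abort();                                   // never reaches the queue
```

After these calls the queue holds two transactions: the implicit single-change one and the committed two-change one. The aborted change is gone.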

Replication

In Open Anzo, replication is the act of sending changes made on graphs to the server and retrieving the latest changes from the server.

The first thing replication does is send the entire transaction queue to be executed on the server. The transaction queue contains all of the statements added and removed to the different graphs. Once all of those changes are committed to the server, the client asks the server for any changes that may have been made on the server since the last time replication occurred. The server will send down any new statements and notify the client of any statements that were deleted. All of that data goes into the client side replica of the graphs.
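
The two steps can be sketched with in-memory stand-ins. Everything here is hypothetical (FakeServer, FakeClient, the change-log design, statements as plain strings). The sketch also simplifies by updating the replica only from what the server sends back, and it ignores networking, errors, and access control.

```javascript
// Hypothetical in-memory stand-ins; the real client talks to a server over the network.
function FakeServer() {
  this.log = []; // every change ever applied, in order
}

FakeServer.prototype.apply = function (transactions) {
  var log = this.log;
  transactions.forEach(function (tx) {
    tx.forEach(function (change) { log.push(change); });
  });
};

FakeServer.prototype.changesSince = function (revision) {
  return this.log.slice(revision);
};

function FakeClient(server) {
  this.server = server;
  this.queue = [];       // pending transactions
  this.replica = [];     // local copy of the statements
  this.lastRevision = 0; // how much of the server log this client has seen
}

FakeClient.prototype.replicate = function () {
  // Step 1: send the whole transaction queue to the server.
  this.server.apply(this.queue);
  this.queue = [];

  // Step 2: fetch everything that changed since the last replication.
  var changes = this.server.changesSince(this.lastRevision);
  var replica = this.replica;
  changes.forEach(function (change) {
    var index = replica.indexOf(change.statement);
    if (change.op === "add" && index === -1) { replica.push(change.statement); }
    if (change.op === "remove" && index !== -1) { replica.splice(index, 1); }
  });
  this.lastRevision += changes.length;
};

var server = new FakeServer();
var clientA = new FakeClient(server);
var clientB = new FakeClient(server);

clientA.queue.push([{ op: "add", statement: "bob knows alice" }]);
clientA.replicate();                        // uploads the change, then reads it back

var sizeBefore = clientB.replica.length;    // clientB has not replicated yet
clientB.replicate();                        // now clientB receives the statement
```

Note how clientB stays out of date until it replicates.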

The transaction queue and local replicas allow you to optimize network usage for your application. Rather than sending a request every time a statement changes, the transaction queue allows many changes to be batched together before being sent up to the server. More importantly, a local replica lets you use the graph's data without every find or contains call going directly to the server. Of course, the trade-off is that at any given moment, the local replica may be out of date with respect to the server. You can control the trade-off between network usage and immediacy by changing when the system performs replication.

Open Anzo has several modes for specifying how often to replicate. Specifically,

MANUAL
In MANUAL mode, replication only happens when the DatasetService.replicate method is called.
AUTOMATIC
Replication occurs automatically at regular intervals, as specified by the DatasetService.replicationInterval property.
IMMEDIATE_ASYNC
Replication is started whenever any change is made to a graph. Any add or remove to a graph from a DatasetService will trigger replication. However, the add/remove method of the graph will return immediately, without waiting for the replication to be complete. Thus it is replicating asynchronously.
IMMEDIATE_SYNC
Replication is started whenever any change is made to a graph. Any add or remove to a graph from a DatasetService will trigger replication. The add/remove method on the graph will block until replication completes or fails. NOTE: this mode is not supported in all Open Anzo client library implementations, especially those that are single-threaded. In particular, it is NOT implemented in the JavaScript library.

The DatasetService.replicate method can be invoked in synchronous or asynchronous mode (except in JavaScript, where it can only be invoked in asynchronous mode). When using asynchronous mode, the caller can register for events that notify it when replication has completed and whether there were any errors.

myDatasetService.replicationMode = anzo.client.replication.mode.MANUAL;
var localGraph = myDatasetService.getLocalGraph("http://example.org/graphs/people");

var replicationListener = {
  onReplicationComplete : function (????????????????) {
    localGraph.getSize(); // Now that we've replicated this may be greater than zero.
    localGraph.close();
  }
};

localGraph.getSize(); // at this point, we haven't replicated so this is zero
myDatasetService.addReplicationListener(replicationListener);
myDatasetService.replicate();

The example above illustrates that the local replica is empty until replication finishes.

Selective Replication

Sometimes you may not want to replicate an entire graph with the server. For example, a particular graph on the server may be very large and you only need part of the graph for your task. For such cases, Open Anzo allows you to specify the portions of the graph that you'd like replicated.

var localGraph = myDatasetService.getLocalGraph("http://example.org/graphs/people", true, false);
localGraph.addTracker("http://example.org/people/bob", null, null);
localGraph.addTracker("http://example.org/people/alice", null, null);
myDatasetService.replicate();

Notice the third boolean argument to the getLocalGraph method. When false, it means that the entire graph should not be replicated. By default, that argument is true and the entire graph is always replicated. Passing false actually causes none of the graph to be replicated. Then the caller selectively adds patterns which specify what data should be replicated. In the example above, only statements about Bob and Alice will be replicated.
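
One way to picture the effect of trackers is as a filter over the server's statements. The sketch below is an assumption about the observable behavior only, not a description of how Open Anzo does it internally. matchesTracker and selectForReplica are hypothetical names, and statements are plain objects with shortened example values.

```javascript
// Hypothetical sketch: decide which server statements a replica should receive.
function matchesTracker(statement, tracker) {
  // As with find, a null part of the tracker pattern is a wildcard.
  return (tracker.subject === null || tracker.subject === statement.subject) &&
         (tracker.predicate === null || tracker.predicate === statement.predicate) &&
         (tracker.object === null || tracker.object === statement.object);
}

function selectForReplica(serverStatements, trackers) {
  // A statement is replicated if at least one tracker matches it.
  return serverStatements.filter(function (statement) {
    return trackers.some(function (tracker) {
      return matchesTracker(statement, tracker);
    });
  });
}

var serverStatements = [
  { subject: "http://example.org/people/bob",   predicate: "phone", object: "tel:1" },
  { subject: "http://example.org/people/bob",   predicate: "knows", object: "http://example.org/people/alice" },
  { subject: "http://example.org/people/alice", predicate: "phone", object: "tel:2" },
  { subject: "http://example.org/people/carol", predicate: "phone", object: "tel:3" }
];

var trackers = [
  { subject: "http://example.org/people/bob",   predicate: null, object: null },
  { subject: "http://example.org/people/alice", predicate: null, object: null }
];

var replicated = selectForReplica(serverStatements, trackers); // Bob's and Alice's statements
var nothing = selectForReplica(serverStatements, []);          // no trackers: nothing replicated
```

With an empty tracker list nothing is selected, which mirrors the behavior of passing false as the third argument before any tracker is added.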

Notification

An Open Anzo server can send notification messages to a client that is listening for them. Notification messages are sent when statements are added to or removed from graphs on the server.

var listener = {
  onStatementAdded : function (statement) {
    // do something in response to a statement being added
  }
};

myDatasetService.addNotificationListener(listener);
myDatasetService.addTracker("http://example.org/people/bob", 
                            null,
                            null,
                            "http://example.org/graphs/people");

A caller registers interest in notification events by creating a tracker. The example shows a tracker that will receive notifications for statements about Bob.

Notification Feeding Replicas

You may have noticed that notification uses trackers and replication also uses trackers. Indeed notification can be used as a way to keep a replica up-to-date in between replications. Any statement addition or deletion that corresponds to a LocalGraph will be added to or removed from the local data for the local graph.
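
A rough sketch of that idea follows: a listener that applies each notification to local data. It is illustrative only. makeReplicaFeeder is a hypothetical helper, the replica is a plain array of strings, and onStatementRemoved is an assumed callback name, since the example above only shows onStatementAdded.

```javascript
// Hypothetical sketch: a listener that keeps local data in step with notifications.
function makeReplicaFeeder(replica) {
  return {
    onStatementAdded: function (statement) {
      // Ignore duplicates so the local data stays a set.
      if (replica.indexOf(statement) === -1) { replica.push(statement); }
    },
    // Assumed name: the document only shows onStatementAdded.
    onStatementRemoved: function (statement) {
      var index = replica.indexOf(statement);
      if (index !== -1) { replica.splice(index, 1); }
    }
  };
}

var localData = [];
var feeder = makeReplicaFeeder(localData);

// Simulate notifications arriving from the server between replications.
feeder.onStatementAdded("bob phone tel:1");
feeder.onStatementAdded("bob phone tel:1"); // duplicate notification: no effect
feeder.onStatementAdded("bob knows alice");
feeder.onStatementRemoved("bob phone tel:1");
```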

Remote Graphs

Besides the LocalGraph objects described previously, the DatasetService can create a second type of graph: the RemoteGraph.

var remoteGraph = myDatasetService.getRemoteGraph("http://example.org/graphs/people");
remoteGraph.find("http://example.org/people/bob", null, null,
                 function (statements) {
                   // do something with the statements found.
                   remoteGraph.close();
                 });

A RemoteGraph represents a graph on the server. Changes to the graph go into the transaction queue just as for LocalGraph. However, all of the graph read methods like find, contains, getSize, etc., go directly to the server. In JavaScript, all of these remote graph operations are asynchronous. Thus, in the example a callback is provided. In other, multi-threaded languages it is possible to perform the read methods synchronously.

You can think of a RemoteGraph as a LocalGraph without a replica. Note, however, that since the changes to the graph go into the transaction queue, they will not be sent to the server until the next replication. When the next replication happens depends on the replication mode as described above.

Closing Graphs

Throughout the examples above graphs are closed when they are no longer needed. This is a very important aspect of the Open Anzo API. Graphs in Open Anzo often have an association to external resources such as connections listening for notifications, replicas, etc. When a graph is no longer needed, the close method on the graph should be called.

This is similar to other APIs which represent external resources such as files or database connections.
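
For synchronous work, a try/finally block is one way to make sure close is always called, even when an error is thrown. The sketch below uses a stub object in place of a real graph. withGraph and makeStubGraph are hypothetical helpers, not part of the Open Anzo API.

```javascript
// Hypothetical helper: run some work against a graph and always close it afterwards.
function withGraph(graph, work) {
  try {
    return work(graph);
  } finally {
    graph.close(); // runs whether work() returned normally or threw
  }
}

// Stub standing in for a real graph, so the sketch runs on its own.
function makeStubGraph() {
  return {
    closed: false,
    close: function () { this.closed = true; }
  };
}

var okGraph = makeStubGraph();
var result = withGraph(okGraph, function (graph) {
  return "done";
});

var failingGraph = makeStubGraph();
var caught = null;
try {
  withGraph(failingGraph, function (graph) {
    throw new Error("something went wrong");
  });
} catch (e) {
  caught = e; // the error still propagates, but the graph was closed first
}
```

This pattern does not fit asynchronous reads such as the RemoteGraph example, where the graph must stay open until the callback has run. In that case, close belongs inside the callback, as shown earlier.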

SPARQL Queries

To document

Transaction Commands and Preconditions

To document

Copyright © 2007 - 2008 OpenAnzo.org