- Class Overview
- Class Descriptions
- anzo.client.DatasetService
- anzo.client.JMSModelService
- anzo.client.JMSReplicationService
- anzo.client.JMSNotificationService
- anzo.client.JMSQueryService
- anzo.client.JMSResetService
- anzo.client.AnzoGraph
- anzo.client.MetadataGraph
- anzo.client.Replica
- anzo.client.Replicator
- anzo.client.ReplicaUpdater
- anzo.client.Server
- anzo.client.ServerQuadStore
- anzo.client.TransactionProxy
- anzo.client.TransactionQueue
- anzo.client.Transaction
- anzo.client.Precondition
- anzo.client.GraphCache
- anzo.client.Tracker
- anzo.client.TrackerManager
- Events
- Class diagram
- Class Descriptions
- Control flow
- Considerations for local persistence
- Configuring anzo.client
The purpose of this document is to describe, in detail, how to implement and connect the various components laid out in the AnzoApiPrimer, as well as to introduce new non-public components necessary to make things work properly. In the first section, we provide an overview of each main object and class in the system and, to the best of our ability, how it interacts with other components. The reader is encouraged to read through this section carefully, but should also understand that it is designed as a reference implementation. To complement these descriptions, we also provide a diagram showing roughly the relationships between the classes and objects. In the second section, we provide brief end-to-end descriptions of how the major operations in the client work, including reading and writing data, replication, notification, and event handling. In the third section, we briefly discuss considerations for local persistence, even though the details are outside the scope of this document, and conclude with a brief specification of how the client is to be configured.
Class Overview
Throughout the transition from anzo-java to anzo-js, we adopted a simplify-wherever-possible approach. Where we did not have an immediate or guaranteed future need for a particular listener mechanism, level of indirection, interface, or super/subclass decoupling, we opted to compact in favor of the less flexible but more readily understandable approach. However, we retain the same operational design, so that we do not attempt to re-solve problems already solved by anzo-java. Therefore, the functionality of each class and object defined in this section may be spread out over several classes and interfaces in anzo-java. Where possible, we make note of the simplifications made and the related anzo-java classes.
Class Descriptions
Anzo-js is implemented using the Dojo JavaScript framework. Each class in the JavaScript Anzo client is defined in the package anzo.client. The core RDF library is defined in anzo.rdf and is heavily referenced from anzo.client. For simplicity, each class described in this section is assumed to be in anzo.client if the package is omitted; for example, Dataset stands for anzo.client.Dataset. To provide enough context for our discussion, we may include brief descriptions of how the various classes should be used, but for a more complete description, the reader should refer to AnzoApiPrimer.
A couple of notes before we get started. First, we often refer to Anzo Java components by their new name in anzo.client. Although this is perhaps a bit strange, it keeps the reader from having to first understand the complete Anzo Java API vocabulary. For example, DatasetReplicator from Anzo Java is now Replica, and we use Replica to refer to the component in either system. Second, we often talk about a component as a whole by referring to the class that provides that component instead of the variable containing an instance of that class. For example, we speak about the Replica even though Replica is a class with potentially multiple instances, though not in our system. Finally, the mechanisms and functionality described in this section are specified to the best of our ability. Further details may have to be filled in at the time of implementation.
anzo.client.DatasetService
The DatasetService is the outermost object in any Anzo client API. Internally, in anzo.client, the DatasetService holds several pieces of important state and exposes several important pieces of core functionality.
Stored state
These internal objects are completely hidden from the user. This represents one of the major departures from the Anzo Java API, in which the user often had to perform method calls on these internal objects.
- Replica - the component managing the data and state of local copies of named graphs replicated from the server.
- Server - the component managing the graphs that provide a remote view of data on the server.
- TransactionQueue - The set of transactions to be pushed to the server during the next replication.
- TrackerManager - The set of Tracker objects representing trackers the user has requested. In Anzo Java these are registered with the Replica. Here we register them with the DatasetService itself so that we can receive notification for changes to graphs that are not part of our replica.
- A list of INamedGraphListener - These are listener implementations registered by the user to report all named graph statement adds and removes. Such listeners are necessary because there may be no replica graph or server graph for a particular graph that the user is tracking.
- Two instances of GraphCache - These are reference-counting maps. We need them because we must keep track of a single instance of each graph per URI for calling event listeners on the graphs after notification and replication. Note that this cache is not the same thing as the local replica cache, which maintains the actual triples. The GraphCache simply holds graph objects.
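The reference-counting behavior described for GraphCache can be sketched roughly as follows. This is a minimal illustration, not the actual anzo.client.GraphCache: the `acquire` and `release` method names and the `factory` argument are assumptions made for the example.

```javascript
// Hypothetical sketch of a reference-counting graph cache.
// Method names (acquire, release) are assumptions, not the anzo.client API.
function GraphCache() {
  this.entries = new Map(); // uri -> { graph, count }
}

// Returns the single cached graph object for uri, creating it with
// factory(uri) on first use, and increments its usage count.
GraphCache.prototype.acquire = function (uri, factory) {
  let entry = this.entries.get(uri);
  if (!entry) {
    entry = { graph: factory(uri), count: 0 };
    this.entries.set(uri, entry);
  }
  entry.count += 1;
  return entry.graph;
};

// Decrements the usage count. Returns true when the last reference was
// released and the entry has been dropped from the cache.
GraphCache.prototype.release = function (uri) {
  const entry = this.entries.get(uri);
  if (!entry) return false;
  entry.count -= 1;
  if (entry.count > 0) return false;
  this.entries.delete(uri);
  return true;
};
```

In this sketch the cache holds only graph objects and counts; the triples themselves would remain in the underlying quad store.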
Core functionality
Here we describe how the major DatasetService functionality is implemented.
- DatasetService.constructor - Setup the Replica and Server
- DatasetService.getReplicaGraph(uri,trackGraph) - This method is responsible for handing out replica graphs. Before creating a new object, the replica GraphCache is consulted. If the graph exists in the replica graph cache, we return it, incrementing the usage count in the cache. Otherwise, we create a new instance by constructing an AnzoGraph around the replica's TransactionProxy. We then check to see if this graph has been marked in the replica as a candidate for reclaiming space in the QuadStore. If so, we indicate that it is no longer such a candidate (more on that later). We must also create a MetadataGraph around the same proxy and set it on the AnzoGraph. Finally, we must add the correct statements to the metadata graph that the Anzo server will interpret to create the named graph. We'll discuss these details in a later section. Notice that there is no special ReplicaGraph class as there is in Anzo Java. The replica aspect of the graph is contained purely in the replica TransactionProxy and underlying local IQuadStore. The second argument, trackGraph, is a boolean that indicates whether or not to track the graph. If true, we must set up a Tracker for this graph and register it with the JMSNotificationService. Even if the user specifies false for this argument, the graph may be fully or selectively tracked by registering a tracker on the AnzoGraph object after it has been returned.
- DatasetService.getServerGraph(uri,trackGraph) - This method is responsible for handing out server graphs. Before creating a new object, the server GraphCache is consulted. If the graph exists in the server graph cache, we return it, incrementing the usage count in the cache. Otherwise, we create a new instance by constructing an AnzoGraph around the server's TransactionProxy. We must also create the MetadataGraph and add the special statements indicating a new named graph to the server. Notice that there is no special ServerGraph as there is in Anzo Java. The server aspect of the graph is contained purely in the server TransactionProxy and underlying ServerQuadStore. The second argument, trackGraph, is a boolean that indicates whether or not to track the graph. If true, we must set up a Tracker for this graph and register it with the JMSNotificationService. The ability to add a tracker automatically with a server graph is a departure from Anzo Java, and requires a modification described in the discussion of ReplicaUpdater.addStatement. Even if the user specifies false for this argument, the graph may be fully or selectively tracked by registering a tracker on the AnzoGraph object after it has been returned.
- DatasetService.begin(preconditions), DatasetService.commit(), and DatasetService.abort() - These three methods refer to the current transaction and are simply passed on to the TransactionQueue.
- DatasetService.setReplicationMode(mode) - Sets the replication mode and initializes timers if necessary. All of this is deferred to the Replicator.
- DatasetService.replicate() - Kicks off a replication. Internally, this just defers to the Replicator.
- DatasetService.addTracker(subject, predicate, object, namedGraphUri, eventHandler) - adds a Tracker to the TrackerManager, which in turn performs all the necessary JMS registration and other work. Any of the s,p,o,c arguments left null are interpreted as wildcards in the tracker. addTracker must return an integer trackerHandle that the user can use to deregister the tracker. The handle is issued by the TrackerManager. Note that this is different from the Java tracker API, which has the concept of a trackerSet. Rather than reference counting trackers using trackerHandles, the Java API simply says that a tracker can only be registered or deleted once per trackerSet. If you want to register a tracker twice, you register it in two different trackerSets.
- DatasetService.removeTracker(trackerHandle) - passes trackerHandle along to the TrackerManager to disengage the tracker. Note that the same tracker pattern may be registered twice with the TrackerManager, and a different trackerHandle is issued each time. Only when remove has been called for each trackerHandle issued for the pattern will the client cease tracking the pattern; more on this in later sections.
- DatasetService.executeQuery(defaultNamedGraphs, namedGraphs, query, baseUri) - passes the query directly to the JMSModelService. The results come back in JSON and the caller must be able to handle this standard query result format for Javascript.
- DatasetService.getNamedGraphRevision(namedGraphUri, revision) - gets a read-only snapshot of the named graph along with its metadata graph. DatasetService defers directly to the JMSModelService for this call. The caller must be aware that the graph exists outside the replica and is not connected to the transaction queue or server in any way. This method is to be used to show the revision history of data. User-implemented revert methods, where the user explicitly replaces the contents of a graph in the replica or server with the contents of one of these read-only snapshots, are perfectly acceptable.
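The trackerHandle bookkeeping behind addTracker and removeTracker might look something like the sketch below. It is only an illustration under assumptions: the `notifier` object with `register` and `unregister` methods stands in for the JMS registration work, and `patternKey` is a hypothetical helper; neither is part of the specified API.

```javascript
// Hypothetical sketch of handle-based tracker reference counting.
// The notifier object stands in for the JMS registration work.
function TrackerManager(notifier) {
  this.notifier = notifier; // assumed shape: { register(key), unregister(key) }
  this.nextHandle = 1;
  this.handles = new Map(); // trackerHandle -> { key, eventHandler }
  this.counts = new Map();  // pattern key -> number of outstanding handles
}

// null or undefined components act as wildcards and share one key form.
function patternKey(s, p, o, c) {
  return JSON.stringify([s, p, o, c].map(function (x) {
    return x == null ? null : x;
  }));
}

TrackerManager.prototype.addTracker = function (s, p, o, c, eventHandler) {
  const key = patternKey(s, p, o, c);
  const count = this.counts.get(key) || 0;
  if (count === 0) this.notifier.register(key); // first interest in this pattern
  this.counts.set(key, count + 1);
  const handle = this.nextHandle++;
  this.handles.set(handle, { key: key, eventHandler: eventHandler });
  return handle;
};

TrackerManager.prototype.removeTracker = function (handle) {
  const entry = this.handles.get(handle);
  if (!entry) return false; // unknown or already removed handle
  this.handles.delete(handle);
  const count = this.counts.get(entry.key) - 1;
  if (count === 0) {
    this.counts.delete(entry.key);
    this.notifier.unregister(entry.key); // last handle gone: stop tracking
  } else {
    this.counts.set(entry.key, count);
  }
  return true;
};
```

Registering the same pattern twice yields two distinct handles but, in this sketch, only one registration with the notifier; the pattern is unregistered only after both handles have been removed.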
anzo.client.JMSModelService
This class, and the next two in our list, are JMS implementations of generic service interfaces. There isn't a readily apparent reason why IReplicationService and IModelService are separate interfaces in the Anzo Java library, but we keep them separate for consistency. JMSReplicationService really only implements a single method, JMSReplicationService.execReplicate. Furthermore, we do not envision having a different implementation later on, so we do not bother to mock up an interface using Dojo-style inheritance. Our service implementation will comprise our JMS services only.
Unlike those in the JMSNotificationService, the operations in JMSModelService are inherently request-response, so protocols other than JMS could be used for the service. The Anzo server provides HTTP REST and XML Web Service implementations. In anzo.client we support only the JMS version, making use of the JMSClient's ability to match up request and response messages and to provide a callback service when responses arrive. This is also a very clean and network-efficient design, since response messages arrive over the same communication channel as notification messages, often in the same HTTP response under the covers.
The basic responsibility of the JMSModelService is to translate anzo.client structures such as Transaction[] and sets of anzo.rdf.URI into the JSON structures that make up the body of the messages. In many cases, parameters in the request are sent as JMS properties in the JMS message. The JMSModelService must also turn the JMS message bodies and properties into proper objects that the rest of anzo.client may use.
The IModelService interface defined in Anzo Java defines several methods, many of which we won't be implementing. Here are the ones that are important to anzo.client. The callback is a function that JMSModelService will call with the return value, or with an error message, when the correlated response is received over JMS.
- JMSModelService.updateServer(transactions, returnResults, isIndexSynchronous, callback) - sends a set of transactions to the server. The response will be a corresponding list of successes or failures. The first boolean argument, returnResults, indicates whether or not the server should send back the updates that were made. This is always true in the case of anzo.client. The second argument indicates whether or not the update call should block on the server until the text indices have been updated. Given the current state of the text index, we set this to false in each call from anzo.client.
- JMSModelService.executeQuery(defaultNamedGraphs, namedGraphs, query, baseUri, callback) - issues a SPARQL query against the server. Making the call requires a simple translation of the parameters into JMS properties. To process the response, we simply ask the server for our results in JSON. Understanding this result format will be the responsibility of the client.
- JMSModelService.findStatements(statement, callback) - performs a find on the server, returning a list of anzo.rdf.Statement objects as results. The single parameter is also an anzo.rdf.Statement, with as many elements of the quad left null as the caller would like. This method is not used within the Replica for replica graphs since all replica finds go against the local replica. It is, however, used by the ServerQuadStore to perform find operations directly against the remote database.
- JMSModelService.getNamedGraphRevision(namedGraphUri, revision, callback) - prepares a simple JMS message with the URI and revision and sends it to the server. The statements from the graph and metadata graph must be deserialized from JSON. We return an AnzoGraph with the corresponding MetadataGraph within.
- JMSModelService.getSize(namedGraphUri, callback) - returns the number of statements in the given named graph. This seemingly random call is used to fulfill the contract of INamedGraph for graphs backed by the ServerQuadStore.
The rest are methods whose behavior was rather experimental, such as inference, or functionality not used by the Anzo Java API at all.
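The request-response matching that JMSModelService relies on can be pictured with a small dispatcher. This is only a sketch under assumptions: the `correlationId` field name, the `send` transport function, and the `createCorrelator` name are hypothetical and are not taken from the JMSClient.

```javascript
// Hypothetical sketch: matches responses to requests by correlation id and
// routes uncorrelated messages to a listener. Not the real JMSClient API.
function createCorrelator(send, messageListener) {
  const pending = new Map(); // correlationId -> callback
  let nextId = 1;

  return {
    // Sends a request and remembers the callback for its response.
    request: function (body, callback) {
      const correlationId = String(nextId++);
      pending.set(correlationId, callback);
      send({ correlationId: correlationId, body: body });
      return correlationId;
    },
    // Called for every incoming message, correlated or not.
    receive: function (message) {
      const callback = pending.get(message.correlationId);
      if (callback) {
        pending.delete(message.correlationId);
        callback(message.body);
      } else {
        messageListener(message); // no matching request, e.g. a notification
      }
    }
  };
}
```

Because both kinds of message pass through the same `receive` entry point, responses and notifications can share one communication channel, which is the property the design above depends on.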
anzo.client.JMSReplicationService
JMSReplicationService implements only a single method, JMSReplicationService.execReplicate(). The request of execReplicate is indeed rather simple. It takes only a set of serialized trackers, set as a JMS property, as well as the marker handed back by the server for the last replication, indicating the point at which we were last up to date. What comes back is a large structure containing all the additions, deletions, named graph additions, transactions, and ACL changes since the last replicate. Once we receive this as a JSON object, we hand it to the replication ReplicaUpdater (there is one for notification as well; see below), and the local replica gets updated. The final task of the JMSReplicationService is to notify all of the IReplicationListeners registered with the Replica.
anzo.client.JMSNotificationService
The JMSNotificationService provides three major pieces of functionality. First, it provides a connect operation to let the notification server know that the client is interested in notifications. Second, it implements a mechanism for registering Tracker objects to let the server know the client's interest in particular changes. Lastly, it receives notification messages using the JMSClient.
Receiving notification messages is the bulk of the JMSNotificationService's work. The distinguishing feature of notification messages is that they have no correlation id, because they aren't the result of a request whose response we want to correlate. The JMSClient sends messages that don't have correlation ids to its registered messageListener. The JMSNotificationService registers itself as a messageListener. Since notification messages are very granular, processing them involves a sort of buffering. The JMSNotificationService collects the messages until the operations they represent are complete enough to represent a consistent operation on a graph. It collects messages, such as transaction start, statement add/delete, and transaction end, until it has at least one complete transaction. Then it takes the complete transactions and represents them in the JSON format used in the JMSReplicationService. Those objects are then handed to the appropriate ReplicaUpdater so that the changes are realized through the graph.
Notification messages are guaranteed to be sent and received in order. But there may be many concurrent updates happening, and their notification messages may be sent intermingled. So notification messages are in order but not serialized across concurrent updates. To handle this, the JMSNotificationService keeps buckets of incoming changes. Each bucket contains the messages for a single transaction. Once we receive the transaction end message for a bucket, it is marked as complete. Note that it is possible that a transaction end message arrives for a transaction that was started after a still incomplete transaction. Even if we have a complete transaction bucket, we can only apply it to the Replica once all of the transactions that preceded it are also complete. So the JMSNotificationService applies the first k completed transactions. Sorting the buckets into transaction order is possible based on the transaction id, which is an increasing integer. The first k completed transaction buckets can include gaps in the transaction id sequence as long as there is no partially completed transaction bucket. For example, take the following sorted transaction buckets:
| Id: 3 | Id: 5 | Id: 6 | Id: 7 |
| Complete | Complete | | Complete |
Then the first k completed buckets are the first two buckets (id 3 and id 5). Skipping id 4 is okay, but even though id 7 is complete, it isn't in the first k completed buckets because id 6 has an incomplete bucket.
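The selection of the first k completed buckets can be sketched as a sort followed by a prefix scan. The bucket shape `{ id, complete }` and the function name `takeApplicableBuckets` are assumptions made for this illustration, and the ids are assumed to have already been parsed to numbers.

```javascript
// Hypothetical sketch: returns the leading run of complete buckets in
// transaction-id order. Gaps in the id sequence are allowed; the scan stops
// at the first bucket that is still incomplete.
function takeApplicableBuckets(buckets) {
  const sorted = buckets.slice().sort(function (a, b) { return a.id - b.id; });
  const ready = [];
  for (const bucket of sorted) {
    if (!bucket.complete) break; // everything after this must wait
    ready.push(bucket);
  }
  return ready;
}

// The example from the text: ids 3, 5, and 7 are complete, id 6 is not.
const example = takeApplicableBuckets([
  { id: 7, complete: true },
  { id: 3, complete: true },
  { id: 6, complete: false },
  { id: 5, complete: true }
]);
// example holds the buckets with ids 3 and 5, in that order.
```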
- JMSNotificationService.registerTracker(tracker, callback) - sends a message to the notification server to let them know the types of notifications in which we are interested. If we have not yet connected to the notification server it will call JMSNotificationService.connect.
- JMSNotificationService.unregisterTracker(tracker, callback) - sends a message to the notification server to let them know we are no longer interested in certain notifications. Does nothing if we aren't connected to the notification service.
- JMSNotificationService.connect(callback) - sends a registerUser operation to the notification server to let them know we want notification messages. This method isn't typically called directly but is automatically called when registering a tracker. This does nothing if we are already connected.
- JMSNotificationService.disconnect(callback) - sends an unregisterUser operation to the notification server to let them know we don't want notifications any longer. This method isn't typically called directly but is automatically called when registering a tracker. This does nothing if we are already disconnected.
Notification messages are formatted slightly differently than the response from a replicate operation. The ReplicaUpdater knows how to work with the response from a replicate operation. So prior to sending the messages to the ReplicaUpdater, the messages are transformed into the proper format. Below are some examples of the transformations:
Sample Notification Statement message:
{ "id": "jms-ID:snorks-4303-1196864363734-2:1:3:2:117", "timestamp": "Wed Dec 05 11:28:43 EST 2007",
"data": {
"body": null,
"properties": {
"metadataUri": "http://openanzo.org/metadataGraphs/aHR0cDovL2V4YW1wbGUuY29tL2dyYXBoMQ==",
"operation": "UpdateResults", "dataType": "http://www.w3.org/2001/XMLSchema#string",
"transactionId": "57", "subject": "http://example.com/subject1",
"predicate": "http://example.com/predicate1", "method": "true", "commandId": "0",
"object": "My object 1", "namedGraphUri": "http://example.com/graph1", "type": "Statement",
"objectType": "2"
}
},
"channel": "/anzo/user/default/16k838a4jkbnt"
}
Sample Statement update statement that the ReplicaUpdater understands:
{ "type": "Statement", "method": "Addition", "namedGraphUri": "http://graph1",
"subject": "http://subj2", "predicate": "http://pred3",
"object": {
"objectType": "literal", "value": "10", "dataType": "http://www.w3.org/2001/XMLSchema#float"
}
}
Notice that the object property is made into a hierarchical object, the method property is changed from true/false to Addition/Deletion, respectively, and the objectType is converted from an integer constant to a string constant (objectType: 0=uri, 1=bnode, 2=literal, as per the constants in Anzo Java's org.openanzo.serialization.SoapSerializationUtils class).
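The statement transformation just described could be sketched as below. The function name `toStatementUpdate` is hypothetical, and the shape used for non-literal objects (an `objectType` plus a `value`, with no `dataType`) is an assumption, since the samples above only show a literal.

```javascript
// Integer objectType constants from the notification message, by index.
const OBJECT_TYPES = ["uri", "bnode", "literal"];

// Hypothetical sketch: converts a notification Statement message into the
// update format that the ReplicaUpdater understands.
function toStatementUpdate(message) {
  const props = message.data.properties;
  const objectType = OBJECT_TYPES[parseInt(props.objectType, 10)];
  const object = { objectType: objectType, value: props.object };
  if (objectType === "literal" && props.dataType) {
    object.dataType = props.dataType; // only literals carry a datatype
  }
  return {
    type: props.type,
    method: props.method === "true" ? "Addition" : "Deletion",
    namedGraphUri: props.namedGraphUri,
    subject: props.subject,
    predicate: props.predicate,
    object: object
  };
}
```

Fields such as transactionId and commandId are left out of the result here because the sample update format does not show them; a fuller version would use them to group statements into transactions.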
Sample Notification NamedGraph message:
{ "id": "jms-ID:snorks-4303-1196864363734-2:1:3:2:106", "timestamp": "Wed Dec 05 11:24:41 EST 2007",
"data": {
"body": null,
"properties": {
"metadataUri": "http://openanzo.org/metadataGraphs/aHR0cDovL2V4YW1wbGUuY29tL2dyYXBoMQ==",
"revision": "1", "operation": "UpdateResults", "transactionId": "53",
"createdBy": "http://openanzo.org/Role/default",
"acl": "http://openanzo.org/ACL/4710b5c4-113c-4935-8ed9-f6871ef049f2",
"modifiedBy": "http://openanzo.org/users/default", "method": "true",
"namedGraphUri": "http://example.com/graph1", "type": "NamedGraph",
"modified": "1196871881250"
}
},
"channel": "/anzo/user/default/vpjra08uq7zt"
}
Sample NamedGraph update statement that the ReplicaUpdater understands:
{ "type": "NamedGraph", "method": "Addition", "namedGraphUri": "http://graph1",
"metadataUri": "http://openanzo.org/metadataGraphs/aHR0cDovL2dyYXBoMQ==",
"acl": "http://openanzo.org/ACL/bf6dfb87-323f-4ab9-9946-2bc68e984adc",
"revision": "1", "modified": "1196871452375",
"modifiedBy": "http://openanzo.org/users/default",
"createdBy": "http://openanzo.org/Role/default"
}
anzo.client.JMSQueryService
The anzo.client.JMSQueryService is used to send asynchronous SPARQL query requests to the server. It contains a single method called query that takes the following arguments:
- query - (String) The SPARQL query that is to be executed.
- defaultNamedGraphs - (Array) List of named graph URIs that identify the graphs that will be merged to form the default graph in the query's RDF Dataset
- namedGraphs - (Array) List of named graph URIs that identify the named graph components of the query's RDF Dataset
- baseUri - (String) The base URI against which relative URI references in the query are resolved
- callback - (Function) Called with the result set object (SPARQL result set as a javascript object) upon completion of the operation.
anzo.client.JMSResetService
The anzo.client.JMSResetService is used to reset the server. It contains a single method called reset that takes the following arguments:
- statements - (Array) List of statements that are used to reset the server. These statements specify the initial set of named graphs, users and access controls. This RDF document may not contain named graph data itself. The file is isomorphic to the org/openanzo/model/initializeNew.nt file that is used to initialize a new Anzo database the first time it starts up.
- callback - (Function) Optional argument that is called with a boolean specifying whether or not the operation succeeded.
anzo.client.AnzoGraph
The AnzoGraph is a simple extension of anzo.rdf.NamedGraph. In addition to providing the usual graph projection of an IQuadStore, the AnzoGraph contains a pointer to a MetadataGraph, the defining feature of a named graph in Anzo from the perspective of the user. From an operational perspective, the AnzoGraph must also take the DatasetService so that it can perform the close operation properly, decrementing the usage count. The user, of course, never calls the constructor of an AnzoGraph, so this complication does not burden the user.
Another responsibility of the AnzoGraph that must be considered is graph tracking. When the user obtains an AnzoGraph from the DatasetService via getServerGraph or getReplicaGraph, the user can indicate via a boolean parameter whether or not the complete graph should be tracked. However, this may not be granular enough for every use case. To provide finer granularity, AnzoGraph exposes an addTracker(subject, predicate, object) method; the caller may leave any combination of the parameters null to indicate wildcards. We also provide the methods clearTrackers and removeTracker(subject, predicate, object). Internally, AnzoGraph defers to the TrackerManager, but we must maintain the list of trackerHandle integers that correspond to the trackers we add to the TrackerManager in order to support clearTrackers and removeTracker. In particular, we need a map from Tracker to trackerHandle so we know which handle to use for the removeTracker call.
anzo.client.MetadataGraph
The MetadataGraph is another extension of anzo.rdf.NamedGraph. The Anzo metadata graph is a special-purpose named graph that provides system-level information about a named graph (ACLs, version info, etc.) to the user as RDF itself. Pervasive throughout the Anzo system, the metadata graph is also a mechanism for editing such information, but only that information. The metadata is actually stored in special-purpose database tables, so it is not possible to store arbitrary RDF in the metadata graph. The MetadataGraph object in anzo.client therefore just makes sure the user doesn't add any bad triples, saving him from a berating by the server in the form of nasty error messages. Again, the user will never create one of these; the graph factory methods within DatasetService will take care of it.
anzo.client.Replica
The Replica is the component of the DatasetService that manages the logical replica that comprises the set of named graphs replicated in the local IQuadStore. In our initial implementation, this quad store is simply an anzo.rdf.QuadStore, a memory implementation. The replica is designed, however, to admit a disk-based implementation. The Replica shares its work between two key components. The Replicator is in charge of first issuing a JMSModelService.updateServer() request. This first phase of replication sends all the current changes up to the server. The Replicator is also in charge of preparing a replication request and invoking JMSReplicationService.execReplicate(). It does this using the replicationMarker stored in the Replica as well as the trackers in the TrackerManager stored in the parent DatasetService. The ReplicaUpdater then takes over to process the replication response from the server.
The Replicator also must maintain a list of IReplicationListener objects that have registered for replication events.
The remainder of this section we leave as future work until the design of Mojo has been completed and we have a better idea of how Mojo will exercise anzo.js.
The Replica contains one more very important piece of state: a list of named graph URIs for which no replica graphs are currently outstanding but which may have triples in the replica quad store. We call this list the dormant set. That is, these are replica graphs that have been replicated down, but all the named graph references have been closed, so nobody is actively using them. This state is maintained by a plugin to the replica GraphCache living in the DatasetService. At some point, perhaps on a replication, the Replica will clean itself up, removing enough of these named graphs from the QuadStore to bring the storage down to a reasonable threshold. To evict such a graph from the dormant set, we remove all trackers created for that named graph by DatasetService.getReplicaGraph(). This bookkeeping is achieved through the trackerHandle mechanism in the TrackerManager. Once the trackers have been deleted, on the next replication, all the proper statements will be deleted and space reclaimed. We have to make one subtle departure from the Anzo Java design. In the Java API, once the reference count of a replica graph has reached 0, all trackers are removed. If, in anzo.client, we removed all the trackers at ref count 0 but before eviction, then the replica would not be properly maintained via notification, and we would only get updates on the replication after it was reinstated via another DatasetService.getReplicaGraph(). However, if we leave the tracker in place, then notifications and replications will continue to update our dormant graph. You might ask why we do not just leave the triples as they are until the dormant graph is reinstated by getReplicaGraph(). This won't work because the global replication marker would be out of sync for that graph. One other change is that in the ReplicaUpdater we have to check the dormant graph set for graphs to update, in addition to the ones in the ref count table.
One open question here is how we deal with trackers not created via a replica graph. In the Anzo Java client, these trackers do not actually affect the local replica. This is convenient for our dormant set design, but it is kind of strange from a usability standpoint. Right now, a user can register a tracker with the replication system, and on replication, the proper statements will be brought down, but they will be dropped if no named graph is found for them, and no events will be fired. The right thing to do would seem to be to go ahead and process the statement, adding it to the container. However, assuming we do add such statements to the container, how do we reclaim that space later? The tracker would have to be manually closed by the user. Maybe when such a statement update arrives and we do not have a named graph for it, we add an entry to the dormant graph set, assuming that if the user is interested in such an update, they will create a replica graph for it? And how do replica graphs sans tracker come into play here? There are endless possibilities for inconsistency in all of this. In our current implementation, we will provide special listeners for trackers outside the scope of a named graph, but we will not bring statements that are not in a replica graph into the replica.
Matt suggests that the user of the client library has to indicate what they are interested in. I think before designing this 'dormant set' mechanism we should have a look at the Mojo design and figure out what services it will actually require. Rouben's case, where two different, non-aware components both quickly grab a local graph, do something with it, and close it, requires a bit more specification because at some point a replication has to occur to bring down the data. Similarly, after the ref count zeroes out, another replication has to occur before anything gets flushed from the replica.
anzo.client.Replicator
The high-level operation of the Replicator has been discussed above in the context of the Replica. However, a few details deserve a bit of extra treatment here. Before replication, we must commit the transaction queue via JMSModelService.updateServer. Operationally, we make this call, registering our own callback in which we'll do the second phase of replication, actually replicating. If the updateServer call failed, we'll immediately fire a replication failed event without further ado.
To begin the replication phase, we have to figure out which trackers have been marked for deletion, and not replicate those. We must also figure out which trackers are new and which are old, so that the server knows which trackers to simply hand us the delta for, and which ones to hand us the complete set of statements for. We must also clear the transaction proxy we have been building up with notification events, since this inconsistent view will be completely replaced by whatever we get during replication. Next we must notify all the listeners that replication has begun. Finally we are ready to replicate. We hand the sets of new and old trackers to the JMSReplicationService, and it prepares the JMS message representing the replication call. We pass the execReplicate() call a callback in which we hand the JSON response containing all the updates to the ReplicaUpdater of the replica (as opposed to the ReplicaUpdater used in notification update processing).
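The two phases described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the deps and listeners bundles, the callback signatures of updateServer and execReplicate, and the deleted/replicated flags on trackers are all hypothetical names chosen for the sketch, not the actual anzo.client API.

```javascript
// Sketch of the two-phase replicate() flow. Every collaborator is passed in,
// and all method signatures here are assumptions made for illustration.
function replicate(deps, listeners) {
  // Phase 1: commit the transaction queue to the server.
  deps.modelService.updateServer(deps.transactionQueue, function (ok, transactions) {
    if (!ok) {
      // updateServer failed: report failure without attempting replication.
      listeners.replicationComplete(false, transactions, null);
      return;
    }
    // Phase 2: skip deleted trackers and split the rest into new and old.
    const newTrackers = [];
    const oldTrackers = [];
    deps.trackers.forEach(function (tracker) {
      if (tracker.deleted) return;
      (tracker.replicated ? oldTrackers : newTrackers).push(tracker);
    });
    // Drop the view built up from notification events; replication replaces it.
    deps.notificationProxy.clear();
    listeners.replicationStarted();
    deps.replicationService.execReplicate(newTrackers, oldTrackers, function (response, errors) {
      if (errors) {
        listeners.replicationComplete(false, transactions, errors);
        return;
      }
      deps.replicaUpdater.updateReplica(response);
      // Assumption: new trackers count as replicated once the response is applied.
      newTrackers.forEach(function (tracker) { tracker.replicated = true; });
      listeners.replicationComplete(true, transactions, null);
    });
  });
}
```

Passing the collaborators in as plain objects keeps the sketch independent of how DatasetService actually wires the services together.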
Once replication has been completed, we must delete from the QuadStore all of the statements that match trackers marked for deletion, including any metadata statements. In Anzo Java this is currently a bug, since metadata statements are not removed, and it must be worked out when we reimplement Anzo Java. To solve this for now in anzo.js, we can add a special type of tracker, possibly called a local tracker, that is never sent to the server during replication but that can be marked as deleted, so that any matching statements are cleaned up when replication completes. Thus, whenever we create a local graph, we will always add one of these trackers because we may receive information about the named graph regardless of whether or not we are tracking it. The observed behavior is that we receive metadata updates if we are tracking any part of the graph, so either we always add the local tracker or we add it any time we add a tracker for the graph.
anzo.client.ReplicaUpdater
The ReplicaUpdater takes as input a JavaScript object in the form of an execReplicate() JSON response, and translates these updates into named graph and metadata graph statements in its inner IQuadStore instance. In anzo.client we have two important instances of ReplicaUpdater. The most obvious resides in the Replica and updates the local replica IQuadStore using the complete replication response JSON. The second instance also belongs to the Replica; however, its updates are added to the NotificationProxy, the component in the proxy pipeline that allows reads to the local replica to reflect notification updates. (Recall that notification updates are not actually committed to the local replica IQuadStore itself because they are not consistent with any notification maker.) As described above, the JMSNotificationService must transform the notification buckets it maintains into the replication JSON format.
ReplicaUpdater.updateReplica(updates) itself will traverse the updates object and delegate to several internal methods to handle the various updates, such as:
- addStatement - The logic for the corresponding operation in Anzo Java, DatasetServiceReplicator.updateTrackedStatement, needs some review. In that method, updating and event firing occur only if there exists a graph in the local replica for the context of the given statement. This ignores the important cases of trackers not belonging to a particular named graph (see discussion above), and INamedGraphListeners registered on server graph instances. The case of trackers not attached to a particular replica graph will have to be seriously considered. However, it is quite clear that event notification ought to be carried out independently of the existence of a replica graph. Thus, in anzo.client we adopt the following strategy. We separate updating the replica from notification. For a given statement, if there is in fact a replica graph, we add the statement to the IQuadStore for the current ReplicaUpdater. Next we fire two sets of events. First we check if there is a server and/or replica graph and invoke the event manager on each if they exist. Next, we query the TrackerManager for the set of trackers matching the Statement. Note that the TrackerManager and AnzoGraph must be implemented so that Tracker objects created for a particular AnzoGraph object are managed in such a way that they do not contribute to the running time of this query.
- removeStatement - The same considerations apply here as well.
- startTransaction
- updateGraphMetadata
The exhaustive definition will be filled out upon first implementation. Once the object has been traversed, and updates have been delegated, replication is officially complete.
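As a rough illustration of the addStatement strategy (store only when a replica graph exists, but always notify), consider the sketch below. The ctx object and its replicaGraphs, serverGraphs and quadStore fields are hypothetical stand-ins; addedStatements and TrackerManager.notifyTrackers(addition, statement) are the names used elsewhere in this document.

```javascript
// Sketch of addStatement: replica update and event notification are separate.
// ctx bundles hypothetical collaborators; graph lookups are plain objects keyed by URI.
function addStatement(ctx, statement) {
  const uri = statement.namedGraphUri;
  const replicaGraph = ctx.replicaGraphs[uri];
  const serverGraph = ctx.serverGraphs[uri];

  // 1. Store the statement only if a replica graph exists for its named graph.
  if (replicaGraph) {
    ctx.quadStore.add(statement);
  }
  // 2. Fire graph events on whichever graph objects exist.
  if (serverGraph) serverGraph.addedStatements([statement]);
  if (replicaGraph) replicaGraph.addedStatements([statement]);
  // 3. Fire tracker events regardless of whether a replica graph exists.
  ctx.trackerManager.notifyTrackers(true, statement);
}
```

removeStatement would presumably mirror this with a remove on the store, removedStatements on the graphs, and notifyTrackers(false, statement).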
anzo.client.Server
The Server provides the state and functionality necessary to support server-backed graphs, that is, AnzoGraph instances and their corresponding MetadataGraph instances that are backed directly by the server via calls to JMSModelService. This is easily achieved by implementing an IQuadStore atop JMSModelService. This class, ServerQuadStore, is described below, and we wrap it in a TransactionProxy. This TransactionProxy causes all writes to server graphs to be built up on the main TransactionQueue, and reads to be proxied through it. This TransactionProxy also serves as the outer IQuadStore implementation that DatasetService.getServerGraph() uses to hand out AnzoGraph instances. The TransactionProxy and ServerQuadStore objects are both stored with the Server object inside DatasetService.
Because the Server and Replica share the same TransactionQueue via their respective TransactionProxy instances, adds and removes performed on a replica graph and on a server graph will be reflected in each other's reads.
anzo.client.ServerQuadStore
The ServerQuadStore is an implementation of IQuadStore backed by JMSModelService. This implementation need not implement add or remove since these are all done via committing the transaction queue in JMSModelService.updateServer. In fact, the add and remove methods need not even be defined since they will always be intercepted by a wrapping TransactionProxy. The lion's share of work is done in findStatements, which is really a simple translation to JMSModelService.findStatements. To round out the discussion, we note that size() is also implemented by a simple passthrough to JMSModelService.size().
anzo.client.TransactionProxy
The next several classes all relate to transactions and commands. The design of the transaction system represents a fairly significant departure from the way things are done in Anzo Java. In particular, Anzo Java defines several interfaces and decoupling Manager patterns in order to provision for local persistence. We believe that we can achieve the same design goals for local persistence with a simpler approach in anzo.client, and we discuss them briefly in the concluding section of this document.
We define the transaction system classes in an outside-in ordering, but before doing so, we give a quick description of how they relate to one another. The TransactionProxy is an implementation of IQuadStore that overrides the add and remove methods to insert the updates into the current Transaction contained in the TransactionQueue. It also overrides findStatements so as to filter reads through the statements stored in the transaction queue. Each Transaction is the root of a tree of Transactions induced by the user nesting begin() and commit() calls. Note that we no longer have the TransactionCommand object. This functionality has been replaced by the Transaction tree structure.
TransactionProxy itself is fairly simple. The add method takes in either a Statement, a list of Statements, or an s,p,o,c quad definition. First it checks if a transaction has already begun. If not, it wraps the entire add operation in a new transaction. It then asks the TransactionQueue to which it is bound for the current Transaction (possibly just created) and in turn invokes Transaction.addStatementAddition. The find method first asks the wrapped IQuadStore to perform the same find, and then filters the results using the TransactionQueue.
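A minimal sketch of this behaviour, assuming a queue object that exposes currentTransaction, begin(), commit() and filter(), and a Transaction with an add method (these shapes are assumptions for illustration; the s,p,o,c form of add is left out for brevity):

```javascript
// Sketch of a TransactionProxy: writes go to the current Transaction,
// reads go to the wrapped store and are then filtered through the queue.
class TransactionProxy {
  constructor(store, queue) {
    this.store = store; // the wrapped IQuadStore
    this.queue = queue; // the TransactionQueue this proxy is bound to
  }

  add(statements) {
    const list = Array.isArray(statements) ? statements : [statements];
    // If no transaction has begun, wrap the whole add in a new one.
    const implicit = this.queue.currentTransaction === null;
    if (implicit) this.queue.begin();
    this.queue.currentTransaction.add(list);
    if (implicit) this.queue.commit();
  }

  findStatements(s, p, o, c) {
    // Ask the wrapped store first, then let pending transactions adjust the result.
    const results = this.store.findStatements(s, p, o, c);
    return this.queue.filter(results, s, p, o, c);
  }
}
```

A remove method would follow the same pattern as add, delegating to the current Transaction's remove.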
anzo.client.TransactionQueue
The way we build up transactions and commands in anzo.client is quite a bit different than in Anzo Java. We provide a bit of motivating background first. In Anzo Java, nested begin() and commit() calls are treated as reference counts on the current transaction, to support delegation to sub-routines that may assume they are the arbiter of the current transaction, not realizing they are already in a transaction. When an equal number of matching commit() calls have been made, the transaction is added to the queue. When a single abort() is called, the entire transaction is aborted. We think that this is slightly problematic, in that subroutines can have disastrous effects on the rest of the transaction, and it is difficult to know when it is safe to call abort() and how to propagate errors. A second aspect of Anzo Java is the user defined Command. Commands, self-contained sets of adds/removes with optional preconditions, are added to the transaction queue via a special call, DatasetService.executeInTransaction. This call sets up an environment in which all adds/removes (presumably via operations on replica and server graphs) of the user command are added to a single TransactionCommand in the transaction. However, bad things can happen if the user makes any calls that modify the transaction or command state, such as begin, commit, abort or even another executeInTransaction. Finally, in order for user Commands to pass information to one another, a very difficult and abstract parameter mechanism is provided. Our redesign of these features in anzo.client combines them, providing a very simple and safe interface to the user.
Logically speaking, nesting transactions within transactions does not change the fact that if any single add, remove or precondition fails in any sub-transaction, then the entire outer transaction fails. In other words, a transaction nested within another transaction is really just a subservient set of adds/removes that contributes to the outer transaction. A very important corollary of this is that a logical nesting of transactions within an outer transaction created by the user can be transformed into a flat set of commands contained in that outer transaction. Furthermore, if preconditions are attached to the beginning of transactions by the user (instead of commands), then they translate directly to preconditions on the corresponding commands in the request. The truly beautiful thing here is that we have a transaction programming model expressive enough to take advantage of the full power of the Anzo server's TP system, without having to expose the notion of commands on the client, while obtaining a clean solution to the nested begin() dilemma. We illustrate the transformation algorithm with an example. Suppose the user performs the following operations, with preconditions passed in as parameters to begin().
(1) begin(p1);
(2) g1.add(s1);
(3) g1.rem(s2);
(4) begin(p2);
(5) g2.add(s3);
(6) g2.add(s4);
(7) commit();
(8) g1.add(s7);
(9) begin(p3);
(10) g3.add(s5);
(11) g3.add(s6);
(12) commit();
(13) commit();
After (1) we have a single transaction with a precondition, though no updates have been made.
currentTransaction : {
    precondition : p1,
    delta : { additions : {} , deletions : {} }, // empty delta
    childTransaction : null,
    parentTransaction : null,
    previousTransaction: null,
    nextTransaction : null
}
After (3) we have added some statements to our outer transaction.
currentTransaction : {
    precondition : p1,
    // we take some license with the notation here
    // additions and deletions will be graphs
    delta : { additions : {(s1,g1)} , deletions : {(s2,g1)} },
    childTransaction : null,
    parentTransaction : null,
    previousTransaction: null,
    nextTransaction : null
}
At (4) we begin a new transaction, and the currentTransaction pointer changes, though via the parent pointer we still have a link to the outer transaction. We also set the childTransaction pointer of the parent (shown in the next step). At (6) we have
currentTransaction : {
    precondition : p2,
    delta : { additions : {(s3,g2), (s4,g2)} , deletions : {} },
    childTransaction : null,
    parentTransaction : #ref_to_old_currentTransaction,
    previousTransaction: null,
    nextTransaction : null
}
When we commit a transaction, we move one level back up the tree and create a new Transaction object to hold subsequent updates to the transaction at that level. To do this, we set the current transaction to the parent of the previous-most sibling (ourself in this case) and attach the new transaction as the next sibling of that up-level transaction, so that subsequent adds will be processed 'after' the children we just finished committing. So at (8) we have
currentTransaction : {
    precondition : null,
    delta : { additions : {(s7,g1)} , deletions : {} },
    childTransaction : null,
    parentTransaction : null,
    previousTransaction: { // link back to previous transaction in chain
        precondition : p1,
        // we take some license with the notation here
        // additions and deletions will be graphs
        delta : { additions : {(s1,g1)} , deletions : {(s2,g1)} },
        childTransaction : {
            precondition : p2,
            delta : { additions : {(s3,g2), (s4,g2)} , deletions : {} },
            childTransaction : null,
            parentTransaction : #ref_to_transaction_with_p1,
            previousTransaction: null,
            nextTransaction : null
        },
        parentTransaction : null,
        previousTransaction: null,
        nextTransaction : #ref_to_currentTransaction
    },
    nextTransaction : null
}
After committing the outer transaction, we move the transaction to the list of committed transactions.
transactionQueue : {
    currentTransaction : null,
    committedTransactions : [
        {
            precondition : p1,
            // we take some license with the notation here
            // additions and deletions will be graphs
            delta : { additions : {(s1,g1)} , deletions : {(s2,g1)} },
            childTransaction : {
                precondition : p2,
                delta : { additions : {(s3,g2), (s4,g2)} , deletions : {} },
                childTransaction : null,
                parentTransaction : #ref_to_transaction_with_p1,
                previousTransaction: null,
                nextTransaction : null
            },
            parentTransaction : null,
            previousTransaction: null,
            nextTransaction : {
                precondition : null,
                delta : { additions : {(s7,g1)} , deletions : {} },
                childTransaction : {
                    precondition : p3,
                    delta : { additions : {(s5,g3), (s6,g3)} , deletions : {} },
                    childTransaction : null,
                    parentTransaction : #ref_to_placeholder_transaction,
                    previousTransaction: null,
                    nextTransaction : null
                },
                parentTransaction : null,
                previousTransaction: #ref_to_transaction_with_p1,
                nextTransaction : null
            }
        }
    ]
}
The tree in the end for our single Transaction looks like
T(p1) <-> T(-)
  |         |
T(p2)     T(p3)
So the basic algorithm is
- begin() - create a new Transaction. If currentTransaction is not null, make the new Transaction the child of the currentTransaction and set the parentTransaction pointer of the new Transaction. Then set currentTransaction to point to the new Transaction.
- commit() - if currentTransaction.parentTransaction and currentTransaction.previousTransaction are 'both' null, move the currentTransaction to the committedTransactions list, and set currentTransaction to null. Otherwise, find the parent of the previous-most sibling of currentTransaction. If a parent is found, create a new sibling for the parent: set the previous pointer of the new sibling to the parent and the next pointer of the parent to the new sibling (as in the example above), then set currentTransaction to point to the new sibling. If a parent is not found, we have committed a sibling of the top-level transaction, which means we are committing the top-level transaction itself, so queue the transaction.
- abort() - simply set currentTransaction to the parent of the previous-most sibling of currentTransaction. Note that the previous-most sibling may be us, in which case we go directly to parentTransaction. If the previous-most sibling has no parent, then we are aborting a sibling of the top-level transaction, i.e. we are aborting the top-level transaction.
- Provisionally removed queueCurrentTransaction - Just before the TransactionQueue is sent to the server via JMSModelService.updateServer, we must fully commit the current transaction. To do so, we walk up the parentTransaction and previousTransaction pointers until we find the outer transaction and add it to the list.
Finally, inside of JMSModelService.updateServer we must serialize the TransactionQueue into the expected transaction/command format. To do so, we serialize each Transaction in committedTransactions. For each Transaction in the list, we perform an enhanced pre-order (root-child-next) traversal of the transaction tree, serializing a command for each Transaction node. So, what we end up with is a flat list of nodes.
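The begin()/commit() bookkeeping and the root-child-next serialization can be sketched as below. This is a simplified illustration, not the real class: deltas are plain arrays rather than QuadStore instances, statements are opaque strings, abort() is left out, and skipping empty placeholder nodes during serialization is a choice made for this sketch only.

```javascript
function makeTransaction(precondition) {
  return {
    precondition: precondition === undefined ? null : precondition,
    additions: [], deletions: [],
    childTransaction: null, parentTransaction: null,
    previousTransaction: null, nextTransaction: null
  };
}

// Walk back to the previous-most sibling, the one that holds the parent pointer.
function firstSibling(t) {
  while (t.previousTransaction) t = t.previousTransaction;
  return t;
}

class TransactionQueue {
  constructor() {
    this.currentTransaction = null;
    this.committedTransactions = [];
  }
  begin(precondition) {
    const t = makeTransaction(precondition);
    if (this.currentTransaction) {
      t.parentTransaction = this.currentTransaction;
      this.currentTransaction.childTransaction = t;
    }
    this.currentTransaction = t;
  }
  add(statement) { this.currentTransaction.additions.push(statement); }
  remove(statement) { this.currentTransaction.deletions.push(statement); }
  commit() {
    const first = firstSibling(this.currentTransaction);
    const parent = first.parentTransaction;
    if (!parent) {
      // Top level: queue the root of the whole tree.
      this.committedTransactions.push(first);
      this.currentTransaction = null;
      return;
    }
    // Nested: later updates go into a fresh sibling that follows the parent.
    const sibling = makeTransaction(null);
    sibling.previousTransaction = parent;
    parent.nextTransaction = sibling;
    this.currentTransaction = sibling;
  }
}

// Root-child-next traversal: one command per node, skipping empty placeholders.
function toCommands(t, out) {
  out = out || [];
  for (; t; t = t.nextTransaction) {
    if (t.precondition !== null || t.additions.length || t.deletions.length) {
      out.push({ precondition: t.precondition, additions: t.additions, deletions: t.deletions });
    }
    toCommands(t.childTransaction, out);
  }
  return out;
}

// The thirteen-step example from above.
const queue = new TransactionQueue();
queue.begin("p1"); queue.add("s1@g1"); queue.remove("s2@g1");
queue.begin("p2"); queue.add("s3@g2"); queue.add("s4@g2"); queue.commit();
queue.add("s7@g1");
queue.begin("p3"); queue.add("s5@g3"); queue.add("s6@g3"); queue.commit();
queue.commit();
const commands = toCommands(queue.committedTransactions[0]);
console.log(commands.map(function (c) { return c.precondition; })); // [ 'p1', 'p2', null, 'p3' ]
```

The four resulting commands carry their additions in the order the user made them (s1, then s3 and s4, then s7, then s5 and s6), which is the flat list the serialization step is after.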
We omit the proofs, but there are two conditions that should be proven about this data structure.
- 'Precondition-preservation' - Every update specified by the user after begin(p1) must either
- be serialized into a command that has precondition p1, OR
- be serialized after a command that has precondition p1
This fact assures that any update that relies on a precondition will not be attempted before that precondition is evaluated. This requirement is perhaps a bit too strong, but it illustrates the algorithm. By inspection of the tree traversal, we can see that this fact trivially holds.
- 'Order-preservation' - All updates are serialized in commands, in the order in which they are made by the user. Again, this fact trivially holds by virtue of the chosen tree traversal.
We conclude with just a bit of further discussion. A somewhat curious property of this algorithm is that nested child transactions cause the parent transaction to be split over multiple commands when everything is sent to the server. Note that this property does not violate either of the two conditions above.
Depending on how preconditions are evaluated on the Anzo server, it is quite possible that this algorithm is needlessly elaborate. The serialization step assumes that the server will actually evaluate preconditions based on commands already performed in the transaction. If it does not, then we would be able to collapse all our adds/deletes into a single command, and group all the preconditions into that one command.
Rounding out the functionality that TransactionQueue provides to TransactionProxy: to add and remove statements, we first check whether currentTransaction == null. If so, should we create a new Transaction, or is it up to code higher in the stack, i.e. the TransactionProxy, to make sure we are in a transaction? We choose to have this check in only one place in the entire stack, instead of the catch-all approach of doing it everywhere. In either case, once we have a currentTransaction we simply call the proper add or remove method on it. filter(statements, subject, predicate, object, namedGraphUri) is a bit more complicated. We perform a root-child-next traversal of the tree, and at each Transaction node we call Transaction.filter(statements, s, p, o, ngUri). The set of statements is augmented and diminished as it is passed through the tree. In anzo.client, internal structures will often pass statements around as anzo.util.Set objects where we need quick add and remove, whereas the user will always present and receive statements as arrays.
anzo.client.Transaction
The Transaction contains pointers to parent, child, and sibling nodes, a list of optional preconditions, and two QuadStore instances that make up the delta aspect of the Transaction. The pointers work as described in the previous section, and we describe preconditions in the next section. Here we describe how we keep track of additions and deletions in the Transaction as well as how we filter.
In Anzo Java, each TransactionCommand, now Transaction, had a DeltaContainer. We have pulled out the simple functionality from this object and merged it all into Transaction. In anzo.client each Transaction contains two QuadStore instances, one for additions and one for deletions.
- Transaction.add(statements) - statements may be a single instance or array of anzo.rdf.Statement. We add statements to the additions QuadStore and remove statements from the deletions QuadStore.
- Transaction.remove(statements) - statements may be a single instance or array of anzo.rdf.Statement. We remove statements from the additions QuadStore and add statements to the deletions QuadStore.
The invariant provided by this algorithm is that the last add or remove of a particular statement is the operation that sticks, without any assumption about the state of the server. This is a deviation from Anzo Java's behavior, which makes possibly incorrect assumptions about the state of the server and the intent of the user in order to gain a performance improvement. A very important property that will allow us to filter additions and deletions in arbitrary order is that additions and deletions are always disjoint: add and remove as defined above never leave a statement in both stores.
- Transaction.filter(statements, subject, predicate, object, namedGraphUri) - call statements.addAll(additions.find(subject, predicate, object, namedGraphUri)) and then call statements.removeAll(deletions.find(subject, predicate, object, namedGraphUri)). These two calls add any transaction additions and remove any transaction deletions from the find results. Recall that statements originated from a find call on the base IQuadStore.
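The add/remove/filter behaviour of the delta can be sketched as follows. Here the two QuadStore instances are approximated by Maps keyed on a serialized quad, statements are plain {s, p, o, c} objects, and the names TransactionDelta, keyOf and matches are invented for the sketch.

```javascript
// Serialize a statement to a string key so Maps can act as tiny quad stores.
const keyOf = function (st) { return JSON.stringify([st.s, st.p, st.o, st.c]); };

// Pattern match with null as a wildcard.
const matches = function (st, s, p, o, c) {
  return (s === null || st.s === s) && (p === null || st.p === p) &&
         (o === null || st.o === o) && (c === null || st.c === c);
};

class TransactionDelta {
  constructor() {
    this.additions = new Map();
    this.deletions = new Map();
  }
  // The last operation on a statement wins; a statement is never in both stores.
  add(st) {
    this.deletions.delete(keyOf(st));
    this.additions.set(keyOf(st), st);
  }
  remove(st) {
    this.additions.delete(keyOf(st));
    this.deletions.set(keyOf(st), st);
  }
  // statements: Map of key -> statement produced by a find on the base store.
  filter(statements, s, p, o, c) {
    this.additions.forEach(function (st, key) {
      if (matches(st, s, p, o, c)) statements.set(key, st);
    });
    this.deletions.forEach(function (st, key) {
      if (matches(st, s, p, o, c)) statements.delete(key);
    });
    return statements;
  }
}
```

Because additions and deletions never overlap, applying the deletions before the additions inside filter would give the same result.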
anzo.client.Precondition
A Precondition is nothing more than a SPARQL query and an expected result. In the current implementation of Anzo server and for the foreseeable future, only SPARQL ASK queries are supported, that is, queries about the state of the named graphs in the system that have a simple true or false answer. The caller supplies the usual query (query string, default graphs, and named graphs), as well as the expected answer, true or false. The reader need only think briefly about the mechanism to understand that more complicated preconditions provide no extra benefit. Any general SPARQL query such as a SELECT with an expected result set can be rewritten as an ASK query where the expected select results are built into the ASK query.
The particular implementation of Precondition is not terribly sophisticated. It need only store the various components until called upon for serialization.
anzo.client.GraphCache
The GraphCache is an augmented hashtable that stores a reference count with each object in the table. This is used by the DatasetService to keep track of the AnzoGraph objects it issues as local and replica graphs. When the user closes an AnzoGraph it will decrement the reference count in the GraphCache. When the count goes to 0, the graph is removed from the cache. We may also implement some sort of handler mechanism to notify interested parties that a graph has been disengaged, for example the dormant set that keeps track of space in the replica IQuadStore that may potentially be reclaimed.
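A reference-counted cache of this kind might look like the sketch below. The method names acquire and release and the onRelease hook are assumptions for illustration; the hook stands in for the possible handler mechanism mentioned above.

```javascript
// Sketch of a reference-counted cache keyed by named graph URI.
class GraphCache {
  constructor(onRelease) {
    this.entries = new Map(); // uri -> { graph, count }
    this.onRelease = onRelease || function () {};
  }

  // Return the cached graph for uri, creating it on first use, and bump its count.
  acquire(uri, createGraph) {
    let entry = this.entries.get(uri);
    if (!entry) {
      entry = { graph: createGraph(), count: 0 };
      this.entries.set(uri, entry);
    }
    entry.count += 1;
    return entry.graph;
  }

  // Drop one reference; evict and notify when the count reaches zero.
  release(uri) {
    const entry = this.entries.get(uri);
    if (!entry) return false;
    entry.count -= 1;
    if (entry.count === 0) {
      this.entries.delete(uri);
      this.onRelease(uri, entry.graph);
    }
    return true;
  }
}
```

Two components that each acquire the same URI share one graph object, and the graph leaves the cache only after both have released it.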
anzo.client.Tracker
The tracker is a system-wide mechanism for a client to express to the server the subset of statements and named graphs it is interested in. Each tracker contains a simple quad pattern that specifies a particular subset of the statements. For example, new Tracker(nguri, null, null, null) tracks all statements in the named graph nguri, and so on.
In anzo.client, Tracker objects are centrally managed by the TrackerManager. The public facing API in anzo.client does not expose the Tracker directly. Instead, we provide a few simple ways for the user to define trackers, that have already been discussed.
- DatasetService.addTracker(subject, predicate, object, namedGraphUri, eventHandler) - Adds a tracker and an ITrackerListener that will be notified when statements matching the tracker arrive via notification or replication. When a statement arrives that matches this tracker, the statement will only be added to the local replica IQuadStore if the user has created a replica graph for the namedGraphUri of the statement. As described above, this method returns a trackerHandle that the caller must use to remove the tracker. If DatasetService.addTracker has been called multiple times, only the last call to removeTracker will disengage the Tracker. The TrackerManager maintains all of this information, and performs the logic.
- DatasetService.getReplicaGraph(), DatasetService.getServerGraph() - Both of these calls have an optional trackGraph parameter. If true, a Tracker is automatically added to track all statements in the named graph.
- AnzoGraph.addTracker(subject, predicate, object) - If the user does not wish to track the entire graph, he can pass in false for trackGraph and specify one or more restrictive trackers. Adding trackers via the AnzoGraph is different from adding via DatasetService in a few ways. First, adding the same tracker pattern twice has no effect. That is, if two separate pieces of code call AnzoGraph.addTracker with the same pattern on the same instance of AnzoGraph, it takes only a single removeTracker to disengage the tracker. However, if someone had also registered the same tracker via DatasetService, it would have to be removed in both places. As described above, the AnzoGraph maintains a mapping from tracker pattern to trackerHandle. The second difference is that the graph's addTracker does not accept a listener because the user can already register (and likely has registered) a listener on the AnzoGraph instance itself.
The Tracker object itself stores the tracked pattern (possibly as a Statement), the list of trackerHandles that have been issued for this tracker, and the list of event handlers. It also must maintain a bit that indicates whether or not it has been replicated already. The list of trackerHandles is kept essentially for reference counting the tracker. If the API user adds the same tracker pattern multiple times, they are issued a different trackerHandle each time, but internally the same Tracker object is used for all of them.
anzo.client.TrackerManager
The TrackerManager is the only class that deals with Tracker objects directly. It maintains two tables of Tracker objects. handleTrackers, keyed by trackerHandle, is used to find the Tracker to remove when called upon to do so. patternTrackers, keyed by the tracked pattern, is used to determine if we already have a Tracker for a given pattern.
In the case where the user added a tracker on DatasetService providing an eventHandler, we will need some mechanism to notify all the event handlers when a particular statement arrives. That is, we need an algorithm that, given a statement, can find all the matching trackers. A simple way to do this is with a QuadStore. However, because patterns are not actually statements, to store a pattern as a statement we replace any null components of the tuple with a special uri called ANY. Then to match a statement, (s,p,o,c) we use the following algorithm.
(find(s,null,null,null) UNION find(ANY,null,null,null))
INTERSECTION
(find(null,p,null,null) UNION find(null,ANY,null,null))
INTERSECTION
(find(null,null,o,null) UNION find(null,null,ANY,null))
INTERSECTION
(find(null,null,null,c) UNION find(null,null,null,ANY))
The reader is encouraged to work out an example to prove this to himself. These 8 finds may seem expensive. However, only when the user specifically adds a tracker via the DatasetService will we have to invoke this algorithm.
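For readers who prefer to run an example rather than work one by hand, here is a small sketch of the matching algorithm. The array-based find is only a stand-in for QuadStore.find, and the ANY value and URIs are placeholders invented for the example.

```javascript
const ANY = "urn:example:ANY"; // placeholder; the text only calls for "a special uri called ANY"

// Minimal stand-in for QuadStore.find over an array of [s, p, o, c] quads; null is a wildcard.
function find(quads, s, p, o, c) {
  const wanted = [s, p, o, c];
  return quads.filter(function (quad) {
    return wanted.every(function (term, i) { return term === null || quad[i] === term; });
  });
}

// Store a tracker pattern as a quad, replacing null components with ANY.
function toQuad(s, p, o, c) {
  return [s, p, o, c].map(function (term) { return term === null ? ANY : term; });
}

// Find every stored pattern that matches the statement (s, p, o, c).
function matchingPatterns(patterns, s, p, o, c) {
  const union = function (a, b) { return new Set(a.concat(b)); };
  const perPosition = [
    union(find(patterns, s, null, null, null), find(patterns, ANY, null, null, null)),
    union(find(patterns, null, p, null, null), find(patterns, null, ANY, null, null)),
    union(find(patterns, null, null, o, null), find(patterns, null, null, ANY, null)),
    union(find(patterns, null, null, null, c), find(patterns, null, null, null, ANY))
  ];
  // INTERSECTION of the four unions.
  return patterns.filter(function (pattern) {
    return perPosition.every(function (set) { return set.has(pattern); });
  });
}

const patterns = [
  toQuad(null, "rdf:type", "ex:person", null), // every person, in any graph
  toQuad(null, null, null, "ex:g1"),           // everything in graph g1
  toQuad("ex:s1", null, null, "ex:g2")         // subject s1, but only in graph g2
];
const hits = matchingPatterns(patterns, "ex:s1", "rdf:type", "ex:person", "ex:g1");
console.log(hits.length); // 2
```

The third pattern is rejected because its graph component is neither ex:g1 nor ANY, so it is missing from the fourth union and therefore from the intersection.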
If notification is enabled on the client, the TrackerManager must register a JMS message selector for each tracker pattern so that the event publisher sends us updates to statements that match the tracker (and that we have permission to view, of course). Here is the format for a message selector string built from a Tracker:
// These formats assume that subject, predicate and object (and, for literals, datatype and language) are all not null.
// If any is null, then that part along with the corresponding AND is omitted.
// for an object URI
subject = '<subject>' AND predicate = '<predicate>' AND (object = '<object>' AND objectType = 0)
// for an object BNode
subject = '<subject>' AND predicate = '<predicate>' AND (object = '<object>' AND objectType = 1)
// for an object Typed Literal
subject = '<subject>' AND predicate = '<predicate>' AND (object = '<literalValue>' AND objectType = 2 AND dataType = '<typeuri>')
// for an object Plain Literal
subject = '<subject>' AND predicate = '<predicate>' AND (object = '<literalValue>' AND objectType = 2 AND language = '<language>')
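A selector string in this format could be assembled along the following lines. The shape of the pattern argument (an object field with value, type, datatype and language) is an assumption made for the sketch. The quote doubling follows the JMS selector syntax for string literals and is an addition not spelled out in the format above.

```javascript
// Quote a value as a JMS selector string literal (a single quote is escaped by doubling it).
function quote(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Build a message selector from a tracker pattern; null parts are omitted.
// pattern.object, when present, is { value, type, datatype, language } with
// type 0 = URI, 1 = BNode, 2 = literal.
function selectorFor(pattern) {
  const parts = [];
  if (pattern.subject != null) parts.push("subject = " + quote(pattern.subject));
  if (pattern.predicate != null) parts.push("predicate = " + quote(pattern.predicate));
  const object = pattern.object;
  if (object != null) {
    const objectParts = ["object = " + quote(object.value), "objectType = " + object.type];
    if (object.type === 2 && object.datatype != null) {
      objectParts.push("dataType = " + quote(object.datatype));
    }
    if (object.type === 2 && object.language != null) {
      objectParts.push("language = " + quote(object.language));
    }
    parts.push("(" + objectParts.join(" AND ") + ")");
  }
  return parts.join(" AND ");
}
```

For example, a pattern with a null subject, predicate rdf:type and a URI object yields a selector containing only the predicate clause and the parenthesized object clause.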
Mixing trackers that were added via DatasetService with those added via AnzoGraph creates the following problem. Suppose the user invokes DatasetService.addTracker(null, null, null, nguri, myHandler). One or more replications occur, and myHandler is called for the various adds and removes in nguri. In addition, the underlying Tracker is marked as replicated, and since we have no replica graph for nguri (yet), no statements from replication exist in the replica QuadStore. So what happens if the user later decides to create, and track, a replica graph for nguri? DatasetService.getReplicaGraph(nguri,true) will ask the tracker manager for a Tracker for the graph, and one will already exist. On the next replicate, which would ordinarily bring down the newly tracked named graph, only a few statements, if any, are retrieved. This is because the cached Tracker is marked as replicated, so the server only hands back the changes since the last marker. Luckily, this problem is easily solved. We simply add a parameter to TrackerManager.addTracker to indicate whether or not any existing trackers should be reset, that is, whether to set the replicated bit to false. Note that this version of addTracker is only available internally on the TrackerManager and not on DatasetService. If a graph already exists and is being kept up to date, DatasetService.getReplicaGraph() will simply return that graph and not register another tracker. Thus, trackers will always, but only, be reset when a new replica graph is created.
Here then is how the basic TrackerManager operations work.
- TrackerManager.addTracker(subject, predicate, object, namedGraphUri, reset, eventHandler) - First look up the pattern in patternTrackers. If it doesn't exist, create a new Tracker for the given pattern and add it to patternTrackers. In either case we now have a Tracker in hand. We generate a new trackerHandle, add it to the list of handles in the Tracker, and record it in handleTrackers. If an optional eventHandler was specified, we add that to the Tracker, and add the pattern (as a Statement) to the tracker QuadStore. If reset == true then set Tracker.replicated to false. If we created a new Tracker, then we register a JMS message selector with the JMSNotificationService using the format above. Finally, return the trackerHandle.
- TrackerManager.removeTracker(trackerHandle) - Look up the handle in handleTrackers. If we don't have a Tracker for the handle then the user has made a bookkeeping error and we return false. Otherwise, remove this handle from the list of trackerHandles in the Tracker. If the list is now empty, we remove the Tracker from both tables and unregister the message selector from JMSNotificationService.
- TrackerManager.notifyTrackers(addition, statement) - Using the algorithm above, we find all the trackers that match the given statement. For each Tracker we notify every listener of the addition or deletion. TrackerManager.notifyTrackers will be called by the replica updater.
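The add and remove bookkeeping described in the list above can be sketched as follows. The registerSelector and unregisterSelector methods on the notification service are hypothetical names, the handle is a simple counter, and the tracker QuadStore used by notifyTrackers is left out to keep the sketch short.

```javascript
// Sketch of TrackerManager bookkeeping with two tables and per-handle listeners.
class TrackerManager {
  constructor(notificationService) {
    this.handleTrackers = new Map();  // trackerHandle -> Tracker
    this.patternTrackers = new Map(); // pattern key -> Tracker
    this.notificationService = notificationService;
    this.nextHandle = 1;
  }

  addTracker(subject, predicate, object, namedGraphUri, reset, eventHandler) {
    const key = JSON.stringify([subject, predicate, object, namedGraphUri]);
    let tracker = this.patternTrackers.get(key);
    if (!tracker) {
      // handles maps each issued trackerHandle to its optional event handler.
      tracker = { key: key, handles: new Map(), replicated: false };
      this.patternTrackers.set(key, tracker);
      this.notificationService.registerSelector(key);
    }
    const handle = this.nextHandle++;
    tracker.handles.set(handle, eventHandler || null);
    this.handleTrackers.set(handle, tracker);
    if (reset) tracker.replicated = false;
    return handle;
  }

  removeTracker(handle) {
    const tracker = this.handleTrackers.get(handle);
    if (!tracker) return false; // bookkeeping error on the caller's side
    this.handleTrackers.delete(handle);
    tracker.handles.delete(handle);
    if (tracker.handles.size === 0) {
      // Last handle gone: disengage the tracker entirely.
      this.patternTrackers.delete(tracker.key);
      this.notificationService.unregisterSelector(tracker.key);
    }
    return true;
  }
}
```

Keeping the listener alongside its handle means that removing a handle also drops the listener that was registered with it, which is one possible reading of the reference-counting scheme described above.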
Events
anzo.client retains three basic event types from Anzo Java. Depending on the language, these will be implemented differently. However, so that an event may be registered by connecting a single method in JavaScript, each type of event corresponds to exactly one callback method.
Tracker events
Tracker events are fired whenever a statement arrives via replication or notification for which we have a tracker with at least one listener registered. The event handler is a single function passed into DatasetService.addTracker() and in turn into TrackerManager.addTracker(). The reason we cannot use dojo.connect is that the object containing the actual event handler is hidden from the user.
function myTrackerListener(addition, subject, predicate, object) {
    if (addition) {
        console.debug("Statement added: " + subject + " " + predicate + " " + object);
    } else {
        console.debug("Statement removed: " + subject + " " + predicate + " " + object);
    }
}
// rdf:type written out in full; PERSON is a placeholder URI standing in for :person
var RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
var PERSON = "http://example.org/person";
datasetService.addTracker(null, RDF_TYPE, PERSON, null, myTrackerListener);
Replication events
Replication events are fired whenever replication succeeds or fails. If replication succeeds, the user should rely on tracker and graph events to proceed. Replication can fail in two basic ways. First, the updateServer phase can fail because a transaction commit fails. These errors will be available in the array of transactions. Second, the replication phase itself can fail, in which case exceptions and errors will be reported in the errors object. As we build applications, it may prove more practical to break this out into three different events.
function myReplicationListener(success, transactions, errors) {
    if (success) {
        // note success and wait for graph/tracker listeners to actually do work
    } else if (errors) {
        // we got to the replication phase, but it failed
    } else {
        // the updateServer phase failed: loop through transactions and view errors
    }
}
// dojo.connect takes the source object and the event name as a string.
dojo.connect(datasetService, "replicationComplete", myReplicationListener);
NamedGraph events
anzo.rdf defines a basic mechanism for listening to changes on named graphs. We use the mechanisms for notifying replica and server graphs.
function myStatementListener(statements) {
    for (var i = 0; i < statements.length; i++) {
        console.debug("Statement updated: " + statements[i]);
    }
}
// dojo.connect takes the source object and the event name as a string.
dojo.connect(anzoGraph, "addedStatements", myStatementListener);
dojo.connect(anzoGraph, "removedStatements", myStatementListener);
Class diagram
Control flow
In this section we tie together the end-to-end operation of the client. The details have already been discussed at great length in the previous sections. This section may have to get filled out as we begin implementation.
Initialization
When we create a new DatasetService we must:
- Create instances of the various service objects, such as JMSNotificationService and JMSModelService
- Set up the ServerQuadStore
- Set up the TrackerManager
- Set up the TransactionQueue
- Set up the various TransactionProxy objects
Reading and writing data
Creating named graphs
Replica graphs
- Check to see if a graph exists; if so, return it
- Create an AnzoGraph wrapped around replicaTransactionProxy
- Create a NamedGraph wrapped around replicaTransactionProxy for the metadata graph
- Acquire a metadata graph uri from the server
- Create statements in the metadata graph, wrapping them in a new transaction if necessary
Server graphs
- Check to see if a graph exists; if so, return it
- Create an AnzoGraph wrapped around ServerTransactionProxy
- Create a NamedGraph wrapped around ServerTransactionProxy for the metadata graph
- Acquire a metadata graph uri from the server
- Create statements in the metadata graph, wrapping them in a new transaction if necessary
Adding and removing statements
add to the current command of the outer transaction proxy
User defined commands
implementing graph.find(..)
perform a find on the inner QuadStore, filter through chain of proxies
closing graphs
Beginning and committing transactions
Replication
Setting replication modes
- manual: on user request, invoke replicate
- immediate:
- automatic: every T seconds using setTimeout. After every replication, reset the timer
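The manual and automatic modes could be sketched along the following lines. The ReplicationScheduler name, its methods, and the replicate(done) callback convention are hypothetical, chosen only for illustration; the immediate mode is not described above, so it is not modelled here.

```javascript
// Sketch only. `replicate` is a hypothetical function that performs one
// replication and calls `done` when it has finished.
function ReplicationScheduler(replicate, intervalMs) {
    this.replicate = replicate;
    this.intervalMs = intervalMs; // "T seconds", expressed in milliseconds
    this.mode = "manual";
    this.timer = null;
}

ReplicationScheduler.prototype.setMode = function (mode) {
    this.mode = mode;
    this.resetTimer();
};

// Cancel any pending timer and, in automatic mode, arm a fresh one.
ReplicationScheduler.prototype.resetTimer = function () {
    if (this.timer !== null) {
        clearTimeout(this.timer);
        this.timer = null;
    }
    if (this.mode === "automatic") {
        var self = this;
        this.timer = setTimeout(function () { self.replicateNow(); }, this.intervalMs);
    }
};

// Used both for user requests (manual) and for timer expiry (automatic).
ReplicationScheduler.prototype.replicateNow = function () {
    var self = this;
    if (this.timer !== null) {
        clearTimeout(this.timer);
        this.timer = null;
    }
    this.replicate(function () {
        self.resetTimer(); // after every replication, reset the timer
    });
};
```

Using setTimeout and re-arming it after each replication, rather than setInterval, keeps a slow replication from overlapping with the next scheduled one.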
Performing replication
update server
- Prepare the modelService.updateServer call and send it over JMS
- Receive the response via the JMS callback
- Process the JSON object and invoke the proper replicaUpdater
- Clear the transaction queue
- Fire replication events
- Report errors via the replication listener
replicate
- Prepare the replication request and send it over JMS
- Receive the response via the JMS callback
- Update the replica via the ReplicaUpdater of the Replicator
- Update graphs/metadata graphs
- Fire events
- Fire replication events
- Clear the notification proxy
notification
- Accumulate update messages into buckets by transaction
- On transaction end, convert the buckets into the JSON replication format
- Call the replicaUpdater of the notification service
- Fire events
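The bucketing step for notifications might look something like the sketch below. The NotificationAccumulator name, the onUpdate/onTransactionEnd entry points, and in particular the { additions, removals } payload are assumptions made for illustration; the actual JSON replication format is not defined in this section.

```javascript
// Sketch only. `replicaUpdater` is a hypothetical function that receives the
// converted payload, applies it to the replica, and fires events.
function NotificationAccumulator(replicaUpdater) {
    this.replicaUpdater = replicaUpdater;
    this.buckets = {}; // transaction id -> array of { addition, statement }
}

// Called for every update message that arrives over JMS.
NotificationAccumulator.prototype.onUpdate = function (transactionId, addition, statement) {
    if (!this.buckets[transactionId]) {
        this.buckets[transactionId] = [];
    }
    this.buckets[transactionId].push({ addition: addition, statement: statement });
};

// Called when the end-of-transaction message arrives.
NotificationAccumulator.prototype.onTransactionEnd = function (transactionId) {
    var bucket = this.buckets[transactionId] || [];
    delete this.buckets[transactionId];
    // Placeholder shape standing in for the real JSON replication format.
    var payload = { additions: [], removals: [] };
    bucket.forEach(function (update) {
        var target = update.addition ? payload.additions : payload.removals;
        target.push(update.statement);
    });
    this.replicaUpdater(payload);
};
```

Holding updates until the transaction ends means listeners never observe a partially applied transaction.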
Considerations for local persistence
Anzo Java contains support for persisting not only the replica graphs to disk, but also the committed transactions in the transaction queue, so that if the user closes the application before replicating, the committed transactions will persist. Anzo Java uses a parallel set of TransactionQueue and Transaction classes to handle persistence. In anzo.client we can achieve the same thing with a simpler approach. When a transaction is committed and added to the queue, we can check if persistence is enabled and invoke a particular routine to write the transaction to the storage system we choose. Similarly, when the system loads, we can read the transactions in from the store and re-instantiate the queue. Replica graphs themselves are backed directly by a disk-based storage system. Every read (before being filtered through the various proxies) pulls directly from disk, and every write that occurs due to replication goes right to disk as well.
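As a rough sketch of the simpler approach, the queue below writes itself out on every commit and reloads on startup. The PersistentTransactionQueue name, the store adapter with save(key, string) and load(key), the "transactionQueue" key, and the assumption that transactions are JSON-serializable are all hypothetical; the storage system itself is still to be chosen.

```javascript
// Sketch only. `store` is a hypothetical adapter: save(key, string), load(key).
function PersistentTransactionQueue(store, persistenceEnabled) {
    this.store = store;
    this.persistenceEnabled = persistenceEnabled;
    this.transactions = [];
    if (persistenceEnabled) {
        // On load, re-instantiate the queue from whatever was saved earlier.
        var saved = store.load("transactionQueue");
        if (saved) {
            this.transactions = JSON.parse(saved);
        }
    }
}

// Called when a transaction is committed.
PersistentTransactionQueue.prototype.add = function (transaction) {
    this.transactions.push(transaction);
    this.persist();
};

// Called after a successful updateServer, when the queue is cleared.
PersistentTransactionQueue.prototype.clear = function () {
    this.transactions = [];
    this.persist();
};

PersistentTransactionQueue.prototype.persist = function () {
    if (this.persistenceEnabled) {
        this.store.save("transactionQueue", JSON.stringify(this.transactions));
    }
};
```

This version writes synchronously on every commit, which is the simplest thing that works but is exactly where a blocking store would stall the UI.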
Calls to commit(), as well as all read operations, can potentially access the disk, opening the door for UI lockups and poor user experience. When we design the local persistence, we may have to come up with a way to persist the transactions in a non-blocking fashion, as well as create a subclass or variant of AnzoGraph whose read operations take a callback.
Configuring anzo.client
Anzo Java uses .properties files to configure all components of the system, including the client. This works well as the client shares many configurable components with the server. However, in anzo.client we'd like to keep things as simple as possible. Furthermore, the configuration mechanisms for the server are likely to evolve as we move to a central registry for configuration and startup, so it doesn't make sense to model the client after the existing server configuration. For anzo.client a single configuration object, passed to new DatasetService(config), should suffice.
var config = {
    username : "default",
    password : "123",
    jmsBayeuxEndpoint : "/cometd",
    notificationEnabled : true,
    replicationMode : "manual",
    persistenceEnabled : false
};
We will likely have a config object, DatasetService.defaultConfig, that contains the default/fallback values for properties the user omits.
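One way the fallback could work is a shallow merge of the user's object over the defaults, as in the sketch below. The resolveConfig helper is hypothetical, and the default values shown are placeholders borrowed from the example config rather than decided defaults.

```javascript
// Sketch only: placeholder defaults, not the final DatasetService.defaultConfig.
var defaultConfig = {
    jmsBayeuxEndpoint: "/cometd",
    replicationMode: "manual",
    persistenceEnabled: false
};

// Hypothetical helper: copy the defaults, then overlay whatever the user set.
function resolveConfig(userConfig) {
    var resolved = {};
    var key;
    for (key in defaultConfig) {
        resolved[key] = defaultConfig[key];
    }
    for (key in (userConfig || {})) {
        resolved[key] = userConfig[key];
    }
    return resolved;
}

var resolved = resolveConfig({ username: "default", replicationMode: "automatic" });
console.log(resolved.replicationMode + " " + resolved.jmsBayeuxEndpoint);
```

Copying into a fresh object leaves both the defaults and the caller's config untouched, so the same defaults can serve several DatasetService instances.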


