The Open Anzo Project

Semantic Application Middleware

This page contains raw notes that will become a design document at AnzoJSDesign.

Raw Design Notes

REVERSED DECISION...HERE ONLY FOR POSTERITY - Right now serialization is actually a special function to make things into the JSON format. We want to minimize that since we are already in JavaScript. We want to separate the data in objects like Transaction from the methods. This means that these objects should only work with getters and setters.

Statement, Node Serialization

  • A small subject of the system
  • New Foo() to get at the data...which only even works with all the state.

Design of API:


DatasetService

  • Main object from which you get to all other Anzo-related objects.
  • A Dataset is a collection of Graphs.
  • DatasetService keeps state about connections to the server.

Anzo.Now(er...just): (a.k.a. in memory) Graph NamedGraph - Simple in memory "Graph" and "Model" are the same essentially. Jena used to have Graphs and Models. Models had some extra.

TripleStore backs a Graph - stores lists of graphs and QuadStore


Milestone 1: JavaScript In-Memory RDF API (BasicGraph, BasicEventManager, Statement, Node, (internal QuadStore, TripleStore))

anzo.rdf - The basic in-memory RDF data structure and implementation (without the server, necessarily. Hello World for RDF)

public:

  • anzo.rdf.INamedGraph
  • anzo.rdf.Statement
  • anzo.rdf.Value
  • anzo.rdf.Literal
  • anzo.rdf.Resource
  • anzo.rdf.BNode
  • anzo.rdf.URI
  • anzo.rdf.NamedGraph - implements INamedGraph
  • anzo.rdf.GraphEventManager
  • anzo.rdf.AnzoValueFactory - Factory to create these things
  • anzo.rdf.Dataset - collection of named graphs with operations across all of them. (no SPARQL queries)
  • anzo.rdf.vocabulary

private:

anzo.rdf.QuadStore

anzo.client -

public:

private:

anzo.util -

public:

private:

  • anzo.util.HashSet
  • anzo.util.Utilities

Milestone 1a: Jastor-like


DatasetService - creates LocalGraphs and RemoteGraphs. Adds transactionality via TransactionManager.begin(), .commit().

Keeps state like: at most one TransactionQueue and one open Transaction ("current transaction").

LocalGraph -

RemoteGraph - A graph that represents data on the service. Perhaps throw this away in JavaScript, or (actually, defer it) make this not part of the public API.

- Deferring the RemoteGraph implementation and deferring async individual calls in the INamedGraph interface.

Tracing an add:

  • Transactions have "Command"s in them. Internally those get added into ITransactionCommands but that's not exposed. It happens oddly by calling the 'execute' methods which then internally would call a bunch of add/remove calls on graphs. Each add/remove call will get the current "ITransactionCommand" and add/remove triples in there.
  • ITransactionQueueListener and ITransactionQueueHandler are the same...seems you don't need both...vestigial. Both basically say things like TransactionAdded, Cleared, etc.
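The add path traced above can be sketched roughly as follows. This is only an illustration of the idea that graph add/remove calls record their changes in the current transaction command; `SketchTransactionQueue`, `SketchGraph`, and the `additions`/`removals` fields are hypothetical names, and throwing when no transaction is open is an assumption rather than documented Anzo behavior.

```javascript
// Hypothetical sketch: names and fields are illustrative, not the Anzo API.
function SketchTransactionQueue() {
  this.committed = []; // committed transactions waiting to be replicated
  this.current = null; // at most one open ("current") transaction
}

SketchTransactionQueue.prototype.begin = function () {
  if (this.current) {
    throw new Error("a transaction is already open");
  }
  this.current = { additions: [], removals: [] };
};

SketchTransactionQueue.prototype.commit = function () {
  this.committed.push(this.currentCommand());
  this.current = null;
};

// Assumption: changes outside an open transaction are rejected.
SketchTransactionQueue.prototype.currentCommand = function () {
  if (!this.current) {
    throw new Error("no open transaction");
  }
  return this.current;
};

// The graph does not apply changes itself; it records them as quads
// in the current transaction command obtained from the queue.
function SketchGraph(uri, queue) {
  this.uri = uri;
  this.queue = queue;
}

SketchGraph.prototype.add = function (s, p, o) {
  this.queue.currentCommand().additions.push([s, p, o, this.uri]);
};

SketchGraph.prototype.remove = function (s, p, o) {
  this.queue.currentCommand().removals.push([s, p, o, this.uri]);
};

var txQueue = new SketchTransactionQueue();
var txGraph = new SketchGraph("http://example.org/g1", txQueue);
txQueue.begin();
txGraph.add("http://example.org/s", "http://example.org/p", "hello");
txGraph.remove("http://example.org/s", "http://example.org/p", "old");
txQueue.commit();
```

After the commit, `txQueue.committed` holds one transaction with one addition and one removal, and no transaction is open.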

Public API constructs:

DatasetService, Command, RemoteGraph, LocalGraph

The graphs have a MetadataGraph. Metadata graphs are artifacts of Anzo. A metadata graph has things like graph revision, last modified, createdby, modifiedBy, aclURI. You can only add these sorts of statements to a metadata graph; it will reject anything else. Are metadata graphs read-only? Not really, but they have checks to prevent you adding statements using certain "preferred" predicates.

TransactionQueue (ITransactionQueue)

What is an Executor? - A deferred-execution concurrency utility to deal with Eclipse and Swing needing events to fire on the UI thread.

ModelService - The storage interface. Storage operations like update, find, execQuery, getSize of graph, get graph.

ModelServiceApi is basically the main entry point to the storage system on the server. It's not a client thing. It's a server.

Graphs need to be closed? Why? Because there are references in the DatasetService (that way you can just have a URI and grab the cached LocalGraph without any special knowledge of the graph already being cached). We could just have a method to 'clean up the graph' explicitly rather than reference counting, but then you have to expect someone to just "know" when it isn't needed anymore. That's too much to expect in an application of disconnected lenses. The references in the dataset service are for things like notification and replication.
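A minimal sketch of the reference-counting idea, assuming a hypothetical `SketchDatasetService` with `getLocalGraph` and `closeGraph` methods (the `closeGraph` name and the entry layout are invented for illustration): each lens asks for a graph by URI, shares the cached instance, and the entry is dropped only when the last holder closes it.

```javascript
// Hypothetical sketch of reference-counted graph caching.
function SketchDatasetService() {
  this.entries = Object.create(null); // graph URI -> { graph, refs }
}

// Returns the cached graph for the URI, creating it on first use.
SketchDatasetService.prototype.getLocalGraph = function (uri) {
  var entry = this.entries[uri];
  if (!entry) {
    entry = { graph: { uri: uri, statements: [] }, refs: 0 };
    this.entries[uri] = entry;
  }
  entry.refs += 1;
  return entry.graph;
};

// Releases one reference; the graph is forgotten when none remain.
SketchDatasetService.prototype.closeGraph = function (uri) {
  var entry = this.entries[uri];
  if (!entry) {
    return;
  }
  entry.refs -= 1;
  if (entry.refs === 0) {
    delete this.entries[uri];
  }
};

var service = new SketchDatasetService();
var lensA = service.getLocalGraph("http://example.org/g1");
var lensB = service.getLocalGraph("http://example.org/g1"); // same instance
service.closeGraph("http://example.org/g1"); // lens A is done; still cached
var stillCached = "http://example.org/g1" in service.entries;
service.closeGraph("http://example.org/g1"); // last reference released
var cachedAfterLastClose = "http://example.org/g1" in service.entries;
```

Neither lens needs to know whether the other still uses the graph, which is the point of counting references instead of relying on an explicit clean-up call.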

Caching (reuse of local graphs across closing the last reference, to avoid redownloading) should be added as a separate service.

LocalGraph - create is very similar to RemoteGraph. When you create a LocalGraph it adds its triples to the DatasetService's one big NamedGraphContainer. The local graph simply wraps the NamedGraphContainer to expose the INamedGraph API via the ContainerNamedGraph object. An IContainer is a "QuadGraph". It stores graphs rather than triples.

A dataset service has an internal replication service which has one method execReplicate(). This just basically implements the whole network communication for replication.

This replication communication mechanism uses a lot of JMS properties for the "meat" of the message, like replication marker

"Immediate" replication is not supported in the JavaScript client. It is rarely used, and supporting it would mean that every single method like add, remove, commit, etc. would need to be asynchronous. We'd rather not do that, so we will probably remove that mode in the JavaScript client.

DatasetService keeps a namedgraphContainerProxy

DatasetService has a "notification transaction", one for everything...being built up from notification.

NotificationService:

It's a service like the ModelService and ReplicationService. It is an internal concept of the DatasetService. It has operations like (un)registerSelector. Selectors are essentially just listening on JMS queues. The Anzo server doesn't really know about 'selectors'; it just publishes and others subscribe.

Transactions that come up from notification messages have a transaction ID that is a monotonically increasing integer. So you can actually sort them by sorting on the ID.

Event Philosophy - JavaScript supports duck typing and we will support it for event listeners. We will supply interface classes for events but not require their use. They will serve mainly as clear documentation of the events. The code that fires events must check for the existence of the appropriate callback before firing the event.
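A small sketch of that check, assuming a hypothetical `fireEvent` helper and made-up event names (`statementAdded`, `statementRemoved`): a listener is any object, and an event is delivered only if the listener has a function under that event's name.

```javascript
// Hypothetical helper: deliver an event to every listener that implements it.
// Returns how many listeners actually received the event.
function fireEvent(listeners, eventName, args) {
  var delivered = 0;
  for (var i = 0; i < listeners.length; i++) {
    var listener = listeners[i];
    // Duck typing: no interface check, only "does the callback exist?"
    if (listener && typeof listener[eventName] === "function") {
      listener[eventName].apply(listener, args);
      delivered += 1;
    }
  }
  return delivered;
}

var seen = [];
var listeners = [
  { statementAdded: function (stmt) { seen.push("added:" + stmt); } },
  { statementRemoved: function (stmt) { seen.push("removed:" + stmt); } },
  {} // a listener that handles nothing is simply skipped
];

var deliveredCount = fireEvent(listeners, "statementAdded", ["s p o"]);
```

Only the first listener implements `statementAdded`, so it is the only one called; the other two are skipped without error.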

anzo.rdf - The basic in-memory RDF data structure and implementation (without the server, necessarily. Hello World for RDF)

public:

  • anzo.rdf.INamedGraph
  • anzo.rdf.Statement
  • anzo.rdf.Value
  • anzo.rdf.Literal
  • anzo.rdf.Resource
  • anzo.rdf.BNode
  • anzo.rdf.URI
  • anzo.rdf.NamedGraph - implements INamedGraph (f.k.a. BasicGraph)
  • anzo.rdf.GraphEventManager
  • anzo.rdf.ValueFactory - Factory to create these things
      • For convenience, these factory methods will be in the "anzo." namespace: anzo.createURI, etc. This is a special case for these methods. Normally static methods should go into their own methods.

anzo.rdf.Dataset - collection of named graphs with operations across all of them. (no SPARQL queries)

.addGraph(INamedGraph)

?anzo.rdf.DeltaGraph - This may not be necessary or may be extraneous in the API.

private:

  • anzo.rdf.IQuadStore (like IContainer in Java)
  • anzo.rdf.QuadStore (a.k.a. "QuadStore to the extreme"...kind of like NamedGraphContainer in Java)
  • anzo.rdf.RdbQuadStore
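As a rough "Hello World for RDF" illustration of the anzo.rdf package outlined above, the sketch below shows an in-memory named graph and a dataset that searches across all of its graphs. Only the concepts (named graph, dataset, `addGraph`) come from these notes; `SketchNamedGraph`, `SketchDataset`, the `add`/`find` signatures, and the plain-string terms are assumptions made for the example.

```javascript
// Hypothetical sketch; not the anzo.rdf implementation.
function SketchNamedGraph(uri) {
  this.uri = uri;
  this.statements = [];
}

// A null or undefined pattern position acts as a wildcard.
function matchesPattern(stmt, s, p, o) {
  return (s == null || stmt.subject === s) &&
         (p == null || stmt.predicate === p) &&
         (o == null || stmt.object === o);
}

SketchNamedGraph.prototype.find = function (s, p, o) {
  return this.statements.filter(function (stmt) {
    return matchesPattern(stmt, s, p, o);
  });
};

// Adds the statement unless an identical one is already present.
SketchNamedGraph.prototype.add = function (s, p, o) {
  if (this.find(s, p, o).length === 0) {
    this.statements.push({ subject: s, predicate: p, object: o });
  }
};

function SketchDataset() {
  this.graphs = Object.create(null); // graph URI -> graph
}

SketchDataset.prototype.addGraph = function (graph) {
  this.graphs[graph.uri] = graph;
};

// An operation across all named graphs: pattern matching, no SPARQL.
SketchDataset.prototype.find = function (s, p, o) {
  var results = [];
  for (var uri in this.graphs) {
    var found = this.graphs[uri].find(s, p, o);
    for (var i = 0; i < found.length; i++) {
      results.push({ graph: uri, statement: found[i] });
    }
  }
  return results;
};

var people = new SketchNamedGraph("http://example.org/people");
people.add("ex:alice", "ex:knows", "ex:bob");
people.add("ex:alice", "ex:knows", "ex:bob"); // duplicate, ignored
var pets = new SketchNamedGraph("http://example.org/pets");
pets.add("ex:rex", "ex:knows", "ex:alice");

var dataset = new SketchDataset();
dataset.addGraph(people);
dataset.addGraph(pets);
var knowsAnywhere = dataset.find(null, "ex:knows", null);
```

Here `knowsAnywhere` contains one match from each graph, each tagged with the URI of the graph it came from.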

anzo.client -

public:

    DatasetService {

      getLocalGraph: function (uri) {

         var metadataGraphUri = ....; // Obtain via some algorithm...hopefully without going to server.
         var metadataGraph = new MetadataGraph(metadataGraphUri, this.transactionProxy);
         var localGraph = new AnzoGraph(uri, metadataGraph, this.transactionProxy);

         // Add the magic statements to the metadata graph that indicate to the server that we are creating a graph.

         // Do lots of bookkeeping to keep track of this graph.

         return localGraph;
      },

      getRemoteGraph: function (uri) {

         var metadataGraphUri = ....; // Obtain via some algorithm...hopefully without going to server.
         var remoteQuadStore = new RemoteQuadStore(uri);
         var transactionProxy = new TransactionProxy(remoteQuadStore);
         var metadataGraph = new MetadataGraph(metadataGraphUri, transactionProxy);
         var remoteGraph = new AnzoGraph(uri, metadataGraph, transactionProxy);

         // Add the magic statements to the metadata graph that indicate to the server that we are creating a graph.

         // Do lots of bookkeeping to keep track of this graph.

         return remoteGraph;
      }
    }

private:

  • AnzoGraph - extends anzo.rdf.NamedGraph and adds the MetadataGraph...since Anzo is what adds metadata to a graph. Similar to INamedGraphWithMeta in Java.
  • MetadataGraph - extends anzo.rdf.NamedGraph and simply overrides some methods to check for reserved predicates.
  • IModelService
  • JmsModelService
  • JmsReplicationService
  • JmsNotificationService
  • TransactionCommand - contains what is in Java a DeltaContainer right now plus all the transaction commands.
  • TransactionQueue - ITransactionQueue and ITransactionManager/ITransactionQueueManager are mixed into this one class. It has begin/commit, etc. TransactionQueueListener/Handler should be combined. They are duplicates.

For the replication implementation ("What happens when you replicate"):

  1. The transactions in the queue get committed via the modelservice update call. Then the results come back as success/failure information. We hold onto those until after we finish replication.
  2. Then changes are pulled from the server via an execReplicate call on the ReplicationService. The Java code parses responses from JMS with an event-based system like 'handleStatementStart/End'. We've decided that it was done that way because of the SAX parsing, but we're parsing into JSON first, so we can just traverse the JSON tree and don't need event-based parsing. It's likely the parsing methods get reused by notification handling code.
  3. The update results from step 1 are sent to the transaction queue in a 'transactionComitted' call which then sends events and removes the completed transactions from the queue.
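The three steps above can be sketched as a single function. This is a synchronous illustration with stub services, not the real flow: the actual calls go over the network, and the `replicate` function, the stub objects, the `applyStatement` method, the `transactionCommitted` spelling, and the response shape (`{ statements: [...] }`) are all assumptions made for the example.

```javascript
// Hypothetical sketch of the three replication steps, using stub services.
function replicate(queue, modelService, replicationService, replicaUpdater) {
  // Step 1: commit the queued transactions and hold on to the results.
  var updateResults = modelService.update(queue.pending);

  // Step 2: pull changes from the server. The response is already a parsed
  // JSON tree, so it is walked directly instead of parsed with events.
  var response = replicationService.execReplicate();
  for (var i = 0; i < response.statements.length; i++) {
    replicaUpdater.applyStatement(response.statements[i]);
  }

  // Step 3: only now hand the step 1 results to the queue, which drops
  // the completed transactions (and would fire events).
  queue.transactionCommitted(updateResults);
  return updateResults;
}

// Stubs that record the order in which they are called.
var callLog = [];
var replQueue = {
  pending: [{ id: 1 }, { id: 2 }],
  transactionCommitted: function (results) {
    callLog.push("transactionCommitted");
    this.pending = this.pending.filter(function (tx) {
      return !results[tx.id];
    });
  }
};
var stubModelService = {
  update: function (transactions) {
    callLog.push("update");
    var results = {};
    for (var i = 0; i < transactions.length; i++) {
      results[transactions[i].id] = true; // pretend every commit succeeded
    }
    return results;
  }
};
var stubReplicationService = {
  execReplicate: function () {
    callLog.push("execReplicate");
    return { statements: [["ex:s", "ex:p", "ex:o"]] };
  }
};
var stubReplicaUpdater = {
  applied: [],
  applyStatement: function (stmt) {
    callLog.push("applyStatement");
    this.applied.push(stmt);
  }
};

replicate(replQueue, stubModelService, stubReplicationService, stubReplicaUpdater);
```

The call log records update, execReplicate, applyStatement, transactionCommitted in that order, and the queue is empty afterwards because both stubbed commits succeeded.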

TransactionProxy

DatasetService {
  
  transactionQueue = new TransactionQueue(quadStore, { isPersisted: true, gears: true } /* plus possibly other config */)
  
  replica {
    baseQuadStore
    notificationProxy
    // tracker stuff
  }
  
  begin {
   transactionQueue.begin()
  }
  
  commit
  getLocalGraph
  
}

Replication (see also transaction queue description above):

Replica
  state (quadstores)
  keeps track of trackers
  getNotificationUpdater()- new ReplicaUpdater(notification proxy)
  getReplicationUpdater() - new ReplicaUpdater(base quad store)

Replicator
  - doReplicate will use a ReplicaUpdater to update the replica via the BaseQuadStore based on the JSON response from the execReplicate call to the replication service.
  - handling notifications, etc.

ReplicaUpdater(Proxy) - knows how to turn a parsed server response into operations on the replica.
  updateTransaction
  etc.

Notification Flow: Following a message from a server notification to the LocalGraph. Basic understanding:

  • Statements sent up with a notification are only visible to a user once the whole transaction has been received.
  • The way that works is that the JMS notification processing code collects all of the messages, and only once it has received the transaction complete message does it actually call the IRepositoryHandler methods. Those are the methods which then modify the NotificationContainerProxy, thus adding the statements to the user's view of the LocalGraph.

  1. A JMS message arrives from the publisher via the broker and is handled by JmsNotificationService.
  2. The JmsNotificationService parses the message and handles the different kinds of messages.

  • Transaction Start Message:

Creates a spot in its pending transaction list. It will dump any further messages that correspond to this transaction into that spot.

  • Statement Message:

Dumps this message into the appropriate transaction bucket, without doing any processing of the statement.

  • Transaction Complete Message:

Finds the first K completed transactions and applies those.

JmsNotificationService.handleTransactionComplete() {

  // Here we should add the statements to the notification proxy.
  datasetService.updateTrackedStatements(this.datasetService.getNotificationNamedGraphContainer(), messageData);

  // which internally just calls the notification proxy's add/delete implementation.
  // The notification graph's add/delete implementation takes care of sending out events appropriately.

}

  • Named Graph Message

datasetService.updateTrackedNamedGraph(this.datasetService.getNotificationNamedGraphContainer(), messageData);

internally it then calls the notification container's add/delete methods which take care of sending events, etc.

  • Node message: JmsNotificationService maintains a table of NodeId (integer) to URI. This message lets it build that mapping so that the 'statement' message can use the ids rather than sending the full URIs each time.
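The message handling described above can be sketched as a small buffer. The message shapes (`type`, `txId`, `s`/`p`/`o` as node ids) and the `SketchNotificationBuffer` name are invented for illustration, every statement position is treated as a node id for simplicity, and "the first K completed transactions" is interpreted here as the leading run of completed buckets in start-message order, which is an assumption about the Java behavior.

```javascript
// Hypothetical sketch of per-transaction buffering of notification messages.
function SketchNotificationBuffer(applyTransaction) {
  this.pending = [];              // buckets, in order of their start messages
  this.nodes = Object.create(null); // node id (integer) -> URI
  this.applyTransaction = applyTransaction;
}

SketchNotificationBuffer.prototype.findBucket = function (txId) {
  for (var i = 0; i < this.pending.length; i++) {
    if (this.pending[i].id === txId) {
      return this.pending[i];
    }
  }
  return null;
};

SketchNotificationBuffer.prototype.handle = function (msg) {
  var bucket;
  if (msg.type === "node") {
    this.nodes[msg.id] = msg.uri;
  } else if (msg.type === "transactionStart") {
    this.pending.push({ id: msg.txId, statements: [], complete: false });
  } else if (msg.type === "statement") {
    // Stored without processing; ids are resolved only when applied.
    bucket = this.findBucket(msg.txId);
    if (bucket) { bucket.statements.push(msg); }
  } else if (msg.type === "transactionComplete") {
    bucket = this.findBucket(msg.txId);
    if (bucket) { bucket.complete = true; }
    this.applyLeadingCompleted();
  }
};

SketchNotificationBuffer.prototype.resolve = function (msg) {
  return [this.nodes[msg.s], this.nodes[msg.p], this.nodes[msg.o]];
};

// Applies completed transactions from the head of the pending list.
SketchNotificationBuffer.prototype.applyLeadingCompleted = function () {
  while (this.pending.length > 0 && this.pending[0].complete) {
    var bucket = this.pending.shift();
    var resolved = [];
    for (var i = 0; i < bucket.statements.length; i++) {
      resolved.push(this.resolve(bucket.statements[i]));
    }
    this.applyTransaction(bucket.id, resolved);
  }
};

var appliedTransactions = [];
var buffer = new SketchNotificationBuffer(function (txId, statements) {
  appliedTransactions.push({ id: txId, statements: statements });
});
buffer.handle({ type: "node", id: 1, uri: "http://example.org/s" });
buffer.handle({ type: "node", id: 2, uri: "http://example.org/p" });
buffer.handle({ type: "node", id: 3, uri: "http://example.org/o" });
buffer.handle({ type: "transactionStart", txId: 7 });
buffer.handle({ type: "statement", txId: 7, s: 1, p: 2, o: 3 });
var visibleBeforeComplete = appliedTransactions.length; // still 0 here
buffer.handle({ type: "transactionComplete", txId: 7 });
```

Nothing is applied until the complete message for transaction 7 arrives, at which point its single statement is handed over with the node ids resolved to URIs.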

Question:

  • Does adding a statement via LocalGraph.add send a tracker event (assuming the statement matches an active tracker)?

TransactionCommand DeltaContainer - An implementation of IQuadStore

anzo.util -

public:

private:

  • anzo.util.HashSet
  • anzo.util.Utilities

Notes on listeners

The Anzo Java client defines several listener and event handling mechanisms.

1) INamedGraphListener - registered on an event manager owned by the DatasetService
2) INamedGraphListener - registered on an individual named graph
3) IReplicationListener - registered on the DatasetServiceReplicator
4) TrackerListener - registered on a tracker for when updates induced by that tracker occur
5) ITransactionQueueListener (and Handler) - registered on the TransactionQueue for TransactionEvents
6) QueueEvent and descendants - a whole class hierarchy of various unrelated events that have to be queued in the DatasetServiceReplicator due to concurrency and customExecutor thread restrictions
7) INotificationListener - registered on the notification service, used internally in the IRepositoryHandler mechanism

Here's what we think of each of these

1) Definitely need these for stuff we are tracking and that we don't have local graphs for.
2) Natural sort of events at the graph level. 1+2 can have some redundancy, but I guess this rope is ok. If the user registers a listener with both a graph and the dataset service, then they'll get two events for each statement.
3) We need some way to notify the user that a replication has completed or failed. However, contrary to what we agreed earlier, you may need a way to register for when a particular replication has finished if you are counting on a particular named graph having been added, i.e. there is already a replication going before you call replicate. Then again, you can just register a listener on that named graph before you call replicate to get the event. In the end, perhaps a user need only know if a replication has failed; everything else will happen through more specific events.
4) I tend to think we can do away with these. The user can implement filtering in the INamedGraphListener registered with graphs or the dataset service.
5) Again, not sure about the use case from the user's perspective. Any errors that occurred during transactions can be passed to the user in the ReplicationListener event.
6) I'm fairly confident that the concurrency and threading issues that motivated this mechanism will not exist in JavaScript.
7) Since we are calling through to the replicaUpdater directly, we do not need this mechanism.

I'm sure there are others, but for completeness can you remember any others that Matt set up? Most importantly, are there any other events that I missed that we need to allow our users to listen to?

Authentication Service

The authentication service provides the ability to verify an Anzo user's credentials, retrieve information about their roles, etc. Typically this is used in the Java client when in embedded mode. Such a Java client will usually be set up running as a superuser. The caller will then use the authentication service to verify a user's credentials and, if they are valid, will use the runAs methods of IModelService to have Anzo apply that user's ACLs to all operations. An authentication service eventually ends up in the server in an AuthenticationProvider, which is what knows about LDAP, etc.

In the JavaScript implementation, we don't need to supply an AuthenticationService because the use case above doesn't apply. The purpose of the above was mainly to avoid having different infrastructure connections (to JMS, etc.) per user. Typically, the JavaScript client will be used by an end user rather than as an intermediate data service. That is, there is no JavaScript embedded mode. This means the model service in JavaScript won't support the runAs method.

It is possible that we end up needing this later, for example if JavaScript runs on the server. Nothing in the design above precludes later adding the authentication service or embedded mode functionality.

Sean wonders about the notification...can it be split up?


Matt questions:

  • Why a TripleStore and a QuadStore? Isn't a QuadStore enough (just pass null for name)?
  • Why keep a separate TripleStore object rather than the graph actually being a separate store?
  • Notification transaction: It looks like if you get transaction 1 begin, 3 begin, 1 end, 3 end, 2 begin, 2 end, the processTransaction code will execute transaction 1, then 3, then 2.
Copyright © 2007 - 2008 OpenAnzo.org