The Open Anzo Project

Semantic Application Middleware

Standalone Server

The Standalone Server refers to the main Open Anzo server process that one would run. The server is made up of a variety of components, namely the storage service, the notification service, the query service, and add-on services such as the Atom and SPARQL endpoints. At the core of the server is the Jetty web container, which hosts these other services. The server can also start an embedded ActiveMQ JMS server to provide notification services to the system; this is an optional component that can be disabled via configuration properties. Once Jetty and ActiveMQ are started, the rest of the Open Anzo services are placed within the Jetty container. Once all the services are running, clients can talk to the server through a webservice interface and can listen for notification events by registering with the JMS server.
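The start-up order described above can be sketched as follows. This is a minimal Java illustration, not Open Anzo code: the `Component` interface, the `StubComponent` record, and the `server.jms.embedded` property name are hypothetical, and starting Jetty before ActiveMQ (with the broker enabled by default) is an assumption, since the text only says both are running before the other services are placed in the container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical abstraction for anything the server starts.
interface Component {
    String name();
    void start();
}

// Hypothetical stand-in for Jetty, ActiveMQ, or an Open Anzo service:
// starting it only records its name in a shared log.
record StubComponent(String name, List<String> log) implements Component {
    @Override
    public void start() {
        log.add(name);
    }
}

final class StandaloneServerSketch {
    // Hypothetical configuration key; the real property name may differ.
    static final String EMBEDDED_JMS_KEY = "server.jms.embedded";

    // Starts the web container, then the optional embedded JMS broker (enabled
    // unless the property says otherwise, an assumption), then the services hosted
    // inside the container. Returns the names in the order they were started.
    static List<String> start(Properties config, List<String> serviceNames) {
        List<String> started = new ArrayList<>();
        List<Component> components = new ArrayList<>();
        components.add(new StubComponent("jetty", started));
        if (Boolean.parseBoolean(config.getProperty(EMBEDDED_JMS_KEY, "true"))) {
            components.add(new StubComponent("activemq", started));
        }
        for (String service : serviceNames) {
            components.add(new StubComponent(service, started));
        }
        components.forEach(Component::start);
        return started;
    }
}
```

Disabling the broker through configuration simply removes it from the start list; everything hosted in the container is unaffected.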

Embedded Server

The Embedded Server refers to using some of the components, such as the storage and notification services, within another program without starting a standalone server. The program embedding the services can talk directly to the storage and notification services without going through a webservice interface. Note: multiple embedded servers can connect to the same relational database, and they will be kept up to date with what the other servers are doing, provided they all connect to the same notification server.
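The embedded arrangement can be sketched in a few lines of Java. Everything here is hypothetical: a map stands in for the shared relational database, `NotificationBus` stands in for the shared notification server, and `EmbeddedServer.put` stands in for a direct call into the storage service. The point is only that writes take no webservice hop and that each embedded server learns of the others' changes through the common notification server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for the shared notification server: every published
// change is forwarded to every registered listener.
final class NotificationBus {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void subscribe(Consumer<String> listener) {
        listeners.add(listener);
    }

    void publish(String change) {
        listeners.forEach(l -> l.accept(change));
    }
}

// Hypothetical embedded server: writes go straight to the shared database
// (a map here) and each change is announced on the shared bus.
final class EmbeddedServer {
    private final Map<String, String> sharedDb;
    private final NotificationBus bus;
    final List<String> seenChanges = new ArrayList<>();

    EmbeddedServer(Map<String, String> sharedDb, NotificationBus bus) {
        this.sharedDb = sharedDb;
        this.bus = bus;
        bus.subscribe(seenChanges::add); // hear about every server's changes, including our own
    }

    // Direct in-process call, standing in for the embedded storage API.
    void put(String key, String value) {
        sharedDb.put(key, value);
        bus.publish(key);
    }

    String get(String key) {
        return sharedDb.get(key);
    }
}
```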

Storage Service

At the center of an Anzo system, whether in standalone or embedded mode, is a storage service designed to store millions of RDF statements in an underlying relational database. It does this by combining a variety of modules that each provide the system with different functionality. This section covers the core components of the store: those that handle the storage and retrieval of RDF in the underlying database and that provide revision history for the data.

As described in Server's Temporal Layout, the server uses a temporal layout to store its data. This layout applies not only to statements but also to the server's other database tables. When a client updates the server, either through a webservice or directly through an embedded call, the update consists of a set of transactions. Each transaction is made up of one or more commands, and each command is made up of a set of additions and deletions to one or more graphs.
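The nesting of an update (transactions containing commands, commands containing per-graph additions and deletions) maps naturally onto a few immutable types. The following Java sketch uses hypothetical names that are not taken from the Open Anzo API.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the update structure; names are illustrative only.
record Statement(String subject, String predicate, String object) {}

// The additions and deletions a command makes to a single named graph.
record GraphChange(String graphUri, Set<Statement> additions, Set<Statement> deletions) {}

// A command groups changes to one or more graphs.
record Command(List<GraphChange> changes) {}

// A transaction is one or more commands; an update is a list of transactions.
record Transaction(List<Command> commands) {
    Transaction {
        if (commands.isEmpty()) {
            throw new IllegalArgumentException("a transaction needs at least one command");
        }
    }

    // Counts added and removed statements across every command in the transaction.
    int statementCount() {
        return commands.stream()
                .flatMap(c -> c.changes().stream())
                .mapToInt(g -> g.additions().size() + g.deletions().size())
                .sum();
    }
}
```

Keeping the per-graph grouping explicit is what makes it straightforward to tell, at the end of a transaction, which graphs were touched.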

When an update is received over the webservice layer, the transactions arrive at the server as an XML document. This XML consists of elements corresponding to the different operations that can be performed on the server, such as adding and removing statements, adding and removing NamedGraphs, and adding and removing ACLs. The different operations and the format of the elements are documented in the webservice reference. The server combines a SAX parser, which parses the XML, with a callback handler that handles the different elements that can appear within the XML. As the SAX parser reads the document and encounters an element, it calls the method on the callback handler that corresponds to that element. For example, when the SAX parser sees a statement element, it calls the handleStatement callback on the handler.
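The SAX-plus-callback design can be sketched with the JDK's own `javax.xml.parsers.SAXParser` and `org.xml.sax.helpers.DefaultHandler`. The XML vocabulary (`add`, `remove`, `statement` and its attributes) and the `UpdateHandler` interface are invented for illustration; the real element format is the one given in the webservice reference, and the real handler covers more operations than statements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical callback interface; the real handler also covers NamedGraphs, ACLs, and more.
interface UpdateHandler {
    void handleStatement(boolean addition, String graph, String s, String p, String o);
}

// Translates SAX events into UpdateHandler calls. The element and attribute
// names below are an invented format, not the Open Anzo webservice schema.
final class UpdateSaxHandler extends DefaultHandler {
    private final UpdateHandler target;
    private Boolean adding; // null until inside <add> or <remove>

    UpdateSaxHandler(UpdateHandler target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        switch (qName) {
            case "add" -> adding = Boolean.TRUE;
            case "remove" -> adding = Boolean.FALSE;
            case "statement" -> {
                if (adding == null) {
                    throw new SAXException("statement outside <add> or <remove>");
                }
                target.handleStatement(adding, atts.getValue("graph"),
                        atts.getValue("s"), atts.getValue("p"), atts.getValue("o"));
            }
            default -> { } // other elements (e.g. <transaction>) need no callback here
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("add") || qName.equals("remove")) {
            adding = null;
        }
    }

    // Parses one update document, firing callbacks as elements are encountered.
    static void parse(String xml, UpdateHandler target) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new UpdateSaxHandler(target));
    }
}
```

Because the SAX handler only translates events into `UpdateHandler` calls, any other front end that produces the same calls can reuse the same handler.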

When an update is received via an embedded call, the transactions are made up of a set of graphs containing the statements that make up the additions and deletions, so these graphs are processed by a graph parser. The graph parser examines the statements and determines which operations the changes correspond to. It then calls the same callback handler that the SAX parser calls in the webservice case.
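The graph-parser path can be sketched the same way. In this hypothetical Java version, an embedded update arrives as lists of quads for additions and deletions, and the parser decides per statement which callback applies. The `ACL_PREDICATE` URI and the two-method `UpdateHandler` interface are assumptions made for the example; they are not the actual Open Anzo vocabulary.

```java
import java.util.List;

// Hypothetical callback interface, shared with the XML (SAX) path in the real design.
interface UpdateHandler {
    void handleStatement(boolean addition, String graph, String s, String p, String o);
    void handleAcl(boolean addition, String graph, String role, String permission);
}

// A statement together with the named graph it belongs to.
record Quad(String graph, String s, String p, String o) {}

// Hypothetical graph parser for embedded updates: it inspects each statement,
// decides which operation it represents, and calls the matching handler method.
final class GraphParser {
    // Invented predicate marking a statement as an ACL grant (role -> permission).
    static final String ACL_PREDICATE = "urn:example:grants";

    static void parse(List<Quad> additions, List<Quad> deletions, UpdateHandler handler) {
        additions.forEach(q -> dispatch(true, q, handler));
        deletions.forEach(q -> dispatch(false, q, handler));
    }

    private static void dispatch(boolean addition, Quad q, UpdateHandler handler) {
        if (ACL_PREDICATE.equals(q.p())) {
            handler.handleAcl(addition, q.graph(), q.s(), q.o());
        } else {
            handler.handleStatement(addition, q.graph(), q.s(), q.p(), q.o());
        }
    }
}
```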

As the server processes the changes, it builds up a results object containing the changes that occur as part of the transaction. When the transaction is complete, this results object is used for two main things. First, the server looks through the changes to the NamedGraphs and increments the revision number of every graph that was updated as part of the transaction. Second, the results are published to the notification service. If text indexing is enabled, the indexer also uses the results object to update the text index.
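The post-transaction uses of the results object can be sketched as follows. In this hypothetical Java version, `UpdateResults` collects the changed graphs, each updated graph's revision is incremented once per transaction (however many statements changed in it), and the same object is then handed to a publisher and, when indexing is enabled, to an indexer. Starting a previously unseen graph at revision 1 is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical results object accumulated while a transaction is processed.
final class UpdateResults {
    final Set<String> updatedGraphs = new LinkedHashSet<>();
    final List<String> changes = new ArrayList<>();

    void record(String graph, String change) {
        updatedGraphs.add(graph); // a set, so each graph is counted once per transaction
        changes.add(change);
    }
}

final class TransactionCompletion {
    // Bumps the revision of each updated graph once, then hands the same results
    // to the publisher and, if text indexing is enabled (indexer != null), the indexer.
    static void complete(UpdateResults results, Map<String, Long> revisions,
                         Consumer<UpdateResults> publisher, Consumer<UpdateResults> indexer) {
        for (String graph : results.updatedGraphs) {
            revisions.merge(graph, 1L, Long::sum); // absent graphs start at 1 (assumption)
        }
        publisher.accept(results);
        if (indexer != null) {
            indexer.accept(results);
        }
    }
}
```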

Notification Service

The notification service is made up of two main parts: the update results publisher and the notification server. As updates are processed on the server, the update results are passed to the updates publisher, which places their contents on an update queue within the JMS server. The notification server processes messages from that queue and places them on the queues of every client that has permission to see the data. Before sending the data to a client, the notification server decomposes each update message into individual statement messages, which allows clients to filter their subscriptions based on the contents of the statements. For a more detailed description of the notification system, see notification architecture.
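The decomposition and permission check can be sketched in Java. The types are hypothetical, a client's permissions are reduced to a set of readable graph URIs, and routing is done into an in-memory map; in the real system the messages travel through JMS queues.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

record Triple(String s, String p, String o) {}

// An update message as it comes off the update queue: changes to one graph.
record GraphUpdate(String graph, List<Triple> added, List<Triple> removed) {}

// One statement-level notification, the unit clients receive and filter on.
record StatementMessage(String graph, Triple triple, boolean added) {}

// Hypothetical subscription: the graphs the client may read plus a content filter.
record Subscription(String clientId, Set<String> readableGraphs, Predicate<StatementMessage> filter) {}

final class NotificationRouter {
    // Splits each update into one message per added or removed statement.
    static List<StatementMessage> decompose(List<GraphUpdate> updates) {
        List<StatementMessage> out = new ArrayList<>();
        for (GraphUpdate u : updates) {
            u.added().forEach(t -> out.add(new StatementMessage(u.graph(), t, true)));
            u.removed().forEach(t -> out.add(new StatementMessage(u.graph(), t, false)));
        }
        return out;
    }

    // Builds each client's queue: only messages for graphs it may read and that pass its filter.
    static Map<String, List<StatementMessage>> route(List<GraphUpdate> updates, List<Subscription> subs) {
        List<StatementMessage> messages = decompose(updates);
        Map<String, List<StatementMessage>> queues = new LinkedHashMap<>();
        for (Subscription sub : subs) {
            List<StatementMessage> queue = new ArrayList<>();
            for (StatementMessage m : messages) {
                if (sub.readableGraphs().contains(m.graph()) && sub.filter().test(m)) {
                    queue.add(m);
                }
            }
            queues.put(sub.clientId(), queue);
        }
        return queues;
    }
}
```

Splitting into statement messages before the permission and filter checks is what lets a client subscribe to, say, only statements with a given predicate.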

Authentication Service

The server can act as both a consumer and a provider of authentication data. In the standard use case, the server stores login IDs and passwords for the users in the system and uses that data to authenticate user requests as well as JMS connections. The server can also provide this authentication service to other systems: those systems make a webservice call providing the user ID and password, and the server authenticates the credentials and returns the result to the caller.

The server can also delegate authentication to a secondary authentication provider, such as LDAP, instead of using internally stored user IDs and passwords. In this case, the server still contains entries for the users in the system, but the login ID and password are passed to an authentication interface that authenticates the credentials and, if successful, returns the URI of the user. The URI returned for the user must be the URI by which the user is identified within the server. Keeping the two systems in sync is beyond the scope of Open Anzo, so a separate process is required to update the server when users are added to or removed from the secondary authentication provider. Open Anzo does not currently allow for a secondary provider of role/group information, since it needs a full history of role changes in order to provide the temporal query and replication functionality. Again, a separate process would have to be used to synchronize the role/group information from a secondary provider with the data within the server.
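The delegation contract can be expressed as a small interface. In this hypothetical Java sketch, any provider (the internal store or an LDAP-backed one) returns the user's URI on success, and the server accepts the login only if that URI matches a user entry it already holds. The names are illustrative, and the plain-text password map exists only to keep the demonstration short; a real store would keep salted password hashes.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical pluggable authenticator: on success it returns the URI the user is
// known by. An LDAP-backed implementation would sit behind the same interface.
interface Authenticator {
    Optional<String> authenticate(String loginId, String password);
}

// Hypothetical stand-in for the server's internal store (demonstration only:
// plain-text passwords).
record InternalAuthenticator(Map<String, String> passwords, Map<String, String> userUris)
        implements Authenticator {
    @Override
    public Optional<String> authenticate(String loginId, String password) {
        return password.equals(passwords.get(loginId))
                ? Optional.ofNullable(userUris.get(loginId))
                : Optional.empty();
    }
}

final class AuthenticationService {
    // Accepts a provider's result only if the returned URI matches a user entry the
    // server already has; keeping the two in sync is left to an external process.
    static Optional<String> login(Authenticator provider, Set<String> knownUserUris,
                                  String loginId, String password) {
        return provider.authenticate(loginId, password).filter(knownUserUris::contains);
    }
}
```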

Replication Service

To provide efficient offline support to clients, the server employs a replication service, which keeps a client's copy of the data in sync with the server's copy. A more detailed explanation of using replication can be found at Replication, but as far as the server architecture is concerned, the temporal layout is fundamental to the replication service. Since the server can determine its state at any point in time, replication consists of asking the server for any changes that have occurred since the last replication time. The server takes the timestamp of the last replication and queries its tables for any change that occurred since then. It also uses the timestamp to determine whether any changes to ACLs and roles have changed what data the user can see. All of this data is combined, and the delta since the previous replication is passed to the client.
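Because every row records when it became current and when it stopped being current, a replication delta comes down to comparing what the user could see at the last replication with what the user can see now. The Java sketch below assumes a simplified temporal row with `start` and `end` timestamps, and assumes the readable-graph sets for both times have already been derived from the ACL and role history; it does not reproduce the actual table layout described in Server's Temporal Layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical temporal row: the statement is current from `start` until `end`
// (end is null while the statement is still present).
record TemporalRow(String graph, String statement, long start, Long end) {
    boolean currentAt(long t) {
        return start <= t && (end == null || end > t);
    }
}

record Delta(List<TemporalRow> added, List<TemporalRow> removed, List<String> droppedGraphs) {}

final class Replicator {
    // readableThen / readableNow: graphs the user could read at the last replication
    // and can read now, assumed to be precomputed from the ACL and role history.
    static Delta changesSince(List<TemporalRow> rows, long since, long now,
                              Set<String> readableThen, Set<String> readableNow) {
        List<TemporalRow> added = new ArrayList<>();
        List<TemporalRow> removed = new ArrayList<>();
        for (TemporalRow r : rows) {
            boolean changed = r.start() > since || (r.end() != null && r.end() > since);
            boolean newlyReadable = readableNow.contains(r.graph()) && !readableThen.contains(r.graph());
            if (!changed && !newlyReadable) {
                continue; // unchanged row, and its graph was not newly opened to the user
            }
            boolean visibleThen = readableThen.contains(r.graph()) && r.currentAt(since);
            boolean visibleNow = readableNow.contains(r.graph()) && r.currentAt(now);
            if (visibleNow && !visibleThen) {
                added.add(r);
            } else if (visibleThen && !visibleNow && readableNow.contains(r.graph())) {
                removed.add(r); // graphs that lost read access are reported as dropped instead
            }
        }
        List<String> dropped = readableThen.stream()
                .filter(g -> !readableNow.contains(g))
                .sorted()
                .toList();
        return new Delta(added, removed, dropped);
    }
}
```

A graph that becomes readable contributes all of its current statements, while one that stops being readable is dropped as a whole, which is how ACL and role changes feed into the same delta as ordinary statement changes.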

Copyright © 2007 - 2008 OpenAnzo.org