The Open Anzo Project

Semantic Application Middleware

Named graph objects, that is, implementations of the named graph interface, expose two basic events:

  • statementsAdded
  • statementsRemoved

Those events fire at the level of individual graph statements, so they fall short when users want events for higher-level operations. In particular, people want statement events to represent these two higher-level kinds of change:

  • property updates - For example, if someone changes the value of a property, such as correcting a typo in your name, that change is reflected as two separate events. One statementsRemoved event fires for removing the triple with the misspelled name. Then a statementsAdded event fires for the addition of the new triple with the corrected name.
  • logical resource modification - For example, if I change the latitude and longitude of a building, I want one event to represent both changes. Otherwise I may get one event for changing the latitude and one event for changing the longitude. In fact there would be four events: a statementsRemoved and a statementsAdded for each of the latitude and the longitude.

One issue this can cause is performance degradation in user interfaces. If a user interface wants to update dynamically to reflect changes in the graph data, one simple way to implement it is to redraw each time a statementsAdded or statementsRemoved event fires. But for a change like the latitude/longitude example above, that might mean four redraws in quick succession where one would have sufficed.
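The redraw count can be seen in a small sketch. SketchGraph below is a hypothetical stand-in for a named graph, written in Python purely for illustration; its class name, method names, and the sample statements are assumptions and not the Anzo API.

```python
from typing import Callable, List, Set, Tuple

Statement = Tuple[str, str, str]  # (subject, predicate, object)
Listener = Callable[[str, List[Statement]], None]


class SketchGraph:
    """Hypothetical stand-in for a named graph, not the Anzo implementation."""

    def __init__(self) -> None:
        self.statements: Set[Statement] = set()
        self.listeners: List[Listener] = []

    def add(self, stmts: List[Statement]) -> None:
        self.statements.update(stmts)
        self._fire("statementsAdded", stmts)

    def remove(self, stmts: List[Statement]) -> None:
        self.statements.difference_update(stmts)
        self._fire("statementsRemoved", stmts)

    def _fire(self, event: str, stmts: List[Statement]) -> None:
        for listener in self.listeners:
            listener(event, stmts)


graph = SketchGraph()
graph.add([("building1", "lat", "42.36"), ("building1", "long", "-71.06")])

redraws: List[str] = []  # a naive UI redraws once per event
graph.listeners.append(lambda event, stmts: redraws.append(event))

# Move the building one property at a time: remove the old value, add the new.
graph.remove([("building1", "lat", "42.36")])
graph.add([("building1", "lat", "42.37")])
graph.remove([("building1", "long", "-71.06")])
graph.add([("building1", "long", "-71.05")])

print(len(redraws))  # 4
```

One logical change to the building produces four listener calls, and so four redraws in a UI that redraws on every event.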

One mechanism that AnzoClient provides for this situation is the transactionComplete event. Multiple modifications can be grouped into one transaction, and only one transactionComplete event fires for those changes. But the transactionComplete event merely says that some transaction completed. The client often needs more information about the transaction to know how to respond. For example, if the transaction created a new "person", the app might want to refresh its contact list UI. Transactions provide the transactionContext as a place to store arbitrary application data that will be present in the transactionComplete event. That mechanism solves most issues.
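As a rough sketch of that pattern, assume a client with begin/commit calls and a context dictionary that is handed to the transaction listener. SketchClient, its method names, and the "created" key are hypothetical and chosen only for illustration; the real AnzoClient signatures may differ.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Statement = Tuple[str, str, str]
Context = Dict[str, Any]


class SketchClient:
    """Hypothetical client; begin/commit and the context shape are assumptions."""

    def __init__(self) -> None:
        self.statements: Set[Statement] = set()
        self.transaction_listeners: List[Callable[[Context], None]] = []
        self._pending: List[Tuple[str, Statement]] = []
        self._context: Context = {}

    def begin(self, context: Optional[Context] = None) -> None:
        self._pending = []
        self._context = dict(context or {})

    def add(self, stmt: Statement) -> None:
        self._pending.append(("add", stmt))

    def remove(self, stmt: Statement) -> None:
        self._pending.append(("remove", stmt))

    def commit(self) -> None:
        for op, stmt in self._pending:
            if op == "add":
                self.statements.add(stmt)
            else:
                self.statements.discard(stmt)
        context, self._pending, self._context = self._context, [], {}
        for listener in self.transaction_listeners:
            listener(context)  # one transactionComplete call per commit


refreshed: List[str] = []


def on_transaction_complete(context: Context) -> None:
    # The writer stored a hint in the context; the listener reacts to it.
    if context.get("created") == "person":
        refreshed.append("contact list")


client = SketchClient()
client.transaction_listeners.append(on_transaction_complete)
client.begin({"created": "person"})
client.add(("person1", "rdf:type", "Person"))
client.add(("person1", "name", "Ada"))
client.commit()

print(refreshed)  # ['contact list']
```

The listener never inspects statements. It relies entirely on the writer having put an agreed hint into the context, which is the coordination that this mechanism requires.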

However, there are situations where the writers and the event listeners are not coordinated closely enough to use transaction events successfully. The only grouping mechanism left in such a case is the event call itself. For example, in a replica graph, all statements added in one transaction are sent as an array in one call to the statementsAdded event, and similarly for statementsRemoved. Applications can treat all of the statements in one statementsAdded or statementsRemoved event as a logical unit, much like what a transactionComplete event implies. Memory graphs have no grouping concept such as the begin/commit calls, but if you pass an array of statements to the add method, all of those modifications are sent in a single statementsAdded event, and similarly for the remove method and statementsRemoved.

Still, in both of those situations the additions and removals are split into separate event calls. In the latitude/longitude example above, that means there would still be two redraws where one could suffice, even if the changes were all in the same transaction. The suggestion for removing this inefficiency in cases where transactionComplete events aren't viable is to have the Anzo Client API fire a statementsModified event: a single event that contains both the set of additions and the set of deletions that happened in one transaction.
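A minimal sketch of what such an event might look like from the listener's side follows. SketchReplicaGraph and the listener signature (additions, removals) are assumptions made for illustration; no such event exists in the API described here.

```python
from typing import Callable, List, Tuple

Statement = Tuple[str, str, str]
ModifiedListener = Callable[[List[Statement], List[Statement]], None]


class SketchReplicaGraph:
    """Hypothetical graph that reports each commit as one statementsModified call."""

    def __init__(self) -> None:
        self.modified_listeners: List[ModifiedListener] = []
        self._additions: List[Statement] = []
        self._removals: List[Statement] = []

    def add(self, stmt: Statement) -> None:
        self._additions.append(stmt)

    def remove(self, stmt: Statement) -> None:
        self._removals.append(stmt)

    def commit(self) -> None:
        additions, removals = self._additions, self._removals
        self._additions, self._removals = [], []
        for listener in self.modified_listeners:
            listener(additions, removals)  # both sets arrive in one call


events: List[Tuple[List[Statement], List[Statement]]] = []

graph = SketchReplicaGraph()
graph.modified_listeners.append(lambda added, removed: events.append((added, removed)))

graph.remove(("building1", "lat", "42.36"))
graph.add(("building1", "lat", "42.37"))
graph.remove(("building1", "long", "-71.06"))
graph.add(("building1", "long", "-71.05"))
graph.commit()

print(len(events))  # 1
```

The latitude/longitude move now reaches the listener as a single call carrying two additions and two removals, so a UI that redraws per event redraws once.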

Adding a statementsModified event has various implications. It could be added alongside the existing events, which arguably clutters the API and makes it more confusing. Or it could completely replace the statementsAdded and statementsRemoved events, which would impose a heavy burden of updating code, docs, and samples. Another downside is that it wouldn't add much for memory graphs, because they would still only group modifications through the arrays passed to the add or remove method. So for memory graphs the event would always fire with either the additions or the deletions as null. Some consider that inelegant, while others dismiss it as an API concern. One argument holds that a single statementsModified event is a more generic mechanism: it would allow someone to add transaction functionality to memory graphs (via subclassing or the like) while still presenting the basic graph interface to listeners, so it is elegant in that sense.

The resolution we reached in discussion is that changing to the statementsModified style of event is a viable solution, provided we decide that the cost of the implementation and API change is worth the performance problem it would solve. So if we start running into situations where we would otherwise have to do more elaborate things to improve performance, and this change would mitigate the problem, then we will make it. Essentially, we wait until this is actually a problem. The change could be made additively at first, with the individual statementsAdded and statementsRemoved methods deprecated for a transitional period.

Note that the proposed solution of adding the statementsModified event does not address the latitude/longitude issue for memory graphs, since they have no way of grouping adds together with removes. To address that issue there are two possible solutions:

  1. Adding a grouping construct to memory graphs (like begin/commit but different)
  2. Adding a 'change' method to the graph API that takes two arrays of statements, the additions and the deletions, at once. Or something similar to that.
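The second option might look roughly like the sketch below. SketchMemoryGraph, the change signature, and the use of empty lists (rather than null) for the unused side are all assumptions for illustration, not a settled design.

```python
from typing import Callable, List, Set, Tuple

Statement = Tuple[str, str, str]
ModifiedListener = Callable[[List[Statement], List[Statement]], None]


class SketchMemoryGraph:
    """Hypothetical memory graph with a combined 'change' method."""

    def __init__(self) -> None:
        self.statements: Set[Statement] = set()
        self.modified_listeners: List[ModifiedListener] = []

    def change(self, additions: List[Statement], removals: List[Statement]) -> None:
        # Remove first so a statement listed on both sides ends up present.
        self.statements.difference_update(removals)
        self.statements.update(additions)
        for listener in self.modified_listeners:
            listener(list(additions), list(removals))

    def add(self, stmts: List[Statement]) -> None:
        self.change(stmts, [])  # plain add: nothing removed

    def remove(self, stmts: List[Statement]) -> None:
        self.change([], stmts)  # plain remove: nothing added


graph = SketchMemoryGraph()
graph.add([("building1", "lat", "42.36"), ("building1", "long", "-71.06")])

events: List[Tuple[List[Statement], List[Statement]]] = []
graph.modified_listeners.append(lambda added, removed: events.append((added, removed)))

graph.change(
    additions=[("building1", "lat", "42.37"), ("building1", "long", "-71.05")],
    removals=[("building1", "lat", "42.36"), ("building1", "long", "-71.06")],
)

print(len(events))  # 1
```

In this sketch add and remove are thin wrappers over change, so listeners see a single event shape regardless of which method the writer called.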

Some workarounds for handling this situation without transactionComplete events or the statementsModified event do exist. One is simply to delay the UI refresh by some time. The first event comes in and a timeout is set. When the timeout expires, a 'refresh' method is called to refresh the UI. While that timeout is pending, any further events are simply ignored, since the refresh scheduled by the first event will cover them anyway. So you exploit the temporal locality of the event calls to avoid multiple UI refreshes.

You can also use knowledge about the specific application to create similar workarounds that don't rely on time as the mechanism for grouping the events. Such techniques may involve inspecting the statements and deciding to wait for subsequent events if they are expected. For example, if you know that every person MUST always have exactly one first name, then when you see a statementsRemoved event that removes a person's first name, your application might assume that a subsequent event adding a first name to that person will follow immediately. It could then choose to wait for that second event before refreshing the person UI.
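The timeout workaround can be sketched with a standard-library timer. DelayedRefresher is a hypothetical helper written for illustration, and the 0.1 second delay is an arbitrary choice; a real UI toolkit would normally supply its own timer facility.

```python
import threading
from typing import Callable, List, Optional


class DelayedRefresher:
    """Coalesces a burst of events into one refresh call (illustrative helper)."""

    def __init__(self, refresh: Callable[[], None], delay_seconds: float) -> None:
        self._refresh = refresh
        self._delay = delay_seconds
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None

    def on_event(self, *_args: object) -> None:
        with self._lock:
            if self._timer is not None:
                return  # a refresh is already scheduled and will cover this event
            self._timer = threading.Timer(self._delay, self._fire)
            self._timer.start()

    def _fire(self) -> None:
        with self._lock:
            self._timer = None  # later events schedule a fresh timeout
        self._refresh()

    def wait(self) -> None:
        """Block until a pending refresh, if any, has run."""
        with self._lock:
            timer = self._timer
        if timer is not None:
            timer.join()


refreshes: List[str] = []
refresher = DelayedRefresher(lambda: refreshes.append("redraw"), delay_seconds=0.1)

# Four events arrive in quick succession; only the first one sets the timeout.
for event in ("statementsRemoved", "statementsAdded", "statementsRemoved", "statementsAdded"):
    refresher.on_event(event)

refresher.wait()
print(len(refreshes))  # 1
```

The trade-off is latency: every refresh is delayed by the timeout, and events that arrive further apart than the delay still cause separate refreshes.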

Another suggestion is to create a set of event accumulators of a sort. For example:

  • LatitudeLongitudeEventManager - logically groups updates of latitude and longitude
  • ModificationEventManager - logically groups adds and removes of the same subj/pred pair into an update
  • TemporalEventManager(5000ms) - logically groups statements that arrive within 5 s of each other

This localizes the conversion from graph events to higher-level logical events in an encapsulated and reusable manner.
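As one possible reading of the ModificationEventManager idea, the sketch below pairs removals and additions that share a subject and predicate. The PropertyUpdate type and the group_modifications function are hypothetical names, and the pairing assumes each subject/predicate pair holds at most one value.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

Statement = Tuple[str, str, str]  # (subject, predicate, object)


class PropertyUpdate(NamedTuple):
    subject: str
    predicate: str
    old_value: Optional[str]  # None means the property was newly added
    new_value: Optional[str]  # None means the property was deleted


def group_modifications(
    removals: List[Statement], additions: List[Statement]
) -> List[PropertyUpdate]:
    """Pair removals and additions that share a subject/predicate.

    Assumes single-valued properties: one value per subject/predicate pair.
    """
    old_values: Dict[Tuple[str, str], str] = {(s, p): o for s, p, o in removals}
    updates: List[PropertyUpdate] = []
    for s, p, o in additions:
        # pop() returns the old value when this addition replaces a removal
        updates.append(PropertyUpdate(s, p, old_values.pop((s, p), None), o))
    for (s, p), o in old_values.items():
        updates.append(PropertyUpdate(s, p, o, None))  # removal with no replacement
    return updates


updates = group_modifications(
    removals=[("person1", "name", "Jon Smiht"), ("person1", "nickname", "JS")],
    additions=[("person1", "name", "Jon Smith"), ("person1", "email", "jon@example.org")],
)
for update in updates:
    print(update)
```

Here the name correction surfaces as one update with both the old and the new value, while the added email and the removed nickname each appear with None on the missing side.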

Copyright © 2007 - 2008 OpenAnzo.org