Binary Store

An instance of a Binary store is implemented as a combination of a servlet within the service container and a dataset within the anzo repository.

Binary store startup

The Binary store loads its configured dataset (from here on called the BS dataset) and subscribes for updates to the graphs within this dataset.

File create

A file is posted as multipart/form-data to a known binary store url (eg http://expo.cambridgesemantics.com/binarystore/create). The form's action url may include a client generated unique id (the upload_uri parameter in the example form below). Authentication is handled by the servlet authenticator filter. Once the user is authenticated, the create graph acl is checked to make sure that the user is authorized to create a file in the binary store. If the user has the rights to create a file then a unique url is generated. This url takes the form BASESERVER_URL + "/binarystore/" + servernode + "/" + date_year + "/" + date_month + "/" + date_day + "/" + incremented_integer + "/" + filename, where incremented_integer is a global integer which is zeroed at the start of each day and incremented each time a new url is generated (eg http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png). This is referred to from now on as the FILE_URL.

If the form action url includes a client generated unique id, a statement stream is instantiated using this uri. The statement stream is an anzo notification api for statements which are not stored. Here it acts as the client notification method: file upload progress events are sent through the statement stream back to the client.
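The FILE_URL scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the counter is kept in memory for a single server node, the first url of a day is assumed to get the integer 1, dates are taken in UTC, and the year is assumed to be written with two digits as in the example url.

```javascript
// Hypothetical sketch of FILE_URL generation. None of these names are part
// of the binary store itself.
function pad2(n) {
  return String(n).padStart(2, "0");
}

function createFileUrlGenerator(baseServerUrl, serverNode) {
  var counter = 0;       // incremented_integer
  var counterDay = null; // the day the counter currently belongs to

  return function nextFileUrl(filename, now) {
    var day = [
      pad2(now.getUTCFullYear() % 100), // date_year (two digits, assumed)
      pad2(now.getUTCMonth() + 1),      // date_month
      pad2(now.getUTCDate())            // date_day
    ].join("/");
    if (day !== counterDay) {
      // First url generated on a new day: zero the counter.
      counterDay = day;
      counter = 0;
    }
    counter += 1;
    return baseServerUrl + "/binarystore/" + serverNode + "/" + day + "/" +
      counter + "/" + encodeURIComponent(filename);
  };
}
```

A generator would be created once per server node, for example createFileUrlGenerator("http://expo.cambridgesemantics.com", "node1"), and called once per accepted upload.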

Example HTML form

<form method="post" action="http://expo.cambridgesemantics.com/binarystore/create?revisioned=true&upload_uri=urn:E16A5F5A-FFAD-11DC-A95E-C0A155D89593" enctype="multipart/form-data">
<input type="file" name="file" value="/tmp/mypicture.png" />
<input type="submit" />
</form>

Only one file may be received per post. The file is saved under a unique name within a server configured directory. The path of the file on disk mirrors the directory structure of the url (eg "/srv/anzo-binarystore/node1/08/04/01/22/mypicture.png"). This allows the file to be located from the FILE_URL. When the file has been fully received by the server, a graph (revisioned or non-revisioned based on the form data) is created as the authenticated user. This graph is added to the BS dataset.
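The mapping from FILE_URL to a location on disk could look like the sketch below. The function name and the validation rules are assumptions made for illustration. The store root corresponds to the server configured directory.

```javascript
// Hypothetical sketch: map a FILE_URL onto the mirrored path on disk.
function fileUrlToDiskPath(fileUrl, storeRoot) {
  var marker = "/binarystore/";
  var pathname = new URL(fileUrl).pathname;
  var idx = pathname.indexOf(marker);
  if (idx === -1) {
    throw new Error("Not a binary store url: " + fileUrl);
  }
  var segments = pathname
    .slice(idx + marker.length)
    .split("/")
    .map(decodeURIComponent);
  segments.forEach(function (segment) {
    // Reject anything that could step outside the store root once decoded.
    if (segment === "" || segment === "." || segment === ".." ||
        segment.indexOf("/") !== -1 || segment.indexOf("\\") !== -1) {
      throw new Error("Invalid path segment in: " + fileUrl);
    }
  });
  return storeRoot.replace(/\/+$/, "") + "/" + segments.join("/");
}
```

With the example url and a store root of "/srv/anzo-binarystore" this yields the path quoted above.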

At this stage a series of handlers can be called to interrogate the file in order to create additional "meta" triples. One of these handlers could be an Aperture extractor (see http://aperture.sourceforge.net/tutorial/extractors.html). Extracted text can be indexed by Lucene and added as a subject reference to the Lucene index (Discuss architecture for this facility). These "meta" triples are added to the file's graph.

The FILE_URL is then returned to the client.

File read

The FILE_URL is resolvable and is used to access the file, eg http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png would return the file. If the metadata for the file is required, the client includes an accept header specifying that it wants metadata and which particular format it would like (n3, rdfxml etc). In this case the file's graph will be returned. Additionally, metadata can be obtained by appending an aspect=metadata&format=n3 query string, eg http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png?aspect=metadata&format=n3. The default is aspect=file, which returns the actual file.
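A metadata request made through the accept header might be described as in the sketch below. The document does not say which media types the servlet recognises, so the mapping from n3 and rdfxml to text/n3 and application/rdf+xml is an assumption, as is the function name.

```javascript
// Hypothetical sketch: describe a metadata request for a FILE_URL.
// The media types are assumed, not taken from the binary store.
var RDF_MEDIA_TYPES = {
  n3: "text/n3",
  rdfxml: "application/rdf+xml"
};

function metadataRequest(fileUrl, format) {
  if (!Object.prototype.hasOwnProperty.call(RDF_MEDIA_TYPES, format)) {
    throw new Error("Unsupported metadata format: " + format);
  }
  return {
    method: "GET",
    url: fileUrl,
    headers: { Accept: RDF_MEDIA_TYPES[format] }
  };
}
```

The returned description can be handed to whichever http client the caller uses.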

Revisions can be accessed using the revision parameter, eg http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png?revision=2 (optionally followed by &aspect=file).

Additional facilities exist to do transforms on the file/metadata. Eg http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png?aspect=file&transform=thumbnail would return a thumbnail of the picture, and http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/12/myworddoc.doc?aspect=file&transform=thumbnail would return a thumbnail of the word file.
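The read urls above all follow the same pattern, which a client could assemble with a small helper such as the one below. The helper name is hypothetical. The parameter names (aspect, format, revision, transform) are the ones used in the examples above.

```javascript
// Hypothetical helper: append the read parameters to a FILE_URL.
function buildReadUrl(fileUrl, options) {
  options = options || {};
  var params = [];
  ["aspect", "format", "revision", "transform"].forEach(function (name) {
    if (options[name] !== undefined && options[name] !== null) {
      params.push(name + "=" + encodeURIComponent(options[name]));
    }
  });
  // With no parameters the plain FILE_URL is returned (aspect=file is the default).
  return params.length ? fileUrl + "?" + params.join("&") : fileUrl;
}
```

For example, buildReadUrl(fileUrl, { aspect: "file", transform: "thumbnail" }) produces the thumbnail form of the url.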

Transforms are provided by binary store transform plugins.

Access to the file is governed by the FILE_URL graph's ACLs.

Meta-data for the file can also be accessed using the regular anzo client api against the FILE_URL graph, including SPARQL queries with the textmatch predicate.

File update

In order to update a file in the binary store a client submits a form.

eg

<form method="post" action="http://expo.cambridgesemantics.com/binarystore/update?graph=http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png&upload_uri=urn:E16A5F5A-FFAD-11DC-A95E-C0A155D89593" enctype="multipart/form-data">
<input type="file" name="file" value="/tmp/mynewpicture.png" />
<input type="submit" />
</form>

The user's authorization to update the file is checked against the graph ACL. If the update is not allowed, a 401 (Unauthorized) is returned. If it is allowed, a statement stream is subscribed to with the client generated unique id and progress information is reported to the client.

For a revisioned file (ie it is described by a graph with the revision facility), the latest stored file is renamed with a -revision_number suffix (eg mypicture.png-2) and the current upload is stored in place of the original file. In a non-revisioned store the original file is replaced with the new file.
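The storage step for an update can be sketched as below, here using Node's fs module purely for illustration. The function names are hypothetical, and it is an assumption that the suffix is the revision number of the file being archived and that the revision number then increases by one.

```javascript
var fs = require("fs");

// Hypothetical sketch of storing an uploaded update on disk.
function archivedName(filePath, revision) {
  return filePath + "-" + revision; // eg mypicture.png-2
}

function storeUpdate(filePath, currentRevision, newContents, revisioned) {
  if (revisioned && fs.existsSync(filePath)) {
    // Keep the latest stored file under its revision number.
    fs.renameSync(filePath, archivedName(filePath, currentRevision));
  }
  // The upload takes the place of the original file in both cases.
  fs.writeFileSync(filePath, newContents);
  return revisioned ? currentRevision + 1 : currentRevision;
}
```

In the non-revisioned case no archived copy is kept and the returned revision number is unchanged.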

The same meta extractor handlers are run for file update as for file create, and the Lucene index is updated with the file contents. A success or failure is returned to the client.

File delete

The Binary store is subscribed to all graphs for the files which are stored in the binary store. If the graph related to a file is deleted then the file is automatically deleted. Its Lucene index entries are deleted and the FILE_URL is removed from the BS dataset.
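The cleanup that follows a graph deletion can be sketched as a handler like the one below. Everything here is hypothetical: the store object, its dataset set and its deleteFile and deleteFromIndex callbacks stand in for whatever the servlet actually uses, and the way the handler is wired to the subscription is not shown.

```javascript
// Hypothetical sketch: react to the deletion of a graph in the BS dataset.
function createGraphDeletedHandler(store) {
  return function onGraphDeleted(graphUri) {
    if (!store.dataset.has(graphUri)) {
      return false; // the graph does not describe a binary store file
    }
    store.deleteFile(graphUri);      // ask the store to delete the file
    store.deleteFromIndex(graphUri); // ask the store to drop its index entries
    store.dataset.delete(graphUri);  // remove the FILE_URL from the BS dataset
    return true;
  };
}
```

The handler returns false for graphs that are not in the BS dataset, so unrelated deletions are ignored.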

Additionally it is possible to delete a file from the binary store using a form action as described below.

eg

<form method="post" action="http://expo.cambridgesemantics.com/binarystore/delete" enctype="multipart/form-data">
<input type="hidden" name="graph" value="http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png" />
<input type="submit" />
</form>

ACLs

ACLs for the file are governed by the FILE_URL graph's ACLs. To change the ACLs on the file the regular anzo client api is used.

Sample graphs

Binary store dataset graph

<BS> anzo:namedGraph <http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png>, <FILE_URL2>, <FILE_URL3> .

http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png graph

<http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png> rdf:type anzo:binarystoreitem .
<http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png> <http://www.w3.org/2008/http-headers#content-type> "image/jpeg" .
<http://expo.cambridgesemantics.com/binarystore/node1/08/04/01/22/mypicture.png> dc:title "mypicture.jpg" .

Anzo.js client side api

anzo.client.BinaryStore

  • Constructor
    • constructor(/*String - binary store url*/ url, /*anzo.client.AnzoClient*/ anzoclient)
  • Methods
    • addFile(/*boolean*/ revisioned)
    • getFile(/*uri*/ uri, /*callback*/ function(/*anzo.client.BinaryStoreItem*/ bsi, /*Error*/ error){})
    • deleteFile(/*uri or anzo.client.BinaryStoreItem*/ uri, /*callback*/ function (/*Error*/ error){})

anzo.client.BinaryStoreItem

  • Methods
    • update(/*form id -- see below*/ form, /*callback*/ function (/*Error*/ error) {})
  • Properties
    • revision
    • fireEvents
    • src
    • isValid
  • Events
    • onProgress(contentlength, bytesuploaded)

Example of form.

    <FORM id="testfile">
    	<INPUT type="file" name="file">
    </FORM>

Sample code.

        var anzoClient = new anzo.client.AnzoClient(properties);
        var binaryStore = new anzo.client.BinaryStore("/binarystore",anzoClient);

...

        var bsf = null;

        // Uploads the file selected in the "testfile" form. Called with no
        // arguments, or as the getFile callback with (item, error).
        function uploadFile(file, error) {
            if (error) {
                log("ERROR:" + error.message);
                return;
            }
            if (file instanceof anzo.client.BinaryStoreItem) {
                bsf = file;
            } else if (!bsf) {
                // First upload: create a new revisioned binary store item.
                bsf = binaryStore.addFile(true);
            } else {
                // This just tests the getFile method (we already had a bsf).
                binaryStore.getFile(bsf.src, uploadFile);
                return;
            }
            bsf.fireEvents = true;
            var progress = function (len, read) {
                log(read + " of " + len + " bytes of file1 have been uploaded.");
            };
            var handle = dojo.connect(bsf, "onProgress", progress);
            bsf.update("testfile", function (error) {
                dojo.disconnect(handle);
                if (!error) {
                    log("File uploaded : " + bsf.src + " Revision : " + bsf.revision);
                    // Point the image at a placeholder first, then at the uploaded file.
                    document.getElementById("display").src = "test.jpg";
                    document.getElementById("display").src = bsf.src;
                } else {
                    log("ERROR:" + error.message);
                }
            });
        }
...


Additional

File locking should be considered in a subsequent release of the binary store.