The Open Anzo Project

Semantic Application Middleware

Overview

This document describes problems with the current approach to Anzo.JS authentication and presents the design of several possible solutions.

Current Authentication & Problems

To communicate with the Anzo server, Anzo.JS needs to authenticate to the server. Authentication is needed for all access to the repository such as for executing SPARQL queries, replicating graphs, sending updates, etc. The Anzo repository will check access control and so it needs to authenticate any request.

Authentication credentials are set up in the anzo.client.AnzoClient constructor and used for all operations for that particular AnzoClient instance. Currently, the authentication credentials are sent to the BayeuxJMSBridge as HTTP Basic Auth information. However, there are some problems with using HTTP Basic Auth for the AnzoClient authentication. In particular:

  • Incorrect credentials will cause the browser's HTTP authentication password dialog to appear. This is inappropriate for a library that will be used as part of other applications.
  • HTTP basic auth involves sending the password in plaintext with each request.

The simplest and most effective way to solve the plaintext password issue is to simply use SSL (HTTPS). However, we imagine a use case where the password is sent once via SSL and then the rest of the application could be run via unencrypted HTTP. It is common for apps to be deployed in this manner for efficiency. While any authentication token sent over unencrypted HTTP is susceptible to a replay attack, exposing the plaintext password is especially problematic. Using a time-limited token of some kind helps limit the exposure to replay attacks.

Solution Strategies

To prevent the browser's authentication dialog from appearing when invalid credentials are sent, it appears that we must move away from using HTTP basic or digest authentication. The XMLHttpRequest object on most browsers will prompt the user whenever it sees a 401 response from the server. HTTP digest authentication would help with the concern of passing the password in plaintext but unfortunately still ties us to the browser's 401 password prompt.

One strategy is to use form-based authentication. This simply means sending the login credentials to a well-known login URL. The server checks the validity of the credentials and returns a response with a token. The client then sends the token on each subsequent request. The server checks the token for validity before allowing any request. That approach also addresses the second concern of passing the password in plaintext with each request, since a difficult-to-guess token is passed with each request instead.

There are various ways to implement this form-based authentication scheme.

J2EE Session & Filter

One of the typical ways to implement form-based authentication is to use the J2EE session to denote whether a user is authenticated. Typically there is a login servlet of some form that isn't access protected. The client submits an HTTP POST request with the username and password. The servlet checks the validity of the credentials. If they are valid, the server adds an attribute to the session. For example:

request.getSession().setAttribute("org.openanzo.authentication.LoggedIn", Boolean.TRUE);

When the session is used as such, the J2EE container will set a session ID cookie on the response. The cookie simply contains a randomly generated number. That number identifies this particular server session. Once the cookie is set on the client, the browser will send it with every HTTP request, including those made via XMLHttpRequest.

The next step is to actually protect access to the requests that require authentication. For that we use a J2EE Filter. A filter is a bit of code that can run before sending requests to any servlet in the J2EE container. We can write a filter that simply checks whether the session attribute set by the login servlet is present. For example:

HttpSession session = request.getSession(false);
if (session == null || session.getAttribute("org.openanzo.authentication.LoggedIn") == null) {
  // not authenticated...return error message (403 response code) or redirect to login page as appropriate
} else {
  // authenticated properly...so let the request go through.
}

We can place such a filter in front of any URLs that require authentication to protect access to them.

One important consideration of using this mechanism is that, if you run multiple servers for scaling, you must take special care to make sure the correct session is located. You can do this by forwarding requests to the server that created the session (i.e. "sticky session load balancing"), or by clustering the J2EE servers so that they replicate the session between them, or by storing the session in a shared database rather than in the server memory, among other techniques. Note that Anzo.JS mainly communicates using the BayeuxJMSBridge servlet and that servlet already holds state specific to the user such as JMS consumers and topic subscriptions. So sticky session load balancing is already likely required for that. Using the J2EE session doesn't add any extra burden.

One advantage of this mechanism is that expiration of the authentication token after inactivity is handled by the J2EE server. It expires the session after a configurable period of inactivity.

J2EE Form-based Authentication

J2EE servers have built-in support for the authentication and access control mechanism described above. Essentially, most J2EE servlet containers will handle setting the attribute in the session and have a built-in access control filter. This is known as J2EE form-based authentication. Functionally, it is no different from implementing the technique manually.

One thing to consider is that we may need customization of the response when authentication fails. In particular we need to make sure that an XMLHttpRequest has enough information in the response to give meaningful error messages. The HTTP response code may be enough or we might need to return some JSON or XML. Either way we need to avoid redirecting to an HTML login page. I believe this is possible with the built-in J2EE form-based authentication.

Encrypted Token

Both approaches described above used the J2EE session id as the authentication token sent to the client. The J2EE session id is simply a random number. That mechanism requires the server to keep track of all currently active session ids. That takes up server memory and complicates load balancing (requires some way to make sure the appropriate server finds the session). An alternative technique which doesn't require any per-user server state also exists.

The basic approach is to use an encrypted string as the authentication token rather than a random number. The server keeps a secret encryption key. When the login servlet gets a request, it verifies that the credentials are valid. Then it creates a string consisting of the username concatenated with the current server time. It encrypts that string with the secret key using a symmetric cipher like AES or Triple-DES. Then it sends the encrypted string as a cookie on the HTTP response. That encrypted string becomes the authentication token.
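As a rough illustration of the token creation step, the sketch below encrypts a username and timestamp with AES using the standard javax.crypto API. The class name TokenCodecSketch, the "username|issuedAtMillis" layout, the choice of AES in GCM mode, and the URL-safe Base64 encoding are all assumptions made for this example; the design only calls for a symmetric cipher and a secret key held by the server.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

/** Illustrative only: the class name and token layout are assumptions, not OpenAnzo code. */
final class TokenCodecSketch {
    private static final int IV_BYTES = 12;   // commonly used IV size for GCM
    private static final int TAG_BITS = 128;  // length of the GCM authentication tag
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKey key;

    /** rawKey must be 16, 24, or 32 bytes long to be a valid AES key. */
    TokenCodecSketch(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    /** Encrypts "username|issuedAtMillis" and returns text that is safe to use as a cookie value. */
    String createToken(String username, long issuedAtMillis) {
        try {
            byte[] plain = (username + "|" + issuedAtMillis).getBytes(StandardCharsets.UTF_8);
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // a fresh IV for every token
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] encrypted = cipher.doFinal(plain);
            // Prepend the IV so that decryptToken can recover it.
            byte[] out = ByteBuffer.allocate(iv.length + encrypted.length)
                    .put(iv).put(encrypted).array();
            return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses createToken; throws IllegalArgumentException for malformed or tampered tokens. */
    String decryptToken(String token) {
        try {
            byte[] in = Base64.getUrlDecoder().decode(token);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            throw new IllegalArgumentException("invalid token", e);
        }
    }
}
```

GCM is used here because it also detects tampering with the token. A deployment that picked a different cipher mode would likely need a separate integrity check.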

The J2EE filter protects access by checking for the cookie. It first decrypts the value of the cookie and parses the username and time from the decrypted string. Possessing a cookie that decrypts correctly means that the server has previously verified your identity as that user. The filter checks the timestamp contained in the cookie and compares it to the current time. If too much time has passed, then the server considers the cookie expired and rejects the request. If the request is valid and within the timeout, the server must still create a new value for the cookie which updates the timestamp to the current time. It sets that new encrypted value as the cookie in the response.
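The timestamp check performed by the filter might look roughly like the following. This is a sketch only: it assumes the cookie has already been decrypted into a string of the form "username|issuedAtMillis", and the class and method names are hypothetical. Rejecting timestamps that lie in the future is an extra precaution added for the example, not something the design requires.

```java
import java.util.Optional;

/** Illustrative only: names and the "username|issuedAtMillis" layout are assumptions. */
final class TokenCheckSketch {

    /** Returns the username if the decrypted token is well formed and has not expired. */
    static Optional<String> validate(String decrypted, long nowMillis, long timeoutMillis) {
        // Split on the last separator so that the timestamp is always the final field.
        int sep = decrypted.lastIndexOf('|');
        if (sep <= 0) {
            return Optional.empty(); // no separator, or empty username
        }
        long issuedAt;
        try {
            issuedAt = Long.parseLong(decrypted.substring(sep + 1));
        } catch (NumberFormatException e) {
            return Optional.empty(); // timestamp field is not a number
        }
        long age = nowMillis - issuedAt;
        if (age < 0 || age > timeoutMillis) {
            return Optional.empty(); // issued in the future, or expired
        }
        return Optional.of(decrypted.substring(0, sep));
    }

    /** Plaintext for the replacement cookie: same user, current time. Encrypt before sending. */
    static String refreshed(String username, long nowMillis) {
        return username + "|" + nowMillis;
    }
}
```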

This mechanism trades off security for space. It's slightly less secure since you have the trouble of keeping a key secret. An added benefit is that this removes the need to have any server affinity for the authentication token. Of course, the BayeuxJMSBridge still requires the server affinity for its JMS state. Another disadvantage is the added implementation and maintenance cost of this method.

Note that security may be slightly improved by incorporating more information into the token. In particular, adding the client's IP address into the token may help protect against unsophisticated replay attacks. If the current request does not come from the IP address which was initially authenticated, the token should be considered invalid. A sophisticated attacker could still spoof the IP address as well as replay the token, so this doesn't strictly add real security. The most effective actual security improvement is to communicate via SSL.

Considerations for a distributed architecture

OpenAnzo is designed to be a distributed system in which various services and components communicate via the Communication Bus (combus). All of the various components along the communication bus require the authentication information for access control. When communicating via JMS, that authentication information is already well communicated. This document describes how the authentication happens for HTTP access to components on the combus. Much of the document discusses the BayeuxJMSBridge, which is the only component that Anzo.JS touches via HTTP. However, more components possibly exist. For example, most components on the combus are accessible via a RESTful HTTP API. Also, the binary store component would expose functionality over HTTP that requires access control. All of the various components could be running on completely different machines. Sticky session load balancing works for distributing load among various instances of a particular component. But it doesn't work for sharing the credentials across components. How do the various methods described above work in such an environment?

The J2EE form-based authentication and the very similar filter & session methods described above don't address this distributed scenario. The list of valid session ids would somehow have to be shared across various servers. It conceivably could be done by backing the sessions by a database which all components with HTTP access could use. Another possible method is for a login component to keep the active session map and all other components could ask it if a particular session is valid before each request. That technique is essentially a poor-man's single sign-on/single sign-off (SSO) system. That is, a central authority for authorization and session control which all components defer to for checking the validity of credentials.

The encrypted token technique described above does work in a distributed environment. The main requirement is that all of the components know the secret key. Then each one can validate the token as needed.

All of the mechanisms require some cooperation between components in a distributed system.

  • Form-based authentication variants for distributed systems require one of these forms of cooperation:
    • All components use the same database to back the J2EE sessions
    • Or, all components know about the central component used to verify IDs and ask it each time to verify a valid request. (a.k.a. poor-man's SSO)
  • The encrypted token mechanism requires shared secret key knowledge and shared knowledge of the format and algorithm of the session key.

Suggested Technique

The encrypted token technique is the recommended technique mainly because it works effectively, without adding additional complexity or overhead, in a distributed architecture.

Integration into JavaScript AnzoClient

For any of the techniques described above, we can incorporate it into the AnzoClient in a simple way. In the AnzoClient.connect method, we add an XMLHttpRequest call to the login servlet. When successful, the appropriate cookie will be set. We then proceed with initialization of the BayeuxJMSBridge connection. Authentication failures are surfaced as errors via the connect callback for the caller to decide what to do. Most callers will display an error and collect new credentials from the user.

By default, the AnzoClient connect method will send the login POST request to the URL given as location in the config properties appended with /anzo_authenticate. For example,

var anzoClient = new anzo.client.AnzoClient({
    location : "/cometd/",
    username : "exampleUser",
    password : "passw0rd"
});

anzoClient.connect(function connectCallback() {
  // ...
});

In that example, the login request would go to /cometd/anzo_authenticate.

For situations where the username and password should be sent over HTTPS and the rest of the application can be sent via HTTP, you need to supply a full URL for the login request. That way the URL's scheme can be different. For that, the AnzoClient reads the loginURL config property. For example,

var anzoClient = new anzo.client.AnzoClient({
    location : "/cometd/",
    loginURL : "https://www.example.com/cometd/anzo_authenticate",
    username : "exampleUser",
    password : "passw0rd"
});

anzoClient.connect(function connectCallback() {
  // ...
});

Encrypted Token Implementation

Currently, the HTTP Basic Auth implementation in OpenAnzo is done via the ServerRealm class, an implementation of org.mortbay.jetty.security.UserRealm. The embedded Jetty server is set up to use the ServerRealm class for HTTP basic auth. All servlets configured to require authentication are given the appropriate constraint mapping in the EmbeddedJettyService.start method.

To add support for authentication using the encrypted token mechanism, we add a new implementation of the org.mortbay.jetty.security.Authenticator interface. The implementation's authenticate method will look for the cookie named ANZOTOKEN. If not found, it will return an authentication failure. If found, it will decrypt the value of the cookie using the javax.crypto API. It will parse the decrypted string into the username, timestamp, and IP address. If the current server time is more than a configured number of minutes later than the cookie's timestamp, then the authentication request will fail. If the token is valid, the authenticator will make a call to the UserRealm's getPrincipal method, which in turn will call the OpenAnzo AuthenticationService via IAuthenticationService.getUserPrincipal to obtain the full principal information for the user (such as their roles, etc.). It will place that information into the HTTP request object's UserPrincipal property. That will give servlets and other components access to the authenticated user information. The authenticator will be very similar to Jetty's built-in FormAuthenticator except that it won't use the session. If there is an authentication failure, the server will either send a 403 Forbidden response or redirect to a login page.

This design is somewhat Jetty specific. It relies on implementing the Authenticator interface, which is Jetty specific. We might consider going with a Servlet Filter approach instead because it would be transferable to another app server more easily. It also insulates us from changes to Jetty a bit more. The downside is that we then can't make use of the HttpServletRequest getUserPrincipal mechanism.

The token timeout configuration and login redirect configuration go into the servlet configuration as in the following example:

@prefix comp: <http://openanzo.org/serviceContainer/component/> .
@prefix component: <http://openanzo.org/serviceContainer/component#> .
@prefix scp: <http://openanzo.org/serviceContainer#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix servlet: <http://openanzo.org/serviceContainer/servlet#> .
{
  comp:srvRootPrivate component:className "org.openanzo.server.endpoint.GenericServletEndpoint" ;
    servlet:contextPath "/private" ;
    servlet:docRoot "./docroot-private/" ;
    servlet:initOrder "9" ;
    servlet:authenticationMode servlet:EncryptedTokenAuthentication ;
    servlet:pathSpec "/*" ;
    servlet:protectedPathSpec "/*" ;
    servlet:servletClassName "org.mortbay.jetty.servlet.DefaultServlet" ;
    servlet:loginPage "/login.html" ;
    servlet:errorPage "/error.html" ;
    servlet:authTokenTimeout 15 ;
    servlet:authTokenRefreshWindow 2 ;
    dc:description "srvRoot-private" ;
    dc:title "srvRoot-private" ;
    a scp:GenericServlet, scp:ServletComponent, scp:Component .
}

The servlet:authTokenTimeout property denotes the time, in minutes, that the token remains valid. The servlet:authTokenRefreshWindow property denotes the window of time, in minutes, during which the cookie's timestamp will not be reset. That is, if the cookie's timestamp is less than that many minutes old, then a new updated cookie will not be issued. The cookie's timestamp will only be refreshed when the cookie is older than the token refresh window. The token refresh window is simply an efficiency consideration to slightly reduce the network traffic and encryption operations.

Note that the token refresh window can have the effect of reducing the effective timeout of the cookie. For example, if the timeout is 30 minutes and the token refresh window is 5 minutes, then the effective timeout, depending on the timing of the user's requests, may actually work out to be 25 minutes. Consider the scenario where the user is issued a token and they make very frequent requests until immediately before the refresh window expires, just before 5 minutes have elapsed in our example. Since the refresh window was never crossed, the server never issued an updated timestamp in the cookie for any of the requests. If the user then makes their next request about 25 minutes later, then the server will see a timestamp in the cookie that is actually 30 minutes old. So the server will consider it expired. Thus the user, in this pathological case, didn't get the full 30 minute timeout after their last request. Always consider that the effective timeout is the token timeout minus the token refresh window.
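The interaction between the timeout and the refresh window can be checked with a small simulation. The sketch below is illustrative only: the class and method names are hypothetical, times are whole minutes since login, and it assumes that a token is expired when its age is strictly greater than the timeout and refreshed when its age is strictly greater than the refresh window.

```java
/** Illustrative only: a simulation of the timeout and refresh window rules, not OpenAnzo code. */
final class RefreshWindowSketch {

    /** A token is expired once it is older than the timeout. */
    static boolean isExpired(long issuedAtMin, long nowMin, long timeoutMin) {
        return nowMin - issuedAtMin > timeoutMin;
    }

    /** A token's timestamp is only reset once it is older than the refresh window. */
    static boolean needsRefresh(long issuedAtMin, long nowMin, long refreshWindowMin) {
        return nowMin - issuedAtMin > refreshWindowMin;
    }

    /**
     * Replays requests made at the given times (minutes after login, ascending).
     * Returns the index of the first rejected request, or -1 if all are accepted.
     */
    static int firstRejected(long[] requestTimesMin, long timeoutMin, long refreshWindowMin) {
        long issuedAt = 0; // the token is issued at login, time 0
        for (int i = 0; i < requestTimesMin.length; i++) {
            long now = requestTimesMin[i];
            if (isExpired(issuedAt, now, timeoutMin)) {
                return i;
            }
            if (needsRefresh(issuedAt, now, refreshWindowMin)) {
                issuedAt = now; // a new cookie with the current time is issued
            }
        }
        return -1;
    }
}
```

With a 30 minute timeout and a 5 minute refresh window, requests at minutes 1, 2, and 4 never trigger a refresh, so a following request at minute 31 is rejected even though the user was idle for only 27 minutes. If the third request had arrived at minute 6 instead, the timestamp would have been reset and a request at minute 33 would be accepted.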

The encrypted token authenticator will rely on an instance of the SecretKeyEncoder interface for its encryption and decryption needs. In particular, the EmbeddedJettyService will be responsible for initializing the encrypted token authenticator for servlet contexts that require it. This means that the EmbeddedJettyService must have a SecretKeyEncoder component as a dependency. If no SecretKeyEncoder is given to the EmbeddedJettyService, then only basic auth will be supported. An example configuration of the EmbeddedJettyService and the SecretKeyEncoder component looks like this:

@prefix comp: <http://openanzo.org/serviceContainer/component/> .
@prefix component: <http://openanzo.org/serviceContainer/component#> .
@prefix scp: <http://openanzo.org/serviceContainer#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix networkPred: <http://openanzo.org/serviceContainer/component/networkComponent#> .
@prefix secretkey: <http://openanzo.org/serviceContainer/component/secretKeyEncoder#> .
{
  comp:SecretKeyEncoder component:className "org.openanzo.server.security.SecretKeyEncoderImpl" ;
    component:initOrder "4" ;
    dc:description "Encryption component based on saved secret key." ;
    dc:title "SecretKeyEncoder" ;
    secretkey:keyFileLocation "/tmp/anzo/keystore" ;
    a scp:Component, scp:SecretKeyEncoderComponent .

  comp:EmbeddedJettyService component:className "org.openanzo.standalone.EmbeddedJettyService" ;
    component:dependency 
      comp:EmbeddedActiveMQService , 
      comp:JMXService , 
      comp:ServerRealm ,
      comp:SecretKeyEncoder ;
    component:enabled "true" ;
    component:initOrder "3" ;
    networkPred:connection comp:HttpConnection ;
    dc:description "Embedded Jetty Service." ;
    dc:title "EmbeddedJettyService" ;
    a scp:Component , component:NetworkComponent .
}

Login request

The login request, the request in which the username and password are given and a token is set in the response, must be sent to a URL that ends with the suffix /anzo_authenticate. The authenticator component will check each request to see if it ends with that path string. If so, it treats it as a login request. The request should encode the username and password into request parameters anzo_username and anzo_password, respectively. This mechanism is completely analogous to the J2EE form-based authentication scheme, which uses j_security_check for the login path suffix and j_username and j_password for the request parameters. Ideally, this request is sent as a POST request to prevent the username and password from being stored in the URL in the browser's history.
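A minimal sketch of the login request conventions follows. The path suffix and parameter names come from the description above; the class and method names are hypothetical, and the form-encoded POST body is an assumption about how a client would typically submit the two parameters. URLEncoder.encode with a Charset argument requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: helper names are assumptions, not part of OpenAnzo. */
final class LoginRequestSketch {
    static final String LOGIN_SUFFIX = "/anzo_authenticate";
    static final String USERNAME_PARAM = "anzo_username";
    static final String PASSWORD_PARAM = "anzo_password";

    /** True when the request path should be treated as a login request. */
    static boolean isLoginRequest(String requestPath) {
        return requestPath != null && requestPath.endsWith(LOGIN_SUFFIX);
    }

    /** Builds an application/x-www-form-urlencoded body for the login POST. */
    static String loginFormBody(String username, String password) {
        return USERNAME_PARAM + "=" + URLEncoder.encode(username, StandardCharsets.UTF_8)
                + "&" + PASSWORD_PARAM + "=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }
}
```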

Redirection to Login page

Typical behavior when a user tries to navigate to a protected URL without yet having appropriate credentials is for the server to redirect the user to a login page. The Encrypted Token Authenticator should do this. It will allow a login page URL to be configured in the server. It will redirect to the login page when a protected resource is requested without a valid authentication token.

One important consideration, however, is that this redirection behavior is not always desired. Specifically, when a request is being made programmatically via XMLHttpRequest, a succinct 403 response would be more effective. The redirect wouldn't serve the XMLHttpRequest well since the login page to which it redirects likely won't be rendered for the user. The server must somehow identify HTTP requests made via XMLHttpRequest and avoid redirection in such cases. This can be done by looking for the "X-Requested-With: XMLHttpRequest" HTTP request header. That header has become a de facto standard for all widely used AJAX toolkits such as Dojo, jQuery, YUI, etc.
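The choice between redirecting and returning a 403 could be expressed as a small decision function, as sketched below. The class, enum, and method names are hypothetical. Falling back to a 403 when no login page is configured is an assumption of this example.

```java
/** Illustrative only: names are assumptions, not part of OpenAnzo or Jetty. */
final class AuthFailureSketch {

    enum Action { REDIRECT_TO_LOGIN, SEND_403 }

    /**
     * Decides how to answer an unauthenticated request.
     *
     * @param xRequestedWith value of the X-Requested-With request header, or null if absent
     * @param loginPage      configured login page URL, or null if none is configured
     */
    static Action onAuthenticationFailure(String xRequestedWith, String loginPage) {
        boolean viaXhr = "XMLHttpRequest".equalsIgnoreCase(xRequestedWith);
        if (viaXhr || loginPage == null) {
            return Action.SEND_403; // a script cannot make use of an HTML login page
        }
        return Action.REDIRECT_TO_LOGIN;
    }
}
```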

Timeouts with Cometd Long Polling

Anzo.JS works by using the Comet long-polling technique for receiving events from the server. This means that there is almost always an open request to the server. Every 30 seconds or so, a new long-running request is sent to the server to check for any pending messages. Because of this polling technique, a user who may have left their computer and hasn't been active for a long time on the browser still looks like an active user to a naive server. The server keeps seeing a new request every 30 seconds or so, and therefore thinks the user is active. We need to support the ability for idle apps to time out even in the face of cometd long polling requests. There are various techniques we can use for this:

  • Exempt the cometd poll requests (connect) from being counted as activity on the server.
    • The problem with this is that it doesn't account for an app that is just streaming data from the server. To deal with that case, we could count the poll request as real activity only when it actually sends back a message.
  • Monitor activity at the client in the form of new requests made via the Anzo.JS API or responses received.
  • Monitor keyboard and mouse usage at the machine itself - only Mozilla-based browsers support this as of FF3 using nsIIdleService.

The first technique seems the most promising. However, implementation is tricky. In particular, inspecting the request to determine that it is a poll request is awkward. It seems to break abstraction boundaries. Perhaps a special header could be added to the poll request. But that might involve patching dojo.cometd. Or it might mean simply subclassing one of their 'transport' objects. Also, the authenticator runs right at the start of the request. So if a message actually gets sent in the poll response, and we want to reset the timeout in the cookie at that point, we need the cometd code to give us an appropriate hook. Perhaps a filter could work in the absence of such a hook, but it's not clear. The response headers would have to be modifiable, and a filter would likely have trouble with that since it would probably be too late by then.

After some investigation, the solution found involves subclassing the cometd JSONTransport class. The JSONTransport class sees messages on their way back to the client and is the only class that really has access to the HttpServletResponse object to affect the cookies. Inside the send method, we inspect the message being sent and if it is a message sent to the "/meta/connect" bayeux channel, then our custom transport implementation ignores that message. If it's a message to any other channel AND the refresh window has expired, then the transport adds the new refreshed authentication token cookie to the response. To avoid having the transport duplicate the encryption code or gain access to the secretKeyEncoder, the transport looks for a prepared new cookie in a request attribute. The authenticator will place the prepared cookie into that request attribute if the cookie needs to be refreshed. To obtain access to the request, we use the bayeux "requestAvailable" mechanism and so the cometd servlet must be appropriately initialized with that property. We do that in the CometdServletEndpoint#initialize method.
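The decision made inside the custom transport's send method can be summarized as follows. This is only a sketch of the rule described above, written as a plain function so that it does not depend on the cometd or servlet classes. The class and method names are hypothetical, and passing the prepared cookie as a string is an assumption made to keep the example self-contained.

```java
/** Illustrative only: a plain-Java rendering of the refresh rule, not the actual transport subclass. */
final class TransportRefreshSketch {
    static final String META_CONNECT = "/meta/connect";

    /**
     * Returns the cookie value to add to the response, or null if none should be added.
     *
     * @param channel        bayeux channel of the message being sent to the client
     * @param preparedCookie refreshed token placed in the request attribute by the authenticator,
     *                       or null when the refresh window has not yet passed
     */
    static String cookieToAttach(String channel, String preparedCookie) {
        if (META_CONNECT.equals(channel)) {
            return null; // poll responses do not count as user activity
        }
        return preparedCookie; // real traffic: attach the refreshed cookie if one was prepared
    }
}
```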

The behavior of the authenticator can be toggled between either adding the refreshed cookie directly to the response or adding the cookie only to the request attribute. The second mode is called the customTokenRefresh mode. It can be configured using the servlet:customTokenRefreshEnabled property. For example, this is how the cometd servlet is configured:

@prefix comp: <http://openanzo.org/serviceContainer/component/> .
@prefix component: <http://openanzo.org/serviceContainer/component#> .
@prefix scp: <http://openanzo.org/serviceContainer#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix networkPred: <http://openanzo.org/serviceContainer/component/networkComponent#> .
@prefix servlet: <http://openanzo.org/serviceContainer/servlet#> .
{
  comp:CometdServletEndpoint component:className "org.openanzo.server.combus.bayeux.CometdServletEndpoint" ;
    component:dependency comp:CachedAuthenticationService, comp:CachedAuthorizationService, comp:NodeCentricModelService;
    servlet:contextPath "/cometd" ;
    servlet:initOrder "7" ;
    servlet:authenticationMode servlet:EncryptedTokenAuthentication ;
    servlet:pathSpec "/*" ;
    servlet:protectedPathSpec "/*" ;
    servlet:customTokenRefreshEnabled true ;
    dc:description "Cometd Servlet Endpoint" ;
    dc:title "CometdServletEndpoint" ;
    networkPred:connection comp:JmsConnection ;
    a scp:Component, comp:NetworkComponent, component:CombusComponent, scp:ServletComponent .
}
Copyright © 2007 - 2008 OpenAnzo.org