Monday, May 27, 2013

Resources Are Four-Dimensional


The term ROA (Resource-Oriented Architecture) is misleading. It should ideally stand for "Resource/Representation-Oriented Architecture", even though that's quite a mouthful.

In my discussions with people who work on REST-based systems, I've found that many of them are quite fuzzy about the notions of "resource" and "representation", even though these are the fundamental concepts underlying REST (and my forthcoming messaging-oriented variant, ROMA).

Let me try and explain this. Let's say I found a space capsule that had fallen out of the sky, and by deciphering the alien glyphs on its surface, I understood that it contained a 4-dimensional object from the planet Royf. Unfortunately, I couldn't take the object out, because ours is a 3-dimensional world and such an operation is impossible. However, I found that it was possible to get the capsule to "project" a 3-dimensional image of the object it contained, so I could "see" it in a way that made sense to my limited mind. I found that I could also ask for the object to be manipulated in 3-dimensional terms. I knew, of course, that the object itself was 4-dimensional and so my instructions had to be somehow translated into terms that made sense in the 4D world. But I found to my satisfaction that the 3D image that resulted from my instructions was exactly what I wanted.

I realised then that my interactions with the space capsule were RESTian. The 4-dimensional object was the resource, an unseeable thing that I had no way of even comprehending and which was therefore mercifully shielded from my vision. What I could ask for (through a GET) was a 3D "representation" of the object, and this was something I could understand. I could also manipulate the object in several ways. I could show the capsule other 3D objects and say, "Change its shape to resemble this", or "Make its colour more like this", and it would happen! Obviously, the objects I was holding up were not the same as the object inside the capsule. They were representations of what I wanted the object to look like, when I saw it in 3D terms.

That's really what REST is. The only aspect of the resource itself that we can directly deal with is its name, or URI. The actual resource is completely unseen, indeed unseeable.  Everything that is actually seen is a representation, whether it's a representation of what the object is like right now, or a representation of what we want the object to be like. Everything that goes "over the wire" in REST is a representation.
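
To ground the analogy in code, here is a minimal sketch using Python's requests library. The URI and the JSON fields are hypothetical; the point is that both the GET response and the PUT request body are representations, and the resource itself never appears.

    import requests

    # The URI is the resource's name; the resource itself never crosses the wire.
    uri = "https://example.com/capsules/1"   # hypothetical URI

    # GET fetches a representation of the resource's current state.
    current = requests.get(uri, headers={"Accept": "application/json"}).json()

    # PUT sends a representation of the state we want the resource to have,
    # the equivalent of holding up another 3D object to the capsule.
    desired = dict(current, colour="blue")
    requests.put(uri, json=desired)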

[Image caption: Nerds can readily understand what a 3D projection is]

[Note also that REST is the very opposite of "Distributed Objects", although some industry personalities continue to insist that REST is DO! (JJ, I'm talking to you.) Distributed Objects tries to create the illusion that remote objects are local, letting you grasp them as if with virtual reality gloves. REST imposes the opposite discipline: even local objects, like the one inside the space capsule, should be treated as remote, never grasped directly, only dealt with indirectly through representations. Distributed Objects works well while the illusion of locality holds and fails horribly when it breaks. REST always works.]

Hopefully, this should set to REST some of the confusion around resources and representations.

Monday, May 20, 2013

SOA As Dependency-Oriented Thinking - One Diagram That Explains It All


I've been talking and writing about SOA as "Dependency-Oriented Thinking" for a while now, and have even conducted a couple of workshops on the theme. The feedback after the interactive sessions has always been positive, and it surprises me that such a simple concept is not already widely known.

I'm in the process of splitting (slicing?) my white paper "Slicing the Gordian Knot of SOA Governance" into two, one dealing with SOA ("Dependency-Oriented Thinking" or DOT) and the other dealing with Governance and Management ("Dependency-Oriented Governance and Management" or DOGMa).

Partway through the DOT document, I realised that one of the diagrams in it explains the entire approach at a glance.

Here it is (updated after the release of the Dependency-Oriented Thinking documents).

[Diagram: the four BAIT layers, with the dependency concerns addressed at each layer]

This is of course the BAIT model of an organisation, with a specific focus on Dependencies. BAIT refers to Business, Application, Information (Data) and Technology, the four "layers" through which we can trace the journey of application logic from business intent to implementation.

[Basic definitions: SOA is the science of analysing and managing dependencies between systems, and "managing dependencies" means eliminating needless dependencies and formalising legitimate dependencies into readily-understood contracts.]

At the Business layer, the focus on dependencies forces us to rationalise processes and make them leaner. Business processes need to be traceable back to the organisation's vision (its idea of Utopia), its mission (its strategy to bring about that Utopia) and the broad functions it needs to have in place to execute that strategy (Product Management, Engineering, Marketing, Sales, etc.). Within each function, there will need to be a set of processes, each made up of process steps. This is where potential reuse of business logic is first identified.

At the end of this phase, we know the basic process steps (operations) required, and how to string them together into processes that run the business. But we can't just have these operations floating around in an organisational "soup". We need to organise them better.

At the Application layer, we try to group operations. Note that the Business layer has already defined the run-time grouping of operations into Processes; at the Application layer, we need to group them more statically. Which operations belong together and which do not? That's the dependency question to be asked at this layer.

The answer, though, is to be found only in the Information layer below, because operations only "belong" together if they share a data model. As it turns out, there are two kinds of data models: those on the "outside" and those on the "inside". The data model on the "inside" of a system is also known as its "domain data model", and it is never visible to other systems. In contrast, a data model on the "outside" of a system, known as an "interface data model", is always exposed and shared with other systems. In SOA, data on the outside is at least an order of magnitude more important than data on the inside, because it affects the integration of systems with one another, whereas data on the inside is seen by only a single system.
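
Here is a minimal sketch of the inside/outside distinction, with hypothetical names. The domain model carries details no other system should ever see; the interface model is the only shape that crosses system boundaries, and the translation between them is deliberate and explicit.

    from dataclasses import dataclass

    # Domain ("inside") data model: private to this system, free to change.
    @dataclass
    class AccountRecord:
        id: int
        balance_cents: int
        risk_score: float        # internal detail, never exposed
        ledger_partition: str    # physical storage detail

    # Interface ("outside") data model: the shared contract other systems see.
    @dataclass
    class AccountSummary:
        account_id: str
        balance: str             # e.g. "123.45"

    def to_interface(rec: AccountRecord) -> AccountSummary:
        # The one place where inside and outside meet: a deliberate translation.
        return AccountSummary(account_id=str(rec.id),
                              balance=f"{rec.balance_cents / 100:.2f}")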

Version churn is a potential problem at the Information layer, because changing business requirements could result in changed interfaces. With a well-designed type hierarchy that exposes only generic super-types, the interface data model can remain stable even as newer implementations appear to handle specialised sub-types. Most changes to interfaces are then compatible with older clients, and incompatible changes are minimised.
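
One way to picture this, again with hypothetical types: the interface data model exposes only a generic super-type, so new specialised sub-types can be added behind it without breaking the published contract.

    from dataclasses import dataclass

    # Exposed in the interface data model: a deliberately generic super-type.
    @dataclass
    class PaymentInstrument:
        instrument_id: str
        kind: str                # discriminator, e.g. "card" or "wallet"

    # Specialised sub-types live behind the interface. Adding one is a
    # backward-compatible change, because clients see only PaymentInstrument.
    @dataclass
    class CreditCard(PaymentInstrument):
        masked_pan: str

    @dataclass
    class DigitalWallet(PaymentInstrument):
        provider: str

    def describe(p: PaymentInstrument) -> str:
        # Clients written against the super-type keep working as sub-types appear.
        return f"{p.kind}:{p.instrument_id}"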

Once we have our data models, we can go back up one level to the Application layer and start to group our operations in two different ways, depending on whether they share an internal (domain) data model or an interface data model. Operations sharing a domain data model form Products. Operations sharing an interface data model form Services. (And that's where the "Service" in "Service-Oriented Architecture" comes from.) Products are "black boxes" meant to be used as standalone applications. Services are "glass boxes" with no other function than to loosely bundle together related operations.

Finally, we have to implement our Services. The description and deployment bundles that are used need not correspond one-to-one with the Services themselves. They should in general be smaller, so that the volatility (rate of change) of any single operation does not needlessly impact others merely because they share a description bundle (e.g., a WSDL file) or a deployment bundle (e.g., a JAR file). If we also pay attention to the right types of components to use to host logic, transform and adapt data, and coordinate logic, we will be implementing business intent in the most efficient and agile way possible.
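
To illustrate the granularity argument with hypothetical interfaces: if all operations share one description bundle, a change to any of them churns the whole bundle; describing volatile operations separately contains the damage.

    from typing import Protocol

    # One big description bundle: a change to any operation "churns" them all.
    class CustomerServiceAllInOne(Protocol):
        def get_profile(self, customer_id: str) -> dict: ...
        def update_address(self, customer_id: str, address: dict) -> None: ...
        def close_account(self, customer_id: str) -> None: ...

    # Smaller bundles: the volatile update_address operation is described
    # separately, so its rate of change does not impact consumers of the
    # stable operations.
    class ProfileReader(Protocol):
        def get_profile(self, customer_id: str) -> dict: ...

    class AddressUpdater(Protocol):
        def update_address(self, customer_id: str, address: dict) -> None: ...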

This, in fact, is all there is to SOA. This is Dependency-Oriented Thinking in practice.

The white paper will explain all this in much greater detail and with examples, but this post is an early appetiser. [Update: you can fall to and help yourselves, since the main course is now ready. Grab the two documents below.]


Dependency-Oriented Thinking: Volume 1 - Analysis and Design
Dependency-Oriented Thinking: Volume 2 - Governance and Management

Thursday, May 16, 2013

50 Data Principles For Loosely-Coupled Identity Management


It's been a while since our eBook on Loosely-Coupled IAM (Identity and Access Management) came out. In it, my co-author Umesh Rajbhandari and I had described a radically simpler and more elegant architecture for a corporate identity management system, an architecture we called LIMA (Lightweight/Low-cost/Loosely-coupled Identity Management Architecture).

Looking at developments since then, it seems that book isn't going to be my last word on the subject.

IAM has quickly moved from within the confines of a corporate firewall to encompass players over the web. New technology standards have emerged that are in general more lightweight and scalable than anything the corporation has seen before. The "cloud" has infected IAM like everything else, and it appears that IAM in the age of the cloud is a completely different beast.

And yet, some things have remained the same.

I saw this for myself when reviewing the SCIM specification. This is a provisioning API that is meant to work across generic domains, not just "on the cloud". It's part of the OAuth 2.0 family of specifications, and OAuth 2.0 is an excellent, layerable protocol that can be applied as a cross-cutting concern to protect other APIs. SCIM too is OAuth 2.0-protected, but that's probably where the elegance ends.
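
To make the layering concrete, here is a minimal sketch of an OAuth 2.0-protected SCIM call using Python's requests library (the endpoint, user ID and token are placeholders). The access token rides along as a cross-cutting concern in the Authorization header; the SCIM resource itself knows nothing about how the token was obtained.

    import requests

    token = "ACCESS_TOKEN_FROM_AN_OAUTH2_GRANT"   # obtained out of band

    resp = requests.get(
        "https://example.com/scim/v2/Users/2819c223",   # hypothetical endpoint
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/scim+json"},
    )
    print(resp.status_code, resp.json().get("userName"))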

The biggest problem with SCIM is its clumsy data model, which then impacts the intuitiveness and friendliness of its API. I critiqued SCIM on InfoQ, and in response to a "put up or shut up" challenge from some of the members of the SCIM working group, I began working on an Internet Draft to propose a new distributed computing protocol, no less. That's a separate piece of work that should see the light of day in a couple of months.

In the meantime, I began to work on IAM at another organisation, a telco this time. My experiences with IAM at a bank, an insurance company and then a telco had by then given me a much better understanding of Identity as a concept, and I began to see that many pervasive ideas about Identity were either limiting or just plain wrong. Funnily enough, most of these poor ideas had more to do with the Identity data model than with technology. I also observed that practitioners tended to focus more on the "sexy" technology bits of IAM and less on the "boring" data bits, and that explained to me, very convincingly, why systems were so clumsy.

I then consciously began to set down some data-specific tips and recommendations that I saw being ignored or violated. The irony is that it doesn't cost much to follow these tips. All it costs is a change of mindset, but perhaps that's too high a price to pay for many! In dollar terms, the business benefits of IAM can be had for a song. Expensive technology is simply not required.

So that's the lesson I learnt once more, and the lesson I want to share. No matter what changes we think are occurring in technology, the fundamental concepts of Identity have not changed. The data model underlying Identity has not changed. Collectively, we have a very poor understanding of this data model and of how to design our systems to work with it.

So here are 50 data principles for you, the architect of your organisation's Identity Management solution. I hope these will be useful.

The presentation on Slideshare:
http://slidesha.re/14uo3YY

The document hosted on mesfichiers.org:
http://atarj9.mesfichiers.org/en/

Friday, May 03, 2013

"What Are The Drawbacks Of REST?"


It seems to be the season for me to post comments in response to provocative topics on LinkedIn.

A few days ago, Pradeep Bhat posed the question, "What Are The Drawbacks Of REST?" on the REST Architects LinkedIn Group. As before, I had an opinion on this too, which I reproduce below:

"I wouldn't say REST has "drawbacks" as such. It does what it says on the tin, and does that very well. But remember that the only implementation of the REST architecture uses the HTTP protocol. We can surely think of a future RESTian implementation that uses another transport protocol, and that is where some improvements could be made. 

1. HTTP is a synchronous, request/response protocol. This means the protocol does not inherently support server-initiated notifications (peer-to-peer), which are often required. That's why callbacks in RESTian applications require the use of application-level design patterns like Webhooks. Now that we have a bidirectional transport protocol in the form of WebSockets, perhaps the industry should be looking at layering a new application protocol on top of it that follows RESTian principles. 

2. The much-reviled WS-* suite of protocols has at least one very elegant feature. These are all end-to-end protocols layered on top of the core SOAP+WS-Addressing "messaging" capability. They resemble the TCP/IP stack, in which the basic protocol is IP, which only knows how to route packets. SOAP messages with WS-Addressing headers are analogous to IP packets. In the TCP/IP world, end-to-end reliability is implemented through TCP over IP, and the SOAP world's analogue is WS-ReliableMessaging headers in SOAP messages. In the TCP/IP stack, IPSec is the end-to-end security protocol (not TLS, which is point-to-point). The SOAP equivalent is WS-SecureConversation. Such Qualities of Service (QoS - reliability, security, transactions) can be specified by policy declaration (the WS-Policy framework), and SOAP endpoints can apply them like an "aspect" to regular SOAP traffic. 

The REST world has nothing like this. Yes, an argument could be made that idempotence at the application level is a better form of reliability than automated timeouts and retries at the transport level. Similarly, we could argue that an application-level Try-Confirm/Cancel pattern is better than distributed transactions. But what remains is security. WS-SecureConversation with WS-Security is routable, unlike SSL/TLS, which is the only security mechanism in REST. With WS-Sec*, messages can also be partially encrypted, leaving some content in the clear to aid in content-based routing or switching. This is something REST does not have an elegant equivalent for. SSL is point-to-point, cannot be inspected by proxies and violates RESTian principles. It is just tolerated. 

The reason behind REST's inability to support such QoS in general is that all of these require *conversation state* to be maintained. Statefulness has known drawbacks (i.e., impacts to scalability and failure recovery), but with the advent of NoSQL datastores like Redis that claim constant-time, i.e., O(1), performance, it may be possible to delegate conversation state from memory to this datastore and thereby support shared sessions for multiple nodes for the purposes of QoS alone. I don't mean to use this for application-level session objects like shopping carts. If nodes can routinely use shared NoSQL datastores to maintain sessions, then the argument against statefulness weakens, and Qualities of Service can be more readily supported *as part of the protocol*. In RESTian terms, we can have a "uniform interface" for QoS.

3. While REST postulates a "limited" set of verbs, HTTP's verbs are too few! 

POST (add to a resource collection), PUT (replace in toto), PATCH (partially update), DELETE (remove from accessibility) and GET (retrieve a representation). These are actually not sufficient, and they are frequently overloaded, resulting in ambiguity. 

I would postulate a more finely-defined set of verbs if defining a RESTian application protocol over a new peer-to-peer transport: 

INCLUDE (add to a resource collection and return a server-determined URI), PLACE (add to a resource collection with client-specified URI), REPLACE (in toto), FORCE (PLACE or REPLACE), AMEND (partial update, a container verb specifying one or more other verbs to specify operations on a resource subset), MERGE (populate parts of the resource with the supplied representation), RETIRE (a better word than DELETE) and SOLICIT (a GET replacement that is also a container verb, to tell the responding peer what to do to the initiator's own resource(s), because this is a peer-to-peer world now). Think of GET as a SOLICIT-POST to understand the peer-to-peer model. We also need a verb of last resort, a catch-all verb, APPLY, which caters to conditions not covered by any of the others. 

4. HTTP combines application-level and transport-level status codes (e.g., 304 Not Modified and 400 Bad Request vs 407 Proxy Authentication Required and 502 Bad Gateway). The next implementation of REST on another transport should design for a cleaner separation between the application protocol and the transport protocol. HTTP does double-duty and the results are often a trifle inelegant. 

So that's what I think could be done as an improvement to REST-over-HTTP. Apply the principles (which are very good) to a more capable peer-to-peer transport protocol, and design the combination more elegantly."
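
Two of the ideas above are easy to sketch in code. First, point 1: on a bidirectional transport like WebSockets, server-initiated notifications need no Webhook workaround. A minimal sketch using Python's websockets package (version 10 or later; the event payload is invented):

    import asyncio
    import websockets

    async def handler(ws):
        # The server can push a notification whenever it likes; no webhook
        # registration or client polling is needed on a bidirectional transport.
        await ws.send('{"event": "resource-changed", "uri": "/orders/42"}')

    async def main():
        async with websockets.serve(handler, "localhost", 8765):
            await asyncio.Future()   # run forever

    asyncio.run(main())

And point 2: delegating conversation state to a shared datastore such as Redis, so that any node behind a load balancer can resume a conversation and apply protocol-level QoS. A sketch using redis-py (the key layout and fields are invented):

    import json
    import redis

    r = redis.Redis()   # shared by all nodes in the cluster

    def save_conversation(conv_id: str, state: dict, ttl_seconds: int = 300) -> None:
        # Near-constant-time writes let protocol state (sequence numbers,
        # security context) live outside any single node's memory.
        r.set(f"conv:{conv_id}", json.dumps(state), ex=ttl_seconds)

    def load_conversation(conv_id: str) -> dict:
        raw = r.get(f"conv:{conv_id}")
        return json.loads(raw) if raw else {}

    # Any node can pick up the conversation and, say, detect a replayed or
    # out-of-order message before handing the request to application logic.
    save_conversation("abc123", {"last_seq": 7, "key_id": "k1"})
    print(load_conversation("abc123")["last_seq"])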

I'm in the process of writing an Internet Draft for a new application protocol that can be bound to any transport (Pub/Sub, Asynchronous Point-to-Point or Synchronous Request/Response). The protocol is part of a new distributed computing architecture that I call ROMA (Resource/Representation-Oriented Messaging Architecture) and covers not just the data model and message API but also higher levels (QoS, description and process coordination). It's been 5 years in the making and has reached 170 pages so far. It may take another couple of months to get to a reviewable state. Stay tuned.