Saturday, May 12, 2012

Tight Coupling in the TCP/IP Stack!


Scandal and embarrassment!

The TCP/IP stack of Internet protocols, the poster child for a layered architecture with well-defined responsibilities and interfaces that abstract out needless dependencies, has a dirty little secret that I stumbled upon only recently, now that I work for a telco. But before I tell you what it is, a quick recap of how the technology works.

IP stands for Internet Protocol. Every device on the network has an IP address. [IP version 4 (IPv4) has been the most common so far, and IPv4 addresses look like this: 192.168.1.1 (or in hex, C0.A8.01.01). IP version 6 (IPv6) addresses look like this: 3ffe:1900:4545:3:200:f8ff:fe21:67cf.]
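To make the two address formats concrete, here's a small sketch using Python's standard ipaddress module. The variable names are mine; the addresses are the ones quoted above.

```python
import ipaddress

# The IPv4 example from the text, parsed into its four raw bytes.
v4 = ipaddress.IPv4Address("192.168.1.1")

# Render each byte as two hex digits, keeping the dotted layout.
hex_octets = ".".join(f"{b:02X}" for b in v4.packed)
print(hex_octets)

# The IPv6 example from the text.
v6 = ipaddress.IPv6Address("3ffe:1900:4545:3:200:f8ff:fe21:67cf")

# Address width in bits for each version.
print(v4.max_prefixlen, v6.max_prefixlen)
```

The first line printed should match the hex form given above; the second shows that an IPv4 address is 32 bits wide and an IPv6 address 128 bits.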

The way the Internet works is by routing packets of information, hop-by-hop, from a source to a destination. Each node along the way knows, by looking at the destination IP address in a packet, how to forward that packet so it gets one step closer to its destination. So all that an IP network really has are routing smarts. It's the destination's IP address in a message that holds all the information required for it to reach its intended audience.

That's great when the sources and destinations of messages are fixed in location. They have a certain IP address assigned to them when they start up, and from then on, that IP address typically doesn't change until they next start up.

Mobile data devices (which include 3G mobile phones and later devices that use the packet-switched data network) have introduced a problem. Their IP addresses need to keep changing because they connect to different nodes (or cells, or towers) as they move, and it would play havoc with routing if they carried their original IP addresses around when connecting to new nodes. So fine, the technology allows for their IP addresses to change dynamically. However, the logical data connections that the devices establish need to remain for the duration of the session. There could be a download going on, for example, and an interruption of the connection will abort the download. Innovations such as "fast mobile IP" were introduced to mitigate the visible effects of the problem, but did not address its root cause.

The root cause lies in a rather ugly fact about IP addresses. An IP address confuses a device's identity with its location. A device's location keeps changing as it moves, but its identity does not change. A location is important to know where packets are to be delivered. But logical concepts like sessions need to be tied to a device's identity, not to its location. These are two different concepts, but a single mechanism (the IP address) has been chosen to implement both of them. As long as the location and identity did not independently change, the design flaw remained hidden. Now with data-enabled mobile devices, device location and device identity show themselves very clearly as two different things, and the conceptual limitation of the IP address has therefore been exposed.
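Here's a toy sketch of how that conflation bites. It assumes a server that looks up connection state by source and destination address and port, which is roughly how a TCP connection is identified; the function names and addresses are made up for illustration and this is not how any real stack is coded.

```python
# Connection state keyed by (source IP, source port, dest IP, dest port).
connections = {}

def open_connection(src_ip, src_port, dst_ip, dst_port):
    """Record a new connection under its address/port tuple."""
    key = (src_ip, src_port, dst_ip, dst_port)
    connections[key] = {"bytes_received": 0}
    return key

def deliver(src_ip, src_port, dst_ip, dst_port, nbytes):
    """Credit a segment to its connection; False if no such connection."""
    state = connections.get((src_ip, src_port, dst_ip, dst_port))
    if state is None:
        return False
    state["bytes_received"] += nbytes
    return True

# A phone starts a download from its first address.
open_connection("10.0.1.7", 40000, "203.0.113.5", 80)
before_move = deliver("10.0.1.7", 40000, "203.0.113.5", 80, 1500)

# It moves to another cell and gets a new address. Same device, same
# download, but the lookup key has changed, so the state is not found.
after_move = deliver("10.0.2.9", 40000, "203.0.113.5", 80, 1500)

print(before_move, after_move)
```

The device's identity never changed, yet the session is lost, because the only "name" the session knew the device by was its location.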

That's the rationale behind the new protocol specification called HIP (Host Identity Protocol). HIP is meant to sit between TCP and IP. Normally, a domain name is resolved by DNS to an IP address, and the TCP connection is then bound to that address. A whole generation of IT professionals has come of age with this principle internalised as an axiom of How Things Work. HIP is a Copernicus or a Galileo challenging an established view. The Sun does not go round the Earth, after all. It's the Earth that goes round the Sun! That's going to take some getting used to. For a networking professional or a web architect, discovering that the venerable TCP/IP stack should actually be the TCP/HIP/IP stack is a bit like discovering that they're an adopted child. But however painful the realisation and readjustment, it's better that the truth be known.

Under the new proposal, a domain name needs to be resolved by DNS to a logical HIP name (what the RFC calls a Host Identity), which then gets further resolved to an IP address! Now, if a device is moving, its IP address can keep changing, but its HIP name will remain the same. Since the TCP connection is bound to the HIP name rather than to the IP address, it need not be torn down and re-established. Sessions need not be dropped and re-created.
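The extra level of indirection can be sketched in a few lines. This follows my description above, not the actual HIP wire protocol: the two lookup tables, the "host-id-1234" string and the function names are all hypothetical stand-ins.

```python
# Step 1: domain name -> stable host identity (hypothetical value).
dns = {"phone.example.net": "host-id-1234"}

# Step 2: host identity -> current IP address (the part that may change).
locators = {"host-id-1234": "10.0.1.7"}

# Session state is keyed by host identity, never by IP address.
sessions = {}

def resolve(name):
    """Resolve a name to (host identity, current IP address)."""
    host_id = dns[name]
    return host_id, locators[host_id]

def open_session(name):
    """Start a session bound to the host identity behind a name."""
    host_id, _ = resolve(name)
    sessions[host_id] = {"bytes_received": 0}
    return host_id

def move(host_id, new_ip):
    """The device changes location; only the locator table is updated."""
    locators[host_id] = new_ip

def deliver(host_id, nbytes):
    """Credit data to the session and report where the host is now."""
    sessions[host_id]["bytes_received"] += nbytes
    return locators[host_id]

hid = open_session("phone.example.net")
ip_before = deliver(hid, 1500)
move(hid, "10.0.2.9")
ip_after = deliver(hid, 1500)

print(ip_before, ip_after, sessions[hid]["bytes_received"])
```

The address reported by deliver changes across the move, but the byte count keeps accumulating in the same session entry, which is the whole point of separating the two names.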

RFC 4423 (HIP Architecture) says:

"In the current Internet, the transport layers are coupled to the IP addresses. Neither can evolve separately from the other.
[...]
There are three critical deficiencies with the current namespaces. First, dynamic readdressing cannot be directly managed. Second, anonymity is not provided in a consistent, trustable manner. Finally, authentication for systems and datagrams is not provided. All of these deficiencies arise because computing platforms are not well named with the current namespaces."

It goes on to say:

"An independent namespace for computing platforms could be used in end-to-end operations independent of the evolution of the internetworking layer and across the many internetworking layers. This could support rapid readdressing of the internetworking layer because of mobility, rehoming, or renumbering."

Amazing, isn't it? We've been nursing a tightly-coupled serpent in our collective bosom for over 3 decades, and we didn't even know...

It's going to take a while for HIP to become part of the Internet ecosystem (if it ever does!). Entrenched ways of thinking could prove too powerful to allow a much-needed rationalisation.

The lesson for me personally is that if we don't architect a system right, we will live with its negative implications for a long, long time. Even the founding fathers of the Internet, geniuses as they were, were not perfect, and we can clearly see in hindsight how a conceptual blunder (a conflation of location with identity) has impacted us.

I do believe though, that even the current HIP proposal is making a blunder of its own by confusing identifiers with identity credentials. RFC 4423 says:

"In theory, any name that can claim to be 'statistically globally unique' may serve as a Host Identifier. However, in the authors' opinion, a public key of a 'public key pair' makes the best Host Identifier. As will be specified in the Host Identity Protocol specification, a public-key-based HI can authenticate the HIP packets and protect them from man-in-the-middle attacks."

From my own work on Identity Management, I have come to realise that multiple sets of credentials can be used to arrive at, or establish, an identity. The establishment of an identity within a given context requires an identifier. This identifier may be the credentials themselves, or something else. It's important to realise that the "may be" should not be taken as a "must be". For the purpose of security, the authors of the HIP specification are proposing that verifiable credentials be used as the identifier in all situations. I fear that will result in a similar problem later on when the requirements of authentication and identity establishment diverge in some context. I'll write to the committee explaining my concerns.
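To illustrate the distinction I'm drawing, here's a minimal sketch in which the identifier is an opaque value kept separate from the credentials that vouch for it. This is my own illustration, not anything proposed in the RFC; the Identity class and the "public-key-A" strings are hypothetical placeholders for real key material.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Identity:
    # Opaque identifier, generated once and independent of any credential.
    identifier: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Credentials currently accepted as proof of this identity.
    credentials: set = field(default_factory=set)

    def rotate(self, old, new):
        """Replace one credential with another; the identifier is untouched."""
        self.credentials.discard(old)
        self.credentials.add(new)

    def authenticate(self, presented):
        """Accept any credential currently registered for this identity."""
        return presented in self.credentials

host = Identity()
host.credentials.add("public-key-A")
original_id = host.identifier

# The key is replaced, say after a compromise.
host.rotate("public-key-A", "public-key-B")

print(host.identifier == original_id)
print(host.authenticate("public-key-B"), host.authenticate("public-key-A"))
```

If the public key were itself the identifier, rotating the key would change the identity that sessions are bound to, which is the kind of coupling I'm worried about.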

1 comment:

Unknown said...

There is no problem that can't be solved by another layer of indirection...