Monday, April 29, 2013

JEM (JSON with Embedded Metadata) - A Simpler Alternative to JSON Schema?


I've long been a supporter of the JSON Schema initiative, and I was also happy to see developments like Orderly, which offered a simpler and less verbose format than JSON Schema. But Orderly introduces its own format, which necessitates conversion to JSON Schema before it can be used. Both approaches are unsatisfactory in their own way. One is too verbose and the other needs translation.

All of this made me wonder if we aren't approaching the problem the wrong way. JSON Schema is a conscious effort to replicate in the JSON world the descriptive capability that XML Schema brings to XML. But is this the best way to go about it?

I would like descriptive metadata about documents to be capable of being embedded inside the document itself, rather like annotations in Java programs. Indeed, this metadata should be capable of forming a "scaffold" around the data that then allows the data itself to be stripped out, leaving behind a template or schema for other data instances.

So I'm proposing something that I think is a whole lot simpler. It does require one fundamental naming convention to be followed, and that is this:

Any attribute name that begins with an underscore is metadata. Everything else is data.

Let's take this simple JSON document:


We can embed metadata about this document in two different ways. Click diagram to expand.


I'm calling the first style "Metadata Markup", where the data elements of the JSON document retain their primacy, and the metadata around them is secondary and serves to add more detail to these data elements. One can readily see that "_value" is now just one of the possible attributes of an element, and many more such attributes can therefore be added at will.

I call the second style "Metadata Description", where the primary elements are metadata, and any data elements (whether keys or values) are modelled as the values of metadata elements. Note that describing a document as an array (a nested array in the general case) rather than as a dictionary (or nested dictionary) of elements allows the default order of the elements to be retained. This is quite useful when this format is used to publish data for human consumption.

The first style, Metadata Markup, is more suitable for document instances, because a lot of detailed meta-information can accompany a document and can be hidden or stripped out at will. It is easy for a recipient to distinguish data from metadata because of the leading underscore naming convention. There is no need to pre-negotiate a dictionary of metadata elements. (Click to expand.)



The second style, Metadata Description, is more suitable for schemas, because in this format, all elements pertaining to instance data (both keys and values) are just values. If only the values representing keys are retained, we get a "scaffold" structure describing the document, and more metadata elements representing constraints can be added, turning it into a schema definition. (Click to expand.)


Obviously, this system will not work for everyone. I'm sure there are JSON documents out there that have underscores for regular data (HAL?), so adoption of this convention won't be feasible in such domains. But if a significant subset of the JSON-using crowd finds value in this approach, they're more than welcome to adopt it.

1 comment:

Unknown said...

I love the idea of thinking outside of the box, like you have done here. I also love the idea of working to simplify JSON Schema to discover the essence of what it is trying to do and then express it in a more simple manner. I thank you for this article -- I haven't (so far) seen others attempt this.

In terms of the implementation, my initial reaction is that, by using underscores to indicate meta data, that it is impossible to use the specification to describe or validate itself, because I do not see an easy, consistent solution to how to describe an instance which itself has underscores in it. I may be wrong in this.

But, I'll keep thinking along these lines. Thank you!