Canonical Entity representation

Crell

One place that the Web Services and Context Core Initiative (WSCCI) and Configuration Management Initiative (CMI) overlap is the need to have a standard, canonical format to represent nodes and other entities in non-PHP and non-SQL format. There are a number of places where that is useful:

  1. Including entities in exported configuration, or in configuration files.
  2. Taking a content snapshot in some form other than an SQL dump file (which, you know, kinda sucks for most uses).
  3. Transferring a node from one site to another for content sharing purposes.
  4. Aggregating content from many sites together for improved searching and cataloging.
  5. Exposing Drupal content to other non-Drupal systems. This is made easier by using non-Drupal-specific formats.

These are all problem spaces that exist in Drupal 7 now, and did back in Drupal 6, too. Various one-off solutions exist. For Drupal 8 we should have a better universal answer to this question, and be able to build common tools to support it. Those can, and should, also influence our API design to help improve external integration.

Existing solutions

There are four general approaches that I am aware of.

  1. Serialized PHP: The simplest of course is to simply dump a node to PHP code using var_export(), or just use PHP's serialize() function on it. While that does result in a string representation of a node (or other entity) that can be saved to disk or sent to another site, it is a generally poor format. It is PHP-specific, Drupal-specific, serialize() output is easily corrupted, and it does not do anything to help with IDs that differ between sites or references to other entities. In short, it's not worth our consideration.
  2. drupal_execute() arrays: This is the approach taken by the Deploy module in Drupal 6. The basic idea is that nodes in Drupal 6 rely too heavily on the Form API for, well, everything, so saving a node and not going through a form save operation would lose half the useful data. That is fortunately not the case in Drupal 7 anymore thanks to improvements in the Field API, and of course FAPI arrays are one of the least portable formats we could come up with so it fails goals 1, 2, and 5.
  3. json_encode(): Deploy in Drupal 7 drops drupal_execute() in favor of running json_encode() on a node object to send between sites via Services.module. The receiving end then simply runs json_decode() and node_save() (well, a custom entity_save() routine since core has no entity_save()). That is much cleaner and more portable. JSON is a well-known standard format that is dead-simple to parse in PHP, it's understood by a wide variety of systems, and can be included in either JSON-based or XML-based configuration files. (With some escaping it's just CDATA.) However, blindly dumping a node object to JSON without thinking about its structure is not useful for external integration, because the structure is too unpredictable for anything but custom parsing.
  4. Atom/XML: On a previous client project in Drupal 6, I worked on a team that produced the Views Atom and Feeds Atom modules. The basic idea was to serialize nodes to a custom XML format, and then use the Atom format (IETF RFC 4287) to wrap them for transport between sites. Atom turned out to be an excellent choice, as Atom supports multiple payload formats, including non-XML; it supports encryption (although we did not use it); it supports UUIDs for synchronizing objects to avoid content duplication; and it supports PubSubHubbub, an Atom extension that makes push-based updates possible. (And yes, there's a module for that.) It worked well, and at Palantir we're now starting work on a Drupal 7 project based on the same tools. (Expect Drupal 7 versions of that full suite soon.)

I spoke with Deploy maintainer Dick Olsson (dixon_) earlier today, and we both agreed that we really ought to standardize on one format that Deploy can use now in D7, that we can use for clients like the one I'm working with now, and for a Drupal 8 standard. There are plenty of good reasons to do so, no good reasons not to, and we can standardize now, even without Drupal 8 being anywhere close to a release.

We also agreed that Atom was probably the best wrapper format, since it contains a number of features (as above) that are useful when needed and can be skipped when not. It's also a well-recognized standard, which in most cases is superior to some Drupal-proprietary format.

So, let's do. And let's do while I have a client that can pay for at least some of the work to help build a common library for it. :-)

Requirements

A canonical serialized entity representation should:

  1. Be at least somewhat human readable, or at least become so once you pretty-print the whitespace.
  2. Be reasonably straightforward to parse in PHP.
  3. Be parsable by non-Drupal, non-PHP systems as well.
  4. Have a consistent, regular, predictable structure.
  5. Be supportable by any entity automatically by virtue of being an entity.
  6. Not try to handle everything that an entity might have on it, only those things that are fully supported. That is, entities right now have basic properties that are defined by the entity type, and they have fields. It's been common for modules to also throw any random stuff they want onto the bare object structure at various times. Those are very specifically not supported, as that makes the structure too unpredictable.
  7. The following workflow must work, and result in no change to an entity (the code below is illustrative, not a proposed API):

    <?php
    $entity = entity_load($type, $id);
    $string = entity_serialize($entity);
    $entity = entity_deserialize($string);
    entity_save($entity); // Once this API call exists.
    ?>
  8. Be revision-aware.

Two options immediately spring to mind. One is to reuse the XML format from the Views Atom module. (Note: The sample there is namespaced, which makes it uglier to read, but the actual tags are fairly simple; please pardon the namespacing.) That has the advantage of already existing, and we can rip parsing logic out of that module into a standalone library. We could also tweak it as needed before making it a canonical format.

The other is to use JSON, but something more robust than just throwing an object into json_encode(). For one thing, in Drupal 8 entities are classed objects and will have non-public properties, so that won't even work in the first place as those non-public properties would get lost. For another, we want a more regular and non-Drupal-specific structure than that would give us.
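To make the contrast concrete, here is a minimal sketch of the difference (the key layout and the UUID are made up for illustration; this is not a proposed format):

<?php
// Naive dump: whatever properties modules happened to bolt onto the object
// end up in the output, in whatever shape they left them.
$node = node_load(1); // Assumes a bootstrapped Drupal 7 site.
$naive = json_encode($node);

// A regular structure: only declared properties and fields, grouped under
// predictable keys, with the entity type and UUID always present.
$canonical = array(
  'entity_type' => 'node',
  'uuid'        => '550e8400-e29b-41d4-a716-446655440000',
  'revision'    => 7,
  'properties'  => array(
    'title'   => 'Hello world',
    'created' => '2011-11-10T14:00:00Z',
  ),
  'fields' => array(
    'body' => array(array('value' => 'Some text', 'format' => 'filtered_html')),
  ),
);
$string = json_encode($canonical);
?>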

(Yes, the XML vs. JSON wars have already been fought. That CMI is going XML at this point is a mark in XML's favor. Please do not simply repeat anything already said in that thread. Please.)

I will also offer that there is no intrinsic reason we cannot provide both an XML and JSON canonical form, as long as they are reasonably related. It does not have to be either/or.

References and Dependencies

Here of course comes the ugly part. Drupal entities routinely contain references to other entities. Nodereference, Userreference, Entity Reference, File fields, OG group membership, the author property of nodes... the list goes on. Of course, those references are generally entity IDs, which are totally and utterly useless when a node is serialized and used anywhere except right back on the same site. We need some alternate way to represent them.

I encourage everyone to read these two articles on REST before commenting on this section. They contain very valid points regarding how resources should reference each other in a REST/hypermedia form. There's some discussion of them in this earlier thread, too. Remember, the receiving system may not be a Drupal site!

Just throwing UUIDs on everything is only a partial answer. Having some sort of /entity/$type/$uuid path that we can always rely on could be a part of the solution, but perhaps not. I'm not sure here yet.
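As a strawman only (the key names and the /entity/$type/$uuid path are assumptions, not a decided answer), a serialized reference could carry both a UUID and a dereferenceable URI rather than a bare local ID:

<?php
// A hypothetical serialized entity-reference value. The local nid is at best
// a hint; consumers would rely on the UUID or the href.
$reference = array(
  'entity_type' => 'node',
  'uuid'        => '0c3f5ff9-47f3-4dd2-9c2a-0f4a2d9c6c11',
  'href'        => 'http://example.com/entity/node/0c3f5ff9-47f3-4dd2-9c2a-0f4a2d9c6c11',
  'local_id'    => 42,
);
?>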

The other question is files. Not only do we need to translate fids into something useful, using a Drupal stream wrapper URL may not be useful. Sometimes it will be; actually in the client project I have right now we do want to send over Drupal stream wrapper URLs, because we have a common file server. However, that will not always be the case. So what do we want to do here?

Wrappers and control

For both an XML format and a JSON format, the Atom spec actually provides a very nice envelope. It's a widely understood format, extensible, supports both push and pull based updates, has an extension that can push deletion notifications, and scales well once you introduce an external PuSH hub server.

Naturally not every use case will need a wrapper; if we're just saving out a serialized entity to disk, then Atom doesn't have any real purpose. For a web service wrapper, though, we could do far worse.
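As a rough sketch of the envelope half (entity_serialize() is the hypothetical payload serializer from the requirements above; the rest is plain PHP XMLWriter), an Atom entry wrapping a serialized entity might be built like this:

<?php
$entity = entity_load($type, $id); // As in the workflow above.

$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');

$writer->startElement('entry');
$writer->writeAttribute('xmlns', 'http://www.w3.org/2005/Atom');
$writer->writeElement('id', 'urn:uuid:550e8400-e29b-41d4-a716-446655440000');
$writer->writeElement('title', 'Hello world');
$writer->writeElement('updated', gmdate('c'));

// The serialized entity rides along as the entry content.
$writer->startElement('content');
$writer->writeAttribute('type', 'application/xml');
$writer->writeRaw(entity_serialize($entity)); // Hypothetical, per the requirements list.
$writer->endElement();

$writer->endElement(); // entry
$atom = $writer->outputMemory();
?>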

Discuss. :-)

Comments

Another issue : text input

yched

Another issue: text input formats. What to export?
- [raw input + text format name] means nothing outside drupal land (or when exported to a drupal site with different text formats)
- exporting the check_format()'ted string means no drupal re-importability.

In case its useful, there is

xtfer

In case it's useful, there is a Drupal 6 UUID URI resolver, though I notice it's currently without a release... http://drupal.org/project/uuid_resolver

restws project

yched

Another module trying to address exactly that in D7: RESTful Web Services (relies on the Entity API module). We should probably ping fago and klausi.

RESTWS

klausi

RESTWS does not use any special format. It was designed to support many different formats that are driven by FormatControllers. However, it heavily relies on the Entity property API for retrieving and naming properties. We took special care of properties that are references to other entities by providing the ID, a fully qualified URL and the resource type, to comply with the REST principles, but that's it. So I think the actual format does not really matter; how the properties are named and how they are retrieved is important.
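(For illustration only, a reference carrying the ID, a fully qualified URL and the resource type might look roughly like this as a PHP structure before encoding; the key names are an assumption, not RESTWS's actual output.)

<?php
$author_reference = array(
  'id'       => 7,
  'uri'      => 'http://example.com/user/7',
  'resource' => 'user',
);
?>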

The suggested XML Atom format from above looks fine; I can see just one inconsistency: "title" should be listed inside the "properties".

restws module and my thoughts

fago

RestWS already handles entity URIs in a format-specific manner, so proper RESTful references are constructed. It generates /$entity_type/$id URLs for that, whereas mapping to different default URIs like taxonomy/term/$id is not yet solved (redirects, maybe).

For more information about the restws module, please see this post. The post also shows the used representation formats.

In short, it works based upon the entity property info of the Entity API module. Based upon that information we know about entity references regardless of the storage back-end and can take them into account. Generally, I think basing the representation only on known properties is a good idea, so that stuff modules throw onto entities ad hoc is kept out.

See http://drupal.org/node/1346220 for the d8 entity property info issue.

For both an XML format and a JSON format, the Atom spec actually provides a very nice envelope. It's a widely understood format, extensible, supports both push and pull based updates, has an extension that can push deletion notifications, and scales well once you introduce an external PuSH hub server.

I fully agree. Having Atom would be very nice, in particular in conjunction with push. I'd prefer a more light-weight implementation that doesn't additionally rely on Views, though. Implementing Atom also means we should adopt the Atom vocabulary, which might be unwanted in a straightforward conversion where e.g. a comment subject remains a subject and is not converted to Atom's title.
So maybe there should be both: a straightforward XML/JSON conversion and an Atom XML/JSON conversion?

@json/xml
I think that we should provide both JSON and XML representations, so developers can pick what they are most comfortable with.

- [raw input + text format name] means nothing outside drupal land (or when exported to a drupal site with different text formats)
- exporting the check_format()'ted string means no drupal re-importability.

That one is tough. Actually, the desired export depends on your use case (external access vs. deploying to another Drupal instance).
Well, the raw-input variant would have to include a reference to the text format used. Still, that's probably not a big help for anyone. Generally, I think it's a good idea to process the text formats, but not to sanitize them (web services do not sanitize data). As that might not be desirable in some cases (content staging, entity serialize/unserialize), we need to be able to opt out of that, though. Maybe via separate output formats?
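(A minimal Drupal 7 sketch of the "process, but don't sanitize by default, with an opt-out" idea; the export keys and the $include_processed switch are assumptions for illustration.)

<?php
// $item is a standard D7 text field item with 'value' and 'format' keys.
$export = array(
  'value'  => $item['value'],   // Raw input, needed for staging/re-import.
  'format' => $item['format'],  // Machine name of the text format used.
);

// Optionally include the processed markup for non-Drupal consumers.
// check_markup() applies the format's filters; it does nothing beyond what
// the format itself defines.
if ($include_processed) {
  $export['processed'] = check_markup($item['value'], $item['format']);
}
?>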

Envelope not required

Crell

The Atom envelope is not required for all use cases. We absolutely should be able to get just the XML/JSON version of an entity as a string and do what we want with it.

I'm more suggesting that if we need any sort of flow control, syndication tracking, etc. (for deploy, some services calls, site to site syndication, external feeds, content notification/push, etc.) that we standardize on Atom as our go-to wrapper.

I also agree entirely that if we go this route we want to have stand-alone libraries for encoding/decoding an entity, and separately for building a basic Atom feed out of them. Whether core exposes an actual feed URI or we leave that to contrib/views I don't know, but that logic should be kept as stand-alone as possible.

In fact, if we can find and adopt an Atom parser/generator library rather than writing our own, so much the better.

But atom isn't just an

fago

But Atom isn't just an envelope, is it? Once you are using it, you have to make use of its required elements, i.e. atom:title. Also, once we are using Atom we should make use of its optional vocabulary too.

I'm more suggesting that if we need any sort of flow control, syndication tracking, etc. (for deploy, some services calls, site to site syndication, external feeds, content notification/push, etc.) that we standardize on Atom as our go-to wrapper.

Sounds good! It would be great to see us also implementing the Atom Publishing Protocol, as it'd standardize any restful interface.

Machine names for formats?

pwolanin

For Drupal 8 at least, I hope formats might have a machine name (or UUID) instead of just an integer ID?

Perhaps formats themselves should be entities which are exportable? Of course, they would still have to reference code libraries or some other bigger logic.

Would it make sense to have both the raw and rendered content in the export? For example, having the rendered content means the exported entity could be used to populate a search index.

Output formats cannot be

DjebbZ

Output formats cannot be exported, since they're not data but algorithms...

About an Atom parser/generator: SimplePie can parse Atom and RSS feeds (it's in the Feeds module), but it's undergoing a rewrite and needs help to port it to full PHP5 OOP. The PHP Universal Feed Generator, found in the top Google results and in several answers on Stack Overflow, seems a good tool for generating valid Atom feeds. I've read the code quickly; it's OOP, simple and straightforward.

About Url's

DjebbZ

The REST resource URLs could be constructed from the HTML output URL + '/$format_name', e.g. node/[nid]/json or node/[nid]/xml, taxonomy/term/[tid]/xml, etc., so that service discovery is made even easier.

Well, then we're well out of

Hugo Wetterberg

Well, then we're well outside the bounds of the REST philosophy. The preferred way is to send Accept headers or, as a fallback, file name extensions.
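(A very rough sketch of header-based negotiation in plain PHP, with an extension fallback; the media types and format names are illustrative only.)

<?php
// Prefer the Accept header; fall back to a file name extension such as
// node/42.json if the client did not send one.
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';

if (strpos($accept, 'application/json') !== FALSE) {
  $format = 'json';
}
elseif (strpos($accept, 'application/atom+xml') !== FALSE || strpos($accept, 'application/xml') !== FALSE) {
  $format = 'xml';
}
else {
  $format = pathinfo($_SERVER['REQUEST_URI'], PATHINFO_EXTENSION);
}
?>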

Got you Hugo. Another thing

DjebbZ

Got you Hugo.

Another thing we said yesterday during the WSCCI meeting in IRC is that our API for generating such canonical representations of entities should work with any entity, not only core entities, so that custom-created entities can be represented without any additional code or implementation by the custom entity's creator. If extra implementation is needed, it's never gonna get done.

Account for language

stevector

The Views Atom format used in Drupal 6 likely could not be used verbatim as it does not account for the changes in field-level language handling made in Drupal 7. I don't know if these or any other i18n changes coming in D8 make Atom, JSON or any other format preferable.

Have we looked at the Open Data Protocol?

cpliakas

The Open Data Protocol seems to be relevant here. The spec supports two formats, the XML-based AtomPub format and the JSON format, which seems to be in line with the OP. In addition, there are existing PHP and JavaScript libraries available for download. I haven't worked with the libraries, nor do I know what license they fall under, but their mere existence suggests we should at least take a good look at it. Ironically, the PHP library is sponsored by Microsoft, which does raise an eyebrow. There also seems to be a heavy MS bias throughout the site, but there doesn't seem to be an MS bias in the protocol itself.

Anyways, just wanted to throw it out there.
~Chris

Nice!

Crell

I spent a few hours yesterday reading the OData documentation, and so far I really like what I'm seeing.

tl;dr version for people who haven't read the site (although I suggest you do so): It's really just a formalization and slight extension of the Atom spec, which is already pretty darned good, plus a mechanism for defining the payload tags you use. There's a default standard for simple data, but that probably won't work for us; something inspired by it may. It also has essentially a JSON alternate version of the same thing, although I've not looked into that as much. On top of that it adds an index format so that a site can list what Atom feeds it has available.

Another advantage here is that we already have Atom-generating code available in Drupal (views_atom and the atom module, which I suspect will be merging in the next few months), and Atom consuming code (the OData module, which I just discovered yesterday).

The site also has freely available libraries that we could download and use, in PHP and various other languages. There's just one problem: It's Apache 2 licensed. That means it's only compatible with GPLv3, not with GPLv2, which Drupal uses. Since I think it's unlikely that we'd switch to GPLv3 for Drupal 8, that means we could not bundle their code. :-( (I'm also not entirely sure about their library API yet; I've only just glanced at it.)

Thoughts? Do we want to build OData support into Drupal as our first-class export/serialization mechanism?

OData looks indeed very interesting

dixon_

OData looks indeed very interesting, it seems like something we definitely should look into.

I will have a deeper look at the protocol, in context of this discussion as well as being the maintainer for the Deploy module, and come back with some feedback, as soon as possible (tm).

While OData looks fine, in

xtfer

While OData looks fine, in principle, I can see two problems with it. Firstly, it is essentially a Microsoft project, and as such is covered by the Microsoft Open Specification Promise, which has its own quirks and limitations. Secondly, as a data format it is not widely used as yet, and was basically built as a serialisation mechanism for Microsoft products like Dynamics and SharePoint. For those two reasons alone, and despite its obvious positive qualities, I would not want to make it the basis of our export mechanism.

Somewhat thinking out loud here

andremolnar

Further reflection re: the format wars. I agree with the statement:

I will also offer that there is no intrinsic reason we cannot provide both an XML and JSON canonical form, as long as they are reasonably related. It does not have to be either/or.

It's almost implicit, but we should be explicit. If the contract is done right, it should be easily transformable from any format to any other as long as what is represented doesn't change. Put another way: the data (and maybe the Interface) is canon, not the format. There is no reason export or import tools or configuration storage or object creation or whatever you can imagine couldn't all be pluggable systems - each potentially consuming/producing a different format if that's what the developer's heart desires.

Arguably, the fewer the better, but the sky is the limit.

As for references, I think you've hit the nail on the head. References need to be URIs to the canonical information.
If you're exporting, you have a choice to make - either bundle up what is returned by the reference URI, OR just include the URI and let the consumer decide what to do with it. But that's a good choice to have. Either way you end up with the right data: the right data right now (bundled), or the right data for as long as it lives (referenced).

CMIS

dixon_

heyrocker, skwashd and I briefly talked about this on IRC today, and CMIS came up as a potential candidate. None of us has fully wrapped our heads around the format or specification yet. It might not even make sense. But I'm just adding it here, as a note:

Specification: http://docs.oasis-open.org/cmis/CMIS/v1.0/os/cmis-spec-v1.0.html
Existing Drupal module: http://drupal.org/project/cmis

My research

dixon_

Based on my completely (un)biased research that I conducted myself, I would recommend the OData specification as the way we should represent canonical entities in Drupal 8.

In my research I compared OData and CMIS. Those are the two biggest specifications, it seems, related to CMS and ECM systems.

Continue to read for my conclusions and what action points I will take to try this in Drupal 7.


CMIS

Specification: http://docs.oasis-open.org/cmis/CMIS/v1.0/os/cmis-spec-v1.0.html
Example of an entity: Atom format

CMIS is a very common specification, but mostly related to ECM systems, like Alfresco and SharePoint, and it (more or less) assumes documents in a hierarchical folder structure. The specification does not cover "programming interface objects" or other "administrative entities" like user profiles(!).

Here are some outstanding quotes from the specification:

CMIS provides an interface for an application to access a Repository. [...] In accordance with the CMIS objectives, this data model does not cover all the concepts that a full-function ECM repository typically supports. Specifically, transient entities (such as programming interface objects), administrative entities (such as user profiles) [...] are not included.
(from http://docs.oasis-open.org/cmis/CMIS/v1.0/os/cmis-spec-v1.0.html#_Toc243...)

There are four base types of objects: Document Objects, Folder Objects, Relationship Objects, and Policy Objects.
(from http://docs.oasis-open.org/cmis/CMIS/v1.0/os/cmis-spec-v1.0.html#_Toc234...)

Pros with CMIS

Cons with CMIS

  • Very extensive specification that is difficult to wrap one's head around
  • Doesn't map well to what we are looking for here; it focuses on documents and folders, not generic data entities/objects
  • Doesn't support the JSON format (although it might be on the way)


OData

Specification: http://www.odata.org/developers/protocols
Example of an entity: JSON format, and Atom format

OData is a common specification (though not as common as CMIS) that defines ways to represent abstract data models, or Entities, that may be (but are not assumed to be) part of Collections or feeds. OData also specifies some URI conventions for querying resources, which go in line with REST principles (although I didn't look closer at that in this research).

Here are some outstanding quotes from the specification:

[...] enables the creation of HTTP-based data services, which allow resources identified using Uniform Resource Identifiers (URIs) and defined in an abstract data model, [..]
(from http://www.odata.org/developers/protocols/overview#Introduction)

OData supports two formats for representing the resources (Collections, Entries, Links, etc) it exposes: the XML-based AtomPub format and the JSON format.
(from http://www.odata.org/developers/protocols/json-format)

Pros with OData

  • Quite well supported specification
  • Lightweight specification and easy to understand
  • Supports both Atom and JSON
  • Maps well to what we are looking for here and Drupal's data model (i.e. Entities, Entity Types etc.)
  • The specification seems to emphasize RESTful principles more (i.e. how references are handled etc.)
  • Has some ready-to-use SDKs (http://www.odata.org/developers/odata-sdk)

Cons with OData

  • It smells a bit Microsoft (it's published under Microsoft Open Specifications)
  • Its SDK is not GPL compatible (but for canonical entity representation it's not useful anyhow; also read Action points below)


Short conclusion

CMIS is very complex, doesn't do exactly what we want and only supports Atom. OData is lightweight, does what we want and supports both Atom and JSON, with more emphasis on RESTful principles.


Action points

As the maintainer of UUID and Deploy, I will:

  1. Add a Services resource type to the UUID module that represents entities according to the OData specification.
  2. References in that resource will be made as URIs according to the OData specification (http://example.com/[services api path]/entity/[uuid] and the entity type is specified in the payload)
  3. Implement support for this OData resource in Deploy module for its content deployments

I won't give a fixed timeline for these action points, but I'm on a project that will benefit and thus be able to provide me time to work on most (hopefully all) of this stuff.

OData SDK is Apache 2.0 Licensed

skwashd

Just to clarify, the OData SDK is licensed under the terms of the Apache 2.0 License, which is GPLv3 compatible but not GPLv2 compatible, which means it can't ship with Drupal code.

Here is the Software Freedom Law Centre's review of the Microsoft Open Specification Promise.

Blargh

Crell

Money quote from the writeup: "The OSP cannot be relied upon by GPL developers for their implementations not because its provisions conflict with GPL, but because it does not provide the freedom that the GPL requires"

So an implementation of an OSP-covered spec in GPL PHP would be fine, but could get people downstream in trouble. Blargh.

That said, OData is 90% straight up Atom, which is not MS-proprietary in the first place, so I'm not sure to what degree it applies in practice.

Have I mentioned that I hate non-Free code/specs/companies?

thanks for the writeup! I

fago

thanks for the writeup!

I agree with you that CMIS looks more complex. OData seems to be a good fit; technically I really like it. In particular I like that the specification includes an optional place for service metadata (the Service Metadata Document) too. I'm not so sure about OData's legal/Microsoft concerns, though. :/

References in that resource will be made as URIs according to the OData specification (http://example.com/[services api path]/entity/[uuid] and the entity type is specified in the payload)

For a real RESTful design I don't think there should be an "api path" involved. According to the REST principles, each resource should have a unique URL/URI. Thus, the URL a user uses to view the resource in HTML shouldn't differ from the "api URL". That's important for references to work regardless of the "api endpoint" used.
Additionally, it's nice to have a simple uniform URL pattern, like http://site/entity_type/id. That doesn't fly with some existing URLs like taxonomy/term/id, but we could still redirect http://site/taxonomy_term/id style URLs to the right ones.
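(A Drupal 7-style sketch of that redirect idea; the uniform taxonomy_term/%taxonomy_term path and the module name are assumptions.)

<?php
/**
 * Implements hook_menu().
 *
 * Registers a uniform entity_type/id style path that simply redirects to the
 * term's real URL, so references can always use the uniform pattern.
 */
function mymodule_menu() {
  $items['taxonomy_term/%taxonomy_term'] = array(
    'page callback'    => 'mymodule_term_redirect',
    'page arguments'   => array(1),
    'access arguments' => array('access content'),
    'type'             => MENU_CALLBACK,
  );
  return $items;
}

/**
 * Page callback: permanently redirects to the canonical taxonomy/term/{tid} path.
 */
function mymodule_term_redirect($term) {
  drupal_goto('taxonomy/term/' . $term->tid, array(), 301);
}
?>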

This seems like a great place

sethviebrock

This seems like a great place to start focusing -- the intersection of these two initiatives.

Re: input/text format exporting, it would seem that exportable formatted textual data (to be consumed by another Drupal instance) might be required to conform to one of a set of default-Drupal-core-provided formats, so that the formatting could be referenced between instances by machine name/ID, which could be ignored by everything non-Drupal that would consume this data. Any algorithmic deviation from default-Drupal-core formats could render the entity's textual formatting as "unexportable" (akin to the "overridden" state in the Features module), which a corresponding UI adaptation would have to handle. This is re: "Not try to handle everything that an entity might have on it, only those things that are fully supported." So, in this instance, if users really want to go beyond core defaults, they can, but it won't be exported by core (but someone will surely develop a little workaround module like http://drupal.org/project/input_formats, which is fine, but that shouldn't be in core). Seems like this logic could apply in similar quandaries, if any arise.

sethviebrock.com

We seem to be talking about

xtfer

We seem to be talking about two things in this discussion that might need to be teased apart:

  1. The way an Entity is represented when exported (its Format)
  2. How an Entity is moved from one place to another (its Transport)

Transport and Format should be somewhat independent of each other.

On OData...

I've done some more digging into OData, and it seems a poor fit for formatting canonical entities, primarily because OData is not specifically "a way to represent data" but "a Web protocol for querying and updating data", with a tacked-on entity model. As such, it has a lot of HTTP-related stuff which would get in the way of other stuff we are already looking at. It is a simple form of web service (no WSDL), and falls into the Transport bucket.

Additionally, doing it properly, and not just cherry-picking its representation of entities, requires describing your data for OData services, which means implementing a RESTful web service using OData, and that would likely conflict with anything implemented in the rest of WSCCI.

We'd also want to be sure that exporting an Entity in OData format could be done without the corresponding OData web service, and understand what the implications of that might be. Can it be consumed, for example?

Also, looking through the OData SDK for PHP, it's very Microsoft-centric, and has some possible reported issues when not running on Windows.

Yes

Crell

I agree, the format and transport are/should be separate questions. OData happens to try and address both of them.

The OData SDK is a non-starter anyway, due to it being Apache 2 licensed. I don't see Drupal 8 moving to GPLv3. Drupal 9 maybe. :-)

At this point I do think that Atom is the right transport format; it doesn't impose a payload format, however. OData defines a mechanism for defining a payload format, but I haven't checked yet to see if it's flexible enough to handle Drupal entities. It appears from my read through that the SDK would do some kind of decoding, but the spec itself is just the wire format.

How do other folks feel about the OData legal concerns? Too hot to touch?

At this point I'm confident that Atom is the right transport format. I'm just not sure of the right payload format. Worst case we just use the format defined by views_atom and call it a day.

Being web-servicey, OData's

xtfer

Being web-servicey, OData's format must be described to OData for every entity type. That could probably be automated, but it's still fiddly. I'd prefer Atom from that perspective, for transport.

Are there any schemas associated with the custom format? I notice in the example that it's defined in RDF, but has no associated schema defined. Do we need something like...

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:drupal="http://drupal.org/ns/entity/">

Taxonomy Import/Export by XML has a similar problem, in that it uses drupal.org as the base URI for vocabularies exported as SKOS. There's the possibility of namespace collision there too (though there is also prior art, at least for taxonomies).

Did anyone make a decision on

xtfer

Did anyone make a decision on what this is going to be?

Anyone taken a look at HAL?

tnightingale

I ran across this http://stateless.co/hal_specification.html in my twitter stream this morning. It is a very light-weight hypermedia type that can be expressed in XML or JSON; it might be worth considering.
The following are also interesting discussions on the pros/cons of defining your own media types:

Interesting

Crell

It doesn't look like the group has had much activity lately. Is that just a bunch of people discussing, or is it a formal WG of some kind?

Not sure

tnightingale

I'm not entirely sure. I've been doing a little reading lately on RESTful APIs, particularly looking for good examples of hypermedia/HATEOAS, and have seen it mentioned in a couple of places (I think this was the initial link http://weblogs.asp.net/cibrax/archive/2012/03/06/building-hypermedia-web...). For the most part it seems to be just a bunch of people discussing.

From what I've read, there are two common approaches to media types: either stick with a relatively generic and commonly used format such as Atom or HTML, or try rolling your own (usually XML or JSON based). HAL looks like it falls somewhere in the middle.

If Drupal decides to take the DIY approach, HAL might be a good reference.
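(For reference, a rough sketch of what a node might look like as HAL-style JSON, built here as a PHP array; the property names and URLs are illustrative, not a proposal.)

<?php
// HAL keeps the resource's own state as ordinary members and puts hyperlinks
// under the reserved "_links" member.
$hal = array(
  '_links' => array(
    'self'   => array('href' => 'http://example.com/entity/node/550e8400-e29b-41d4-a716-446655440000'),
    'author' => array('href' => 'http://example.com/entity/user/7'),
  ),
  'title' => 'Hello world',
  'body'  => 'Some text',
);
echo json_encode($hal);
?>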

JSON-LD

lsmith77

I would also like to add JSON-LD to the list of possible formats:
http://en.wikipedia.org/wiki/JSON-LD

It's more or less RDF for JSON.

It's used by create.js (or more specifically the underlying VIE library) to sync changes from the frontend to the backend and back:
http://createjs.org/
http://wiki.iks-project.eu/index.php/Semantic_Editor

JSON-LD and editable representations?

ethanw

I think a JSON format is definitely ideal... in part because I am particularly interested in JS client-side library integration (Backbone), but also because it strikes a great balance of being flexible, lightweight and widely adopted. I was initially concerned JSON-LD would be too verbose, but it looks reasonable.

That said, I am uncertain how editable data is represented in JSON-LD (or other formats) when an entity's editable value is different from its rendered representation. Videos are a good example of this: the rendered HTML the server should provide after the entity is rendered for presentation is different from the "raw" embed URL which clients of our services will need to send back to update a resource.

I have not had a chance to see how VIEjs handles this using JSON(-LD?), but that might be a good place to start.
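(Sketching the video case with made-up keys: the raw, editable value and the rendered markup travel together, and an editing client sends back only the raw value on update.)

<?php
$video_field = array(
  // What an editing client needs to send back to update the resource.
  'value'    => 'http://www.youtube.com/watch?v=RYlCVwxoL_g',
  // What a read-only consumer can drop straight into a page.
  'rendered' => '<iframe src="http://www.youtube.com/embed/RYlCVwxoL_g" width="560" height="315"></iframe>',
);
?>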

on my flight back from

lsmith77

On my flight back from DrupalCon I realized I totally forgot to mention JSOP:
http://www.slideshare.net/uncled/jsop

I am not sure how far along they are in the standardization of this. Jackrabbit implements one evolution stage of JSOP, which we recently adopted inside Jackalope to more efficiently communicate "diffs", i.e. changes to node instances that were made in memory and that we want to persist in one save operation in Jackrabbit.

JSON-LD Background

msporny

Hi folks,

Stéphane Corlosquet (scor) pointed me to this discussion and asked if I had some time to answer some JSON-LD questions. To which I thought: It's the Drupal community - of course we have time for the Drupal folks!

Just a quick bit of background before going into a high-level on JSON-LD and then replying to concerns. My name is Manu Sporny - I wrote the first version of the JSON-LD language and continue to be one of the primary editors of the document. I'm the current Chair of the RDFa/RDF Web Applications Working Group at W3C as well as the acting chair of the Web Payments and Linked Data in JSON Community Groups at W3C. I am also an editor in the Microformats Community, Web Payments / PaySwarm CG (Web Keys, Web Payments, Payment Intents), RDFa WG (RDFa 1.1 Lite, RDFa 1.1 Primer), HTML Working Group (HTML5+RDFa), RDF WG (JSON-LD Syntax, JSON-LD API), and member of the Semantic Web coordination group, RDF Working Group, and a variety of other groups operating at W3C. So, I dabble in Linked Data, the semantic web, and payment standards for the Web.

We created JSON-LD because we wanted a way to express Linked Data in JSON that didn't have any of the nasty RDF cruft that had built up over the years. Even though I participate in a variety of RDF/Semantic Web related activities, I've always hated how complicated it was to write code for the Semantic Web and Linked Data. So, JSON-LD was a back-to-basics approach to Linked Data - RDF was put on the back burner and the primary focus was on making Linked Data easy to use for Web Developers.

Specifically, we needed a way to pass JSON objects and messages for the Web Payments work, ensuring that the messages could express Linked Data (IRIs, I18N, etc.), but without changing the workflow for Web Developers that already use JSON. Yes, there is a lossless mapping from JSON-LD to RDF and back, but that's very much in the background - you don't have to work in triples, or SPARQL, or RDF, or any of the other technologies that separate you from the data. Just work in JSON for the most part. We've even built an API for JSON-LD (called the JSON-LD API) that allows you to transform JSON-LD into a variety of different layouts that make programming with the language easier. For example, we provide a feature called "framing" that allows you to take input data and re-structure it so that it aligns nicely with the algorithms in your application (JavaScript, Python, Ruby, PHP, etc.)

So, JSON-LD seems to align really nicely with the WSCCI stuff that you're working on because... this is exactly what we designed the language and API to do. You can view the latest spec for the Syntax here:

http://json-ld.org/spec/latest/json-ld-syntax/

and the latest spec for the API here:

http://json-ld.org/spec/latest/json-ld-api/

If you'd like, you can play around with a live JSON-LD editor here:

http://json-ld.org/playground/

Click the buttons for the examples at the top to see the JSON-LD and then the output. In the next post, I'll try to respond to how JSON-LD meets the requirements of the project. Hope this is helpful. :)

JSON-LD Deep Dive

msporny

Hi all,

I had responded on the WSCCI thread and tried to show how JSON-LD would meet their requirements there. The same sort of stuff applies to this discussion, so I thought I'd try and explain how JSON-LD applies to WSCCI requirements in the hope that in doing so, some of the questions in this discussion are answered as well.

Format Expressiveness/Hypermedia Linking

JSON-LD was created to express Linked Data in JSON without requiring folks to change the way they publish and consume JSON (unless they wanted to use some of the more advanced features of JSON-LD). This means that you can add meaning to any JSON document today by just adding an HTTP header:

http://json-ld.org/spec/ED/json-ld-syntax/20120522/#referencing-contexts...

Or by adding key-values to the data (one key-value if you want your keys to expand to mean something in Linked Data, two key-values if you want to give your data IDs that are IRIs and make it true Linked Data):

{ 
  "@context": "http://json-ld.org/contexts/person",
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "birthday": "10-09",
  "member": "http://dbpedia.org/resource/The_Beatles"
}

Example Hypermedia node content

You use IRIs to link to content - so any IRI will do. Here are some examples:

{
  "@context": "http://example.org/drupal-wscci",
  "localFile": "file:///tmp/foo.txt",
  "localResource": "http://localhost/foo.txt",
  "remoteResource": "http://www.youtube.com/watch?v=RYlCVwxoL_g&feature=g-vrec",
  "remoteFile": "http://example.com/music/mysong.ogg"
}

Support for ad-hoc/configurable resource definitions

All JSON keys and values can be given "meaning" by modifying the JSON-LD Context:

http://json-ld.org/spec/ED/json-ld-syntax/20120522/#the-context

Not every value in the JSON needs to have a mapping in JSON-LD. That is, you can mix plain old JSON with JSON-LD and that's a perfectly valid way to operate in JSON-LD. This allows you to add fields first and give them meaning later - allowing flexibility during the design/development process. If a particular key/value becomes used widely, you can give it "meaning" by defining things like a URL to identify the term, a datatype, a default language, etc.
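(A small sketch, as a PHP array, of what such a context might look like for a node; the term names and vocabulary IRIs are assumptions.)

<?php
// Plain JSON keys are given meaning by mapping them to IRIs in @context.
// Keys without a mapping (like "sticky" here) stay plain JSON, as described above.
$doc = array(
  '@context' => array(
    'title'  => 'http://purl.org/dc/terms/title',
    'author' => array('@id' => 'http://purl.org/dc/terms/creator', '@type' => '@id'),
  ),
  '@id'    => 'http://example.com/node/42',
  'title'  => 'Hello world',
  'author' => 'http://example.com/user/7',
  'sticky' => 0,
);
echo json_encode($doc);
?>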

Discoverability is done just like on the Web. URLs and IRIs are first-class citizens in JSON-LD, which means all you have to do is "follow your nose" and you may find /more/ data at that URL. For instance, if you see this URL in JSON-LD: "http://example.org/foo/bar" - you can, via HTTP negotiation, ask for a JSON-LD representation of that IRI and you may get back a document with more JSON-LD data in it. So, you can effectively crawl the web to find more data... this is one of the powerful concepts that Linked Data uses to make data more useful and less tightly coupled with the system that is publishing it.

Formatter Implementation/Handling

I don't really understand this requirement, but I'll take a shot at it anyway:

You can associate a slew of metadata with a URL in JSON-LD. This means you can give that URL a human-readable name like "Really funny cat video", or a creation date like "2012-05-12T21:48:22Z", or even things like saying what sort of editor should be used to modify the URL (like a Web-based video editor that is started via a Web Intent): "contentEditor": "http://www.aviary.com/web". This approach is incredibly flexible and allows you to describe as much as you want to about a particular piece of data...

Internationalization

JSON-LD supports full UTF-8 Internationalization, and even allows you to tag any string with a language value (which allows you to do things like specify text labels for a particular piece of editable content in multiple languages):

http://json-ld.org/spec/ED/json-ld-syntax/20120522/#string-international...
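(For example, a language-tagged string expressed as a PHP array; the property name is illustrative.)

<?php
// A JSON-LD value object pairs the string with its language tag.
$labels = array(
  'title' => array(
    array('@value' => 'Hello world', '@language' => 'en'),
    array('@value' => 'Hallo Welt', '@language' => 'de'),
  ),
);
?>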

Support for Collections

There are two types of supported collections in JSON-LD: sets and lists.

Sets are unordered collections (the concept of a mathematical set).
Lists are ordered collections (the concept of an array).

http://json-ld.org/spec/ED/json-ld-syntax/20120522/#sets-and-lists
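(A brief sketch as a PHP array: term values are unordered by default, while "@container": "@list" marks an ordered collection. The property names and IRIs are assumptions.)

<?php
$doc = array(
  '@context' => array(
    // Unordered by default (a set).
    'tags' => 'http://example.com/vocab/tags',
    // Explicitly ordered (a list).
    'chapters' => array('@id' => 'http://example.com/vocab/chapters', '@container' => '@list'),
  ),
  'tags'     => array('drupal', 'wscci', 'cmi'),
  'chapters' => array('Introduction', 'Requirements', 'Formats'),
);
?>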

UUID

UUIDs are supported in JSON-LD. There are three major ways of expressing UUIDs. The first is through a simple key-value:

"uuid": "550e8400-e29b-41d4-a716-446655440000"

The second is by representing the UUID as an IRI:

"@id": "uuid:550e8400-e29b-41d4-a716-446655440000"

The third is by translating a URL for the data to a version 3 UUID ( http://en.wikipedia.org/wiki/Universally_unique_identifier#Version_3_.28... ):

"uuid": "6ba7b810-9dad-11d1-80b4-00c04fd430c8"

But the best way to give your data an ID is to generate a unique URL for it that resides on the Drupal system:

"@id": "http://groups.drupal.org/comment/reply/229318#data-389247982"

Versioning/Conflict Resolution/Locking

You can use any JSON facility that does time-stamping, versioning, or conflict resolution to resolve this. If you want to go a bit deeper down the Linked Data rabbit hole, you can use what are called "Named Graphs". Named Graphs allow you to talk /about/ information, not just express it. So, a named graph says "This is the information that I know about the URL http://example.org/foo/blah.txt". You can time-stamp when you retrieved information from that URL; you can version it by normalizing the data and digitally signing it (JSON-LD supports both graph normalization and digital signatures); you can do conflict resolution in a variety of ways, since JSON-LD supports information diff-ing (which allows you to tell, deterministically, what new data was added and what data was removed from a particular URL). More about this feature here:

http://json-ld.org/spec/ED/json-ld-syntax/20120522/#named-graphs

and here:

http://json-ld.org/spec/latest/rdf-graph-normalization/

Bottom line: These features allow you to do things like delayed conflict resolution and JSON object/change-set merging.
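(A minimal sketch of a named graph as a PHP array: the outer @id names the source being described, and @graph holds the statements taken from it. The "retrieved" property is an assumption and would need its own @context mapping to be meaningful.)

<?php
$snapshot = array(
  '@id' => 'http://example.org/foo/blah.txt', // The source this graph is about.
  'retrieved' => '2012-05-30T12:00:00Z',      // Illustrative time-stamp about the graph.
  '@graph' => array(
    array(
      '@id' => 'http://example.com/node/42',
      'http://purl.org/dc/terms/title' => 'Hello world',
    ),
  ),
);
?>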

PHP Libraries

There is a reference implementation in PHP:

https://github.com/digitalbazaar/php-json-ld

It's the same one we use for our commercial implementation - it's always up-to-date (and will be kept up-to-date) because our business (Digital Bazaar) depends on it.

JavaScript Libraries

There is a reference implementation in JavaScript:

https://github.com/digitalbazaar/jsonld.js

It's the same one we use for our commercial implementation - it's always up-to-date (and will be kept up-to-date) because our business (Digital Bazaar) depends on it. You can see this library in use at:

http://json-ld.org/playground/

Current Drupal Projects and Groups

Unfortunately, I don't know enough about active Drupal Projects and Groups to answer this question. I know scor (Stephane Corlosquet) has a JSON-LD module for Drupal.

Community Experts

There are a number of people that are always on the #json-ld IRC channel on freenode.net. We're on there 24/7, so if you have a question, just drop it in the channel and you will most likely get a response within an hour (unless all of us are sleeping).

Other Drupal Resources

JSON-LD sprang out of the requirement for a light-weight Linked Data format for Web Developers. We were working on the Web Payments work and RDFa when we needed something lightweight to do a good Linked Data REST API:

http://payswarm.com/

Anticipated "Lift" for Core Implementation

I think that we've implemented most of the language/API stuff that you'd need. There are up-to-date libraries in PHP and JavaScript that have commercial support via Digital Bazaar. The last remaining bit that we need to settle once and for all is the Framing code for the JSON-LD API... but that's progressing nicely.

Also keep in mind that the W3C just picked up this spec for standardization... so, it's on track to become an official standard at some point in the next year:

http://www.w3.org/2011/rdf-wg/meeting/2012-05-30#line0235

Marketshare

Look at slide #15 for the systems that are currently integrating or have integrated JSON-LD:

http://www.slideshare.net/lanthaler/jsonld-for-restful-services

JSON/XML Flexibility

Since there is a path from JSON-LD to RDF... you can render in RDF/XML - which would be an awful thing to do. Since JSON-LD exposes a tree-based structure, you could also easily convert the key-value pairs into a custom XML-based Linked Data format. I can go into more depth on this if required, as it's a big post in and of itself.

Semantics

JSON-LD is all about semantics... after all, it's a Linked Data format in JSON. More about this here:

http://json-ld.org/spec/ED/json-ld-syntax/20120522/#introduction

Support for Semantic Querying

There are two primary ways that JSON-LD can be queried. The first is via the JSON-LD API. You can query by example using the Framing feature of JSON-LD:

http://json-ld.org/spec/ED/json-ld-api/20120524/#framing

To view a framing example in the JSON-LD Playground, go here: http://bit.ly/KLvuTO
Click on the "Framed" tab at the bottom. You will notice that the input data is a flat representation of libraries, chapters and books. The framed output organizes the data into a more hierarchical form using something called a "frame" that places all chapters into books and all books into libraries. This is a mechanism called "query by example". You give the JSON-LD API a JSON object that you'd like to see in the output and it queries the data to find "things" that look like that object.

You can also translate JSON-LD to RDF and put that into a triple/quad store and use SPARQL to query the triple/quad-store.
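(A tiny sketch of a frame, as a PHP array you would hand to a JSON-LD framing implementation; the types and the "contains" property mirror the playground's library example and are assumptions here.)

<?php
// "Query by example": ask for Libraries, with their Books nested inside,
// and each Book's Chapters nested inside that.
$frame = array(
  '@context' => array('@vocab' => 'http://example.org/vocab#'),
  '@type'    => 'Library',
  'contains' => array(
    '@type'    => 'Book',
    'contains' => array('@type' => 'Chapter'),
  ),
);
?>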

Semantic Libraries/Tools Using

Aside from the libraries themselves, you can see other projects that are integrating JSON-LD here:

http://www.slideshare.net/lanthaler/jsonld-for-restful-services

Expect to see many more when the spec is finalized in the coming months and released as an official W3C standard.

Hope this helps. :)