Resources, representations, URIs, and WSCCI routing

Posted by effulgentsia on March 28, 2012 at 12:46am

There've been some interesting discussions happening recently (as well as not so recently) about how the Web Services initiative (formerly known as WSCCI) should best implement the underlying plumbing for making Drupal 8 properly RESTful. In this post, I want to clarify my current understanding of where we landed in today's #drupal-wscci IRC meeting. Disclaimer: I'm not an expert on the intricacies of HTTP, REST, and related architectures, so what I explain here might be incorrect, in which case, please say so.

Let's start with some quotes:

Wikipedia:

REST-style architectures consist of clients and servers. Clients initiate requests to servers; servers process requests and return appropriate responses. Requests and responses are built around the transfer of representations of resources. A resource can be essentially any coherent and meaningful concept that may be addressed. A representation of a resource is typically a document that captures the current or intended state of a resource....An important concept in REST is the existence of resources (sources of specific information), each of which is referenced with a global identifier (e.g., a URI in HTTP). In order to manipulate these resources, components of the network (user agents and origin servers) communicate via a standardized interface (e.g., HTTP) and exchange representations of these resources (the actual documents conveying the information).

Roy Fielding's definition of a resource in the original dissertation that first defined REST:

The definition of resource in REST is based on a simple premise: identifiers should change as infrequently as possible. Because the Web uses embedded identifiers rather than link servers, authors need an identifier that closely matches the semantics they intend by a hypermedia reference, allowing the reference to remain static even though the result of accessing that reference may change over time. REST accomplishes this by defining a resource to be the semantics of what the author intends to identify, rather than the value corresponding to those semantics at the time the reference is created. It is then left to the author to ensure that the identifier chosen for a reference does indeed identify the intended semantics.

Tim Berners-Lee on Generic Resources:

A URI represents a resource. A "resource" is a conceptual entity (a little like a Platonic ideal). When represented electronically, a resource may be of the kind which corresponds to only one possible bit stream representation. An example is the text version of an Internet RFC. That never changes. It will always have the same checksum.

On the other hand, a resource may be generic in that as a concept it is well specified but not so specifically specified that it can only be represented by a single bit stream. In this case, other URIs may exist which identify a resource more specifically. These other URIs identify resources too, and there is a relationship of genericity between the generic and the relatively specific resource.

As an example, successively specific resources might be

1. The Bible
2. The Bible, King James Version
3. The Bible, KJV, in English
4. A particular ASCII rendering of the KJV Bible in English

...This model is more of an observation of a requirement than an implementation decision. Multilevel genericity clearly exists in all our current life with books and electronic documents. Adoption of this model simply follows from the rule that Web design should not arbitrarily seek to constrain life in general for its own purposes.

Putting this into context of Drupal:

There's information in a Drupal site that we wish to make available to a person or machine somewhere on the internet. Examples:

An article someone wrote (e.g., node/5).
A list of blog posts by a certain user (e.g., blog/3).
Modules that are available/enabled on the website (e.g., admin/modules).

For any of the above, a person or machine may want to be more specific about what is wanted. Examples:

Node 5 at a particular revision.
Node 5 at a particular revision, in English.
Node 5 in English, as an HTML page.
Node 5 as an HTML page suitable for a low-bandwidth mobile device (i.e., maybe without all the comments and most blocks).
Node 5 as an HTML page as would be seen by an anonymous user (even if I'm currently logged in).
User 3's blog posts, in English, as an RSS/Atom feed.
The list of enabled modules in JSON.

Some questions that come up:

Which levels of specificity should map to unique URIs vs. which levels should be determined by other information (e.g., HTTP request headers, user preferences as recorded in the Drupal site, etc.)?
How does this affect module developers implementing hook_menu() or other code?
How does this affect the routing code currently being worked on by the Web Services Initiative?
How does this affect the desired UX for page building tools, all the way down to the question: "what is a page"?

Here's what I think we decided in today's IRC meeting:

When an HTTP request hits Drupal, prior to routing, Drupal will canonicalize $request. The canonicalization will attempt to set the "path" to the most generic form possible and fill in other parts of $request with specifiers whose semantics are defined by HTTP 1.1. For example, because HTTP 1.1 defines standard headers for language and format negotiation (Accept-Language and Accept), whether a request comes in for node/5 with language and format headers set, or whether the request comes in as en/node/5.json, we will standardize $request to contain 'node/5' as the desired path, 'en' as the desired language, and 'json' as the desired format.
Because of the above, routing needs to take into account all of $request, not just 'path'.
When generating URLs (e.g., when code calls l()), client code (e.g., modules) will pass canonicalized information (similar to the $options parameter currently used, but with some changes TBD), and the generator (a Symfony-based replacement for the url() function) will have some way of determining whether to add language, format, and other information to the outbound URL or not. Modules will be able to add listeners to hook into this logic (similar to hook_url_outbound_alter(), but probably moved from a hook to an OOP-based Symfony event listener).
I don't yet know what this means for UX.
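
To make the canonicalization step concrete, here is a minimal sketch, written in Python rather than Drupal's PHP. Everything in it is an assumption for illustration: the CanonicalRequest class, the canonicalize() function, the prefix and suffix tables, and the rule that information in the URL wins over headers. None of it is an existing Drupal or Symfony API, and the header parsing is deliberately simplified (it ignores q-values).

```python
from dataclasses import dataclass

# Hypothetical site configuration, not taken from Drupal.
LANGUAGE_PREFIXES = {"en", "fr", "de"}
FORMAT_SUFFIXES = {
    "json": "application/json",
    "html": "text/html",
    "xml": "application/xml",
}
MIME_TO_FORMAT = {mime: fmt for fmt, mime in FORMAT_SUFFIXES.items()}


@dataclass(frozen=True)
class CanonicalRequest:
    path: str      # most generic path, e.g. "node/5"
    language: str  # e.g. "en"
    format: str    # e.g. "json"


def canonicalize(raw_path, headers, default_language="en", default_format="html"):
    """Reduce a raw path plus headers to (generic path, language, format)."""
    language = None
    fmt = None
    segments = [s for s in raw_path.strip("/").split("/") if s]

    # Assumption: a leading language segment in the URL wins over the header.
    if segments and segments[0] in LANGUAGE_PREFIXES:
        language = segments.pop(0)

    # Assumption: a trailing ".json"-style suffix wins over the Accept header.
    if segments and "." in segments[-1]:
        stem, _, suffix = segments[-1].rpartition(".")
        if stem and suffix in FORMAT_SUFFIXES:
            segments[-1] = stem
            fmt = suffix

    if language is None:
        # Simplified: take the first listed language tag, ignore q-values.
        first = headers.get("Accept-Language", "").split(",")[0].split(";")[0].strip()
        candidate = first.split("-")[0].lower()
        language = candidate if candidate in LANGUAGE_PREFIXES else default_language

    if fmt is None:
        # Simplified: take the first listed media type, ignore q-values.
        first = headers.get("Accept", "").split(",")[0].split(";")[0].strip()
        fmt = MIME_TO_FORMAT.get(first, default_format)

    return CanonicalRequest("/".join(segments), language, fmt)


# Both spellings of the request end up as the same canonical request.
from_url = canonicalize("en/node/5.json", {})
from_headers = canonicalize(
    "node/5", {"Accept-Language": "en", "Accept": "application/json"}
)
print(from_url == from_headers)
```

The point of the sketch is only that code after this step sees one shape of request, regardless of how the client chose to express language and format.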

The key summary here is that different sites will end up wanting different things with respect to what information is in the URLs vs. HTTP headers, and therefore, this decision should be separated from WSCCI routing code and module code that doesn't need to care. Therefore, we have a "canonicalize $request" phase, and a "generate outbound URL" phase, and everything in between, including routing, hook_menu(), and 99.9% of all other Drupal code works with the canonicalized information.
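
The "generate outbound URL" phase could look roughly like the following sketch, again in Python rather than PHP. The UrlGenerator class, its two site-level flags, and the listener signature are all hypothetical; they only illustrate the idea that calling code passes canonical information and a per-site policy decides what ends up in the URL.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical listener signature: (url, path, language, fmt) -> url.
Listener = Callable[[str, str, str, str], str]


@dataclass
class UrlGenerator:
    # Site-level policy (assumed names): put language/format in the URL,
    # or leave them to HTTP headers.
    language_in_url: bool = True
    format_in_url: bool = True
    listeners: List[Listener] = field(default_factory=list)

    def generate(self, path: str, language: str, fmt: str) -> str:
        url = path
        # Assumption: HTML is the default format and gets no suffix.
        if self.format_in_url and fmt != "html":
            url = f"{url}.{fmt}"
        if self.language_in_url:
            url = f"{language}/{url}"
        # Listeners play the role hook_url_outbound_alter() plays today.
        for listener in self.listeners:
            url = listener(url, path, language, fmt)
        return url


# The calling code is identical on both sites; only the policy differs.
url_heavy_site = UrlGenerator()
header_heavy_site = UrlGenerator(language_in_url=False, format_in_url=False)

print(url_heavy_site.generate("node/5", "en", "json"))
print(header_heavy_site.generate("node/5", "en", "json"))
```

A module that never looks at the final URL string does not need to know which of the two policies a site has chosen.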

Thoughts?

Comments

Routing for aliases

Posted by andremolnar on March 28, 2012 at 2:16am

This came up in IRC

andremolnar: I think /about-us (json) would return 404 [or other appropriate http response code] unless there was an alias to node/5 (json) explicitly stated somehow

EclipseGc: which is why I was saying that we don't deliver anything for content-type headers that aren't explicitly set via some UI

Crell: about-us => node/5, and then do everything else on node/5. So if node/5 type json exists, then about-us type json exists, too.

And in the 'what is a page' thread it's clear that we could have a number of different paths to node/5, all with different contextual information informing what is displayed (and therefore being different resources - or "successively specific resources"). I posted one example in http://groups.drupal.org/node/218914#comment-721864. If all aliases to node 5 always return the same json representation, then you couldn't build a service that more closely reflects the actual content at each path.

(If you want I can expand on the example from the other thread to describe a specific use case for different json representation for different paths, even though each at their core refer to the same node.)

At any rate, I'm bringing it up again to say my piece. I can live with a Drupal where foo/bar/baz -> node/5 and zab/rab/oof -> node/5 have the same JSON representation - but it's not right.

I can live with a Drupal

Posted by xtfer on March 28, 2012 at 2:44am

I can live with a Drupal where foo/bar/baz -> node/5 and zab/rab/oof -> node/5 have the same JSON representation - but it's not right.

That is no different from foo/bar/baz and node/5 having the same representation. They are actually different URI resources, they just happen to share the same representation.

This problem exists currently and the solution is to ensure you don't serve up identical representations on different URLs by blocking access to some, hence the necessity for http://drupal.org/project/globalredirect.

EDIT: If they really are different, a 303 should be returned with a Location set to the URI of the representation.

More info: http://www.w3.org/TR/cooluris/
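
A rough sketch of the 303 idea, in Python for brevity. The REDIRECTS table, the respond() function, and the example.com base URL are all made up for illustration; the only thing taken from HTTP itself is that a 303 See Other response carries a Location header.

```python
# Hypothetical table: URIs that identify something distinct, mapped to the
# URI that actually serves the representation.
REDIRECTS = {
    "foo/bar/baz": "node/5",
    "zab/rab/oof": "node/5",
}


def respond(path, base_url="http://example.com"):
    """Return (status, headers, body) for a requested path."""
    target = REDIRECTS.get(path)
    if target is not None:
        # 303 See Other: point the client at the representation's own URI.
        return 303, {"Location": f"{base_url}/{target}"}, ""
    # Anything else is served directly (body is a stand-in).
    return 200, {"Content-Type": "text/plain"}, f"representation of {path}"


print(respond("foo/bar/baz"))
print(respond("node/5")[0])
```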

Minor Revision

Posted by andremolnar on March 28, 2012 at 5:38am

After talking on IRC I will change my position slightly:
"but it's not right," should now read: "but it could be much better and it looks like there will be pieces in place that would allow what I am imagining, so I'm good."

Great writeup

Posted by Crell on March 28, 2012 at 2:52am

I think that's a very good summary, Alex, thanks!

The other key thing from the meeting is that while in a Blocks/Layout world you could configure the HTML page at node/5 to have more content on it that is not from node 5 than not, at a framework/routing/REST level it is still the HTML representation of node 5. That you are inlining lots of linked data is irrelevant, and in fact inlining linked data is perfectly reasonable in many situations within a REST model.

To Andre, foo/bar and baz/bah both being the same resource in JSON is no better or worse than foo/bar and baz/bah both being the same resource in HTML. In either case, we should resolve path aliases as soon as possible (as we already do now) and then everything else is the same. If you have multiple aliases to the same system path, then you already have an application-level bug, so the framework/routing level need not be responsible for solving it.
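
Here is a small sketch of "resolve the alias first, then do everything else on the system path", in Python rather than PHP. The ALIASES and ROUTES tables, the handler names, and the choice of 404 versus 406 are assumptions for illustration, not the WSCCI router.

```python
# Hypothetical alias and route tables.
ALIASES = {"about-us": "node/5"}
ROUTES = {
    ("node", "html"): "render_node_html",
    ("node", "json"): "render_node_json",
}
KNOWN_TYPES = {entity_type for entity_type, _ in ROUTES}


def resolve_alias(path):
    """Map an alias to its system path; system paths pass through."""
    return ALIASES.get(path, path)


def route(path, fmt):
    """Return (status, handler, system_path) for a canonical request."""
    system_path = resolve_alias(path)
    entity_type = system_path.partition("/")[0]
    if entity_type not in KNOWN_TYPES:
        return 404, None, system_path
    handler = ROUTES.get((entity_type, fmt))
    if handler is None:
        # The resource exists, but not in the requested format.
        return 406, None, system_path
    return 200, handler, system_path


# Because the alias is resolved before routing, about-us has a JSON
# representation exactly when node/5 has one.
print(route("about-us", "json") == route("node/5", "json"))
```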

In practice, I don't think aliases make much if any sense except for HTML, since that's the only case where they're human-readable. If I'm writing a JSON REST client, I want to use node/5, not foo/bar, because it's more debuggable. Path aliases are, really, just a bit of UI veneer for web browsers.

Use Case

Posted by andremolnar on March 28, 2012 at 3:59am

Okay, looks like I have to provide the use case that's neither an "application level bug" nor "veneer". (It's a 'feature, not a bug', and the path carries Information Architecture space / important contextual information.)

From the earlier example, consider the following:
http://example.com/node/5
http://example.com/Air/KellyWatchTheStars
http://example.com/Air/MoonSafari/KellyWatchTheStars
http://example.com/Air/GreatestHits/KellyWatchTheStars

node/5 is the track information (let's say nothing more than title and track length).
Air/KellyWatchTheStars is a display that has collateral info about the artist, but nothing about albums.
Air/MoonSafari/KellyWatchTheStars is a display with additional collateral information about the album.
Air/GreatestHits/KellyWatchTheStars is the same as the other display except for the fact that it's a different album altogether.

In each case it's the same resource, node/5, at its core.
If I wanted the JSON representation of each of those, it's not unreasonable to think that I could develop a service that a) is aware of the IA space context and b) returns differently composed JSON objects based on that context. It's a reasonable example of "successively specific resources".

If, however, as an application designer, I only want JSON representations for the first 3 out of the 4, and specifically don't want the 4th, the 4th should return an appropriate HTTP response (e.g. nothing of that type to see here).
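
Sketched in Python (not Drupal code), the use case might look like this. The PAGES table, the json_response() function, the placeholder track length, and the choice of 406 Not Acceptable for the fourth path are all assumptions made for illustration.

```python
import json

# Placeholder track data; the length value is made up.
TRACK = {"id": 5, "title": "KellyWatchTheStars", "length": "0:00"}

# Hypothetical per-path configuration: the context each path adds, and
# whether the application designer enabled a JSON representation for it.
PAGES = {
    "node/5": {"context": {}, "json": True},
    "Air/KellyWatchTheStars": {"context": {"artist": "Air"}, "json": True},
    "Air/MoonSafari/KellyWatchTheStars": {
        "context": {"artist": "Air", "album": "MoonSafari"},
        "json": True,
    },
    "Air/GreatestHits/KellyWatchTheStars": {
        "context": {"artist": "Air", "album": "GreatestHits"},
        "json": False,
    },
}


def json_response(path):
    """Return (status, body) for a JSON request to the given path."""
    page = PAGES.get(path)
    if page is None:
        return 404, None
    if not page["json"]:
        # One plausible "nothing of that type to see here" status.
        return 406, None
    body = {"track": TRACK, **page["context"]}
    return 200, json.dumps(body, sort_keys=True)


for path in PAGES:
    print(path, json_response(path)[0])
```

All four paths are built around the same track, but the composed JSON object differs with the context each path carries.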

Now, I would 100% agree, if each of these 4 paths were literally returning the exact same HTML representation - then yeah, at a minimum a poor IA choice. In that case they might as well all return the exact same json since they are literally exactly the same thing (though not at all RESTful).

These should not all be aliases of node/5

Posted by effulgentsia on March 31, 2012 at 1:17am

Thanks for these use cases. These have a corollary in the non-web world too. If I want to listen to "KellyWatchTheStars", I can go to a music store and make that request to the music store clerk, who can respond to my request by selling me a USB stick with an MP3 file on it, or a CD single, or the MoonSafari album, or the GreatestHits album: all of these contain valid representations of the resource I asked for. On the other hand, if I want a CD containing that song, and I want MoonSafari cover art on that CD, then that's a different resource, and I need to add this precision as part of my request.

So, "KellyWatchTheStars with MoonSafari cover art" is a different resource than "KellyWatchTheStars". Therefore, it should not be represented in Drupal as an alias. I'm guessing that in D6 and D7 sites, people sometimes create multiple aliases to the same "node/NID" path, and then have code that changes what's shown on the page based on the URL. But I think for D8, we should consider this to be wrong. If you want to make 4 different pages all decorating the same "primary" content in different ways, then make 4 different pages that do so, and give each of these a different path. As I understand it, our goal for D8's page manager is to make this easy.

I think this matches what User Advocate says below. What do you think? Are there holes with this reasoning?

My Understanding

Posted by andremolnar on March 31, 2012 at 2:33am

I have a few thoughts here. Forgive me if this goes on a bit. Stick with me. tl;dr: In the end I agree that there are 4 different pages, but it's not necessarily a problem that they're aliases for the same resource.

Part of it comes down to 'what is a page?' Historically in Drupal, the act of creating a node resulted in creating a page.
However, this has proven limiting since people either a) didn't want to create a page or b) if they did want to create a page, wanted control over what else was on that page besides the content they just created.
Many great minds found workaround after workaround, and after years of pondering and variations, it seems that it's come down to WSCCI and SCOTCH to figure it all out once and for all.

My understanding of how layouts will work, is that a 'page' can be built around some resource. In most cases a node will act as the base, but it could be a user, view, comment, image, file or anything else. The base resource will be the seed and a combination of contextual information and user selected rules will determine what else appears around the resource.

User Advocate's and yoroy's work is about figuring out a user interface that is functional, works with the architecture that is being built to allow layouts and blocks everywhere, and by design provides a positive user experience.

Next we consider the definition/usage pairs User Advocate talks about and how that fits in here. node/5 really only means something to Drupal. Requesting node/5 directly should still of course return a page (when making a request for html) or some other representation of the resource provided the request explicitly asks for a different representation. However, if the content defined by node/5 is merely the seed used in the creation of a page the 'usage' is entirely different.

With all that said, and to finally answer your question: On a technical level, I don't think there is anything wrong with a number of 'pages' having different path aliases to the same base or seed resource (e.g. node/5). The reason is that the layout system needs to know, at a minimum, what it's going to build a layout around. The routing system also needs to know which controller is going to handle the request. The node represented by node/5 provides the means to do routing, and the content defined by node/5 provides the key bit of contextual information allowing a page to be composed around it. The path alias provides yet another layer of context. So does user information, and so on and so on. See Context User Stories

In the end a 'page' is what we are calling all the things we see after we request a resource and key contextual factors have been taken into consideration.

Finally, to bring it back to WSCCI and the services side of things - the same concepts can be applied. A request for the JSON representation of node/5 at foo/bar/baz or zab/rab/oof still needs to be routed, and the context that node/5 provides, along with the path, can determine how the JSON response is composed.

What it means, though, is that a UI needs to be created to allow the same sort of user control to define what (if any) JSON payload will be returned given different contextual information besides the base resource being requested. From what I understand from talking to EclipseGc, such a UI could theoretically be created using the same plugin system that will drive the UI for the HTML representation.

Understanding the implications of these use cases

Posted by user advocate on March 28, 2012 at 7:36pm

Andre, these are really important use cases from the UX point of view and I think you’ve hit the nail on the head in terms of the relationship between urls and nodes (as content sources).

I think the best way to understand the implications of this is to regard a node as ‘content definition’ and the contextually aware handler for a given url as ‘content usage’. Even though they are two sides of the same coin, there is an enormous difference between definition and usage of content (or any other kind of resource).

The principle of definition/usage pairs can be extremely powerful when designing UI systems. Here are some random Drupal examples: definition of taxonomy terms versus usage of taxonomy links in a navigation system; definition of a View versus usage of that View in a Panel Pane. These definition/usage pairs can chain - as in this case where we have a site builder definition of a content type versus cms usage of that type for creating specific content which in turn is cms definition of specific content versus presentation usage of that content within some user context.

Andre’s use cases point to scenarios where the user contexts vary significantly and so the usage of the predefined ‘node/5’ varies also. That does indeed imply that the rendered HTML could also differ significantly.

BTW, the IA Space concept is ultimately a way of pre-defining ‘user visitable’ locations in a site such that these location definitions can be used in different ways to show specific assemblages of content or controls for different user contexts. This has enormous implications for improving the UX for final presentation as well as administrative workflows. All this (and more) is within reach through the architecture being discussed here.

If we get this right it will be great news for UX because clear usage context is at the core of solid UI design.

Michael Keara
User Interface Systems Architect,
The User Advocate Group