There've been some interesting discussions happening recently (as well as not so recently) about how the Web Services initiative (formerly known as WSCCI) should best implement the underlying plumbing for making Drupal 8 properly RESTful. In this post, I want to clarify my current understanding of where we landed in today's #drupal-wscci IRC meeting. Disclaimer: I'm not an expert on the intricacies of HTTP, REST, and related architectures, so what I explain here might be incorrect, in which case, please say so.
Let's start with some quotes:
REST-style architectures consist of clients and servers. Clients initiate requests to servers; servers process requests and return appropriate responses. Requests and responses are built around the transfer of representations of resources. A resource can be essentially any coherent and meaningful concept that may be addressed. A representation of a resource is typically a document that captures the current or intended state of a resource....An important concept in REST is the existence of resources (sources of specific information), each of which is referenced with a global identifier (e.g., a URI in HTTP). In order to manipulate these resources, components of the network (user agents and origin servers) communicate via a standardized interface (e.g., HTTP) and exchange representations of these resources (the actual documents conveying the information).
The definition of resource in REST is based on a simple premise: identifiers should change as infrequently as possible. Because the Web uses embedded identifiers rather than link servers, authors need an identifier that closely matches the semantics they intend by a hypermedia reference, allowing the reference to remain static even though the result of accessing that reference may change over time. REST accomplishes this by defining a resource to be the semantics of what the author intends to identify, rather than the value corresponding to those semantics at the time the reference is created. It is then left to the author to ensure that the identifier chosen for a reference does indeed identify the intended semantics.
A URI represents a resource. A "resource" is a conceptual entity (a little like a Platonic ideal). When represented electronically, a resource may be of the kind which corresponds to only one possible bit stream representation. An example is the text version of an Internet RFC. That never changes. It will always have the same checksum.
On the other hand, a resource may be generic in that as a concept it is well specified but not so specifically specified that it can only be represented by a single bit stream. In this case, other URIs may exist which identify a resource more specifically. These other URIs identify resources too, and there is a relationship of genericity between the generic and the relatively specific resource.
As an example, successively specific resources might be
1. The Bible
2. The Bible, King James Version
3. The Bible, KJV, in English
4. A particular ASCII rendering of the KJV Bible in English
...This model is more of an observation of a requirement than an implementation decision. Multilevel genericity clearly exists in all our current life with books and electronic documents. Adoption of this model simply follows from the rule that Web design should not arbitrarily seek to constrain life in general for its own purposes.
Putting this into the context of Drupal:
There's information in a Drupal site that we wish to make available to a person or machine somewhere on the internet. Examples:
- An article someone wrote (e.g., node/5).
- A list of blog posts by a certain user (e.g., blog/3).
- Modules that are available/enabled on the website (e.g., admin/modules).
For any of the above, a person or machine may want to be more specific about what is wanted. Examples:
- Node 5 at a particular revision.
- Node 5 at a particular revision, in English.
- Node 5 in English, as an HTML page.
- Node 5 as an HTML page suitable for a low-bandwidth mobile device (i.e., maybe without all the comments and most blocks).
- Node 5 as an HTML page as would be seen by an anonymous user (even if I'm currently logged in).
- User 3's blog posts, in English, as an RSS/Atom feed.
- The list of enabled modules in JSON.
Some questions that come up:
- Which levels of specificity should map to unique URIs vs. which levels should be determined by other information (e.g., HTTP request headers, user preferences as recorded in the Drupal site, etc.)?
- How does this affect module developers implementing hook_menu() or other code?
- How does this affect the routing code currently being worked on by the Web Services Initiative?
- How does this affect the desired UX for page building tools, all the way down to the question: "what is a page"?
Here's what I think we decided in today's IRC meeting:
- When an HTTP request hits Drupal, prior to routing, Drupal will canonicalize $request. The canonicalization will attempt to set the "path" to the most generic form possible and fill in other parts of $request with specifiers whose semantics are defined by HTTP 1.1. For example, because HTTP 1.1 defines standard headers for language and format negotiation, whether a request comes in for node/5 with language and format headers set, or whether the request comes in as en/node/5.json, we will standardize $request to contain 'node/5' as the desired path, 'en' as the desired language, and 'json' as the desired format.
- Because of the above, routing needs to take into account all of $request, not just 'path'.
- When generating URLs (e.g., when code calls l()), client code (e.g., modules) will pass canonicalized information (similar to the $options parameter currently used, but with some changes TBD), and the generator (a Symfony-based replacement for the url() function) will have some way of determining whether to add language, format, and other information to the outbound URL. Modules will be able to add listeners to hook into this logic (similar to hook_url_outbound_alter(), but probably moved from a hook to an OOP-based Symfony event listener).
- I don't yet know what this means for UX.
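To make the canonicalization step from the first bullet above concrete, here is a minimal sketch. It is written in Python purely for illustration and is not Drupal or Symfony code. The names CanonicalRequest and canonicalize, the lookup tables, the naive header parsing (first listed value only, q-values ignored), and the rule that specifiers in the URL win over headers are all assumptions of mine, not something decided in the meeting.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical lookup tables; a real site would make these configurable.
KNOWN_LANGUAGES = {"en", "fr", "de"}
FORMATS_BY_EXTENSION = {"json": "json", "html": "html", "xml": "xml"}
FORMATS_BY_MIME = {
    "application/json": "json",
    "text/html": "html",
    "application/xml": "xml",
}


@dataclass(frozen=True)
class CanonicalRequest:
    path: str                # most generic path, e.g. "node/5"
    language: Optional[str]  # e.g. "en", or None if unspecified
    fmt: Optional[str]       # e.g. "json", or None if unspecified


def _first_token(header_value: Optional[str]) -> Optional[str]:
    """Return the first listed value of a header, without parameters."""
    if not header_value:
        return None
    return header_value.split(",")[0].split(";")[0].strip() or None


def _language_from_header(header_value: Optional[str]) -> Optional[str]:
    """Map e.g. "en-US,en;q=0.8" to "en" if that language is known."""
    token = _first_token(header_value)
    if token is None:
        return None
    primary = token.split("-")[0].lower()
    return primary if primary in KNOWN_LANGUAGES else None


def canonicalize(raw_path: str, headers: Dict[str, str]) -> CanonicalRequest:
    """Reduce a raw path plus headers to (generic path, language, format)."""
    # Start from what the headers say.
    language = _language_from_header(headers.get("Accept-Language"))
    fmt = FORMATS_BY_MIME.get(_first_token(headers.get("Accept")) or "")

    # Assumption: specifiers embedded in the URL override the headers.
    segments = raw_path.strip("/").split("/")
    if len(segments) > 1 and segments[0] in KNOWN_LANGUAGES:
        language = segments.pop(0)
    stem, dot, ext = segments[-1].rpartition(".")
    if dot and ext in FORMATS_BY_EXTENSION:
        fmt = FORMATS_BY_EXTENSION[ext]
        segments[-1] = stem

    return CanonicalRequest("/".join(segments), language, fmt)
```

With this sketch, a request for en/node/5.json with no headers and a request for node/5 with "Accept-Language: en" and "Accept: application/json" produce the same canonical value, which is the property that routing would rely on.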
The key point is that different sites will want different things with respect to what information goes in URLs vs. HTTP headers, so this decision should be kept separate from the WSCCI routing code and from module code that doesn't need to care. That gives us a "canonicalize $request" phase and a "generate outbound URL" phase, with everything in between, including routing, hook_menu(), and 99.9% of all other Drupal code, working with the canonicalized information.
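The "generate outbound URL" phase could look roughly like the following sketch. Again, this is illustrative Python rather than Drupal or Symfony code, and every name in it (UrlPolicy, generate_url, the listener signature) is hypothetical. The point it tries to show is that calling code always passes the canonical path, language, and format, while a per-site policy plus optional listeners decide what actually ends up in the URL.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A listener receives the mutable parts of the URL and may alter them,
# loosely analogous to what hook_url_outbound_alter() does today.
Listener = Callable[[Dict[str, Optional[str]]], None]


@dataclass
class UrlPolicy:
    """Hypothetical per-site settings for outbound URLs."""
    language_in_path: bool = True
    format_in_path: bool = True
    listeners: List[Listener] = field(default_factory=list)


def generate_url(
    path: str,
    language: Optional[str] = None,
    fmt: Optional[str] = None,
    policy: Optional[UrlPolicy] = None,
) -> str:
    """Build an outbound URL from canonical parts according to site policy."""
    policy = policy or UrlPolicy()
    parts: Dict[str, Optional[str]] = {
        "path": path,
        "language": language,
        "format": fmt,
    }

    # Let listeners adjust the canonical parts before the URL is assembled.
    for listener in policy.listeners:
        listener(parts)

    url = parts["path"] or ""
    if policy.format_in_path and parts["format"]:
        url = f"{url}.{parts['format']}"
    if policy.language_in_path and parts["language"]:
        url = f"{parts['language']}/{url}"
    return url
```

Under the default policy, generate_url("node/5", "en", "json") yields en/node/5.json. A site that prefers to leave language and format to HTTP headers would use UrlPolicy(language_in_path=False, format_in_path=False) and get node/5 from the same call, without the calling module changing at all.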