REST routing

Crell's picture

Last Thursday (5 May) I had an impromptu IRC chat with a number of people about the routing logic we are going to need in Phase 3. We haven't really nailed that down much, so this was an excellent conversation to have. I've tried to capture all of the relevant details below (not necessarily in the order we arrived at them), but those who were present can correct me if I'm wrong somewhere. Further discussion welcome.

Attending:

  • Larry Garfield (Crell)
  • Earl Miles (merlinofchaos)
  • Vladimir Zlatanov (dikini)
  • Roy Scholten (yoroy)
  • Justin Randall (beejeebus)

Previous discussion: http://groups.drupal.org/node/67588

We originally were talking about the UI needed to define routing rules and how one would use context for that. We quickly drifted to the routing itself, however. Vladimir noted an Erlang-based router called webmachine, which, while we cannot use it directly (it's Erlang) or do a direct port (PHP is simply too different), did serve as a good discussion point for comparison.

What we came to realize is that if we really want to support arbitrarily complex routing rules, that's, well, expensive. Especially since some of the information on which we may want to route will require interaction with Drupal, as it is derived context (e.g., node type). However, most routing logic we want to handle based solely on request context (the raw HTTP request). It's faster, and frankly the majority case (we think).

This split exists in Drupal now, sort of. Core routes (via hook_menu) based solely on path. Panels adds a concept of "variants" on top of that to allow essentially secondary routing, but because it has to dance around core it is ugly, hacky, hard to understand, and no one really likes it. However, it appears that conceptually we will still need something like that for routing based on arbitrary criteria, so we will need to do it right from the get-go.

Essentially, routing becomes an inherently 2-step process. The first, which for the time being I'll call primary routing, takes the incoming context and looks up possible mapping information based on selected, fixed criteria in the request. The primary routing uses "build time" logic; that is, it's like hook_menu now where we do a lot of work up front to figure out our mapping table and then request time is very fast. That will return one or more possible Response Controllers (plus configuration) to handle the incoming request.

If only one response controller is found, we're done. Pass the context object off to the response controller and let it do its thing. If more than one possible response controller is found, then we trigger the secondary routing step. The secondary routing step is request-time, that is, it's PHP code that executes in the request to decide which response controller to use. The secondary routing narrows the list down to a single response controller, and then we use that and we're done.
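The two-step process can be sketched roughly as follows. This is purely illustrative: the function name, the table shape, and the 'predicate' key are hypothetical, not a proposed API.

```php
<?php
// Hypothetical sketch of the two-step routing described above.
function route(array $request, array $routing_table) {
  // Primary routing: a fast, build-time lookup keyed on fixed request
  // criteria (domain, path, method, content type).
  $key = implode('|', [
    $request['domain'],
    $request['path'],
    $request['method'],
    $request['content_type'],
  ]);
  $candidates = isset($routing_table[$key]) ? $routing_table[$key] : [];

  if (count($candidates) === 1) {
    // Exactly one response controller matched: we're done.
    return $candidates[0]['controller'];
  }

  // Secondary routing: request-time PHP that narrows the candidates
  // using derived context (node type, language, ...).
  foreach ($candidates as $candidate) {
    if ($candidate['predicate']($request)) {
      return $candidate['controller'];
    }
  }
  return NULL; // No match: time to pick the right 4xx response.
}
```

In practice the primary lookup would be a pattern match against a routing table rather than an exact string key, but the control flow is the point here.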

Primary routing

First off, we all agreed that primary routing should be pluggable. Since we're already building a plugin system, it seems like a reasonable system to use. :-) In fact, arguably the routing is simply the Mapper for the Response Controller Plugin Type. (Check the definitions page for what each of those pieces is.) Making it pluggable has a number of advantages:

  1. It forces a clean separation of concerns using a system and pattern that we're going to be using throughout Drupal.
  2. While the default implementation will almost certainly be SQL-based, there's no reason why one shouldn't be able to reimplement it using MongoDB as a backend, or some other system.
  3. In fact, one very important alternate implementation would be "hard coded". If we view the installer and updater as not one-off hacky scripts but as simply alternative routers with their own configuration, then we can vastly simplify their code. Those systems no longer need to deal with an alternate "state", they just respond to an alternate, pluggable router. It also means that, in essence, Drupal CMS, the Drupal Installer, and the Drupal Updater become three separate applications built on top of Drupal Framework. Which is just all kinds of hot and sexy.
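To make point 3 concrete, here is one possible shape for a pluggable primary router. The interface and class names are purely illustrative assumptions, not a settled design:

```php
<?php
// Hypothetical plugin interface for primary routers.
interface PrimaryRouterInterface {
  // Returns zero or more candidate response controllers for a request.
  public function match(array $request_context);
}

// The default implementation would consult SQL (or MongoDB, etc.).
// The installer, by contrast, needs no database at all: it is just an
// alternate router with hard-coded configuration.
class InstallerRouter implements PrimaryRouterInterface {
  public function match(array $request_context) {
    return [['controller' => 'installer_page']];
  }
}
```

The installer and updater then stop being special "states" and become ordinary applications that happen to register a different router.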

We went around a couple of times on what the primary routing criteria should be. We didn't come to a firm conclusion, but the closest we got to a consensus was:

  • Domain
  • Path (or rather path pattern, as we do now)
  • HTTP Method (GET, POST, PUT, etc.)
  • Content-Type (viz., text/html, text/json, image/jpeg, etc.)

Note that Language cannot be a primary routing criterion since it is derived context; language could be influenced by all kinds of things, like user preferences.

All of those elements are available directly in the HTTP request itself. Together, they uniquely identify a REST resource (domain, path), what to do with it (method), and how (content type). Even just those four attributes give us vastly more flexibility than we have now in Drupal. The trivial, degenerate implementation would look like simply adding Domain, Method, and ContentType columns to the menu_router table. (We'll likely do much more than that, but you get the idea.)

Most interestingly, it allows for multiple operations on a single URL.

  • GET node/5, type: text/html: Return to me the HTML page of node 5.
  • POST node/5, type: text/html: Submit this data to node 5.
  • GET node/5, type: text/json: Return to me the JSON version of node 5.
  • PUT node/5, type: text/json: Here's a JSONified version of a node; save it to node 5.
  • GET node/5, type: application/pdf: Return to me node 5 rendered as a PDF.
  • DELETE node/5, type: *: Delete node 5 (assuming proper permissions)
  • GET some/view, type: xml/atom: Return to me that view, using the Atom display plugin.
  • GET some/view, type: xml/rss: Return to me that view, using the RSS display plugin.
  • POST node/5, type: drupal/form: Submit this Drupal form at node/5 (proposed???)

And so forth. The domain part means, amusingly, that we've just put part of Domain Access into core almost for free. Ken will be very chagrined to hear that. :-)

Another important factor is that with that richer information we can return far more useful and appropriate error messages. For instance, if someone sends a PUT node/5 request, and there is no PUT handler registered but there is a GET handler registered, then an HTTP 404 is not the correct response. The correct response is HTTP 405 Method Not Allowed. If the request is for text/json, then sending back a big HTML error page is simply flat-out wrong; an empty JSON response (or something sensible) should be sent back instead. There's considerable potential for performance improvements if we can send back a proper HTTP 302 (Found), 303 (See other), or 304 (Not modified) response very early, before the rest of Drupal initializes.
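The 404-vs-405 distinction above could be sketched like this. The `$registered` map (path to allowed methods) is a hypothetical stand-in for whatever the real routing table turns out to be:

```php
<?php
// Sketch of choosing the right error once routing knows about methods,
// not just paths.
function error_status($path, $method, array $registered) {
  if (!isset($registered[$path])) {
    return 404; // Nothing matches the Request-URI at all.
  }
  if (!in_array($method, $registered[$path], TRUE)) {
    return 405; // The resource exists, but not for this method.
  }
  return 200;
}
```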

This would be extremely useful for REST applications, but also for our own mundane Ajax usage. If you've ever used the Views or Panels UIs and had an Ajax response come back as an incomprehensible pile of garbage in an alert box, you should be very excited about the possibilities this opens. :-) The best part is that this is all existing parts of the HTTP standard; we're just not bothering to use it yet.

One suggestion was to also include ETags in the primary routing information, but I'm not sure if that's viable, since ETags can depend on a huge number of factors that aren't known at build time.

While I'm talking about nodes above in the examples, remember that at this level there are no nodes or entities; there is just data and requests to resources. Essentially, at this point we're architecting a Drupal-independent REST server.

Secondary routing

We have no idea yet what form this would take. :-) It needs to not be dog slow, and needs to be configurable from the UI, but beyond that I have no idea what this looks like. As Earl noted, however, "it's amazing what they assume will work", so we need to make sure that "anything" (not to be confused with "everything") can be used here. My thinking is that we want to make it reasonably fast for the common cases (entity bundle and language are the most common I can think of), but still allow people to shoot themselves in the foot with incredibly stupid and slow routing rules.

Other considerations

Another important question is access control. We didn't really cover that. Presumably access should be pluggable (duh), and bound to each router, not just each path. Someone could easily have access to GET node/5, but not PUT node/5. Right now that's governed by simple callback functions that are serialized into the menu_router table. We likely want something more robust than that, but it still needs to be thought through.

Also relevant is error handling. While breaking up the routing information as above means that we can return a separate 4xx error type for each method or content type or domain, we still need to determine how to do so in a performant fashion that isn't butt ugly.

We also need to determine how we support wildcards. In the degenerate case, we wouldn't want to have to record the domain of the site into every record. That makes the site less portable. So at the very least we will need to make that optional and/or allow an "any" key.

While our primary implementation will likely be SQL, as noted above we need to ensure that the semantics we define will still work on other systems (hard coded, MongoDB, Cassandra, whatever).

There are still some open questions, but I really like the direction this conversation went. It provides us with an extremely robust underpinning to do lots of things Right(tm) with respect to HTTP, which is the lifeblood of Drupal I/O.

Discuss. :-)

Comments

Second & Third level routing

mikeytown2's picture

Primary routing - Info discovered from the URL in short.
Looks good to me!

Secondary routing - Info discovered from things like $_COOKIE & $_SERVER.

  • theme - think mobile or not
  • role - anonymous, authenticated, misery
  • geo - GeoIP, Geolocation
  • https or http
  • i18n

Third-level routing - Very configurable.

  • Node type
  • View display
  • Taxonomy term
  • Panel used
  • Page number
  • View arguments/exposed filters
  • etc.

Why 3?

Crell's picture

Are you proposing a third level? I don't see why that would be needed. We have "the 90% case that we make really fast" and "other". Context access we're looking to make as transparent as possible, so I don't get why you would need a third level for cookies. That would be in the "other" level just like node type, since both would require runtime behavior.

Level 2 can influence Level 3

mikeytown2's picture

Level 2 would be an optional level, in short; by default in Drupal we wouldn't care about any of these things. But if we wanted an international (i18n) shopping (https) website that runs well on mobile and desktop, with rich feedback (role), behind a paywall, then having level 2 influence level 3 would be nice. You can do it in only two levels, but you would have to make sure that the page handler is fully aware of the other contexts at that level and acts accordingly. Optional level 2 doesn't care about the path like node/5, where level 3 does (it would care about a path prefix like /en or /fr). You are correct that optional level 2 would require PHP runtime behavior, but those settings could be set at the Apache level (mod_geoip, a mobile filter, language), which would then use some simple logic to set the context accordingly.

Still not following

Crell's picture

It sounds like you're suggesting that phase 2 be more of a "phase 2 and phase 2_alter"?

Given that language may depend on user preferences, which in turn depends on the session cookie and a database lookup, I'm still not clear on why you would want to explicitly split that phase. I admit I don't know what goes on inside phase 2, so it may make sense for it to have multiple passes internally, but I'm still not following how it is intrinsically a separate step from the rest of "things that we can't optimize for".

However, you do make a good point that we may want to include the protocol in the lookup as well: http vs. https. (And, gulp, that makes me wonder if we need to consider other protocols, or other ports. Although I'm probably now just being silly.)

Maybe...

mikeytown2's picture

It could be phase 2 alter... but when you bring in page caching, I think it changes the game. I've helped to set up a page cache variant for a mobile theme vs. the normal one, and another based off GeoIP; these were all for Drupal's core cache or memcache, with no Boost trickery in either case.

This is why I think we need phase 2 and phase 3: for the page cache. Role-based page caching in Drupal without hacking around core, anyone?

I think this muddies the

sdboyer's picture

I think this muddies the waters. There are N cases, and NxN arguments, for additional possible caching criteria and the order they ought to come in. I don't see the value in organizing additional explicit phases around them.

Not sure how useful .. but...

te-brian's picture

FuelPHP (the framework) has HTTP-verb-based routing: http://fuelphp.com/docs/general/routing.html. A peek at the source might give some ideas. A big difference from Drupal is that they use a routes config file.

Some source: https://github.com/fuel/core/blob/master/classes/router.php

Good. Are you still planning

pounard's picture

Good. Are you still planning to split this into a response controller and a response object? (This would make even more sense with the data described above.) As soon as you plan to differentiate "response contexts" (I mean HTML, JSON, GET, PUT, and others here), you definitely should. The response object will determine whether the system bootstraps the Layout/Block/Region systems. A JSON response has no need for a layout, for example, while the secondary routing could require the response object to be JSON.

Pierre.

Still TBD

Crell's picture

Whether or not we have a formal response object that gets populated or if the response controller is the response object is still up in the air, I think. You're correct that many response types would have no need for layouts, although I don't know if that's a system that is initialized or simply an implementation detail of the HTML response controller(s).

I'm not following your logic though that we have to separate those two if we're going to have separate response types. Why would that be necessary and/or a good idea? (I'm not against it; I'm just trying to get a sense of the pros/cons you're suggesting. Note the earlier threads that discussed this question in some detail as well, too.)

Object basically carry the

pounard's picture

The object basically carries the method to render the output. If you want to enable multiple response types (JSON and such), the final output render method should probably be owned by the response object itself. If I follow ZF's response object, it also carries the current response status (which can be modified at runtime), such as the "redirect" state, the erroneous state, the custom error handler, and so on.

Making it pluggable can also allow ultra-verbose implementations to be used on development boxes, or SimpleTest-specific implementations for testing, for example. In my previous example, I think the "response context" makes no sense; it's the response object. What you called the response context is IMHO more the input context or some such, but this seems quite arguable.

While the input object carries the input data, the response object, logically, as its opposite, should carry the response data (a render array, raw strings, a structured array; it can be pretty much anything), potentially inside various response segments (which could be regions for a response object of type "page"), and it should deliver itself (for a page, it calls the page build, but if this component exists, it should probably have the page build as a method itself).

Pierre.

Using HTTP standards is

Sylvain Lecoy's picture

Using HTTP standards is definitely a good thing. ETag is a nice idea, as it can help with cache invalidation, but also with concurrency. In a collaborative platform, people frequently work on the same item; when you submit a PUT (for full update) or PATCH (for partial update) request, you need to be sure that you won't blindly overwrite the work of someone else. This is done by including the ETag received with the item in the If-Match header when PUTting or PATCHing the work back to the server. If the state on the server is still the same, the update will succeed with a 200 OK status; otherwise it will fail with a 412 Precondition Failed status.
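The optimistic-concurrency check described here boils down to a single comparison. Function and variable names below are illustrative only:

```php
<?php
// Sketch of ETag + If-Match concurrency control: compare the client's
// If-Match header against the entity's current ETag before updating.
function check_put($if_match_header, $current_etag) {
  if ($if_match_header !== $current_etag) {
    return 412; // Precondition Failed: the entity changed underneath us.
  }
  // ... apply the PUT/PATCH update here ...
  return 200; // OK: the client was working from the latest state.
}
```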

Different

Crell's picture

I agree that ETags can be useful, but not necessarily for routing. What you describe is for state checking on an object, not for selecting which object to speak to. Since the ETag would change after every node_save(), or any time some other node is updated that changes a view that appears on that page, I don't know that it has any value for routing. Definitely for caching it could be useful, though, and I think we're already using it for that.

So we both agree that It does

Sylvain Lecoy's picture

So we both agree that it does not have any value for routing. You mentioned using ETags in the primary routing information, but since the ETag's purpose is more cache invalidation and concurrency control, I wasn't sure about using it for routing.

Chagrined?

agentrickard's picture

Hardly. Reduce my workload, please.

Good stuff here. I've been

sdboyer's picture

Good stuff here. I've been thinking about this a lot too, and this provides a nice baseline structure for it. Gonna shoot for terse today, though.

sql-backed default backend? yikes

First - SQL-backed as the default backend? I guess it depends on how much we want to put into the primary vs. secondary router, but in my thinking at least, we want a response controller selected as early as possible. Read: pre-db. Read: default is in conf. I think it's hard to argue a "Drupal Framework" if we don't aim for base functionality without hitting the db. Frankly, this whole discussion looks like a regression back into assuming the incoming request is HTTP; the only way this whole framing makes sense to me is if it's assumed as happening after we've established at our outermost layer that we're serving HTTP. Larry, while you note this at the end, I'll be twitchy until someone can point me to the pluggable layer that wraps all this.

Some notes from symfony2 on that - we might consider different entry points (their word is "front controller"), but basically, alternatives to index.php, to facilitate really brute force different entryways. Nothing wrong with that for things like drush, but we'd have a sticky time revisiting clean URLs.

the routing cascade

re: "What should our primary routing criteria be?" - I've gone around on this question, too. In rereading the HTTP response codes, though, I think an answer implicitly lurks:

404: The server has not found anything matching the Request-URI...
405: The method specified in the Request-Line is not allowed for the resource identified by the Request-URI...
415: The server is refusing to service the request because the entity of the request is in a format not supported by the requested resource for the requested method.

The protocol itself orders these for us: Path, then Method, then Content-Type. The protocol obviously doesn't care about how we implement it, but it stands to reason that, in the absence of a clear argument to the contrary, we should follow the sequence laid out in the protocol spec we're implementing. Really, rereading the response code types convinces me of it even further - method and content type are decorations on the base resource identified by the URI. Making them primary would be insane - which is NOT the same as saying that we couldn't use them early on to affect routing. Just that they're not the base on which we build.

life and times of menu_router

I don't know that menu_router staying canonical is a good idea - or even a possible one. But in this plan, it seems muddy exactly where and how it gets (potentially) cut out. Seems to me our choices are either the front controller route (which is fraught with other problems for us, I think), or a simple, primitive conf-backed routing system for primary routing, with richer routers (like the current menu system) moving into secondary routing.

Another interesting possibility for a primitive-path system is that it could be used to select menu_router-backed secondary routers (that is, routers that use the same base data tables & structure) that are "just as complex as needed, but no more" for a particular path pattern. E.g., the secondary router for node/5 is gonna be a doozy with all the additional indirection it needs to consider for routing, but the router for custom/path/22 is more straightforward.

Lemme also just put a shout out there for NOT planting ALL of our vastly differentiated data at identical base paths. Now that we've got entities, let's learn to hate nodes, eh? The inversion havoc they wreak on a URI-centric routing system (need to load the whole data object before you know what to do with it) cannot be overstated. This has always been true, but in the Node Craze of '06-'11, our lust for CCK gave us tunnel vision.

An interesting side benefit there is that it allows for the possibility of defining secondary routers which do the same thing as the menu system, but can do it without all the indirection - maybe without even hitting the db. A great case for code generation, there.

crap on a cracker, auth is scary

Auth is a spot this gets complicated. We can surely retain the basic menu_router system we have now, but that's not canonical anymore. So that logic needs to be encapsulated behind an interface, as Larry says. Might actually be easier for menu_router - instead of holding a callback in the db table, it holds a plugin name that it can ask about what to do for a certain path. What gets to be a bit killer, though, is what the menu system is good at - building an access-sensitive menu that only renders links which pass access control. It gets all the information it needs to render a tree of links with a very small number of queries, because it knows it can handle all of them. We have a different situation now - potentially lots of routers that need to be built in order to get to their auth handler to figure out whether to render a link.

Then again, maybe I'm creating this problem by trying to push menu_router down. Dunno. But there be dragons in that muddiness...let's bite them before they bite us.

so much for terse. meh.

Agreed

catch's picture

Currently page caching in core only works because we have a hard coded routing workflow in bootstrap.inc.

People reading this likely know this already, but to reiterate for this discussion:

  1. Request comes in.

  2. Check it's GET plus a couple of variables (which can be set in settings.php to avoid both database lookups and loading the full variable cache).

  3. If those variables are set, and there's no $_SESSION, go into a different code path which will do the absolute minimum to load the page from cache if at all possible.

It is very easy to cause regressions in page caching, by adding even the smallest additional steps in PHP execution to this workflow. However, we should be aiming to replace what is basically a big series of hacks in bootstrap.inc with the new response system if at all possible. If we get to that point and it doesn't measure up, we might want to keep full HTML page caching in more or less as it is now, but at least we'll have tried, and I bet other requests will be the better for it.

For me this means putting as much of the routing logic in plugin configuration as possible. That work probably needs to be done as part of the configuration initiative rather than baked into the context work, although it would be great to avoid too many interdependencies.

So for example, if we were able to get this far:

  1. Request comes in.

  2. Request gets routed to whichever controller handles GET and text/html (this would be plugin configuration, stored on disk; the configuration system itself should be pluggable so it can make use of hidef/APC/chdb or similar for actual retrieval). Even $conf and the variables cache could hold that basic map of routing.

  3. That controller can then contain the current page caching logic - check if there's a session and load straight from cache. If not, pass on to the actual menu router.

That would allow other response types to potentially finish without db initialization, without having to do things like the many patches I did in D7 that just swapped out database queries for cache_get() - this is not a long term plan.

It feels feasible to me to do these steps without initializing the database, currently #2 would be consulting $conf + the variables cache.
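The early-exit flow in the steps above could look something like this. The cache is faked with an array here; in real life it would be `cache_get()` or whatever the configuration-backed equivalent becomes, and the function name is purely illustrative:

```php
<?php
// Sketch of the pre-database page cache check described above.
function early_cached_response(array $request, array $page_cache) {
  // Only anonymous GET requests qualify for the full-page cache.
  if ($request['method'] !== 'GET' || !empty($request['session'])) {
    return NULL; // Fall through to the full routing/bootstrap path.
  }
  return isset($page_cache[$request['path']])
    ? $page_cache[$request['path']]
    : NULL;
}
```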

I'm not sure what you mean about menu links though - usually what's expensive is running the access callbacks, not finding out which callback to run. I guess we could just make it worse by adding time to find the callbacks too. Fully agreed that access/auth needs to be tackled up front though.

However, we should be aiming

Crell's picture

However, we should be aiming to replace what is basically a big series of hacks in bootstrap.inc with the new response system if at all possible.

Exactly. Although I'm figuring that the page cache system will remain, albeit probably revised to be a "response cache". If we can deliver a JSON or AHAH response with, essentially, print cache_get(); exit(), so much the better. Cleaning it up would be a good thing, but ideally the performance hit of not getting a cache hit will be less. Actually, perhaps the things that currently happen on hook_boot() (that require bootstrap modules to fire in the first place) could be pushed to hook_exit() instead. Even if we have to trigger a DB init at that point, for stats tracking and such, the page is already printed to the user, so while we're using CPU cycles the response time is not affected. Random side thought...

Currently we're relying on the Config Initiative to come up with something more robust than $conf + variables for configuration, so accessing non-trivial config data becomes an fopen() rather than a DB call. Details there are still being worked out.

I am not sure if the caching should be part of the router or a separate component. I was hoping to keep it separate, but I can see how putting it into the router could make sense. I worry about having to duplicate functionality that way, though.

Hm. Unless "page cache" simply becomes a special case of "block cache", since for HTML pages we're hoping to turn everything into infinitely nested blocks anyway?

Page cache could still get up

pounard's picture

Page cache could still kick in this early, as the actual page cache.

But using a response object would probably not solve the cache problem; it will solve many others, but not this one. To solve the cache problem, you need to switch to a real PAC/HMVC model where each controller is cacheable or not, and has its own caching policy (possibly depending on context).

The controller would fetch its own cached data and set it up in the response object.

Then the response object business has nothing to do with contexts; strong decoupling here, easy and nice for maintenance.

Then the controllers become responsible for cache handling, and that's all good! Because they run the business logic, they are highly coupled to context by design.

The response object would be responsible for the response's nature (JSON, HTML, other) but would also be the error controller: any error that happens would flip a boolean there, in the response object, which could then turn into a trace rendering in devel mode (instead of the basic error page), or into the maintenance error page display (for end users), etc.

Pierre.

Right I think it is feasible

catch's picture

Right, I think it is feasible (though by no means guaranteed) to have equivalent performance to current page caching while allowing any response controller/object to implement the same thing.

I'd forgotten this discussion was here, but wrote up a bit more at http://groups.drupal.org/node/150149#comment-499354 - rather than paste it here, I'll just link it.

Oh and - while I don't

sdboyer's picture

Oh, and while I don't realistically expect this will be feasible for a couple of years at least, IMO the case to look at for something more webmachine- and Erlang-like would be Mongrel2 + 0MQ. It would require a Drupal version of the Photon application server, of course, but it's still not impossible at all - conceptually quite similar to the evented Drupal work David Strauss demoed at CPH, except passing through a server actually designed for that sort of thing, rather than one you have to swim upstream against.

Such a (long-running) Drupal application server would be equally useful for handling HTTP requests as any other kind of request, too: over a Node.js socket, as a queue worker... but blah blah, I digress.

hook_menu_get_item_alter()

donquixote's picture

First of all, what I read sounds reasonable.

primary routing should be pluggable

+1.
It'd also be nice if those routing solutions were reusable, so that a module could implement some path-based mapping that has nothing to do with URLs and web requests.
(Yeah, this is rather an implementation detail and can be discussed later.)

If only one response controller is found, we're done.

Some time ago we added hook_menu_get_item_alter() to D7, that is called with menu_get_item().
To me this looks like a "secondary router" that fires no matter what the first step returns.
Will we keep this hook, or replace it with something else? Do we consider it to belong to "primary" or "secondary"?

I'm getting a little put off

pwolanin's picture

I'm getting a little put off by the push to make everything generic and pluggable. I think we are going to hit performance problems, and potentially face problems where lots of modules break if you swap out the standard backend for a different one.

At the top you state that everyone agreed that the primary routing should be pluggable. Why should the primary routing be pluggable? Maybe I'm missing some important use case, but I don't see any explained.

This trend also makes code damn hard to read and follow when there is always an extra level of indirection (or 2 or 3).

Three reasons

Crell's picture

In this case, there are three general reasons/use cases I see for pluggable routing:

1) Faster backends. I think it's a safe bet that core will ship with an SQL-based routing mechanism, because that's the only backend we can be guaranteed is available. However, menu.inc is already swappable in D7. We may find that it's faster to implement in MongoDB, Redis, or something that hasn't been invented yet. We want a site that needs even faster routing to be able to cleanly implement a faster routing backend. That also serves as a good place to allow contrib-based experimentation of faster routing algorithms.

2) Dedicated applications. Right now there are a lot of variables caught up in routing. That makes things like the installer or updater scripts... well, "ugly" would be putting it politely. If instead those could be an alternate router implementation with hard-coded information, that could help simplify those systems and, potentially, allow for other interesting experimentation.

3) General good architecture. The routing mechanism should be a black box behind an API to anything outside of it, if at all possible. That makes it easier to develop, easier to unit test, and easier to debug. At that point, pluggability is more a side effect than a goal but they overlap nicely.

I agree that excessive abstraction can make code difficult to follow. But we've already gone down that path in other parts of Drupal (FAPI, Render API, etc.) far, far more than anything we're proposing here, I think.
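To make reasons 1 and 2 concrete, here's a minimal PHP sketch (all names hypothetical; this is not menu.inc's actual API) of what a swappable routing backend could look like. Callers only ever see the interface, so a site could drop in a faster backend, and the installer could use a hard-coded one:

```php
<?php
// Hypothetical sketch of a pluggable router backend. None of these names
// are real Drupal APIs; they just illustrate the black-box idea.

interface RouterBackendInterface {
  /**
   * Map an incoming path to a handler name, or NULL if nothing matches.
   */
  public function match($path);
}

// The default backend: routes stored in SQL (faked here with an array).
class SqlRouterBackend implements RouterBackendInterface {
  protected $routes = array(
    'node/%' => 'node_page_view',
    'admin' => 'system_admin_page',
  );

  public function match($path) {
    // Replace numeric segments with the '%' placeholder, as hook_menu() does.
    $pattern = preg_replace('/\d+/', '%', $path);
    return isset($this->routes[$pattern]) ? $this->routes[$pattern] : NULL;
  }
}

// A dedicated-application backend, e.g. for the installer: hard-coded routes,
// no database required at all.
class InstallerRouterBackend implements RouterBackendInterface {
  public function match($path) {
    return 'install_main';
  }
}

// The rest of the system only ever talks to the interface, so backends
// (SQL, MongoDB, Redis, hard-coded) can be swapped freely.
function route(RouterBackendInterface $backend, $path) {
  return $backend->match($path);
}
```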

This is a little bit

joachim's picture

This is a little bit tangential to the topic, but I think it's relevant.

I remember from Crell's session on Butler at DrupalCon CPH that one of the scenarios that Butler makes possible is serving up just block content, which would allow portions of pages to be updated with AJAX.

Now current implementations of this sort of thing work on a custom dedicated path, say '/block/render/MODULE/DELTA'.

But to go back to our canonical example: a block showing information about the current issue node's project. In order to work, it needs to know both the issue node context and the parent project node context. But if the block is rendered by requesting '/block/render/MODULE/DELTA', those are missing.

So it seems to me we'd need to request the original path, say 'node/12345', and add a query string for the element we want to render: '/node/12345?render=block/MODULE/DELTA'.

Should we be thinking of this kind of usage in secondary routing?

Good question

Crell's picture

I had been thinking of doing it the other way around, actually. block/MODULE/DELTA?nid=123&uid=12&lang=en

That would show data about node 123, which is a project node, but happens to be an AHAH call from a page that is showing node 500, which is an issue node.

Of course, that may not work at all. Not sure. That probably warrants more discussion.
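As a sketch of what I mean (the function name and the rendered output are invented for illustration; this is not a real Drupal callback), rendering a block in isolation would derive everything from the query string, with no parent page load:

```php
<?php
// Hypothetical: render a block purely from raw request context, e.g. a
// request for block/MODULE/DELTA?nid=123&uid=12&lang=en. No page build,
// no derived context from a parent path.

function block_render_from_request($module, $delta, array $query) {
  // Build a context array straight from the query string.
  $context = array(
    'nid' => isset($query['nid']) ? (int) $query['nid'] : NULL,
    'uid' => isset($query['uid']) ? (int) $query['uid'] : NULL,
    'lang' => isset($query['lang']) ? $query['lang'] : 'en',
  );
  // Stand-in for the block's real render callback.
  return sprintf('[%s/%s for node %d in %s]',
    $module, $delta, $context['nid'], $context['lang']);
}
```

The point of the sketch is that the block never needs to know which page the AHAH call came from; everything it needs arrives in the request itself.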

But what if there's another

joachim's picture

But what if there's another block shown on node 123, and the two blocks share information in some funny way that we can't yet imagine?

I think also block/MODULE/DELTA?nid=123&uid=12&lang=en means that we have two base paths to the same thing -- or rather, to the page and to a fragment of the page. On the other hand, perhaps block/MODULE/DELTA?as_seen_on=node/123 would work.

I think what I'm trying to

joachim's picture

I think what I'm trying to say is that a path like block/MODULE/DELTA?nid=123&uid=12&lang=en is working the way the system thinks, because it's saying 'hey, render me a block with these context values'. That's great, because that's what the block rendering callback needs to work anyway.

But saying 'node/123?block=MODULE/DELTA' is working how people think (I think!), because it's saying 'I'm looking at node/123, but I only want a piece of the whole page that's visible at that path'. It does mean that turning that request into a rendered block takes a bit more work -- but it's work the context system will be doing anyway to show the whole page.

And working how people think is usually the better option. Assuming, of course, that the way I think is the way people think ;)

How I see it

mikeytown2's picture

For getting a specific block, block/MODULE/DELTA?nid=123&uid=12&lang=en is correct. If you want to see what blocks are going to be rendered on a page, then the node/123 path is where I would look (something like node/123?blocklist=json).

What if you want the block and you really don't care about the node context? Your example path doesn't work for that case.

Yes

xtfer's picture

EDIT: I basically agree with mikeytown2... looks like we were writing at the same time.

If I want to load block A, I'd like to be able to load it without having to know my context. While knowing the context may be useful, it's not always required.

block/module/A

If I want to add additional context to that request, then I can add that through the query string (as described in the posts above).

If we use the other method, context-path?object=object-path, it mixes up context and the requested object in the query string, for example node/123?block=module/A&lang=en, which is confusing. How does Drupal know which part of that query string is context and which is the actual request?

Immutable

Crell's picture

But what if there's another block shown on node 123, and the two blocks share information in some funny way that we can't yet imagine?

That's why the context object is supposed to be immutable. Blocks should not be able to "share information in some funny way". If they do that, then we cannot do block-rendering-in-isolation, which means no ESI, no partial page caching, etc.
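A minimal sketch of what "immutable" could mean in practice (illustrative only, not the actual Butler API): the context is writable while it is being assembled, then locked before any block sees it.

```php
<?php
// Hypothetical immutable context object. Values can be set while the
// context is being built; once locked, any write attempt throws.

class Context {
  protected $values = array();
  protected $locked = FALSE;

  public function set($key, $value) {
    if ($this->locked) {
      throw new LogicException('Context is immutable once locked.');
    }
    $this->values[$key] = $value;
  }

  public function lock() {
    $this->locked = TRUE;
  }

  public function get($key) {
    return isset($this->values[$key]) ? $this->values[$key] : NULL;
  }
}
```

Because every block receives the same locked context and can only read from it, two blocks rendered in isolation (via ESI, a partial-page cache, or an AHAH call) see identical inputs and so produce consistent output.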

Ok that was a bad reason

joachim's picture

Ok that was a bad reason :/

But the context system already knows how to turn 'node/123' into the right pieces of context the block needs. Why make the programmer have to repeat the same work to feed it the context pieces one by one?

There is actually a good reason for blocks to share info

febbraro's picture

One case we have again and again: the site owner has a curated list of nodes (in a rotator or a list) in one block, and in another block they want a listing of all other content (assume the page is a filtered list based on a taxonomy term) that does not already appear in the curated list.

To accomplish this currently (we use Boxes), we have our auto-listing box grab its context, find the other box, and get its configuration in order to exclude those nodes from the list.

So in general, it seems that as long as a block on a page could somehow traverse to other components on the page via the context, that would support blocks sharing info in, truthfully, not-so-funny ways. The real concept is that the individually renderable units do need to know the "context" in which they are rendering, so blocks are not always intended to be isolated.

No its a different way of

neclimdul's picture

No, it's a different way of thinking. Basing your blocks strictly on context means they aren't talking to some intermediary system. I.e., block 1 isn't asking "foo module, is block 2 rendering like Y, so I should render like X?" Instead, block 1 says "oh look, node 123, I'm going to render like X" and block 2 says "oh look, node 123, I'm going to render like Y", and you get the same result. In this way they are isolated, giving you consistent logic and testable output.

Maybe I'm missing the point

febbraro's picture

Maybe I'm missing the point? To keep it on this very real example: is the point that the context holds all the info that block 1 and block 2 might need, so that they can do their work in isolation (knowing only about the context) and not about each other? If so, where does that configuration come from? It seems like block 1 might need to delegate its configuration up to the context so that others can react to it, or that config does not come from block 1 at all.

So to be more concrete - lets

Owen Barton's picture

So to be more concrete, let's say that block 1 is a view of "nodequeue X", and block 2 is a "view of nodes tagged with 'bunny' that are not in nodequeue X" (so that you don't get the same node listed twice).

Let's say they want to totally rejig the nodequeue all at once, and have set it up as nodequeue Y. Currently, to do that you would go into block 1 and point the view at nodequeue Y (instead of X). To avoid having to do the same for block 2, you might add some glue so that it magically excludes whatever nodequeue block 1 is pointed at.

I think this use case should still be doable without blocks needing to check each other's configuration, as long as there is a way to specify a "manual" context (i.e. one you set via a GUI, not sourced from the request data) at the page/path level. This is analogous to context in Panels, which can be passed into panes. Then you rework block 1 and block 2 so that they both react to this named context and filter (or exclude) that nodequeue as appropriate.
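A rough sketch of that idea (all names invented, nothing here is a real API): the named context lives in page-level configuration, and both blocks read it rather than each other's settings, so retargeting from nodequeue X to nodequeue Y is a one-line config change.

```php
<?php
// Hypothetical page-level "manual" context, set once via a GUI or config.
// Switching the curated queue here retargets both blocks at the same time.

// Block 1: a view of the curated queue named in the context.
function curated_block(array $context) {
  return 'queue:' . $context['curated_queue'];
}

// Block 2: everything tagged "bunny" NOT in that same queue. It reads the
// shared context, never block 1's configuration.
function listing_block(array $context) {
  return 'bunny minus ' . $context['curated_queue'];
}
```

Example: with `$page_context = array('curated_queue' => 'nodequeue_y')`, both blocks react to nodequeue Y without ever inspecting each other.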

The problem is then

joachim's picture

The problem is then inefficiency, because both blocks have to go and find out which nodes are in nodequeue X.

The way I could see this working neatly is if block 1 adds to the context an array of the node ids in the queue. This could be a general, extensible concept: "nodes listed on this page". Other blocks could participate, such as, say, one that shows 'other nodes by this author'.

Block 2 would then get this list of nodes ids from context.

The only problem is -- contexts are locked. That is something I think needs rethinking: perhaps only certain handlers should be locked?

For short nodequeues or

Owen Barton's picture

For short nodequeues or similar data, having the result built in the context and passed into the blocks/views/etc. is a possibility, but I don't think efficiency is really a good argument for it (this would be terrible for long lists, given that Views would have to do "IN (2,4,6,8...n)" in the query) compared to just adding a join to each view's query.

Echoing Owen

neclimdul's picture

I'm going to echo what Owen said, basically. That doesn't seem like the best solution. More likely you'd want to set the nodequeue in the context during load based on the available context, then manage that in your view with, I guess, a custom handler or something (I'm not sure Views can take nodequeues as args). That, or design it with some sane sub-queue implementation that would trigger off something in the context.

This is basically doing the same thing you're suggesting, but we're front-loading the nodequeue decision so it's consistently available through the entire rendering process, even if the blocks get rendered in different orders, or a block goes away, or whatever.

Furthermore, "nodes listed on this page" doesn't really tell us how they were listed. If something shows up in an ad in another block and gets added to this list, suddenly it's excluded from block 2 in an unintended manner.

So basically what we've done here is go back and forth around a design decision, showing that while there is another way to do it, the way we've chosen addresses it in a more predictable, though sometimes less straightforward, way. Which is fine, because we can keep hacking forward with it the way it is. :)

Client-driven interfaces

batsonjay's picture

If you look around the web, the forefront of the web industry is moving to front-end frameworks like backbone.js (Javascript or otherwise): separating page structure from actual page content, having the server return content and letting the client build the experience, assuming the browser has storage, persisting data over sustained socket connections to back-ends, etc. To quote Jacob Singh, that's a jargon mouthful.

But the logic is to put everything into the browser, grab data as and when it's required, optimize the server to do that rather than lots of what it does today, and (preferably) do it asynchronously & non-blocking. Which, of course, Drupal doesn't do today.

However, Joachim's comment starts to head us in that direction. Whether this strategy is a good one probably deserves a thread of its own. I'll keep this comment short and start another one.

Is there a summary of the

pwolanin's picture

Is there a summary of the current state of decisions on this?

I am also curious on the

mfairchild365's picture

I am also curious about the current state of this.

This is a router that I have worked on. It serves me pretty well for my projects: http://github.com/unl/RegExpRouter
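For comparison, here's a minimal regex-dispatch loop in the same spirit (illustrative only; this is not RegExpRouter's actual API):

```php
<?php
// Minimal regex-based router sketch: try each pattern in order, dispatch
// to the first handler that matches, passing the captured groups along.

function regexp_route(array $routes, $path) {
  foreach ($routes as $pattern => $handler) {
    if (preg_match($pattern, $path, $matches)) {
      // Drop the full-match entry; hand only the capture groups over.
      return $handler(array_slice($matches, 1));
    }
  }
  return NULL;
}

// Example route table (paths and handlers invented for illustration).
$routes = array(
  '#^node/(\d+)$#' => function ($args) { return 'node ' . $args[0]; },
  '#^block/(\w+)/(\w+)$#' => function ($args) { return "block {$args[0]}:{$args[1]}"; },
);
```

The trade-off discussed in this thread applies here too: a table like this routes purely on request context and is fast, but anything needing derived context (e.g. node type) still has to happen in a second stage.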

Web Services and Context Core Initiative
