WSCCI Routing in Denver

Crell's picture

Update: Implementation has started.

Wow, that was an awesome DrupalCon! Kudos to the Denver team for a job well done.

As far as WSCCI is concerned, the most notable events actually happened just before and just after the conference. On Monday, several people met at the offices of Examiner.com, including Symfony lead Fabien Potencier. The topic on the table was the new routing system, which previously had not been sorted out. It was actually a very productive sprint, and we came to a conclusion far more easily than I expected. :-) The conversation also continued at the Friday Sprint, mostly with Lukas Smith, another leading Symfony developer and one of the people working on Symfony CMF.

Rather than go through all of the details of the discussion, I am going to go straight to the planned approach and its background. I think that will be easier to follow than a blow-by-blow. If you want them, the stream-of-consciousness notes (transcribed primarily by the amazing webchick) are also available, but be warned that they lack context in many, many places.

Background

First, some terminology. A "route" is a single entry describing how to handle a given request. Primarily, it defines the rules that allow the system to map the request to a controller. A "controller" is any PHP callable (function, object, name of a function, closure, etc.) that returns a Response object. "Routing" is the process of taking an incoming request and finding the route that is most appropriate.

Symfony2's Routing component consists of 3 main pieces: a Matcher, a Dumper, and a Generator. The Matcher is responsible for taking a path or request object and extracting route information from it; route information is, in its simplest sense, metadata about the request that is useful for figuring out what controller will be responsible for handling the request. Symfony provides a default Matcher, UrlMatcher, that works by matching a path against a RouteCollection, essentially a glorified array of Route objects. While nice and simple, it requires all routes to be created on the fly and added to the collection, then matched against. That clearly will not scale to the hundreds of routes that even a simple Drupal site contains.
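
To make the Matcher idea concrete, here is a minimal sketch of matching a path against an in-memory list of routes. It is not Symfony's UrlMatcher: the function names and the array layout are made up for illustration, and only the `_route` and `_controller` keys follow Symfony's naming for route information.

```php
<?php
// Hypothetical sketch: a linear matcher over an in-memory route list.
// Function names and array layout are assumptions, not Symfony's API.

/**
 * Converts a pattern such as 'node/{node}' into an anchored regex.
 */
function sketch_pattern_to_regex($pattern) {
  $parts = array();
  foreach (explode('/', $pattern) as $part) {
    if (preg_match('/^\{([a-z_]\w*)\}$/i', $part, $m)) {
      // A placeholder matches exactly one path segment.
      $parts[] = '(?P<' . $m[1] . '>[^/]+)';
    }
    else {
      $parts[] = preg_quote($part, '#');
    }
  }
  return '#^' . implode('/', $parts) . '$#';
}

/**
 * Returns route information for the first route matching $path, or NULL.
 */
function sketch_match($path, array $routes) {
  foreach ($routes as $name => $route) {
    if (preg_match(sketch_pattern_to_regex($route['pattern']), $path, $m)) {
      $info = array('_route' => $name, '_controller' => $route['controller']);
      foreach ($m as $key => $value) {
        // Keep only the named placeholders, not the numeric groups.
        if (is_string($key)) {
          $info[$key] = $value;
        }
      }
      return $info;
    }
  }
  return NULL;
}

$routes = array(
  'node_page' => array('pattern' => 'node/{node}', 'controller' => 'node_page_view'),
  'user_page' => array('pattern' => 'user/{user}', 'controller' => 'user_view_page'),
);
$info = sketch_match('node/5', $routes);
```

Every request walks the whole list and compiles a regex per route, which is exactly the cost that stops this approach from scaling to a real Drupal site.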

Instead, Symfony includes a Dumper. A Dumper is a tool that takes a collection of routes and "dumps" it into some serialized form. Out of the box, Symfony includes two dumpers: One that writes routes to Apache mod_rewrite rules (super fast, but not all that scalable or easy to modify) and one that writes to a generated PHP class (fairly fast, but runs into memory limits). The PHP class is the preferred mechanism in many Symfony projects, but is a problem for Drupal since writing executable PHP from Apache runs into security concerns. (Telling people they have to run a Drush script every time they edit a View is a non-starter.)

Finally, there is the Generator. Routes in Symfony are keyed not by path but by a machine name. That is important since a single path could map to multiple routes if one takes into account the HTTP method (GET, POST, PUT, etc.), acceptable mime type (text/html, application/json, etc.), restrictions on the data type of a parameter (int, string), and so on. (And we absolutely want to do that.) The Generator lets client code ask for a link to a particular route by machine name, not path, and get back the appropriate URL at runtime. Essentially it serves the same purpose as Drupal's url() function, but it's far more flexible.

The importance of that machine name split in the Generator should not be understated. Currently, Drupal has paths hard coded throughout the entire code base, in every single call to l() or url(). That makes building custom administration interfaces and workflows extremely difficult. By linking instead to a route by machine name, we can make all paths within Drupal configurable without breaking anything. That's huge for custom admin interfaces, especially in distributions.
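
As a rough illustration of why the machine name matters, the sketch below generates a URL from a route name and a set of parameters. The function name and the pattern arrays are hypothetical; this is not Symfony's UrlGenerator, just the same idea in a few lines.

```php
<?php
// Hypothetical sketch of generating a URL from a route's machine name.

function sketch_generate($name, array $parameters, array $patterns) {
  if (!isset($patterns[$name])) {
    throw new InvalidArgumentException("Unknown route: $name");
  }
  $missing = array();
  // Replace each {placeholder} with the URL-encoded parameter of that name.
  $path = preg_replace_callback('/\{(\w+)\}/', function ($m) use ($parameters, &$missing) {
    if (!array_key_exists($m[1], $parameters)) {
      $missing[] = $m[1];
      return $m[0];
    }
    return rawurlencode((string) $parameters[$m[1]]);
  }, $patterns[$name]);
  if ($missing) {
    throw new InvalidArgumentException('Missing parameters: ' . implode(', ', $missing));
  }
  return '/' . $path;
}

// The calling code is identical; only the configured pattern differs.
$default = array('node_page' => 'node/{node}');
$custom = array('node_page' => 'content/{node}/view');

$a = sketch_generate('node_page', array('node' => 5), $default);
$b = sketch_generate('node_page', array('node' => 5), $custom);
```

Because callers only ever name `node_page`, a distribution could move the path without touching any code that links to it.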

Drupal plans

Drupal's current router is the menu system. Routing is tightly coupled with link, navigation, access control, and various other things that have no business being tightly coupled. It also keys by path, which as noted above is not flexible enough.

Also as stated previously, the default implementations provided by Symfony won't scale to the level that Drupal requires. That is not a problem, however. What is important is the design of the above systems, and in particular the interfaces defined for them. Those interfaces allow for the creation of custom implementations that transparently work with existing code.

We will therefore develop our own Drupal-specific implementations of the UrlMatcherInterface, MatcherDumperInterface, UrlGeneratorInterface, and GeneratorDumperInterface. (Actually I think we may not even need the GeneratorDumperInterface in practice.) The MatcherDumper will generate an index of routes in SQL. The result will likely look an awful lot like the menu_router table in Drupal 7, with a few tweaks (below). The UrlMatcher will match a request against that data, using an algorithm very similar to the existing best-fit logic in menu.inc. The Generator will most likely utilize the same router table plus the path alias table to generate links. (This is, incidentally, very similar to how Symfony CMF is handling its routing.)
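
The best-fit idea can be sketched without a database. The following is a rough illustration, not the menu.inc algorithm itself: the function names, the `%` wildcard outline, and the array standing in for the router table are all assumptions.

```php
<?php
// Hypothetical sketch of best-fit matching against stored path outlines.

/**
 * Lists candidate outlines for a path, most specific first.
 *
 * Each bit of $mask decides whether a segment stays literal (1) or
 * becomes the wildcard '%' (0).
 */
function sketch_candidate_patterns(array $parts) {
  $count = count($parts);
  $candidates = array();
  for ($mask = (1 << $count) - 1; $mask >= 0; $mask--) {
    $pattern = array();
    foreach ($parts as $i => $part) {
      $literal = $mask & (1 << ($count - 1 - $i));
      $pattern[] = $literal ? $part : '%';
    }
    $candidates[] = implode('/', $pattern);
  }
  return $candidates;
}

/**
 * Returns the route stored under the most specific matching outline.
 */
function sketch_best_fit($path, array $table) {
  foreach (sketch_candidate_patterns(explode('/', $path)) as $candidate) {
    if (isset($table[$candidate])) {
      return $table[$candidate];
    }
  }
  return NULL;
}

// Stand-in for the router table: outline => route machine name.
$table = array(
  'node/%' => 'node_page',
  'node/%/edit' => 'node_edit',
  'node/add' => 'node_add',
);
$add = sketch_best_fit('node/add', $table);
$view = sketch_best_fit('node/5', $table);
$edit = sketch_best_fit('node/5/edit', $table);
```

In a SQL-backed version the candidate list would presumably become an IN() condition with the most specific row winning, rather than a loop over an array.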

Of course, that still leaves the question of how to feed route data into the Dumper. We came up with a multi-step approach.

In the near term, we will introduce a new hook, tentatively hook_route_info(), which rather than returning an array of menu item arrays will return an array of Route objects. Along with each Route object will be an array of access rules that must pass for a given user to have access to the route. For now that will simply be a port of the access callback/access arguments logic from hook_menu. Later on, they will be replaced by a proper access plugin system, once core has a proper conditional plugin system. (Think ctools access plugins, only more generalized and flexible.)

The resulting hook will look something vaguely like this (not final syntax!):

<?php
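function example_route_info() {
  $routes = array();

  $routes['node_page'] = array(
    // The route itself: the path pattern plus whatever else Route needs.
    'route' => new Route('node/{node}', ...),
    // Access rules that must pass for the user to reach the route.
    'access' => array(
      'callback' => array('function' => 'node_access'),
    ),
  );

  return $routes;
}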
?>

All of the other pieces of hook_menu (navigation links, local tasks aka tabs, local actions, etc.) will be left in hook_menu. That still leaves plenty of work to do to make sense of the other 4 systems that are tightly coupled in hook_menu, but that will be left for work outside of WSCCI. (Volunteers welcome.)

For performance, rather than building up a massive list of all modules' declared routes and passing them through an alter hook (which is totally not good for memory usage), we will instead process each module's routes in turn. Pseudocode:

<?php
foreach (module_list() as $module) {
  $routes = module_invoke($module, 'route_info');
  drupal_alter('route_info', $routes);
  // Delete all routes in the router table owned by $module.
  // Add $routes to the router table, keeping track of which module originally declared them.
}
?>

It is possible there will be race conditions to worry about, but we will deal with that in implementation.

In the longer term, we also discussed dropping the hook entirely and using the new Configuration system instead. That has a number of advantages. For one, we never have to store route data in PHP. We can process each route individually, with no memory pileup. It also treats routes as configuration, which they arguably are. Alteration of routes changes from an alter hook to a simple CMI CRUD operation. That in turn makes it easier for a distribution to completely rip out existing paths and build any interface they want on top of Drupal.

However, there are also a number of challenges with that approach, primarily that the Configuration system is still very new and changing. We therefore decided to punt on that question, and intend to revisit it in a few months when CMI is more stable. At that time we may rip out hook_route_info in favor of config files. Stay tuned.

For the time being, we will continue to support hook_menu-based routes while we transition over to hook_route_info, much as we had a temporary backward-compatibility layer in the Drupal 7 database while queries were converted. That support will be removed ASAP but will allow us to crowd-source the process of converting existing page callbacks over, likely as part of the work of the Layout Initiative.

This will necessitate a few other changes to the routing logic. Most notably:

  • The route matching logic will be expanded to also match against HTTP Method, accept header, etc. in addition to path.
  • We will drop the menu tail logic. Currently, if you go to node/5/haha-fooled-you, you will transparently get the page callback for node/5, with haha-fooled-you passed through as an extra silent parameter. That's bad for a number of reasons, and is mostly just an artifact of our current routing system. Symfony doesn't make it easy to do that, so we will drop it and be happier. That does mean that if you want a tail argument on a route path it must be specified explicitly. We decided that was a feature.
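
Both changes above can be sketched together. This is an illustration only: the function names and the route array layout are hypothetical, and returning a 405 when the path matches but the method does not is my assumption about sensible behavior rather than a decision from the sprint.

```php
<?php
// Hypothetical sketch: match on path and HTTP method, with no tail logic.

function sketch_path_matches($pattern, $path) {
  $pattern_parts = explode('/', $pattern);
  $path_parts = explode('/', $path);
  // No tail logic: the number of segments must match exactly.
  if (count($pattern_parts) !== count($path_parts)) {
    return FALSE;
  }
  foreach ($pattern_parts as $i => $part) {
    $is_placeholder = strlen($part) > 1 && $part[0] === '{' && substr($part, -1) === '}';
    if (!$is_placeholder && $part !== $path_parts[$i]) {
      return FALSE;
    }
  }
  return TRUE;
}

function sketch_match_request($method, $path, array $routes) {
  $allowed = array();
  foreach ($routes as $name => $route) {
    if (!sketch_path_matches($route['pattern'], $path)) {
      continue;
    }
    if (in_array($method, $route['methods'], TRUE)) {
      return array('status' => 200, 'route' => $name);
    }
    // The path matched, but this route does not accept the method.
    $allowed = array_merge($allowed, $route['methods']);
  }
  return array('status' => $allowed ? 405 : 404, 'route' => NULL);
}

// Two routes share one path and differ only by HTTP method.
$routes = array(
  'node_view' => array('pattern' => 'node/{node}', 'methods' => array('GET')),
  'node_update' => array('pattern' => 'node/{node}', 'methods' => array('PUT')),
);
$get = sketch_match_request('GET', 'node/5', $routes);
$put = sketch_match_request('PUT', 'node/5', $routes);
$tail = sketch_match_request('GET', 'node/5/haha-fooled-you', $routes);
```

Here `$tail` comes back as a 404, because the extra segment no longer silently falls through to the node/5 controller.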

One final important change is that Symfony uses the IETF-Draft URI Template format for specifying path patterns. That is, node/{node} rather than node/%node. Symfony uses that format to do its reflection-based mapping of path arguments to controller parameters, and since we want to leverage that without having to do any actual work we will adopt that format for routes as well.
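
For those who have not seen the reflection trick before, here is a minimal sketch of the idea: placeholder names from the path are matched to controller parameter names. The function name and the sample controller are hypothetical, and this is a simplification rather than Symfony's actual resolver code.

```php
<?php
// Hypothetical sketch: map matched path placeholders onto controller
// parameters by name, using reflection.

function sketch_resolve_arguments($controller, array $attributes) {
  $reflection = new ReflectionFunction($controller);
  $arguments = array();
  foreach ($reflection->getParameters() as $param) {
    $name = $param->getName();
    if (array_key_exists($name, $attributes)) {
      // A placeholder named {node} feeds the parameter named $node.
      $arguments[] = $attributes[$name];
    }
    elseif ($param->isDefaultValueAvailable()) {
      $arguments[] = $param->getDefaultValue();
    }
    else {
      throw new RuntimeException("No value for controller parameter \$$name");
    }
  }
  return $arguments;
}

$controller = function ($node, $format = 'html') {
  return "node $node as $format";
};

// Attribute order does not matter, and extra keys are ignored.
$attributes = array('format' => 'json', 'node' => '5', '_route' => 'node_page');
$arguments = sketch_resolve_arguments($controller, $attributes);
$result = call_user_func_array($controller, $arguments);
```

The point is that the name inside the braces is what ties the path to the controller signature, which is why a bare % placeholder would not be enough.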

Advanced functionality

But wait, there's more! :-)

At the Friday sprint, Lukas Smith walked us through the Symfony CMF Chain Router. In layman's terms, it is essentially a glorified array of routers that get checked in series until one of them finds a route. At the moment it is tightly integrated into a Symfony CMF bundle, but Lukas plans to split it off into a stand-alone library.

The advantage there is that it becomes possible to dump frequently-used paths, or frequently-used aliases, to a fast route store such as mod_rewrite rules and then fall back to the database-backed router for everything else. That allows for a lot of site-specific optimization. We may not ship that with core, but we should make sure it's possible to do. It even includes a chain generator.
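
The chaining concept itself is small enough to sketch. The classes below are hypothetical stand-ins, not the Symfony CMF Chain Router: a small "fast" store is consulted first, and the larger store is only hit when the fast one has no match.

```php
<?php
// Hypothetical sketch of chained matchers checked in series.

class SketchNoMatchException extends Exception {}

class SketchArrayMatcher {
  private $routes;

  public function __construct(array $routes) {
    $this->routes = $routes;
  }

  public function match($path) {
    if (!isset($this->routes[$path])) {
      throw new SketchNoMatchException("No route for $path");
    }
    return $this->routes[$path];
  }
}

class SketchChainMatcher {
  private $matchers = array();

  public function add($matcher) {
    $this->matchers[] = $matcher;
  }

  public function match($path) {
    foreach ($this->matchers as $matcher) {
      try {
        return $matcher->match($path);
      }
      catch (SketchNoMatchException $e) {
        // This matcher had nothing; try the next one in the chain.
      }
    }
    throw new SketchNoMatchException("No route for $path");
  }
}

// The fast store holds only hot paths; the second stands in for the database.
$fast = new SketchArrayMatcher(array('node/1' => 'node_page (fast store)'));
$slow = new SketchArrayMatcher(array(
  'node/1' => 'node_page (database)',
  'user/7' => 'user_page (database)',
));

$chain = new SketchChainMatcher();
$chain->add($fast);
$chain->add($slow);

$first = $chain->match('node/1');
$second = $chain->match('user/7');
```

Note that every miss in an earlier matcher costs one thrown exception in this sketch, so a chain is only worthwhile if the fast store catches a meaningful share of requests.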

We also briefly discussed that Symfony only implements a portion of the IETF URI Template draft. It would be helpful to both Symfony and all Symfony-using projects (such as Drupal) to support the full spec, including default values. There's already a Symfony issue open to work on it. Any Drupalers who want to lend a hand to make Drupal's URI support more powerful, please jump in!

Finally, and most significantly, there is the question of accept headers. HTTP defines an Accept header for a Request, which specifies the mime types that the user agent (browser) will accept. The server can also specify the mime types that it is willing to deliver. However, the algorithm for determining how those two lists map together to the final format is not defined, and is made more complicated by vendor-specific mime types, versioned mime types to support different versions of an API, etc.

The most widely used algorithm for that matching logic is Apache's mod_negotiation. Lukas has been trying to get people together to implement that same logic in PHP, which, aside from providing a standard logic that we can use to support multiple mime types at the same URL (which we want to do) means that it's possible to include that information in a route export to an Apache .htaccess file as above for super-fast routing. (See also this pull request for Symfony's REST bundle to handle versioning.)

That's very enticing. The alternative would be to come up with our own, simplistic, Drupal-specific logic for matching on accept headers. (Symfony already has its own simplistic logic.) That would work, but it would be much better if we could leverage an existing standard. That could take the form of a merge to Symfony, or perhaps even a completely stand-alone library that Symfony and Drupal could both leverage. (I prefer the latter, myself.) If someone wants to really get their hands dirty with HTTP, this would be a great place to do it.
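
For a sense of what even the simplistic version involves, here is a sketch that parses an Accept header and picks the best type from a list the server supports. It is emphatically not mod_negotiation's algorithm: the function names are hypothetical, and it ignores media type parameters other than q, vendor types, and versioning.

```php
<?php
// Hypothetical sketch of simplistic Accept-header negotiation.

/**
 * Parses an Accept header into a list of (type, q) pairs.
 */
function sketch_parse_accept($header) {
  $ranges = array();
  foreach (explode(',', $header) as $item) {
    $params = array_map('trim', explode(';', $item));
    $type = strtolower(array_shift($params));
    if ($type === '') {
      continue;
    }
    $q = 1.0;
    foreach ($params as $param) {
      if (strpos($param, 'q=') === 0) {
        $q = (float) substr($param, 2);
      }
    }
    $ranges[] = array('type' => $type, 'q' => $q);
  }
  return $ranges;
}

/**
 * Returns the q-value of the most specific range that covers $type.
 */
function sketch_quality($type, array $ranges) {
  list($major) = explode('/', $type);
  $best_specificity = -1;
  $quality = 0.0;
  foreach ($ranges as $range) {
    if ($range['type'] === $type) {
      $specificity = 2;
    }
    elseif ($range['type'] === $major . '/*') {
      $specificity = 1;
    }
    elseif ($range['type'] === '*/*') {
      $specificity = 0;
    }
    else {
      continue;
    }
    if ($specificity > $best_specificity) {
      $best_specificity = $specificity;
      $quality = $range['q'];
    }
  }
  return $quality;
}

/**
 * Picks the supported type with the highest quality, or NULL if none fits.
 *
 * Ties go to whichever type the server lists first.
 */
function sketch_negotiate($header, array $supported) {
  $ranges = sketch_parse_accept($header);
  $best = NULL;
  $best_quality = 0.0;
  foreach ($supported as $type) {
    $quality = sketch_quality($type, $ranges);
    if ($quality > $best_quality) {
      $best = $type;
      $best_quality = $quality;
    }
  }
  return $best;
}

$supported = array('text/html', 'application/json');
$format = sketch_negotiate('application/json;q=0.9, text/html;q=0.8, */*;q=0.1', $supported);
```

Even this toy version has to make judgment calls (tie-breaking, wildcard specificity), which is the argument for borrowing an established algorithm instead of inventing our own.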

Action items

Our next steps, then, are more or less as follows:

  1. Finish the Kernel patch, with a bare-bones matcher for now that just wraps our current menu_router system. Fabien helped me refactor that patch a great deal at the sprint and I'll be posting an updated version as soon as I can. This should include a simplistic format negotiation routine. I will likely need help as well porting the Drupal 7 Ajax system over, since that will be a custom Response object. Volunteers who understand that system are welcome.
  2. Implement hook_route_info() as described above, and the related Dumper. While it won't actually be used yet I think someone can get started on this now, assuming proper unit test approaches are taken.
  3. Implement a non-hook extension mechanism for the View and Exception listeners. I believe this can and should be a separate patch from the initial kernel patch to keep the patch size down.
  4. Implement a new Matcher that leverages the hook_route_info-built table.
  5. Split the Chain Router Bundle off into a stand-alone library that we can pull in. Lukas is already working on this, and says it should be doable in the next week or two.
  6. Implement a PHP format negotiation library, either as part of Symfony or stand-alone.
  7. Use that format negotiation library in Drupal so that we can put multiple mime types at the same path with a more robust algorithm.
  8. Convert existing routes over to the new router system, probably piecemeal and crowd-sourced. This may overlap with or wait for the work in the Layout initiative to unify blocks and page callbacks.

OMG, do we actually have a roadmap? :-)

Comments

Format negotiation questions

effulgentsia's picture

Thanks for the write-up: very helpful for all of us who weren't there at the Monday and Friday sprints.

Please see pwolanin's comment on http://groups.drupal.org/node/218914#comment-721594 and the 5 or so comments after that relating to format negotiation. Seems like we should implement REST best practices here, which might be to redirect generic URLs to more specific URLs rather than allowing generic URLs to directly serve "sufficiently different" resources: though more clarification is needed on what makes a resource "sufficiently different".

Misunderstanding

Crell's picture

I don't think anyone is talking about putting different resources at the same URI. node/5 should always get you node with nid 5, period, always. However, that may be represented in HTML as a page, in JSON as JSON-LD or some such, in Atom, or whatever. It's still the same conceptual resource; it's not "sufficiently different" at all; it's the same thing, just presented in a different serialization format.

Link to separate thread on this

effulgentsia's picture

I wrote up http://groups.drupal.org/node/220519 based on my understanding of where we're currently at with this. Thanks for helping clarify in today's IRC meeting.

Subscribing. (We should

Xano's picture

Subscribing.

(We should really get a "Follow" button here too)

Links

Crell's picture

There's a bunch of unreadable links at the bottom of the post itself. Use those. :-)

Nice writeup! Looks like a

moshe weitzman's picture

Nice writeup! Looks like a solid roadmap.

It is telling that there is no HTTP negotiation library for PHP. That's because many folks do pseudo-REST with single endpoints, like the Flickr API. I know it is heresy, but we should consider that approach if no progress happens on the library. I would much rather have single endpoint web services in D8 than none at all.

I'd love to see us use route requirements as soon as possible. This is where you require that menu params have the right type or follow an expected regex. See 'requirements' at http://symfony.com/doc/current/book/routing.html. That enhances Drupal's security significantly. Perhaps you could add this to the Roadmap.

It would be really helpful if someone could research what other systems like Symfony CMF do for menu links. Can admins construct custom menus and navigation in the UI? Our menu links system is a resource hog and not particularly approachable codewise. If anyone has ideas here, please post a new discussion and link from here.

Plan B

Crell's picture

Worst case scenario if we don't have any content negotiation logic, we can still do things like node/1/json, node/1/atom, etc. rather than using a single endpoint. We can do that now, and do, and we won't be losing that capability either way.

Routing requirements I think would be part of the hook_route_info patch. The Route class already supports a number of restrictions.

I completely agree about looking into Symfony CMF and others to see how they handle navigation menus. Any volunteers to do some field research? :-)

Pass as parameters?

adub's picture

Can we take a leaf out of Google GData (e.g. https://developers.google.com/youtube/2.0/developers_guide_protocol_api_...) and approach this as /node?id=5&alt=json, or views-type lists such as /node?q=blue&order-by=published&max-results=7? That's designed as an extensible model (so you could easily add parameters such as taxonomy). The use of parameters rather than slash delimiters also prevents ambiguity, duplicates and associated SEO issues. These could be canonical URIs used internally (path aliases just being shortcuts).

Single-endpoint

Crell's picture

That's the single-endpoint approach that Moshe was talking about. In that form, /node becomes the resource, and everything else is, essentially, a function parameter to it. I would really rather not go that route, as it's not very RESTful, not really that self-evident, and no easier to implement than the tools we have available. If anything I think that structure would be a step backward.

Well it's a single endpoint

adub's picture

Well it's a single endpoint to node but I thought Moshe was referring to a single endpoint to the whole API which I agree wouldn't be RESTful. In the YouTube example, there are endpoints for videos, channels, playlists etc., and we might expect them for resources such as pages, nodes, comments, users, search for example. I would expect parameters to hold context/filtering (also allowing the same endpoint to return collections as well as single items). Or would you see each node having its own endpoint?

The use of parameters rather

g1smd's picture

"The use of parameters rather than slash delimiters also prevents ambiguity, duplicates and associated SEO issues."

Parameters increase the number of duplicates.

For starters:
/node?q=blue&order-by=published&max-results=7
/node?q=blue&max-results=7&order-by=published
/node?order-by=published&q=blue&max-results=7
/node?order-by=published&max-results=7&q=blue
/node?max-results=7&q=blue&order-by=published
/node?max-results=7&order-by=published&q=blue
are all duplicates, as is
/node?max-results=7&order-by=published&q=blue&randomjunk=whatever

Whereas,
/node/blue/by-published/7 in a pre-defined order is canonical, and other orders, or URLs with appended junk, should return 404.

This is where mod_rewrite rules with strict RegEx patterns win out every time.

I was once working on the

Sylvain Lecoy's picture

I was once working on the Jazz Platform and more particularly on the Rational Team Concert server.

All the resources, each given a UUID, as well as a bunch of services, were served either as a full Page with HTTP or as a JSON representation of the resource. When the client wanted a resource to be returned by the server in JSON, it added an 'Accept' header and that's it.

I don't fully remember how the content negotiation was implemented, but I can find it back if needed. Also it is Java so this is likely to be set in the configuration.

Our discussion last Monday

pwolanin's picture

Our discussion last Monday was that we should not try to support regex at this point or partial menu parts (e.g. /article-{node}/) because of severe uncertainty about how to make those scale and perform compared to the current menu router.

However, in the room there was broad agreement that we should no longer accept arbitrary trailing path parts and instead let those 404.

Different regex

Crell's picture

Yes, we don't want to support article-node. I think Moshe was referring more to the extra rules that Route objects can support, such as "parameter 2 must be an int", which are implemented via a regex. That's a separate matter, and I think is reasonable for us to try and support even if we do so only in a second in-memory pass.

Just want to declare my

katbailey's picture

Just want to declare my willingness to help out with this initiative in any way I can. I'm fairly familiar with the D7 ajax system so that would probably be a good fit. Is this blocked by the kernel patch for the time being?

Good news - ChainRouter is

moshe weitzman's picture

Good news - ChainRouter is now its own library. Maybe core should have a FastRouter and only put the homepage in there by default. The FastRouter doesn't have to actually do anything differently, it just shows the enlightened way.

Some cost

Crell's picture

Possibly, although I don't think that the front page is necessarily the best thing to single out as the "super fast default". Also, every failed hit in one of the chained routers throws an exception, and exceptions are not cheap. So if you do chain multiple routers in production you need to make sure it's worth it.

That said, being able to push high-traffic links into a smaller, faster matcher is exactly the point of ChainRouter. I am not sure if we keep ChainRouter around after we complete the migration from the old matcher to the new one, but I'm open to it if we can leverage it. If not, then by design a particular high-traffic site should be able to swap it in on its own if desired.

FYI, Implementation has

moshe weitzman's picture

FYI, Implementation has started. Crell has committed some initial work.

Module developer perspective

donquixote's picture

Hi,
I appreciate the work being done here.
I am now wondering what this means for module development.

The latest concrete proposal mentioned here is the hook_route_info() and hook_route_info_alter().
And then the suggested move to configuration + CRUD.

I imagine, while the internal implementation will be based mostly on symfony components, we are quite free to design the API for modules. E.g. we could keep the old hook_menu(), and just stuff the result into the symfony routing machine. But we could also do something completely different.

Some poor implementation and coupling set aside, the benefits with hook_menu() were:
- You can delete the entire menu_router table, and it will rebuild in the exact same state it was before.
- You don't need a hook_update() to change a route in your module.
- It was very easy to register an admin url, and get a link in admin_menu, breadcrumbs and tabs "for free". This kills some flexibility, but who cares, for those administration pages? There were some technical problems with this, but the idea did have its benefits.

hook_route_info() looks very much like hook_menu() with the metadata stripped out. This means, modules need a separate place to register metadata for breadcrumbs, tabs and menu items.
What we do not want is to let the user manually define the admin menu and all tabs and breadcrumbs.

Method chaining instead of config array?
It could be preferable to pass an $api object as an argument to hook_route_info(), providing methods for route registration, instead of letting it return a nested array. This is a matter of taste, though, and I don't have a strong opinion about it yet.

"new Route(..)" ?
There is this explicit "new Route(..)" in the hook_route_info() implementation. I wonder if this is really a good idea, or if we should rather let Drupal instantiate the Route, based on configuration returned by the module. If we move to a configuration-based system, we have to do this anyway. Imo, the "Route" class is an implementation detail that the module should not care about.

CRUD?
I wonder what kind of CRUD this is going to be.
If it is just an OOP wrapper around the alter hook: Great.
So the process of a "rebuild" would be:
- Assemble static configuration and/or hook_route_info()
- Wrap the result into an object with alteration methods, and expose that object to hook_route_info_alter() or something equivalent.
- Save the information to the router table, where it can be used by the symfony-based routing system.

If, however, CRUD means that for every route change in a module we need to add a hook_update_N() to actively write that change to the router table, then ouch. Sorry, this is how I understood the "CRUD" idea when chx first mentioned it.

Config files:
We should allow modules to define routes based on dynamic state such as custom entity types, user-defined paths, views etc. Just reading a module-provided config file will not do it. We need to have something dynamic such as hook_route_info() or equivalent. But, a combination of static config + hook_route_info() + hook_route_info_alter() or equivalent would be ok.

OOP hooks?
There is a discussion about replacing the Drupal hook system with something OOP.
This does not change my comments above, most of this would still apply.

Web Services and Context Core Initiative
