Unified redirection framework for Drupal 7

Dave Reid's picture

We've got a lot of overlap and duplicated effort between modules like Path redirect, Global redirect, Secure pages, etc., all of which could really benefit from a unified redirection API and UI. Another advantage: if we can keep this redirection API/UI simple, it could be a good candidate for Drupal 8 Core/CMS.

So...what APIs or features do we need to provide?

Here are my thoughts:
- The one "master" implementation of hook_init() or hook_boot() to catch paths for redirections so not every module has to implement them.
- A master redirect_goto() function as a wrapper around drupal_goto() for things like ensuring the query string is preserved through redirection. If modules need to alter the actual redirect, they can use hook_drupal_goto_alter().
- A {redirect} table for specific path to path redirections.
- A hook for specific path to path redirect types (Feedburner would register its own redirection type so it could check user agent strings prior to redirection).
- A unified admin/config/search/redirect UI page/path where everything else can stuff its options.
- Redirection loop detection and prevention (working implementation using core's flood system)

Comments

An all-inclusive patch for D6

donquixote's picture

An all-inclusive patch for D6 globalredirect (heavy refactoring + mixed language redirects + hook for other modules) can be found here:
http://drupal.org/node/803830#comment-2991412
("A bit of refactoring / code readability")

The one "master" implementation of hook_init() or hook_boot() to catch paths for redirections so not every module has to implement them.

In this case it's hook_init(), simply because it's based on globalredirect.

A master redirect_goto() function as a wrapper around drupal_goto() for things like ensuring the query string is preserved through redirection. If modules need to alter the actual redirect, they can use hook_drupal_goto_alter().

I called this thing _globalredirect_goto(). It does not use drupal_goto(), because that would make it depend on the global $language variable, and make it impossible (or dirty) to change the destination language.
The query string is preserved, but for the sake of backwards compatibility it cannot be altered.
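
To illustrate what "the query string is preserved" means, here is a minimal sketch in Python (not Drupal PHP, and not the actual _globalredirect_goto() code). The function name build_redirect_url() and the example URLs are assumptions made up for illustration. It swaps only the path and carries the original query string over unchanged:

```python
from urllib.parse import urlsplit, urlunsplit


def build_redirect_url(request_url: str, destination_path: str) -> str:
    """Return the redirect target: same scheme and host as the request,
    the new path, and the request's query string carried over untouched."""
    parts = urlsplit(request_url)
    # The fragment is dropped; the query is copied verbatim, never altered.
    return urlunsplit((parts.scheme, parts.netloc, destination_path, parts.query, ""))


if __name__ == "__main__":
    print(build_redirect_url("http://example.com/node/1?page=2&sort=asc", "/about"))
```

Copying the query string verbatim, rather than parsing and re-encoding it, avoids accidentally changing parameter order or encoding during the redirect.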

A {redirect} table for specific path to path redirections.

This could be done in a new implementation of hook_globalredirect_destination_alter().

A hook for specific path to path redirect types (Feedburner would register its own redirection type so it could check user agent strings prior to redirection).

That's now hook_globalredirect_destination_alter()

A unified admin/config/search/redirect UI page/path where everything else can stuff its options.

For now that would need to happen with hook_form_alter() on the globalredirect settings page.

Redirection loop detection and prevention (working implementation using core's flood system)

The patched globalredirect module will simulate repeated hook_globalredirect_destination_alter() calls, until all modules leave the destination unchanged. There is a counter limit, and a detection of repeated destinations.
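
The repeated-call approach described above can be sketched roughly as follows. This is a Python illustration of the idea, not the patch itself: resolve_destination(), RedirectLoopError and the max_rounds default are hypothetical names and values, and each "hook" is reduced to a plain function that takes a destination and returns a (possibly changed) destination.

```python
from typing import Callable, List

AlterHook = Callable[[str], str]


class RedirectLoopError(Exception):
    """Raised when the hooks never settle on a stable destination."""


def resolve_destination(path: str, hooks: List[AlterHook], max_rounds: int = 10) -> str:
    """Call every hook repeatedly until a full round leaves the destination unchanged."""
    seen = {path}
    current = path
    for _ in range(max_rounds):
        candidate = current
        for hook in hooks:
            candidate = hook(candidate)
        if candidate == current:
            return current  # all modules left the destination unchanged
        if candidate in seen:
            # Detection of a repeated destination: we are going in circles.
            raise RedirectLoopError("destination repeated: " + candidate)
        seen.add(candidate)
        current = candidate
    # Counter limit reached without settling.
    raise RedirectLoopError("no stable destination after %d rounds" % max_rounds)


if __name__ == "__main__":
    hooks = [
        lambda p: "node/1" if p == "old-page" else p,
        lambda p: "about" if p == "node/1" else p,
    ]
    print(resolve_destination("old-page", hooks))
```

The counter limit and the repeated-destination check are independent safety nets: the second catches short cycles early, the first bounds the work even if every round produces a new destination.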

Open things to discuss

donquixote's picture

What we still need to look at is whether any of the other redirect modules do special things that are not supported in hook_globalredirect_destination_alter(), such as playing with headers. We could give the hook more parameters (or an $options array), but we need to be careful about what counts as the same or a different destination. And maybe some of the modules need to change the way we check whether a redirect is necessary or not.

Thinking about scalability

Andy Inman's picture

Hi folks ... well going back to my design proposal in http://drupal.org/node/775748#comment-2866748 - here's a summary of what functionality I suggested there in addition to providing actual redirection:

  • Caching (performance, scalability)
  • Prioritising of redirection rules (concept of "weight" in case of conflicts)
  • Loop detection/handling (logging, configurable bail-out page)
  • url and link generation (hook into l() and url() to fix the problem at source.)
  • Logging/reporting services (because frequent redirects may indicate a misconfiguration or problem elsewhere.)

So, for now I'm going to tackle the first item on that list...

Scalability

Assuming a typical situation where the vast majority of requests do not end up redirected, it follows that we should optimise performance for those cases, i.e. as quickly as possible decide that a request should not be redirected, and exit. We certainly want to avoid running lots of PHP and several db queries on every request to a popular page only to confirm that, yes, that's the correct url, no need to redirect!

So, my thoughts on how to achieve this:

  • Modules (plugins) must report to the master whether or not their result for any given url is cacheable (will rarely or never change.)
  • If all modules report cacheable results, then the master can cache the final result
  • We distinguish between results that will rarely change and those that will never change, and cache them separately. This allows one cache to be cleared when needed, without affecting the other.
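
A rough sketch of the three points above, written in Python for illustration only. Everything here is an assumption rather than an existing API: the Cacheability levels, the RedirectMaster class, and the convention that each plugin returns a (destination, cacheability) pair. The master caches a final result only if every plugin reported it cacheable, and files it under the weakest level reported.

```python
from enum import IntEnum
from typing import Callable, Dict, List, Tuple


class Cacheability(IntEnum):
    NONE = 0    # result must not be cached
    RARELY = 1  # result will rarely change
    NEVER = 2   # result will never change


Plugin = Callable[[str], Tuple[str, Cacheability]]


class RedirectMaster:
    def __init__(self, plugins: List[Plugin]) -> None:
        self.plugins = plugins
        # Two separate caches so one can be cleared without touching the other.
        self.caches: Dict[Cacheability, Dict[str, str]] = {
            Cacheability.RARELY: {},
            Cacheability.NEVER: {},
        }

    def resolve(self, path: str) -> str:
        for cache in self.caches.values():
            if path in cache:
                return cache[path]
        destination = path
        weakest = Cacheability.NEVER
        for plugin in self.plugins:
            destination, level = plugin(destination)
            weakest = min(weakest, level)
        # Cache only if all plugins reported a cacheable result.
        if weakest != Cacheability.NONE:
            self.caches[weakest][path] = destination
        return destination

    def purge(self, level: Cacheability) -> None:
        self.caches[level].clear()


if __name__ == "__main__":
    def alias(path):
        return ("about" if path == "node/1" else path, Cacheability.NEVER)

    def language(path):
        return (path, Cacheability.RARELY)

    master = RedirectMaster([alias, language])
    print(master.resolve("node/1"))
```

Taking the minimum level means a single "rarely changes" plugin downgrades the whole result, so clearing the "rarely" cache is always enough to pick up that plugin's changes.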

So, in the case of final-destination==initial-request we need to maximise performance. We have a path (plus query string maybe) that we know should not be redirected. So:

  • Statically cache the most-frequently requested urls, e.g. load them into an array at start, and then just need to check if the current request is in the array.

  • Cache the rest, maybe, elsewhere (e.g. use Drupal core caching, or store in our own table). I say maybe cache, because whatever method is used needs to be more efficient than just running through the full process of calculating the final target. That in turn depends on whether the redirection modules need to do db queries etc. in order to generate their result.
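
The lookup order in the two bullets above might look something like this. It is a Python sketch under assumptions: NoRedirectFastPath is a made-up name, the "static" cache is modelled as a frozenset loaded at start, the secondary cache as an in-memory set standing in for a cache table, and compute stands for the full (expensive) run through all redirect modules.

```python
from typing import Callable, Iterable, Optional


class NoRedirectFastPath:
    """Decide as cheaply as possible that a request needs no redirect."""

    def __init__(self, hot_urls: Iterable[str], compute: Callable[[str], str]) -> None:
        self.hot = frozenset(hot_urls)  # most-frequently requested urls, loaded at start
        self.secondary = set()          # stand-in for a slower cache (e.g. a db table)
        self.compute = compute          # full calculation of the final target

    def destination_for(self, url: str) -> Optional[str]:
        """Return the redirect target, or None if the url should be served as-is."""
        if url in self.hot:
            return None  # cheapest check first
        if url in self.secondary:
            return None
        destination = self.compute(url)
        if destination == url:
            self.secondary.add(url)  # remember that this url is already correct
            return None
        return destination


if __name__ == "__main__":
    fast = NoRedirectFastPath(["front"], lambda u: "en/page" if u == "de/page" else u)
    print(fast.destination_for("front"), fast.destination_for("de/page"))
```

Whether the secondary cache is worth having depends, as noted above, on whether a lookup there is actually cheaper than running compute.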

Provide a mechanism for purging cached values.

Taking my own module, MultiLink, as an example - it might redirect de/page to en/page, but when en/page is later translated to German, the redirection will no longer be needed - the result could be cached. Now, if for some reason the German translation is later deleted, redirection is required again. MultiLink internally traps node additions/updates of translations and so knows to clear its own cache, but in this scenario the master redirection cache would need clearing too. Ideally there would be a way to clear only affected cache entries rather than invalidate the entire cache, but the latter is of course much easier from a design perspective and probably the only viable option.

Just a reminder, I'm talking about caching a list of urls which will not be redirected - i.e. the majority of them. By implication, these will probably be the most frequently requested pages. So, rebuilding the cache after clearing could have a performance impact. I think rather than just let it be rebuilt ad hoc (where many db writes could happen), the smart thing would be to rebuild the most-frequently accessed urls - i.e. run through them, calling all redirect modules, and caching the result. From the above, we would already know the most-frequently requested urls, so this would not be too difficult.
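
As a sketch of that rebuild step (Python, illustrative only): the class name WarmableCache, the record() bookkeeping and the top_n parameter are assumptions, and compute again stands for calling all redirect modules. After a purge, only the most-requested urls are recomputed up front; everything else is left to be filled in later.

```python
from collections import Counter
from typing import Callable


class WarmableCache:
    def __init__(self, compute: Callable[[str], str]) -> None:
        self.compute = compute
        self.hits = Counter()     # request counts per url
        self.no_redirect = set()  # urls known not to need a redirect

    def record(self, url: str) -> None:
        """Count a request so we know which urls are the most popular."""
        self.hits[url] += 1

    def purge_and_rewarm(self, top_n: int) -> None:
        """Clear the whole cache, then rebuild entries for the top_n urls in one pass."""
        self.no_redirect.clear()
        for url, _count in self.hits.most_common(top_n):
            if self.compute(url) == url:
                self.no_redirect.add(url)


if __name__ == "__main__":
    cache = WarmableCache(lambda u: "en/page" if u == "de/page" else u)
    for url in ["front", "front", "front", "de/page", "de/page", "blog"]:
        cache.record(url)
    cache.purge_and_rewarm(top_n=2)
    print(sorted(cache.no_redirect))
```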

Ok, that's me done for now - input and better ideas welcome!



Currently part of the team at https://lastcallmedia.com in a senior Drupal specialist role.

I agree with your ideas,

donquixote's picture

I agree with your ideas, but..
When I started working on the patch, I noticed I had to scrap a lot of my ideas about different modules adding their redirect logic in a hook implementation.

Most importantly: Not all redirect rules are the same.
There are some rules that change the system path or the language, and don't care about the alias. There are other rules that assign aliases to system paths, or change some spelling inconsistencies. And then we have headers, $_POST, frontpage-specific rules, permission checking etc.

Even with the best priority management, those rules do not all fit into a generic mechanism.
What we can achieve, though, is a hook for rewriting of system path, interface language and maybe a few other things, and specialized (non-hookable) logic for all the rest.

Once we have that, we are already a great step forward, and can think about a more flexible API.

And yes, caching is a good idea, but as an intermediate step, we should simply try not to be slower than GlobalRedirect is today.

Dead or alive?

Andy Inman's picture

Hey Dave and others, is this dead or alive?


