A small but important fix to our dataflow

We encourage users to post events happening in the community to the community events group on https://www.drupal.org.
geek-merlin's picture

This is the story about a small but imho important issue i want to expose to some people's eyes before D8 doors close.
It's about dataflow and the data providers i talk about are the info hooks. Maybe they will die some day but apparently not in D8.
Think of hook_entity_info(), hook_field_info(), hook_hook_info(), even incognito ones like hook_menu() or hook_permission().
(It's a good idea to change them to a common naming scheme but that's another story.)

The problem

So what's wrong with info hooks? Nothing. They are asked, they provide info. That's their job.
Sometimes info changes. And that's when some real developer pain starts:
Say, we add a view with a page display and a menu path. So hook_menu() has news!
Views goes and clears menu cache by calling menu_router_rebuild(). Ouch, that hurts!
Why does it hurt? Because Views has to make assumptions about who and how consumes hook_menu() info.
What if mymodule also consumes hook_menu() info? OK, we might always cache_clear_all().
Apart from this being a waste of reaources, what if mymodule stores some hook-menu-dependent artefacts aside from cache, say in file system?

Symptoms

Many issues of the pattern "Config change XY only takes effect after cache clear" are in fact instances of this design gap.
Adding more and more if-module-baz-exists-clear-baz-cache is no fun.
Or: I wrote a module that implements hook_variable_info() dynamically.
Had to announce changes. Asked variable module maintainer, but for reasons stated above this did not feel clean and can break anytime.

The root problem

Put in one simple sentence: Our dataflow architecture lets consumers pull info, but we have no clean way to let providers push info.

The canonical cure

We can cure this if, for every info hook, we use a dual info_change hook.
So in the last example, when views has new hook_menu() info,
it will invoke hook_menu_change(), and info consumers can take appropriate action.
So menu.module will implement hook_menu_change and clear its cache.
(You might wonder why not call it hook_menu_cache_invalidate() or so. Clearing cache is only one possible consumer action so let's be exact here.)

The job

If we want this the job is to go through core

  • enumerate info hooks (grepped 27 plus say 15 "incognito" ones).
  • on the consumer side: for every hook that is invoked and cached implement a consumer_foo_info_change() hook that clears cache
  • on the provider side: grep for corresponding cache clears and invoke hook_foo_info_changed()
  • document.

The risks

Minimal: The patch is simple to review because most is a simple code-move refactoring.
Forgotten hooks won't break code and can be fixed anytime.

The issue

See here: Let info hook prividers announce changes (info_change hooks)

Comments

Namespace question

geek-merlin's picture

donquixote wrote in the issue:

I want to kindly mention that with our poor hook naming scheme, every hook can make name clashes (or for D8, we should rather say, it increases the chance for unintended hook invocations in future contrib).

I remember that namespace post of yours and think it really makes a good point. I might also call myself namespace-paranoid so i fully support this cause and hope we will get module namespaces for D9.

We generally ignore this problem, but we should be careful introducing a ton of hooks all at once.

I thought about that but could not find a solution with takes fewer namespace room. Also the namespace room is quite foreseeable: Like now, when hook_foo_info() comes in company with hook_foo_info_alter(), it's a new friend hook_foo_info_change().

Should we not rather introduce just one hook? Or would that have performance implications?

This would mean something like

<?php
function hook_info_change($op) {
// take some action
}
...
// announce change of hook_menu()
module_invoke_all('hook_info_change', 'menu')
?>

Yes this would have serious performance issues: Instead of a specific hook invoking 3 functions we would have a general hook invoking 97 functions.

And hey, we just dumped tons of $op in D7.

Hook discovery

donquixote's picture

Ok, that's only 3 implementations per hook, but we still need to check all modules for function_exists(). Yes, this is cached, but quite often we need to flush that.
Yes, this is probably still better than just using one hook. Just saying it is not "for free".

How many consumers?

donquixote's picture

Not sure what's the best place to reply. The issue or here.

  1. A lot of times there is only one hook "consumer" in the first place. E.g. other modules typically call sth like libraries_get_info() or menu_get_router(), instead of invoking the hooks directly.
    Ok, I agree it is not safe to rely on that :)

  2. That's coming in the next post :)

Persistent cache vs in-request cache

donquixote's picture

I think there are two parts of the problem:

  1. Static variables, which avoid calling the same thing twice in a request. I once made a proposal to resolve this,
    http://drupal.org/node/632434
    "Allow static variables to 'subscribe' to global events for reset"

  2. Persistent cache, which remembers the data between requests.
    More on this one in the next post :)

Dependency matrix

donquixote's picture

And here is my idea for the persistent cache.

  1. Every "cache" gets a unique string key (defined by the module). "Cache" can mean sth in the dedicated Drupal cache table, but it could also be sth the module defines its own storage for.

  2. Dependency matrix:
    A database table filled with dependencies of one "cache" to another.
    Modules can declare those dependencies via hook_cache_dependencies_info() or similar.

  3. Cache invalidation table
    For each "cache" we keep a flag telling us whether that is still up to date or not.
    Instead of clearing the entire cache whenever something wants that, we just set the flag.
    Whenever something wants to read from the cache, it first needs to check that flag (which can be made available in memory for easy lookup)

  4. Recursive cache invalidation mechanic
    If you set one key to "out of date", the system will let that trickle down to all the other keys that depend on it. This operation can be very fast, because it is just one class or function setting a number of flags.

This way we need only one new hook (or a few, to be decided).
Why I am still not convinced of too many hooks, I will post in another post :)