This is the story about a small but imho important issue i want to expose to some people's eyes before D8 doors close.
It's about dataflow and the data providers i talk about are the info hooks. Maybe they will die some day but apparently not in D8.
Think of hook_entity_info(), hook_field_info(), hook_hook_info(), even incognito ones like hook_menu() or hook_permission().
(It's a good idea to change them to a common naming scheme but that's another story.)
The problem
So what's wrong with info hooks? Nothing. They are asked, they provide info. That's their job.
Sometimes info changes. And that's when some real developer pain starts:
Say, we add a view with a page display and a menu path. So hook_menu() has news!
Views goes and clears menu cache by calling menu_router_rebuild(). Ouch, that hurts!
Why does it hurt? Because Views has to make assumptions about who and how consumes hook_menu() info.
What if mymodule also consumes hook_menu() info? OK, we might always cache_clear_all().
Apart from this being a waste of reaources, what if mymodule stores some hook-menu-dependent artefacts aside from cache, say in file system?
Symptoms
Many issues of the pattern "Config change XY only takes effect after cache clear" are in fact instances of this design gap.
Adding more and more if-module-baz-exists-clear-baz-cache is no fun.
Or: I wrote a module that implements hook_variable_info() dynamically.
Had to announce changes. Asked variable module maintainer, but for reasons stated above this did not feel clean and can break anytime.
The root problem
Put in one simple sentence: Our dataflow architecture lets consumers pull info, but we have no clean way to let providers push info.
The canonical cure
We can cure this if, for every info hook, we use a dual info_change hook.
So in the last example, when views has new hook_menu() info,
it will invoke hook_menu_change(), and info consumers can take appropriate action.
So menu.module will implement hook_menu_change and clear its cache.
(You might wonder why not call it hook_menu_cache_invalidate() or so. Clearing cache is only one possible consumer action so let's be exact here.)
The job
If we want this the job is to go through core
- enumerate info hooks (grepped 27 plus say 15 "incognito" ones).
- on the consumer side: for every hook that is invoked and cached implement a consumer_foo_info_change() hook that clears cache
- on the provider side: grep for corresponding cache clears and invoke hook_foo_info_changed()
- document.
The risks
Minimal: The patch is simple to review because most is a simple code-move refactoring.
Forgotten hooks won't break code and can be fixed anytime.
The issue
See here: Let info hook prividers announce changes (info_change hooks)
Comments
Namespace question
donquixote wrote in the issue:
I remember that namespace post of yours and think it really makes a good point. I might also call myself namespace-paranoid so i fully support this cause and hope we will get module namespaces for D9.
I thought about that but could not find a solution with takes fewer namespace room. Also the namespace room is quite foreseeable: Like now, when hook_foo_info() comes in company with hook_foo_info_alter(), it's a new friend hook_foo_info_change().
This would mean something like
<?php
function hook_info_change($op) {
// take some action
}
...
// announce change of hook_menu()
module_invoke_all('hook_info_change', 'menu')
?>
Yes this would have serious performance issues: Instead of a specific hook invoking 3 functions we would have a general hook invoking 97 functions.
And hey, we just dumped tons of $op in D7.
Hook discovery
Ok, that's only 3 implementations per hook, but we still need to check all modules for function_exists(). Yes, this is cached, but quite often we need to flush that.
Yes, this is probably still better than just using one hook. Just saying it is not "for free".
How many consumers?
Not sure what's the best place to reply. The issue or here.
A lot of times there is only one hook "consumer" in the first place. E.g. other modules typically call sth like libraries_get_info() or menu_get_router(), instead of invoking the hooks directly.
Ok, I agree it is not safe to rely on that :)
That's coming in the next post :)
Persistent cache vs in-request cache
I think there are two parts of the problem:
Static variables, which avoid calling the same thing twice in a request. I once made a proposal to resolve this,
http://drupal.org/node/632434
"Allow static variables to 'subscribe' to global events for reset"
Persistent cache, which remembers the data between requests.
More on this one in the next post :)
Dependency matrix
And here is my idea for the persistent cache.
Every "cache" gets a unique string key (defined by the module). "Cache" can mean sth in the dedicated Drupal cache table, but it could also be sth the module defines its own storage for.
Dependency matrix:
A database table filled with dependencies of one "cache" to another.
Modules can declare those dependencies via hook_cache_dependencies_info() or similar.
Cache invalidation table
For each "cache" we keep a flag telling us whether that is still up to date or not.
Instead of clearing the entire cache whenever something wants that, we just set the flag.
Whenever something wants to read from the cache, it first needs to check that flag (which can be made available in memory for easy lookup)
Recursive cache invalidation mechanic
If you set one key to "out of date", the system will let that trickle down to all the other keys that depend on it. This operation can be very fast, because it is just one class or function setting a number of flags.
This way we need only one new hook (or a few, to be decided).
Why I am still not convinced of too many hooks, I will post in another post :)