stapler.module: Parameterized inheritance (TM) and code generation

donquixote's picture

Suppose you have a class named "DbSelect".
Objects of this class allow the following (among other things):

<?php
$select->where("x = 5");
?>

Now you as a module author are not happy with this. You want to add additional methods, but the class definition is packaged in a contrib module that is not yours.
You could suggest a patch, but maybe the maintainer is not interested.
You could write a child class, but this locks you into the linearity of inheritance: What if you want this to apply to all sub-classes of DbSelect? Also, you would now have to make sure that whenever a DbSelect object is to be created, a MyDbSelect is created instead. And finally, if other modules try the same, there is a conflict, because you cannot merge the two child classes.

What if you could instead write a parent class for DbSelect, and sneak it into the inheritance chain?
Existing: class DbSelect extends DbSelectBase
Yours: class DbSelect extends MyDbSelect extends DbSelectBase
Other: class DbSelect extends OtherDbSelect extends DbSelectBase
Merged: class DbSelect extends MyDbSelect extends OtherDbSelect extends DbSelectBase

Scary? Yes.
But it does the job.
Can this work?

Is this a good design, to allow arbitrary modules to throw in base classes?
In many cases it is not. But in some cases it might be.
The alternative is usually to use composition instead of inheritance. This would result in some fancy plugin architecture, maybe with added magic methods. Which one is more desirable depends on the situation.
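For comparison, the composition route could look roughly like the sketch below. The names PluggableSelect, addPlugin() and MyWherePlugin are made up for this illustration and do not come from any existing module.

```php
<?php
// Hypothetical sketch: composition plus a magic method, instead of inheritance.
class PluggableSelect {
  protected $conditions = array();
  protected $plugins = array();

  public function where($condition) {
    $this->conditions[] = $condition;
    return $this;
  }

  public function getConditions() {
    return $this->conditions;
  }

  // Any module can register an object that contributes extra methods.
  public function addPlugin($plugin) {
    $this->plugins[] = $plugin;
    return $this;
  }

  // Calls to unknown methods are forwarded to the first plugin that has them.
  public function __call($method, $args) {
    foreach ($this->plugins as $plugin) {
      if (method_exists($plugin, $method)) {
        // The select object itself is passed as the first argument.
        array_unshift($args, $this);
        return call_user_func_array(array($plugin, $method), $args);
      }
    }
    throw new BadMethodCallException("No plugin provides $method().");
  }
}

class MyWherePlugin {
  public function whereEquals($select, $field, $value) {
    return $select->where("$field = " . (int) $value);
  }
}

$select = new PluggableSelect();
$select->addPlugin(new MyWherePlugin());
$select->where("x = 5")->whereEquals('y', 7);
```

This works without touching the inheritance chain, but the plugin methods are invisible to IDEs and to method_exists() on the select object, which is part of why the inheritance approach is still tempting.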

Let's have a look at some C++, and then have a look at the stapler module.

C++ "mixin" templates

C++ templates are scary. And I love them.
With C++ templates, you can parameterize your classes (and functions and methods) at compile time.

You can also parameterize the parent class. The result is a "mixin".
(my C++ is a bit rusty, so please forgive my pseudo. Also, I'm trying to get nice syntax colors..)

<?php
template <typename AnimalType>
class Dead : public AnimalType {
public:
  bool isAlive() {
    // no, it's not alive.
    return false;
  }
};

class Cat {..};
class Dog {..};

class DeadCat : public Dead<Cat> {..};
class DeadDog : public Dead<Dog> {..};
?>

The result:
DeadCat extends Dead<Cat> extends Cat.
DeadDog extends Dead<Dog> extends Dog.

The benefit:
We can reuse the code in the Dead<*> class template, to use it for any dead animal, without having Dead as a common base class. This way, we avoid multiple inheritance (which would be possible in C++, but let's just assume we want to avoid it).

The problem:
The class declaration needs to know what

Stapler: eval'd glue layer

In our example from above, we had:
Existing: class DbSelect extends DbSelectBase
Yours: class DbSelect extends MyDbSelect extends DbSelectBase
Other: class DbSelect extends OtherDbSelect extends DbSelectBase
Merged: class DbSelect extends MyDbSelect extends OtherDbSelect extends DbSelectBase

The exact chain would depend on which modules are enabled.
However, the authors of DbSelect and MyDbSelect do not know which modules will be enabled. The parent class has to be hardcoded.
So, if the inheritance is hardcoded as class DbSelect extends DbSelectBase, then how can we change the base class later on?

The key is a glue layer.
Hardcoded, we have
class DbSelect extends StaplerGlue_DbSelect
class MyDbSelect extends StaplerGlue_MyDbSelect
class OtherDbSelect extends StaplerGlue_OtherDbSelect

Then, at a time when we know which modules are enabled, stapler will glue these together with some eval'd code.

<?php
eval('
  class StaplerGlue_DbSelect extends MyDbSelect {}
  class StaplerGlue_MyDbSelect extends OtherDbSelect {}
  class StaplerGlue_OtherDbSelect extends DbSelectBase {}
');
?>

And there we go. DbSelect now unifies all the features (methods and attributes) of the different stapler layers.
Yeah!
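To see the glue idea run end to end, here is a minimal single-file sketch. The helper stapler_sketch_glue() and all method bodies are invented for the demo. Without an autoloader, the glue classes have to be created bottom-up, so that each parent exists before the layer that extends it is declared.

```php
<?php
// Sketch only: everything lives in one file, and the method bodies are made up.
class DbSelectBase {
  protected $parts = array();

  public function where($condition) {
    $this->parts[] = "WHERE $condition";
    return $this;
  }

  public function getParts() {
    return $this->parts;
  }
}

// Hypothetical helper: declares the glue class that the given layer extends.
function stapler_sketch_glue($layer, $parent) {
  eval("class StaplerGlue_$layer extends $parent {}");
}

// Bottom layer first: its glue class points at the original base class.
stapler_sketch_glue('OtherDbSelect', 'DbSelectBase');
class OtherDbSelect extends StaplerGlue_OtherDbSelect {
  public function limit($n) {
    $this->parts[] = 'LIMIT ' . (int) $n;
    return $this;
  }
}

stapler_sketch_glue('MyDbSelect', 'OtherDbSelect');
class MyDbSelect extends StaplerGlue_MyDbSelect {
  public function orderBy($field) {
    $this->parts[] = "ORDER BY $field";
    return $this;
  }
}

stapler_sketch_glue('DbSelect', 'MyDbSelect');
class DbSelect extends StaplerGlue_DbSelect {}

// DbSelect now has the methods of every layer.
$select = new DbSelect();
$select->where("x = 5")->orderBy('y')->limit(10);
```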

Afraid of the eval() statement?
Seriously, this is one of the less critical situations for using eval.
And if you don't like it, just imagine it was using code generation instead, which is almost equivalent.
The really dangerous thing is to use eval() on user-contributed strings, or anything coming from the database. Which happens a lot in Drupal.

This does solve the problem of making a class modular.
It also kind of breaks inheritance, but that's the price. And, there is still the "private" keyword, if a layer wants to keep something for itself - remember?

Stapler (next version?): Parameterized PHP

One problem the above does not solve is the reuse of a layer of inheritance.
Such as, the Dead<AnimalType> in the C++ example.
This was not even a requirement in the first place, but it is so nice in C++.. why not have the same in PHP?
In fact, why not have modular inheritance with reusable mixins?

What we want, is something like this:
class DeadCat extends DeadAnimal extends Cat
class DeadDog extends DeadAnimal extends Dog
where the DeadAnimal is to be reused.
But, damn, these two lines can not coexist. DeadAnimal can either be a Cat, or a Dog.
So, let's do the following instead:
class DeadCat extends DeadAnimal_Cat extends Cat
class DeadDog extends DeadAnimal_Dog extends Dog
but reuse the code to generate the DeadAnimal_* classes.

So, we are at code generation.
This is quite a popular concept in the world out there, but Drupal has some special use cases, so we do our own.

We start with a template PHP file.

<?php
class DeadAnimal_[suffix] extends [parent] {..}
?>

We read this file with file_get_contents(), then we replace the [suffix] and [parent] with given parameters, then we either eval(), or we save the file somewhere and require() it. The latter is called code generation. The filename can be some md5 or sha based on the parameter values for [suffix] and [parent]. The location would be a writable folder somewhere in the sites dir.
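The steps above could be sketched like this. The function name stapler_sketch_generate(), the template file name and the use of a temp directory are assumptions for the demo. A real implementation would use a writable folder in the sites dir, as described.

```php
<?php
class Cat {
  public function isAlive() {
    return TRUE;
  }
}

// Hypothetical helper, not an existing stapler function.
function stapler_sketch_generate($template_file, $suffix, $parent, $dir) {
  $code = file_get_contents($template_file);
  $code = strtr($code, array('[suffix]' => $suffix, '[parent]' => $parent));
  // The file name is derived from the parameter values.
  $file = $dir . '/' . md5($suffix . ':' . $parent) . '.php';
  if (!file_exists($file)) {
    file_put_contents($file, $code);
  }
  require_once $file;
  return 'DeadAnimal_' . $suffix;
}

// For the demo, the template itself is written to a fresh temp directory.
$dir = sys_get_temp_dir() . '/stapler_sketch_' . uniqid();
mkdir($dir);
$template_file = $dir . '/DeadAnimal.tpl.php';
file_put_contents($template_file,
  "<?php\nclass DeadAnimal_[suffix] extends [parent] {\n"
  . "  public function isAlive() { return FALSE; }\n}\n");

$class = stapler_sketch_generate($template_file, 'Cat', 'Cat', $dir);
$deadCat = new $class();
```

Since the generated code is a real file, error messages and stack traces point to a file and line number, which is not the case with eval().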

We are almost there, but there are two tiny cosmetic flaws:
1. The [suffix] and [parent] will screw up syntax colors in most editors.
2. We cannot use the strings [suffix] and [parent] anywhere else in this file, since they will be replaced everywhere they occur. This is a rare problem, but it does make the solution less complete/universal.

Solution: Use in-file parameter aliases.
(similar to the custom heredoc / nowdoc keywords)

<?php
/**
 * Stapler aliases (yeah, we should get a more well-defined syntax for that)
 * [suffix] = xxx
 * [parent] = yyy
 */
class DeadAnimal_xxx extends yyy {
  function foo() {
    return "The string '[suffix]' will not be replaced.";
  }
}
?>
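One possible way to process such aliases is sketched below. The helper stapler_sketch_apply_aliases() and the exact "[name] = alias" line format are assumptions, since the alias syntax is not well-defined yet. Note that the alias strings (xxx, yyy) are now the ones that must not occur anywhere else in the file, but the template author gets to choose them.

```php
<?php
class Cat {}

// Hypothetical helper: reads the "[name] = alias" lines from the doc comment,
// then replaces every occurrence of each alias with the parameter value.
function stapler_sketch_apply_aliases($template, array $params) {
  preg_match_all('/^\s*\*\s*\[(\w+)\]\s*=\s*(\w+)\s*$/m', $template, $matches, PREG_SET_ORDER);
  $replacements = array();
  foreach ($matches as $match) {
    list(, $name, $alias) = $match;
    if (isset($params[$name])) {
      $replacements[$alias] = $params[$name];
    }
  }
  return strtr($template, $replacements);
}

$template = <<<'TPL'
/**
 * Stapler aliases
 * [suffix] = xxx
 * [parent] = yyy
 */
class DeadAnimal_xxx extends yyy {
  function foo() {
    return "The string '[suffix]' will not be replaced.";
  }
}
TPL;

$code = stapler_sketch_apply_aliases($template, array('suffix' => 'Cat', 'parent' => 'Cat'));
eval($code);
$deadCat = new DeadAnimal_Cat();
```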

Stapler (next version): Reusable mixins.

With our new code generation toolbox, we can now build completely synthetic inheritance chains.

<?php
// an md5-ish synthetic classname.
$deadWhiteCatClass = stapler_mix(array('Cat', 'Dead', 'White'));
// or a more human-readable classname.
stapler_define_mixed_class('DeadWhiteCat', array('Cat', 'Dead', 'White'));
?>

where different modules can have a say about which components should be added into the mix.
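A rough sketch of how such functions might work internally is shown below. These are stand-ins with a stapler_sketch_ prefix, not the real stapler functions. The sketch assumes that the first component is an ordinary class, that every further component is a layer template with a [class] and a [parent] placeholder, and it uses eval() to stay short.

```php
<?php
class Cat {
  public function isAlive() {
    return TRUE;
  }
}

// Hypothetical layer templates, keyed by layer name.
$stapler_sketch_layers = array(
  'Dead' => 'class [class] extends [parent] { public function isAlive() { return FALSE; } }',
  'White' => 'class [class] extends [parent] { public function color() { return "white"; } }',
);

// Stacks the layers on top of the base class, returns the topmost class name.
function stapler_sketch_mix(array $components, array $layers) {
  $parent = array_shift($components);
  foreach ($components as $layer) {
    // md5-ish class name, derived from the layer and everything below it.
    $class = $layer . '_' . md5($layer . ':' . $parent);
    if (!class_exists($class, FALSE)) {
      eval(strtr($layers[$layer], array('[class]' => $class, '[parent]' => $parent)));
    }
    $parent = $class;
  }
  return $parent;
}

// Same chain, but with a human-readable name on top.
function stapler_sketch_define_mixed_class($name, array $components, array $layers) {
  $parent = stapler_sketch_mix($components, $layers);
  eval("class $name extends $parent {}");
}

$deadWhiteCatClass = stapler_sketch_mix(array('Cat', 'Dead', 'White'), $stapler_sketch_layers);
stapler_sketch_define_mixed_class('DeadWhiteCat', array('Cat', 'Dead', 'White'), $stapler_sketch_layers);
$cat = new DeadWhiteCat();
```

Because the generated names depend on the whole chain below each layer, the 'Dead' template could be stacked on a Dog class in the same request without a name clash.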

Stapler: autoload is our friend

Stapler uses autoload as a trigger.
Whenever PHP wants to autoload a class that stapler recognizes as one of its synthetic hybrids, it will determine which code files need to be included or newly generated, or which code needs to be eval'd.
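The autoload trigger could be sketched as follows. To keep it in one file, the layer classes are stored as source strings and eval'd, where a real module would include a file. The variable names and the chain array are assumptions for the demo.

```php
<?php
class DbSelectBase {
  public function where($condition) {
    return "WHERE $condition";
  }
}

// Stand-ins for module files: class name => source code.
$stapler_sketch_sources = array(
  'MyDbSelect' => 'class MyDbSelect extends StaplerGlue_MyDbSelect { public function mine() { return "mine"; } }',
  'OtherDbSelect' => 'class OtherDbSelect extends StaplerGlue_OtherDbSelect { public function other() { return "other"; } }',
);

// The chain, as it might be determined from the enabled modules.
$stapler_sketch_chain = array('DbSelect', 'MyDbSelect', 'OtherDbSelect', 'DbSelectBase');

spl_autoload_register(function ($class) use ($stapler_sketch_sources, $stapler_sketch_chain) {
  if (strpos($class, 'StaplerGlue_') === 0) {
    // A glue class extends whatever comes next in the chain.
    $layer = substr($class, strlen('StaplerGlue_'));
    $pos = array_search($layer, $stapler_sketch_chain);
    if ($pos !== FALSE && isset($stapler_sketch_chain[$pos + 1])) {
      eval("class $class extends " . $stapler_sketch_chain[$pos + 1] . " {}");
    }
  }
  elseif (isset($stapler_sketch_sources[$class])) {
    // A layer class: "include" its source.
    eval($stapler_sketch_sources[$class]);
  }
});

// Declaring DbSelect triggers the autoloader for StaplerGlue_DbSelect,
// which in turn pulls in the rest of the chain, one class at a time.
class DbSelect extends StaplerGlue_DbSelect {}

$select = new DbSelect();
```

With this, nobody has to care about declaration order: each missing parent is resolved at the moment PHP asks for it.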

Comments

wapnik's picture

Heh you're still playing with these ideas :) Anyway, with that code generation you're not far from my own old experiment. As I remember, there were some bigger concerns regarding performance, so I generated the files on cache clear (in the case of the Views implementation). For useful error messages it's better not to use eval(), and give the generated files more meaningful names. The thing that worried me the most was the possibility of conflicts in such a dynamic inheritance chain. For example, you could have three modules, each with a class definition, where any two of them work together, but all three together do not work at all, or only in the right order. How to estimate the order?

donquixote's picture

How to estimate the order?

With stapler, hook implementations can set the weight of each inheritance layer.
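As a sketch of what that could mean: the layer info below stands in for what hook implementations might return, and ThirdDbSelect is a made-up third layer. The convention that lighter layers end up closer to the top of the chain is an assumption, as is the tie-break by name.

```php
<?php
// Hypothetical layer info, as collected from hook implementations.
$layers = array(
  'OtherDbSelect' => array('weight' => 10),
  'MyDbSelect' => array('weight' => -5),
  'ThirdDbSelect' => array('weight' => 0),
);

// Returns the full chain, from the topmost class down to the base class.
function stapler_sketch_chain($top, array $layers, $base) {
  $names = array_keys($layers);
  usort($names, function ($a, $b) use ($layers) {
    if ($layers[$a]['weight'] != $layers[$b]['weight']) {
      return $layers[$a]['weight'] < $layers[$b]['weight'] ? -1 : 1;
    }
    // Equal weights: fall back to the name, to keep the order deterministic.
    return strcmp($a, $b);
  });
  return array_merge(array($top), $names, array($base));
}

$chain = stapler_sketch_chain('DbSelect', $layers, 'DbSelectBase');
```

This makes the order predictable, but it does not detect the conflicts themselves. Two layers that override the same method will still silently shadow each other.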

give the generated files more meaningful names

I was thinking about two possible approaches:

  1. Every generated class gets a name describing its role.
    This can either be a one-off name, such as my_module_MyDynamicClass, or it can be a synthesized name, with the name of a content type etc mixed in.
    These synthesized classes are saved in files following a pattern similar to PSR-0, or whatever is the most practical. The location would be something like sites/all/stapler, or sites/all/codegen.
    The files are flushed on module update and enable/disable, and maybe some other occasions.
    And of course, you can flush it manually.

  2. Every generated class name includes the hash of a definition array.
    So, we first build an array of the layers and their parameters, sha it like git, and store it at this name.
    Whenever the array we build is different from that, we use a different file. Even if the role of the class or object is still the same.
    The downside: we'd have to build this array each time we want to instantiate the class. Or at least, in every request.
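The second approach could be sketched like this. The helper name and the structure of the definition array are assumptions. The point is only that the class name changes whenever the definition changes.

```php
<?php
// Hypothetical helper: class name = role + hash of the definition array.
function stapler_sketch_hashed_class_name($role, array $definition) {
  return $role . '_' . sha1(serialize($definition));
}

$definition = array(
  array('layer' => 'Cat'),
  array('layer' => 'Dead'),
  array('layer' => 'White'),
);
$name = stapler_sketch_hashed_class_name('DeadWhiteCat', $definition);

// A changed definition gives a different name, and thus a different file.
$changed = $definition;
$changed[] = array('layer' => 'Hungry');
$otherName = stapler_sketch_hashed_class_name('DeadWhiteCat', $changed);
```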

I think it all depends on the situation. How dynamic does it need to be, etc.
