Token enhancement proposal

Posted by fago on August 21, 2007 at 8:48am

Token is great, but there are still some problems that need to be solved. Here is a proposal that tries to address them.

Problems I noticed are:

missing handling of multiple values
no way to get the tokens of a certain node type and no others
no way to get token help for only raw values or only formatted ones, which might lead to XSS holes created by inexperienced users
performance: getting all token values for a node takes about 10ms!

I ran into the performance issue while developing workflow-ng. I think 10ms just for replacing a token is far from ideal. I noticed that user tokens work quite fast, so I assume it is the getting and caching of all token values that takes so long. So I propose to do the replacing in two steps: in the first step, it is determined for each token whether it is used in the text. In the second step, the value of each used token is determined and replaced.
So token only needs to get the values of the tokens that are actually used - this should be fast!
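
A minimal sketch of the two-step idea, not token's actual code. The function names and the $providers array (token name => callable) are hypothetical stand-ins for whatever the real API would use, and htmlspecialchars() stands in for Drupal's check_plain().

```php
<?php
/**
 * Step 1: collect the token names that actually occur in the text.
 */
function example_token_scan($text, $prefix = '[', $suffix = ']') {
  $pattern = '/' . preg_quote($prefix, '/') . '([a-z0-9_:-]+)' . preg_quote($suffix, '/') . '/i';
  preg_match_all($pattern, $text, $matches);
  return array_unique($matches[1]);
}

/**
 * Step 2: compute values only for the used tokens, then replace them.
 *
 * $providers (hypothetical) maps a token name to a callable that
 * computes the value of this token for the given entity.
 */
function example_token_replace($text, array $providers, $entity, $prefix = '[', $suffix = ']') {
  $replacements = array();
  foreach (example_token_scan($text, $prefix, $suffix) as $name) {
    if (isset($providers[$name])) {
      // Only now is the (possibly expensive) value computed.
      $replacements[$prefix . $name . $suffix] = call_user_func($providers[$name], $entity);
    }
  }
  return $replacements ? strtr($text, $replacements) : $text;
}

$node = (object) array('nid' => 7, 'title' => 'Hello <b>world</b>');
$calls = 0;
$providers = array(
  'nid' => function ($node) { return (string) $node->nid; },
  'title' => function ($node) { return htmlspecialchars($node->title, ENT_QUOTES, 'UTF-8'); },
  // Stands in for an expensive token; it is not used in the text below.
  'expensive' => function ($node) use (&$calls) { $calls++; return 'slow'; },
);
$result = example_token_replace('Node [nid]: [title]', $providers, $node);
```

Here the 'expensive' provider is never invoked, because [expensive] does not occur in the text. That is the whole point of scanning first.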

Unfortunately, this and the improved formatter handling require API changes. So consider this a proposal for token 2.x.

Here are more details:

This is how the token list could look

<?php
/**
 * Implementation of hook_token_list()
 */
function node_token_list() {
  return array(
    array(
      '#entity' => 'node',
      '#name' => t('content'),
      '#types' => array('page', 'story'),
      'nid' => array('#label' => t('label..')),
      'title' => array('#label' => t('content title'), '#formatter' => array('plain')),
      // ...
    ),
    array(
      '#entity' => 'my_own',
      'label' => array('#label' => t('label..')),
      // ...
    ),
  );
}
?>

A #formatter 'raw' is available by default and has to be supported by each token. I thought of these properties for a token:
#label - for the help
#formatter - any further formatters available (apart from raw)
#multiple - if there are multiple values, e.g. for CCK multiple fields or user roles

Note that I've used "entity" instead of "object", so it's clearer that arrays are supported too.

Token values

Then the hook_token_values implementation needs to be changed. It could look like this:

<?php
/**
 * Simple example implementation of hook_token_values()
 */
function node_token_values($token_info, $entity = NULL) {
  $value = $entity->{$token_info['#name']};
  // Raw values are returned as is, everything else is escaped.
  return $token_info['#formatter'] == 'raw' ? $value : check_plain($value);
}
/*
format of $token_info:
$token_info = array(
  '#entity' => 'node',
  '#types' => array('page'),
  '#name' => 'nid',
  '#formatter' => 'raw', // or 'plain', or whatever is specified
);
 */
?>

Token help

<?php
function theme_token_help($selection = array(), $formatter = 'raw', $prefix = '[', $suffix = ']') {
  // Get the token lists, merge the definitions for the same entities and
  // cache the result (in an extra function).
  // Then go through the selection and render the help.
}
?>

Example usage: theme('token_help', array('global', 'my_own', 'node' => array('#types' => array('page'))))
or just theme('token_help', array('global', 'my_own', 'node')) to get all node tokens.

The default tokens would be [content:title] or [my_own:label].
For theme('token_help', array('node')) the tokens would just look like this: [title]

$formatter may be 'raw', 'all' or 'formatted'. 'formatted' means all formatters but raw.
So if multiple formatters are available for one token the replacements would be:
[content:date:year] or [content:date:short]
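
The $formatter filter could be sketched roughly like this. The function name and the sample labels are hypothetical. The sketch also makes a simplifying assumption that the proposal does not spell out: a formatted token always carries its formatter name as a suffix, and the raw token never does.

```php
<?php
/**
 * Lists the token strings to show in the help for one entity definition.
 *
 * $formatter is 'raw', 'formatted' (every formatter except raw) or 'all'.
 */
function example_token_help_list(array $definition, $formatter = 'raw', $prefix = '[', $suffix = ']') {
  $tokens = array();
  foreach ($definition as $key => $info) {
    // Keys starting with '#' are properties of the entity, not tokens.
    if (strpos((string) $key, '#') === 0) {
      continue;
    }
    $base = $definition['#name'] . ':' . $key;
    if ($formatter == 'raw' || $formatter == 'all') {
      // 'raw' is always available, so it is not listed in '#formatter'.
      $tokens[] = $prefix . $base . $suffix;
    }
    if ($formatter == 'formatted' || $formatter == 'all') {
      $formatters = isset($info['#formatter']) ? $info['#formatter'] : array();
      foreach ($formatters as $name) {
        // Assumption: formatted tokens are always suffixed with the formatter.
        $tokens[] = $prefix . $base . ':' . $name . $suffix;
      }
    }
  }
  return $tokens;
}

$definition = array(
  '#entity' => 'node',
  '#name' => 'content',
  'nid' => array('#label' => 'content ID'),
  'date' => array('#label' => 'creation date', '#formatter' => array('year', 'short')),
);
$raw = example_token_help_list($definition, 'raw');
$formatted = example_token_help_list($definition, 'formatted');
```

A form that must never expose unescaped values would then ask for 'formatted' only, so raw tokens do not even show up in the help.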

multiple fields handling

Consider [content:textfield] to be a multiple textfield.

[content:textfield] should return an appropriate replacement for all textfields, e.g. comma separated.
[content:textfield][2] should return only the second textfield, if it exists.

Of course, we would have to add the information of the required field to $token_info, so that the module can return the appropriate value.
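
A rough sketch of how the optional index could be parsed. The function name is hypothetical, the index is taken to be 1-based (as "[2]" meaning "the second textfield" suggests), and returning an empty string for a missing value is an assumption.

```php
<?php
/**
 * Replaces "[name]" with all values and "[name][n]" with the n-th value.
 *
 * $values_by_token (hypothetical) maps a token name to its list of values.
 */
function example_token_replace_multiple($text, array $values_by_token, $separator = ', ') {
  return preg_replace_callback(
    '/\[([a-z0-9_:-]+)\](?:\[(\d+)\])?/i',
    function ($m) use ($values_by_token, $separator) {
      $name = $m[1];
      if (!isset($values_by_token[$name])) {
        // Unknown token: leave the text untouched.
        return $m[0];
      }
      $values = (array) $values_by_token[$name];
      if (!isset($m[2])) {
        // No index given: join all values.
        return implode($separator, $values);
      }
      // The index in the token is 1-based, PHP arrays are 0-based.
      $delta = (int) $m[2] - 1;
      return isset($values[$delta]) ? $values[$delta] : '';
    },
    $text
  );
}

$values = array('content:textfield' => array('red', 'green', 'blue'));
$all = example_token_replace_multiple('Colors: [content:textfield]', $values);
$second = example_token_replace_multiple('Second: [content:textfield][2]', $values);
$missing = example_token_replace_multiple('[content:textfield][9]', $values);
```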

enhanced entity handling

Often there are "foreign keys" which point to other entities, e.g. $node->uid or CCK node and user reference fields.
So, if one wants to use the user tokens for an author, currently we have to re-add all the user token definitions.
To avoid this I propose to make it possible to specify related entities.

Example node author:

<?php
/**
 * Implementation of hook_token_list()
 */
function node_token_list() {
  return array(
    array(
      '#entity' => 'node',
      '#name' => t('content'),
      'nid' => array('#label' => t('label..')),
      'author' => array('#entity' => 'user', '#label' => t('Content author')),
      // ...
    ),
  );
}

// When token tries to get the value, it first has to get the node author entity:

/**
 * Implementation of hook_token_values()
 */
function node_token_values($token_info, $entity = NULL) {
  if ($token_info['#name'] == 'author') {
    // Return the related entity instead of a plain value.
    return user_load(array('uid' => $entity->uid));
  }
  $value = $entity->{$token_info['#name']};
  return $token_info['#formatter'] == 'raw' ? $value : check_plain($value);
}

// Then it can get the value as usual.
?>

The token help would have to include all user tokens for a node. They could be used like this: [content:author:mail].
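
Resolving such a chained token could work roughly as in the sketch below. Everything here is hypothetical: the function name, the '#load' property (a callable standing in for the hook_token_values() call) and the $users array that stands in for user_load().

```php
<?php
/**
 * Resolves a token path such as array('author', 'mail') step by step.
 *
 * $definitions is keyed by entity type. A token with '#entity' returns a
 * related entity, and the rest of the path is resolved against that entity.
 */
function example_token_resolve(array $parts, $entity_type, $entity, array $definitions) {
  $name = array_shift($parts);
  if (!isset($definitions[$entity_type][$name])) {
    return NULL;
  }
  $info = $definitions[$entity_type][$name];
  $value = call_user_func($info['#load'], $entity);
  if (isset($info['#entity']) && $parts) {
    // $value is a related entity: continue with the remaining path.
    return example_token_resolve($parts, $info['#entity'], $value, $definitions);
  }
  // Any remaining parts (e.g. a formatter) are ignored in this sketch.
  return $value;
}

// Stand-in for user_load().
$users = array(3 => (object) array('uid' => 3, 'mail' => 'alice@example.com'));
$definitions = array(
  'node' => array(
    'nid' => array('#load' => function ($node) { return $node->nid; }),
    'author' => array(
      '#entity' => 'user',
      '#load' => function ($node) use ($users) { return $users[$node->uid]; },
    ),
  ),
  'user' => array(
    'mail' => array('#load' => function ($user) { return $user->mail; }),
  ),
);
$node = (object) array('nid' => 7, 'uid' => 3);
// Corresponds to [content:author:mail] once the entity name is stripped.
$mail = example_token_resolve(array('author', 'mail'), 'node', $node, $definitions);
```

This way the user tokens are defined once, and the node definition only declares that 'author' is of entity type 'user'.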

callbacks

We could also go for callbacks instead of hook_token_values. I think this way the code could get even a bit cleaner, and we save some time by not invoking the hook.

<?php
/**
 * Implementation of hook_token_list()
 */
function node_token_list() {
  return array(
    array(
      '#entity' => 'node',
      '#name' => t('content'),
      '#types' => array('page', 'story'),
      'nid' => array('#label' => t('label..'), '#callback' => 'node_token_values'),
      'title' => array('#label' => t('content title'), '#formatter' => array('plain'), '#callback' => 'node_token_values'),
      // ...
    ),
  );
}
?>
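
On the token side, the dispatch could then look roughly like this. It is a sketch only: the function names are hypothetical, and htmlspecialchars() stands in for check_plain() so that the example runs outside of Drupal.

```php
<?php
/**
 * Example value callback, with the signature used by node_token_values().
 */
function example_node_token_value($token_info, $entity) {
  $value = $entity->{$token_info['#name']};
  return $token_info['#formatter'] == 'raw' ? $value : htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

/**
 * Gets one token value by calling the token's '#callback' directly.
 */
function example_token_value(array $definition, $name, $formatter, $entity) {
  if (!isset($definition[$name]['#callback'])) {
    return NULL;
  }
  $callback = $definition[$name]['#callback'];
  if (!is_callable($callback)) {
    return NULL;
  }
  $token_info = array(
    '#entity' => $definition['#entity'],
    '#name' => $name,
    '#formatter' => $formatter,
  );
  // No hook invocation: only the callback of this one token runs.
  return call_user_func($callback, $token_info, $entity);
}

$definition = array(
  '#entity' => 'node',
  '#name' => 'content',
  'title' => array(
    '#label' => 'content title',
    '#formatter' => array('plain'),
    '#callback' => 'example_node_token_value',
  ),
);
$node = (object) array('title' => 'Tom & Jerry');
$raw = example_token_value($definition, 'title', 'raw', $node);
$plain = example_token_value($definition, 'title', 'plain', $node);
```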

Opinions?
I could implement the changes, but I think I would need some help converting all the existing token hook implementations.

Comments

thought I commented...

Posted by greggles on October 19, 2007 at 12:48pm

I thought I commented on this, but I don't see it.

First, I think it's great that you are putting so much thought and effort into making token better.

The problem I see with this is that it is pretty complex. As it is now module developers are hesitant to add the token_list and token_values hooks to their project. That said, I think this is basically necessary for this to become scalable and provide all the features that people need...

My personal focus is on making token robust. Regardless of how we structure the API there are situations where the tokens get populated with bad data or not populated at all. That is where my focus is.

I know Eaton likes to work on the new stuff - hopefully the two of you can sort this out (hint, hint, eaton!).

--
Knaddisons Denver Life | mmm Chipotle Log | The Big Spanish Tour

knaddison blog | Morris Animal Foundation

concentrate

Posted by fago on October 24, 2007 at 3:43pm

hi!
yep, I agree that it gets a bit more complex, so I suggest we concentrate on the more important points. I think they are:

there is no way to get token help for only raw values or only formatted ones, which might lead to XSS holes created by inexperienced users
improving performance

ad1. urges us to introduce the possibility to specify some metadata about a token. If we do this right, we could add more features later, without changing much code.

ad2. As I've already described:
I propose to do the replacing in two steps: in the first step, it is determined for each token whether it is used in the text. In the second step, the value of each used token is determined and replaced. So token only needs to get the values of the tokens that are actually used - this should be fast!
For this I think a callback system would fit well.

If we can agree on these points, I can help getting this into 6.x.
Imho the sooner the better, so that other module authors can use a stable token API for 6.x

How about a token configuration page?

Posted by cwgordon7 on October 21, 2007 at 12:52am

Why not create a token configuration page, where the admin selects which tokens (or sets of tokens, in the case of modules) to enable, and which tokens to leave disabled? This should be relatively simple; compile a list of default tokens and modules providing tokens, and don't display/replace tokens that aren't enabled.

This may not be the ideal solution, but it would, say, let me speed up page load times by disabling all tokens from the og module, none of which I use. It also does not complicate the use of the api, as nothing has changed.

that's a solution with the wrong timeline

Posted by greggles on October 21, 2007 at 9:01pm

So, for "token2" that shouldn't be a problem because of the formatters and/or callbacks.

For token1, that's a little more work on "adding features" than I think makes sense given my current goal of working primarily on stability and robust token parsing.

more information on performance

Posted by greggles on September 3, 2008 at 5:07pm

Here is a benchmark on token performance specifically for profile fields:

http://drupal.org/node/125640#comment-992782

http://drupal.org/files/issues/after_profile_patch.txt
http://drupal.org/files/issues/before_profile_patch.txt

Basically it shows that there was no difference from those tokens! I still need to repeat the test to try to confirm it, but I'd say that this might show that we are being overly cautious in keeping tokens out of the module for fear of performance issues...

--
Growing Venture Solutions | Drupal Dashboard | Learn more about Drupal - buy a Drupal Book

knaddison blog | Morris Animal Foundation
