Summary of Configuration Management sprint

webchick's picture

Last week, before DrupalCamp Colorado, Greg "heyrocker" Dunlap, David Strauss, Larry "Crell" Garfield, and Karoly "chx" Negyesi met to hash out architectural details and whip up some prototype code for the Drupal 8 Configuration Management initiative.

What problems are we trying to solve?

  • Currently there is no good way to move Drupal configuration information between environments because this data is scattered throughout the database in a variety of formats, oftentimes intermingled with content.
  • This also makes it impossible to version control this information, to store history, and to be able to rollback changes.
  • Every module stores their configuration data in a different format, there is no standardization at all, even within core.
  • There is also no standard API for saving this information (aside from the simple case of the variables table) so developer often roll their own solutions.
  • The entire contents of the variables table is loaded on each page request, even for rarely-accessed data, leading to memory bloat.
  • It is cumbersome to manage information that is different between server environments for the same project (database information, api keys, etc.)

We specifically are NOT (yet) trying to solve the problem of contextual configuration; only the underlying API that handles getting/setting this configuration data and shuffling it around to different sites.

The code that was developed at the sprint as a prototype is available at http://drupal.org/sandbox/heyrocker/1145636 for your reviewing pleasure. The main things to look at are the includes/config.inc and modules/config/config.test files.

What follows is a summary of the results. Your feedback is welcomed!

Overall Architecture

The proposed configuration system employes a three-level architecture:
Diagram summarizing the information below.

Level 1: Signed file storage

At the lowest level, all configuration data will be stored on-disk using digitally signed JSON files in a sites/$sitename/config directory. See also "Security considerations" below for more information on the information in this section.
Individual configuration files (prefix.example.json.php) will look like the following:

<?php die(); 723fd490de3fb7203c3a408abee8c0bf3c2d302392[snip]
{
  "string_value": "string",
  "integer_value": 1,
  "array_or_object_value": [
    "value1",
    "value2"],
  "boolean_value": false,
}

The first line prevents direct browsing of the file, as well as provides a digital signature to verify the integrity of the files.
Typically, a module will expose one such file with all of their configuration data, like module.$modulename.json.php. Core might do files like core.site_information.json.php, and something like Flag which other modules hook into to provide "discoverable" stuff might do both a module-specific configuration file module.flag.json.php for "global" module settings, as well as define a flag.$flagname.json.php.
The advantage of storing configuration this way is that rather than the current system where the entire contents of tha variable table is loaded into memory on every page request, thus limiting the data that can be stored there, we can instead target loading of only specific parts of configuration we need for the current request.

Level 2: Active configuration wrapper

This layer moves the configuration data from the filesystem to something that can be read and accessed much more readily. For the vast majority of Drupal sites this will be database storage, but for high-performance sites could be something like MongoDB or Redis.
In the default configuration, this will push data into a central table called "config":

CREATE TABLE config (
   name varchar(255) NOT NULL DEFAULT '' COMMENT 'The identifier for the  configuration entry, such as module.example (the name of the file,  minus .json.php).',
  data longtext NOT NULL COMMENT 'The raw JSON data for this configuration entry.',
  PRIMARY KEY (name),
);

While at first glance the structure looks like the "variables" table in Drupal 7 and below the fundamental difference is this table stores configuration objects (say, every site information setting: name, mission etc) the variable stored single values (like the site name). Also, as said above, we are not loading the whole table in memory.
All site configuration data gets read out of this wrapper. The data here gets updated on two conditions:

  1. UI changes (automatic): When the save button is clicked on an admin page, data gets written to both the wrapper and the .json.php file from layer 1 (the digital signatures for the changed file gets regenerated).
  2. Code changes (manual): When migrating configuration from dev to prod, for example, the underlying files will have changed, but the data in the database table will not. Data will continue to be read from the active store so that the site doesn't break. Site admins can replace the contents of the active store with the contents on disk via an administrative interface and/or a drush command.

Level 3: Configuration API

At this level are the actual API functions that module developers will interact with in order to manipulate configuration values; essentially, a replacement for variable_get()/variable_set().

<?php
// Load a set of configuration out of the active store.
// 'prefix.name' refers to the filename of the .json.php file, without the extension.
$config = config('prefix.name');
// Access a single value out of the store.
echo $config->value;
// Change a value and save it back to both the active store and the filesystem.
$config->value = 'new value';
$config->save();
?>

I'm a module developer. How do these proposed changes affect me?

variable_set()/variable_get()

In Drupal 7 and below, all variables are global, so accessing and saving them is done this way:

<?php
// Load the site name out of configuration.
$site_name = variable_get('site_name', 'Drupal');
// Change the site name to something else.
variable_set('site_name', 'This is the dev site.');
?>

In Drupal 8, configuration will only be lazy-loaded when needed. The above code would therefore change as follows:
<?php
// Load the site name out of configuration.
$site_name = config('core.site_information')->name;
// Change the site name to something else.
$config = config('core.site_information');
$config->name = 'My Awesome Site';
$config->save();
?>

For "discoverable" chunks like entities or views, you can load all "sub-configuration" files (e.g. list of all views, list of all node types) with the following call (exact API call TBD): config_get_names_with_prefix('entity.node.');. This will return something like array('entity.node.page', 'entity.node.article');. So to retrieve all settings for the page node type run config('entity.node.page'). As a side note, this does mean that config('entity.node') will not return anything (TBD whether it throws an exception).

Declaring your own configuration options

In the past, configuration was declared in a mish-mash of ways. system_settings_form() would automagically save all form elements as variables in the variable table. Modules like Flag and Views would employ a hook_X_default() type of pattern, and so on.
Under the new system, declaring your own "variable" style configuration options happens in a module.$modulename.json.php file shipped with your module. You declare the defaults for these variables in just this one place, as opposed to every time you try and retrieve them.
An example file might look like:
module.book.json.php:

<?php die();
{
    "book_child_type": "book",
    "book_block_mode": "all pages",
    "book_allowed_types": [
        "book"
    ]
}

During module installation, the configuration file will be copied to a user's sites/$sitename/config directory, and from then on the site-specific settings will be stored there.
Defining default views, flags, etc. could be done much the same, but ultimately get stored in the active store and .json.php files.

But my configuration data is way more complex than that!

Because we are using JSON as a storage format, configuration objects can be arbitrarily complex. There is, however, one catch. JSON does not differentiate between an object and a hash (what PHP calls an associative array or dictionary), so we are only able to support one of them. After some discussion we determined that mapping all such data to an associative array would provide the most robust structure, as it allows for discrete ordering of properties, and $object->1 is invalid.

<?php
$config
= config('module.mymodule');
$config->alternate_name; // A simple property
$config->instances[0]['name']; // An array
// You can't do this...
$config->foo->bar->baz = 1;
// But you can do this instead...
$config->foo['bar']['baz'] = 1;
?>

If your configuration data is especially complex, you're probably a fancy-pants developer and want to use a custom configuration class.

I'm a super-fancy-pants developer and need something more powerful than this. How do I override it?

The $config object will be a of a common class that provides the ->save() logic. For advanced use cases, it will be possible to override it, like so:

<?php
class MyAdvancedConfig extends DrupalConfig {
  function
someHelperMethod() {
   
$this->pants = 'fancy';
    echo
"Ma'am, you have my compliments on your fancy pants.";
  }
}
$config = config('core.site', 'MyAdvancedConfig');
$config->someHelperMethod();
$config->save();
?>

Migrating configuration from dev to prod

The overall workflow here would be as follows:

  1. On your development server, perform whatever configuration changes are needed through the UI. Create Views, check checkboxes, etc. These changes will get written to *both* the database table *and* the file system so the two are in sync (it will also re-generate the digital signatures of the underlying files).
  2. When finished with your edits, review the changes in the sites/$sitename/config directory with $vcs diff (or your tool of choice) to confirm they look as expected.
  3. If everything looks good, move the changed files to your production server in the usual way (SFTP, $vcs add/commit/push, $vcs update/pull). Nothing will immediately change, as your site will still be reading from the active store.
  4. Finally, go to admin/configuration/system/config or run drush config-update (note: final location/command likely different; making these up right now). This will outline the differences between on-disk and the active store. If the changes seem good, go ahead and confirm to overwrite the content of the active store with the on-disk configuration. The admin tool will offer to regenerate the file signatures if necessary.

Handling local overrides

It's often handy, for things like API keys or $db_url, to have site-specific configuration that is not checked into version control.
In Drupal 7 and below, you'd do the following in settings.php (or settings.local.php):

<?php
$conf
['site_name'] = 'This is the dev site';
?>

Under the proposed system, there would be a special file where all local overrides are kept (this file would not be checked into version control):
/sites/default/config/local.json.php
Within this file, would be particular keys/values that you're overriding on a per-value basis:
<?php die(); 723fd490de3fb7203c3a408abee8c0bf3c2d302392[snip]
{
  "core.site_information": {"site_name": "This is the dev site"}
}

Security considerations

A bunch of talks were had with greggles, coltrane, and chx from the Drupal security team about the security of the web UI writing files directly to disk. Since these files may contain extremely sensitive data, such as database passwords and whatnot, it's imperative that they not be readable by outside eyes, nor writable by rogue processes.
Here's what was figured out:

  1. Within the root /sites/sitename/config/ there will be a .htaccess that does "Deny from all" (and IIS web.config that does the equivalent) so that these configuration files cannot be read at all, under normal conditions. Attempting to access the files directly will result in a 403 error.
  2. However, "just in case" this protection fails (or for web servers that do not support these override systems), all configuration files end in a .php extension. In this case, if you access them directly you will just get a PHP syntax error. There will also be a status check under admin/reports/status that throws flaming red warnings if .htaccess protection is turned off.
  3. In the root of the /sites/sitename/ there will be a read-only key.php file that will be written during installation, and contains a site-specific private key. This private key will be used to digitally verify that the contents of configuration files are not roguely changed on disk. An example key.php would be:
    <?php die(); c3a408abee8c0bf3c2d302392
  4. The first line of all configuration files will consist of a automatically-generated digital signature, based on the private key:
    <?php die(); 723fd490de3fb7203c3a408abee8c0bf3c2d302392[snip]
    // All JSON data follows here.

    If there's a mismatch between this hash and the file contents, the configuration system will refuse to load the file unless the administrator explcitly approves it. This will prevent someone from writing mischievous values to the file through some other exploit.
  5. The goal here is "defense in depth", so an attacker would need to break through multiple layers in order to break anything undetected.

Comments/Feedback welcome!

Please chime in with your thoughts on this architecture, play around with the code, find holes, etc. Let's discuss! :)

Comments

Overall this seems like a

pwolanin's picture

Overall this seems like a reasonable direction.

One simple suggestion for enhancing security might be to make it possible to omit the secret key file and require the user to enter the key into the UI or drush in order to load configuration from disk. This would alleviate the need to have the secret key file checked into the $vcs.

We'd need to decide how the absence of a key file affects the saving of configuration - at the very least it would obviously have to prevent writing out configuration to disk without manually entering the key.

In any case, some additional option like this could further enhance the lock-down of configuration on a production site if desired.

Maybe in generation two?

David Strauss's picture

One simple suggestion for enhancing security might be to make it possible to omit the secret key file and require the user to enter the key into the UI or drush in order to load configuration from disk. This would alleviate the need to have the secret key file checked into the $vcs.

We'd need to decide how the absence of a key file affects the saving of configuration - at the very least it would obviously have to prevent writing out configuration to disk without manually entering the key.

I like this idea, but I'd prefer to put it in the backlog rather than block until it's implemented. There's certainly nothing in the current architecture that would prevent this from happening. We could even do fun things like saving the key in a short-lived secure cookie for a bit to allow updating site configuration on several screens without constant interruption.

Additionally, if we're adding more security around the key, we could opt for public and private keys (if PHP has reasonable support). It would be great to be able to verify files but not sign them with the key on disk.

Well, it has

Two Questions

mfer's picture

I have two questions I'm curious to see addressed.

  1. A module, when installed, has its settings copied to sites/$sitename/config. What happens when these are changed from the defaults, the module creates/deletes variables, and the updated module is installed? How are these changes managed?
  2. When someone is developing new code in a module that creates new variables or deletes old ones, what is the workflow for doing this? Do I need to add the variable, go somewhere to generate a key, apply an update from the defaults to what my site has in sites/$sitename/config, etc? What does this process look like?

1) They would have to write

heyrocker's picture

1) They would have to write an update hook (just like now if you make a change to schema API)

2) You do need to go to the defaults file and add the variable. Then in your update hook, you would update the live file and call the API function to re-sign this file. Then you would call an API function to reload the active store from the files. That's pretty much it. There will also be some other tools to manage this. For instance I assume there will be drush functionality to re-sign all files/a single file.

Or even better...

David Strauss's picture

Greg is right here, but I'd like to add that it wouldn't be hard for a contrib module (or even core!) to provide a function call that copies over any new defaults present in the module's defaults file but completely missing from the site's configuration. This would work for new, top-level configuration keys.

Minor nitpick

dmitrig01's picture

It'd be great to have a different directory name than sites/$sitename/config -- this directory would be useful for things like automated backups as well. Could we think of a name in which both could coexist? (http://drupal.org/node/1186026 for example)

Thoughts after a quick read

bjaspan's picture

My initial thoughts after a quick, sleepy read:

  1. Copying declarative data describing site configuration does not address the need for imperative statements (i.e.: update functions) to migrate from one config to another. For example, when you migrate Field API settings, you have to perform database changes. It may not be (probably is not) possible to determine the correct sequence of events in all cases automatically by inspecting the before and after declarations.

  2. It seems like the config files should be stored out of the web server's docroot directory entirely, just like the private files directory should be. The sites directory could store the path to this location so we do not have to require the same directory structure at all hosts. Managed Drupal hosting environments like Acquia Cloud and Pantheon could then write the location they create for this purpose into the file during deployment, thereby telling Drupal where to read/write the files.

  3. Perhaps I am missing something, but it is not immediately obvious to me what benefit the read-only key file provides. An attacker that can write to the config files can write to the key file. I can almost think of a way to make this useful: The key file can be owned by a user other than the webserver and (if different) the PHP user, but have the same group as the PHP user. This allows PHP code to read the file, but even an attacker that takes control of the web server or PHP cannot modify it unless he also breaks root on the server (in which case, game over). This is only almost useful because if the key file perms are that way, then there is no reason all the code file perms cannot be that way, again making the key file not useful. (FYI, Acquia Cloud actually uses this approach for all of a site's code.) What am I missing?

Great work, though. This is absolutely a critical requirement for Drupal, and I'm glad to see the list of excellent people working on the problem. ;-)

config location

catch's picture

If the files were to always be stored outside the webroot, would there be a specific need for the <?php die; ? I can't think of anything that offers apart from making the files non web-readable which seems like it could be equally handled by storing outside the webroot, but may have missed something.

I started to reply to

mike stewart's picture

I started to reply to @dmitrig01 (above)... IMO, automated backups should be outside of webroot, but when it comes down to it... so should any files that contain passwords or other sensitive (hacker useful) info. this would also address problem like needing: .gitignore (sites//settings.php)

IMO, it should be optional during install... the secure/right way (that implies you need to understand the ramifications of what you're doing), and a default wordpress/"easier way." (cuz I don't really want to know what I'm doing). I feel there is a use case for both and Drupal should cater to both.

--
mike stewart { twitter: @MediaDoneRight | IRC nick: mike stewart }

No, but we have to ship

heyrocker's picture

No, but we have to ship Drupal with a default that is inside webroot (just like we do with private files) and thus we need to have a solution that works with that setup. I don't really have a good solution to that, would love to hear more suggestions.

If we offer the option to

catch's picture

If we offer the option to store config outside the webroot, and there are .htaccess/webconfig rules for them when they're inside the webroot, can that not be enough? There's only so much we can do, and there is always a trade-off between flexibility (i.e. having the json files be valid json without any trickery) vs. stopping people shooting themselves in the foot.

I agree. It is really

moshe weitzman's picture

I agree. It is really bothersome that the files are not not plain old .json. The die() solution is quite clever, but it is a DX pest. catch's proposal is a good compromise IMO.

It's true that the die() is

greggles's picture

It's true that the die() is an annoyance, but it's a matter of defence in depth that I think is worth the pain.

See http://www.youtube.com/watch?v=AnOAeVaU5xM for an example of how the lack of defense in depth lead to exposure of backup_migrate files. Everything before 4 minutes is background, 4 minutes is where he starts to actually get the data.

Since it seems we're are

glennpratt's picture

Since it seems we're are building DB storage for this from the outset, why don't we just skip config file storage if you don't have secure private files?

The header also contains the signature

David Strauss's picture

As long as the files are writable by the web server, we'll need signatures. And, as long as we need signatures, we'll need a place to stash them. The first line of the file currently serves a dual purpose of both armoring the file from reading ("<?php die();") and storing the signature hash. Since we'll still need most (by byte count) of the content stored on the first line even outside the web root, I would probably opt for keeping headers the same, including the PHP armoring, even outside the web root.

Make it "configurable"

chx's picture

We can add a constant to index.php say define('DRUPAL_CONFIG_DIR', 'site/%site/config'); and then we can work from there. %site will be str_replace'd with conf_path, if the first char is not / then DRUPAL_ROOT is attached. This is so low level that any solution involving any sort of variable wouldnt quite work -- where are you going to read that variable from :) ?

I'm not sure that this would

cweagans's picture

I'm not sure that this would be a good solution.

First problem: it encourages people to hack core. Yes, it's one line. Yes, it's designed to be modified. I still don't like the idea of asking people to modify index.php.

Second problem: If I'm running a multisite install with user1.com, user2.com, and user3.com, and I want to store config for those sites in each of those users' home directories on the server that's running the site (for whatever reason), how would I do that? Are you suggesting that I put that logic into index.php? That doesn't seem very clean to me.

Alternate solution: we already will have a file that holds the signing key. Can we add some options to that file that will allow defining where the config is located?

--
Cameron Eagans
http://cweagans.net

sites.php

catch's picture

This sounds like a good use for sites.php to me.

Also sleepy, but similarly curious about the key file

muriqui's picture

Like bjaspan, I'm probably too sleepy to be thinking about this, but I have a quick question about the key file as well... If I'm an attacker and I've already gotten far enough that we're worried about me writing a malicious value to the config files through some unknown exploit, then what's to stop me from using such an exploit to read the key file and perform the same automagic calculation to sign my changes to the config in the same way that Drupal would when changing a config value?

Agreed. I don't understand

glennpratt's picture

Agreed. I don't understand how we can securely hash the file without a public/private key system that requires the user to submit the private key to make changes.

Anything less seems like security through obscurity.

Incorrect

David Strauss's picture

That's incorrect. Unless someone dropping malicious configuration files has the ability to either (1) read the key or (2) overwrite the key, the signatures provide us with tamper-evident configuration files. A vulnerability allowing writing of arbitrary files to files/ or config/ wouldn't generally allow (1), and (2) is avoidable by having the key file be read-only and owned by a user other than the web server's.

Also, as someone who frequently engineers systems for security, I'm quite annoyed when people claim "security through obscurity" without any evidence that the security of the system hinges on lack of knowledge of the design. A system doesn't merely "seem like" security through obscurity; it's the accuser's burden to demonstrate what part of the system design being public knowledge weakens it.

Thanks for the explanation. I

glennpratt's picture

Thanks for the explanation. I said "didn't understand" and "seems like"; truly didn't mean to offend. Your comment will help others understand, since I'm one of at least three in this thread.

The config directory will be writable by the process running PHP and this is protecting against the chance that someone could drop a file in config without executing arbitrary PHP code or otherwise gaining access to the file system.

The config directory will be

David Strauss's picture

The config directory will be writable by the process running PHP and this is protecting against the chance that someone could drop a file in config without executing arbitrary PHP code or otherwise gaining access to the file system.

Yes, that's correct.

Thanks for the explanation. I said "didn't understand" and "seems like"; truly didn't mean to offend. Your comment will help others understand, since I'm one of at least three in this thread.

The phrases "don't understand" and "seems like," as you used them, serve more to soften the blow of criticism than to express uncertainty. Consider the following: "I don't understand how we can build a bridge 50 feet long with a pile of only 10 bricks. It seems like you didn't consider the necessary materials to actually construct this bridge." It's a pretty similar statement, and the speaker doesn't have a doubt in her mind that building the bridge is impossible.

There are better ways to ask for an explanation (if that's what you're seeking) than to add qualifiers to a critical statement. For example: "Knowing how the key works and having heard about 'security through obscurity' risks, I would like to understand whether this system has that problem."

And, to clarify, I'm not offended. I just often encounter claims of "security through obscurity" based on a gut feeling or requiring data (usually a private key) unique to a single deployment to be secret, rather than requiring the design or implementation to be secret (which is actual security through obscurity). It gets tiresome to refute these claims because they're not well-founded, leaving no way to argue against them other than defining what "security through obscurity" is and pointing out that there's no basis for the claim. I would much rather explain the concept in a non-confrontational context.

The more I consider the

Josh Benner's picture

The more I consider the key/hash approach, the more I tend to agree with muriqui and others who point out that a compromise allowing writing to config files is likely to be a compromise that allows me to read the site's key. Like glennprat says, adding the extra step of signing seems to be just that: an extra step (obscurity) -- as opposed to an extra layer.

Layers of security seem to best be identified by their distinct natures, not by the number of actions required to take advantage of a vulnerability. For instance, requiring a user to enter a root password for each sudo command is not multiple layers of security, but rather multiple instances of a single layer, that once bypassed, is no longer effective in any of its instances.

In the case of writing to the config file vs. signing it -- is this not actually slightly different instances of the same layer of security (privileged file access)? Of course, this is based on the assumption that if I can write to your config files, I can also read your key file. Is this assumption false?

The more I consider the

David Strauss's picture

The more I consider the key/hash approach, the more I tend to agree with muriqui and others who point out that a compromise allowing writing to config files is likely to be a compromise that allows me to read the site's key.

Any compromise that would allow you to drop a file and read the key would also be a compromise that lets you read the current settings.php file. You haven't demonstrated any new attack vector.

Hopefully, some answers

David Strauss's picture

Copying declarative data describing site configuration does not address the need for imperative statements (i.e.: update functions) to migrate from one config to another. For example, when you migrate Field API settings, you have to perform database changes. It may not be (probably is not) possible to determine the correct sequence of events in all cases automatically by inspecting the before and after declarations.

Imperative changes would have to happen when the site updates the "active" configuration from the on-disk one. We would certainly confirm with the administrator if anything potentially destructive would happy by applying the new configuration.

When merely looking at the current and desired states is insufficient for (safely) applying the configuration changes, I would imagine some hinting would happen to guide the process. Let's say you had a field named "Prince." If you renamed that field, the updated field might include in its configuration the item: {"formerly_known_as": "Prince"} so that a system could know that it should rename the field "Prince" to the new name, not delete "Prince" and create a new field. Hinting could even record information about multiple previous states, if a site is worried about two renames happening before the first is fully deployed. Ideally, hinting can get cleaned up, but I doubt even large sites would have too much cruft from this sort of data.

More likely, we'll transition more things to be tracked by UUIDs internally or keep the machine name immutable. Then, no hinting would be necessary to identify the configuration lineage of items like fields.

It seems like the config files should be stored out of the web server's docroot directory entirely, just like the private files directory should be. The sites directory could store the path to this location so we do not have to require the same directory structure at all hosts. Managed Drupal hosting environments like Acquia Cloud and Pantheon could then write the location they create for this purpose into the file during deployment, thereby telling Drupal where to read/write the files.

We intend to support this configuration, but we didn't consider it critical to the basic system design.

Perhaps I am missing something, but it is not immediately obvious to me what benefit the read-only key file provides. An attacker that can write to the config files can write to the key file. I can almost think of a way to make this useful: The key file can be owned by a user other than the webserver and (if different) the PHP user, but have the same group as the PHP user. This allows PHP code to read the file, but even an attacker that takes control of the web server or PHP cannot modify it unless he also breaks root on the server (in which case, game over). This is only almost useful because if the key file perms are that way, then there is no reason all the code file perms cannot be that way, again making the key file not useful. (FYI, Acquia Cloud actually uses this approach for all of a site's code.) What am I missing?

The document must be unclear. We intend the key file to be owned in a way the web server cannot write to it or gain write privileges, as you suggest. The intention of a read-only key file is to allow configuration files (since they're writable by the web-server) to be tamper-evident if an attacker exploits a security hole in a module's file handler to overwrite or add a configuration file. If, say, a flaw in a file attachment module allowed you to write to the configuration store, it's unlikely that same security hole would allow you to read the signing key or force the server to sign the file. It's defense in depth.

OK, that makes some sense, but...

muriqui's picture

I get what you're saying here, but without getting into a debate over how likely or unlikely it is that a flaw would allow access to both write the config and read the key, I'm still concerned that having the signature inside of the same file that it's supposed to be verifying (and thus able to be rewritten in the same operation as the config data itself) will prove less secure than it seems. Could we perhaps store the verification signatures separately? An exploit that allows write access to one file and read access to another may be unlikely as you suggest, but an exploit that allowed write access to two separate files in two separate locations would imply that the attacker has unfettered write access to the filesystem, in which case there's worse they can do than overwrite config.

Yes

Crell's picture

Could we perhaps store the verification signatures separately?

That's the plan. The key would live in sites/default/key.json.php (or something like that), while the files signed by it would be in sites/default/config/. key.json.php would not be apache-writeable. It would be locked down about as much as settings.php is now, and even in the same directory. (If we can make the install process suck less than it does now in that regard for setting up the file and permissions it would be great, but that's another story.)

You're correct that if someone manages to get access to the sites/default directory to write arbitrary code there, you've got bigger problems as it's much easier to just hack a .module file directly at that point. :-)

Actually, I think you

muriqui's picture

Actually, I think you misunderstood my comment. I get that the key is stored separately. My concern is that the computed digital signature for a config file is written at the top of the config file itself, and thus any exploit that allowed you to tamper with the config file would also allow you to overwrite its computed signature.

David was of the opinion that it was unlikely for an exploit to allow both write access to a config file and read access to the key file. I'm less sure of that, but that's hypothetical and not worth debating. Assuming it WAS possible, though, the attacker could then recompute the signature and cover his tracks. My question was whether it is feasible to also store the computed digital signature outside of the config file. That way, the two would only be writable via the same exploit under a worst case scenario in which overwriting config would be the least of our concerns.

Ah

Crell's picture

Ah, I see. Yes, I did misunderstand what you meant.

We originally had the has file separately: foo.bar.json and foo.bar.hash. However, after conferring with the security team members present and adding the <?php die(); header, we concluded that since the file was no longer "pure" JSON anyway we could just toss the hash in there as well and safe a file operation.

We could still store the hash in a separate file, but if it's in the same directory then there's no change in security. The odds of someone gaining access to edit sites/default/config/foo.bar.json.php but not sites/default/config/foo.bar.hash.php are, well, I'm not actually sure how you could do that. You'd have to move the hash files to another directory, and I'm not sure how a second writable directory (sites/default/config_hash?) with the same permission settings as the config directory would be an improvement, since both would still need to be apache-writeable.

True enough.

muriqui's picture

OK, I concede. :) And yes, I was thinking of having the hashes in another directory, but you're right; there's not much (if anything) to be gained by that. I guess that makes me "+1" on this plan.

We're using the standard approach

David Strauss's picture

It's completely standard digital signature procedure for the signature to accompany the data being verified. It's how DKIM for SMTP works (in an SMTP header). It's how SMIME and PGP/MIME work (in the body or as attachments). If you're at the point where someone could update the data and resign the content, the signature system is considered compromised.

Now, the configuration model we're using still requires another weakness in the system for you to even get to the point where, having the private key, you could cause trouble. The signature system is defense in depth.

It's how DKIM for SMTP works

muriqui's picture

It's how DKIM for SMTP works (in an SMTP header). It's how SMIME and PGP/MIME work (in the body or as attachments).

True, but in those systems, the private key is typically not sitting within reach of the signed content when it is verified. The private key remains with the sender, while verification by the receiver uses a separate public key... Not that I am arguing for such a model in Drupal, mind you, just making a comparison. The signature system is a good idea, but it's not a full-blown PKI for digital signatures, nor (practically) can it be and we shouldn't regard it as such.

The signature system is defense in depth.

And yes, I get that. I'm well versed in security engineering, too. The "in depth" line has been mentioned several times in this discussion... With respect, I don't think it needs more repetition. I fear (from reading some of the other comments here), that this discussion is becoming needlessly heated between the team that created this plan and those of us asking questions about the implementation. So please, David, don't take any of what I've said here as an attack on you or the ideas presented here (and if I have misinterpreted your tone and intent, I apologize). I don't think anyone is arguing that adding layers of protection around the config is a bad idea. Nor am I saying that the signatures are unnecessary; quite the opposite. My concern was simply whether or not this implementation sufficiently protects the signatures so that they can do their job. But as I said above, I've already conceded this point... As you and Larry have already demonstrated, there doesn't seem to be much more we can do to make it more secure without becoming needlessly complex, nor does it really introduce a new attack vector vs. the current settings.php model.

True, but in those systems,

David Strauss's picture

This isn't really about winning any debate or conceding. It's about reaching consensus that this is the best balance of security, usability, and compatibility we can achieve. In that light, I'll go ahead and address your concerns.

True, but in those systems, the private key is typically not sitting within reach of the signed content when it is verified.

I didn't get into the location of the private key because this is the objection I addressed, which doesn't mention anything about key access:

My concern is that the computed digital signature for a config file is written at the top of the config file itself, and thus any exploit that allowed you to tamper with the config file would also allow you to overwrite its computed signature.

I responded that it's a well-accepted security model to bundle signatures in the same unit (file, email message, etc.) as the data being signed. Any concern about key access is wholly separate, but I'll go ahead and address those now, too. :-)

When it comes to key availability, I explained that an attack vector allowing writing an arbitrary file seems unlikely to simultaneously allow reading arbitrary files. To go into more detail, this is because uploads are where most custom module code goes to work, while downloads are usually directly through the web server or (in rare cases) Drupal core's private file delivery mechanism. Since they don't really share a code path (other than the HTTP server), it would be hard for an exploit of one to stem from the same problem as an exploit of the other.

Now, if .htaccess to deny PHP execution in files/ and config/ is broken and you're allowed to write arbitrary files to places like config/, then you could write an executable PHP file. That file could print the secret key, but it could also deliver the malicious payload directly without bothering with any signature system. This exact risk exists now with the files/ directory, so we're not introducing a new one with a second, more carefully protected, config/ directory.

Not moving the goalposts, clarifying my position.

muriqui's picture

Actually, if you look back to the top of this reply thread, you'll see that my concern regarding overwriting the signature was only as related to the accessibility of the key, in that I was not convinced that both could not be exploited simultaneously (though, as you've said, it's unlikely). It makes no sense to try to make those into two separate arguments, since modifying the signature without the key would achieve nothing.

And, as I've said repeatedly now, I've conceded this point. As Larry said above, separating the hashes achieves no tangible benefit, and I agree. I've already given my +1 to the plan as written. Please, let's move on.

Well, shoot, now you've

muriqui's picture

Well, shoot, now you've edited your comment, and my last reply makes no sense. :) In any event, it sounds like we're on the same page here.

This isn't really about winning any debate or conceding. It's about reaching consensus

I was using the word "conceding" somewhat tongue-in-cheek... Sorry if that didn't come across. Consensus has been reached... For my part, at least. :)

Yes, I did edit my post after

David Strauss's picture

Yes, I did edit my post after realizing that it wasn't productive to accuse you of moving the goalposts. I should have clarified the changes. Sorry.

No problem. :) Glad we got

muriqui's picture

No problem. :) Glad we got this all hashed out. (pun not intended)

David: Thanks for providing

Josh Benner's picture

David: Thanks for providing the extra detail on the potential attack vectors. This both makes a lot of sense and helped me wrap my head around the utility of the signing.

It's standard when the

pounard's picture

It's standard when the protocol or formalism allows you to. You are basically creating a new file syntax. Personally, if I edit JSON, I pretty much like my editor telling when there's a syntax error.

If JSON was made for carrying a hash, why not. But right now, this horrible new bastard syntax will only do one thing: create buggy behaviors with most syntax highlighting enabled editors.

Best case: the editor just does nothing.
Anooying (and more likely to be) case: The editor attempt to parse it as PHP and displays red and errors everywhere.
Even more annoying case: Eclipse, Netbeans all other complete IDE with error reporting will mark all full projects as being erroneous because of these hash.
Worst case: weak editor that crashes (it can happen).

Almost all cases: if two uses recompute the hash on their side, then commit their files, pretty much all VCS will conflict (even git probably will). If the files were just nicely modified normally, merge would happen, but the hashes won't be mergeable at all here.

Nice twisted JSON, this will be the worst DX I will ever experience, quite sure of it.

Pierre.

easy to test

catch's picture

It is a pain to embed images in groups posts apparently, I had a nice vim screenshot, but g.d.o has a syntax highlighter, so we can see what it looks like with that at least.

foo.json.php in vim was picked up as PHP file on my localhost as you'd expect, and highlighting rules were applied the same as g.d.o does it:

<?php
die(); 723fd490de3fb7203c3a408abee8c0bf3c2d302392
 
"string_value": "string",
 
"integer_value": 1,
 
"array_or_object_value": [
   
"value1",
   
"value2"],
 
"boolean_value": false,
}
?>

It's not actually as bad as I thought it'd be, but probably depends on the editor as you say. Since one of the goals here is to have human editable files, I don't think it's unreasonable to ask how editable they'll actually be.

Hmmm. If PHP needs to be able

bjaspan's picture

Hmmm. If PHP needs to be able to write to the config files, then the approach I described of making them not writable obviously won't work. In that case, having a key file that is special and non-writable could be useful.

I think I'm too tired at the moment to think clearly about this.

The key file is not in the

heyrocker's picture

The key file is not in the same location as the config files, it will go in site root with settings.php and will assumedly be set to be read-only by the web user.

Like the idea, can't get the code

johnbarclay's picture

When I use the git command in your sanbox:
git clone http://git.drupal.org/sandbox/heyrocker/1145636.git configuration_management_initiative

I get the error:
warning: remote HEAD refers to nonexistant ref, unable to checkout

Anything in code in this area is great! My feeling is that the line between content and configuration is ambiguous and the site admin/developer needs to be able to pick what is content and what is config. Just as with features and exportables there will be modules to push the edge of what is called configuration. Since you are focused on the api, I think its important to keep this use case in mind.

You need to check out the

David Strauss's picture

You need to check out the 8.x-file-config branch. Use the "-b" option when cloning.

There's no master branch on

xtfer's picture

There's no master branch on that project. Try

git init
git remote add origin http://git.drupal.org/sandbox/heyrocker/1145636.git
git pull origin 8.x-file-config

I don't see the json format

pounard's picture

I don't see the json format point at all if it's not for import/export ability. I mean if you store JSON in bastard .php files with PHP syntax in it, it's not JSON anymore. Plus, you'll have to encode/decode JSON without being able to properly export it as-is without having custom code that removes the first line.

This doesn't make any sense IMHO. Either store you files in .js files (easy to read, easy to write, easy to import/export) or store it as PHP arrays if you store it in .php files (better performances and can be cached by OPCode caches). Mixing the two seems obfuscated.

And if I want to edit manually my files, how do I compute the hash? The whole goal of having a predicactable and clean API is to allow people that deploy Drupal sites (for instance sysadmins) to manually edit the files using a well know formalism without having to click into the UI. And this format seems to go definitely into the wrong direction. It's not KISS (Keep It Stupid Simple), take all the cons of both PHP and JSON without the pros while even PHP arrays would be simpler to handle, simpler to edit, as efficient for diff with VCS, and incredibely more faster to read.

Plus, with the proposed API, which should I do: $foo = variable_get('a.b.c')->d or $foo = variable_get('a.b')->c->d or $foo = variable_get('a')['b']->c['d']? Who decides how it's being cut, and why would'nt be able to directly $config->a->b->c->d or config('a.b.c.d')? Seems a bit obfuscated also, because in a hierarchical/schema based/registry style configuration, you need to know the key path, but here you need to know how it's being splitted moreover!

If there is business functions into the configuration object, what's the point of making it exportable if some stuff are computed at runtime?

Pierre.

I had the same concerns as

aspilicious's picture

I had the same concerns as you yesterday but I can answer on one part.

"And if I want to edit manually my files, how do I compute the hash?"

If I understand hejrocker you don't need the initial hash. Because it will get added during module instalation. And if you edit a file manually the hash wont change so that case is covered to. If you copy a config file from one test site to another the signature of the config wont match anymore BUT the UI will notice this and say, hey you changed the config file of module x, go to that settings page to load the new config.

See also the part about version control systems. And why should you remove the first line while exporting? Are you going to use your drupal config files in an other system? If not the UI will notice the changes and handle it.

I can't answer on the splitting stuff because I have a question myself about it. But I think that every module is an object so you always have to split on the module name but that part is still a little confusing.
1) Core will be an object holding sitename and other core stuff config(core.xxx)
2) All modules (and probably all themes) will be an object to like: config(module.mymodule) or config(theme.mytheme) (I think)

And to answer on Pounard you always will have to split on the last part. So config(a.b.c)->d (with d a config in file a.b.c)

OK now my questions:

config_get_names_with_prefix('entity.node.');. This will return something like array('entity.node.page', 'entity.node.page.article')

Are you sure the last part is 'entity.node.page.article' ? (because that would mean the theory is wrong)

If there is a possibilty for entities and views to split up their config files into sub pages. Will their be a possibility too for other modules to do that? If I have 1000 configuration options in my module. 500 for the frontend and 500 for the backend. Can I split it into: module.mymodule.core and module.mymodule.admin? Or is it advised to put everything into one file. If it is lazy loaded it would load 500 in stead of 1000 settings.

Suggestion

I don't know how similar php is comparing to java but I would avice throwing an exception if it's not allowed to execute the config('xxx') command on xxx. You could handle it with a try catch to search deeper in the array.

I'm going to reply on my own

aspilicious's picture

I'm going to reply on my own post. It doesn't make sense for a module to split it into several config files. If there are config items needed in the admin UI and the frontend you would need to load 2 files. And thats terrible.

I think its prety easy.

memory and i/o

catch's picture

I spoke to chx about this in irc a bit (not this question directly but the general issue of memory and i/o when loading config options), he suggested putting that into a sandbox issue rather than here, so I opened http://drupal.org/node/1187726.

I'd think if you had a module that had 1,000 config items, you'd probably want them split into several files rather than one or two - assuming they weren't always going to be requested together. An example of a module that could end up with this is the Field API - we specifically need to move away from loading every field and every instance on every request as we do now, so either storing each as an individual file, or grouped by bundle, or something else may end up making sense.

By bundle

chx's picture

It certainly makes sense to store 'em by bundle. Think of looking at a page node with its fields. Almost certainly you dont need the field items on, say, Commerce line items :)

Thaks, fixed

chx's picture

It's now entity.node.article as it should be.

PHP files are actually not

heyrocker's picture

PHP files are actually not all that faster to read, json_endode/decode are incredibly fast operations. I also don't see the opcode cache as being all that much of an advantage. On most production servers we set the cache to never reload unless we specifically tell it to, which means that in order to deploy configuration you would have to reboot apache! Pass.

You always variable_get() the tip aka the actual filename, anything else will fail. I'm not completely sold on this actually, but this is how it works at the moment. I also disagree that PHP files would be easier to edit. JSON is very human-readable.

It's my understanding the

catch's picture

It's my understanding the actual json files will be read from very rarely - only when syncing with the read backend, which is going to be a specific admin operation (ui or drush).

In that case it makes more sense to keep them out of an opcode cache - since they will likely always be an opcode cache miss if this is the case (which is going to be worse than no opcode at all for that request, and potential evict some other item if you ran out of cache space). Additionally, if you read a file stream, you can free up that memory during the request once you're done syncing or it goes out of scope, this isn't the case if you include a PHP file.

This wouldn't stop someone from building a PHP file read backend for the config system - sync from json to PHP files then read from the PHP files. There's an issue in the drush queue for a drush cache which is doing something similar (the PHP file caching, not the canonical json storage).

To go further than

pounard's picture

To go further than performances whereabouts (which indeed as says catch, these files won't be read often) is more the PHP/JSON merge which is disturbing, non standard, weird. If you store JS, please call your files .js, if you store ini, call it .ini, if you store PHP arrays, call it .php, but do not make this obfuscated, this really doesn't make any sense.

Pierre.

We can't hide the .json files

heyrocker's picture

We can't hide the .json files from web browsers any other way, except to rely on .htaccess which we have been informed is not enough. Do you have an alternative suggestion?

Configure your web server.

pounard's picture

Configure your web server. Drupal already uses rewrite rules and such anyway (and if you don't use the core .htaccess, your .git or .gitignore files will be also accessible.

Pierre.

And if there's the option to

catch's picture

And if there's the option to store config files outside the webroot (and this is strongly encouraged same as private files with flashing red warnings or whatever), then you would need to both have the folder misconfigured and have your webserver misconfigured. By that point you probably have other issues too.

As I mentioned earlier this

greggles's picture

As I mentioned earlier this is far too common to just say it's the user's fault.

It's a nuisance when the .info and .module files can be read. It's closer to a security breach when the config can be read. We should ship a product that is known to disclose this information for ~5% of our target audience.

Even if we have huge flashing

catch's picture

Even if we have huge flashing warnings telling people to put config files outside the webroot?

I think I'm not a stupid

aspilicious's picture

I think I'm not a stupid drupal user. But I never put something outside the root. I would not know how I can do it, I even don't know if I can (with my cheap shared hosting: one.com). So the warnings wont help if the users don't get it.

I have never seen a host that

glennpratt's picture

I have never seen a host that doesn't allow you to store things out of webroot and I haven't used a framework or large application that doesn't store things outside of webroot.

I really think we should consider requiring a directory out of web root. Every Drupal site I've ever made stored something outside of webroot. Today I store our settings.blah.php out of webroot and include it from settings.php.

All my vhosts look like this (and have since shared hosting days WAYYY back):

my.host.com/www/index.php

or

my.host.com/html/index.php

People like me just copy

aspilicious's picture

People like me just copy drupal and place in the root.

mysite.com/index.php

Thats maybe plain stupid but nobody every told me not to. As I said I'm a typical stupid user on the technical side of a website. Most users act like me (with small personal websites).
So if that person installed his website without making a www folder he is screwed? (without reinstalling)

Or maybe I'm wrong again which prooves again how silly it is to think people know how to secure their site :)

I'm not saying you should do

glennpratt's picture

I'm not saying you should do things like me, I'm saying it's been possible for as long as I can remember.

What I am suggesting is that requiring you to pick a private config directory seems perfectly reasonable to me; especially if it helps us have a better DX for configuration files.

Given the architecture that was suggested, we could just leave configuration in the database for users who are incapable of setting up a private config directory.

This line of reasoning is

greggles's picture

This line of reasoning is done.

  • We must allow for sites where people can't or won't move things outside the webroot.
  • We must provide for defense in depth in cases where .htaccess/webconfig will NOT be read

There is a large number of Drupal users who are in those situations and we can't turn our backs on them.

assertions not reasoning

catch's picture

There is no reasoning in your comment, just "we must".

The numbers you posted below do not add up (the number of sites that have a ?q=something page indexed as opposed to sites with something actually exposed).

So please justify "large", and "turning backs".

VCS

Crell's picture

Putting config outside of the webroot also makes version control more annoying to deal with, as you have multiple "roots", or else need to version control from one level up from both webroot and config. I frequently don't have access to the parent of my webroot on a host where I don't have root.

Of course you will version

Sylvain Lecoy's picture

Of course you will version control from one level up from webroot. Imagine your .git folder visible for public ? No way..!

Majority case

Crell's picture

The majority of Drupal sites, if we are honest, are run by people who are lucky if we can get them to use a version control system at all. There are plenty of sites that have a .git directory or lots of .svn directories in a public webroot. That's why we filter those out in the default htaccess file. (Remember, lots of companies use SVN or something other than Git for version control.)

I don't disagree that in the ideal case you'd want your webroot to point to somewhere inside your repo and your config it outside it. I do think that the idea of putting the entire code base outside the web root (as some projects insist on doing) is foolish.

I have, however, never encountered that ideal case in the wild. That's simply a fact of life in the PHP world.

Increasingly I see sites that

catch's picture

Increasingly I see sites that are putting modules, themes and Drupal core at the same level in vcs, then symlinking sites/all/modules and sites/all/themes, it's not an uncommon pattern.

Completely agree that putting the entire code base outside the webroot is unrealistic.

I am maybe a not common user

Sylvain Lecoy's picture

I am maybe a not common user but my git repository is:

.git
.gitignore
.gitmodules
htdocs
- includes
- modules
- profiles
- ...
assets
- psd
- bitmaps
conf
- apache-conf.site1.conf
- apache-conf.site2.conf
...

Then my virtual folder link to htdocs, and configurations files for those websites links to the conf files.

This ensure the .git folder is non readable by the webserver, but of course this is not an argument for svn which is by folder.

Fair enough, I'm realizing

glennpratt's picture

Fair enough, I'm realizing I'm not a common user.

I would still argue that the option to store 'sites//config' outside of webroot as plain JSON files would provide the best DX, but I'm sure we don't want two solutions in core.

Possible

Crell's picture

One possibility we discussed was a boot.json.php file (or something like that) which was hard coded to sites//config, which could optionally specify where the rest of the configuration lived by absolute path. That's probably something that could be added later, although even then I'm certain we do not want to have a different file format at that point. That would make the parsing far more complicated.

Putting vcs one level above

xtfer's picture

Putting vcs one level above your site root is normal practice, afaik. If Drupal came already built that way, it would be even easier. The Drupal folder structure could well look like

<

pre>
config
www
index.php
includes
modules
sites
etc...
files-private

already a problem

Gábor Hojtsy's picture

That is already the case for your .module, and .info files at least right? How come configuration is that much bigger a concern?

It is going to have much more

heyrocker's picture

It is going to have much more potentially damaging information, like for instance your database password, paths to file locations, any number of possibly desctructive things.

OPCode caches compile PHP and

pounard's picture

OPCode caches compile PHP and keep the compile bytecode (actually pseudo-compilation but same advantages) in memory. This means that this code is shared to all PHP threads (common memory, fake as it is because immutable, but still shared). If you compile a PHP file with an array inside, you won't have any I/O at all at load time because the file will be compiled into memory, and it won't cost more than the array instanciation itself (from already compiled code) while json_decode, as fast as it can be will always need to parse file contents, do sanity check, instanciate the whole lot of information, all that at runtime.

JSON is very human-readable.

Agree, but PHP json_encode() function will print an ugly an non readable one-liner. I hope you didn't plan to write your own replacement for that.

Pierre.

If we store the right stuff

catch's picture

If we store the right stuff in these files though (i.e. things that get updated at most once per day if not once per week on a live site), then it is going to be the difference between a json_decode and an APC cache miss, I'm not sure the APC cache miss will win in that case (although that could be a fun thing to test :))

APC also optimize include and

pounard's picture

APC also optimize include and require operations, which could make it faster even with a cache miss than loading file buffer using the stream wrapper API then removing the first buggy PHP line, then end up with content parsing at runtime.

EDIT: I see your point, interesting debate though, but this is a bit off topic, sorry :)

Pierre.

then end up with content

David Strauss's picture

then end up with content parsing at runtime

You need to re-read the specification. The files on disk aren't parsed at runtime.

Agree, but PHP json_encode()

David Strauss's picture

Agree, but PHP json_encode() function will print an ugly an non readable one-liner. I hope you didn't plan to write your own replacement for that.

Of course we're writing our own replacement for saving configuration files. We want nicely structured, human-readable, diff-friendly output. Also, Zend has already published a "pretty JSON" output function as a starting point.

Not a pure tree

Crell's picture

Although we originally wanted to have the config system represent data as a unified tree, as earlier writeups from Greg hoped, we had to drop that concept. Magically detecting where we "should" split different files turned out to be far too complex. So don't think of it as a big tree that's being "split" in various places. Think of it as a hierarchical naming scheme for moderately complex configuration data objects.

To know what the name of a given config object is, you would need to know the module you're writing against. That's no different than with variable_get() now, really, where you need to know the variable name. However, with the hierarchical naming pattern for the config objects there is at least more consistency; if you guess that a random module's configuration will be at config('module.$modulename'), there's a 99% chance you're right. And if not, either that module has very good reason for being different (extremely complex configuration) or you should file a bug against it. :-) That's a considerable step up from now, where the variable table is only sometimes kinda sorta standardized, and many modules have a separate, completely custom system anyway.

Agree that JSON is not necessary +1 pounard

sidharth_k's picture

I understand that JSON is being used as a "better" and less "clunky" XML (which has been traditionally used in many projects especially in the Java world to store static configuration). But I don't see these configurations having any other use outside Drupal and therefore it might be a good idea to stick with native PHP arrays in the config files. As mentioned there would be performance gains and opcode caching which may not happen if we used JSON.

Actually there are lots of

heyrocker's picture

Actually there are lots of situations in which making these configuration files available to external tools would be useful. Think about deployment / server management tools like puppet or chef. See my reply to pounard's comment for thoughts on the opcode caching issue.

Yes why JSON is the selected

Sylvain Lecoy's picture

Yes why JSON is the selected format ?

PHP arrays are also human readable, but also IDE-friendly (Eclipse PDT have build in DLTK: Dynamic Language Tool Kit which transform the structure of a language into a tree that is understandable by programs/plugins/tools).

Moreover, serialization/deserialization introduces overhead. And as far as I remember, json_encode return a flat string, which is not really readable.

Also, If you think using hash like we are doing with the registry_file to check if the content has changed, this has a performance impact to consider. I'm not sure this will be very scalable. Less scalable in my opinion than a opcode cache miss.

PHP arrays are also human

David Strauss's picture

PHP arrays are also human readable, but also IDE-friendly (Eclipse PDT have build in DLTK: Dynamic Language Tool Kit which transform the structure of a language into a tree that is understandable by programs/plugins/tools).

Moreover, serialization/deserialization introduces overhead. And as far as I remember, json_encode return a flat string, which is not really readable.

We're not using the standard PHP json_encode in the final implementation. We'll use a custom JSON generator that makes human- and diff-friendly output. (Or, we'll reformat the json_encode output.) Even if we wanted to use PHP arrays, we would still have to create a similar generator. PHP does not have a built-in "generate executable PHP that would create this object/array" capability.

The serialization overhead is irrelevant to the design because it's only encountered when updating configuration, not when normal users are on the site. json_decode() has been benchmarked to be extremely fast, and caches like memcached and APC already require lots of unserializing of data that we should focus on before worrying about the config system.

Ok this is all about

pounard's picture

Ok this is all about reinventing the wheel. You should have said it upper that you don't like using whatever anyone else done before.

Pierre.

Is there an existing

Josh Benner's picture

Is there an existing implementation of a capable data storage format that is:

  1. Human-readable/modifiable,
  2. Diff-friendly,
  3. Storage-only (as opposed to executable),
  4. Comparatively succinct, and
  5. Signed?

These are all good things. Even if straight PHP were to be used, equivalent or more engineering would be required to meet these goals. JSON (documented standard growing in popularity) plus some minor engineering on top of existing code (ie Zend or other's JSON formatting) seems like the opposite of re-inventing the wheel.

Signed is the tricky one

Crell's picture

There are lots of data formats that meet 1-4 to some degree. JSON, YAML, and XML all come to mind, and we considered all of those during the sprint.

The problem is the signing. I do not know of any file format that has a built-in signature system and still meets criteria 1-4. That means whatever file format we use will have to be "hacked" to include signing some how. It also needs to be non-readable from a web browser when you go to example.com/sites/default/config/super.secret.config, because the majority of sites will be storing their config file there, and not outside the webroot somewhere. (And I don't believe that "well don't do that" is a viable option).

Given all of those, JSON with a fake PHP wrapper came out as the least ugly option. I hate the PHP wrapping too, but so far we've not identified anything more secure.

I'd like more feedback on

glennpratt's picture

I'd like more feedback on http://groups.drupal.org/node/155559#comment-521809

Hoping for us to use a plain, valid format, with the security+hash.

Is JSON such

pounard's picture

Is JSON such implementation?

JSON: Nope, missing 5
INI / Info : Nope, missing 5
XML: Node, missing 5, 1 is discussable (but I tend to say yes)
YAML: Same as XML (IMHO)
Custom file: Could be all 5, but why do you embed JSON then?
I miss a lot, probably.

But now, if I put the signature aside instead of inside the file, 5 becomes definitely yes for all.

Re-inventing the wheel is putting a PHP executable tag inside the file, and appending the signature in a non standard way of doing it. Whatever you do with your nice die(), if files are modified, it can disappear, and if attacker is doing fuzzy searches, it will see a nice WSOD that says "Hey, I'm here!" instead of a natural 404 or 403 error.

Pierre.

That's the point, some

glennpratt's picture

That's the point, some invention was required to meet the security concerns. That isn't why JSON was chosen.

The point is there doesn't seem to be a standard solution that protects users from the exploits suggested in this thread. My only question is, can we make the file valid JSON.

yes

chx's picture

i already wrote how please check it up.

Yes, I really like something

glennpratt's picture

Yes, I really like something along those lines. I had a suggestion:

http://groups.drupal.org/node/155559#comment-521809

That was my point. There is

Josh Benner's picture

That was my point. There is no file format that naturally meets all these needs. The solution proposed in the original post outlines a minimally hackish approach to using JSON and still trying to keep it relatively secure. You accuse them of re-inventing a wheel, but the original wheel doesn't exist.

You try to criticize the approach by asserting that modification of the file can lead to removal of the PHP die(), which in turn can allow the hacker to inject arbitrary code to be executed (or configuration to be loaded, depending on how you read your remark). This is true, if you ignore the note in the original post that details the .htaccess placed in the config directory.

Now, if there is a .htaccess misconfiguration AND the attacker can write to your config (removing the php die()) -- the severity of the compromise is identical to the same scenario with the current files directory protection (based on .htaccess).

Why signed ? Is settings.php

Sylvain Lecoy's picture

Why signed ? Is settings.php file need signature ? I don't think, where in the whole configuration software management science you encountered a requirement for having signed configuration ?

PHP does not have a built-in

Sylvain Lecoy's picture

PHP does not have a built-in "generate executable PHP that would create this object/array" capability.

No, but views export does, and it has been proven to work well.

changes?

fago's picture

Have you planned to support the ability to react upon changes?

In practice this ability is often needed. E.g. to set-up database tables for new fields, initialize new search indexes, track name changes of configuration objects, ..

Yes

David Strauss's picture

Have you planned to support the ability to react upon changes?

It's intended to be part of updating the "active" configuration from the "on-disk" configuration.

overriding signatures

drifter's picture

It would be cool if there's be a way to ignore or override the signatures. I imagine it would often be faster to just go in and edit some configuration in the file itself, rather than clicking through the interface. JSON is a nice format that is both readable and writable, so it lends itself to this. Dangerous, I know, but useful.

There will almost certainly

heyrocker's picture

There will almost certainly end up being a drush command to re-sign and reload all your files.

Security considerations

valthebald's picture

I have 2 concerns regarding suggested solution, both from security perspective
1. Digital signing of config files would not help if hosting environment is compromised (shell access or something like this). Obvious solution to this is encoding config files with PHP encoders like SourceGuardian, Zend Encoder or similar (approach that we use on production servers, and it works well). Since config files are not valid PHP files, encoders won't be able to encrypt them
2. Our production servers use read-only file system for drupal installations, with files and cache folders being symlinks to writable filesystem. I don't think making config writable is a good idea

1. Digital signing of config

David Strauss's picture

1. Digital signing of config files would not help if hosting environment is compromised (shell access or something like this). Obvious solution to this is encoding config files with PHP encoders like SourceGuardian, Zend Encoder or similar (approach that we use on production servers, and it works well). Since config files are not valid PHP files, encoders won't be able to encrypt them

That practice is just security through obscurity because anything that executes the PHP would have to be able to decode it. It's like adding an extra deadbolt to your front door but giving everyone the key.

2. Our production servers use read-only file system for drupal installations, with files and cache folders being symlinks to writable filesystem. I don't think making config writable is a good idea

That is not an argument. You've just stated what you're currently doing with an unexplained objection to the specified approach.

That practice is just

valthebald's picture

That practice is just security through obscurity because anything that executes the PHP would have to be able to decode it. It's like adding an extra deadbolt to your front door but giving everyone the key.

Consider we were talking about Java or .NET installations. Of course, runtime has an ability to decode and execute bytecode/CLR, but without source code it's much, much harder.
PHP encoders do exactly that - provide 'executable' opcode without original PHP source. By the way, your metaphor more suits original suggestion - digital signing sounds like extra deadbolt to the front (attacks from web), while on the physical level file contents can be read by anyone having access to the file system (everyone has the key). To continue metaphor, PHP encoder hides mentioned key somewhere around, making robbery more difficult

Sorry

chx's picture

I had a rather condescensing reply here which I deleted and want to apologize for it.

IHowever I still would really love if people would think through the possible scenarios where they perceive a security weakness and check whether they could own Drupal 7 with it. If yes, then there's hardly anything we can do. For example, even if you encrypt a file with one of these encoders and you have local read access then you can read the file and run a simple script it on your laptop which includes it and dumps the database array. Note that a better word for these products is obfuscator which should raise a huge red flag as in "security by obscurity".

As for read-only file systems, yes, we will need some mechanism to stop configuration from being changed. I didnt post here (sorry!) but I can totally imagine a scenario where you have a "beachhead" production frontend where the config files are (the other webfrontends dont need them) and where you load them into a Redis master and the other frontends use Redis slaves. Edit: the beachhead will, of course, be firewalled off from the rest of the world. Also note that we already used a such setup to roll changes to production, the beachhead was used to check for a catastrophe before rolling to every other server.

i18n

JeremyFrench's picture

Firstly, I'd like to say that I think this approach is a great start. You guys have covered a lot of ground and come up with a very good solution.

I have been following #154434 which is taking a very different angle on the configuration management theme. While I think that using fields will not fly for configuration management it does show that the current approach doesn't nativity support internationalization.

It may be hugely beneficial to bake language coding into the settings system. Site name being an example of a string which may be different, but there are other potential differences, in date formats, currency formats etc.

I don't know the solution. I figure we can either change the api so that locale is included $site_name = config('core.site_information','en')->name or we flag that a value that is retrieved needs localization.

user flagging and the rest of the glue code

Gábor Hojtsy's picture

For translation in reality we can provide great defaults, but users will need to be able to flag configuration pieces that are localized at the end. For contact forms say, some sites will want to translate the addresses for the contact forms, others will only want to allow translation of the subject and autoreply. This needs to be configurable and of course will mingle with non-translatable pieces like the checkbox on whether you send an autoreply or not. This sounds like being in the plans for later with supporting context specific configuration. However, language context can change often on a page. For example, a language list you might want to show all the languages with their native names, which required loading language names each in their own language context. Also, my favorite example is contact module, which can currently send up to 3 emails in up to 2 languages in 1 HTTP request, so it will need to set the language context for each properly.

Finally, as you pointed out, this is not yet taking the rest of the configuration "glue" code into consideration. Namely the UI, widgets, permissions, validation and rendering, which will still be custom glue code as of now. Eg. when we think of rendering the site name rendered for email will look different from site name rendered for email, and custom code will handle that. Clearly this is still to be looked into later, and we should not expect the first config sprint to solve all the problems, so no worries :)

i18n

David Strauss's picture

Just for the record, we've extensively explored the i18n options for a "Layer 4" in this design. Integrating all the context (including i18n) support turned out to be too heavy for us to include in our three-day sprint. We are confident that a good i18n system can be layered on top of the Layer 3 configuration API.

Integrate version control?

ontological's picture

Would it be practical to integrate version control instead of files on disk? This is probably too simplistic/complicated and we may have moved beyond this type of idea, but perhaps a workflow like:

  • Make configuration changes on development.
  • Put site in maintenance mode.
  • Go to the admin screen and click - save configuration snapshot.
  • On same screen click upload to version control.
  • Go to screen to enter your GIT credentials etc.
  • Save a Drupal site configuration snapshot to your version control system.
  • Go to production site.
  • Put site in maintenance mode.
  • Go to the admin screen and click - load configuration from version control.
  • Go to screen to enter your GIT credentials etc.
  • Choose the snapshot configuration you want to load.

Just thinking that to a certain extent we are recreating the version control wheel and if there is a way to leverage some existing version control system then it might save time in the long run. Also, issues like security are presumably already baked in to the version control system. Systems like GIT are available online, so setting up access from your sites should not be too difficult. I think there is PHP integration support for some systems. Perhaps choose the simplest version control system that does the job and allow module developers to add support for other ones if needed.

Since configuration changes only happen periodically, it would be possible to make time-stamped snapshots for the entire site at once, rather than track configuration changes module by module in real time, if that performs better and is simpler.

Some kind of consistency checks might need to be done, such as to ensure that the same modules and their versions are present on the system that the snapshot is being applied to.

Confusion on performance?

Josh Benner's picture

A lot of the comments seem to indicate that people are assuming these files are read on every page load. My understanding is that they are only ever read when syncing to the "active" or "live" database copy of the configuration. If my understanding is correct, then performance of hashing, parsing JSON, opcode caching -- are all basically moot.

Right?

It is, but why JSON would be

Sylvain Lecoy's picture

It is, but why JSON would be a value of choice, its not more database friendly than php arrays, it is ?

And about the readability, remember that json_encode return a flatten tree, which mean a single-lined string, which is in my opinion not more readable. Moreover, to allows extensibility, I took the example of eclipse DLTK (http://www.eclipse.org/dltk/) which can understand, navigate, and write PHP arrays. This tools can help managing configuration with IDE.

I am actually writing a plugin for eclipse (Drupal Development Tools) to have .info editors, .module generators, and configuration management. As eclipse got already lot of useful things (Git plugin, auto-completion, etc.), extending for drupal specifically is easy. But then, why having JSON, which is neither DB friendly, neither PHP friendly ? It may not have performance impact on runtime, but the choice is a bit wierd don't you think ?

Ok let's have an example:We

Sylvain Lecoy's picture

Ok let's have an example:

We agree that the level 1 is a conf file.

The level 2 is the Database representation of the conf file.

Now the level 2 is swappable.

I choose to implements my level 2 as APC cache.

Now, because the level 1 is a JSON representation, I had to parse this JSON file, and write somewhere a PHP representation, which is cached for APC. This is just transformation between level 1 and level 2 but this is an extra step which is not needed anymore; and can be skipped, using straight the conf file in APC. Less overhead, more conventional (PHP arrays for PHP, JSON for JS), as friendly as JSON, not less extendable. Do you see what I mean ?

Now, because the level 1 is a

David Strauss's picture

Now, because the level 1 is a JSON representation, I had to parse this JSON file, and write somewhere a PHP representation, which is cached for APC. This is just transformation between level 1 and level 2 but this is an extra step which is not needed anymore; and can be skipped, using straight the conf file in APC. Less overhead, more conventional (PHP arrays for PHP, JSON for JS), as friendly as JSON, not less extendable. Do you see what I mean ?

I wouldn't call that out of the question. We could move the serialization into Layer 2 instead of Layer 3, making it equally pluggable for this sort of situation. I do have to dispute the "less overhead" claim for anything but APC, though. JSON decoding in PHP is basically on par with serialized PHP decoding. The only reason "less overhead" does apply to APC is because APC silently uses PHP serialization and unserialization, and I don't think you can turn that off.

Level 2 must be persistent

catch's picture

The level 1 level 2 model is not permanent storage vs. caching layer. It is canonical storage vs. current storage, which is a completely different distinction.

You need to have the level 2 storage somewhere persistent, because that is used to diff against the level 1 storage during deployments (and this process is going to be relied upon to do things like create field tables if there is a change in field configuration - although how exactly that bit works is still under discussion).

So your level 2 storage can not solely live in memory, because if it disappears, you are not going to be simply rebuild it from the json files on a cache miss - built into the design is that the level 1 and level 2 storage may be different, and they are only re-synced via an explicit administrative action (admin screen or drush).

This means that level 2 storage needs a persistent backing. That can either be database, or database + caching, or redis, or PHP files on the file system again, - but it can't be APC or memcache solely.

I, too, have yet to be

Josh Benner's picture

I, too, have yet to be convinced JSON is the ideal choice. I can see arguments in its favor (primarily potential for integration with 3rd party tools, maybe a reduced potential attack surface) as well as in favor of PHP (consistency, familiarity?).

Furthermore, while I think keeping in mind developer experience is important, making the argument for one technical approach over the other based on the capability of a specific IDE to manipulate configuration files beyond simply editing the text is, IMHO, of very low value comparatively. I'd rather see a decision of this type made based on technically sound principles than IDE editing preference.

primarily potential for

Sylvain Lecoy's picture

primarily potential for integration with 3rd party tools

What are you calling 3rd party tools ? Eclipse is a 3rd party tools. I gave you a possible implementation, while you are speaking generally on hypothetical 3rd party tools.

I think keeping in mind developer experience is important

Me too, let's forget the one-line flat string JSON representation, which will need some pre-process functions to format correctly for readability.

I agree with your arguments, not with your conclusions.

What are you calling 3rd

David Strauss's picture

What are you calling 3rd party tools ?

Chef, Puppet, BCFG2, Capistrano, Fabic, Jenkins/Hudson, ...

Yes I think people have

catch's picture

Yes I think people have mis-read this a bit as well, or possibly are ignoring it because they don't like json.

I would be concerned if syncing the config files to the active/live store was a particularly expensive operation (especially if there's any kind of locking involved or if it can't be done in such a way that processes can read the old config until it's finished) but the bottleneck there is fairly unlikely to be the storage format anyway.

Don't get me wrong catch, it

Sylvain Lecoy's picture

Don't get me wrong catch, it not that I don't like JSON, its more that I believe we should use the right tool for the right job. Is there any added value using JSON instead of PHP arrays ?

Main argument was extensibility, but no one gave a concrete example of a third-party tool except me with eclipse IDE.

Also, and maybe you start to know me, I come from a Java background, its more that I will try to push this representation. Here, really I don't see in what JSON would be better than PHP's arrays.

Here is a first benefit using

Sylvain Lecoy's picture

Here is a first benefit using PHP arrays: http://groups.drupal.org/node/155559#comment-520399

Here is a second benefit: As a drupal back-end developer, you don't know Javascript because you will work only on back-end. That is, LAMP integration, PHP scripts and Database. Then having this JSON representation may increase your learning curve with Drupal. And we all know that the learning curve of drupal is exponential already. Having PHP's arrays is just simple as it should be. Why the hell we are using a Javascript representation (JSON mean JavaScript Object Notation right) for PHP to DB ? Are we using JS scripts to load the files ? This should be straightfoward, not sneaky.

JSON -> PHP -> Persistence layer is sneaky
PHP -> Persitence layer is straightforward

Edit: sorry I have to actually edit my post; this is more like:

1) Stored as JSON string
JSON in File -> JSON String through PHP scripts -> JSON in Persistence layer -> json_decode
2) Stored as PHP arrays
PHP arrays -> json_encode -> JSON in Persistence layer -> json_decore
3) With APC storage configuration
PHP arrays (that's it!)

How are you going to store

glennpratt's picture

How are you going to store these PHP arrays?

Serialize isn't human readable (and thus almost useless for version control) and unserialized means our configuration will be stored as executable PHP and or will need to be eval()'d. That's unappealing to me.

Edited...

Storing a php array is as

Sylvain Lecoy's picture

Storing a php array is as simple as declaring a php array.

Maybe I don't understand well your question, so here an example:

<?php

$conf
->core->site = 'MyAdvancedSettings';

?>

You then include the file.

With APC: opcode cache will handle that.
With DB: same as with JSON string, but you need to json_encode before to store the serialized version on the db obviously.

stored as executable PHP

Do you have example of non executable PHP ?

No I don't, that is the

glennpratt's picture

No I don't, that is the point.

The plan that is proposed has configuration in files and in the database. I prefer to have that in a format that is not executable.

As I said, serialize() is not human readable and eval() is insecure and slow.

EDIT: What you are proposing has Drupal writing executable PHP files. We have never done that, and IMO that's a good thing.

I prefer to have that in a

Sylvain Lecoy's picture

I prefer to have that in a format that is not executable.

If this is just a personal preference, I think you gonna have to tell people why this is better. Otherwize I don't have to explain myself, you know..! Because in that case i prefer to have that in a format that can perform well with APC :-)

What is the difference between Drupal writing executable PHP files, and Drupal reading, deserializing, and evaluating a content ? Your concerns are about security ?

I understand, but the scope is limited to write only arrays, or objects, it shouldn't be a problem don't you think ?

Performance

Crell's picture

The read performance of these files is not a serious issue, as 99% of all reads will go to the active store, not to disk. If it takes 0.0001 vs 0.0004 seconds to read a file, it will not matter. Runtime lookups will be hitting the database anyway (or redis on the high end). Plus with APC, as Greg noted, you'd have to restart your webserver to allow a file to be updated if you're running in no-stat mode. That is a non-starter, of course.

Well I think nothing can be

Sylvain Lecoy's picture

Well I think nothing can be really modified in that case, each time an idea is submitted to the community for "feedback" its actually an exposé of your decisions. The conclusions are always: "Well, after a chat on IRC with dude1, dude2, and dude3, we finally stick to what have been said, no matter of your propositions"...

Then in that case: Great plan ! Let's do it, we are wonderful, let's rock-on !

If you modify a file, it will be reloaded by APC. Apc work with the timestamp of the file. And you can rebuild the cache, without restarting PHP. And if you run with APC in non-stat mode, this mean you have minimal skills with some administrative background, so in my opinion you'll be able to understand the caching policy.

I know you guys like having complicated things, but why complexify code where it could be simple ? On your writings, you tend to minimize the impact, maybe this would need some comparative benchmark to be sure.

We did our homework

Damien Tournoud's picture

Well I think nothing can be really modified in that case, each time an idea is submitted to the community for "feedback" its actually an exposé of your decisions.

On the contrary. The "storing configuration as PHP files" idea have been suggested (by myself), studied and rejected during the sprint. Basis of the rejection:

(1) There is no tool to modify configuration structured as a PHP file (Augeas has a lens for "PHP variables", but it doesn't work for arrays)
(2) The active store needs to use the database, and this cannot use PHP files (we studied the possibility of writing a Streamwrapper to load PHP files from the database, but it turned out to be too hackish for our liking - and APC is semi-broken for remote stream-wrappers in no-stat mode, as it incorrectly relies on realpath for the canonicalization); it is useful and more consistent to have the same format for the canonical store and for the active store
(3) The canonical store will be read once in a blue moon, so performance is not a concern

Damien Tournoud

Ok.. Then I'm sorry Crell, it

Sylvain Lecoy's picture

Ok.. Then I'm sorry Crell, it is then my fault not having been there during the decision...

But everything is not yet set in stone is it ? While I understand why it has been rejected, I still believe using PHP is possible and less complicated than JSON.

APC

catch's picture

While having the level 1 store in APC or not is a dead end argument (not the bootstrap files, but everything else), it would not require a server restart to refresh the file contents. APC provides http://php.net/manual/en/function.apc-clear-cache.php to do this.

Since a sync from Level 1 store has to be triggered by an admin, this would not be a big extra step for a site running with apc.stat = 0 (not to mention they'll need something to do this on code pushes too).

That is not in itself an argument in favour of using straight PHP files for the level 1 store, but APC is not an argument against it either (if there is any argument against it when using APC, it is more around cache pollution as pointed out above).

APC is not the question at

pounard's picture

APC is not the question at all, it's a code optimizer. You have to embrace the fact that apc.stat = 0 is something you never have to carry about. It's a system administrator's choice to set it or not.

Generally speaking, you set it to 0 only once your site as reached a stable state and is ready to production. Once you do that, you know that you will never make any PHP files modification again, it's both optimization and security protection. Don't try to override this behavior, it's the system administrator's choice to freeze the site, not yours.

A system administrator, just next to me is saying that if file modification time is important, and if you need to force configuration refresh, then it's a software problem, not a system one, then it's your job to do a stat() on the file and load it

I would add that configuration is basically a static thing, configuration refresh if needed still is the job of the system administrator (the one that modified it), because if any other person modify the configuration, it will be through the UI so the file refresh problem doesn't apply since you are manipulating database.

Pierre.

There is a difference between

glennpratt's picture

There is a difference between executing and serializing/deserializing. If you allow Drupal to include() PHP files from a directory that is writable by the PHP process, there is no way to limit scope; those files can do anything PHP can do. If you read the files in any other way, APC won't help you cache it.

This would go back on a history of Drupal security AFAIK. This is the reason we didn't have a automated module updates for a long time.

PS I'm not "in the loop" on any of this, I'm trying to understand it all as well.

Although it was mentioned in

catch's picture

Although it was mentioned in irc that we'd replace settings.php with a hardcoded boot.json.php (and associated boot.local.json.php) - that is just two json_decode() and some munging, but given recent page caching haemorrhaging in D7 due to 'things that should be cheap' I would want to do very focused benchmarks there since those couple of files will have a completely different usage pattern.

settings.local.php?

drupleg's picture

Hi, I've seen a few post mentions about "settings.local.php". Angie's post states "In Drupal 7 and below, you'd do the following in settings.php (or settings.local.php):". Does Drupal 7 support settings.local.php? I can't find any documentation for it if so.

Aegir's settings.php files

Josh Benner's picture

Aegir's settings.php files include settings.local.php.

Not directly

Crell's picture

No, but a lot of higher-end sites follow a common pattern, as inspired by Drupal.org itself. Put the following into the end of your settings.php file:

<?php
if (file_exists('settings.local.php')) {
  require_once(
'settings.local.php');
}
?>

And then don't check settings.local.php into your VCS.

Great tip

drupleg's picture

That's a handy thing to know! As I aspire to be a higher-end Drupal developer this little snippet will be quite helpful. Thanks!

More override/alter support

stevector's picture

The above example has the core book module declaring "book_child_type" as "book." Contributed modules, distributions and even simple Features modules might need to change this value for each installation of the contrib module/distro/feature. Then individual site admin might want to change this value through the UI for any given site using the contrib/distro/feature module.

I propose allowing modules to declare values for the same variables/settings as other modules. This could be implemented just on config file names. So if the core book module has book.bookconfig.json.php then a contrib/distro/feature module could use exactly the same file name. So only one of these files would copy to sites/$sitename/config.

I see this as the same concept as hook_menu() and hook_menu_alter(). Any number of modules can try to claim node/%node in hook_menu_alter but only one will actually get stored in the menu cache.

The solution I propose is not perfect. Ideally, I'd like to see support for overriding single variables inside of a config file so that the contrib/distro/feature can override just "book_child_type" without overriding every variable in the file.

There is also the issue of weight. These files could be processed in module weight order as alter hooks are. I could also see each config file declaring it's own weight.

Thoughts? Can this be done with or without making a hook_config_alter()?

Use cases?

David Strauss's picture

I propose allowing modules to declare values for the same variables/settings as other modules. This could be implemented just on config file names. So if the core book module has book.bookconfig.json.php then a contrib/distro/feature module could use exactly the same file name. So only one of these files would copy to sites/$sitename/config.

The current proposal only copies a defaults file named after the module being installed, so it would not be possible to overwrite another module's config file without causing other module name collision problems that already exist. But, an install hook can perform any action it wants, including installing/merging defaults into other files or setting keys outside module.$module.json.php. It's not hard for a module to override a single key that way.

The Views use case

stevector's picture

The install hook approach would fall short if a Drupal has a reason to move again a config file from module to local.

I am operating under the assumption that modules like Views will ship with predefined Views in files named like views.glossary.json.php. If a site builder overrides this locally using the UI then the corresponding information will change in the active store and local config. I am also assuming that the site builder can "revert" this View and doing so would recopy views.glossary.json.php from module to local.

I am looking for a standard way to modify that glossary View, or any setting from any module, any time it moves from the module to the local config. A change made to the glossary View in the install hook get ignored when the revert happens.

A mechanism like this would greatly help when working with distros like Open Atrium. If a large organization wants to deploy multiple Open Atrium sites and wants to consistently a given attribute on a View, like the items per page, there are two main options:

  1. Clone and rename the View, then make modification and export that to a new custom feature.
  2. Use an alter hook inside of a custom module which will modify the View at run time.

The first option makes upgrading difficult because any future change to the View in the official Open Atrium would need to migrate to the clone. The second option assumes a high level of expertise and each exportable has different hooks when it comes to run-time altering. A standard way of altering Views/Panels/Flags/etc when they move from module to the local config would be hugely helpful for sub-distributions and sub-modules.

We could handle your use case

David Strauss's picture

We could handle your use case by invoking a hook after a configuration file gets installed or reverted. Any module could re-apply changes on top of the default configuration at that time. It's essential that the read path for configuration have really good performance, so we can't invoke hooks then.

I agree

stevector's picture

I will proceed then by posting a feature request patch in heyrocker's sandbox: http://drupal.org/sandbox/heyrocker/1145636

I'll post back here when I have a patch issue number to cross reference.

JSON format

heyrocker's picture

chx also brought up the PHP issue and he even wrote an implementation (it used a very clever stream wrapper to be able to include() straight out of the database.) On the Saturday of the camp we had a very long discussion about the formats. It was myself, crell, chx, damz, davereid, webchick, cwaegans... and a couple others im forgetting. In the end we all agreed JSON was the way to go (for the record we also compared XML and YAML in this discussion.) It is human-readable, highly interchangeable, a growing standard, the PHP implementation is present by default and highly performamt, and it actually offers some really cool opportunities for client-side firebug-like tools for browsing your configuration, seeing what modules are overriding things, etc. PHP's only real advantage was that it is fast, but as has been pointed out, this is really not an issue on a day to day basis and actually json_encode() and json_decode() are very very fast. These functions are FAR faster than serialize()/unserilize(). If we were going the PHP route we would just store raw arrays using drupal_var_dump() or something like that and include them.

Yes we will have a function to pretty-print the JSON.

I agree the PHP/JSON hybrid is somewhat ugly, the groans when it was first brought up were loud and emphatic. But on a day to day basis you will never have to touch these files at all and it will be invisible. If this is the hackiest thing we have to do in this initiative I will be extremely happy.

Thanks for everyones work on

glennpratt's picture

Thanks for everyones work on this, I think this is a huge step forward for Drupal in general. Dev-stage-prod baby!

Though I'm not crazy about the PHP/JSON hybrid, I understand the reasons and it is clever. Personally, I'll take it over raw PHP.

No, why you should have to

Sylvain Lecoy's picture

No, why you should have to use drupal_var_dump ? Just json_encode them, its just one more step and provides the same result as if it was a JSON file..!!

PHP implementation is present

pounard's picture

PHP implementation is present by default and highly performamt

Except that you are going to write your own implementation in the end, so this is a null argument.

Pierre.

We are only going to wrap

heyrocker's picture

We are only going to wrap json_encode() with a pretty printing function. We are not going to write our own implementation of that.

Just the same.

pounard's picture

Just the same.

Pierre.

This sounds great. My initial

webkenny's picture

This sounds great. My initial thoughts:

  • Storing configuration data at the file level, outside the docroot, is critical in my opinion. One misconfigured .htaccess rule and who knows what sensitive things an attacker would be able to bleed from the system.
  • JSON, FTW. Another Drupalist and I had a chat about it and I am with heyrocker. Simply, it is somewhat ugly but the speed gains are huge.
  • When can I start testing? ;)

Kenny Silanskas
Proud Member of the Acquia Support Team

Storing configuration data at

David Strauss's picture

Storing configuration data at the file level, outside the docroot, is critical in my opinion. One misconfigured .htaccess rule and who knows what sensitive things an attacker would be able to bleed from the system.

They wouldn't be able to read anything because we use .php extensions and the create the files as syntactically invalid PHP. We did this both for cases where the user isn't properly using .htaccess and for users who accidentally misconfigure .htaccess, even briefly. Both .php execution and .htaccess have to be broken to read a configuration file, but that isn't any less secure than our current situation with settings.php.

Valid JSON

chx's picture

It occurs to me that we can do something like

{
  'security': '<?php exit;';
  'hash': [SHA512 hash here],
...payload
}

We can string parse the first two out and check the payload hash. PHP will most often than not die with a pretty parse error due to having an odd number of single quotes after the opening tag. The only problem here -- not that it's not a problem with the original! -- is if a string contains literal ?> then everything after that will be displayed. We might need to escape those things.

Ick, please no. :)

webchick's picture

The DX is bad enough with the current proposed approach, let alone all the opportunities for typos and other things to screw up if we went this way.

Reading the comments here, I really, really like the approach suggested above that's basically:

  1. Store the /config directory outside the web root, in the default configuration. http://drupal.org/node/22336 would maybe provide some nice options here.
  2. In the event you have some catastrophically crappy host which doesn't give you access to the directory above web root (and I've seen some in my day), you store it in sites/$sitename/config, and .htaccess is your only hope.
  3. Store the files as pure JSON, plus line 1 as the signature, with a .json extension:
    483748923749832749832749237497238949234
    {
      "foo": "bar",
      "baz": "blee",
    }

Only arguments I've seen so far against this are:

  1. It's not strict JSON because of the signature line. But neither are our .json.php files. Either way, a tool like Capistrano, etc. would need to strip off line 1 to get valid JSON.
  2. There will be some portion of Drupal users who will fall into the bucket of "can't physically put things outside of web root" and also ".htaccess broke". Those people would be utterly vulnerable to attack with their DB passwords, etc. just hanging out there for script kiddies to slurp and for Google to cache. And they're definitely going to blame Drupal, not their catastrophically crappy web host, if their site gets compromised as a result.

I know Greg expressed some concerns about #2. I wonder if there's any way of knowing what percentage of users would be affected. 5% seems high to me, but not sure...

It's easy to make the file

David Strauss's picture

It's easy to make the file pure JSON by dropping the signature into .sig instead of the header. The main reasons we integrated it into the file were (1) if we're adding "<?php die();" to the first line and need to strip it out to parse the JSON, we may as well stick the signature there, too (2) it's easier in version-control tools like git to "git add" one file name instead of having to "git add" the changed configuration file and the updated signature file.

Oooo!

webchick's picture

Well pure JSON would obviously be preferable.

I'm not that fussed about module.foo.json and module.foo.json.sig, personally. I can just git add * in the config directory when I'm done futzing in the UI.

Scary

heyrocker's picture

I'm not at all comfortable with just leaving the people in #2 wide open and left out in the cold. The video greggles posted above is an example of the kind of disaster this can wreak on someone who accidentally misconfigures their site, even temporarily. I know that I have many times, for instance, deployed a site without an .htaccess because "svn add *" doesn't include it. Usually I will catch this pretty quickly but it underlines how easy this is to screw up. As Angie points out, these people will inevitably blame Drupal for this problem (regardless of the reality of the situation) and we owe it to ourselves to do everything we can to protect against this.

You did not address the part

moshe weitzman's picture

You did not address the part of the proposal where Drupal yells at you that you are storing your config inside of web root. This would not be a silent fail. People who got burned by this would have to ignore a red warning everytime the visited /admin and /admin/status. The .htaccess analogy is misleading IMO.

Drupal also yells at you

heyrocker's picture

Drupal also yells at you about security updates, and yet how many sites have you logged into that do not do this one basic thing? The same goes for settings.php. We also have this warning about private files (admittedly not as bright and red as we're talking about here) and yet I have never encountered a setup that actually did this. We have to face the fact that people are not going to set these things up in an optimal way. The misleading thing is going forward under the assumption that people can and will act in their best interests and do everything they can to set their sites up properly. Designing this system under the assumption that people will do anything beyond a one-click install is not something I can get personally behind, but if there is community support behind it (including signoff from the security team) then I'm willing to accept it.

Right, if you use private

catch's picture

Right, if you use private files but have them in the webroot, then you risk information disclosure.

If you run Drupal and don't apply security updates you risk your site being hacked.

If you give anonymous or regular auth users permissions to use the PHP input format or don't include an HTML filter, then you at even more risk of your site being hacked.

Drupal does not do strange things to ameliorate any of these, at best we shout at people. There are issues open to shout at people more if they have poorly configured text formats or similar but those have proved hard to solve. They all seem much, much more likely to me than someone both not storing files outside the webroot when shouted at and having a poorly configured server with no .htaccess or equivalent.

Not only this, but we could still put the actual database credentials in settings.php more or less unchanged from now, since those essential files are going to be a necessary special case anyway (and that'd be a good place to store the path to the config directory too). That is the likely worst attack vector for a site actually being pwned - there may or may not be sensitive information in other config files but it is less likely to be something that could be turned into an automated exploit for the subset of sites that have this problem.

What if we took cues from

q0rban's picture

What if we took cues from jsonp here? Drupal reads in a config file and creates a hash by the contents of the file, for instance acbd18db4cc2f85cedef654fccc4a4d8. Then the security file is actually php with just a function in it:

<?php
function acbd18db4cc2f85cedef654fccc4a4d8() {
  return
file_get_contents('core.site_name.json');
}
?>

The function would never get called if the hashes were incorrect based on the contents of 'core.site_name.json':

<?php
$hash
= sha1(file_get_contents('core.site_name.json') . $salt);
if (
function_exists($hash)) {
 
$json = $hash();
}
?>

The security files would then

Josh Benner's picture

The security files would then be webserver-writable PHP files executed by Drupal. Is this something we want to avoid?

core.site_name.sig.php:

<?php
function acbd18db4cc2f85cedef654fccc4a4d8() {
  return
file_get_contents('core.site_name.json');
}
malicious_intent();
?>

Good point. :)

q0rban's picture

Good point. :)

Yes, but, there would be an

greggles's picture

Yes, but, there would be an .htacess much like the one we drop into the /files directory that would help prevent execution of these files. Of course I've been pointing out that this isn't a 100% solution, but it's better than nothing.

With a jsonp-style approach,

Josh Benner's picture

With a jsonp-style approach, the file would need to be included by Drupal to then execute the function, which would execute the malicious code, as opposed to execution by direct serving by the web server (which yes, is ideally stopped by .htaccess).

I don't see where the hash is

pounard's picture

I don't see where the hash is really useful. If someone has successfully modified you files, it means it has an access over your server already. You can safely assume, in that case, that it has the power to read and modify all other files. An attacker that can do this can:

  1. Modify other PHP files if they are not correctly protected.
  2. Read the full configuration, access the database.
  3. Access and modify everything in the database, there of no security anymore -at all-.
  4. Recompute hashes itself.

In this case, this means the full application has been compromised (Drupal itself).

So the hash protects against file modification, but once the file can be modified, it means that the attacker could already have been recomputed its own hash. A hash protection would be efficient only if the software that checks it is totally separated from the application.

Hash is useless here.

Pierre.

Just because a user has

q0rban's picture

Just because a user has compromised the one directory that contains the config, doesn't necessarily mean they have compromised the whole docroot. It's another layer of security, as
A.) The key is stored in a separate directory.
B.) The hash created would need to be generated with the key
C.) They would need R access to sites/example.com, and if they have that, they have your db credentials.

If they have read access to a

pounard's picture

If they have read access to a directory with HTTPd credentials, this means they can access to anywhere the same HTTPd can.

Pierre.

But if they have read access,

Josh Benner's picture

But if they have read access, that does not mean they have write access. And if they have write, that does not mean they have read.

The hashing deals with the scenario where someone gains write access to the config area, which would be a location Drupal CAN write to (hence making it an attack surface to address). Even though the attacker can write to config, he cannot necessarily read the key in order to sign any config he writes. Furthermore, since the config files are not executed code, he cannot write arbitrary code to be run by Drupal.

Again, if an attacker can read from and write to all locations within your site, you're dealing with a complete compromise, and nothing about this config initiative has any bearing on or defense against such a case.

Assuming that if an attacker

Josh Benner's picture

Assuming that if an attacker can write to the config files, they can also read key file, is NOT a safe assumption. I was assuming the same, and if you read all the comments, you'll see some explanations about why this is not a good assumption.

IMO, the best description of why is here: http://groups.drupal.org/node/155559#comment-521269

It's definitely a lot of

greggles's picture

It's definitely a lot of people that we'd be leaving insecure if we moved to this system.

http://www.google.com/search?q=inurl%3Aq%3Dnode says there are ~38 million pages that demonstrate the site doesn't respect .htaccess.

Maybe some tests for "does the site have clean urls working" and "are any directives from .htaccess (like hiding .info files) being respected" could be added to the Acquia crawler project so we can make the decision based on more accurate numbers.

Not completely accurate

arcaneadam's picture

That's not completely accurate. That just means that there are some ~38 million sites that have had a paged linked to at some point using the ?q string. Just because you use the ?q doesn't mean the site doesn't respect .htaccess. For example you can still access this page at http://groups.drupal.org/?q=node/155559 but it respects .htaccess

I didn't check all of the sites on the first page but the first few had .htaccess running and wouldn't let me look at /modules/node/node.info for example

Adam A. Gregory
Drupal Developer and Consultant
Hire me
http://AdamAGregory.com
Twitter Me
http://twitter.com/adamgregory

not a good stat

I already said it's not

greggles's picture

I already said it's not perfect, but it gives a sense of the scale. That's why I suggested it be added to the Acquia crawler.

You can page through that list and see a lot of sites are affected by it.

Seriously, the "problem" we are trying to solve here is not worth:

  1. The UX WTF behind why you have to learn what a webroot is and how to place something outside of it
  2. The risk it therefore presents to users
  3. The number of users/hosts who will not be able to run Drupal (or not run it securely) b/c of issues with no control outside the webroot and/or .htaccess/webconfig

a better solution

catch's picture

Than either not supporting extremely crap hosts or imposing hybrid file formats on everyone came up overnight.

This would be to make the level 1 file storage optional and/or pluggable - there are posts towards the bottom of the (very long and hard to follow at this point) thread suggesting this. If this can get security sign-off, I think it could also solve the UX and DX wtfs as well.

Then we might potentially be able to keep everyone happy. The actual changes to the main proposal here would be minimal doing it that way.

At least valid PHP

plach's picture

As someone was pointing out above, the *.json.php files are going to screw up (at the very least) Eclipse syntax check. What about having at least valid PHP syntax:

<?php die(); // 723fd490de3fb7203c3a408abee8c0bf3c2d302392 ?>
{
  "string_value": "string",
  "integer_value": 1,
  "array_or_object_value": [
    "value1",
    "value2"],
  "boolean_value": false,
}

Accessing the file leads to a WSOD anyway, but at least it is valid PHP.

This is (as pointed out

Fabianx's picture

This is (as pointed out already) prone to 503 DDOS attack:

<?php die(); // 723fd490de3fb7203c3a408abee8c0bf3c2d302392 ?>
{
  "string_value": "string",
  "integer_value": 1,
  "array_or_object_value": [
    "<?php invalid_statement()  $$$$ 444444 $$$ ?>",
    "value2"],
  "boolean_value": false,
}

=> 503 => DDOS possibility

Better is:

<?php die(); $code = "723fd490de3fb7203c3a408abee8c0bf3c2d302392"; /*
{
  "string_value": "string",
  "integer_value": 1,
  "array_or_object_value": [
    "*/
?>
<?php invalid_statement()  $$$$ 444444 $$$ ?>",
    "value2"],
  "boolean_value": false,
}

but on second though still DDOSable via 503 (normally not cached).

Best Wishes,

Fabian

The only problem here -- not

David Strauss's picture

The only problem here -- not that it's not a problem with the original! -- is if a string contains literal ?> then everything after that will be displayed. We might need to escape those things.

If the file contains a literal "?>" and the file manages to be syntactically valid PHP (which it's currently not), then executing the PHP would still encounter the exit; or die(); before reaching the content to send. I'm assuming here that we would force the 'security' key to the top of the JSON in your model (while "<?php die();" is obviously on the first line if it's the header in the current implementation).

I'm less excited about the idea of the hash being in the JSON. It would be a pain to reliably integrate it and hash the rest of the content unless we're 100% consistent in where the "hash" key is. Even then, it would still be a signature of something less than the full JSON. I would worry about exploits like this (see the "security" line above the "hash" line):

{
  "security": "<?php exit;", "my_injected_configuration_key": "can't_touch_this",
  "hash": [SHA512 hash here],
...payload
}

When you separate what parses out the signable content (which certainly excludes the hash) and what provides the JSON content to decode (which would include the "hash" and "security" keys), you invite security holes that involve injecting content in a way that gets parsed with JSON but isn't subjected to signature verification.

We could use a hybrid approach that puts the "security" key/value at the top of the JSON but moves the signature to a .json.php.sig file. That would allow the configuration file to (1) be valid JSON (2) be useless when read and parsed as PHP (3) be signed reliably and completely.

We could use a hybrid

moshe weitzman's picture

We could use a hybrid approach that puts the "security" key/value at the top of the JSON but moves the signature to a .json.php.sig file. That would allow the configuration file to (1) be valid JSON (2) be useless when read and parsed as PHP (3) be signed reliably and completely.

That sounds good to me.

How do you make it be parsed

pwolanin's picture

How do you make it be parsed as PHP? You'd still likely need the <?php near the top, right?

Frankly, I think the original plan was a reaonable basis to start an implementation. These are files that people should rarely if ever hand edit or read, so having such an ugly 1st line doesn't bother me at all. For those with visceral reactions again this protective 1st line, I suggest:

tail -n +2

How do you make it be parsed

David Strauss's picture

How do you make it be parsed as PHP? You'd still likely need the <?php near the top, right?

I explicitly mentioned assuming so in my comment:
"I'm assuming here that we would force the 'security' key to the top of the JSON in your model"

Executing the file would give you this output:

{
  "security": "

Good compromise

dixon_'s picture

I don't have a problem with <?php die(); xxx at the top. Frankly, it's pretty darn smart imo :) But to compromise, this sounds like a good solution for totally clean JSON yet secure files.

Also, separating out the signature to a separate file could make sense since you don't really need to version control it, right? Just put *.json.sig in .gitignore and make sure to run drush config-sign after deployment to update all signatures in the production environment. Or did I miss something important here?

// Dick Olsson

If I don't have drush?

pounard's picture

If I don't have drush?

Pierre.

What if we required the first

glennpratt's picture

What if we required the first key/value to be the correct security entry / hash? Would that allow us to verify the remaining contents?

{
   "<?php die();":"32c4a50594fd61565cf30439ce63b3ba",
   "session_inc":"\/sites\/all\/modules\/contrib\/memcache\/memcache_session.inc"
}

For something like this to

Josh Benner's picture

For something like this to work and be secure, reading the file would need to follow logic along these lines:

  1. Read JSON file
  2. json_decode()
  3. Validate first member meets basic security criteria (key is php die text)
  4. Remove first member
  5. Generate hash for remaining data (which might likely include re-json_encode()ing it??)
  6. Compare hash from file (which was itself generated without the security member) to generated hash

The reason you cannot just "remove line 2 and compute hash" is for the reason highlighted by David Strauss above. Specifically, you could inject additional JSON on the same line as the security line, and it would be removed when re-calculating a hash to compare, and thus be cloaked from the hash.

Do json_decode() and json_encode() do a good job preserving ordering of the members? If so, this approach might work?

Agreed, the hash should

glennpratt's picture

Agreed, the hash should probably happen on the filtered data structure. That sounds like cleaner code and less chance for injection.

Just stripping line 2 also wouldn't work if the file was valid json, but not pretty printed.

LOL

chx's picture

that's just great. We should do that yes. Remove the first two lines, parse out the hash from the second line, add back a { and that's the payload.

OK, but one thought to that,

glennpratt's picture

OK, but one thought to that, why not just parse the JSON. Here's what I mean in code. (untested)

<?php
function config_prepare(array $data) {
 
$hash = drupal_config_hash(json_encode($data));
 
$data = array('<?php die();' => $hash) + $data;
  return
json_encode($data);
}

function
config_decode($config) {
 
$data = json_decode($config);
 
$hash = $data['<?php die();'];
 
array_shift($data);
  if (
$hash == drupal_config_hash(json_encode($data))) {
    return
$data;
  }
  else {
    throw new
Exception('Unauthorized configuration file.');
  }
}
?>

Why does the webserver need to write config files?

e2thex's picture

I see a lot of this threads ending because the webserver is going to be writing the config files to the file system. And I do not see what we get by allowing this? Can some one help me out here? We do have drush for things like this right?

IMO, because Drush is not in

glennpratt's picture

IMO, because Drush is not in core and is not appropriate for all users. What we get is configuration in a human readable, VCS friendly format that can be provisioned.

Ok I can see that and the

e2thex's picture

Ok I can see that and the drush comment is valid, I guess I feel that if we have a common way of exporting the config to say a down load about file. Then we can still provide for human readable and vcs friendly formats, but we do not get all of the baggage that comes with the webserver writing to the file system

so the work flow would be something like

go to interface and export content type article,
get a downloadable file/tarball
put file on site.

This is not that onerous of a process, and contrib could make it even easier (ie drush) and then we get to not worry about having writeable files?
It would open up options like the config being php and such.

Small unatteneded behavior

pounard's picture

Unwanted behavior due to the hash check and the caching method:

  1. Attacker (or entropy) modify a configuration file.
  2. At the same time (more or less) admin clear the cache.
  3. Any user bootstrap Drupal.
  4. Wrong hash is detected, file in unaccessible, won't be read.
  5. Cannot use the cached version, because it has been wiped out.
  6. Site is down.

This can happen more than you think, a lot of module often messes up with caches. The only thing you can do from here is fetch back the full configuration from an older state (VCS or anything) and put it back.

But the worst case scenario is:
-2. Someone made a configuration change using the UI (or a module does automatically after some business algorithm normal mess).
-1. File is saved, with new data.
1 to 6 is the same.
7. Admin set back the older files.
8. Changes are lost, forever, worst, some of the changes may have been done following consistency or environmental changes.
9. Site is down (twice).

Modified files are not accessible anymore, site is down twice, database information is only cache.

Plus, this proposal doesn't answer another real need:
- The need to know when a variable has been modified in the UI, but files tells different values (overrides and such) that's a place where strongarm is good (and probably the only feature I really like about that module).

EDIT: It doesn't solve another problem: multiple configurations (with a configuration switch), such as "devel mode", "production mode", "testing environment" etc... Those different profiles needs to be saved along the "normal" configuration (probably the "production mode") and the configuration should be switch-able using a single on/off or name switch somewhere like in settings.php file or an environment variable (env. variable can be set at the VirtualHost level, probably possible with all known HTTPd, it's really a good usage of it, plus, you can commit a common settings.php file and use the environment switch).

Pierre.

Not a cache

Crell's picture

The "Active Config" is not a cache. It does not use the cache system nor does it wipe itself when the cache system does. The only time the Active Cache would change is:

1) When an admin makes a change in the UI (and the active store is written to first, then the disk version).
2) When the admin actively tells the system to reload the active cache from disk.

2 would presumably be quite rare in production. So the race condition you're describing only exists if two ignorant admins are making heavy changes to the site at the same time. We cannot defend against multiple ignorant admins at the same time. :-)

The different-environment problem is handled in two ways. One, the override file. Most configuration should not change between different installs, just a few bits for API keys, DB credentials, and possibly changing the site name to "Dev site be careful". That's already addressed in the override file.

The other is contextual configuration, which above we called layer 4. We kicked that one around at the sprint for a while but couldn't figure out a sane way to do it, so punted as we're reasonably sure it can be built atop layers 1-3. The same system would be responsible for internationalization and things like domain access, spaces, etc. So that's in the "um, we'll get back to that" category for now so that we can get the rest of this stuff working.

Ok for Active Config, make

pounard's picture

Ok for Active Config, make more sense, then we agree that my scenario can't happen. There is still a problem though, the proposal stores serialized objects. It should probably store formalized data (using a good schema). MongoDB lovers will tell me this is nonsense to say that schema is evil, but MongoDB doesn't store serialized data either (BTW storing raw JSON into database is total nonsense, relational database is meant to use schema, not storing serialized stuff, don't forget that a lot of optimizations can be disabled on blob/binary fields for some DBMS because databases don't want to keep huge junks of binary stuff in memory while they can use it for something else).

I do not agree with the fact that most changing information is context. Context is business oriented, mainly, it alters runtime depending on what the user asked. Configuration is static, it doesn't depend on what the user asked but is fixed by the sysadmin and/or integrator (both actually a lot of stuff may be fixed/overridden by sysadmin).

I think that context is mainly out of scope of this discussion, since yourself kicked out my comments of the WSCII groups when I proposed that context could be a specialization of configuration layer (therefore making clear the dependency problem). You really like to double back as I see.
Plus, the actual configuration API's goal as proposed doesn't seems to integrate with such layer (the fact that configuration is weirdly split into many pieces just doesn't match that).

Profiling configuration is a real need, it's not only database and environmental, it's an error to assume that it is only that. I need to be able to switch a production site in debug mode if something wrong is happening and I need to introspect it, without changing the box it runs on.

Pierre.

Another point, having a full

pounard's picture

Another point, having a full registry-like configuration would be really nice for developing development tools and debugging UI's. This proposal splits configuration into many configurations, make this totally impossible, which seems to go the opposite direction than everything else that exists.

It also make virtually impossible things such as variable being declarative (allowing a more strict variable handling policy, and to create powerful UI's once again, but also which would allow to have a self introspecting API based on declared allowed variables, really useful for debugging and automatic variable removal when disabling or uninstalling modules, avoiding cruft and garbage).

Plus the place where you split seems be arbitrary.

How could I create a profile configuration file that literally override many places of configuration (many "configuration objects") if this is not centralized? I now it's technically possible, but it seems to be totally unnatural.

Pierre.

php/json hybrid in IDEs

te-brian's picture

I'm definitely not qualified to discuss security/signatures/etc., but I do have my DX concerns about the hybrid format.

I believe pounard may have mentioned it earlier but IDEs such as Eclipse will mark these 'php' files as invalid, and my whole project tree will have errors and warnings. This would be highly unfortunate.

Please excuse the fact that this is not a real D8 project tree or anything but here are two examples of what happens if I put one of these .json.php files in a project:

https://docs.google.com/leaf?id=0B7ZPYkeA3Y1zYWVmODE2NGUtZWY2MS00MmY2LTg...

https://docs.google.com/leaf?id=0B7ZPYkeA3Y1zZmYzNTMxYjYtOTIxNy00ZWEyLWJ...

If the files were XML, YAML,

Josh Benner's picture

If the files were XML, YAML, or Word97 format, they'd still need to be hybrid of PHP, because they exist in the document root and can be served directly by the web server. There is the .htaccess that protects them, but in the absence of PHP hybrid, that represents a single point of failure that could result in exposing (and getting indexed in google) sensitive configuration information (api keys, etc).

So, that basically boils down to the discussion that's been going on over whether the config files should be under the document root or not. If the config files are always outside the document root and hence not capable of being served by the webserver directly, they wouldn't have the need to be PHP files. The resistance to that seems to be that this would introduce a new site setup dependency, requiring users to understand/know how to put files above their document root.

There has been a few mentions of making this configurable... an idea that sounds good to me. However, were this to attempt to address the issue you highlight, I think we'd also need to either make the config file naming pattern configurable, or automatically use a different naming pattern if we determine the config location is set to a location outside the document root.

What I'd envision for

Josh Benner's picture

What I'd envision for configurable config storage in settings.php:

<?php
// Defaults:
// $conf['config_path'] = 'config';
// $conf['config_name_pattern'] = '%name.json.php';

// My overrides because I want my config out of docroot and IDE-parsable:
$conf['config_path'] = '../config';
$conf['config_name_pattern'] = '%name.json';
?>

More 'Drupalisms'

te-brian's picture

Alternatively, we use a custom extension because (frankly) this is a custom file format anyhow.

  • mymodule.conf
  • mymodule.data
  • mymodule.config
  • ... other better ones?

I have not tested these in my IDE though, so I'm not sure if any would be equally improperly parsed.

Edit: Just to clarify, I do not have any arguments against the functionality of the hybrid approach, I'm just trying to point out what would be a DX thorn in my side due to the file extension.

If they aren't named .php,

Josh Benner's picture

If they aren't named .php, the web server might not interpret them and just serve them directly, exposing their contents (especially since for this to happen, .htaccess or webconfig is not working). The file extension is directly related to the hybrid php approach.

We already use files not named .php

arcaneadam's picture

Here is what I don't understand about the argument that they have to be named .php: We already use so many different files that aren't .php i.e. .module, .info, .inc, .test. What's one more? Do servers without .htaccess serve those files up (Really do they? I've never looked)? Why not just call it .config and go about our merry ways.

Adam A. Gregory
Drupal Developer and Consultant
Hire me
http://AdamAGregory.com
Twitter Me
http://twitter.com/adamgregory

More sensitive

Crell's picture

If someone gets read access to a .module file, they could examine the source code and find bugs to exploit. Or they could just go to drupal.org and download it. 99% of the code on any Drupal site is already public, so secret source code is not a security solution.

What those files don't have in them is sensitive data. If someone gains access to core.dbsettings.config, they can now (possibly) log into your database remotely and pw0n everything. Or get the credentials to your memcache server and poison your cache with fake pages. Or get access to your Facebook API key and masquerade as your site when talking to Facebook. Or various other sinister things.

Most of the configuration in the system is not really a threat if it becomes public. Node type configuration will rarely help anyone trying to hack the site. But there's just enough configuration data that is extremely sensitive that we need to be extra cautious.

There probably are developers who put such sensitive data directly into a custom .module file. Those developers are playing with fire. Please do not play with fire, at least not where your web server is concerned.

The reason is that the

cdoyle's picture

The reason is that the (potentially) sensitive information contained in the configuration files is more important to keep away from attackers than the contents of .info files. If you give the config file a .php extension, the server will attempt to parse it as a php file (thus the whole <?php die(); stuff) and quit parsing after it hits that bit of the file. Otherwise, it may just serve up the contents of the file if it's an unknown type (thus displaying your db credentials or whatever to everyone). In the case where you have a misconfigured or not respected .htaccess, these files may be served up by the web server and this adds another layer of security.

Makes sense

arcaneadam's picture

Yeah, I never really thought about it like that. Source code isn't really all that secret as far as drupal is concerned.

Adam A. Gregory
Drupal Developer and Consultant
Hire me
http://AdamAGregory.com
Twitter Me
http://twitter.com/adamgregory

Really I don't agree at all with this proposal

pounard's picture

Just a question, but it seems that the whole point here is to make a good security over writable files. But as David Strauss said "Files won't be read at runtime". So, why are they writable then? Just, look, I'm just saying, but if there is an active database table for configuration, having writable files is totally useless. So, if they are not writable anymore, they should not need the hash stuff.

For the read security hole, those files should definitely be protected, that's why Symphony and Zend at least tells in their documentation some bits of web server configuration. Drupal cannot be omnipotent nor omnicient, it's already being a too high level of abstraction above the web server to be able to secure files by itself. Don't think you're smarter than the hackers, you definitely are not. Please, trust the lower layers, your Apache will do so much better a good job (and faster by the way) than you about protecting files, don't loose your energy on something that already is secure: just don't make them writable.

Drupal is the web application, and cannot control the lower layer on which it is relaying on, it's pretty much the opposite that happens in the real life. Drupal should care about functionnal security, but definitely not about hashing its own files, you are loosing so much energy on a level of security you will never be able to guarantee whatever happens.

For early critical configuration (database for example) the settings.php file is okay, at least it's secure. For others, ini files would be so much more human readable and editable, JSON is really a crappy choice (I repeat that this last point is only an opinion), it doesn't provide any good stuff. It's harder to read and write, and performance is not a problem, because "read doesn't happen at runtime" (David Strauss said it).

This whole "file configuration" proposal is just unfinished, you forgot the most important parts:
1. Ability for the sysadmin to enforce settings, non overridable ones.
2. Profiling (dev, prod, preprod) with profile that can be changed on the same site just with only one (environment) variable.
3. Pushing files with a hash inside git or svn is a problem, the hash will cause conflicts when working in teams.
4. The proposed API doesn't really seems to be sexy (ugly DX IMHO).
5. I really don't see how you can easily extends this API for contextes.

You are just showing out extremely complex security stuff that doesn't solve the initial problems: being industrial. It's not industrial, it's not simple. Both are related by the way, in order to be industrial you need your code to be simple to use, simple to understand for most people to learn how to use it fast. Plus, it doesn't seems to be extensible. Fighting for the files signatures is probably a good fight, but putting the signature into the files themselves is an error! You shouldn't trust Drupal itself for low level file consistency, but lower OS-level software.

While your expose about the API itself is just like a fuzzy draft, unfinished work while IMHO it's really the most important part that you should have done before even talking about security. You didn't even related import/export problems, polymorphic export file formats, pluggable backends, introspectable schema, UX problems, modules declaration and usage, these were original problems for the most.

Pierre.

Summary and regrouping

heyrocker's picture

I am going to try and summarize the major issues expressed in this thread, in an effort to get things back on track.

In the end, it appears that the most controversial part of this proposal is the usage of the mixed .json.php files. This decision came after 2.5 hours with Greg Knaddison and Ben Jeavons of the security team, as an added layer of protection against .json files being read should other protections (like .htaccess or web.config) fail. Since then Greg has responded several times in this thread in support of this decision. In particular see

http://groups.drupal.org/node/155559#comment-520594 and
http://groups.drupal.org/node/155559#comment-522274

I have also heard separately from one more member of the security team that supports it, as well as chx who helped to architect this system. I consider this an official decision that this is the level of security we have to support until I am informed otherwise, therefore this is no longer up for discussion and I would appreciate it if everyone would stop talking about that angle. The Drupal security team is very good at their jobs, and they serve a valuable and needed role in our community. Please respect this decision and move on, even if you don't agree with it (or take your argument to them, in a respectful and thoughtful manner please.)

Once that decision is made, we now know that any non-PHP file format has to have a .php extension regardless of what it is. Some options then are:

  • We stick with json.php, making the file a real PHP file, beginning with <?php die(); and followed by JSON. This file has a json.php extension.

  • We go to a different hybrid format which is pure JSON but has a key called <?php die(); ?> with a value of the hash of the rest. This file also has a json.php extension.

  • We move to pure PHP: we could use drupal_var_export() to generate valid and safe PHP code. It is possible to store this in the database and read it via a simple stream wrapper without needing to eval().

Leading up to this post, there was a lot of discussion around different file formats. XML and YAML were dismissed because of speed concerns and the lack of a native parser in PHP. Also, as noted above, these would inevitably become hybrid formats because of the security issue. This led to JSON and PHP, and Damien summarized the reasons behind the decision pretty well here

http://groups.drupal.org/node/155559#comment-521629

I am happy to see this discussion about the preferred format continue, but please focus on functional reasons why one format is better than another. DX is a functional reason, but again, please keep the comments constructive. Simply saying 'This is an ugly hack' doesn't really serve to move the discussion along. I am happy to shoot some screen shots of how this appears in all the major editors so we can have actual data on that point. Please realize while editing these files is a desired goal, we expect that in the vast majority of cases such an edit will be a rare operation.

There has also been a suggestion that we move the hashes out of the actual file and into a separate .sig file (for instance json.php.sig). Benefits would be that it would allow the actual config files to be more easily version controlled since the hashes wouldn't change, and while we would still have a hybrid file format it would be somewhat less so with the hashes separated. The counter argument is that since we already have a hybrid file format, why not just keep the hash in there as well and not clutter the file system?

So far as I can tell, these are really the two big discussions and everything else seems pretty well on track. Let me know if there is anything I missed. I have my own opinions on these topics but I will save them for another post.

The importance of the hash

Fabianx's picture

The importance of the hash being in a seperate .sig file is version control ability:

.sig files should never be version controlled, but rather re-generated on checkout.

Problematic scenario with hash in .json.php file:

  • Production configuration is updated and variables / configuration are set to something different => Files are written, Hashes change
  • New code is written and pushed to staging, Code is configured, lots of configuration changes => Files are written, Hashes change

Now its deploy time:

  • Push production changes to production branch
  • Push staging changs to staging branch

  • Merge production configuration changes into staging branch

Now we get conflicts as the hashes did change on both sides and that conflicts need to be manually resolved. And that each time.

With the hash in the .sig files, either you can tell to just ignore the conflict (git reset) or if you don't have version controlled .sig files, the merge will be fine (unless there is a real conflict).

In all three cases you will need to manually recompute the hashes afterwards anyway.

So for the sake of easy merging, please please please put the signature in a separate file.

You'll know the advantage when you start using this with a git workflow in production.

And if you don't do this, you'll always think about it when you again have to manually resolve the conflict even though it would not be necessary. You'll then remember my words :-D.

Best Wishes,

Fabian

This is an excellent point

glennpratt's picture

This is an excellent point Fabian. Sorry I didn't grasp it as it flew by in IRC.

I believe you also suggested something like this:

{
   "<?php die();":"DO NOT REMOVE DRUPAL SECURITY",
   "session_inc":"/sites/all/modules/contrib/memcache/memcache_session.inc"
}

David Strauss mentioned separate files in here as well. http://groups.drupal.org/node/155559#comment-521249

Thank you heyrocker and

glennpratt's picture

Thank you heyrocker and others for your hardwork, I really appreciate it and am excited about this in D8. With nearly 30,000 sites running Features, I think it's clear this is important to a lot of users.

I'm going to put my 2 cents in for plain JSON in *.json.php file, that requires our security string and hash as the first key value.

My reasoning is DX and compatibility with a standard, I don't want our parser to only work with a certain subset of the JSON syntax. I feel that's an unreasonable burden for developers, even if we assume the files will be edited rarely.

Personally, I hope to see this working with configuration management and provisioning tools that are not built in PHP. Chef for example, which uses JSON frequently (https://github.com/glennpratt/drupal-chef-repo). If the file contents are just JSON (and not Drupal specific JSON), it becomes easy to deal with, and signing with Drush is as simple as restarting a service. It requires no special knowledge of the JSON parser.

JSON example:

{
   "<?php die();":"32c4a50594fd61565cf30439ce63b3ba",
   "session_inc":"/sites/all/modules/contrib/memcache/memcache_session.inc"
}

PHP example (not exhaustive, just an example):

<?php
function config_prepare(array $data) {
 
$hash = drupal_config_hash(json_encode($data));
 
$data = array('<?php die();' => $hash) + $data;
  return
drupal_pretty_json_encode($data);
}

function
config_decode($config) {
 
$data = json_decode($config);
 
$hash = $data['<?php die();'];
 
array_shift($data);
  if (
$hash == drupal_config_hash(json_encode($data))) {
    return
$data;
  }
  else {
    throw new
Exception('Unauthorized configuration file.');
  }
}
?>

I intentionally use just json_decode and json_encode, no string parsing because I don't want this tied to a Drupalized subset of JSON. Pretty-printing is fine, I just don't think we should rely on it in the parser.

Others have mentioned moving

glennpratt's picture

Others have mentioned moving the signature to a separate file to avoid VCS churn, and I'm in favor of that. It should make the overall solution a bit more straightforward since we don't have to consider what could be injected into the signature line.

See Fabian's post above.

How about keeping the config path a secret?

grendzy's picture

It seems the config path could be:

<?php
'sites/default/conf_' . drupal_hmac_base64('conf', drupal_get_private_key());
?>

This way you have .htaccess protection layered with an unguessable URL. At this point I think it would be safe to store JSON data without the php die() padding.

As a third layer, system_requirements() could verify the .htaccess protection is working.

Smart!

Damien Tournoud's picture

I actually love this idea. I'm sure we can make that work.

Damien Tournoud

Obscurity?

Itangalo's picture

This sounds an awful lot like security-by-obscurity to me – or am I missing something?

(Note that I may very well be missing something – I am very inexperienced in these matters.)

//Johan Falk

Nope

chx's picture

To access the config files from a browser you'd need to guess a string that is "impossible" to guess (if you can guess that, you can guess session IDs) and never leaves the server so you can't eavesdrop on it. Security by obscurity would be using a long but fixed string for example.

Before people bomb this idea with the usual level of I-thought-this-over: of course, if you have local file system read access then you can see this path but then you can also read all the config already. This is a protection-from-the-browser level.

svn

catch's picture

What happens if your config is checked into svn, .htaccess protection for svn doesn't work - then you can see this in the .svn folder fairly easily no?

However I think this applies about as equally to the <?php die; trick, so again I'd be OK with relying on this as long as there is a hook_requirements() too.

Would that mean a secret

kika's picture

Would that mean a secret directory per module or is it meant only for JSONified settings.php?

Config shipped with modules

catch's picture

Config shipped with modules needs no special protection, this would be for a per-site config directory.

Yes

heyrocker's picture

I'm also a fan of this idea, and it seems to be the best compromise I've seen in this thread. I actually think we should use the config systems key.php key to generate the URL to avoid weird situations since assumedly 'drupal_private_key' will be stored in the config system itself, but that's just nitpicking. +1

While that does work and is

Fabianx's picture

While that does work and is better than any php die() stuff seen so far, it does not protect against a wrong module, which accidentally exposes the config path.

I am still against writing through by default (security risk is much higher than with just DB), but using this as solution for the write-through activated case, works for me and is a good solution and would work for outside webroot, too. (though not necessary there)

Best Wishes,

Fabian

This makes sense to me as the

greggles's picture

This makes sense to me as the default.

At the risk of stating the obvious I think a lot of folks will want the option to store it outside the webroot so the code around this would need to be more elaborate, but that snippet as the default case to determine the directory makes sense to me.

Also, as catch points out I am not speaking on behalf of the Security Team. Ben and I were involved in the sprint for our knowledge of security issues (and convenient geographic location) but the ongoing discussions here and in irc have included lots of other security team members. It's unlikely there will ever be a single decision for "this is what the security team says!" much like there is no "this is what the core development team says."

I realize this may be late to the convo but...

arcaneadam's picture

I've noticed a lot of this discussion is revolving around how this will make it easier to VCS and deploy from dev->stage->prod and other such development issus, yet a lot of the justifications which are being used as reasons why this was a done a certain way are revolving around what if the user is: a one-click install, not good with VCS, doesn't have .htaccess on. etc. This makes me wonder who we are trying to serve with this featureset?

To me the affected users here are experienced developers. They are the ones who are using features to export code to multiple deployments/sites. They are the ones who are using VCS regularly and want an easier solution. Not the everyday average user. If this is the case then why are we

a) Talking about this like it is a mandatory core function.
b) Finding ways to make it work for normal users.

My wife is never going to care about this feature for her blog site. Joe the plumber won't know weather or not his site has this ability. And College intern Fred isn't going to be thinking about how he can move his first clients site from dev to prod, he's only got one install and he makes all changes live (C'mon we've all been there).

IMHO this is a great feature and talking about how to implement it is great and eventually getting it ready to use is great, but I think that it should definitely be approached and looked at as an advanced feature and not a required* feature.

I know everyone that is involved in this is a super advanced developer and we are all interested in making our VCS & deployment lives easier, but let's not add wrinkles to a featureset in the name of making it required when it doesn't need to be.

One of the things in this thread that comes to mind is the discussion of where these files should be located (i.e. in or out of the docroot). Anyone who is going to use this feature is going to have the wherewithal and ability to set this up correctly, anyone who's not going to use this shouldn't even have it turned on.

*(A drupal 8 site should be able to run without it.)

Adam A. Gregory
Drupal Developer and Consultant
Hire me
http://AdamAGregory.com
Twitter Me
http://twitter.com/adamgregory

I thought the same as you,

glennpratt's picture

I thought the same as you, but I've changed my mind. I don't want two ways to deal with Drupal configuration. The documentation becomes a bunch of ifs.

I think version control is only going to become more accessible with time. People may have version control on there host without even knowing it in the future. Think about things like Dropbox today.

Also, many experienced developers have mentioned that they have removed the .htaccess file one way or another. I've seen apache configs that intentionally don't read .htaccess files outside of webroot because .htaccess could be something nefarious. In short, I don't think it's correct to tie this to a class of user either.

We are already talking about two ways

arcaneadam's picture

Basically the whole issue here is that we are already talking about two ways of handling config: DB vs. flat file. The only purpose of the sites//config/.json.php files is for exporting configuration to a new location.

So if we are so concerned about security then why are we even thinking about storing these files? Shouldn't they only be generated when needed. I guess thats what I am getting at. Why do something that isn't needed if it poses a potential threat, just to remove one step from the process (i.e. clicking a link that says 'export config'). I think the format is fine, I like the idea, I love what i means for deployment, I just don't understand why we are creating this security headache (even if it's done in the most secure way possible) for something that isn't even used on a regular basis (config is being read from the DB anyways.)

This also brings up a point that the module.mymodulename.json.php is really not needed anyway. Why not just create a core hook, or encourage devs to create defaults of their config variables into hook_install. Why expose ourselves to unnecessary security risks or add the need for an extra file (a DX issue that the whole i18n string issue is running into right now).

As I said I totally understand and support the need for this feature. I'm just not sure why we HAVE to generate these config files after every config change (as opposed to only when we need them) and then leave them sitting on the server as a possible security risk.

A simple menu item (or better yet drush command) to create these files when a dev is ready for them would be enough. I know there still needs to be the implied security, but not keeping files there all the time and only generating them when needed relieves about 90% of the security concerns. That was the main point of mine when talking about not making these file required and trying to think about the use case and users of them.

Adam A. Gregory
Drupal Developer and Consultant
Hire me
http://AdamAGregory.com
Twitter Me
http://twitter.com/adamgregory

I do absolutely agree. I was

Fabianx's picture

I do absolutely agree. I was worried a little about webserver security, too.

Just removing the write-through part would make this much more enjoyable and also much more controllable.

+1 for this idea!

Best Wishes,

Fabian

I tend to agree

styro's picture

I really like the overall goal of this proposal. I'm happy with the overall persistence and updating/overriding workflow, and personally I think JSON is a good format to choose. And I think that wrapping JSON in PHP is about the best we can hope for if we need to make this secure for absolutely everyone.

But as Adam mentions though, it just seems to me that preserving the UX (or SX - security experience hehe) of the people that will never make use of this system is adversely affecting the DX of those that do. Or conversely if the DX of those using it is improved, then the UX/SX of those that don't suffers. Not that the adversely affected DX of the config files would stop me using it - it would just make it less convenient.

Is there really an intersection set between A) people that want to use config files and integrate it with staging and version control etc, and B) people that can't properly configure their webserver securely or use drush or put files outside a webroot? Could we split it apart into two different cases with more optimal experiences?

Is it possible that reading/writing these config files could be optional and something enabled only by those that will make use of them? ie out of the box, Drupal would act similar to how it does now. eg if the config system isn't given a directory outside the webroot, it doesn't bother with files at all and just uses the DB?

The only wrinkle would be that the DB only mode would still need something like settings.php, wheres DB+files mode wouldn't.

I know it might be regarded as making a two tier userbase, but that is already happening with features module etc etc. I reckon that horse has already bolted.

EDIT - After reading further down the page, I see this line of thinking is already being discussed :)

--
Cheers
Anton

Hi, This is exactly what chx,

Fabianx's picture

Hi,

This is exactly what chx, webchick, catch, arcaneadam and me have been kind of proposing, but it is now lost again in this thread, so I bring this up again:

  • Make the Files Storage optional and disabled by default: If someone enables it, he has to take responsibility for doing so and take care of the implied security problems. We can then go with plain .json files outside of ROOT Storage.

arcaneadam summed this up pretty well:

In talking with chx and fabianx last night in IRC this seemed like the way to go. To reiterate my point, writing files to the webserver that don't NEED to be there all the time, and then trying to find a way to make them secure is like leaving your house unlocked all the time then trying to find a way to make your alarm system account for that. Basically it's crazy.

I think the way moving forward is to use files, but generate and do export/import through an UI, that warns about leaving the files in place, or even offers import/export in multiple ways (file on the system, text area input, or direct file upload/download) that way we can let users have the different options available and then if they choose the less secure option, well "You were warned." It then becomes, to stick with the house comparison, more like keeping your house locked but CHOOSING to keep a key under the mat, give one out, or just leave your door unlocked (Which for some sites isn't a big deal, if you don't have any sensitive data in a basic install, then a config file left on the server may not pose a security threat even if someone can get to it)."

Best Wishes,

Fabian

Views as an example

jhedstrom's picture

Using views as an example (although this is applicable to any current functionality that supports the concept of default vs overridden storage), I'm wondering if we're losing functionality here.

The way views currently works, a module foo can define a default view_A. That module is deployed and installed. At a later date, a developer may add view_B and View_C to the default views hook. Upon deployment, these new views are instantly live (once caches are cleared anyway). Would this proposal require an update hook to make those views live?

Furthermore, the current workflow allows for a developer to make a changes to view_A. Upon deployment, these changes take immediate effect, unless an administrator on the production site has overridden that view. Would this proposal remove that layer of safety, and immediately overwrite the local changes to view_A upon deployment?

On the first question, you'd

catch's picture

On the first question, you'd need to go to an admin screen or run a drush command to sync the new configuration files with the admin store.

The second question is somewhat answered by the first question - the active store isn't a cache, and it's allowed to be out of sync with the canonical file storage, so changes from files to the active store have to be done as a conscious action. That could be resyncing everything, or it might be an individual screen where you can review diffs and sync a particular bit of config (not sure how much that has been discussed yet in terms of options).

If the ActiveStore is

Fabianx's picture

If the ActiveStore is chainable, we can have a AutoSyncActiveDisk Store, which is checking checksum with upper layer and automatically updates.

Problem solved.

Best Wishes,

Fabian

In regards to my earlier

te-brian's picture

In regards to my earlier comments about IDE DX with the .json.php extension .... something so simple it hurts came to mind.. if you made a slight tweak to the format, it passes any php syntax checking.

I'm not sure how this would affect parsing but the following causes no troubles in Eclipse:

<?php die(); /*723fd490de3fb7203c3a408abee8c0bf3c2d302392[snip]
{
  "core.site_information": {"site_name": "This is the dev site"}
}

I don't think you could do similar commenting-out in the more JSON-valid version where <?php die() is the key.

My take on feedback so far...

webchick's picture

As Greg pointed out, people are getting fixated (perhaps a bit overly fixated :)) on the .json.php details. People hate it because:

a) It's ugly as sin (the full signatures are like 128 characters long, and are snipped here for brevity).
b) It's a non-standard Drupalism, and we already have enough of those.
c) It reports syntax errors in IDEs (times however many files), making it impossible to derive real syntax errors from fake syntax errors in the log. DX-----;
d) One of the stated benefits of JSON (compatibility with external tools) is rendered moot because of that goofy line.

These are all legitimate complaints. And I think the reason people are fixating on this is because, other than the variable_get()/variable_set() API replacement (which by contrast has almost NO discussion around it; a bit disturbing), this is probably the most developer-facing change we have.

The reasons those files exist as they do have been explained both in the main post and in several comments above. But basically:

a) Unilke .module files and other things, these files are guaranteed to contain, in some cases, very sensitive information, such as DB password, API keys, etc. We need to be absolutely sure these files are protected from prying eyes.
b) .htaccess (our first line of defense) doesn't work on all web servers, and can also be accidentally not uploaded, and can also be broken.
c) .php offers a very nice second level of defense in case .htaccess fails, because if Apache is working /at all/, those files should be parseable and simply show a syntax error (or a white page) rather than your database password.
d) Outside webroot (which would avoid the world-readable problem entirely) is not an option for all hosts, particularly crappy shared webhosts who are some of the most likely to have .htaccess problems due to the noviceness of people using them.
e) The config files themselves have to be stored alongside a signature, so we can detect rogue modifications and prevent them from infecting the active store. That signature is going to be OOGLAY and we have to store it somehow.

But at the end of the day, I think some people feel we're trading one set of DX headaches (having to deviate from normal deployment process with drush fu and the like) with another set of DX headaches (see everything in the first list), and remain unconvinced that the trade-off is sound.

We were talking a bit about this in IRC, and Fabianx/catch pointed out that the problems we're trying to solve for crappy shared hosts (protection against .htaccess failure, which is required since outside webroot is likely not an issue) are pretty much non-issues for anyone who'd be doing an advanced deployment workflow with version control, since those people at least have shell access and have a reasonable idea what they're doing.

Something that might be helpful from the team is a bit more explanation about what problems the "auto-sync to files" pattern is intended to solve. And also to talk about those in context of whether those reasons apply for shared hosts and the "default" workflow most inexperienced people will use. If there's a mis-match there, perhaps offering a pair of Active State options—ActiveStoreDbOnly and ActiveStoreWriteToFilesToo or something—might be one way of dealing with this; if we don't write the files on crappy hosts and force them to use the DB, we no longer have a security hole and we can just fanangle pure .json everywhere else. Argument solved.

Now you'll all tell me why I'm wrong. ;) Please do!

I think that's an accurate

catch's picture

I think that's an accurate summary of the conversation. The main thing is whether for people who do not care about deployment, there needs to be write through from the active store back to the files. Modules will not be shipping passwords in their config files, so the defaults can be stored on disk as much as you like, this only affects sensitive stuff that people add themselves. Either in version control, in which case they are not noobs on crappy hosts, or via the ui and into the database, in which case does there have to be a write back to the file system - especially since the active store and file system are not automatically synced the other way.

Not correlated

Crell's picture

There is one assumption you make in your post, and I think a few others have made, that I don't think is correct. Specifically, that the following things all come together:

1) A dev workflow that involves Git, SVN, or some other version control system.
2) Using that version control system to deploy to production.
3) Having root on the production box.
4) Knowing how to use root.
5) The production box is run by a web host that doesn't suck.
6) Have access to, and desire to, restructure the entire version control layout to put config files out of the web root.

If all of those came together, then sure we could have a "professional mode" and a "newb on crappy host" mode. The reality is far from that binary, however.

Of the sites I manage and/or have worked on recently:

  • One is on a well-managed VPS that, much to my surprise, recently installed Git. I do not have root, however, and the site is not managed through Git but through normal SFTP.

  • One is on a crappy shared host with no APC, but it does have Git installed so it could be managed via Git. I do not have root, and access outside the web root is very different than the first one. (This used to be a good host but Drupal's requirements have grown steadily to the point that it won't run there anymore, which makes me very very sad.)

  • One was developed in Git and is deployed via Git to the client's own in-house server. I have root here, but unfortunately so does the client. The client doesn't know the first thing about managing a server, including their own.

  • One was developed in SVN and then moved to git, but is not deployed via Git. I do not have root, nor do I have access to a directory outside the web root (save an arbitrary home dir, which is not a good option for config files). It is the client's internal server. The client recently hand-updated a module without telling us by uninstalling it first and then installing the new one. Hilarity ensued.

  • One is managed through SVN with a full dev/staging/production workflow, SVN all the way through. It is on the client's Amazon account, and the client knows exactly what it's doing with those servers and how to extend and develop their site.

Those are just a few combinations off the top of my head that I know exist in the wild, and will continue to do so. There are plenty more. (I realize the above may come off sounding very negative about clients, which isn't my intent. :-) Most of our clients know know full well what they're doing.) And, actually, in none of those cases have I ever wished I could move anything out of the web root. Frankly when I've worked with other projects in the past that did that it has pissed me off to no end, as it makes installation 10x harder and I've usually given up before getting it installed.

So I don't think the pro/n00b dichotomy realistically exists. That means we can't rely on the simplistic question of "what do people who just download Drupal onto their crappy shared host do" vs. "what do the pros with their own server cluster and super cow powers do".

But that still does not

Fabianx's picture

But that still does not answer the question, why we need to activate write-through by default?

You would activate it on all your sites, but you know what you are doing.

But it is totally optional for 99% of the "n00b" sites, that don't even have a use for this disk store.

And what would we loose by not activating it by default?

Best Wishes,

Fabian

agreed

catch's picture

There are two fundamental requirements to having Drupal serve a page request with the current design:

  1. A hard coded file (more likely 2-3 files) that contains the necessary configuration to bootstrap everything else - settings.php/settings.local.php currently - these need a .php extension, they need to be in the web root for installations via the UI and on crappy hosts, even if it ends up being the json/php hybrid (I would rather keep some version of settings.php though at the moment since the benefits of removing it are not at all clear), it is a special case so has it's own rules.

  2. The Level2/db store - which is where all configuration is read from on run time, and written to when changes are made via the UI. Hopefully that will not need to actually be accessed for full cached pages same as now, but it'll be required.

The requirement for the Level 1/json store is there only for consistency - so that there is a common path for configuration to travel from modules, to the config folder, to be read into the level 2 store. However by design you could rm -rf that folder on a live site, and while you might lose some staged configuration but the site will stay up (same with a very slow svn switch locking the file or something). You could then restore that folder via an export from the UI, or from version control / a backup.

However, by making that level json store required on every single site - including the one-click fantastico install, this is introducing another requirement - the .php extension and <?php die; - which is what is creating the visceral dislike in this thread.

If the json store is not required, then you can have a Drupal install without any sensitive information whatsoever written to those files - it goes in settings.php and the db same as now.

This means you'd have to make an active choice to have files written out to disk - whatever the format of those files is.

On top of this, if the Level 1 store is not required for the site to load, nothing stops it from being pluggable, and then core could ship with the hybrid format and contrib could provide a standard json format (or core could ship both and default to the hybrid). The standard json format write method could then refuse to write to the config directory if (strpos($config_path, DRUPAL_ROOT)) === 0) throw an exception. That means I can set things up if I know what I'm doing, and I don't get syntax errors in my editor.

Switching from one to the other could be done by switching the implementation, then doing a sync to disk from the Level 2/db store - so you still get cross compatibility and the ability to export and import between sites. Again - since the level 1 store is not required to bootstrap, it should be easy enough to change format or move around - this a great, great feature of the spec as it is currently.

Modules ship unsigned, .json files - they should not be shipping with sensitive configuration and those files cannot be signed anyway since they're not site specific.

A hard coded file (more

pounard's picture

A hard coded file (more likely 2-3 files) that contains the necessary configuration to bootstrap everything else - settings.php/settings.local.php currently - these need a .php extension

Go tell that to Zend of Symfony developers :) This is not needed at all. Drupal outside the webroot is not something not doable, it would be enough to copy js and css files in a public resource dir for making it work almost flawlessly.
Plus, you should definitely trust the web server for hiding file, they are really good at doing this.

The requirement for the Level 1/json store is there only for consistency - so that there is a common path for configuration to travel from modules, to the config folder, to be read into the level 2 store.

Again, I do not agree, JSON files are not a requirement, modules could expose their initial setup in their .info file or just a hook, I don't see why this couldn't be done. As soon as you are able to formalize any data as a structured tree of scalar values, they are basically an array. All of Zend INI, XML, JSON, Drupal INFO and raw PHP code can do that.

However, by making that level json store required on every single site - including the one-click fantastico install, this is introducing another requirement - the .php extension and <?php die; - which is what is creating the visceral dislike in this thread.

Again, I disagree, it's site admin choice, and if he is good enough, he will probably protect the files with the lower layers of it OS, not using Drupal, people that are not good enough should be warned about the potential risk, but it's not necessary at all.

If the json store is not required, then you can have a Drupal install without any sensitive information whatsoever written to those files - it goes in settings.php and the db same as now.

It can be anything, as soon as the file place on disk is always the same you can protect it effectively without Drupal. There are a lot of lower layers under Drupal, it's not Drupal role to attempt to fix them. It's effort duplication, and your PHP algorithm will always remain slower than Nginx or Apache themselves doing that.
EDIT: What I mean here is that the system can evolve toward configuration files containing sensitive information without the hash, and without the .php.

(strpos($config_path, DRUPAL_ROOT)) === 0)

What?!

Pierre.

Formats are different

Damien Tournoud's picture

Again, I do not agree, JSON files are not a requirement, modules could expose their initial setup in their .info file or just a hook, I don't see why this couldn't be done. As soon as you are able to formalize any data as a structured tree of scalar values, they are basically an array. All of Zend INI, XML, JSON, Drupal INFO and raw PHP code can do that.

You are ignoring 20 years of computer science :) All those different formats have different capabilities and even if all can represent a "tree of scalar values", they don't give the same guarantees regarding ordering of keys and values, strong typing and string encoding / binary string support. Standardizing to a unique format is definitely a good idea. (Remember the YAML vs XML debacle in Propel?)

Damien Tournoud

Do you need ordering?

pounard's picture

Do you need ordering? Basically that pretty much the only argument you could use here.

Pierre.

Damien Tournoud's picture

Ordering, typing and support for character sets and binary strings are all valid differences between the formats. PHP, JSON and XML have no the same expressiveness in those area, so a configuration object in one format could not be losslessly converted into another format.

Damien Tournoud

Go tell that to Zend of

catch's picture

Go tell that to Zend of Symfony developers :) This is not needed at all. Drupal outside the webroot is not something not doable, it would be enough to copy js and css files in a public resource dir for making it work almost flawlessly.

What is really behind that statement is I don't see any reason to mess with settings.php, certainly not as a requirement of adding the rest of the system. Many systems have a settings.php or conf.php with database credentials so it is a common pattern that people understand - not a Drupalism at all, and while it's not perfect, it works fine.

Also it has to be read every request so it is the one place where the APC argument very much applies. Additionally, if the Level 1 reader/writer classes are not needed on every request (since Level 2 is the runtime requirement and is pluggable), it is a real waste to load those classes, parse a .whatever file into an array/object, then never use them again for the rest of the request, where you could just do require $file. http://drupal.org/node/1055856 optimizes the current settings.php discovery process so that it can be done with exactly zero file stats (with apc.stat = 0).

So all this is saying is that the bootstrap file is a different beast to the Level 1 store, and since I'm pretty sure that Zend and Symfony don't come with web installers that at the end provide you a functioning website, I'm not sure straight comparisons with their system are that useful - you will need to have something, in the webroot, that when the request hits, points you to where the necessary config is to bootstrap the rest of the system (or hard code a relative path which would be no less ick than what we have now).

Again, I do not agree, JSON files are not a requirement, modules could expose their initial setup in their .info file or just a hook, I don't see why this couldn't be done. As soon as you are able to formalize any data as a structured tree of scalar values, they are basically an array. All of Zend INI, XML, JSON, Drupal INFO and raw PHP code can do that.

My main concern here is to make the Level 1 store pluggable - and yes that does indeed mean that as well as JsonWrite and BastardJsonWrite you could also put another format in if you want. As Damien points out these files have different limitations, so this is only going to support those that the same, or a superset of the default that core ships with (we will very likely need ordering for some things). I know the Zend config class has this capability, it would be good design to not hard code to JSON, but that is not a reason to avoid shipping with JSON and making use of the feature set it provides (in the same way we formally support multiple theme engines but it is rare that people use something other than phptemplate). I don't personally have any interest in format wars, although I do generally like JSON as a format aesthetically, but I'm not interested in defending any specific format, while I will defend core shipping with a default implementation.

A massive, massive -1 to putting more crap in .info files - due to a change to css/js loading for modules, these are now loaded into memory on every request in their totality, adding yet more crap to them does not help to reverse what is close to becoming a catastrophic situation with those files.

I'm also -1 to a hook, since that would mean the configuration system has a dependency on the module system. Themes can ship with settings too, and while they can implement some hooks, settings for themes doesn't currently require that. We may also want other non-module things to be able to ship configuration files.

Again, I disagree, it's site admin choice, and if he is good enough, he will probably protect the files with the lower layers of it OS, not using Drupal, people that are not good enough should be warned about the potential risk, but it's not necessary at all.

You're misunderstanding my post.

The spec here has a requirement (one that is not necessary) to always write through to configuration files.

Greggles (contrary to posts above, not the entire security team, this has not been discussed collectively by the security team and there is no collective agreed position), says that if it does this, there is a requirement to use the hybrid format to protect hapless and unlucky admins.

If you remove the first requirement, then the second requirement goes away whether we agree that the second requirement was valid in the first place or not.

Already considered

Crell's picture

Actually, we originally did look into supporting multiple formats. The original design at the sprint included that. We eventually threw it out because it added too much complexity.

For one, modules will have to ship with default config files. What format are those in? Different from the format in the live system? That means having to deserialize and reserialize to a different format. That's a great place for weird errors to creep in, as each format has its own quirks and edge cases.

It also means that people will potentially have to learn multiple formats. This module ships with JSON by default, but this site is configured to use YAML, but this site is configured for some wonky XML format they custom defined... Oh crap, and they all serialize slightly differently and the custom XML format doesn't sign properly for some strange reason. Fail.

Once we said "screw it, JSON, kthxbye" everything got a lot simpler and made a lot more conceptual sense. We then spent most of Saturday afternoon at DrupalCamp Denver bikeshedding the file format at length, and concluded that all file formats suck, PHP sucks, and JSON was still our best option for the on-disk store. It's possible to use a different format in the level 2 active store, although there's again a potential for weird inconsistencies and bugs to creep in.

The main flexibility is provided up at the API level, with custom configuration object classes. (I'm actually very surprised that few people have commented on that part, since that's the part module developers will actually be using.)

Level 2 is pluggable because it's performance sensitive and so high end sites could benefit greatly from moving to Redis or Mongo or whatnot. The on disk store, however, as others have noted is not a performance bottleneck. There's really no value at all to making that format pluggable, but there are serious DX drawbacks (as noted above).

I'm generally a huge supporter of making things pluggable, on principle, but even I don't actually see a use case for making level 1 pluggable other than aesthetics for people who just don't like JSON.

For one, modules will have to

catch's picture

For one, modules will have to ship with default config files. What format are those in? Different from the format in the live system? That means having to deserialize and reserialize to a different format. That's a great place for weird errors to creep in, as each format has its own quirks and edge cases.

You are also already parsing those files into the Level 2 store, which may not store them as serialized JSON - there may well end up being Level 2 stores that add a caching layer in front in APC, or try to use chdb or similar - storing as a serialized json string and deserializing doesn't fit those two very well. I already said that if we make use of JSON's capabilities that may de-facto rule out actually using any other formats, this is fine (as long as I can have a Level 2 store that gets as close to directly accessing the PHP structure itself from shared memory as is doable).

There's really no value at all to making that format pluggable, but there are serious DX drawbacks (as noted above).

There is a DX drawback to using the hybrid format, the visceral response to it on this thread makes that pretty clear - if nothing else, the fact that it breaks syntax highlighting in IDEs.

I did not particularly care about it at the beginning, but as the thread goes on I like it less and less, and it's not just a couple of people who are having this reaction. So if that is non-negotiable as the default that Drupal uses, then this discussion could be put to rest by giving people an opt out. As I already said, modules will not be shipping signed files with their defaults in - so there is already going to have to be a 'pure' JSON reader or some minor difference between the default reader and the Level 1 reader to negotiate those differences.

It also means that people will potentially have to learn multiple formats. This module ships with JSON by default, but this site is configured to use YAML, but this site is configured for some wonky XML format they custom defined... Oh crap, and they all serialize slightly differently and the custom XML format doesn't sign properly for some strange reason. Fail.

These are all valid for phptemplate too, but we still nominally support other theme engines. I'm not saying people should actually try to use non-JSON formats (same as they don't actually try to use Smarty), but this would give a choice between the PHPDieJSON format and just JSON.

At this point, I like the

Josh Benner's picture

At this point, I like the hybrid JSON -- not for it's aesthetic, but for its utility. And while I understand the reluctance to tempt the complexity of making layer 1 pluggable -- like catch says, it could put a lot of the objections to rest.

  1. Don't like JSON? Okay, store to disk using your favorite file format (if you can get it to work and be secure -- good luck!).
  2. Don't want another web-writable directory in your docroot? Okay, store to disk using an alternate layer 1 that stores outside docroot.
  3. Like JSON, but hate the hybrid? Fine, use a custom layer 1 approach to do without these things (but you should probably couple this with storage outside docroot).
  4. Have some crazy hosting setup that requires completely custom configuration management? Sounds cool, but you might end up needing a custom layer 1 for your solution.

Security team

Crell's picture

So if [PHP-headed JSON] is non-negotiable as the default that Drupal uses

Frankly I have a viscerally negative response to it, too. I hate the idea of mixing formats, and naming it .php is like nails on the chalkboard. If we could just do straight up JSON, I'd be much happier myself.

However, straight-up JSON is insufficiently secure and "well just put that directory outside the web root if you care about security" is not an answer. If the header is necessary for security, then I will suck it up.

This is an area where democracy should not win. We have a security team for a reason. If they say a hybrid header is what we have to do, then yes that's non-negotiable and I don't see a good case for an opt-out. If they can suggest an alternative that we hate less, we can use that. But that's, really, not the decision of me or anyone else in this thread to make. We should let the security team do its job here. I will defer to their judgment on this front, and I believe everyone else should as well.

The security team has not

catch's picture

The security team has not discussed this internally, nor is there a collective agreed position. Some members of the security team want to enforce the header, many have not spoken out either way, some may not be following this discussion at all, some like me (and I'll be the first to admit I'm an inactive member of the security team without a strong security background, but I understand and respect the concerns greggles has raised even if I disagree with their relative importance) are not happy with it.

Greggles has not claimed to be speaking on behalf of the entire security team, so nor should anyone else until that discussion actually takes place and is agreed on, otherwise it's putting words in their mouth. Perhaps everyone except me will think it's a great idea, but let's ask them first no?

The counter-proposal is no longer "just put the config directory outside the web root", it is "make the write through to files optional, disabled by default, and/or pluggable" - which does not run the risk of anyone getting into an insecure situation out of the box (in fact they would probably need to do it deliberately with some of the checks proposed in this thread). I have not seen any responses yet that actually deal with that proposal - neither from members of the security team, nor from anyone else frankly. You said that alternative file formats had already been considered - IMO that decision was made before a hybrid file format was imposed (it was certainly discussed a fair bit before the sprint - and I agreed with it until this came up), so there is new information to consider now.

Let's summarize the current state of the proposal:

  1. Drupal will create it's own hybrid format of JSON, which has a .php extension and throws a syntax error if requested.

  2. Writing configuration to this format will be required to run a Drupal install.

  3. There will be absolutely no-opt out.

  4. Discussion of this is 'off-topic' because it is a security issue, even though most of the discussion over the past 48 hours has been discussing a potential solution which does not fit into the original constraints that security solution was created to solve.

If you think it is unreasonable to keep belabouring this, then I would like to know what the proposed solution will be for http://drupal.org/node/1191314 - note the test result.

Now apply it to all kinds of third-party monitoring software that is Drupal-agnostic.

If they let you write .php

pounard's picture

If they let you write .php files, be afraid, be very afraid :)

Pierre.

You have a point about

pounard's picture

You have a point about sharing among sites. For the complexity I'm not sure this really add many.

Pierre.

Just to give some more

Fabianx's picture

Just to give some more information I just got:

e) The config files themselves have to be stored alongside a signature, so we can detect rogue modifications and prevent them from infecting the active store. That signature is going to be OOGLAY and we have to store it somehow.

This is protecting those files, because they are signed by a sites/default/key.php secret key. So if you don't have the secret key, you can't write / modify those files.

Something that might be helpful from the team is a bit more explanation about what problems the "auto-sync to files" pattern is intended to solve. And also to talk about those in context of whether those reasons apply for shared hosts and the "default" workflow most inexperienced people will use. If there's a mis-match there, perhaps offering a pair of Active State options—ActiveStoreDbOnly and ActiveStoreWriteToFilesToo or something—might be one way of dealing with this; if we don't write the files on crappy hosts and force them to use the DB, we no longer have a security hole and we can just fanangle pure .json everywhere else. Argument solved.

I do agree here. I can totally see the use-case for write-through for easier deployment and updates and "reverting" and "importing" and I actually love it, but I can't see it for users not using these deploy tools.

So the 2nd level active store should be optional and the option signed with "Do only activate, if you know that you can protect those files properly.".

We can then use whatever file format we want. This still could be data.json.php + data.json.php.sig with

<?php
"// <?php die(); /*"
?>

Or just be the data without the additional PHP protection, but at least there would be no users affected by default that are just using drupal on shared hosting.

By the ActiveDiskStore being optional we are having much less isses.

That being said, I would like to see ActiveDisk store to be anyway:

a) pluggable - we will need to support strongarm / spaces / "dev_staging_production_different_vars_for_each" etc. kind of workflows anyway later.

b) chainable

<?php
$store
= new ActiveDBStore("db", new ActiveDiskStore("disk")));
?>

To import would then be:

<?php
  $store
->pull("disk", "db"); // Import Vars from Disk to DB
?>

And other contrib modules could do much more complicated workflows, while still utilizing the default $store(s).

You can even have:

<?php
$store
= new ActiveDBStore("db", new ActiveDiskStore("disk", new ActiveModuleStore("modules"))));
?>

So modules default data in .json.php can be imported / accessed and overridden. This is really flexible.

Of course for specific stores, you'd just call $store->getStorage("my_namespce") or the like and whatever store is set will handle it in that cascaded way. For those computer science interested, this would be factory + decorator pattern.

As you can see in this model it is easy for the store to be flexible enough to support easy and advanced use cases in core and many more in contrib.

Best Wishes,

Fabian

Discussing more about this in

Fabianx's picture

Discussing more about this in IRC, chainability has lots of flexibility and also allows overrides for for example StrongArm or Spaces type things:

  • The disk store is completely optional, but can be turned on on demand in which case it requests a resync by the DB store. (It doesn't make sense to copy files, if you then don't write to them and they are unchanged)

Here is an example of how this would work for module_enabled. Note that each class is only dealing with one specific function:

[07:52] Fabianx: $store->module_enabled("foo");
[07:52] Fabianx: starts
[07:52] Fabianx: ActiveDBStore->module_enabled("foo")
[07:52] Fabianx: ActiveDBStore gets data from upper layers.
[07:52] Fabianx: ActiveDiskStore->module_enabled("foo")
[07:53] Fabianx: ActiveDiskStore gets data from upper layers.
[07:53] Fabianx: ActiveModulesStore->module_enabled("foo")
[07:54] Fabianx: ActiveModulesStore->module_enabled("foo")
[07:54] Fabianx: this gets the data from the module, parses it, whatever.
[07:55] Fabianx: Then gives it back to upper layers.
[07:55] Fabianx: ActiveDiskStore writes it to config/foo.json.php (if not there already)
[07:55] Fabianx: ActiveDBStore writes it to DB. (if not there already)
[07:55] Fabianx: And a ActivePartialMemcacheStore temporarily save it to Memcache.
[07:55] Fabianx: Caching for free too.

So it is really easy to take items out of the chain or re-add them.

catch: Fabianx: so one thing about this, is people were saying they always want to be able to go onto a Drupal site, and regardless of how it's set up, see the config in the same place.
catch: This would remove that ability.
[07:56] Fabianx: catch: But it would just be one mouse click away.
[07:56] Fabianx: Activate Disk Store, clear cache and you're there.
[07:56] catch: Yeah.
[07:57] Fabianx: Config missing? Activate Disk Store.
[07:57] Fabianx: Done.
[07:57] Fabianx: And we want to have this configuration partially cached, too.
[07:57] Fabianx: You get caching for free.
[07:58] Fabianx: And StrongArm implemenation.
[07:58] Fabianx: As easy as:
[07:58] Fabianx: StrongArmStore(ActiveDBStore(etc.))

Another example:

You want some items to automatically be updated (views), but other just by review and even others to never be overwritten ("locked").

You can do that:

AutoSyncActiveStore(ActiveDBStore(...)) and then decide by module, what you want to do ...

Best Wishes,

Fabian

.php offers a very nice

pounard's picture

.php offers a very nice second level of defense in case .htaccess fails, because if Apache is working /at all/, those files should be parseable and simply show a syntax error (or a white page) rather than your database password.

Creating writable .php file is the hugest vector of attack you could open. By creating those, any attacker that can modify those files will be able to prepend any kind of code before the die, then just hit the URL if .htaccess is broken, or make it run inside by another PHP script (or a PHP eval'd code).

You are not making security, you are opening the biggest Drupal security flaw ever. And this is only one. You are making your configuration being executable!

Trust the lower layer. You know UNIX? "A program has a single task, and must do it the best" (poor french translation, but the original expression is english, I can't find it right now)". You are basically delegating to core critical security responsibility that it shouldn't take care of, and because the PHP application is itself the vector of attack by definition (and experience) you are basically reducing to zero the real protection. You are opening new vectors, wide open, you are complexifying things that must remain simple, creating runtime overhead, loosing new developers, and making things hard for older ones.

Just for fun, let me quote:

"when smart people propose complex solutions for seemingly simple problems, either the people are not so smart or the problems are not so simple"
-- ksenzee - #drupal-contribute@freenode, yesterday night,

I would answer:

"For users, clarity is more helpful than magic",
"the simplest answer is almost always right"
-- The Tao of the Symphony

Pierre.

In your scenario, these two

Josh Benner's picture

In your scenario, these two things are true to allow the attack to succeed:
1. .htaccess protection of config directory has failed.
2. There is a compromise that allows attack to write arbitrary data to config directory.

How is that any different than the way Drupal handles files? If the same things are true...
1. .htaccess protection of files directory has failed.
2. There is a compromise that allows attack to write arbitrary data to files directory.

... it is the EXACT same scenario. Therefore, the config directory being writable is no different than the files directory being writable, and no new (or rather, more vulnerable) security threat is introduced.

Untrue, you can write files

pounard's picture

Untrue, you can write files using other flaws than a messy .htaccess, it could be PHP eval'd code from some input, it's sufficient to break the whole security.

Pierre.

Only if executed

Crell's picture

The web server writing out a .php file that the web server would then execute, yes, that's a huge security risk. However, if it's treating the file as a data stream rather than as executable code then it doesn't matter. If you put executable PHP code into that file during an attack, you'll just get a parse error when we run json_decode() on it.

If someone managed to get access to the file through another compromise in the system and overwrite it, writing PHP there wouldn't help because no normal person would be going to that file in the browser anyway. That wouldn't be a useful attack. They'd want to write some alternate configuration to break the system or make Drupal do something mischievous... which is what the file signing is there to prevent.

The syntax argument is

pounard's picture

The syntax argument is irrevelant, if I can prepend some PHP code and an opening comment there is not syntax error anymore.

EDIT: Plus, if I can write the file, I just can remove whatever else is in it.

Pierre.

just 'normal people' eh?

catch's picture

If someone managed to get access to the file through another compromise in the system and overwrite it, writing PHP there wouldn't help because no normal person would be going to that file in the browser anyway. That wouldn't be a useful attack.

Why does this require 'normal people' to visit the file? Just write the file then visit it yourself.

  • Put an infinite loop in the file, then hit it with your tool of choice.

  • Do a recursive file delete on the files directory.

  • Read sensitive file information stored outside the webroot or do fun things with exec.

  • If the file generates syntax error when you browse to it manually, it will generate a 500 error. Many load balancers cycle out web servers when they encounter 500s. chx mentioned this as a feature before (the syntax error) but I know of at least one large Drupal install where 500 errors would cause servers to cycle and it seems like causing the file to fail php -l would not even require write access.

  • You could also do things like require_once ../../../includes/bootstrap.inc, that gives you access to read and write everything including the level 2 store. There are a few modules that ship with .php files, do a relative include, minimal bootstrap for things like stats logging etc. - what makes this file different (please don't say .htaccess protection)?

Some of these examples may not be valid, but the fact that Drupal does not directly include the file, does not mean that web-writable PHP files are not adding any additional risk at all.

Just to clarify, you can do

catch's picture

Just to clarify, you can do all of these things with the files directory now (assuming you can get past the filename munging for uploads or directly write to the directory + .htaccess not working), so they are not new attacks, but it doesn't help to claim that this is any more immune to them.

The only possible exception is causing the 500 error from invalid php, I will see about running a quick test to validate how much of an issue that is.

syntax error

catch's picture

So using the example from the opening post:

foo.json.php
<?php die(); 723fd490de3fb7203c3a408abee8c0bf3c2d302392[snip]
{
  "string_value": "string",
  "integer_value": 1,
  "array_or_object_value": [
    "value1",
    "value2"],
  "boolean_value": false,
}

catch@catch-toshiba:~/www/8$ php -l json.php
PHP Parse error:  syntax error, unexpected T_STRING in json.php on line 1
Errors parsing json.php

You can add a closing bracket to the initial line:

<?php die();?> 723fd490de3fb7203c3a408abee8c0bf3c2d302392[snip]
{
  "string_value": "string",
  "integer_value": 1,
  "array_or_object_value": [
    "value1",
    "value2"],
  "boolean_value": false,
}

That gives this:

catch@catch-toshiba:~/www/8$ php -l json.php
No syntax errors detected in json.php

But then I can do:

?php die();?> 723fd490de3fb7203c3a408abee8c0bf3c2d302392[snip]
{
  "string_value": "<?php ",
  "integer_value": 1,
  "array_or_object_value": [
    "value1",
    "value2"],
  "boolean_value": false,
}

which gives

catch@catch-toshiba:~/www/8$ php -l json.php
PHP Parse error:  syntax error, unexpected T_STRING in json.php on line 4
Errors parsing json.php

again.

Regardless of any other arguments, having PHP files with syntax errors in them is a no go for numerous reasons - whether load balancers or IDEs. And chx's request above was for a vulnerability that does not already exist in D6 - DDOS via 500 errors on-demand counts (actually image module can produce 500s for no good reason too, but not on demand without any other kind of error condition).

Even if it's only applicable to a small-ish number of installations, I'm sure they'll blame Drupal for throwing 500s out of the box and cycling out all their web servers in a permanent loop. I definitely blamed image module for doing it when I had to grep logs for where all the fatal errors were coming from when a particular site was going down.

You are mixing up everything

Damien Tournoud's picture

Creating writable .php file is the hugest vector of attack you could open.

You are not thinking straight here. It doesn't matter whatsoever which type of file the system is creating (.php, .json, .jpg, .pl, .py, etc.):

  • If you have a working .htaccess file, those files will not be executable
  • If the directory is writable by the web server (that's what we are talking about here), any file can be created in it, regardless of the files that are already there

The .htaccess file and the .json.php extension are designed to protect against two different vulnerabilities:

  • The .htaccess file is designed to prevent access to the files from the web
  • When the .htaccess file is not working correctly (which is sadly extremely frequent), the .json.php offer a second line of defense that doesn't reduce at all the security of the first line

Damien Tournoud

So you agree with me that

pounard's picture

So you agree with me that finally you need to rely on web server configuration. It seems that you are contradicting your colleagues :)

.json.php is an aberration, if your HTTPd is not properly configured, PHP will probably be executable whatever you do; In that case, if the file is writable and named .php, it just makes the system more vulnerable.

So, if the HTTPd is not working properly (sadly extremely frequent) you are opening a biggest hole that you intend to fix in the beginning.

Pierre.

Funny

Damien Tournoud's picture

Please re-read my post. You understood exactly the contrary of what I'm saying.

.json.php is an aberration, if your HTTPd is not properly configured, PHP will probably be executable whatever you do; In that case, if the file is writable and named .php, it just makes the system more vulnerable.

As I said, if your web server is not properly configured, and you have a writable directory (which is what we are talking about here), the type of file you create inside it doesn't matter at all. It's better in that case to create a .php file as it will not be readable from the web.

The .php is not opening any new security vulnerability. On the contrary, it offers a similar or better security in all cases.

Damien Tournoud

As I said, if your web server

pounard's picture

As I said, if your web server is not properly configured, and you have a writable directory (which is what we are talking about here), the type of file you create inside it doesn't matter at all. It's better in that case to create a .php file as it will not be readable from the web.

You are right, then I'll write

<?php
require_once './bootstrap.php';require_once '/includes/bootstrap.inc';drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);print_r(db_select('users', 'u')->execute()->fetchAll());
?>

In that file.

Pierre.

Stop for a minute and think

Damien Tournoud's picture

Please re-read my previous posts and don't reply again until you understand them.

Damien Tournoud

Done. For whatever you are

pounard's picture

Done.

For whatever you are fighting for, the .php extension is the worst stuff you ever came up with. If you do a raw PHP configuration, some configuration array I'm fine with, but write protect it (make it read-only if you don't understand the previous statement). For everything else you would want to be able to write, PHP is the worst choice you ever made, once you granted the write right it can be used by any means to inject arbitrary code, whatever it does, let attackers injecting code is far worst than let him read your configuration.

If you want to secure yourself your files, put them outside the webroot, it's the only real way to secure it. Basically you are fighting for something that doesn't worth fighting because you will never win against PHP sadness itself. It's a fight you lost in advance, not against me, but against potential attackers, so stop trying to make people think you are smarter than me, you may be are but I really don't wanna know if it's true. PHP is dangerous, attackers will probably be stubborn than you in the end and you cannot win.

Even if you secure this, as hard as you can, it falls appart as soon as you let people add poorly coded modules that may include far worst security problems, such as PHP sample files in third party libraries.

Pierre.

Pierre.

You can stop right here

Damien Tournoud's picture

Again, please understand the whole directory has to be writable, so writing PHP files in there doesn't open any new security issue. Thank you for being concerned about Drupal's security, but stop fighting on flawed grounds.

Damien Tournoud

pounard's picture

Indeed, you are totally right, it doesn't degrade the actual security. I was wrong arguing about this point. I sincerely apologise for this.

This still need to put the die(); PHP statement within those files. It put back on the table the fact that this syntax merge JSON/PHP will probably break a lot of advanced code editors features.

You said your files needs to be human readable, and I agree: I will add that files must be human editable too. But the weird syntax will make advanced editors bug: indeed, in the worst case scenario, the syntax melt will make Eclipse (for example) mark the full PHP project as being erreneous, therefor breaking this error detection feature. Seems to me like a regression.

You can still put this die(); statement in JSON comments, it will actually work fine. But PHP in there will need to be fully valid, which means adding the closing tag accordingly. To go a bit further, this implies, IMHO, a too strong consequency on files formalism which get them far away from the real JSON, which goal is primarely to remain simple.

Another example, this was an argument from chx himself: every techno is fully featured for reading and writing JSON. This means that with JSON we can interoperate easily with third party software that could potentially want to write Drupal config. This is a noble cause, I agree with this, we could do powerfull admin tools.

That said each time a third party software will need to write the Drupal JSON config, it wont be able to use their internal writer, because of the strong PHP die(); statement presence. It will need to create its own JSON writer or bootstrap a Drupal to use its for writing the file safely. This is nonsense because the interoperability statement falls apart because of this.

If the sotware don't want to implement this, files will be valid JSON, but Drupal core won't be able to read them because it'll try to strip the first line before reading, thus making the remaining JSON invalid. Plus, the external software written files will be flattened again (one liner) non human-readable again thereof, which un-serve the original purpose (will also totally break VCS diff'ing).

Naming JSON files .php will potentially confuse system administrators that are looking for files to manually edit. A lot of them are not Drupal zealots therefore don't have anything for that matter to care about core internals, they will just hate looking for 1) JSON configuration files (honestly, who on the earth will do that?!)... 2) ... named .php instead of .js.

I think this remains a totally useless complexity. Please, don't do that, it will definitely bother more people than it will help.

In conclusion, I won't add anything about security, you totally made your point and you are right. But, I will still continue to argue the fact that the only true way to secure your files is to put them outside the web root: any attempt to secure a file, not matter how, inside the web root is a desperate and lost cause.

Please name your file with their real name: .js, provide core with a well written and documented .htaccess, hopefully most mutualized environments will be happy with it. This is probably what should be done for non technical end-users, and it will work. Non technical end-user do not administrate their own web server, so a working .htaccess will be more than enough to protect the files. In case the .htaccess doesn't work because of hosting, then it's not your deal anymore, it's more like the user shouldn't use this hosting anymore.

Further, if a power system administrator will deploy a Drupal itself, he probably will bypass the .htaccess and deport all directives inside its own written VirtualHost, which is good. These kind of users will read the .htaccess file provided and attentively report any security measures into their VirtualHost. Doing this, they will probably test it. It's not your deal anymore except writing a good deployment documentation.

Stop trying to figure out how to fix environment related stuff. Drupal is the web application, not the administrator itself. You loose too much energy on that stuff while you could provide simple and fast code instead of providing obfuscated and dawn slow code because you are trying, at runtime and everyother moment, always, to fix every bit of environmental stuff that doesn't fit you.

Pierre.

Raining on your parade

perusio's picture

I must say that the idea of adding .php to a JSON file sprinkled with die() and a token is bad idea. I understand the security concerns. But I think that trying to preempt being p0wned if don't know how to manage the server is a disservice to the ones that can handle the config and know how to setup the environment. Here's an idea that could provide some protection while still delivering a good DX.

  1. Use regular JSON, no PHP or any other non valid JS inside. Take full advantage of the data = code equation.

  2. While bootstrapping check to see if the files are correct (create a checksum to prevent tampering) if not or are readable through the web
    abort the boostrap and display a big fat error. Use a non standard status, like 419 to signal the error.

  3. While installing drupal provide a choice of configuration management profiles:
    i) advanced - no .php, no die(), nada: straight JSON.
    ii) beginner - add the .php add the die()(no longer valid JS)
    Assume that the profile is advanced if you're installing via drush.

  4. Allow the possibility of changing profiles at any moment.

It's complex, without a doubt, but is much better IMHO, than degrading the DX by catering to the lowest common denominator. You're providing a choice. Afterwards is no longer our responsability, but from the person installing/managing the site. I think we should keep it simple. Hide all complexity as much as possible. Make security be invisible as much as possible.

Oh, if only it were that easy. ;)

webchick's picture

Stop trying to figure out how to fix environment related stuff. Drupal is the web application, not the administrator itself.

Unfortunately, we don't have that option. We need to perform due diligence to protect our users from harm. Anything less is asking for an enormous burden on the security team trying to combat a constant stream of FUD, because it isn't crappy web host/crappy FTP program that gets blamed when Drupal is compromised, it's of course Drupal. "Well, your environment is configured stupidly" is unfortunately not good enough when you consider how large of a percentage of non-technical users Drupal has. Additionally, Crell points out above that even on web hosts where you have some level of shell access, sometimes a directory outside of webroot is simply not available. I've never personally seen that, but that doesn't mean the situation doesn't exist.

We set this precedent back in 2006 with the introduction of the .htaccess in the /files directory which prevents execution of files there. See http://drupal.org/node/66763 for more details.

So, yes. Unfortunately, this is our problem to solve, and can't be shoved onto the responsibility of the environments.

this is not new

catch's picture

You can already do that with the files directory:

If you can get write access to the directory.

And you can visit the file via the browser.

The files directory has some protection against this (file uploads prevent uploading .php extension by default, .htaccess to prevent .php files being read if they came from elsewhere). So having the same potential issue with the config directory is not a new issue - since there will not be any way to upload files to that directory via the browser, all changes to the file system are via the API, direct edits on the web server, or via drush. If one of these is compromised, you are already pwned.

If you have a directory that has all .xml files in - what stops you writing a .php file to that directory - assuming everything else is equal?

I disagree! No, wait.

q0rban's picture

Now you'll all tell me why I'm wrong. ;) Please do!

You're wrong here, I refuse to tell you you're wrong.. Erm.. wait..

I really like the idea of having default DB (only) store. Can we couple that with forcing users who use DB + File Store to store files outside the docroot? Contrib could then provide another solution for "DB + File store inside the docroot" if you so choose, but at least Core is not giving people the option to (potentially) bare their sensitive parts on the interweb.

+1 I like this approach of

Fabianx's picture

+1

I like this approach of core not allowing this by default!

Posted above, but something

catch's picture

Posted above, but something along the lines of:

<?php
class JSONConfigWriter Implements ConfigWriter {

functoin __construct($config_path) {
  if (
strpos($config_path, DRUPAL_ROOT)) === 0) {
   ...throw
an exception, or alternatively just don't write anything out.
  }
}
?>

may be enough.

From reading over this

christianchristensen's picture

From reading over this thread: this seems like a very sane approach.

From the security perspective there is a need here to have a "trusted" area that configs can live; right now this trusted area is the database and a PHP file (settings.php) - if this is trusted and we cannot find (across all hosting platforms) another trusted storage, then we should continue to use the DB.

JSON is the right choice for this storage!

I am curious what other discussions went on with the security team as there seems to be several options, but this one really receiving a lot of heat (embedding "<?php die();" somewhere). This mixing of PHP into config data as a Drupal standard is wrong; as mentioned it hurts DX, sets Drupal apart (in the wrong tone), and seems problematic when editing by hand (as is one motivation for the JSON file).

I believe in the bootstrap (or Drupal setup) a detection script that gives some information about the environment - namely detection of config directories outside of web-root - is a sufficient guard against "shooting yourself in the foot". Explicitly setting configs (likely not UI accessible) to point outside of the web-root or force bypass would enable the saving to file.

(a thought: re: Labour youtube; isn't the discussion here regarding "security through <?php die();" equivalent to requiring those SQL dumps to be named .php and have a PHP function at the top before the SQL ... it may have helped - but there would have still been a problem...)

Nice russian translation

phlop's picture

Nice russian translation

Writing to disk only on demand

ontological's picture

I like the idea of only writing to disk on demand. When the user initiates the configuration dump they are doing it for a reason and intend to do something with it. It is reasonable then for them to be responsible for moving it to a secure location.

Even though there will be good protections in place, there will be the perception that Drupal is less secure because of the practice.

This was the jist of my idea

arcaneadam's picture

In talking with chx and fabianx last night in IRC this seemed like the way to go. To reiterate my point, writing files to the webserver that don't NEED to be there all the time, and then trying to find a way to make them secure is like leaving your house unlocked all the time then trying to find a way to make your alarm system account for that. Basically it's crazy.

I think the way moving forward is to use files, but generate and do export/import through an UI, that warns about leaving the files in place, or even offers import/export in multiple ways (file on the system, text area input, or direct file upload/download) that way we can let users have the different options available and then if they choose the less secure option, well "You were warned." It then becomes, to stick with the house comparison, more like keeping your house locked but CHOOSING to keep a key under the mat, give one out, or just leave your door unlocked (Which for some sites isn't a big deal, if you don't have any sensitive data in a basic install, then a config file left on the server may not pose a security threat even if someone can get to it).

Adam A. Gregory
Drupal Developer and Consultant
Hire me
http://AdamAGregory.com
Twitter Me
http://twitter.com/adamgregory

+1

pounard's picture

+1 Files are a good import/export facility. And I don't see anything wrong with a good .ini or .js file with configuration if your webserver is correctly configured. It's a lot less risky than .php writable files.

"For users, clarity is more helpful than magic" -- is exactly this. If you write unsecure files, explain to your user they are unsafe instead of trying to mask this complexity to him. Better leave an known open door than an hidden catflap.

EDIt: Plus, if you go further this way, JSON is not preferable anymore to any other formalism since this is pure export, you can basically write XML, JSON, INI, YAML readers and writers easily for everyone to be happy with its preferred file format.

Pierre.

Bundling Configuration

e2thex's picture

So I am trying to see how one could bundle configuration. And do not really see a way.

As an example lets say I have a blog post content type a block taxonomy a few views that display blog post, and I want to wrap all of that up into one config so I can use it on my sites that need a blog, is there a way to do that with in this proposal?

Features module for Drupal 8

q0rban's picture

Features module for Drupal 8 would do this by creating dependencies on the pieces of configuration. If core can provide a way to export these bits to code, all Features module needs is a way to create dependencies on these bits of configuration.

I am not sure how the new

e2thex's picture

I am not sure how the new features would interface with this config build. They seem to be very similar tasks, maybe we should look for ways to merge them. I mean part of this system is going to be choosing to get the code out to the file system, do we need a core way and a d8 features way?

D8 Features is a nebulous concept at this point

arcaneadam's picture

They will wait for D8 core to be defined and determined before they start to worry about how they will do their thing. I don't think we need to get sidetracked on, "Well how is the feature (config mgmt), going to integrate with Features (the module)?" It's a totally moot point. If we write a good enough system/API then Features will integrate their workflow with it. But their workflow/logic should not effect our decision making other then from general scope of "Can we make this API easier/better?"

Adam A. Gregory
Drupal Developer and Consultant
Hire me
http://AdamAGregory.com
Twitter Me
http://twitter.com/adamgregory

I guess what I am saying is I

e2thex's picture

I guess what I am saying is I do not see how the features module would get into the config as it is being presented here. Do you see what part of the API features would be able to use?

Integrated

Crell's picture

Features currently has 2 distinct use cases:

1) Push configuration changes to code for saner deployment. In this case features are pretty much all site-specific.

2) Packaged up turn-key functionality, especially for distributions.

For use case 1, Features.module goes away entirely. Configuration is on disk to begin with, so no extra system is needed at all.

For #2, a module should be able to ship with more than one config file. So if a module ships with, say, 5 view.foo.json files, all of those can be copied into the config directory on install. If core doesn't do that itself then it would be trivial for a contrib module to do so. There may or may not need to be a separate GUI to help copy config files TO such modules, but it would also be trivial to do by hand.

Steve (waaay above) noted a potential issue with upgrades, which is valid and could use more input.

Part of that second part

e2thex's picture

Part of that second part might be changing the configuration that is defined by other modules (and in an ideal worked even configuration defined by other features) So while it would be trivial for them to define new configuration, what about overriding configuration defined in other modules?

Do you see exporting exporting that whole piece of of configuration and having just live in two modules? I think core would have to be the one that resolved which one to use no?

So to move this away from a what would features do disscussion.

how would the core system deal with multi module defining the same config, and would there be a way for a module to define just part of a config, so that the rest is defined higher up the tree?

a quick example to help clarify
1 google_analytics module defines a config with default values

2 distro_ga defines a a ga config with best practices for some industry

3 sites_ga want to use all of the ga config from distro_ga (want to use best practices) but want to change a few settings for there group of sites

4 and then the general site version of the config want to include the api key.

If any thing change in 1,2,3 it would be nice if they propagate up. and were not wiped out by any of the other that where just changing a few items.

if core had a way for this to happen I think contrib could find a way to export it out. But it would be much harder for contrib to find a way to put it all together.

maybe the files could be merged? order by a decimal number in the fine name? so

ga.json.php is the base then
ga.1.json.php is meger from disto_ga then
ga.11.json.php is merged from sites_ga then
ga.9.json.php is merged from the general config?

this are easily order and such so just let each one down the chain over ride its changes?

or maybe I am missing the way this is already there

I think module weight has

Josh Benner's picture

I think module weight has been mentioned several times as being the manner of resolving these types of scenarios.

Something like this

stevector's picture

As Crell mentions, I asked about the same use case in http://groups.drupal.org/node/155559#comment-520564

I'm glad you're thinking about it too e2thex. This approach would alleviate some major site building pain points. David Strauss said an alter hook would get the job done. http://groups.drupal.org/node/155559#comment-522389 I agree.

A mechanism to apply the contents of those hooks in a smart order, beyond the normal module weight approach would also help a lot. I suppose an integer in the file name would work. That probably has implications beyond my understanding at this point and might be best worked out in an issue queue.

As long as core has the hook, a contrib module like Features could work out the details.

Going of on a slight tangent

perusio's picture

please, please do not hardcode any type of check to Apache's quirks. The more recent web servers have no .htaccess like concept. You cannot override any setting that is not in a vhost config. Therefore thinking security schemes that rely on .htaccess and printing big fat red warnings on status pages is IMHO, a niet niet. Please think your security without relying on specific quirks of a particular web server.

As for the JSON vs. PHP thing. IIRC there's one of the oldest living programming languages that got that part right 50+ years ago: data = code.
JSON brings that back.

The whole security debate is

pounard's picture

The whole security debate is a useless, it's a bootstrap problem:

Drupal needs to be sane (not compromised) to ensure correct file checks.
Files need to be sane (not compromised) to ensure Drupal is sane.

This is a circular dependency.

This is true not only for configuration: if you put your nice hash and die() in configuration, you should probably do the same in all Drupal files to ensure security.

Just put your configuration files somewhere with a real name (.js for json) and hide them with an .htaccess and basta. Every attempt to make the software checking security over itself is just a philosiphical problem (were there eggs first, or were there hens first?). It's useless, if one (of the files or the software itself) is compromised both are because both are the same entity at the OS level.

Pierre.

Please keep in mind, every

q0rban's picture

Please keep in mind, every time you make an edit to your post, you are sending us all an e-mail. :)

Yes that's something really

pounard's picture

Yes that's something really awful with g.d.o, as much as the fact that all html input have a fixed foreground color but no background color. Each time I type text I have to use firebug to disable the foreground color to ensure I can see the text.

Pierre.

Why doing simple when we can do complicated ?

Sylvain Lecoy's picture

Yes really I'm thinking the same thing, thank to having highlighted it. I really don't understand why we need an home-made security layer for those files where any other files in the framework doesn't depend on the same layer..

PHP vs JSON

e2thex's picture

I want to weigh in on the json file store vs using a php way of storing

JSON PHP
Get to have the web server export code Have to have an export and upload or rely on other tools like drush
Arguably slightly prettier obj desc
No way to programmatic effect config Allow people flexibility in defining Config

I would argue that allowing the flexibility for contrib and site builders to hack at configuration, out ways having core export configuration directly to the file system

I think we can brake drupal user up into three groups to look at this problem
1. people that will never export configuration, and do not care about version controling thier config
2. people that want to export config but are not able or wiling to use a drush tool (or a export and upload)
3. people that want to export and will be using drush

I think group 2 is very small and giving up a engine of innovation so that we make things easier for group 2 seems like a big loss. Mind you I have not done a survey of drupalnistas so I have no numbers to back up this assertion. But I do know, that one of the great parts of working and contributing to drupal is the fact that you can poke your head into everything and hack away. I think closing that off to configuration would be a loss.

I will add this:JSON:-

Sylvain Lecoy's picture

I will add this:

JSON:
- die() wierd syntax to avoid reading the content.
- hash checking
- .json.php in case the webserver is misconfigured.
- wrapper function of json_decode for pretty print.

PHP:
- just as it work for settings.php. It works great and have been proven to work with Drupal for many years.

So... I believe the debate is not PHP vs. JSON, but is more a question which is a known drupalism problem: Why doing it simple when we can do it complicated ?

The same thing? Really? Is

Josh Benner's picture

The same thing? Really? Is settings.php writable after install?

No its not, but this is not

Sylvain Lecoy's picture

No its not, but this is not the problem I guess, views exportable for instance generates such arrays.

Does Drupal write views

Josh Benner's picture

Does Drupal write views exports to disk?

right..

webchick's picture

The difference between the status quo and the proposed system is the existing config files are read-only and only updateable via drush or sftp. The proposed system would make these files writable by the application, so that synchronization with the active store (db) at all times was ensured.

I think this is the part that

e2thex's picture

I think this is the part that I might be missing.

I am not seeing the value in keeping the active store in the db in sync with the file system. I can see one saying ok I am happy with my settings lets write them to code so I can check them in and deploy. but what are we getting by having the config written out automatically?

Give it a rest

Damien Tournoud's picture

Let's give the choice of format a rest. PHP was considered and rejected, for reasons I already highlighted in http://groups.drupal.org/node/155559#comment-521629

The best reason is that we need to read configuration objects from the database (which rules out PHP), and there is a lot of reasons not to have a different format between the canonical store and the active store, see: http://groups.drupal.org/node/155559#comment-523929

Damien Tournoud

Custom config classes

dmitrig01's picture

I'm not really sure I understand this:

<?php
class MyAdvancedConfig extends DrupalConfig {
  function
someHelperMethod() {
   
$this->pants = 'fancy';
    echo
"Ma'am, you have my compliments on your fancy pants.";
  }
}
$config = config('core.site', 'MyAdvancedConfig');
$config->someHelperMethod();
$config->save();
?>

Specifically, how is that useful? The one use I could see is something a bit different: if it was possible to declare that core.site's config class was always MyAdvancedConfig, then config('core.site') instanceof MyAdvancedConfig.

However, without that, I fail to see how it's useful. Enlighten me!

This could be handy for

Josh Benner's picture

This could be handy for purpose specific logic that might impact the way a configuration loads. I think a specific example might be best... and I'll share one as soon as I have one, but in general, putting maximum flexibility (especially when this is a very easy/cheap [as in affordable] way to do it) in the hands of the module developer is usually a good.

Some less-than-specific applications of this:

  1. Query a custom table for data that might impact a config setting.
  2. Cache a setting from an external resource, when the cache has timed out, automatically refresh it upon load of the config setting.
  3. When saving a specific setting, notify a web service on another site in your network.

All of that said... I do tend to agree with dmitrig01 -- it would be handy if we could let configs load with a specific class. This starts to get into more about the plug-ability of the configuration system... but this made me consider another extension to the storage format:

{
  [security line(s)],
  "config_class" : "MyAdvancedConfig",
  "my_setting" : "blah..."
}

... which would imply that Drupal would know about something like the "config_class" key, and try to instantiate the object using the configured class (presumably an extension of the base config class) rather than the default class.

Such an approach might accomplish what dmitri says: to always load certain config files using an optionally specified class (rather than needing to specify it in the code).

I'm really not sure if this would be so useful enough to pursue, but I thought I'd share the idea-riff I had when I read dmitri's comment.

I had something exactly like

dmitrig01's picture

I had something exactly like this in mind, I thought maybe it should be called _class -- either works. But, I agree with you that it might not be useful enough. But again, I think having to specify the class every time the config is loaded seems even less useful.

Possible

Crell's picture

The use case for using a custom class is systems with more complex configuration needs, such as Views, Feeds, Panels, WSCCI Plugins, etc. Your typical module will likely not need more than the bare properties for its module.me.json.php file, so using the default "bare" class is fine. That will likely be the most common case. For more advanced cases, having utility methods to make life easier, well, makes life easier. It makes for a cleaner API that is easier to use and more readable.

I'd prefer to not have to respecify the class each time as well, but the question then is where to store information about what class to use. Embedding it into the file like that is a possibility, but at the sprint we moved away from that precisely because it's more "metadata" mixed in with the payload. However, if we end up throwing the security markers into the JSON structure as well then we can probably get away with putting the class name in there, too.

Of course, that then becomes a problem if the configuration object you're referencing doesn't yet exist. How do you tell it what class it should be using? You'd need to specify a default... in which case you're still specifying the class there anyway so gain nothing. (Just like always specifying a default for variable_get().) The alternative would be to have a separate config_new() directive that creates an empty object that can specify the class, but then we have to consider what to do when you can't guarantee that a config object has been created yet. (Say, for a new node type.) If you always need to check if the config object exists, or need to always call config_init() or something, then we're right back where we started.

The other advantage of specifying it inline is that you could specify a different class than the original module author intended, if you have a reason to do so. Say it's a complex configuration object like a node type, and you're extending nodes with some complex piece of configuration. You want some utility methods to make recording that configuration saner. The node.module-provided config class doesn't have those, of course. What do you do? If you can specify your own class, then you can just spin up class FancyPantsNodeConfig extends NodeConfig {} and use your own version yourself. If the class name is controlled solely by the node module, however, you can't do that.

If we can figure out a good way to simplify that API call I'd love to, but we have to consider the factors above.

Yeah, I think Larry does a

Josh Benner's picture

Yeah, I think Larry does a good job talking through the options, here. There isn't a great way to avoid specifying the class inline without making too many assumptions or getting overly complex.

However, while considering this, I did wonder:

Why does the config API, as proposed, use a factory function rather than straight instantiation of a class?

As proposed:

<?php
$config
= config('core.site');
?>

Why not this?

<?php
$config
= new DrupalConfig('core.site');
?>

Then custom class usage would look like this...

<?php
class MyFancyConfig extends DrupalConfig {
  ...
}

$config = new MyFancyConfig('core.site');
?>

It's the same, but not chainable

Damien Tournoud's picture

It's exactly the same, except that functions and objects are chainable in PHP, but the new operator isn't.

ie. you can do:

<?php
config
('core.site', 'MyFancyConfig')->display_handler;
?>

but not:

<?php
new MyFancyConfig('core.site')->display_handler;
?>

Damien Tournoud

Good practice

Crell's picture

It's generally good practice to punt object creation to a factory. It allows you to do much more flexible things with the object while keeping the typical case DX fairly easy. It also allows you to change the instantiation logic easily without having to change every line that calls it. That's especially important for unit testing.

In the current implementation for instance, DrupalConfig has more than one constructor argument. The second is the level 2 (active store) object in use. That means if you instantiated the object directly you would need to know what active store object you were supposed to be using and inject that. That's also controlled by a factory of some sort, so you'd have to first call that factory, then instantiate your config object by name passing that in. And if we ever have to add a 3rd parameter, then every call to create a config object needs to change.

Now try creating a mock object for testing if you're creating the config object yourself with "new". You can't. Not without runkit, anyway. :-)

Personally I'd almost want to make the factory its own reusable object rather than just a function, but we'll see if that would actually buy us anything. :-)

To play off that idea, I

dmitrig01's picture

To play off that idea, I would argue that it's not exclusively metadata. For some example, I might want to change NodeConfig to FancyPantsNodeConfig everywhere, or for just one node type, etc. This goes back to the idea of embedding it into the data, as it would be much easier to do that way (otherwise, we'd have to introduce an alter and it's slow and just ugh).

I don't really understand how "the configuration object you're referencing doesn't yet exist" is a problem. What we're proposing is referencing a class name, not an object name, from the data. The workflow (presumably in config()) would be: 1. the data is loaded, 2. the proper class is found from the data ($class = !empty($data['_class'])? $data['_class'] : 'DrupalConfig'), 3. it's instantiated, 4. the data is put in the object ($config->setData($data)).

The "doesn't yet exist"

Josh Benner's picture

The "doesn't yet exist" problem applies when you are setting the config for the very first time. You're going to need an object on which to call ->save()... what class is that object?

Larry discussed the option of using create_new(), but as he points out, you'd need to first know you need to use create_new(), and be checking for the config... and while I think we could probably come up with a way to make that work, I'd fear we'd end up with does_this_config_exist() calls with branching logic all over the place, which doesn't seem very elegant.

Let's not introduce a reverse dependency

Damien Tournoud's picture

We discussed that during the sprint too, and one of the challenges is that it introduces a reverse dependency between the configuration system and everything else in Drupal: to be able to load a configuration object that specifies an inline class, you need its class to be available, which means having the module providing the class enabled, which means having the module system bootstrapped, etc.

Making the caller specify the class is a great way of removing this dependency.

Damien Tournoud

arcaneadam's picture

I think this convo needs a bit of a reset.

As to the security concerns:

  1. Complaining about the ability to write php files to a web directory is NOT A LEGITIMATE security concern.
    Any web accessible writeable directory could then become a security concern. I don't hear massive uproar over the public files directory being writeable because just making a directory writeable doesn't make it less secure, some one still has to gain access to that writeable directory. So even if we drop the config directory guess what your Drupal install still has a writeable public directory that a hacker could exploit (granted it has some extra protections). But if someone is able to bypass and write files without permission to either the public or proposed config directory, then you have bigger issues you should be looking at. Creating a writeable config directory is not the security issue here
  2. The real security issue here is that someone could simply go to http://example.com/sites/default/config/commerce.authorizenet.json and steal sensitive info without proper protection (i.e. a .php extension).
    This is why there is a need for the .php extension. In an ideal world everyone would have .htaccess configured correctly and we could simply do tell it not to serve files from directory(which I imagine we will anyways), but you can't hang you hat on that and then tell people, oops you should have been smarter. There are a lot of inexperienced people who, as Drupal becomes more friendly out of the box will, pick it up. You have to protect their interests if you want Drupal to grow. And believe me we all want Drupal to grow, it is good for us all on many level, not the least of which is making Drupal development an even more financially viable occupation.

As to other concerns:

  • Complaining about *.json.php breaking your IDE is completely fixable.
    All thats needed to fix that is some creative thinking. There have been many ideas already proposed. I have a few also that will follow this post.
  • This whole feature should be much more flexible. Rather then simply writing files to a config directory, lets give people options as to how they want to export their data. I think having a basic api for saving and retrieving is the starting point and then from there we expand into creating ways for people to get the config data I/O (a UI for import/export, using uploaded/downloaded files, textarea input, or files from the disk). I'm not going to get into the JSON, XML, PHP, YAML debate except to simply say, there is an option to create some sort of config_data_format handler class that could be extensible to allow for multiple types of data storage formats, but out of the box core only has JSON and it's up to others to create their own (this brings the security issue #2 back into play if a data_formatter outputs a format that ends up being web readable, but that why this was only a thought).

Anyways those are my thoughts on what the real issues with this whole thing are. Frankly I'm probably going to unsubscribe from this thread tonight cause it's getting ridiculously long. I'll continue to discuss in IRC though.

Adam A. Gregory
Drupal Developer and Consultant
Hire me
http://AdamAGregory.com
Twitter Me
http://twitter.com/adamgregory

Thanks for the summary

webchick's picture

I did have a question, though. What's the use case for supporting multiple configuration file formats? The architecture originally had that capability (it's really easy to do), and that question was asked, and it was subsequently taken out as a result in order to simplify things a bit.

It seems like we'd very quickly get into a complete nightmare if some modules were shipping with mymodule.json, others with mymodule.xml, and still others with mymodule.yaml. I as a site builder would then need to learn three different types of syntax in order to edit these files myself, and Drupal would need to load three different file parsing libraries (at least one of which—YAML—has no native PHP capability and would need to be a custom user-space library) in order to get the page configuration. I'm trying to envision what awesome capabilities adding this capability would unleash that would make this enormous DX pain worth it, and I've come up short.

OTOH, there's tremendous benefit in all of contrib and core standardizing on one type, regardless of what it ends up being. But JSON seems to have all of the things we're looking for (native encode/decode functions that are really fast, it's an encoding format specifically for this purpose, it has compatibility with external configuration management tools), so I'm not seeing the benefit of allowing other types here, as well, other than flexibility for the sake of it.

I think you'd have to be

Josh Benner's picture

I think you'd have to be willing to say to module developers, "Your modules need to ship configuration in JSON if you want core to handle it natively." And if they insist on using YAML, their module will need to depend on some YAML module that either parses their config files directly and informs Drupal's config systems of the new settings it gets, or provides a YAML config loader for the config system. I don't think this is the main motivation behind people's desire for a pluggable config file layer, though.

As a module developer, I'm not overly concerned with what language or format I'd need to ship my module with. On the other hand, as a Drupal site developer/admin, being able to customize how the config file store is managed might be very nice. That config file store is how I'm going to:
1. deploy site updates when I develop a new section for the site,
2. track changes to site config so I can figure out what settings might be suspect for causing an issue,
3. KNOW when a client has changed a configuration I put in place,
4. probably some other stuffz

... being able to customize how this layer works is important to me. Specifically, while I understand why the proposal has the config files in the docroot and I'm on board, I'd like to put mine outside docroot. I understand why the proposal has hybrid PHP+JSON and signatures, and I think it's a good way to go... but I'd like to riff on that in some cases, and in others avoid it. Sure, if Drupal supported these use cases out of the box, that'd be great -- but I'd rather see the effort go into letting us extend rather than core trying to account for all these cases.

In the end, even if we have fully pluggable format engines for the config system, whatever config format is used by core is likely to emerge as the de facto standard in Drupal. Phptemplate is a good example of this. Yes, its merits and power contributed to its wide acceptance, but also... it was the path of least resistance, and of many examples. I think JSON will be the same thing, but the pluggable file layer will let site admins really control how their site works.

arcaneadam's picture

I'm simply suggesting allowing for converting between multiple formats in import/export API that an end user sees. All modules and core would use a standardized format for default settings, weather that is in a file, or gleaned through something like hook_config_defaults, the data format for modifying in code will always be one format.

Where I'm talking about allowing different formats (and I'm really only postulating the possibility here), is in the import/export functionality. As I understand the workflow here it goes a little something like this.

  1. Modules (or Core) define config defaults in their code
  2. Once enabled these defaults are copied to the DB
  3. Site admin changes these through the standard UI we are used to. (Something like system_settings_form)
  4. Dev needs to get these config changes and port them over easily to a different site/server
  5. This is where the different formats **could** come into play:
    • The current proposal is the files are automatically created and stored any time the cofig changes
    • My Proposal is that the process takes place through an API that can be access by a GUI, or something like drush, but either way
      REQUIRES a user to generate the output.
      This output can be:
      • a file written to the system as proposed.
      • A file downloaded to the browser(UI), or filepath(drush)
      • or simply displayed in a textarea(UI), or stdout(drush)
    • Dev is able to choose the format of the data based on available config_data_format handlers, but Core only provides JSON out of the box and the format that json takes is dependent up the output they chose above. If they choose to get the data as a download or textarea then it can be plain json (although keeping it signed might still be a good idea).
      Any other formats that people want, well thats what the community is for. Make it pluggable and community members will find enterprising, cleaver and inspired ways to extended it.
  6. The Dev takes that data from above and once again through a UI, API call, or drush command imports his config on another server/site
  7. If this process takes place though a UI then maybe he has a chance to run over it and check what will be altered/changed, giving him control & ultimately responsibility if he mucks something up.
  8. This process repeats from step 4 when config needs to be moved across sites and step 1 is executed on modules when they are installed similar to hook_schema

So that is where I mention pluggable formats. Only in the export/import arena. Not in modules and cores default definitions of the values.

I do reiterate though I am strongly opposed to writing this data to a file automatically without a user making that conscious decision.

Adam A. Gregory
Drupal Developer and Consultant
Hire me
http://AdamAGregory.com
Twitter Me
http://twitter.com/adamgregory

multiple conversations going on

catch's picture

There are several different conversations going on at once, threaded comments are not a good fit for discussions like this.

Most of the discussion has been about making the Level 1 store pluggable (at least allowing both pure JSON as well as the hybrid, potentially it could be used for other formats but I wouldn't see the point).

What Drupal core and modules ships with is not the same thing as the Level 1 store - those are defaults, not the configuration store itself - and they don't need any special protection beyond what .module and .info already have since default configuration is not sensitive - so could be plain .json files, valid, no signing, no <?php die;.

Making the defaults themselves pluggable/any format would indeed be flexibility for its own sake, but I don't see it being any worse than supporting multiple theme engines - how many modules ship with default smarty templates instead of .tpl.php? Either way it's a different argument.

Are we fixated on the wrong problem?

chx's picture

Despite coming up with an idea that would make the JSON valid JSON, people didn't stop harping on that. Well then.

  1. Make Drupal core write to the database only.
  2. Add an option so you can download a JSON export of your config.
  3. Add a button to load and delete the changed JSON config into the database.
  4. Add a drush command to export the JSON export of your config.
  5. Add a drush command to import the edited JSON.

The configuration does not need any sort of protection because it's only there when needed.

valthebald's picture

Maybe it's a little bit off-topic, but this discussion is an excellent illustration of why traditional forum is not always a good tool.
I'd very much like to follow discussion flow, but it turns harder and harder every day :(
Frankly, I'm lost

I do agree. There is just no

Ralt's picture

I do agree. There is just no way to know what's new now.

Hi, The only point of having

Ralt's picture

Hi,

The only point of having files instead of using the database is for version control and exporting/importing easily. The problem everyone gets is that using JSON only makes it unsecure, so people are doing some kind of mixing with JSON/PHP.

Well, here is my suggestion : why not using plain JSON with generated filenames?

Here is an example :

<?php
...
 
// Generating the file when needed
 
generate_config_file(uniqid());
...
function
generate_config_file($filename) {
 
// Create file : $filename . '.json';
...
?>

Security is then included, since the name is impossible to find. The filename could be stored in the database so that we could retrieve it later. It is also easy to implement a revision system (so that there is only one config file at once).

With this, version control will not be a problem, neither will exporting be.

Maybe I missed some obvious flaw, but this looks good to me, and allows us to use plain JSON.

Well, then it is better to

Fabianx's picture

Well, then it is better to make the directory unguessable instead ... (as proposed elsewhere in this thread ... )

I'd much prefer to need to change core.config.json instead of 3453453453453add56456456456.config.json ...

Best Wishes,

Fabian

Yes, directory or file, it

Ralt's picture

Yes, directory or file, it doesn't matter.

But why would you want to change it? This config file will not allow you to make any change to your configuration, it is only there for version control and eventually exporting it.

For reminder :-)

cmi-architecture

What is the use of exporting

Josh Benner's picture

What is the use of exporting it, if it cannot be used to make changes?

Of course you can edit the config file, then load the configs in the file into the active store.

Direct editing

Crell's picture

One of the secondary goals of moving config to disk is that some changes are legitimately easier to make by hand. Eg, if you just want to fix a spelling error in a View, for instance. Yes, that would break any digital signing that's going on, which is why a bad signature triggers an OMG warning rather than blocking you outright. If you know that the change you just made is legit, then you know that you can ignore the bad signature and re-sign the file. Then you're good to go.

Closing this thread

heyrocker's picture

It appears we've gotten everything we're going to out of this thread. As much as I'd like to leave it open and see it become the most commented post on g.d.o, I'm closing it and have created a new post with some followup topics at http://groups.drupal.org/node/157379.

Thanks to everyone for their input.

Deployment & Build Systems & Change Management

Group organizers

Group categories

Tags

Group notifications

This group offers an RSS feed. Or subscribe to these personalized, sitewide feeds:

Hot content this week