Hello everyone. When last we met we were discussing potential file formats for configuration management, and just as that argument was winding down I went on vacation for a month. Sorry about that. However, I am back again and hopeful that we can wrap up this discussion and put the topic to rest. Before I dive into this I want to thank everyone for the feedback they have provided. There has been an enormous amount of valuable information coming from the community, which is exactly what we have been looking for!
So since coming back, I have taken the time to review all the previous discussions, and with the help of catch I have run a large number of performance tests on the various file formats we have been discussing. These tests were based on the ones that catch did in http://drupal.org/node/1198924. I took his tests and ran them in addition to new ones from YAML and XML, so we could get apples:apples comparisons on a single dev environment. All these test files have been attached to this post, as well as XHPROF screenshots of the code that did the parsing. Additionally, while I was on vacation, I met up with webchick and David Strauss at the Community Leadership Summit in Portland. We had a long and productive talk about the file formats which informed this discussion quite a bit.
So I would like to start by eliminating formats that are just non-starters, and evaluating what we have left.
JSON
I was initially a strong proponent of JSON, but I have to admit now that it's just not going to work. There are a couple of reasons for this. First, you can't put comments in it, and this really won't work for files that we expect devs and sysadmins to be able to hack around on at will. Additionally, json_encode() escapes every non-ASCII UTF-8 character, which is a huge bummer. The following script
$foo = array(
'föö' => 'bår',
);
print json_encode($foo);
prints this
{"f\u00f6\u00f6":"b\u00e5r"}
That is anything but human readable, much less human editable, and combined with the problem of no comments, I think it takes JSON out of the running, which sucks because it was easily the most performant option. Insert old comment about the tradeoffs of performance and functionality here or something.
INI
Despite a passionate response from pounard, it appears that INI has absolutely no community traction, and actually there is a lot of discussion of moving the existing .info files into whatever format we agree on here. So I am going to knock it out of the running as well.
YAML
There are two main strikes against YAML. One is that while it is very human-readable, it is much more troublesome to write. The spec is very detailed and the format is very fragile and easy to break. The other strike is performance, since we would have to use a PHP library to parse it, rather than formats like JSON which can take advantage of the speed of being baked into PHP itself. To show just how bad this is, I took the YAML parser from Symfony and ran the same tests as above on it. Here are the results,

The YAML test took more than 10 times longer to run than any of the other tests. While it's true that this would only happen when the level 2 cache gets reset, this is still unacceptably slow. These files are not even .5K; it will be so much worse on larger files. If you install the PECL YAML parser it actually becomes almost as fast as JSON, but I think we have to assume the worst possible baseline.
So we are left with PHP and XML. The big big downside to PHP is that when these structures are loaded, there is no way to kill them, which leads to horrible memory bloat. This is a really big problem in Drupal right now, and I know a lot of people see the memory footprint of Drupal as one of its biggest issues to be addressed at the moment.
So on to our good friend XML. Honestly, the main downside to XML seems to be that whenever people hear about it, they say 'Ewwwwwwww, not XML!' and frankly I was in that camp too. However there are a lot of compelling reasons to consider XML. It is easily the most interoperable format available, pretty much any external tool will be able to deal with it. There are a ton of tools on all major platforms for editing XML. The major IDEs will deal with it nicely. While it is not as performant as say JSON, SimpleXML is native to PHP and we just have to write a routine to bring that back and forth from PHP arrays. Here are some results using a parsing script that Rasmus Lerdorf wrote. We might be able to optimize this but I figured it was a good thing to test as he knows something or other about PHP.

So while it's the slowest of the native formats, it's hella faster than a pure user space implementation like YAML. In the last thread, mike503 suggested simply passing a SimpleXML object to json_decode(json_encode($xml)), which is a great idea and very performant. However, once you do that you run into the encoding issues mentioned above, so that's not going to work unfortunately.
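To make the "routine to bring that back and forth from PHP arrays" a bit more concrete, here is a minimal sketch of the XML-to-array direction. It is not the script from the attached tests; the function name config_xml_to_array() is made up for illustration, and it deliberately ignores attributes and comments.

```php
<?php
// Hypothetical helper (not from the attached test scripts): recursively
// convert a SimpleXMLElement into nested PHP arrays.
function config_xml_to_array(SimpleXMLElement $element) {
  // An element without child elements becomes a plain string.
  if ($element->count() === 0) {
    return (string) $element;
  }
  $result = array();
  foreach ($element->children() as $name => $child) {
    // Group by element name so repeated elements are kept as a list.
    $result[$name][] = config_xml_to_array($child);
  }
  return $result;
}

$xml = new SimpleXMLElement(
  '<?xml version="1.0" encoding="UTF-8"?>' .
  '<config><site_info><site_name>Jag älskar Drupal!</site_name></site_info></config>'
);
$config = config_xml_to_array($xml);
print $config['site_info'][0]['site_name'][0] . PHP_EOL;
```

Every child ends up in a list here, even when it appears only once. That keeps the sketch short, but a real schema would probably want to say which elements may repeat.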
I believe that if we keep our schemas simple we can have a really usable XML file format that meets all the demands we have as defined in the goals laid out in the last post. Additionally, I think this can offer some really nice options for managing multi-lingual data. For instance
<config>
<!--Site information in English-->
<site_info lang="en">
<site_name>I love Drupal!</site_name>
</site_info>
<!--Site information in Swedish-->
<site_info lang="se">
<site_name>Jag älskar Drupal!</site_name>
</site_info>
</config>
So given all this, I'm pretty well sold on XML, not necessarily because I love it, but because when compared to all the other options as well as against our original goals, it comes the closest to meeting all our needs (i.e. it sucks the least).
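As a rough idea of how the multi-lingual example above could be read, here is a sketch using SimpleXML and XPath. The helper name config_site_name() is hypothetical, and it assumes $lang comes from trusted code, since it is concatenated straight into the XPath expression.

```php
<?php
$source = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!--Site information in English-->
  <site_info lang="en">
    <site_name>I love Drupal!</site_name>
  </site_info>
  <!--Site information in Swedish-->
  <site_info lang="se">
    <site_name>Jag älskar Drupal!</site_name>
  </site_info>
</config>
XML;

// Hypothetical helper: return the site name for one language, or NULL
// when no <site_info> element carries that lang attribute.
function config_site_name(SimpleXMLElement $config, $lang) {
  $matches = $config->xpath('/config/site_info[@lang="' . $lang . '"]/site_name');
  return $matches ? (string) $matches[0] : NULL;
}

$config = new SimpleXMLElement($source);
print config_site_name($config, 'se') . PHP_EOL;
```

The comments in the file are simply skipped by the query, so they cost nothing at read time.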
This is not a decision; we didn't have an enormous amount of discussion about XML in the last two threads. So I'm looking for a final round of commentary before we figure out what to do. I'd like to have this in by Monday, August 14 so that we can get a decision done prior to DrupalCon and maybe even have another productive code sprint following. Please keep in mind the goals we've laid out; they should be your guiding facts in this.
Thanks!
| Attachment | Size |
|---|---|
| eval_no_apc.jpg | 88.82 KB |
| json_decode_no_apc.jpg | 88.04 KB |
| pecl_yaml_no_apc.jpg | 83.44 KB |
| sf_yaml_no_apc.jpg | 87.49 KB |
| xml_simplexml_no_apc.jpg | 84.93 KB |
| eval_pecl_yaml.php_.txt | 492 bytes |
| eval_sf_yaml.php_.txt | 601 bytes |
| eval_xml.php_.txt | 1.17 KB |
| eval.php_.txt | 1.2 KB |
| generate_xml.php_.txt | 906 bytes |
| generate_yaml.php_.txt | 481 bytes |
| generate.php_.txt | 384 bytes |
| json_encode.php_.txt | 199 bytes |

Comments
XML
If we end up with XML, it's not a bad decision. XML is widely understood, easy to use as long as you have a simple and straight-forward schema (we can handle HTML, we can handle a small XML format), and it's extensible.
It's also possible to layer and merge XML files if you design them properly. XUL, the UI language behind Firefox and Thunderbird, does exactly that, which is how their plugins work. That opens up interesting options for us later in terms of configuration management, optimization, and layering.
+1
I agree
Overlays can definitely be done effectively. Nesting via XInclude means some rather complex (probably currently unanticipated) situations can be resolved. XSL(T) gives us some capabilities for transforming our XML into other formats (or, conversely, transforming other formats into our XML). And since XML comes with validation support (in three flavors... geck), we can also validate our documents independently.
There are many positives to using XML. And PHP's support for XML (esp. via libxml) is actually pretty decent.
I personally HATE SimpleXML, and have a long history of giving presentations about that... but even so, I think it's a decent enough solution. Just get used to making lots and lots of null/empty checks.
Blog: http://technosophos.com
QueryPath: http://querypath.org
I'll say it....
I'll be the bonehead to say it.... if we go the XML route we should consider the usage of QueryPath (http://querypath.org/). It makes working with XML so much easier for the crowd that knows just enough PHP and jQuery to be dangerous but still needs to get stuff done.
Yes, querypath is pretty neat
Yes, querypath is pretty neat to manipulate HTML (which is structurally like XML), and it could be useful to make code easier to write, it's like jQuery for PHP. And the good thing : we already have a module for that.
This discussion is about the
This discussion is about the format for the default files that modules ship with, and the files that can be used to move snapshots of configuration between environments (more or less export and import).
The current design (after a lot of discussion, not sure if that is fully documented in updated form) is that the level 1 (i.e. file) store is optional. If you rm -rf your directory, Drupal will run perfectly OK because it only cares about what is in the level 2 store. And the level 2 store is format agnostic.
This means the files should be used for these things, and nothing else:
(if I got any of these wrong or missed a bit, please correct me).
The actual configuration API that you use to find out what the configuration is (i.e. replacing variable_get() and other $some_config_get() functions), has to be storage agnostic, because level 2 is storage agnostic.
So it's possible that QueryPath might give us a quicker parser than SimpleXML, but once we have the input/output from XML to PHP arrays and back sorted, then that is not going to be exposed to anyone else - you'll be dealing with method calls on a configuration object, not a format-specific API.
I'm in almost agreement with
I'm in almost agreement with you with one exception. I'm not sure QueryPath is going to perform faster than straight SimpleXML. SimpleXML is C code while QueryPath is user space code. When I suggested QueryPath I was thinking it would make writing, maintaining, and working in the layer that converts XML to and from the config objects we use as a replacement for variable_get()/set() easier.
XML merging pains?
Maybe there isn't a simple answer, but isn't XML difficult to merge? Wouldn't someone need to understand XSLT in order to do it properly? Correct me if I'm wrong, but my understanding was that JSON merges tend to be much simpler.
I'm approaching this from the point of view of using Opscode Chef for config management, but depending on how the config files are laid out, it seems merging could be pretty critical in using established config management tools to their full extent.
I too am quite happy to see
I too am quite happy to see XML as the front runner here. I'm not sure I'd use XUL as my goto source for awesome, but still ;-) +1 from me
Json
I love json. But the lack of comments would definitely be an issue when looking at other people's configurations. XML, although harder to read, is fine with me.
So what was wrong with ini
So what was wrong with ini files? I really don't think of XML when I think human readable and editable.
Are you talking about PHP ini
Are you talking about PHP ini files or our info files? Our info format should be a no go because of interoperability with other systems. PHP ini files only have 2 levels of nesting and I can come up with cases where you need to go deeper. To do that we need a custom system on top of ini files (like Zend provides) and that's really slow user space code. Also, another Drupalism (even if we share it with something like Zend). Again we have the interoperability with other tools issue.
That makes perfect sense,
That makes perfect sense, thanks.
I'm not really convinced that
I'm not really convinced that comments in json are a problem. Why can't we just do this?:
{"config": {
"site_info": [
{
"_comment":"Site information in English",
"lang":"en",
"site_name":"I love Drupal!"
},
{
"_comment":"Site information in Swedish",
"lang":"se",
"site_name":"Jag \u00e4lskar Drupal!"
}
]
}
}
That to me is a whole lot more readable than the XML example. Also, after htmlentities, the xml would look like this:
<config><!--Site information in English-->
<site_info lang="en">
<site_name>I love Drupal!</site_name>
</site_info>
<!--Site information in Swedish-->
<site_info lang="se">
<site_name>Jag &auml;lskar Drupal!</site_name>
</site_info>
</config>
That looks no more readable than what json_encode does.
Entities not required
Setting the encoding prevents SimpleXML from using entities. For example, this outputs exactly what you put in, no entities:
$a = '<?xml version="1.0" encoding="UTF-8"?><config><!--Site information in English-->
<site_info lang="en">
<site_name>I love Drupal!</site_name>
</site_info>
<!--Site information in Swedish-->
<site_info lang="se">
<site_name>Jag älskar Drupal!</site_name>
</site_info>
</config>';
$xml = new SimpleXMLElement($a);
$b = $xml->asXML();
There's a second parameter to json_encode() that looks like it could maybe create similarly readable Unicode in JSON, but I haven't been able to get it to work so far.
Edit: nope, json_encode() can't do that yet, though there's an open PHP bug.
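For readers on a later PHP: the JSON_UNESCAPED_UNICODE flag was added in PHP 5.4.0, so the following sketch does not run on the 5.3.x versions being tested in this thread. It only shows what the second parameter does once that flag exists.

```php
<?php
$foo = array('föö' => 'bår');

// JSON_UNESCAPED_UNICODE requires PHP 5.4.0 or later.
$json = json_encode($foo, JSON_UNESCAPED_UNICODE);
print $json . PHP_EOL;

// json_decode() reads the unescaped form back without any extra work.
$decoded = json_decode($json, TRUE);
```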
Entities
In fact, we should be even more clear. Entities like the ones in the previous example aren't part of XML at all. Using HTML entities in XML will cause parser warnings and errors.
In contrast, UTF-8, per the XML 1.0 spec, is the default encoding. Only quotes, ampersands, and angle brackets are turned into entities in XML 1.0. While you can use numeric entities to encode characters, you have to force this to happen.
Blog: http://technosophos.com
QueryPath: http://querypath.org
JSON and _comment
I was just about to post a similar comment. I found this post on StackOverflow which reinforces the notion of using _comment.
Like the person there emphasizes, comments in XML are just a convention, and are still accessible via the DOM, so Drupal using JSON and _comment doesn't seem to be any different.
i18n
While we could come up with a convention for comments like _comment as a key, that doesn't solve our internationalization issue. We need to be able to handle characters like ä, ジ, and even ☃ elegantly. With JSON all these characters need to be escaped in the format \uXXXX.
I won't get into the history of how we got here with JSON. In any event, this does not play nice for those non-English language Drupal users.
Comments is a much easier issue to solve than the i18n one.
Escaping is not required in JSON
Escaping is required by json_encode(), and that may be enough to make JSON unusable, but escaping Unicode is not required at all in the JSON format itself. It's optional. A custom JSON encoder could create valid unescaped JSON. From the JSON RFC:
lol. Sorry, you lost me with
lol. Sorry, you lost me with 4 little words...
The very idea that we would write our own JSON encoder/decoder... I'm not sure what to say about that besides writing our own JSON encoder/decoder is a terrible idea.
I don't see anything wrong
I don't see anything wrong with a custom drupal encoder/decoder. It's already done with other things, so why not json? Even if they end up only being wrappers with a little extra parsing for php's json_encode/decode.
drupal_json_encode
drupal_json_decode
NIH
That's something we need to do less of, not more of. PHP is a lot more advanced, as a language and as a community, than it was in 2001. "We have to write our own parser/encoder" is a strike against any format where that's the case. (Arguably it's a strike against both JSON and XML equally, since we'd need to define an XML format, but SimpleXML handles the underlying plumbing.)
Example UTF-8 friendly JSON
As an example, this creates UTF-8 friendly JSON that can be read by json_decode():
function better_json_encode($object) {
$json = json_encode($object);
$entities = preg_replace("/\\u(....)/", "&#$1;", $json);
return mb_convert_encoding($entities, 'UTF-8', 'HTML-ENTITIES');
}
$foo = array('föö' => 'bår');
$a = better_json_encode($foo);
print($a);
$b = json_decode($a);
Fixed version
This version works, at least under PHP 5.3.6:
https://gist.github.com/0cf6a31c4a560007eb00
<?php
function better_json_encode($object) {
$json = json_encode($object);
//print $json;
$entities = preg_replace("/\\u([a-fA-F0-9]{4})/", "&#x$1;", $json);
//print $entities;
return mb_convert_encoding($entities, 'utf-8', 'HTML-ENTITIES');
}
$foo = array('föö' => 'bår');
$a = better_json_encode($foo);
print $a . PHP_EOL;
print_r(json_decode($a));
?>
Blog: http://technosophos.com
QueryPath: http://querypath.org
Formatting breaks it
For anyone trying this out, the comment formatting breaks it, so this has the same problem. To get it working, you'll need to either grab it from Github, or double the backslashes in the preg_replace().
As I care a lot about i18n, I
As I care a lot about i18n, I tried it and doubled the backslashes. Here is my code (just more prints so we see the results), and then the results.
<?php
function better_json_encode($object) {
$json = json_encode($object);
print "json : " . $json;
$entities = preg_replace("/\\\\u([a-fA-F0-9]{4})/", "&#x$1;", $json);
print "entities : " . $entities;
return mb_convert_encoding($entities, 'utf-8', 'HTML-ENTITIES');
}
$foo = array('föö' => 'bår');
print_r($foo);
$a = better_json_encode($foo);
print $a . PHP_EOL;
print_r(json_decode($a));
?>
Results :
Array ( [föö] => bÃ¥r ) json : {"f\u00f6\u00f6":"b\u00e5r"}entities : {"föö":"bår"}{"föö":"bÃ¥r"} stdClass Object ( [föö] => bÃ¥r )
So there's still a problem. I'm not very good at encoding, though. I'm using PHP 5.3.6. Is this a major blocker for i18n?
But what if someone goes and
But what if someone goes and pastes this into jslint or similar and the whole thing blows up. If it is JSON it needs to be JSON.
Full Fat Things ( http://fullfatthings.com ), my Drupal consultancy that makes sites fast.
feel free to dump
feel free to dump {"föö":"bår"} into jslint. They're arguing this is valid JSON.
http://jslint.com/
I don't know what Djebbz's encoding issues are but they're just muddying the water.
You're right, there was no
You're right, there was no problem, forget about my comment.
See my reply a few comments below...
Just a quick note, json is
Just a quick note, json is not a true subset of js. Two bits come to my mind... first with some unicode there are problems (http://timelessrepo.com/json-isnt-a-javascript-subset) and then with comments (they are valid in JS but not in JSON). A JSON lint tool would be better for checking for valid JSON as jslint will provide some false positives.
See http://jsonlint.com/. I
See http://jsonlint.com/.
I tested with
{"fööジ": "bår☃",
"_comments": "My useful description here"
}
it's valid.
While I'm not entirely sure
While I'm not entirely sure what your setup is, I tested the snippet from mbutcher (above) with a tweak to add a couple more complicated characters (like a snowman) and had no problems. My output looked like:
{"fööジ":"bår☃"}
stdClass Object
(
[fööジ] => bår☃
)
I wonder if the problem was your setup.
So it looks like something
So it looks like something effectively went wrong with my test. It could be my setup: does it mean one needs something specific in php.ini? Or is it from my code, which is really the same as mbutcher's with a few "print" and "print_r" added, and the doubled backslashes, nothing more. I don't get the difference and the problem...
Did you grab the raw code
Did you grab the raw code from mbutcher? See https://raw.github.com/gist/0cf6a31c4a560007eb00/31fcda40ff399c1eeb3e176...
Yes I had, but not from the
Yes I had, but not from the url you gave. I've just tried, here is the result :
{"föö":"bår"}
stdClass Object
(
[föö] => bår
)
My setup definitely looks like a problem, but it may mean we have some "hidden" requirements regarding the system configuration. Or maybe I once changed something in the php.ini that I never should have... But still, I'm reading http://php.net/manual/en/ini.core.php and don't find anything specific regarding json or this encoding issue...
Other note (maybe useless), in http://jsonlint.com, {"föö":"bår"} is valid.
stack problems
Not really what I meant. You seem to have unicode problems in your terminal or something. If you trace through you'll see the wrong characters are the original and the utf8 encoded values. Every other point is correct, so the code is actually functioning; you're just seeing it wrong.
Sidenote: I support heyrocker's XML choice. I'm just trying to make sure the discussion about JSON here isn't sidetracked by FUD or unrelated confusion, because I think it is a great format for a lot of things.
Agreed
Agreed. I also support XML. At the same time, I worry misinformation about JSON here will come back to bite us in the future.
Also, we're probably going to
Also, we're probably going to want pretty-printed json anyhow, so why do we even need to use json_encode? I'm not seeing writing out these files as performance critical, so we can always use a php encoder that doesn't have the UTF-8 issues mentioned here.
It was my impression we were
It was my impression we were always going to pretty print JSON anyway with a custom writer (or one found elsewhere that does nice pretty printing), so I agree that doesn't feel like a showstopper at all. Writing out might be a bit slow (no worse than XML likely) but that's only going to be happening when exporting config to disk or installing modules, and it doesn't need to block anything else at all.
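A minimal sketch of what such an export/import pair might look like, for the sake of argument. The helper name config_export_json() is invented here, and the JSON_PRETTY_PRINT and JSON_UNESCAPED_UNICODE flags it relies on only exist in PHP 5.4.0 and later, so on 5.3 a user space writer would have to do the same job.

```php
<?php
// Hypothetical export helper: pretty-printed, unescaped JSON for a file.
// Both flags require PHP 5.4.0 or later.
function config_export_json(array $config) {
  return json_encode($config, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE) . "\n";
}

$config = array(
  'site_info' => array(
    'lang' => 'se',
    'site_name' => 'Jag älskar Drupal!',
  ),
);

// Write the file the way an export would, then read it back.
$file = tempnam(sys_get_temp_dir(), 'config');
file_put_contents($file, config_export_json($config));
$loaded = json_decode(file_get_contents($file), TRUE);
unlink($file);
```

Reading stays a plain json_decode() call either way; only the writing side needs the extra care.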
And if XML comments are in the XML structure, that doesn't feel any different to putting them in the JSON structure - since we'll be making our own schema either way.
There was a discussion about
There was a discussion about embedding comments into the JSON, I can't remember if it was here or on IRC, and the consensus was that once you do that you are essentially creating your own bastardized version of JSON, and then it is .info files all over again. There was immense support for finding a standard and using it as is. We may be making our own schema for XML, but we're not adding to the standard or doing anything that people who know how to read XML wouldn't expect.
I do worry about the speed of the export operation. No, it's not critical path, but we will at that point be writing all the config files out to disk, and I'd prefer it didn't take forever.
I don't see how it's
I don't see how it's bastardized. It would be valid json, that can be read with json_decode(). That's completely different to .info files where a completely custom format and parser was added.
For both JSON and XML, the comments are treated as part of the data structure, so I don't see the difference at all. If we replace the word 'comment' with 'metadata' then there's nothing custom about this, JSON can be used to represent metadata.
Embedding comments in JSON outside the data structure is a different discussion, but I don't think anyone has seriously suggested why we would want to do this (or why it would be any different to trying to add comments outside the XML structure).
For both JSON and XML, the
The thing is that all XML parsers and writers know what a comment is; they know that the thing they are reading is a comment and not just any other piece of data. When iterating over the data we're not going to have to say: oh this key is actually a comment, silently ignore that one.
There's nothing particularly wrong with establishing convention upon json, but in XML it's actually baked into the language.
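For comparison, this is roughly the extra step the _comment convention would need on the JSON side. It is only a sketch: config_strip_comments() is a hypothetical name, and it assumes the config has been decoded to arrays with json_decode($json, TRUE).

```php
<?php
// Hypothetical helper: drop every "_comment" key from decoded config data.
function config_strip_comments(array $data) {
  $clean = array();
  foreach ($data as $key => $value) {
    // This is the "silently ignore that one" check.
    if ($key === '_comment') {
      continue;
    }
    $clean[$key] = is_array($value) ? config_strip_comments($value) : $value;
  }
  return $clean;
}

$json = '{"config":{"site_info":[{"_comment":"Site information in English","lang":"en","site_name":"I love Drupal!"}]}}';
$config = config_strip_comments(json_decode($json, TRUE));
print_r($config);
```

It is not much code, but every consumer of the files has to know about the convention, which is the point being made above.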
JSON and UTF-8
JSON doesn't play nice with UTF-8. Per the spec, you need to encode a lot in a \uXXXX format. That means all the international characters used by those who don't use English. Since Drupal is really into i18n (enough so to have an initiative after i18n was a big enough issue to name a core maintainer who was big into i18n) we have to really consider this aspect.
Comments are the low hanging fruit to debate. i18n is the more difficult issue to solve and it's a place JSON becomes a burden.
I say this as someone who was a proponent of JSON and really wanted that. When you look at the full picture of needs, it doesn't look like the tool for the job.
I don't remember the
I don't remember the requirement to be able to edit these files manually. If we don't care about editing by hand then JSON isn't out.
Full Fat Things ( http://fullfatthings.com ), my Drupal consultancy that makes sites fast.
Human readability/editability
Human readability/editability has been an enormous part of the discussion since the very beginning. Please read the past threads on this topic for more background:
http://groups.drupal.org/node/157379
http://groups.drupal.org/node/159044
The reality of these files is
The reality of these files is that they are going to be read and occasionally written by hand. The readability, as heyrocker points out, has been part of the conversation for some time. That's why you see discussions about comments and Pretty Print.... those are for readability.
What do we GAIN from JSON?
The pro-JSON argument so far seems to be roughly that "we could make it work, all we need to do is...".
Examples:
json_decode() UTF-8 issue json_encode().
Sure, we could do all of this and more. But why? What about JSON makes it so great that we'd want to add all of this? What does JSON bring to the table that other candidates (specifically, in this case, XML) don't?
Let me remind you of some of the things we LOSE with JSON over XML:
In the present thread, the argument for JSON seems to boil down to "if we work hard enough, it'll get by." I'd much rather hear an argument that says "these are JSON's killer features, and here's why it must win out, even if we have to make some adjustments."
Blog: http://technosophos.com
QueryPath: http://querypath.org
Though I see XSL(or its
Though I see XSL (or its general abuse) as a point against XML, great write-up.
It's still true JSON is a great format. I think what's happened is we've pretty well proven it does not meet the needs of our configuration format. Communication format for Javascript and general web driven APIs? Still a great option. Our standard file based configuration format? Not a good fit.
Thanks heyrocker for your hard work in coming to this conclusion and putting forth what's obviously a difficult recommendation to make with all the strong feelings surrounding it.
We could write our own parser
No one has suggested this. To my knowledge, there is no issue with json_decode(). It parses escaped and non-escaped utf-8 characters just fine in my testing.
I don't see how we don't have this with json.
I don't necessarily view these as advantages at all and I'm not sure we can argue that they are until we have a use case that justifies them.
Why is this an advantage? json maps directly to native php data structures, so queryability seems like added complexity.
The one big advantage of json that seems to be undervalued here is its simplicity. If you look at the example I pasted above, the json example is much more readable. The fact that many developers have a natural aversion to xml is not something that should be discounted. It's not fun to work with, it's slower than json, and way more complex than seems to be warranted here. Some constraints wouldn't kill us.
XML Aversion
has always amazed and flabbergasted me. We're web developers. We've been writing XHTML for a decade, which is XML, and no one has had an issue.
Frankly I suspect much of the XML-fear is FUD. Sure, you can make XML formats that are completely unreadable by sane humans. WSDL is probably the worst offender here, but SOAP is rather complicated as well. CMIS I can't read, myself. But you can also produce XHTML, DocBook, or SVG with XML, all of which are extremely readable as long as you use sane whitespacing (which is a requirement of any format to be readable). I've handed DocBook templates and custom XML config formats to web-newbies before and they've not had any major issues.
I'm fairly confident that any format we come up with would be closer to XHTML, Docbook, or SVG than to WSDL. :-)
I don't have any particular
I don't have any particular fear about the format. But it concerns me that as soon as XML came back into the running people popped up with "oooh we could use this, THIS, this and This, ooh and there's this over here", when all we need is something nice and simple.
I hear what you're saying and
I hear what you're saying and can only offer this in return. A lot of us are still fuzzy on what exactly we will and won't be able to do with this. The fact that it's XML just means that we might have options (outside of what core drupal does) that haven't been considered yet, not that core drupal should address the XML config in these various ways. At least that's how I continue to think about it.
Eclipse
Just to be clear: We could
Just to be clear:
While I didn't test, I believe these are only issues with json_encode(), previous examples in other discussions suggested a few lines of code to pretty print JSON.
So one point in JSON's favour is you can just do
<?php
$foo = json_decode(file_get_contents('file.js'));
?>
and that is all that's needed to convert from the native format straight to PHP structures.
I'm not sure how much these are real wins, when all we will be doing is writing PHP structures out to XML, then reading XML back to PHP structures.
Just because XML has some features doesn't mean we'll actually need to use any of them. How do these fit into the original requirements for a format outlined in these discussions?
Personally I don't really care that much what we use (as long as the 'level 2'/active store is completely format-agnostic and the level 1 store remains optional and rarely written to or read from), but I do care that we make the decision for the right reasons.
XML or the triumph of verbosity
Subtitled: please close the bloody tag.
XML is a very bad idea. It was a bad idea when it was conceived. It's extremely verbose and makes available a lot of stuff that we don't need. It comes with a lot of baggage.
This is a throwback. Everywhere around the web people are dumping XML in favour of JSON, but perhaps because drupal is enterprise ready we must follow the Joneses and use a legacy format like XML.
Just off the top of my head, I recall that php-fpm moved from an XML based configuration to an ini based one.
Food for thought:
JSON: The Fat-Free Alternative to XML.
XML: The Angle Bracket Tax.
Yes we're entering the treacherous terrain of holy wars, but we had it coming all along anyway. Furthermore the web thrives on wars. Remember the Browser wars?
Holy wars
Holy wars only happen when people start them. Please don't. That helps no one, and we already had 2 over this issue. We do not need a 3rd.
Greg has laid out very specific technical problems where JSON (the original recommendation from the Sprint) does not work. The next best option is XML, at the cost of, yes, being more verbose than JSON (although by the same token, more self-documenting). Please focus on the specific technical issues at hand and not flame bait "holy wars". Religious Wars do not benefit Drupal in any way.
I'm not at all into entering
I'm not at all into entering war but I think perusio you're making a good point. @Crell since the problem of json_encode has been sorted out as catch said in comment http://groups.drupal.org/node/167584#comment-559239, and since all the baggage that XML comes with is something we won't necessarily need (see the same comment from catch, lot of unused XML features), we can't dismiss JSON.
The other point as perusio said is not technical, it's about following the wave and staying top in technologies. I remember that we chose git mainly because from all decentralized VCS out there, it was the most commonly used by the other "devs" and "web guys", and also because some people from the Drupal community were already using it in other projects. I think we're in the same case here. We could use XML, but using JSON means staying in the head wagon. If other people are using JSON for configuration files more and more, that's something to take into account.
BTW, please read the links of perusio's comment above, very interesting arguments about JSON vs. XML. All may not be valid in our case, but it's still some great food for thought.
If I may sum up the technical issues we're having with JSON:
Comments (no /* */-style comments): using a key like "_comment" or "description" is perfectly valid and makes sense; we just have to take this specific key into account in our parser. See this comment and read the link to StackOverflow.
So after reading all our points and discussion, I think that the advantages of JSON (speed, size), together with disadvantages that can all be worked around easily (unless I'm completely wrong), make JSON a very good option for our config file. I hope I'm summing this up well enough so we can make an educated choice.
"although by the same token,
"although by the same token, more self-documenting"
How so? More words doesn't mean it is more self-documenting. I would say it is a wash on this one point.
No encoding issues
I like how no one checked what json_decode(json_encode()) does to the characters: nothing. Why would it? json_encode takes a UTF-8 string and emits properly escaped Unicode JS (\uXXXX for Unicode code point XXXX on the BMP) and json_decode properly decodes those back to UTF-8. In fact, this is one of the easiest ways to get codepoints out of a UTF-8 string in PHP!
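The round trip described here can be checked with a few lines. This is a minimal sketch, assuming the script itself is saved as UTF-8. The JSON_UNESCAPED_UNICODE flag at the end is not something discussed in this thread: it exists in PHP 5.4 and later, and is shown only as a possible alternative to a hand-written encoder.

```php
<?php
// Sample data with non-ASCII characters (this file must be saved as UTF-8).
$original = array('föö' => 'bår');

// json_encode() escapes non-ASCII characters as \uXXXX sequences.
$encoded = json_encode($original);
print $encoded . "\n";

// json_decode() turns the escapes back into UTF-8, so nothing is lost.
$decoded = json_decode($encoded, TRUE);
var_dump($decoded === $original);

// PHP 5.4+ only: keep the characters readable in the encoded output.
$readable = json_encode($original, JSON_UNESCAPED_UNICODE);
print $readable . "\n";
```

The escaped form is harder to read by hand, but it is the same data; only the on-disk representation differs.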
So I have done what heyrocker suggested http://paste.pocoo.org/show/456118 and it works.
Neat. Am I wrong in noticing
Neat. Am I wrong in noticing that it does something sort of odd with the comment? It seems it stored that there was one but lost its content.
Less workarounds and less Drupalism
Going with _comment seems like a very bad idea to me. It would create another Drupalism and a developer WTF. Yes, XML comments are part of the DOM, but at least XML is the best-known configuration and markup language out there. People know how to write XML and HTML comments.
I'm all for XML, since it requires fewer workarounds and fewer Drupalisms. That's a go for me.
Let's keep in mind however,
Let's keep in mind however, this is a possible Drupalism for a feature that we're not even sure how we'll utilize yet.
Summary
So far as I can tell from this discussion, the only real solid thing to be said is that we can get around the encoding issues by writing our own encoder (json_encode() plus a regex). Everything else seems to be subjective and a matter of personal choice. For instance, some think it will be easier to use the "_comment" convention in JSON, others think this would be terrible. Some think XML is easier to read, some think JSON is. I won't deny that JSON is a less complex format, but I don't believe that this provides any significant benefit in and of itself. From a usability perspective, it is completely subjective whether or not one format is easier to read and edit than the other. The fact that one has fewer characters than the other is irrelevant to whether it is a better choice. From a performance perspective, this:
$xmlstr = file_get_contents($file);
$xml = new SimpleXMLElement($xmlstr);
$json = json_encode($xml);
$arr = json_decode($json);
is obviously slower than just doing json_decode(), but it is faster than every other method tested and contrary to my assumption above works very well.
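A self-contained sketch of that round trip, using an inline XML string instead of a file so it can be run as is. The sample document is made up for illustration, and passing TRUE to json_decode() (to get arrays rather than stdClass objects) is an assumption, not part of the snippet above.

```php
<?php
// Hypothetical sample config; in the snippet above this comes from a file.
$xmlstr = '<config><site_name>I love Drupal!</site_name><lang>en</lang></config>';

// Parse the XML, then use JSON as an intermediate step to get plain PHP data.
$xml = new SimpleXMLElement($xmlstr);
$json = json_encode($xml);
$arr = json_decode($json, TRUE);

print_r($arr);
```

As noted earlier in the thread, XML comments do not survive this conversion cleanly, so it suits reading values rather than preserving the whole document.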
So again, I don't see what JSON objectively buys us over anything else. It is at best a lateral move from what I can see and given that we seem to have a large number of people behind the XML choice I'm going to want something more than that. If we design the API to be simple and straightforward, then we can switch to JSON later if it turns out everyone in the world actually hates the XML files.
Assuming the API is
Assuming the API is pluggable/hooked to some extent (even if that's "pluggable" in the same way cache is currently), could we simply choose a sensible default (e.g. XML), then let people override that behaviour if they wish?
My assumption here is that these files are generally always created by Drupal, rarely edited by hand (it's a nice idea, but most people won't touch them), and rarely accessed (config cache flush or write operations only).
As long as the API is sufficiently abstracted from its basic XML handler, then this could add some hitherto unexpected benefits, such as the ability to access shared configuration backends or config management tools. Anyone hosting large numbers of Drupal sites (Acquia, Pantheon, et al) or using build management or hosting tools (Aegir) in non-standard configurations could find this immensely useful.
My understanding of where we are going with interactions between CMI, pluggable objects and the core bootstrap in D8 is still limited, so perhaps this is not possible, but would it make the decision easier if it was?
Different capabilities
No, it cannot be pluggable. The primary reason is that different formats support different structures. PHP has primitives, objects, and hashes (which it mislabels as arrays). JSON has primitives, objects-which-are-kind-of-hashes, and true-arrays. XML has none of the above per se, but can be configured to represent any of the above.
If we go with JSON, one implication of that is we can support only one of objects and PHP arrays because in JSON they're the same thing. That has a very direct impact on the API, and if we go that route then we're saying up front that a config object's properties, if they are complex, can only ever be associative arrays.
If we go with XML, then we are able to support arrays, hashes, and objects separately if we so choose. If you then try to swap in JSON instead, you will lose data and/or break the API. That's ungood.
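The mismatch between PHP and JSON structures can be seen directly with json_decode() and json_encode(). A minimal sketch; the sample data and variable names are invented for illustration.

```php
<?php
$json = '{"list": ["a", "b"], "map": {"x": 1}}';

// Default: JSON objects become stdClass objects, JSON arrays become PHP arrays.
$as_objects = json_decode($json);

// With TRUE: JSON objects become associative arrays instead.
$as_arrays = json_decode($json, TRUE);

var_dump($as_objects->map instanceof stdClass); // bool(true)
var_dump(is_array($as_arrays['map']));          // bool(true)

// Going the other way, PHP decides by looking at the array keys.
print json_encode(array('a', 'b')) . "\n";  // ["a","b"]
print json_encode(array('x' => 1)) . "\n";  // {"x":1}

// An empty PHP array always becomes a JSON array, so an empty hash
// cannot be told apart from an empty list.
print json_encode(array()) . "\n";          // []
```

Whichever mode is picked has to be picked once for the whole API, since the file alone does not say whether "map" was meant as an object or as an associative array.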
Plus, modules will have to ship with default configuration files that get imported by the system on install. Those must be in a consistent format or else we'll have to bake in support for all kinds of formats, at the cost of performance, complexity, and people editing the files needing to know all of them. (Imagine if Views shipped with JSON config files, Panels with XML config files, Rules with YAML config files, and pathauto with info config files. That would be an utter disaster.)
This needs to be a singular decision that we just make and live with. And at this point, that a decision gets made is more important, frankly, than the decision that is made. I am willing to defer to Greg's judgment as the CMI lead in this matter. If he feels XML is the best option available, then I'm fine with going that route (even though I too originally favored JSON).
Thanks for clearing that up.
Thanks for clearing that up. I had forgotten about the "ship with" problem and I agree, we don't want anything that is ungood to happen.
JSON/YAML/XML Sample
I didn't see anyone post the sample code in each syntax, so here you go...
JSON
{
  "config": {
    "site_info": [
      {
        "_comment":"Site information in English",
        "lang":"en",
        "site_name":"I love Drupal!"
      },
      {
        "_comment":"Site information in Swedish",
        "lang":"se",
        "site_name":"Jag älskar Drupal"
      }
    ]
  }
}
YAML
config:
  site_info:
    # Site information in English
    - lang: en
      site_name: "I love Drupal!"
    # Site information in Swedish
    - lang: se
      site_name: "Jag älskar Drupal"
XML
<config>
  <!--Site information in English-->
  <site_info lang="en">
    <site_name>I love Drupal!</site_name>
  </site_info>
  <!--Site information in Swedish-->
  <site_info lang="se">
    <site_name>Jag älskar Drupal!</site_name>
  </site_info>
</config>
When looking at these, I'm all in for either YAML or JSON, but lean closer towards YAML. XML is just an absolute mess for anyone who doesn't code. The Online YAML Parser was handy when writing this.
Seriously?
Anyone who doesn't code? The Apache configuration file is more complex than that. Most Drupal templates are more complex than that. That's on the order of an HTML hello-world example.
The Apache configuration file
Is that supposed to be something we should emulate? Just because Apache config is such a mess, do we also have to adopt XML for configuration files? I don't think so.
No
No, I mean "sysadmins who might be editing these files already have to deal with files 10x more complex than anything we're proposing".
Similarly, anyone who's building a site and would be editing this file is already used to HTML, or godforbid Drupal template files, both of which are an order of magnitude more complex than anything we're proposing.
My point being, "readability" of the format is a non-issue for XML, understanding that the tiny sliver of the population that is even going to care enough to touch the files in the first place is already used to it. They're either already quite comfortable with XHTML (which is XML already), or they're already quite comfortable with system config files that are far more complex and less forgiving of whitespace differences. The "XML is too hard to read" line is, quite simply, FUD. Unintentional FUD, maybe, but still FUD.
The people most affected by the file format decision will be module developers when writing their default config file, most of which will be fairly simple. Not to put too fine a point on it, but if a module developer can't handle writing this:
<config module="mymodule">
  <var name="foo">bar</var>
  <var name="baz">beer</var>
</config>
Then they'll never be able to handle writing a template file, or godforbid the actual PHP code of a module.
Really, let's assume some level of competency on the part of our developers.
Truth and consequences
JSON is in fact more readable. Why?
Because code = data and there's never any confusion between presentation and logic, like in XML, which is a format that confounds presentation with logic. Which tag is merely presentational and which tag is merely logic and which tag has a little of both?
For developers it's clear that what they're seeing is code and that code can be put to good use without any intermediate step, or at most with only one step: json_decode.
The data-code identity opens up opportunities for plugging external services with the configuration quite easily.
Now for your lesser of two evils rationale that since we're accustomed to the mess that is HTML, and which is responsible for the many security problems — how many functions on the drupal API are dedicated to input sanitation? — we should bite the bullet and proceed. HTML is a format with a lot of issues. The confusion between presentation and logic inherited from SGML, and that found its way to XML, is the source of much pain and horror for web developers. Therefore it's not FUD but rather the desire to get away from a source of pain as much as possible.
As for templates. Yes indeed they're a mess. That's why things like Panels, which abstract that mess away, are things to be devoutly wished and used. But that's OT. There we're stuck with HTML. That's what the browser parses. There's no way around it. But here we're on our own terrain. Why should we accept the XML mess when we're not constrained externally to do so?
Finally there's the Zeitgeist issue. The current trend is for an increasing use of JSON, relegating XML for the crufty (albeit highly lucrative) enterprise market.
Real world examples
Here's a more likely real-world example, it's from a couple of field definitions from a feature in Drupal 7:
JSON (just using json_encode):
{
"departure-departure-field_departure_additional_info": {
"field_config": {
"active": "1",
"cardinality": "1",
"deleted": "0",
"entity_types": [],
"field_name": "field_departure_additional_info",
"foreign keys": {
"format": {
"columns": {
"format": "format"
},
"table": "filter_format"
}
},
"global_block_settings": "1",
"indexes": {
"format": ["format"]
},
"module": "text",
"settings": {
"max_length": "255"
},
"translatable": "0",
"type": "text"
},
"field_instance": {
"bundle": "departure",
"default_value": null,
"deleted": "0",
"description": "Enter any additional information that applies to this departure.",
"display": {
"default": {
"label": "above",
"module": "text",
"settings": [],
"type": "text_default",
"weight": "24"
}
},
"entity_type": "departure",
"field_name": "field_departure_additional_info",
"label": "Additional information",
"required": 0,
"settings": {
"custom_add_another": "",
"text_processing": "1",
"user_register_form": false
},
"widget": {
"active": 1,
"module": "text",
"settings": {
"size": "60"
},
"type": "text_textfield",
"weight": "13"
}
}
},
"departure-departure-field_departure_availability": {
"field_config": {
"active": "1",
"cardinality": "1",
"deleted": "0",
"entity_types": [],
"field_name": "field_departure_availability",
"foreign keys": [],
"global_block_settings": "1",
"indexes": [],
"module": "number",
"settings": [],
"translatable": "0",
"type": "number_integer"
},
"field_instance": {
"bundle": "departure",
"default_value": null,
"deleted": "0",
"description": "",
"display": {
"default": {
"label": "above",
"module": "number",
"settings": {
"decimal_separator": ".",
"prefix_suffix": true,
"scale": 0,
"thousand_separator": " "
},
"type": "number_integer",
"weight": "20"
}
},
"entity_type": "departure",
"field_name": "field_departure_availability",
"label": "Spaces available",
"required": 0,
"settings": {
"custom_add_another": "",
"max": "",
"min": "",
"prefix": "",
"suffix": "",
"user_register_form": false
},
"widget": {
"active": 0,
"module": "number",
"settings": [],
"type": "number",
"weight": "12"
}
}
}
}
XML (using some code from Stack Exchange):
<?xml version="1.0"?>
<fields>
<departure-departure-field_departure_additional_info>
<field_config>
<active>1</active>
<cardinality>1</cardinality>
<deleted>0</deleted>
<entity_types />
<field_name>field_departure_additional_info</field_name>
<foreign-keys>
<format>
<columns>
<format>format</format>
</columns>
<table>filter_format</table>
</format>
</foreign-keys>
<global_block_settings>1</global_block_settings>
<indexes>
<format>format</format>
</indexes>
<module>text</module>
<settings>
<max_length>255</max_length>
</settings>
<translatable>0</translatable>
<type>text</type>
</field_config>
<field_instance>
<bundle>departure</bundle>
<default_value></default_value>
<deleted>0</deleted>
<description>Enter any additional information that applies to this departure.</description>
<display>
<default>
<label>above</label>
<module>text</module>
<settings />
<type>text_default</type>
<weight>24</weight>
</default>
</display>
<entity_type>departure</entity_type>
<field_name>field_departure_additional_info</field_name>
<label>Additional information</label>
<required>0</required>
<settings>
<custom_add_another></custom_add_another>
<text_processing>1</text_processing>
<user_register_form></user_register_form>
</settings>
<widget>
<active>1</active>
<module>text</module>
<settings>
<size>60</size>
</settings>
<type>text_textfield</type>
<weight>13</weight>
</widget>
</field_instance>
</departure-departure-field_departure_additional_info>
<departure-departure-field_departure_availability>
<field_config>
<active>1</active>
<cardinality>1</cardinality>
<deleted>0</deleted>
<entity_types />
<field_name>field_departure_availability</field_name>
<foreign-keys />
<global_block_settings>1</global_block_settings>
<indexes />
<module>number</module>
<settings />
<translatable>0</translatable>
<type>number_integer</type>
</field_config>
<field_instance>
<bundle>departure</bundle>
<default_value></default_value>
<deleted>0</deleted>
<description></description>
<display>
<default>
<label>above</label>
<module>number</module>
<settings>
<decimal_separator>.</decimal_separator>
<prefix_suffix>1</prefix_suffix>
<scale>0</scale>
<thousand_separator></thousand_separator>
</settings>
<type>number_integer</type>
<weight>20</weight>
</default>
</display>
<entity_type>departure</entity_type>
<field_name>field_departure_availability</field_name>
<label>Spaces available</label>
<required>0</required>
<settings>
<custom_add_another></custom_add_another>
<max></max>
<min></min>
<prefix></prefix>
<suffix></suffix>
<user_register_form></user_register_form>
</settings>
<widget>
<active>0</active>
<module>number</module>
<settings />
<type>number</type>
<weight>12</weight>
</widget>
</field_instance>
</departure-departure-field_departure_availability>
</fields>
Thanks for putting these
Thanks for putting these together. All other things being equal, +1 for JSON, it's so much more readable.
Performance on these more realistic values
I ran the read_xml.php and read_json.php scripts above with the more realistic configurations posted by Steven Jones, and the difference is even more pronounced. In the trivial example JSON is ~10x faster, while with these examples, JSON is ~240 times faster (24,000,000 microsec vs 99,000 microsec).
read_json.php xhprof output
read_xml.php xhprof output
Thank you for taking the time
Thank you for taking the time to test the scripts. Am I right in saying that you ran the test 500 times? 500 is a bit much for our typical configuration, where we would run this kind of script ~50-100 times for the average website. But it's good to reveal the differences.
99,000 microseconds = 99 milliseconds right ?
So 24,000,000 microseconds = 24,000 milliseconds = 24 seconds ???
Or can you enlighten me here ?
The test loops over 500
The test loops over 500 different files and reads them in.
I don't think it's at all impossible that a site might have 500 files. Bear in mind this could be a file per field instance, per field definition, per content type, per exported view, a file for the variables of each module installed etc. etc.
Just a note
I do not know which xmlToArray function you used, but XML is certainly much slower, given that SimpleXMLElement::__construct alone took more than ten times as long as json_decode.
From my perspective, JSON
From my perspective, JSON seems as complex as XML in this example, in terms of syntax. I'm not pro-XML, and I originally proposed INI because the syntax seemed less restrictive and a bit more natural at the time (I haven't changed my opinion, but I'll respect heyrocker's decision to rule out INI). Still, I would tend to think that XML is better here, for all the reasons I could hear from Crell and some others above, in various posts.
You could also have written your XML version as such:
<config>
  <!--Site information -->
  <site_info>
    <site_name>
      <en>I love Drupal!</en>
      <se>Jag älskar Drupal!</se>
    </site_name>
  </site_info>
</config>
Which seems a wee bit shorter.
Pierre.
Readability
Since it seems readability is a metric here, I'd like to stress a key point: syntax coloring. Both of the real-world examples above look almost unreadable to me as they are here on g.d.o. Syntax coloring would go a long way in improving readability. This is how the previous shorter examples look in Eclipse:
https://skitch.com/plach/fq6dt/config-formats-json
https://skitch.com/plach/fq6dh/config-formats-xml
In both the editors I normally use (Eclipse and Notepad++) you have no native support for JSON data and you have to fall back to plain JavaScript, which does not distinguish between object keys and object values, thus making it harder to find the latter at first glance. Moreover, in JSON, comments have the same syntax (and thus coloring) as config data, making it even harder to distinguish them.
So if readability is an issue, it seems that syntax coloring for JSON would work at the same level as XML only in advanced text editors that let the user specify colors through a regexp or something similar. That may not always be available (or properly configured) when one needs to make a one-time config adjustment on a production server.
You have a point here. I've
You have a point here. I've just tested in Netbeans and Coda, and in Netbeans, I have the same result as you.
Another thing to consider is that whatever format we choose, in serious IDEs like Netbeans, Eclipse and the like, you have a code navigator where only the keys are displayed in the proper structure, so even if the file is harder to read in XML, the code navigator makes it really easy.
So I was a proponent of JSON partly for the readability, but realize now that it doesn't really make a difference.
The good thing would be to take those good samples, add some non-English characters and comments, and try them with config tools like Chef, Puppet or whatever (I have no knowledge here).
Another thing would be to benchmark our solutions. I'm thinking about creating 50-100 config files (which corresponds to the average number of modules activated in a real Drupal site) and write a small PHP loop to read them. That would simulate the initial writing from level 1 to level 2 store, or when the config cache is flushed.
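A rough sketch of what such a benchmark loop could look like. The file names, the temporary directory and the tiny sample configs are all hypothetical; a real test would use realistic config files like the ones posted above, and the timings will vary from machine to machine.

```php
<?php
// Hypothetical setup: write 50 small JSON and XML config files to a temp dir.
$dir = sys_get_temp_dir() . '/cmi_bench_' . uniqid();
mkdir($dir);
$count = 50;
for ($i = 0; $i < $count; $i++) {
  file_put_contents("$dir/config_$i.json", '{"site_name": "I love Drupal!", "weight": ' . $i . '}');
  file_put_contents("$dir/config_$i.xml", "<config><site_name>I love Drupal!</site_name><weight>$i</weight></config>");
}

// Read every JSON file and time the whole loop.
$start = microtime(TRUE);
$json_configs = array();
for ($i = 0; $i < $count; $i++) {
  $json_configs[] = json_decode(file_get_contents("$dir/config_$i.json"), TRUE);
}
$json_time = microtime(TRUE) - $start;

// Same loop for XML.
$start = microtime(TRUE);
$xml_configs = array();
for ($i = 0; $i < $count; $i++) {
  $xml_configs[] = new SimpleXMLElement(file_get_contents("$dir/config_$i.xml"));
}
$xml_time = microtime(TRUE) - $start;

printf("JSON: %.6f s, XML: %.6f s for %d files each\n", $json_time, $xml_time, $count);

// Clean up the temporary files.
array_map('unlink', glob("$dir/*"));
rmdir($dir);
```

Note that the XML loop only parses; converting the result into arrays, as the read_xml.php test does, would add further cost.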
JSON works perfectly here
with emacs js2-mode. As can be seen. The readability is of course significantly improved in JSON.
Now I believe we're discussing IDE limitations instead of bottom line convenience.
Imagine that I have to ssh in to a server where there's only vi. Which is more convenient and easy: XML or JSON?
My point is that if
My point is that if readability is an issue we need metrics to evaluate and benchmark it.
The fact that comments in JSON would only be a convention involves an objective consideration: syntax coloring for JSON can compete with XML's only under particular conditions. Or, stated otherwise: in at least two of the major IDEs around, JSON syntax coloring performs worse than XML's.
This is totally subjective, as I already stated, the real world examples above with no-syntax coloring were both totally unreadable to me. FWIW I find the readability of the two following excerpts very similar (lines swapped and integrated to ease comparison):
"field_config": {
  "active": "1",
  "cardinality": "1",
  "deleted": "0",
  "field_name": "field_departure_additional_info",
  "entity_types": [],
  "foreign keys": {
    "format": {
      "table": "filter_format",
      "columns": {
        "fid": "3",
        "format": "format"
      }
    }
  }
}
<field_config
    active="1"
    cardinality="1"
    deleted="0"
    field_name="field_departure_additional_info">
  <entity_types />
  <foreign-keys>
    <format
        table="filter_format">
      <columns
          fid="3"
          format="format" />
    </format>
  </foreign-keys>
</field_config>
Not quite for me
I find the XML example a complete mess. Difficult to parse and verbose. Yes it's true that you can be conditioned, or rather trained, to parse XML "naturally". But that's hardly a reason to prefer it IMO.
White spaces in keys
A point against XML: how do we encode keys having white spaces in them? Example:
"entity info": {
  "entity keys": {
    "id": "nid"
  }
}
Edit: how do we encode keys having white spaces in a clean way, I mean.
I'd expect an XML format for
I'd expect an XML format for settings to, rather than use setting name as the tag, use name as an attribute, like:
<key name="entity info">
  <key name="entity keys">
    <key name="id">nid</key>
  </key>
</key>
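Reading such a structure back into a nested PHP array could look roughly like this. It is only a sketch: the key_elements_to_array() helper and the wrapping config root element are hypothetical, not an agreed format.

```php
<?php
// Recursively convert nested <key name="..."> elements into a PHP array.
function key_elements_to_array(SimpleXMLElement $node) {
  $result = array();
  foreach ($node->key as $child) {
    $name = (string) $child['name'];
    // A <key> with nested <key> children is a group; otherwise it is a value.
    $result[$name] = count($child->key) > 0
      ? key_elements_to_array($child)
      : (string) $child;
  }
  return $result;
}

// Hypothetical <config> wrapper around the example above.
$xmlstr = '<config>
  <key name="entity info">
    <key name="entity keys">
      <key name="id">nid</key>
    </key>
  </key>
</config>';

$config = key_elements_to_array(new SimpleXMLElement($xmlstr));
print $config['entity info']['entity keys']['id'] . "\n";
```

Because the names live in attributes, spaces in keys are not a problem, at the price of every element being called key.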
If we go with XML I'd rather
If we go with XML I'd rather have no keys with spaces than have that. Looping over an array of items all called key would be a terrible DX.
Full Fat Things ( http://fullfatthings.com ), my Drupal consultancy that makes sites fast.
Stick to whats relevant please
Can we please stick to what's relevant? For deeply advanced config files, readability really is a non-issue. I'll quote Crell from above.
So can we please look at the technical aspects of it, since this is a deeply technical matter.
JSON
XML
And IMHO performance is not the main issue here, when reading or writing config files.
To ignore anything but
To ignore anything but technical arguments is to ignore developer experience, so no, we cannot do that.
And with all due respect to Crell, I could not disagree more with that quote. It is in no way FUD that I find XML less readable. I work with systems that use XML as a config format (SOLR) and I also work with JSON configs (Chef), and the latter is far easier to mentally parse and hand edit. I am speaking from experience.
And saying that XML is more widely used is not true. Sure, it's more widely used among java and .net, but if you look at python, ruby, node, or any projects started in the last 5 years, you'll see a lot more json and yaml. But still, that's neither a DX nor a technical argument so it's really not relevant.
Ok, technical aspects (but not only)
Sorry but neglecting both performance and DX is not a good approach. msonnabaum is right, and I already said it in a previous comment: the rest of the world is moving to JSON (and YAML) config files, so it really should be taken into account. Other people out there are smart, and we're just beginners in the world of config files, so let's admit that they have already thought about these problems and made a good and educated decision. You know, even Java supports JSON files very well now (thanks to Groovy for instance). And performance is not an option, it's a requirement unless we have to sacrifice it. Remember that Drupal is a bit slow, so injecting speed should be very welcome.
Now the technical aspects, as asked. I was reading this page on SitePoint about JSON as config files. As already said here, it could be used (as could XML) to have language-dependent config. For instance:
"mymodule_config" : {
  "lang" : "en",
  "value" : "Sentence of your choice here"
}
The same goes for XML, so both languages are equal here.
Now you're saying XML support more data structures. The only structure we don't have is the associative array. I tested the long json sample from above, with this small code :
<?php
$json = file_get_contents("config.json");
$config = json_decode($json);
var_dump($config);
?>
Now a smaller sample with the output.
{
  "mymodule_config" : {
    "lang" : "en",
    "number" : 10,
    "value" : ["format", 123, "nid"],
    "settings" : {
      "settings_1" : 1,
      "settings_2" : 2,
      "settings_3" : 3
    }
  }
}
Here we have an object with all basic PHP structured attributes : string, integer, zero-indexed array, plus a nested object. Here's the var_dump to prove it.
object(stdClass)#1 (1) {
["mymodule_config"]=>
object(stdClass)#2 (4) {
["lang"]=>
string(2) "en"
["number"]=>
int(10)
["value"]=>
array(3) {
[0]=>
string(6) "format"
[1]=>
int(123)
[2]=>
string(3) "nid"
}
["settings"]=>
object(stdClass)#3 (3) {
["settings_1"]=>
int(1)
["settings_2"]=>
int(2)
["settings_3"]=>
int(3)
}
}
}
Missing the associative array? Well, our associative arrays in Drupal could now be converted into stdClass objects. PHP4 couldn't handle objects properly, but now with PHP5 we can iterate over an object's properties, compare objects, and do all the fancy stuff that OOP and PHP5 allow us. And to be future-proof, we could even imagine that in a future version of Drupal (maybe even D8), we will create a real Configuration class, with inheritance, composition or whatever OOP permits. Using objects now instead of old-fashioned associative arrays could open interesting doors. I admit that XML could allow us to use this structure, but the point is that JSON could too, and it's baked into the language.
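For what it's worth, here is a small sketch of iterating over a decoded stdClass object as described, reusing the settings part of the sample above. The cast at the end is just one way to hand the data to code that still expects an associative array.

```php
<?php
$json = '{"settings": {"settings_1": 1, "settings_2": 2, "settings_3": 3}}';
$config = json_decode($json);

// Public properties of a stdClass object can be iterated like array entries.
$names = array();
foreach ($config->settings as $name => $value) {
  $names[] = $name;
  print "$name = $value\n";
}

// Casting gives an associative array for code that still expects one.
$as_array = (array) $config->settings;
```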
Now to comments. Those thinking that using a "_comment"-like key is a bad idea are wrong. It's already in use out there, like in the specifications of schema.org. It's already a convention out there. For the lazy, from schema.org:
{
"datatypes": {
"Boolean": {
"ancestors": [
"DataType"
],
"comment": "Boolean: True or False.",
"comment_plain": "Boolean: True or False.",
"id": "Boolean",
"instances": [
"False",
"True"
],
"label": "Boolean",
"properties": [],
"specific_properties": [],
"subtypes": [],
"supertypes": [
"DataType"
],
"url": "http://schema.org/Boolean"
},
"DataType": {
"ancestors": [],
"comment": "The basic data types such as Integers, Strings, etc.",
"comment_plain": "The basic data types such as Integers, Strings, etc.",
"id": "DataType",
"label": "Data Type",
"properties": [],
"specific_properties": [],
"subtypes": [
"Boolean",
"Date",
"Number",
"Text"
],
"supertypes": [],
"url": "http://schema.org/DataType"
},
...
They're using "comment" and "comment_plain". "comment" can include HTML, not "comment_plain", thanks for wondering. And they're not the only one to use this convention.
Anyway, I noticed another interesting thing when writing the previous sentence: they're using inheritance! Notice the "ancestors" key: an array referencing the "DataType" class declared in the bottom part of the sample. So it confirms what I was writing above: using objects instead of associative arrays really looks like the proper future-proof road.
We could also, as written in the SitePoint page linked above, write our own json_decode() (I tested the following code, it works):
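<?php
function drupal_json_decode($json) {
  // Remove //-style comments. Caution: this pattern also matches "//" inside
  // string values (for example in URLs), so such values would be truncated.
  $regex = array("p" => "/[\w]*(\/\/).*$/m", "r" => "");
  $json = preg_replace($regex['p'], $regex['r'], $json);
  return json_decode($json);
}
?>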
It's a drupalism but could help baking in comments right in JSON files. But it's a bad idea because it would break interoperability between systems (these would become Drupal-JSON files), and we want to avoid it.
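If one did want to experiment with this anyway, a slightly more conservative variant would strip only lines that consist entirely of a // comment, which leaves "//" inside values such as URLs untouched. The function name and sample data below are hypothetical, and the interoperability concern still applies.

```php
<?php
// Hypothetical helper: strip only lines that consist entirely of a // comment,
// so that "//" inside string values (such as URLs) is left alone.
function json_decode_with_line_comments($json) {
  $json = preg_replace('/^[ \t]*\/\/.*$/m', '', $json);
  return json_decode($json, TRUE);
}

$json = <<<'JSON'
{
  // Site information in English
  "lang": "en",
  "url": "http://drupal.org/"
}
JSON;

$config = json_decode_with_line_comments($json);
print $config['url'] . "\n";
```

Trailing comments after a value on the same line are deliberately not handled here; doing that safely would need a real tokenizer rather than a regex.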
And another reason to use JSON for config files instead of XML (well, instead of any other format in fact): JavaScript can read it! It seems obvious, but what it really means is that we can use a single format for configuring our PHP code (core, contrib and custom modules) AND our client-side JavaScript libraries! jQuery plugins, I'm looking at you. Well, not only jQuery plugins (provided by core or contrib), but also any JavaScript code that needs a "settings" object (a very common design in the JavaScript world). To go even further, with the use of nodejs, one could write Drupal modules that provide nodejs plugins that would make use of a JSON config file... See my point? One config file format to rule them all.
So to compare JSON and XML :
JSON
XML
For performance, DX and flexibility's sake, my definitive vote goes for JSON. I hope I've been objective enough, I don't have any interest in one format over the other, but now I truly think JSON would fit the role of the configuration file format better for Drupal.
(what's with this habit to write story-long comments ??)
Configuration class
Really? Seriously? You're arguing about the config format and you don't know if we're going to be using a class or not?
Please read this original posting from June:
http://groups.drupal.org/node/155559
Especially see the section entitled "I'm a super-fancy-pants developer and need something more powerful than this. How do I override it?"
A "real configuration class" is the point. That is the API. That is what we are actually using 99% of the time. Aside from defining defaults, a module developer will not be touching these files directly. The core of the configuration system is a Configuration class. The file format we've been discussing now for over 2 months is simply a state-serialization format that needs to suck less than PHP's serialize(). That's it.
Now, the original writeup was written on the assumption of JSON as a serialization format. I'm not anti-JSON. I'm anti-anti-XML-FUD. Really, at this point I am happy with either one and am fine with heyrocker just picking one and telling us what it is. He's the CMI lead, he gets to do that. The difference between the two is not make-or-break for the system anywhere near the level that the API is, something that nearly everyone has been ignoring. We need the config objects (yep, with classes, as was the plan in June and AFAIK still is). We need them to make WSCCI work, because the variable system is simply not up to the task.
As to the expressiveness of JSON, consider:
{
  "foo": "bar",
  "baz": {
    "a": "A",
    "b": "B",
    "c": "C"
  }
}
Should baz map to an associative array or a stdClass object in PHP? There is no way to tell, because PHP has no array data type, only a hash type, and Javascript/JSON has no hash type, only an object type. That's why the original proposal from June said that we'd just assume baz would be an associative array, never a stdClass object, and call it a day.
I'm fine with that. I don't see it as a huge strike against JSON, really. But that doesn't mean it doesn't exist, whereas in XML since we'd be constructing our own data type formats anyway we could support all of PHP's data types.
Greg, just make a decision and we'll live with it, please. :-) Let's get to coding.
+1
+1
Well put. We just have to make whatever format work. Let's get coding! :)
A big problem with this whole discussion is it's strayed so far from the context of the overall effort. I'm a bit surprised Djebbz didn't know we'd use a config class, but I have talked to plenty of people in IRC who've been extremely confused about the current status of defaults vs. level 1 vs. level 2 storage. The plan has changed a couple of times since it was originally written up, and I'm sure a lot of people aren't clear on the boundaries of what the files will actually be used for.
I'm in a similar position though. I'm anti-anti-JSON-FUD. I have concerns when people bring up various XML transformation tools, since I very much want even the low levels of the API to be as format-agnostic as possible, but I really don't care that much at this point, as long as it's enforced very strongly that we use it as "a serialization format that sucks less than serialize()".
Thank you Crell for taking the time to answer me correctly. And sorry if I've been noisy, I realize I miss some parts of the puzzle. I'll come back after reading them, and will try to be less noisy and more helpful.
I see the comparison as a wash. They both have strengths and they both have problems.
I doubt either is going anywhere.
XML is more flexible (and a de-facto intermediate format), JSON is faster and easier (I don't like the "let the developers deal with it" argument). Both should have drupal_encode and drupal_decode functions for standardization.
The choice needs to be made. Let's make it so we developers have something to work with.
XML!
So as I stated in my core conversation earlier this week, after a lot of consideration from this thread and talking to people in London I decided we should go forward with XML as the format. I also think we should make this implementation as decoupled as possible from the file format so we can swap it out if we want later, and also so that if we decide to make it pluggable down the road, we can. Closing this thread now, thanks everyone immensely for their input.
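As a rough illustration of what "decoupled from the file format" could look like, here is a minimal sketch. Every name in it (ConfigFormat, JsonConfigFormat, Config, export()) is hypothetical and is not the actual CMI API. JSON stands in for the format only because json_encode() is built in; an XML implementation would implement the same interface.

```php
<?php
// Hypothetical interface: the only place that knows about the file format.
interface ConfigFormat {
  public function encode(array $data);
  public function decode($raw);
}

// One possible format. Swapping formats means swapping this class only.
class JsonConfigFormat implements ConfigFormat {
  public function encode(array $data) {
    return json_encode($data);
  }
  public function decode($raw) {
    // TRUE forces associative arrays, never stdClass objects.
    return json_decode($raw, TRUE);
  }
}

// Hypothetical config object: callers use get()/set() and never see the format.
class Config {
  protected $format;
  protected $data = array();

  public function __construct(ConfigFormat $format, $raw = NULL) {
    $this->format = $format;
    if ($raw !== NULL) {
      $this->data = $format->decode($raw);
    }
  }

  public function get($key, $default = NULL) {
    return array_key_exists($key, $this->data) ? $this->data[$key] : $default;
  }

  public function set($key, $value) {
    $this->data[$key] = $value;
    return $this;
  }

  // Serialize the current state using whichever format was injected.
  public function export() {
    return $this->format->encode($this->data);
  }
}

$config = new Config(new JsonConfigFormat(), '{"site_name": "Example"}');
$config->set('cache', TRUE);
$raw = $config->export();
```

Code that calls get() and set() would not need to change if the injected format were replaced later.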