Hi all,
I've long been frustrated by the overhead of Drupal. So, let me float this idea.
Background: On each page load, we include all the enabled module files plus a number of system modules and include files. Megabytes of code are read from disk and never used, including functions that are never called and comments that the PHP compiler simply discards. All this extraneous data takes a toll in disk I/O and CPU time. The Field module takes a step in this direction by splitting its code into several include files, but they are all included on every page load. Menu items can already declare an include file for their callbacks; this existing mechanism helps. Let's go another step.
New approach: I'm currently developing a node module. I've stripped the .module file down to only the hook_ functions, so that they act as lightweight stubs (pointers, in effect). The actual functions are only loaded if needed: each hook function does an include_once of the relevant file and then calls the real implementation.
For example, this code is in the mymodule.module file:
/**
 * Implements hook_menu().
 */
function mymodule_menu() {
  // __DIR__ resolves the include relative to this module's directory rather
  // than the current working directory (the Drupal root).
  include_once __DIR__ . '/mymodule.menu.inc';
  return mymodule_menu_build();
}

/**
 * Implements hook_theme().
 */
function mymodule_theme() {
  include_once __DIR__ . '/mymodule.themes.inc';
  return mymodule_theme_build();
}

/**
 * Implements hook_form().
 */
function mymodule_form($node, &$form_state) {
  include_once __DIR__ . '/mymodule.forms.inc';
  return mymodule_form_build($node, $form_state);
}

/**
 * Implements hook_node_info().
 */
function mymodule_node_info() {
  include_once __DIR__ . '/mymodule.crud.inc';
  return mymodule_crud_node_info();
}

So when the module is included during page initialization, only a very small file with minimal comments is loaded from disk. Instead of thousands of lines of code, only 100-500 are needed. Each include file is not read until, AND ONLY IF, it is needed.
When the system invokes a hook, the corresponding stub catches the call, includes the related functions, and passes the call on to the full implementation. I've broken out the menu, form, theme, block and CRUD functions. It probably makes sense to put all the cache-related items (like the menu and theme definitions) into a single include file.
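A minimal, self-contained sketch of the stub pattern above, assuming a temporary file stands in for mymodule.forms.inc (DEMO_FORMS_INC and the demo_* functions are hypothetical names). It checks that the "heavy" function does not exist until the stub is first called.

```php
<?php
// Hypothetical stand-in for mymodule.forms.inc, written to a temp file so the
// sketch runs on its own.
define('DEMO_FORMS_INC', sys_get_temp_dir() . '/demo_forms_' . getmypid() . '.inc');
file_put_contents(DEMO_FORMS_INC, <<<'CODE'
<?php
// "Heavy" implementation: only compiled when the stub first needs it.
function demo_form_build($title) {
  return array(
    'title' => array('#type' => 'textfield', '#default_value' => $title),
  );
}
CODE
);

// Thin stub: the only part that would live in the always-loaded .module file.
function demo_form($title) {
  include_once DEMO_FORMS_INC;  // later calls skip re-including the file
  return demo_form_build($title);
}

$definedBefore = function_exists('demo_form_build');  // false: file not read yet
$form = demo_form('Hello');
$definedAfter = function_exists('demo_form_build');   // true: loaded on demand
unlink(DEMO_FORMS_INC);
var_dump($definedBefore, $definedAfter);
```

The trade-off Bob anticipates is visible here too: the first call to each stub pays for one extra include_once (path resolution, file open and compile).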
I realize that the total number of included files will likely increase. While I consider that a small drawback, I consider the reduction in total data read and processed to be a huge advantage.
I'd like to hear thoughts on the subject.
Bob

Comments
If you do some benchmarks
If you do some benchmarks with a properly configured opcache (APC is the most common choice) I think you'll find little to no benefit to this technique.
Splitting code off into separate files that are only included some of the time is primarily a technique to improve performance in environments with no opcode cache (read: shared hosting). If performance is important to you, you'll want to get off shared hosting for this and several other reasons.
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
How much can opcode caching help with heavy sites?
Hi Dave,
Doing benchmarks is difficult. I would have to rearrange a billion lines of core and contributed module code to make a meaningful test. If someone was paying me to spend 6 months coding and testing, then it would be an interesting (and informative) project for sure.
Yes, I'm one of those shared hosting users, and I haven't gone down the opcode cache road yet. While I'm not versed in the details, I understand that the compiled code stays resident in memory. That may be acceptable for big, high-traffic sites or for folks willing to pay for dedicated servers, but it doesn't help the small business owner whose customers navigate away from his slow site.
While working on my current project, I had to evaluate a bunch of modules, such as various field extensions, alternative media types, e-commerce and alternative location types. I enabled some, disabled others, uninstalled others, etc. It got so slow that page latency grew to a typical 3+ seconds for a logged-in admin and 0.6 seconds for an anonymous user, and that's with just one user. I found that 95% of the time was consumed in the bootstrap phase, i.e. drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL). What was it doing? Well, something like 400 files were being included, peak memory reached 60 MB, and 400+ conf variables were loaded from the variable table. Once the bootstrap completed, menu_execute_active_handler() executed almost instantly. As I uninstalled everything, the latency dropped off, but many modules do not delete their database tables and config variables.
For reference, on a fresh D7 install with a minimal set of modules, I get page latency on the order of 0.4 seconds for an admin, fewer than 100 includes, and peak memory of 15 MB.
Perhaps opcode caching would help considerably. Has anyone had similar issues where opcode caching has "solved" this problem? If so, what is the typical execution time of the drupal_bootstrap() call on a site with a large number of complex modules?
Also, is opcode caching an option with any of the cloud computing services, or is it always a dedicated-server requirement?
Thanks,
Bob
Bob Marcotte
Engineering Consultant
Marcotte Enterprises, Inc.
http://www.marcotteenterprises.com
Profile Before Optimizing
This is the golden rule of code optimization: Profile Before Optimizing. If you disregard this rule you are wasting your time (i.e. your performance will degrade).
If you are interested in learning how much time is spent in which function, check out XHProf.
Maybe I wasn't clear
The 'difficulty' to which I referred is in creating sufficient test code to make a valid benchmark test. That was my point. If I were to only compare a single module in the old way vs. my suggested way, the differential would be in the noise band and would not offer much validity.
XHProf could be handy for finding areas of concern, but its added overhead will skew the data. When I measure time, I place microtime(true) calls immediately before and after the measured code to minimize any extraneous time.
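A small sketch of that measurement style. time_include() is a hypothetical helper, and the throwaway file only exists so the example runs on its own. Note that microtime(true) returns a float in seconds, whereas microtime() with no argument returns a "msec sec" string.

```php
<?php
/**
 * Hypothetical helper: time one include_once call in milliseconds.
 * Only the first include of a given file does real work (open, read,
 * compile); later include_once calls for the same file return early.
 */
function time_include($path) {
  $start = microtime(true);  // float seconds since the Unix epoch
  include_once $path;
  return (microtime(true) - $start) * 1000.0;
}

// Self-contained usage: generate a throwaway include file.
$path = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($path, "<?php\n/** Demo docblock. */\nfunction timed_demo_fn() { return 1; }\n");
$ms = time_include($path);
printf("include_once took %.3f ms\n", $ms);
unlink($path);
```

Because the include happens inside a function, any top-level variables the file defines land in that function's local scope; function and class definitions are still global.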
Bob Marcotte
Hi Dave, Doing benchmarks is
There are always synthetic benchmarks: you write some code that generates the test code, then benchmark the generated code.
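Dave's suggestion can be sketched as follows, assuming the goal is simply to produce many documented functions spread across many files; generate_synthetic_modules() and the file layout are hypothetical. Varying the file count, functions per file and comment volume lets you compare "many small includes" against "few large includes" without touching real module code.

```php
<?php
/**
 * Hypothetical synthetic-benchmark generator: writes $fileCount PHP files,
 * each defining $functionsPerFile documented functions, and returns the paths.
 * Function names are prefixed so generated files never collide.
 */
function generate_synthetic_modules($dir, $fileCount, $functionsPerFile, $prefix) {
  $paths = array();
  for ($f = 0; $f < $fileCount; $f++) {
    $code = "<?php\n";
    for ($i = 0; $i < $functionsPerFile; $i++) {
      $name = sprintf('%s_m%d_f%d', $prefix, $f, $i);
      // Docblock padding mimics heavily commented Drupal code.
      $code .= "/**\n * Synthetic function {$name}.\n *\n * @param int \$x\n *   Input value.\n */\n";
      $code .= "function {$name}(\$x) {\n  return \$x + {$i};\n}\n\n";
    }
    $path = $dir . '/' . $prefix . '_' . $f . '.inc';
    file_put_contents($path, $code);
    $paths[] = $path;
  }
  return $paths;
}

$dir = sys_get_temp_dir() . '/synth_' . getmypid();
if (!is_dir($dir)) {
  mkdir($dir);
}
$paths = generate_synthetic_modules($dir, 20, 50, 'synth');

// Time including every generated file once.
$start = microtime(true);
foreach ($paths as $path) {
  include_once $path;
}
printf("Included %d files in %.2f ms\n", count($paths), (microtime(true) - $start) * 1000);

// Clean up; the functions stay defined for the rest of the request.
array_map('unlink', $paths);
rmdir($dir);
```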
--
Dave Hansen-Lange
opcode caching is essential
I consider opcode caching to be essential for PHP. It is not included in PHP 5.5 for nothing. Find a shared hosting provider that has APC by default or uses PHP 5.5, or get a VPS so you can install it yourself. You can configure APC so it will never check the disk and only access cached copies of scripts in memory.
Yes and maybe
Opcode caching is definitely something that should be considered; in fact, its use is fully transparent. But one has to understand how, e.g., APC works.
Basically, the Zend engine's internal compile function (zend_compile_file) is overridden. This function takes the file handle of the source file as input and outputs the resulting bytecode. The opcode cache returns the "compiled" version of the source if it is in memory, but APC also needs to check whether the file has changed; if it has, the compiler runs again and its result replaces the in-memory cache entry. Be aware that the check for modified files can be disabled, so APC might just cache forever (until the web server restarts or the user requests a rebuild). But for include/require this is only partially true: in the C source of these PHP constructs you will see that a zend_file_open is done before the overridden compile function is called.
So APC will in fact NOT prevent the file I/O. Considering the size of the files and the way modern hard disks work, I would bet that loading the file attributes and loading the full file take roughly the same time. I'm also not sure whether APC uses a file attribute (the modification time) or a content hash as the "changed" criterion; in the latter case the source has to be brought into memory to calculate the hash.
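To put rough numbers on the attribute-versus-content question, one can time a metadata lookup against a full read of the same file. This is a hedged sketch: compare_stat_vs_read() is a hypothetical helper, clearstatcache() is needed because PHP caches stat results within a request, and repeated reads will usually be served from the operating system's page cache rather than the physical disk, so this measures the best case.

```php
<?php
/**
 * Hypothetical helper: average per-call cost, in ms, of a stat-style check
 * (filemtime) versus reading the whole file (file_get_contents).
 */
function compare_stat_vs_read($path, $iterations = 1000) {
  $t0 = microtime(true);
  for ($i = 0; $i < $iterations; $i++) {
    clearstatcache();  // force a real stat instead of PHP's cached result
    filemtime($path);
  }
  $t1 = microtime(true);
  for ($i = 0; $i < $iterations; $i++) {
    file_get_contents($path);
  }
  $t2 = microtime(true);
  return array(
    'stat_ms' => ($t1 - $t0) * 1000 / $iterations,
    'read_ms' => ($t2 - $t1) * 1000 / $iterations,
  );
}

// Self-contained usage with a ~32 KB comment-heavy file.
$path = tempnam(sys_get_temp_dir(), 'io');
file_put_contents($path, str_repeat("// comment line\n", 2000));
$r = compare_stat_vs_read($path);
printf("stat: %.4f ms, read: %.4f ms per call\n", $r['stat_ms'], $r['read_ms']);
unlink($path);
```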
Regarding the Drupal bootstrap and when hooks are called, you might find that a large number of hooks are called only very rarely, e.g. hook_menu() after a module is enabled (or when the Drupal cache needs a rebuild). Quite often these hooks do little more than return a static array of data. In effect, this is just configuration!
So I assume it might be worth finding a way to minimize the number of files that need to be checked on each request. In this regard, your strategy might bring an extra boost. Either way, APC accounts for the other 95% of the gain.
But maybe there is a way to extract ALL hook implementations from all modules and combine them into a single file. The system would then include that file itself (just before module_invoke_all() dispatches). Of course, this would need something like a "compressor" running over the source code. Contact me if you are interested in creating such a tool; I definitely would be.
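A rough sketch of such a "compressor", assuming plain procedural files without namespace declarations or inline HTML. It uses PHP's built-in php_strip_whitespace() to drop comments and whitespace, then concatenates the results behind a single opening tag. bundle_php_files() is a hypothetical name, and a real tool would also need to extract just the hook implementations, which this does not attempt.

```php
<?php
/**
 * Hypothetical bundler: strip comments/whitespace from each file with
 * php_strip_whitespace() and concatenate them into one PHP source string.
 * Assumes each file is pure PHP with no namespace declaration.
 */
function bundle_php_files(array $files) {
  $parts = array();
  foreach ($files as $file) {
    $code = php_strip_whitespace($file);
    $code = preg_replace('/^<\?php\s*/', '', $code);  // drop the opening tag
    $code = preg_replace('/\?>\s*$/', '', $code);      // drop a trailing closing tag
    $parts[] = $code;
  }
  return "<?php\n" . implode("\n", $parts) . "\n";
}

// Self-contained usage with two small, commented "module" files.
$dir = sys_get_temp_dir();
$a = $dir . '/bundle_a_' . getmypid() . '.inc';
$b = $dir . '/bundle_b_' . getmypid() . '.inc';
file_put_contents($a, "<?php\n/**\n * Implements hook_menu().\n */\nfunction bundle_demo_menu() {\n  // Static config.\n  return array('demo' => array('title' => 'Demo'));\n}\n");
file_put_contents($b, "<?php\n/**\n * Implements hook_theme().\n */\nfunction bundle_demo_theme() {\n  return array();\n}\n");

$origBytes = filesize($a) + filesize($b);
$bundle = bundle_php_files(array($a, $b));
$bundlePath = $dir . '/bundle_' . getmypid() . '.php';
file_put_contents($bundlePath, $bundle);
printf("%d bytes -> %d bytes\n", $origBytes, strlen($bundle));

include_once $bundlePath;  // one include instead of two
unlink($a);
unlink($b);
unlink($bundlePath);
```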
best
Carsten
If you have access to your
If you have access to your apc.ini, you should be able to stop APC from statting the file to check that it hasn't changed (apc.stat). It's a trade-off that should really only be made on sites without much code movement, since registering a code change then requires waiting until apc.ttl has passed, running an APC flush, or restarting Apache.
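With file checks turned off, something has to clear the cache after each deploy. A hedged sketch: flush_opcode_cache() is a hypothetical helper that calls opcache_reset() when Zend OPcache is available and falls back to APC's apc_clear_cache(). It needs to run inside the web server's PHP process (for example from a protected admin URL), because a CLI invocation only clears the CLI process's own cache.

```php
<?php
/**
 * Hypothetical post-deploy flush for when apc.stat=0 or
 * opcache.validate_timestamps=0. Returns which cache it tried to clear.
 */
function flush_opcode_cache() {
  if (function_exists('opcache_reset')) {
    // opcache_reset() returns false when OPcache is disabled for this SAPI.
    return opcache_reset() ? 'opcache' : 'opcache (failed)';
  }
  if (function_exists('apc_clear_cache')) {
    apc_clear_cache();  // no argument: clears APC's opcode (system) cache
    return 'apc';
  }
  return 'none';
}

echo 'Flushed: ', flush_opcode_cache(), "\n";
```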
but as I said this regards
But as I said, this applies only to the checks done in APC itself. The internal implementation of include, require_once, etc. will still result in file I/O!
Of course, this is not a large amount of time per file. But a thousand 1 ms checks add up to a full second.
Masking is not a solution
With all due respect, disabling everything is not solving anything.
Out of the box, things should just work. The more Band-Aids a user has to implement, the less viable we become.
Bob Marcotte
Perhaps the number of files, not their content, is the primary issue
Carsten,
Thanks for the detailed explanation.
Yes, I would agree that if you have to go to disk to check a file header, there isn't going to be much difference in grabbing the file if it is small.
I have considered another kind of "compressor", as you called it. We use minified JS without question, yet we run on a million lines of uncompressed and commented source code. Perhaps this is the biggest issue here.
I just had a thought (duh; as you said, ms add up to seconds). I noted earlier that my site was loading 400 files during bootstrap. For argument's sake, say each disk access takes 3 ms: that is 1.2 sec of read time. Even at 1 ms per access, we are still talking about 0.4 sec. I'll run some tests to measure the time from an include_once call to its return; that is certainly easy enough.
Bob Marcotte
yet we run on a million lines
Early Drupal did not contain code comments for fear that it was bad for performance. That fear was disproved, and now Drupal is one of the best commented applications in existence.
--
Dave Hansen-Lange
The comments are the documentation
and the comments are the best documentation next to reading the code itself :)
Disproved, I suppose, if you run an opcode cache.
Bob Marcotte
Try Zend OPcache
It not only lets you skip all comments, making the compiled in-memory code much smaller, but also gives you better control over file-modification checks (they are configurable), limiting disk I/O. Just stop using the old, deprecated APC.
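To see which of these knobs a given host actually exposes, a small report script can help. This is a sketch: opcache_settings_report() is a hypothetical helper, and ini_get() returns false for directives that are unknown (typically because the providing extension is not loaded). On a production site you would typically set validate_timestamps to 0 and flush on deploy; be careful with save_comments=0, since some code (for example annotation-based plugin discovery as used in Drupal 8) reads doc comments at runtime.

```php
<?php
/**
 * Hypothetical helper: report the opcode-cache directives discussed above.
 * Unknown directives are reported as "not available".
 */
function opcache_settings_report() {
  $directives = array(
    'opcache.enable',               // OPcache on/off for web requests
    'opcache.validate_timestamps',  // 0 = never check files for changes
    'opcache.revalidate_freq',      // seconds between checks when enabled
    'opcache.save_comments',        // 0 = drop doc comments from cached code
    'apc.stat',                     // APC's equivalent of validate_timestamps
  );
  $report = array();
  foreach ($directives as $name) {
    $value = ini_get($name);
    $report[$name] = ($value === false) ? 'not available' : $value;
  }
  $report['opcache loaded'] = extension_loaded('Zend OPcache') ? 'yes' : 'no';
  return $report;
}

foreach (opcache_settings_report() as $name => $value) {
  echo str_pad($name, 30), $value, "\n";
}
```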
is that php5.5 as Joritt
Is that the one built into PHP 5.5, as Joritt mentioned?
Bob Marcotte
It is built-in in 5.5, but as
It is built into 5.5, but as an extension it also supports 5.4, 5.3 and even 5.2.
Just to agree with most
Just to agree with most others here… yeah, this is not a valuable use of your time and effort relative to just getting a good opcode cache set up. If that's not an option for your clients on shared hosting, set up a good VPS (using a host with SSDs, if disk access is really a concern for you) and offer to host for them.
That being said, as D8 moves more things into lazy-loaded classes (that is, classes in files that PHP doesn't load until it realizes it needs to use the class), your worries about loading unused code will be abated a bit. Of course, there will be trade-offs in the other direction as well.
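For comparison, PHP's class autoloading gives the same "load only if needed" behaviour at the class level. A minimal sketch using spl_autoload_register(); DemoLazyWidget and the temporary class directory are hypothetical, and Drupal 8 relies on Composer's class loader (PSR-4) rather than a hand-rolled closure like this.

```php
<?php
// Hypothetical class file written to a temp directory so the sketch is self-contained.
$classDir = sys_get_temp_dir() . '/lazy_classes_' . getmypid();
if (!is_dir($classDir)) {
  mkdir($classDir);
}
file_put_contents($classDir . '/DemoLazyWidget.php',
  "<?php\nclass DemoLazyWidget {\n  public function label() {\n    return 'loaded on demand';\n  }\n}\n");

// The autoloader runs only when an unknown class is first referenced.
spl_autoload_register(function ($class) use ($classDir) {
  $file = $classDir . '/' . $class . '.php';
  if (is_file($file)) {
    require $file;
  }
});

$loadedBefore = class_exists('DemoLazyWidget', false);  // false: file not read yet
$widget = new DemoLazyWidget();                         // triggers the autoloader
$loadedAfter = class_exists('DemoLazyWidget', false);   // true
echo $widget->label(), "\n";

unlink($classDir . '/DemoLazyWidget.php');
rmdir($classDir);
```

Passing false as the second argument to class_exists() checks without triggering the autoloader, which is what makes the before/after comparison meaningful.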
The Boise Drupal Guy!
Some file access time data
I ran a test within drupal_load() (in bootstrap.inc) to measure the response time of the include_once call. Some files include other files, which added a little noise to the data.
On average, each include_once call takes 4 ms (Windows laptop running XAMPP, PHP 5.4); my shared host is only slightly better at 3.5 ms. For reference, I also timed PHP's filesize() call, which averaged 0.185 ms. So even if the opcode cache is checking the file headers, the overhead is small by comparison (roughly 400 × 0.185 ms ≈ 74 ms for 400 files).
The include time averages about 0.38 ms per 4 KB of file size, based on the 174 ms it took to include 50 module files totaling 1.8 MB.
Speculation:
If I extrapolate to the 400-file case (which may be inaccurate), it comes to roughly 14 MB and 1.4 seconds. An opcode cache clearly wins big, provided the cache persists between requests to the website.
If we consider compressed code for a moment, with the comments and spaces stripped out, the include time would roughly be reduced by the compression ratio.
If a 30% reduction were achieved, the 1.4 sec would drop to about 1 sec. Pre-compilation that shortens variable and function names, or better yet full compilation, could reduce file sizes and load time by something like 80%(?). Then the 1.4 sec would drop to 280 ms. While much more reasonable, an opcode cache would still win easily.
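The extrapolation above can be reproduced as a tiny model. This is only a sketch of the arithmetic: estimate_include_ms() is a hypothetical name, and it assumes include time scales linearly with file size at the measured 0.38 ms per 4 KB.

```php
<?php
/**
 * Back-of-envelope model: include time scales with file size at about
 * 0.38 ms per 4 KB. $sizeReduction is the fraction of bytes removed by
 * stripping/compressing (0.30 = 30% smaller).
 */
function estimate_include_ms($totalKb, $msPer4Kb = 0.38, $sizeReduction = 0.0) {
  $effectiveKb = $totalKb * (1.0 - $sizeReduction);
  return ($effectiveKb / 4.0) * $msPer4Kb;
}

$avgKbPerFile = 1800 / 50;       // 50 files totaling 1.8 MB: 36 KB each
$totalKb = 400 * $avgKbPerFile;  // 400 files: 14,400 KB, roughly 14 MB

printf("uncompressed: %.0f ms\n", estimate_include_ms($totalKb));              // 1368 ms
printf("30%% smaller:  %.0f ms\n", estimate_include_ms($totalKb, 0.38, 0.30)); // 958 ms
printf("80%% smaller:  %.0f ms\n", estimate_include_ms($totalKb, 0.38, 0.80)); // 274 ms

// Stat-only overhead measured with filesize(): 0.185 ms per file.
$statMs = 400 * 0.185;  // 74 ms
printf("stat overhead: %.0f ms\n", $statMs);
```

As a sanity check, the same model gives about 171 ms for the original 1.8 MB sample, close to the measured 174 ms.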
I'll have to update my development server and get some comparative data.
Thank you to everyone for your input!
Bob
Bob Marcotte