Building a Drupal benchmarking test suite

gerhard killesreiter's picture

Hi there, I am interested in building a Drupal benchmarking suite that can be used to test the impact of newly proposed patches on our beloved content management framework.

I already have the start of a setup (which needs a lot of improvement). Currently, it is a rather simple Drupal site with content generated by the script collection that comes with devel.module. I have commented out the "log out" link and let five registered users browse the site. The registered users are implemented through wget. This has advantages and disadvantages. One advantage is that the users follow more or less the same path when traversing the site. At least this was an advantage when I wanted to study the influence of several patches on table locking. For other tests it might not be terribly useful.

RobRoy has shown me how to do some comment posting, so I am likely to add this somehow.

The question is how the whole thing should be organized. Currently, it is a standalone shell script which calls the wgets with session info attached. What I would like to see is a benchmark.php script which lets a user specify the number of users, the rate of traversal, the rate of comment posting, etc., and then starts the processes in the background. Ideally this would happen on a separate server. benchmark.php would then collect the results and do some statistical analysis. The test results would then optionally be stored in a database of test results, maybe here on drupal.org.
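To make the idea concrete, here is a minimal sketch of such a driver, written in Python rather than PHP purely for illustration. All names (`BenchmarkConfig`, `run_benchmark`, `summarize`) are hypothetical, and `fetch` is a placeholder for whatever performs the real request (a wget call, drupal_http_request(), and so on). It only shows the shape: configurable users, traversal delay and comment ratio, then a statistical summary of the timings.

```python
import random
import statistics
import threading
import time
from dataclasses import dataclass


@dataclass
class BenchmarkConfig:
    """Hypothetical knobs mirroring the options described above."""
    users: int = 5               # number of simulated registered users
    requests_per_user: int = 20  # requests each user performs
    delay: float = 0.0           # pause between requests (rate of traversal)
    comment_ratio: float = 0.1   # fraction of requests that post a comment


def run_user(cfg, paths, fetch, timings, lock, seed):
    """One simulated user: pick paths at random and time each request."""
    rng = random.Random(seed)  # seeded so each user's path is repeatable
    for _ in range(cfg.requests_per_user):
        path = rng.choice(paths)
        action = "comment" if rng.random() < cfg.comment_ratio else "view"
        start = time.perf_counter()
        fetch(action, path)  # assumption: the caller supplies the real request
        elapsed = time.perf_counter() - start
        with lock:
            timings.append(elapsed)
        time.sleep(cfg.delay)


def run_benchmark(cfg, paths, fetch):
    """Start all users as threads and return the collected timings."""
    timings = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=run_user,
                         args=(cfg, paths, fetch, timings, lock, seed))
        for seed in range(cfg.users)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return timings


def summarize(timings):
    """Basic statistics over a non-empty list of response times."""
    ordered = sorted(timings)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

Seeding each user keeps the traversal repeatable between runs, which matters when comparing two patches against each other.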

What I think will be most challenging is not to write the code, but to define useful test setups. We'd need to analyse how users really use a site and would probably need (anonymized) data from accesslogs.

Also, we'd want to check how crawlers access a site, which can also be distilled from access logs.
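As a rough sketch of what distilling usage from access logs could look like, the following Python snippet counts requested paths separately for crawlers and for everyone else. It assumes the Apache "combined" log format and a crude user-agent keyword check; both are assumptions, and the names `path_profile` and `BOT_HINTS` are made up for this example. Only paths are kept, so no host addresses end up in the result.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Real logs may use another layout.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Crude heuristic; a real list of crawler user agents would be much longer.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_crawler(agent):
    """Guess from the user-agent string whether the client is a crawler."""
    agent = agent.lower()
    return any(hint in agent for hint in BOT_HINTS)


def path_profile(lines):
    """Return (human_paths, crawler_paths) as Counters of requested paths."""
    humans, crawlers = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        target = crawlers if is_crawler(match["agent"]) else humans
        target[match["path"]] += 1
    return humans, crawlers
```

The resulting counters could serve as weights when choosing which paths the simulated users and simulated crawlers request.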

Anybody interested in collaborating on this?

Please add useful ideas in the comments.

Cheers,
Gerhard

Comments

selenium?

greggles's picture

I've been thinking about using Selenium for this. Record some typical patterns through a site (including posting nodes and comments) and then play them back. I'm not sure how well it would work for running five of them at the same time, though, or whether Selenium even handles that. Perhaps you could fake it with five browser sessions.

Generally speaking, I think that testing and benchmarking are areas that we can really improve and I'd love to help develop a test/benchmark suite.

--
Knaddison Family | mmm Beta Burritos

Isn't Selenium a browser

gerhard killesreiter's picture

Isn't Selenium a browser plugin? I think it could not handle more than one session then.

yes, it is browser based

greggles's picture

Yes, it is browser based, hence my comment about needing multiple browser sessions (e.g. Firefox on 5 computers, Firefox on 5 X sessions, etc.).

Knaddison Family | mmm Beta Burritos

Some things in common

figaro's picture

It seems there is definitely a market for people willing to participate in such a venture:
http://groups.drupal.org/node/2410
Would be keen to learn your views and thoughts on a repeatable approach.

figaro

ClientForm and ClientCookie

sugree's picture

I have good experience crawling web pages with ClientForm and ClientCookie, which simulate web surfing accurately. They are written in Python, so we also have threads to simulate a number of concurrent users. I would love to help you develop something in this project.

For high-load conditions, I always run siege. It lets me specify a list of URLs to request with the GET method.
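A minimal sketch of the thread-per-user idea follows. It does not use ClientForm or ClientCookie themselves but Python's standard `http.cookiejar` and `urllib.request`, which is a substitution on my part; the function names are hypothetical. Each simulated user gets its own cookie jar, so each one keeps a separate session.

```python
import http.cookiejar
import threading
import urllib.request


def make_user_opener():
    """Return (opener, jar): an opener with its own cookie jar, i.e. one session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def browse(opener, base_url, paths, results):
    """Request each path in order and record the status code or the error."""
    for path in paths:
        try:
            with opener.open(base_url + path, timeout=10) as response:
                results.append((path, response.status))
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            results.append((path, exc))


def run_users(base_url, paths, users=5):
    """Let several simulated users walk the same paths concurrently."""
    results = []
    threads = []
    for _ in range(users):
        opener, _jar = make_user_opener()
        thread = threading.Thread(target=browse,
                                  args=(opener, base_url, paths, results))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Logging each user in (posting the login form before browsing) is left out here; it would go through the same opener so that the session cookie lands in that user's jar.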

I have 0 skills in python

gerhard killesreiter's picture

I have 0 skills in Python, so I think it would not be a good choice. Drupal already has most of the functionality itself, besides the actual crawling.

php+wget?

sugree's picture

Oh, I see. You expect benchmark.php to run from the shell prompt, with wget running in the background.

It would be better to add a hook for crawling a given URL, so that the built-in drupal_http_request() could be used by default for testing in request/response fashion, and wget could be used to fake more concurrent requests. In addition, I may write a Python extension that hooks in to simulate higher load.

handbook page

pwolanin's picture

Have you looked at the handbook page "HOWTO: Benchmark Drupal code"? It seems to have been written by webchick.

Anyhow, it suggests Apache Bench (ab) in addition to siege.

I would like to see this be a major SoC project

robertdouglass's picture

My dream is to have a number of ISO images of servers that can play different roles, and to launch them for testing purposes on Amazon EC2. I would like to be able to separate the clients running the tests from the server being tested. I would like to be able to test Drupal on different configurations (one machine doing everything; one web node and one database node; N web nodes and M database nodes), all the while being able to scale the number of clients running tests until we can break the installation. I would like to know the real break point for Drupal.

This setup could be used to test different configurations, test different patches, and test the unit cost of contrib modules.
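The "scale the clients until it breaks" part could be driven by a loop like the sketch below. Everything here is an assumption for illustration: `run_with_clients` is a hypothetical callable that runs one test round with the given number of clients and returns the observed error rate, and the 5% threshold is arbitrary.

```python
def find_break_point(run_with_clients, start=1, limit=1024, max_error_rate=0.05):
    """Double the client count until the error rate exceeds the threshold.

    Returns the first client count at which the installation "broke",
    or None if it survived every round up to `limit`.
    """
    clients = start
    while clients <= limit:
        error_rate = run_with_clients(clients)  # assumption: fraction in [0, 1]
        if error_rate > max_error_rate:
            return clients
        clients *= 2
    return None
```

Doubling only brackets the break point coarsely; a follow-up search between the last passing count and the first failing one would narrow it down.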

This setup should build on the testing work done in the past two summers (SimpleTests and Automated Unit Tests).

Does anyone have an EC2 account? I've applied but it is still a closed beta.
