Hi there, I am interested in building a Drupal benchmarking suite that can be used for testing the impact of newly proposed patches on our beloved content management framework.
I already have the start of a setup (which needs to be improved a lot). Currently, the setup is a rather simple Drupal site with content generated by the script collection that comes with devel.module. I have commented out the "log out" link and then let five registered users browse the site. The registered users are implemented through wget. This has some advantages and some disadvantages. One advantage is that the users more or less follow the same path when traversing the site. At least this was an advantage when I wanted to study the influence of several patches on table locking. For other tests it might not be terribly useful.
RobRoy has shown me how to do some comment posting, so I am likely to add this somehow.
The question is how the whole thing should be organized. Currently, it is a standalone shell script which calls the wgets with session info attached. What I would like to see is a benchmark.php script which lets a user specify the number of users to be used, the rate of traversal, the rate of comment posting, etc., and then starts the processes in the background. Ideally this would happen on a separate server. benchmark.php would then collect the results and do some statistical analysis. The test results would then optionally be stored in a database of test results, maybe here on drupal.org.
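To make the "collect the results and do some statistical analysis" step a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not part of any existing benchmark.php: the function names `summarize` and `compare` are made up for this example, and it assumes the collected results are simply lists of per-request response times in seconds.

```python
import statistics


def summarize(timings):
    """Return basic statistics for a list of response times (seconds)."""
    if not timings:
        raise ValueError("no timings collected")
    ordered = sorted(timings)
    # Nearest-rank 95th percentile, computed with integer arithmetic:
    # rank = ceil(0.95 * n), then take the value at that 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "p95": ordered[rank - 1],
    }


def compare(baseline, patched):
    """Relative change in mean response time (0.10 means 10% slower)."""
    base_mean = statistics.mean(baseline)
    return (statistics.mean(patched) - base_mean) / base_mean
```

A run without the patch and a run with the patch would each produce one list of timings, and `compare(baseline, patched)` then gives a single number to report. Whether the mean is the right figure to compare (rather than the median or a percentile) is an open question.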
What I think will be most challenging is not to write the code, but to define useful test setups. We'd need to analyse how users really use a site and would probably need (anonymized) data from accesslogs.
Also, we'd want to check how crawlers access a site, which can also be distilled from accesslogs.
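As a rough illustration of distilling usage patterns from accesslogs, the following Python sketch counts requested paths separately for ordinary visitors and for crawlers. It rests on several assumptions: that the log is in Apache's combined log format, that crawlers can be recognised by a few substrings in the user agent (the `CRAWLER_HINTS` list is a guess, not a complete list), and the name `split_traffic` is made up for this example. IP addresses are discarded, which covers part of the anonymization.

```python
import re
from collections import Counter

# Apache "combined" log format: host ident user [time] "request" status
# bytes "referer" "user agent". Only the path and the agent are kept.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumed heuristic: substrings that commonly appear in crawler user agents.
CRAWLER_HINTS = ("bot", "crawler", "spider", "slurp")


def split_traffic(lines):
    """Count hits per path, separately for visitors and crawlers."""
    humans, crawlers = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        agent = match.group("agent").lower()
        is_crawler = any(hint in agent for hint in CRAWLER_HINTS)
        target = crawlers if is_crawler else humans
        target[match.group("path")] += 1
    return humans, crawlers
```

The most frequent paths in each counter could then serve as the URL list for a test run, one list imitating visitors and one imitating crawlers.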
Anybody interested in collaborating on this?
Please add useful ideas in the comments.
Cheers,
Gerhard
Comments
selenium?
I've been thinking about using Selenium for this: record some typical patterns through a site (including posts of nodes/comments) and then play them back. I'm not sure how well it would work for running 5 of them at the same time, though, or whether Selenium even handles that. Perhaps you could fake it with 5 browser sessions.
Generally speaking, I think that testing and benchmarking are areas that we can really improve and I'd love to help develop a test/benchmark suite.
--
Knaddison Family | mmm Beta Burritos
knaddison blog | Morris Animal Foundation
Isn't Selenium a browser
Isn't Selenium a browser plugin? I think it could not handle more than one session then.
yes, it is browser based
Yes, it is browser based, hence my comment about needing multiple browser sessions (e.g. Firefox on 5 computers, Firefox on 5 X sessions, etc.)
Some things in common
It seems there is definitely a market for people willing to participate in such a venture:
http://groups.drupal.org/node/2410
Would be keen to learn your views and thoughts on a repeatable approach.
figaro
ClientForm and ClientCookie
I have good experience crawling web pages with ClientForm and ClientCookie, which simulate web surfing accurately. They are written in Python, so we would also have threads to fake a number of concurrent users. I would love to help you develop something for this project.
For high-load conditions, I always run siege. It lets me specify a list of URLs to fetch with the GET method.
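The idea of using threads to fake concurrent users could look roughly like the sketch below. It uses only the Python standard library rather than ClientForm/ClientCookie, so it does not handle logins or form posting; the function names are made up for this example, and every simulated user simply walks the same list of URLs anonymously.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Download one page and discard the body (anonymous GET request)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()


def run_user(urls, fetch_page=fetch):
    """Walk the URL list once and return the time taken per request."""
    timings = []
    for url in urls:
        start = time.perf_counter()
        fetch_page(url)
        timings.append(time.perf_counter() - start)
    return timings


def run_users(urls, users=5, fetch_page=fetch):
    """Run several simulated users in parallel threads.

    Returns one list of timings per user.
    """
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(run_user, urls, fetch_page) for _ in range(users)]
        return [future.result() for future in futures]
```

The `fetch_page` parameter is there so that a different fetcher (for example one that carries a session cookie for a registered user) can be swapped in without touching the timing code.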
I have 0 skills in python
I have 0 skills in Python, so I think it would not be a good choice. Drupal already has most of the functionality itself, besides the actual crawling.
php+wget?
Oh, I see. You expect to have benchmark.php running at the shell prompt as well as wget in the background.
It would be better to add a hook for crawling a given URL, so the built-in drupal_http_request() could be used by default for testing in request/response fashion and wget could be used for more fake concurrent requests. In addition, I may write a Python extension hook for simulating higher load.
handbook page
Have you looked at this handbook page: "HOWTO: Benchmark Drupal code"? It seems to have been written by webchick.
Anyhow, it suggests Apache Bench (ab) in addition to siege.
I would like to see this be a major SoC project
My dream is to have a number of ISO images of servers that can play different roles, and to launch them for testing purposes on Amazon EC2. I would like to be able to separate the clients running the tests from the server being tested. I would like to be able to test Drupal on different configurations (1 machine doing everything; 1 web node and 1 database node; N web nodes and M database nodes), all the while being able to scale the number of clients running tests until we can break the installation. I would like to know the real break point for Drupal.
This setup could be used to test different configurations, test different patches, and test the unit cost of contrib modules.
This setup should build on the testing work done in the past two summers (SimpleTests and Automated Unit Tests).
Anyone have an EC2 account? I've applied but it is still a closed beta.