Possible project for Summer of Code 2007

robertDouglass

I've been thinking about GSoC 2007 quite a bit, and I have an ambitious plan for Drupal this year, but I want feedback from people in this group about the best way to implement it. In addition to however many normal SoC projects we take on this year, I would really like to get a number of students interested in a research- and infrastructure-oriented project that still requires coding, but also involves pure research, server administration and documentation.

The project would be to build a virtual performance and scalability testing environment, and would build directly upon SoC 2005 and SoC 2006 work done by students Thomas Ilsche and Rok Zlender. Through careful preparation of pre-configured ISO images of servers and scripted use of a server virtualization service (such as Amazon EC2), it will be possible to do stress and performance testing on various Drupal configurations and various hardware configurations. The questions we are seeking answers to include:

* What load is required to break Drupal while adjusting the following parameters:
      o Core modules enabled
      o Core configuration
      o Contrib modules enabled
      o Contrib configuration
      o Database server (MySQL | PostgreSQL)
      o Apache configuration
* How does the load required to break Drupal change when:
      o Adding more database servers vertically (clustering)
      o Adding more database servers horizontally (breaking the database schema onto multiple databases)
      o Adding more webservers
      o Adding opcode cache
      o Adding memcache
      o Adding dedicated file server(s)
* How does applying patches currently being considered in the Drupal issue queue change performance on the above tests? 

The individual projects assigned to students would be independent of each other and would involve tasks such as configuring servers and making ISOs, writing scripts to execute deployment tasks, writing scripts to create test suites, writing methodology documentation, and analyzing test results.
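
As one illustration of what a test-suite script might do, the "load required to break Drupal" could be located by doubling the load until the site fails and then bisecting between the last healthy and first failing load. This is only a sketch: `is_healthy` is a hypothetical stand-in for a real load-test run, `simulated_site` is a fake used so the example runs on its own, and the search assumes that once a load breaks the site, every higher load does too.

```python
def find_breaking_load(is_healthy, start=1, limit=1_000_000):
    """Return the smallest load at which is_healthy(load) is False.

    Returns None if the site is still healthy at `limit`.
    Assumes failures are monotonic: a failing load implies that
    all higher loads fail as well.
    """
    low, high = 0, start  # low: highest load known to be healthy
    # Phase 1: double the load until the site breaks.
    while is_healthy(high):
        low = high
        if high >= limit:
            return None
        high = min(high * 2, limit)
    # Phase 2: bisect between the healthy and the failing load.
    while high - low > 1:
        mid = (low + high) // 2
        if is_healthy(mid):
            low = mid
        else:
            high = mid
    return high


def simulated_site(load, capacity=730):
    """Fake server for the example: healthy up to `capacity` concurrent users."""
    return load <= capacity


print(find_breaking_load(simulated_site))  # 731
```

In a real harness each `is_healthy` call would be an expensive test run, which is why the sketch keeps the number of calls logarithmic in the breaking load.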

Reaching these goals would be an amazing achievement for Drupal, and I see SoC as the perfect forum for getting us there. All of this is, of course, highly dependent on Google deciding to have SoC 2007 =)

Comments

Of interest to Google separately

Boris Mann

If we can come up with a way to promote it to them, perhaps we can get some "special status" with Google. This is the kind of work that I suspect would be VERY interesting to Google regardless of SoC.

Robert, we might want to split the virtual environment infrastructure out into a separate proposal -- through the association, for instance.

allisterbeharry

Hello,
I've done a SoC 2007 Drupal proposal on automating the whole process of creating a complete Drupal site with a LAMP stack in a self-contained virtual machine image. A formatted PDF version is here:
http://www.abeharry.info/SoC2007_DrupalAST_FullProposal.pdf
Here is the abstract:

The Drupal automated staging toolkit is a proposed set of code libraries, file schemas and parsers, and code generators, for
automatically creating a Drupal site with specific module code versions, sample users and data, and a specific LAMP stack
configuration for hosting the Drupal site. The toolkit also has the ability to stage this generated site on an existing physical server
location, and also as a self-contained virtual machine consisting of a minimal Linux environment, required LAMP software, and the Drupal
site.

The automated staging toolkit is designed to be part of an automated unit and regression testing environment, by providing testers with a
simple, fast way to automatically generate a complete Drupal site running specific code versions, and using specific LAMP server
configurations. It is also intended for use as part of a performance and scalability testing environment by providing the ability to
rapidly build and then benchmark the effects of different application, web and database server configurations on Drupal site performance and
scalability.

This toolkit will be used in the following way:
1. The tester creates or reuses an XML (or other structured) file using
a schema describing the Drupal site code-tree, including modules
installed/enabled/disabled, and the versions of each module to be
used.
2. The tester creates or reuses an XML file using a schema describing
the LAMP stack web server, PHP/application server, and
database server configuration; e.g. Apache vs. Lighttpd, mod_php
vs. FastCGI, choice of opcode cache, MySQL vs. PostgreSQL, and so on.
3. The tester creates or reuses an XML file using a schema describing
the sample users and content data the site will contain.
4. The parsers take each file and generate scripts in a lightweight
language (Python or Ruby or PHP-CLI.) These scripts, when executed,
use functions in the code libraries to download Drupal modules,
generate database scripts and datasets, and write server configuration
files.
5. Given a physical server target location, the Drupal modules,
database scripts and server configuration files are deployed to the
designated server location, to produce a new, ready-to-test Drupal
site.
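
To make steps 1 and 4 a little more concrete, here is a minimal sketch of a parser that turns a site description into an ordered list of deployment steps. The XML element and attribute names, the version strings, and the step names are all invented for this example; the proposal does not define a schema yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical site description: element names, attributes and
# version strings are placeholders, not a proposed schema.
SITE_XML = """
<site core="5.x">
  <module name="views" version="5.x-1.0" status="enabled"/>
  <module name="cck" version="5.x-1.0" status="disabled"/>
</site>
"""


def plan_from_site_xml(xml_text):
    """Parse a site description and return the deployment steps in order."""
    root = ET.fromstring(xml_text.strip())
    steps = [("fetch_core", root.get("core"))]
    for module in root.findall("module"):
        name = module.get("name")
        steps.append(("fetch_module", name, module.get("version")))
        # Modules marked as disabled are downloaded but not switched on.
        if module.get("status") == "enabled":
            steps.append(("enable_module", name))
    return steps


for step in plan_from_site_xml(SITE_XML):
    print(step)
```

Returning a plain list of steps, rather than executing anything directly, would let the same plan be deployed to a physical server or fed to a virtual machine builder.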

Time and resources permitting, a builder in the toolkit will also use the generated Drupal site and server configuration as input to build
a self-contained virtual machine image in Xen, VMware, or potentially the Amazon EC2 AMI format. This virtual image can also be used in
testing environments, including advanced performance testing scenarios such as evaluating clustering, distributed database topologies, and
alternative storage and computing models like Amazon S3 and the Elastic Compute Cloud (EC2). The ability to rapidly generate self-contained
Drupal virtual machine images will also be extremely valuable to Drupal consultants and solution providers for marketing, prototyping and
demonstrating Drupal solutions to potential clients, and to large organizations and ASPs like CivicSpace looking to take advantage of the
massive benefits of virtualization technology from VMware, XenSource, Amazon, and the like.

Allister.