Infrastructure Requirements

ceardach's picture

Damien is working on setting up three instances we can use for this project: redesign.drupal.org, staging-1.drupal.org and staging-2.drupal.org. If you want access to these instances, upload your SSH key on this thread.

Here are details that we'll need to discuss:

  • What are our development needs?
  • What do we do about managing the database?
    • How will we handle configuration changes?
    • How can we make the database available for developers using local environments?

Community instance requirements

ceardach - Thu, 2009-06-25 19:10

After some discussion, this is my current understanding of what is needed for the infrastructure:

  • Public Testing: always at the latest code, and available for the entire community to view and test
  • Developer Testing: An ability to port code over and test it with everything (like solr) when people don't have a complete local copy
  • Private Development: For people who need to develop with a complete copy of d.o (like solr), or can't set up their own local copy.
  • Local Development: Making all the tools available so someone can set up their own local development environment with minimal requests for assistance.

It appears as though what Damien is setting up with redesign, staging-1 and staging-2 will give us a good start, and we can evolve as need requires.


Developer testing

ceardach - Thu, 2009-06-25 20:03

From Damien Tournoud in e-mail:

From ceardach in e-mail:
Developer Testing: An ability to port code over and test it with everything (like solr) when people don't have a complete local copy

On "Developer testing", I include theming. Theming typically involves two different (but complementary) actions: a developer sets up a new feature that outputs some HTML, and the themer does the CSS magic on top of that HTML, or discusses with the developer how to change the HTML if that's needed.

I'm not sure how to make the theming work. What apparently did work during the redesign sprint in Paris was to set up "development sites", accessible via SSH (and, for Mac users, mountable via MacFusion). That means that we will have to set up SSH access for all themers.

The problem is that we have limited disk space. That's why I suggested having only two staging sites, on which themers could do their thing. As long as we have some kind of coordination ("hey, I'm theming the forum node type on staging-1"), I believe it can work out.

I think that starting off with two staging sites for now is fine. We can figure out if we need to grow if our team grows larger and starts having conflicts.


Managing database configuration settings

ceardach - Thu, 2009-06-25 19:28

If we can, it would be great if we (as a development team) could work towards eliminating the need to make changes to the database manually, and instead use update hooks for everything we need. Automatic updates and passing the database around would then be tremendously easier.

In using this process, we would never make configuration changes through the web GUI. Is this development process possible?
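
To make the "everything in update hooks" idea concrete, here is a minimal sketch in Python rather than Drupal's PHP. It is only an analogy, not Drupal's actual update API: the `UPDATES` registry, the `update` decorator, the `schema_version` key and the two example updates are all hypothetical names made up for illustration. The point it shows is that each configuration change becomes a numbered function, and a site only ever applies the numbers it has not seen yet.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a site's stored configuration.
Site = Dict[str, object]

# Registry of numbered updates (an analogy to numbered update hooks,
# not Drupal's real API).
UPDATES: Dict[int, Callable[[Site], None]] = {}


def update(number: int):
    """Register the decorated function as update `number`."""
    def register(func: Callable[[Site], None]) -> Callable[[Site], None]:
        UPDATES[number] = func
        return func
    return register


@update(1)
def enable_search_block(site: Site) -> None:
    # Example configuration change that would otherwise be a GUI click.
    site["search_block"] = True


@update(2)
def set_front_page(site: Site) -> None:
    site["front_page"] = "node/1"


def run_pending_updates(site: Site) -> List[int]:
    """Apply, in ascending order, every update newer than the recorded version."""
    current = site.get("schema_version", 0)
    applied = []
    for number in sorted(UPDATES):
        if number > current:
            UPDATES[number](site)
            # Record progress after each update so a rerun skips it.
            site["schema_version"] = number
            applied.append(number)
    return applied
```

With something like this, a database that already has update 1 applied only runs update 2, and running the updates a second time does nothing. That is what would make passing a database around safe: whoever receives it runs the pending updates and ends up with the same configuration as everyone else.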

(feel free to port this over to a "development practices" thread as needed)


Redesign project is an "update"

ceardach - Thu, 2009-06-25 20:05

From Damien Tournoud in e-mail:

From ceardach in e-mail:
What are your plans for automatically updating the database as needed? If we can, it would be great if we (as a development team) could work towards eliminating the need to make changes to the database manually, and instead use update hooks for everything we need. Automatic updates and passing the database around would then be tremendously easier.

That's a key point. I'm a strong proponent of "everything in code". On the current drupal.org, we don't have a lot of things in configuration. This is a good thing (TM).

I suggest we consider the redesign project as an "update". We are updating from the current version of drupal.org to the version N+1. Seen that way, it makes a lot of sense to script the migration using update functions.

I wholly agree that this should be approached as an update to d.o. In addition, we would be effectively constantly testing the update process during development, significantly reducing errors that will crop up on the final update to d.o.


Distributing a sanitized Drupal.org database

ceardach - Thu, 2009-06-25 19:40

In order to enable local development, a sanitized version of the Drupal.org database will need to be available for distribution. I normally recommend putting it into version control, but at 600MB zipped / 1.7GB unzipped it is obviously way too large for subversion :) If we can set up an rsync method for contributors, it could speed up the download process. We could also possibly set up a torrent.

If we are successful in eliminating the need to manually make configuration changes by putting everything into update hooks, then all we'll need for a database is an unmodified-yet-sanitized version of the Drupal.org db. This copy of the db would be created:

  1. Within a week of a configuration change on d.o
  2. 30 days from the last configuration change

Ideally we'd have two copies, just in case a configuration change causes a big problem that the redesign project has to compensate for. When using rsync, the download size for these updated versions of the database would be significantly reduced. If we use a torrent, the contributor would have to re-download the whole database whenever they need the latest update.
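
As a rough sketch of the two refresh rules listed above, the function below computes the date by which a new sanitized copy should exist. This is only one possible reading of those rules: it assumes that rule 1 applies while no snapshot reflects the latest configuration change yet, and that rule 2 applies once one does. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def next_snapshot_deadline(last_config_change: date,
                           last_snapshot: Optional[date]) -> date:
    """Return the latest date by which a fresh sanitized copy should exist."""
    if last_snapshot is None or last_snapshot < last_config_change:
        # Rule 1: no copy contains the latest change yet, so one is
        # due within a week of that change.
        return last_config_change + timedelta(days=7)
    # Rule 2: the change is already captured; refresh the copy
    # 30 days after the last configuration change.
    return last_config_change + timedelta(days=30)
```

If the rules were meant differently (for example, 30 days counted from the last snapshot rather than from the last change), only the second `return` would need to change.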

Thinking longer term, having this distributable-version of the database available to the community could be useful for other needs outside of the redesign project.


DB staging requirements

ceardach - Thu, 2009-06-25 19:59

From Damien Tournoud in e-mail:

From ceardach in e-mail:
If we are successful in eliminating the need to manually make configuration changes by putting everything into update hooks, then all we'll need for a database is an unmodified-yet-sanitized version of the Drupal.org db (longer term, this could be useful for other needs outside of the redesign project). This copy of the db would be created 1) within a week of a configuration change on d.o, 2) 30 days from the last configuration change. Ideally we'd have two copies, just in case a configuration change causes a big problem that the redesign project has to compensate for.

We do have two types of modifications to stage:

- modifications affecting the "migration path": those change some key parts of the migration. They will require the full migration path to be played again: making a copy of the live database, then running the update scripts

- modifications that don't affect the migration path: those can and should be pushed directly to the public testing site

How do we split the two? Do we need a separate public testing site that does not risk breaking if someone commits a major change, or do we consider some downtime to be acceptable?

About the database creation, I believe we need to make a full dump of drupal.org every week, and run the migration process on that right after that. I don't believe that having two copies will be necessary. We can anticipate changes, and we are a team small enough for coordination not to be a big issue.

How complicated or intensive are update scripts in both a best-case and a worst-case scenario? Is it possible for a contributor to run these updates themselves?

In the scenario I was envisioning, a user would download the latest sanitized version of the d.o database as needed and run the updates themselves. If they do not need to have the latest version of the database, then they can simply run the latest updates. In development, we'd use a normal (if not more rapid) release process where each config change is incremental. This would reduce the download and restore process for the database to an as-needed basis, while updates would handle the rest. Is that possible with the type of development we're doing?


Sanitization requirements

ceardach - Thu, 2009-06-25 20:14

What are the requirements for sanitization? At the least, we'd want to reduce the file size of the database. When I dump a database, I will keep the structure of the following tables, but truncate their data:

  • cache
  • cache_*
  • views_object_cache
  • votingapi_cache
  • masquerade
  • sessions
  • openid_association
  • accesslog
  • batch
  • biblioreference_keyword
  • comment_notify
  • flood
  • history
  • messaging_store
  • mollom
  • notifications_queue
  • notifications_sent
  • search_dataset
  • search_index
  • search_node_links
  • search_total
  • simplenews_mail_spool
  • watchdog
  • workflow_scheduled_transition

As you may have noticed, I do eliminate the search index. If it is needed, perhaps we could distribute the search index separately for just those who need it.
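
For anyone scripting this, here is a small Python sketch of how the list above could be applied to a full table list. It assumes `cache_*` is meant as a shell-style wildcard; the function name `split_tables` and the example table names in the usage note are just illustrations, not part of any existing tool.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple

# Tables whose structure is kept but whose rows are dropped.
# Copied from the list above; "cache_*" is treated as a wildcard.
STRUCTURE_ONLY_PATTERNS = [
    "cache", "cache_*", "views_object_cache", "votingapi_cache",
    "masquerade", "sessions", "openid_association", "accesslog",
    "batch", "biblioreference_keyword", "comment_notify", "flood",
    "history", "messaging_store", "mollom", "notifications_queue",
    "notifications_sent", "search_dataset", "search_index",
    "search_node_links", "search_total", "simplenews_mail_spool",
    "watchdog", "workflow_scheduled_transition",
]


def split_tables(all_tables: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split table names into (structure_only, full_data), keeping input order."""
    structure_only, full_data = [], []
    for table in all_tables:
        if any(fnmatchcase(table, p) for p in STRUCTURE_ONLY_PATTERNS):
            structure_only.append(table)
        else:
            full_data.append(table)
    return structure_only, full_data
```

Given `["node", "cache", "cache_menu", "users", "watchdog"]`, the first list would hold `cache`, `cache_menu` and `watchdog`, and the second `node` and `users`. The first list could then be dumped with something like `mysqldump --no-data` and the second with a normal dump.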

What other sanitization needs to take place?


Disk space

ceardach - Thu, 2009-06-25 20:07

Disk space is apparently at an extreme premium. What can we do to reduce this need?

What are the infrastructure/system requirements for these instances? Is it possible to use another server for this development push? If so, what is required?

For example, I have a shared host with lots of bandwidth and disk space, but low load requirements. If something like that can't work, can an EC2 instance work? I'm willing to pay for an EC2 instance for three months.


Disk Space and DB Requirements

Narayan Newton - Tue, 2009-07-07 19:52

Gerhard and I are working on getting this together. We will host 10-15 drupal.org databases on a special MySQL instance on db1.drupal.org. The databases will have certain easy sanitization done to bring them down to a reasonable size, but nothing that would require massive SQL statements to preserve referential integrity. The MySQL instance will be running 5.1.34 with the InnoDB Plugin, the Barracuda file format and compression to lower the disk space requirements. The MySQL instance will be niced higher than the production instances (that is, it will run at a lower scheduling priority).