Drupal 'ports' collection

adrian

What is it?

My recent work on Aegir has been deeply focused on Drupal package management and dependency checking.

From the perspective I have been working from (Unix command-line scripts), the existing functionality of the update module is simply not useful.
I need an easily mirrorable meta-info repository of all the Drupal projects and all the Drupal modules, so that I can figure out dependencies
before downloading the packages. This is further complicated by the fact that Drupal cares about modules and themes, while Drupal.org and its XML feeds care about drupal.org project nodes.

Please keep in mind that this has very little to do with Drupal's front-end needs. This will never be the right tool for updating a Drupal site from within itself; I am personally not even sure that's all that wise. I liken it to Apache trying to apt-get update itself.

Inspired by the FreeBSD ports collection, I started banging around some code.

Back-end

What we will have is a back-end component that generates a directory tree of meta-info files.
I've started writing a script which fetches the XML files and releases from drupal.org and re-formats them
into this filesystem structure.

Here's an example file:

Moya:repo adrian$ pwd
/Users/adrian/Projects/hosting/repo
Moya:repo adrian$ cat modules/atom/project.yaml
---
project: Atom
short_name: atom
link: http://drupal.org/project/atom
terms:
  - Syndication
type: modules
api_versions:
  - 7.x
  - 6.x
  - 5.x
  - 4.7.x
  - 4.6.x
  - 4.5.x
  - 4.4.x
  - 4.3.x
  - 4.2.x

Each of the API versions (I only start from 5.x onwards) then has its own XML file containing releases, for which I create a YAML file.

Moya:repo adrian$ cat modules/atom/5.x-1.1/release.yaml
---
type: modules
project: Atom
short_name: atom
name: atom 5.x-1.1
version: 5.x-1.1
tag: DRUPAL-5--1-1
link: http://drupal.org/node/241025
file: http://ftp.drupal.org/files/projects/atom-5.x-1.1.tar.gz
mdhash: b597d7f65dcf9617eb814b4daf212372
filesize: 8818
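
For reference, here is a rough sketch of what one pass of the indexer might look like. It is only a sketch under my assumptions: it reads drupal.org's release-history feed for a single project and API version, and the hand-rolled YAML emitter is just enough for flat files like the ones above.

<?php
// Sketch: index one project/API version from the drupal.org release
// feed into release.yaml files. Error handling mostly omitted.
function index_releases($short_name, $api_version, $repo_root) {
  $url = "http://updates.drupal.org/release-history/$short_name/$api_version";
  $xml = simplexml_load_file($url);
  if ($xml === FALSE) {
    return;
  }
  foreach ($xml->releases->release as $release) {
    $dir = "$repo_root/modules/$short_name/{$release->version}";
    if (!is_dir($dir)) {
      mkdir($dir, 0755, TRUE);
    }
    $fields = array(
      'type' => 'modules',
      'project' => (string) $xml->title,
      'short_name' => $short_name,
      'name' => (string) $release->name,
      'version' => (string) $release->version,
      'tag' => (string) $release->tag,
      'link' => (string) $release->release_link,
      'file' => (string) $release->download_link,
      'mdhash' => (string) $release->mdhash,
      'filesize' => (string) $release->filesize,
    );
    // A trivial YAML emitter is enough for flat key/value maps.
    $yaml = "---\n";
    foreach ($fields as $key => $value) {
      $yaml .= "$key: $value\n";
    }
    file_put_contents("$dir/release.yaml", $yaml);
  }
}

index_releases('atom', '5.x', '/Users/adrian/Projects/hosting/repo');
?>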

I have chosen YAML because it is an easy-to-read and easy-to-understand serialization format. I do not want to depend on Drupal-version-specific
.info file formats, and XML is out of the question.

I am working on a second pass after this, which downloads the tarballs for each of these releases, parses out which
modules, themes, install profiles, etc. we are dealing with, and generates checksums for all the files.
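
A minimal sketch of that second pass, under my assumptions (the system tar is available, and the work directory layout is up to you): download a release tarball, find the .info files that mark each module/theme/profile inside it, and checksum everything.

<?php
// Sketch: unpack one release tarball, locate its .info files and
// generate an md5 checksum for every file it ships.
function scan_release($file_url, $workdir) {
  $tarball = $workdir . '/' . basename($file_url);
  file_put_contents($tarball, file_get_contents($file_url));
  shell_exec('tar -xzf ' . escapeshellarg($tarball) . ' -C ' . escapeshellarg($workdir));

  $info_files = array();
  $checksums = array();
  $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($workdir));
  foreach ($files as $path => $file) {
    if (!$file->isFile()) {
      continue;
    }
    $checksums[$path] = md5_file($path);
    if (substr($path, -5) == '.info') {
      // Each .info file marks a module, theme or install profile.
      $info_files[] = $path;
    }
  }
  return array('info' => $info_files, 'md5' => $checksums);
}
?>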

I already have the vast majority of this code written for the package management in Provision,
so I already have a Drupal-release-agnostic algorithm for mapping the actual contents of each of the packages and extracting all the relevant meta-information.

Once this is done, we have a very portable meta-data repository that can be distributed using rsync, or really any other
method. I think it might be useful to have a 'bootstrap' process where a tarball of the current repository is
automatically downloaded, and then rsync'ed from there.
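
Sketched roughly, that bootstrap could look like the following; the snapshot URL and rsync source below are placeholders for illustration, not real endpoints.

<?php
// Hypothetical bootstrap: seed the mirror from a snapshot tarball on
// first run, then keep it fresh with rsync on every run.
$repo = '/var/aegir/repo';
if (!is_dir($repo)) {
  mkdir($repo, 0755, TRUE);
  shell_exec('wget -qO- http://mirror.example.org/repo-snapshot.tar.gz | tar -xz -C ' . escapeshellarg($repo));
}
shell_exec('rsync -az rsync://mirror.example.org/repo/ ' . escapeshellarg($repo));
?>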

The beauty of this method is that we only need to have the script running in one place, and it will index Drupal.org for us; we can
then easily mirror the code to multiple locations. There's even the possibility of the main repository being hosted on updates.drupal.org.

Additionally, at the moment this will only be targeting Drupal.org as a package source, but it's being designed with the idea that you should be
able to index your own SVN, Git, etc. repository and add it as an additional source. This is the beauty of having a consistent intermediate format,
instead of relying on the d.o XML feeds. I'd also like to add Acquia Carbon as a source.

Client side

For the next step I will need to write a script (probably for Drush) that parses all this information from the filesystem and creates an SQLite-based
data cache, so we can easily search and match dependencies. I will also be porting the Drupal 7 acyclic graph API to Drush, as it is general enough to allow us to extend it to do all of this.
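
A sketch of what building that cache might look like, assuming a parse_yaml() helper (e.g. Spyc) and a deliberately minimal schema:

<?php
// Sketch: walk the repository's release.yaml files and load them
// into a local SQLite cache for fast dependency queries.
function build_cache($repo_root, $db_path) {
  $db = new PDO('sqlite:' . $db_path);
  $db->exec('CREATE TABLE IF NOT EXISTS releases (
    short_name TEXT, version TEXT, api_version TEXT, file TEXT)');
  $insert = $db->prepare('INSERT INTO releases VALUES (?, ?, ?, ?)');
  foreach (glob("$repo_root/modules/*/*/release.yaml") as $yaml_file) {
    // parse_yaml() stands in for whatever YAML parser we settle on.
    $release = parse_yaml(file_get_contents($yaml_file));
    // "5.x-1.1" -> API version "5.x".
    $api = substr($release['version'], 0, strpos($release['version'], '-'));
    $insert->execute(array(
      $release['short_name'], $release['version'], $api, $release['file'],
    ));
  }
}
?>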

Implementation-wise, I am also toying with the idea of turning all the version numbers into a numeric index (similar to how comment nesting
works in Drupal). This will allow us to determine higher versions through simple integer-comparison SQL queries, instead of the complex string manipulation and parsing we currently need to do.
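
As a toy example of the idea (the field widths are arbitrary, and -dev/-beta suffixes would still need handling):

<?php
// Toy encoding: pack "API.x-major.patch" into one sortable integer,
// in the spirit of Drupal's comment-threading trick.
function version_weight($version) {
  if (!preg_match('/^(\d+)\.x-(\d+)\.(\d+)/', $version, $m)) {
    return 0;
  }
  return $m[1] * 1000000 + $m[2] * 1000 + $m[3];
}

print version_weight('5.x-1.1');  // 5001001
print version_weight('6.x-2.10'); // 6002010

// "Is there a newer release?" then becomes a plain integer query:
//   SELECT * FROM releases WHERE weight > 5001001 ORDER BY weight;
?>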

Our dependency-handling needs will far outstrip Drupal's own in fairly short order, as we will need to handle things such as a
'provides' keyword. I.e., if a Drupal project contains five modules, installing the package provides module-module_1, module-module_2, and so on.

We will also be breaking ground on versioned dependencies (i.e., to install v6.10 of somecoredrupalmodule, it needs v6.10 of Drupal itself), and we will be delving into cross-package-type dependencies. Currently you can't have a module that depends on a theme or vice versa, nor install profiles depending on modules and themes, for instance.

Once that is figured out, we should be able to 'apt-get update' and 'apt-get install' any Drupal packages to our heart's content.
With the update.php and install.php code being ported to Drush, we'll be able to simply call the update commands in the back end when we need them.

Major version upgrades I'm not certain about. I am loath to upgrade sites in place, as it's very difficult to recover from that,
and I'm also not sure I want this stuff tied to the Provision / Aegir codebase. Aegir handles the problem by creating a re-deployable backup
of the site and migrating the site across to a newer Drupal release.

In some ways, our requirements are orders of magnitude more complex than what apt-get et al. have to deal with. We need to be able
to run multiple versions of modules across multiple Drupal installs, across multiple sites in each of those Drupal installs, simultaneously,
whereas the other tools only need to deal with the currently installed and currently available packages.

Front end

As this is being developed as part of my work on Aegir, it will need to be tied into the Hosting front end.
This will be trickier than it sounds, especially since we are working with multiple servers (Aegir is fully distributed) and the amount of meta-info we'll be transferring is quite large.

This part needs a lot more thought, to avoid unnecessary code duplication.

I hope these notes are useful.

Comments

a note on install profiles

adrian

This will also be built to allow you to install install profiles, e.g.:

drush.php install profile-myinstallprofile --root=/path/to/drupal

and it will go and get all the right modules and themes for you.

Big project

Rainy Day

“This will be trickier than it sounds”

Dunno. Already sounds like a big project to me. But it would be nice to have (as in really awesome), so more power to you!

Though it’ll be less like BSD Ports (which usually involves compiling… tons of errors, large spools of baling wire… & lots of prayers), and more like Linux Yum (which installs pre-built packages, sans the compiling hassles).

Though it’ll be less like

Garrett Albright

“Though it’ll be less like BSD Ports (which usually involves compiling… tons of errors, large spools of baling wire… & lots of prayers),”

Either you haven't used the ports system in a while, or… you haven't used it at all. 95% of the time, building a port is a matter of cd /usr/ports/category/port, perhaps a make configure, and then a make install. The ports system itself is a little more complex than that (you'll want to synch up your tree first with portsnap fetch; portsnap update or some other way), but usually that's all that's involved in building a port. See Using the Ports Collection in the FreeBSD handbook.

(In case it's not clear, I'm somewhat of a BSD fanboy.)

All that being said, while it's true no compiling will be done here, I believe the comparison to the ports system was due to the auto-resolution of dependency issues, which is a fantastic idea.

adrian

I.e., it's a directory you can rsync or cvs update.
Instead of downloading the entire thing in one file every time, you just download the parts that change.

Also, unlike Debian, we aren't inventing our own packaging format that has to be served via our own repackaging system. We won't be distributing the actual packages.
We download the packages directly from Drupal.org, either via cvs or wget, and then use the extra meta-information we have
to manage them.

So, like ports, we are just providing a map of what's out there, where to get it, and how to install (or 'compile') it.

Ports vs Yum

Rainy Day

I’ve used ports under OpenBSD and Mac OS X. I hate it. I like the concept, but hate how it builds practically the whole world for even the smallest of projects. It always seemed to me the ports system's dependency checker was seriously broken and always built stuff there was no need to build. That is a waste of bandwidth, CPU time, and HD space. I got so tired of it that I began to actively avoid using it. Haven’t touched it in at least five years now, and won’t in the next five if I can avoid it. Every time I hear “ports” I run away screaming! ;-)

While I generally prefer BSD to Linux, in my experience Yum has Ports beat hands down when it comes to installing software. (Score one for Linux in this regard.) Both do dependency checking; both install software in the end. So both are valid analogies. The difference between the two is that Yum is much more user-friendly, efficient, and far, far faster.

Yum is the better analogy because one does not compile Drupal modules. Yum checks dependencies, then downloads and installs the package and its dependencies. This is exactly what Adrian wants to do. Ports, on the other hand, downloads source code, compiles, and links. None of that applies to Drupal modules.

Be all that as it may, this is a relatively unimportant point and really not worth the time to debate.

This is not specifically part of the Aegir project.

adrian

But it is something we will need in order to do some of the things we need to do. Most of the components are already written, but they need to be put together in a sensible and usable way.

Our main problem is that we need ALL the information up front to be able to make sensible decisions about what we can upgrade to. We can't even resolve dependencies cleanly, because a module blindly depends on a module of another name, and we have no way to figure out which package actually provides that module. (Say, for instance, you have a module which depends on views_ui; because the views_ui module is part of the views project, there is no way for us to figure out that we need to download the views package to get that module.)
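
To illustrate, a 'provides' index built by the tarball-scanning pass could answer that question directly; the data below is hard-coded purely for illustration.

<?php
// Sketch: map module short names back to the project that ships them.
$provides = array(
  'views' => 'views',
  'views_ui' => 'views', // views_ui lives inside the views project
);

function project_for_module($module, $provides) {
  return isset($provides[$module]) ? $provides[$module] : FALSE;
}

// A module declaring a dependency on views_ui resolves to the views
// package, which is what actually needs to be downloaded.
print project_for_module('views_ui', $provides); // "views"
?>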

It would also be useful to know whether a module that has been installed differs from the upstream package, i.e., if patches have been applied to it after the fact.

It's like BSD ports in that it's a directory structure containing meta-info that gets rsynced / cvs updated, similar to how Portage works. And PHP isn't compiled, so we will still be downloading source packages; they just happen to be compiled every time you load them =P

Executables

Rainy Day

Yes, PHP code may (technically) be source code, but interpreted code/scripts act like executables. (Indeed they are executables, to the virtual machine of the interpreter.)

Interesting project Adrian.

earnie@drupal.org

Interesting project, Adrian. I can see this helping Drupal hosting providers give users a choice of installation purpose. Keep us posted when you need testers for the project.

Exciting work

briwood

"We need to be able to run multiple versions of modules across multiple Drupal installs, across multiple sites in each of those Drupal installs, simultaneously, whereas the other tools only need to deal with the currently installed and currently available packages."

I look forward to testing your code. This approach will be a big improvement over my humble solution, which leverages multisite and establishes module and theme "repositories" (directories) which are then linked into the sites. This strategy is meant to facilitate applying security upgrades. Patches to core/modules can be applied to one core directory, which is symbolically linked to from the sites. In the attached image, the symlinks are the green arrows. As you can imagine, scripts are necessary to manage all the symlinks.

This was inspired by http://srhaber.com/talks/badcamp08_corecrazy.pdf

I'd actually be interested in seeing your code

adrian

My code doesn't yet take into account avoiding the duplication of files across the filesystem.

I think it's unsound to copy the files around, and having the packages that you are actually using stored centrally makes a fair amount of sense.

I'll try to get you

briwood

I'll try to get you something by early next week.

I'm still wrangling with my script. Here's what I'm attempting to do (a rough sketch of the symlink step follows the list). Comments on my approach welcome!

script:

  • assumes a customer will install modules from tarballs in sites/[site url]/modules
  • bootstraps their site
  • uses modules/update/update.compare.inc: update_get_projects() to find module versions (D5 requires update.module.)
  • checks my repository (directory) for version of module
    • if found: replace module directory with symlink to repository
    • if not found: attempt to get that version of module from cvs.drupal.org and then symlink as above.
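
Here's that rough sketch of the symlink step, under my assumptions about the repository layout:

<?php
// Sketch: if the repository already has this module at this version,
// replace the site's copy with a symlink into the repository.
function link_module($site_dir, $module, $version, $repo) {
  $module_dir = "$site_dir/modules/$module";
  $repo_dir = "$repo/$module/$version";
  if (!is_dir($repo_dir)) {
    // Not mirrored yet; this is where the cvs.drupal.org
    // checkout would happen before linking.
    return FALSE;
  }
  shell_exec('rm -rf ' . escapeshellarg($module_dir));
  return symlink($repo_dir, $module_dir);
}
?>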

Software is never ready for primetime!

Rainy Day

Why not just go ahead and post it now? Software is never “finished.” But we might be able to help you refine your script. As it happens, I’m working on a less ambitious bash script which simply downloads a module, archives the old module (if it exists), unpacks the tarball and archives (or deletes) it afterward. Far more humble than what you or Adrian are up to, but the point is folks here might be able to give you some feedback which will be helpful to you now.

Of course, YMMV!

Why assume

earnie@drupal.org

Why assume sites/[site url]/modules? Most of my modules are installed in sites/all/modules. Only those very specific to a site go in sites/[site url]/modules.

Alright, here's link_project.php so far

briwood

I had to rename the tarball with a .txt on the end in order to upload it here.

It does what I say above, but doesn't create the symlinks yet.

It needs to be run from the root directory of a Drupal 6 install. I can run it from the Eclipse debugger, but it's not working from the command line for me yet. (Bootstrap fails from the command line, probably because it's missing environment variables; see morgue.php.)

get_drupal_cvs.php preceded this script. It's still useful on its own. It does the cvs download to shared/modules/contrib (see image above). Also does a local svn check-in. This was adapted from a shell script by Shawn Haber.

"Why assume sites/[site url]/modules?"

That's just the convention I've established for our hosting service. The above script can be tweaked....

If help is desired adapting this, let me know.

Edit: reuploading. Forgot something.
Edit: reuploaded.

SoC Projects

Rainy Day

Adrian: Have you looked at this year’s SoC projects? There are a couple of version control projects, and a “Son of” Deadwood project. Thought you might be interested.

Here’s one of the version control projects.

I am sure everything is in the code

SLIU

So I wonder if you can extract module (and function) dependency meta-info directly from the source code, in addition to the .info files. Perhaps borrow some functionality from the API module (http://drupal.org/project/api) or use Doxygen directly as a back-end engine for efficiency.

PEAR?

Crell

I spoke with Jonah Braun of Joomla fame at DrupalCon DC about something related to this. We meant to follow up on it but never got around to it. Basically there are already a dozen or so standards out there for specifying packages and version dependencies and such. We shouldn't keep defining our own. Rather, we should adopt, and encourage others to adopt, one of the existing standards or something very close to it. That would also allow a lot more project collaboration and the creation of a central index of PHP projects and their dependencies, as a sort of spawn of GoPHP5. :-) (That's where Jonah and I started the conversation.)

Possibilities for such a standard include Apt/Dpkg, RPM, and PEAR. PEAR already has this sort of infrastructure in place, and we should see if we could leverage it.

our issue is a bit more difficult than that.

adrian

We can't make the assumption that there is only a single install of Drupal on the server, and we can't make the assumption that only a single instance of a module is installed on a server.

You can have different versions of modules installed on different versions of Drupal core, at the Drupal-wide, site-wide, and install-profile-wide levels.

Also, when updating these, if there were only one instance you would need a way to automate the upgrade of all sites that use it.

So perhaps PEAR / apt-get etc. can be used to get the files onto the server, but actually plugging them into the possible locations will need to be handled separately. All the packages generated will need to be able to co-exist with multiple versions too (like separate sites using Views 2.6 and Views 2.5), which drastically complicates the simplicity of apt-get upgrade.

Packaging

skwashd

I am using Aegir to manage a bunch of sites. We are using Debian packages to manage our platforms. Our platforms are named [client]-platform-[svn-branch], with each package called [client]-platform-[svn-branch]_[svn-revision].deb. By using reprepro we can add a new platform to the server by calling sudo apt-get update && sudo apt-get install [client]-platform-[svn-branch].

The platforms are installed in /usr/share/[client]-platform/[svn-branch], similar to the convention for installing the Drupal debs.

I hope to find the time to write some code to have Debian's post-install script queue the import of the new platform.

To a certain degree there will always be roll-your-own package-management solutions for mass-hosting Drupal sites, as everyone's workflows are different. This way works well for us. Next is to add features to the mix, to get rid of adding hacks to [module].install.

I plan to release my packaging scripts soon.
