Translation tools project update

Gábor Hojtsy

Now that the Drupal 6 feature freeze is in effect, and part of my SoC involvement was about making D6 better, it is a perfect time to look back on where I am. I set out to provide solutions in the following three areas as part of my SoC project: NEW TOOLS FOR DRUPAL USERS, NEW TOOLS FOR TRANSLATION TEAMS and WEB BASED TRANSLATION TEAM SUPPORT. Let's look at the details for each:

NEW TOOLS FOR DRUPAL USERS
- New packaging scripts for core translations to create a folder structure similar to Drupal core with translation files separated per functionality
- Automatic translation import functionality for Drupal to import these files only when needed, without further user intervention
- Batch processing of multiple translation imports

Apart from the packaging scripts (which were planned to live in the project module), most of the desired core improvements are in Drupal 6. The important changes I implemented and got included in Drupal 6 as part of SoC:

  • install profiles, modules and themes can have a 'translations' subdirectory, where their PO files are located
  • install profiles use their PO files to display the installer interface in the available (selected) language
  • the installer imports all PO files for all modules and themes enabled by the profile
  • when a new module is installed, its PO files are imported for all enabled languages
  • when a theme is enabled, its PO files are imported for all enabled languages
  • when you add a new language, the PO files of all enabled modules and themes are imported for the new language
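To make the file layout above more concrete, here is a small Python sketch of how PO files could be collected for the enabled components and languages. It assumes a file belongs to a language when its name is `<langcode>.po` or ends in `.<langcode>.po` (for example `system.de.po`); `find_po_files` and `files_to_import` are hypothetical names, and this is not the locale module's actual code.

```python
import os
import re


def find_po_files(component_dir, langcodes):
    """Return {langcode: [paths]} for PO files in component_dir/translations.

    Assumption for illustration: a file belongs to a language when its name
    is exactly '<langcode>.po' or ends in '.<langcode>.po'.
    """
    translations_dir = os.path.join(component_dir, "translations")
    found = {code: [] for code in langcodes}
    if not os.path.isdir(translations_dir):
        return found  # This component ships no translations.
    for name in sorted(os.listdir(translations_dir)):
        for code in langcodes:
            if re.search(r"(^|\.)" + re.escape(code) + r"\.po$", name):
                found[code].append(os.path.join(translations_dir, name))
    return found


def files_to_import(enabled_components, enabled_langcodes):
    """Queue (langcode, path) pairs for every enabled component and language,
    as needed when a module is installed, a theme is enabled or a language
    is added."""
    queue = []
    for component_dir in enabled_components:
        per_lang = find_po_files(component_dir, enabled_langcodes)
        for code in enabled_langcodes:
            queue.extend((code, path) for path in per_lang[code])
    return queue
```

Adding a language then amounts to calling `files_to_import` with all enabled components and just the new language code.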

So much for automation :) Unfortunately we cannot remove translations when a module or theme is disabled, due to how the locale data is stored: we don't know whether a string is shared by multiple modules, or which module uses it at all. Of course, all translations are removed if you delete a language altogether. Yched implemented the batch processing in close cooperation with me, so now the update system also works with the batch API.
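The idea behind batch processing is to split a long job, such as importing many PO files, into small steps spread over several page requests, reporting progress in between. A conceptual Python sketch of that idea follows; it is not Drupal's batch API, the `Batch` class is hypothetical, and it caps each request by a fixed operation count purely for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Batch:
    """Hypothetical model of batch processing: operations are worked through
    over several short 'requests' instead of one long page load."""
    operations: List[Callable[[], None]]
    done: int = 0

    def run_request(self, max_ops_per_request: int) -> float:
        """Run at most max_ops_per_request operations; return progress 0..1."""
        end = min(self.done + max_ops_per_request, len(self.operations))
        while self.done < end:
            self.operations[self.done]()
            self.done += 1
        return self.done / len(self.operations) if self.operations else 1.0

    @property
    def finished(self) -> bool:
        return self.done >= len(self.operations)
```

With five PO file imports queued and two operations per request, the progress reported after each request would be 0.4, 0.8 and 1.0.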

There were some performance and code cleanup improvements in the locale module too. One very important improvement I devised and got committed is the ability to collect the exact list of interface strings used on a page. Among other things, this allows a contributed module to build a JSON array of those strings and provide a simple in-place translation feature, letting users translate parts of the interface on the fly while viewing the page. No such contributed module exists yet.
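A hypothetical Python sketch of the idea: wrap the translation lookup so that each source string requested while building a page is remembered, then export the list as JSON for a client-side translation tool. The method is named after Drupal's `t()` only for readability; this is not the locale module's actual mechanism.

```python
import json


class StringCollector:
    """Hypothetical wrapper around a translation lookup that remembers every
    source string requested while a page is built."""

    def __init__(self, translations):
        self.translations = translations  # {source string: translation}
        self.used = {}  # dict used as an insertion-ordered set of sources

    def t(self, source):
        """Translate a string (falling back to the source) and record it."""
        self.used[source] = None
        return self.translations.get(source, source)

    def to_json(self):
        """JSON list of the page's strings; untranslated ones get null."""
        return json.dumps([
            {"source": s, "translation": self.translations.get(s)}
            for s in self.used
        ])
```

An in-place editor would read this list, show the untranslated entries first and send new translations back to the server.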

(There were lots of other language improvements in Drupal 6 that I was part of, but those are not strictly related to my SoC project.)

The other two big topics were:

NEW TOOLS FOR TRANSLATION TEAMS
- Decouple translations from module and theme projects (implement CVS move scripts, automatically create new projects)
- Relate translation projects to module and theme projects (as well as the Drupal core project itself)
- Monitor and display translation status of the Gettext files used
- Improve packaging scripts to allow compound downloads of projects and related translations
- Provide a service to automatically generate translation templates, so development project owners would not even need to think about translation team support; it would be automatic
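Generating a template essentially means collecting every translatable source string with the places it occurs and writing them out in gettext format. A minimal Python sketch, assuming the occurrences were already extracted; `build_pot` is a hypothetical name and the header is trimmed to the charset line only.

```python
def po_escape(text):
    """Escape a string for use inside a gettext PO/POT quoted string."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\t", "\\t"))


def build_pot(occurrences):
    """Build a translation template from (source string, file, line) tuples.

    Each unique source string gets one entry, with a '#:' reference comment
    listing every place it occurs and an empty msgstr for translators.
    """
    refs = {}
    for source, path, line in occurrences:
        refs.setdefault(source, []).append(f"{path}:{line}")
    lines = [
        'msgid ""',
        'msgstr ""',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        "",
    ]
    for source, places in refs.items():
        lines.append("#: " + " ".join(places))
        lines.append(f'msgid "{po_escape(source)}"')
        lines.append('msgstr ""')
        lines.append("")
    return "\n".join(lines)
```

Identical strings from different files collapse into one entry with several `#:` references, so a single translation serves all of them.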

POSSIBLY: WEB BASED TRANSLATION TEAM SUPPORT (if time permits)
- Web based translation tool.
- Import/export so experts can keep using the tools they are familiar with.

After lengthy discussions with my mentors (see my earlier big post for a sample of that), a reality check, and a discussion with Dries, it turned out that we are better off forgetting CVS as a backend for translation teams. Decoupling the PO files from contributed modules and themes is important, because otherwise the translations are always out of date, releases are not coordinated, templates are not generated properly, and so on. But solving that by putting the CVS and drupal.org project management burden on translators would drastically raise the complexity and the barrier to entry for translators.

Luckily, the web based translation team tool points in another direction, so we looked into providing a web based editor on top of CVS branches, tags, releases and drupal.org projects. That option turned out to be a potentially huge mess, so my SoC project was repurposed to the following points:

IMPLEMENT WEB BASED TRANSLATION TEAM SUPPORT
- Web based translation tool with organic groups
- Automatically parse core and contrib projects for translatable strings
- Relate translations to projects, their releases, the releases' files, and the files' lines
- Monitor and display translation progress status
- Import/export possibility for translation files and templates
- Packaging solutions for core and contrib project translations
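A rough Python sketch of how these relations and the progress figure could fit together: source strings are stored once, releases reference them through the file lines they occur on, and progress is the share of a release's unique strings translated into a language. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Release:
    """One release of a project, such as a module or Drupal core itself."""
    project: str
    version: str
    # (file path, line number) -> source string found on that line
    lines: Dict[Tuple[str, int], str] = field(default_factory=dict)

    def strings(self) -> Set[str]:
        """Unique source strings used anywhere in this release."""
        return set(self.lines.values())


@dataclass
class TranslationServer:
    """Translations are stored once per language and shared; releases only
    reference source strings through their file lines."""
    releases: List[Release] = field(default_factory=list)
    # language code -> {source string: translation}
    translations: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def progress(self, project: str, version: str, langcode: str) -> float:
        """Share of a release's unique strings translated into langcode.
        Raises StopIteration if the release is unknown."""
        release = next(r for r in self.releases
                       if r.project == project and r.version == version)
        strings = release.strings()
        if not strings:
            return 1.0
        done = self.translations.get(langcode, {})
        return sum(1 for s in strings if s in done) / len(strings)
```

Because translations hang off the source string rather than the release, translating a string once raises the progress of every release that uses it.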

This new focus covers all the "end goals" set out earlier, but solves them with a web based tool instead of a CVS and project+release based backend, to reduce the barrier to entry and to programmatically support sharing between different project translations.

Interestingly, Bruno Massa had started some preliminary work on a proof-of-concept tool, which solved a small set of these goals, well before my Summer of Code project even started. When I described the above grand plans to him, he handed the existing modules over to me, so the l10n_server and l10n_community modules now aim to solve the above problems (though more modules could appear as part of the abstraction process).

To make the above work well and reuse existing code, I needed a "string extraction API". The potx project, which grew out of the old extractor.php script, was not up to the task, so I spent a great deal of time refactoring its code, which resulted in a reusable string extraction API. To be able to import and export gettext data from tables other than the locale module's, the core locale API also needed to be modified. As the l10n_* modules are developed for Drupal 5, changing that version was not an option, but looking ahead, I tried to implement a "gettext API" in Drupal 6 core. Unfortunately the feature freeze rush was close at that time, so only the export API was finished and committed; the import code still cannot be reused to parse PO files independently of locale module processes. As a result, a lightweight gettext API will be needed in the l10n_* projects.
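A deliberately naive Python sketch of what a string extraction API returns: every literal string passed to `t()` together with the file and line where it appears. It uses a regular expression purely for illustration; a robust extractor should work on PHP tokens and also handle `format_plural()`, concatenation and other cases this sketch ignores. `extract_strings` is a hypothetical name, not the potx API.

```python
import re

# t('...') or t("...") with a literal first argument. The lookbehind skips
# names like format_t(, $obj->t( and Foo::t(.
T_CALL = re.compile(
    r"""(?<![\w$>:])t\(\s*(?:'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)")"""
)


def extract_strings(path, source):
    """Yield (string, path, line) for each literal t() call in PHP source."""
    for lineno, text in enumerate(source.splitlines(), start=1):
        for match in T_CALL.finditer(text):
            single, double = match.groups()
            if single is not None:
                # In single-quoted PHP strings only \' and \\ are escapes.
                value = re.sub(r"\\(['\\])", r"\1", single)
            else:
                value = double  # Escapes in "..." strings are left as-is here.
            yield value, path, lineno
```

The (string, file, line) tuples are exactly what is needed to relate strings to releases and files, and to write `#:` references into templates.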

After building up the API groundwork, my focus shifted to drupal.org integration questions. I am in intensive discussions with Derek Wright about how the l10n_* modules could get information about projects and releases from drupal.org. Unfortunately the current XML based interfaces are not an immediate fit, so we are discussing alternative solutions.

I'd be interested to hear how the project metrics SoC project is going, as our locale data will be a valuable metric to look at; maybe we can collaborate on integrating the two data sources somehow.

The biggest challenge so far seems to be packaging integration with drupal.org. "Unfortunately" (from this point of view), once a module or theme comes out with a release, it should have an untouched tarball with the code and associated files. Because projects modify their interfaces until the last commits, translations are typically only ready after the release. This makes it impossible to package translations right into the release tarball (which is one of the strong reasons to decouple translations in the first place). But separate per-project, per-language translation downloads would be rather confusing and inconvenient for the user (who downloads a project release and needs translations for it). Technically, packaging will not be a problem. What seems tricky is figuring out how to make it simple for people downloading translations.

Currently I am working on "input integration" with the drupal.org project infrastructure, so we get the projects, releases, files and lines for strings. Then I plan to shift focus to making the collaborative translation interface much better (at that time with input from the translators), and finally, the packaging questions will need to be tackled.

If everything goes well, the resulting modules will run somewhere on drupal.org (possibly a subsite), and translators will forget that they ever needed to deal with complicated CVS things...

Comments

dww

Once this whole web-based system is in place, and we solve all these problems, I wonder if we're going to completely purge the existing po directories and all their contents from the contrib CVS repo. Seems like we probably should. Keeping that stuff around seems like it'll only lead to confusion. In fact, I think I'd be in favor, after the purge, of re-packaging all the releases on d.o, so that things don't have a weird mix of the old translations and the new.

Not sure this is the best place to discuss this, but reading about how "translators will forget that they ever needed to deal with complicated CVS things" made me think about "CVS will forget that it ever needed to deal with translators". ;) For example, how translators are listed among the "developers" for a given project, etc, etc. If we're going to move out of CVS, then CVS should just be for developers. E.g. we should revoke the CVS accounts for people who only do translations, etc, etc, etc...

Anyway, we certainly don't have to decide this now, but I at least wanted to raise it to get us thinking about this in the medium term...

Thanks for the great write-up!
-Derek

throwing the CVS translations files out

Gábor Hojtsy

Yes, I think it would be best if no more translation files were stored in CVS. The web based system might not be able to record a complete history of all changes made to translations, but I definitely intend to record historical information, along with suggested translations for untranslated strings and new suggestions for already translated ones, to help translators collaborate and make decisions.
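The kind of record described here, with an approved translation, pending suggestions and a history of replaced translations, could be modelled roughly as follows. This is a hypothetical Python sketch; the class and field names are illustrative and do not come from the l10n_* modules.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Suggestion:
    text: str
    author: str


@dataclass
class StringTranslation:
    """One source string in one language: the approved translation, pending
    suggestions, and the translations it replaced over time."""
    source: str
    approved: Optional[str] = None
    suggestions: List[Suggestion] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def suggest(self, text: str, author: str) -> None:
        """Add a suggestion without touching the approved translation."""
        self.suggestions.append(Suggestion(text, author))

    def approve(self, index: int) -> None:
        """Promote a suggestion, keeping the previous translation in history."""
        chosen = self.suggestions.pop(index)
        if self.approved is not None:
            self.history.append(self.approved)
        self.approved = chosen.text
```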

The system will definitely be unable to import the CVS history for translations; that is how life goes. But we should by all means import every translation already done at the time the system goes live, so we have a huge head start. Lots of contributed modules reuse strings used elsewhere (like "delete", "operations", "Save" and the like), so with the string sharing system in place, lots of contributed modules will suddenly be further along with their translations, just because other modules' translations help them too.
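To illustrate the head start from string sharing: merge translations from existing PO files into one shared pool keyed by source string, then check how much of another module is already covered. A hypothetical Python sketch; the first-translation-wins rule is a simplifying assumption, and real conflicts would need a human decision.

```python
def build_shared_pool(imported_po_files):
    """Merge translations imported from many projects into one pool keyed by
    source string. The first non-empty translation seen for a string wins."""
    pool = {}
    for translations in imported_po_files:
        for source, translated in translations.items():
            if translated:
                pool.setdefault(source, translated)
    return pool


def head_start(module_strings, pool):
    """Fraction of a module's unique strings already translated in the pool."""
    unique = set(module_strings)
    if not unique:
        return 1.0
    return sum(1 for s in unique if s in pool) / len(unique)
```

For a module using "Save", "Delete", "Operations" and one string of its own, a pool holding the first three would already give it 75% coverage before anyone touches it.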

Translators (working on translations only) could get their CVS accounts revoked too, but I am not sure this is easy to accomplish. Maybe we could grep the commit table for commits of non-.po files?
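Assuming the commit data could be exported as (username, file path) pairs, which is a hypothetical format since the commit table's schema is not covered here, finding translation-only committers is a simple filter in Python:

```python
def translation_only_committers(commits):
    """Given (username, file path) pairs from a commit log, return the users
    whose every committed file is a .po file, sorted by name."""
    touched = {}
    for user, path in commits:
        touched.setdefault(user, set()).add(path)
    return sorted(
        user for user, paths in touched.items()
        if all(p.endswith(".po") for p in paths)
    )
```

Anyone who committed at least one non-.po file is left out, so developers who also translate keep their accounts.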

Localization server

brmassa

Derek and Gábor,

nice!

Isn't it better to post a news item on the drupal.org front page telling people about the new system and its consequences: no translators on CVS (promoting a bottom-up update instead of a top-down CVS revocation), string reuse, etc.?

Instead of repackaging all of drupal.org, I believe it's better to make the client more intelligent, not importing PO files if some conditions are not satisfied (Drupal < 6, release date < xx/xx/xxxx, etc). I think it's kind of weird to download a new package that has no differences, or to have two packages with the same name that are different.

regards,

massa

PS: Gábor, I thought you were against sharing multiple "Submit" strings...

is it not better to post a

Gábor Hojtsy

"Isn't it better to post a news item on the drupal.org front page telling people about the new system and its consequences: no translators on CVS (promoting a bottom-up update instead of a top-down CVS revocation), string reuse, etc.?"

I don't think it is time yet. The new system is far from complete or usable. There is a lot left to do (project integration, proper plural forms support, saving history, making the interface usable, finding a place to host it). I would not tell people to stop translating with PO files; we will import the exact state of the CVS translations when the new server goes online. As we get closer, I imagine we can have translators test the service, which would give us valuable feedback on how useful it is.

"Instead of repackaging all of drupal.org, I believe it's better to make the client more intelligent, not importing PO files if some conditions are not satisfied (Drupal < 6, release date < xx/xx/xxxx, etc)."

I don't understand what you mean by this. A module not released for Drupal 6 will not be able to run with Drupal 6, while Drupal 5 does not import PO files. So I don't understand what you are pointing to here.

"I think it's kind of weird to download a new package that has no differences, or to have two packages with the same name that are different."

Indeed, that would break the unmodified releases rule. Future project releases will most probably not include PO files by default, as translations are by their nature only ready after the release (think of last-minute string changes, and of previous dev releases on the same branch not being available, so the final source strings only exist at the time of the release).

Translations for specific releases

Freso-gdo

I might be a little tired, but I couldn't figure out whether you intend for the system to be able to handle translations for specific versions of core or contributed modules/themes/installation profiles - i.e., whether it will be able to work on both a translation for my_module @ DRUPAL-6--1-2 and my_module @ DRUPAL-7--0-3 at the same time.

--
Frederik 'Freso' S. Olesen

yes (and no)

Gábor Hojtsy

Yes, the current implementation (and the plan) is to be able to work on translations for all releases of a project (whether it is Drupal core, a theme, a module or an install profile does not matter). When such a release is a moving target (such as Drupal 6.x-dev), its translation status could change as often as new "release" files are pulled from drupal.org.

The answer could also be "no", however, because the initial plan is to share all strings between all projects and all their releases. Sharing is going to be enforced in the system, which makes it very comfortable to bring identical string translations over from previous releases and other projects. But it also makes it impossible (by UI design) to translate the same string differently in certain projects, or even in certain releases. The database design allows for such a differentiation, but I don't plan to build this into the module at first (I doubt it would be needed).
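For illustration, here is how a lookup could fall back from a release-specific translation to a project-specific one and finally to the shared one, if such differentiation were ever exposed. A hypothetical Python sketch; the key layout is an assumption, not the actual l10n_server schema.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (source string, language, project or None, release or None).
# (source, language, None, None) is the shared translation the UI edits.
Key = Tuple[str, str, Optional[str], Optional[str]]


def lookup(table: Dict[Key, str], source: str, langcode: str,
           project: Optional[str] = None,
           release: Optional[str] = None) -> Optional[str]:
    """Return the most specific translation available: release-specific,
    then project-specific, then shared; None if nothing is translated."""
    for key in ((source, langcode, project, release),
                (source, langcode, project, None),
                (source, langcode, None, None)):
        if key in table:
            return table[key]
    return None
```

With only shared entries in the table, every lookup resolves to the shared translation, which matches the sharing-first behaviour planned for the UI.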
