Automatically fetching translations

Posted by meba on March 5, 2008 at 10:09pm

Hi!

In D5, we introduced the Autolocale module, which lets users import translations after enabling a module or during installation. However, it had a few caveats, such as timeouts on cheap shared hosts.

In D6, Autolocale is built into core, uses the batch API, and works very well.

What do you think the next step should be? From my point of view:

"Let Drupal automatically fetch translations from a server".

Imagine a user enabling a module. Drupal polls Drupal.org (or some other server), downloads a translation, and imports it. There is no need to download any translation package, and module packages will be smaller.

This could be done for the installer and/or contribs.

What do you think and how do you think we can do this?

Just my $0.02:
- The installer is a great place for this; contrib may not be (translations are already packaged with contribs).
- Gabor mentioned (at DrupalCon in Boston) that it could be done using the Localization client. Is that wise? Any other approaches?
- I can imagine a very easy implementation using curl()/fopen()/xmlrpc() (with a fallback between them) to query Drupal.org/... for a translation, download it, extract it (how to do this without Archive/Tar.php?) and import it.
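
A minimal sketch of the fallback idea from the last point, written in Python for brevity rather than Drupal's PHP. The three transport functions are hypothetical stand-ins for curl(), fopen() and xmlrpc(); they only simulate success or failure and do no real network I/O, and the URL and returned PO snippet are sample data.

```python
from typing import Callable, Sequence


class FetchError(Exception):
    """Raised when every transport failed to deliver the translation."""


def fetch_with_fallback(url: str, transports: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each transport in order and return the first successful result."""
    errors = []
    for transport in transports:
        try:
            return transport(url)
        except Exception as exc:  # any failure just moves on to the next transport
            errors.append(f"{transport.__name__}: {exc}")
    raise FetchError("; ".join(errors))


# Hypothetical stand-ins: the first two simulate what a restricted host might do.
def curl_transport(url: str) -> bytes:
    raise OSError("curl extension not available")


def fopen_transport(url: str) -> bytes:
    raise PermissionError("allow_url_fopen is disabled")


def xmlrpc_transport(url: str) -> bytes:
    return b'msgid "Home"\nmsgstr "Domov"\n'


data = fetch_with_fallback(
    "http://example.com/translations/cs.po",  # sample URL, not a real endpoint
    [curl_transport, fopen_transport, xmlrpc_transport],
)
```

The order of the list is the order of preference, so a site could put whichever transport its host supports best first.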

Please post your ideas so we can eventually get this into D7 :-)

Comments

Posted by Freso-gdo on March 8, 2008 at 12:31pm

First, comments to the initial post:

It should always be possible to manually download translations. People may have security or other concerns with regards to having Drupal automagically speak with foreign sites - even if that foreign site is drupal.org. (Or they may be developing or using Drupal on a local machine or in a LAN without internet connection.)
I'd definitely say "yes" to having this in l10n_client, as this would let us use it on Drupal 6 once the Drupal.org l10n_server launches, so that we can test it, etc. If it works flawlessly, l10n_client (or the client functionality) can be ported and integrated into D7 core. (IIRC, this has also been one of the long-term goals of l10n_server and l10n_client: to be able to communicate with each other across the web.)
I think it would be fine if l10n_client/Drupal looked up translations as the strings are encountered in the interface, like it does now. That way, it would fetch the latest translation, spread the bandwidth usage out over a good while (with progressively less of it), and not fill up the database with strings that no one has used (and might never use).
Fetching translations per-string instead of per-package (module, theme, ...) would also let the downloaded translation "packages" be smaller, as common strings wouldn't be downloaded every time a package is installed.
I would definitely say to use xmlrpc(), at least if importing a few strings at a time (per the above). If we're importing entire core/module translations, it might be better to use curl() or some such and fetch regular PO files. fopen() on remote files should not be assumed to be allowed/possible on a given server, as it could easily be disabled for security reasons with allow_url_fopen (one should always be wary of opening remote files). Also, if we're fetching text/plain PO files, we wouldn't need Archive_Tar to handle the downloads.
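
To illustrate why plain-text PO files need no archive handling, here is a rough sketch of a reader for a simplified subset of the PO format (Python for brevity, not Drupal's actual importer). It handles msgid/msgstr pairs with continuation lines only; plural forms, contexts and most escape sequences are deliberately left out, and the sample strings are made up.

```python
def unquote(token: str) -> str:
    """Strip the surrounding quotes and undo the two most common escapes."""
    return token.strip()[1:-1].replace('\\"', '"').replace("\\n", "\n")


def parse_po(text: str) -> dict:
    """Return {source string: translation} for simple msgid/msgstr entries."""
    translations = {}
    entry = {"msgid": None, "msgstr": None}
    field = None  # which field a continuation line belongs to

    def store(e):
        # Skips the header entry (empty msgid) and untranslated strings.
        if e["msgid"] and e["msgstr"]:
            translations[e["msgid"]] = e["msgstr"]

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("msgid "):
            store(entry)
            entry = {"msgid": unquote(line[6:]), "msgstr": None}
            field = "msgid"
        elif line.startswith("msgstr "):
            entry["msgstr"] = unquote(line[7:])
            field = "msgstr"
        elif line.startswith('"') and field:
            entry[field] += unquote(line)
        else:
            field = None  # unsupported keyword (e.g. plural forms): ignore it
    store(entry)
    return translations


sample = (
    '# Sample file\n'
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    '\n'
    'msgid "Home"\n'
    'msgstr "Domov"\n'
    '\n'
    'msgid "Log "\n'
    '"out"\n'
    'msgstr "Odhlásit"\n'
)
parsed = parse_po(sample)
```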

Second, my own thoughts:

One thing I think would be awesome to have is the ability to continuously keep your translations in sync. We would not want to override strings that had been customised, though. This could perhaps be made possible with a moderation system, where new, "incoming" translations could be reviewed and either accepted as-is, accepted with changes (to fit the site's customisation), or rejected altogether. (This could also come with a setting to require that all translations be accepted before being applied on the site (if the site admin wants total control over what the site says) and a setting to always accept new translations (for sites that use the official translation and don't mind the occasional term change).)
Having just seen the recording of Dries' latest "State of Drupal", perhaps we should be thinking RDF-like for string fetching? Or perhaps this is for D8. This is all very abstract in my head right now, and will need a wee bit longer to manifest itself properly in my mind, so that I can tell you (more) precisely what I mean. :)
The standing issues for full site localisation/translation (menu, site variables, ...) are probably a bit more important to get into D7 right now, even if this would be very cool. And we'll also need to actually have l10n_server produce a proper ("1.0 final") release.
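
The sync-with-moderation idea above could look roughly like the following sketch (Python for brevity). All names are hypothetical, and the exact policy is an assumption: with auto_accept off, every incoming change waits in a review queue; with it on, changes are applied directly except for strings the site has customised, which are always queued rather than overwritten.

```python
from dataclasses import dataclass, field


@dataclass
class SyncResult:
    applied: dict = field(default_factory=dict)  # changes written to the site
    queued: dict = field(default_factory=dict)   # changes waiting for review


def sync_translations(local: dict, customised: set, incoming: dict,
                      auto_accept: bool = False) -> SyncResult:
    """Merge incoming translations into `local` without clobbering customisations."""
    result = SyncResult()
    for source, new_text in incoming.items():
        if local.get(source) == new_text:
            continue  # nothing changed for this string
        if source in customised or not auto_accept:
            result.queued[source] = new_text  # a reviewer accepts, edits or rejects
        else:
            local[source] = new_text
            result.applied[source] = new_text
    return result


# Sample data: "User" was customised on this site, the rest follow upstream.
site = {"User": "Bruger", "Home": "Hjem"}
incoming = {"User": "Brugerkonto", "Home": "Forside", "Log out": "Log ud"}
outcome = sync_translations(site, {"User"}, incoming, auto_accept=True)
```

Here the customised "User" string stays untouched and lands in the queue, while the other two strings are applied immediately.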

--
Frederik 'Freso' S. Olesen

not hitting servers hard

Posted by gábor hojtsy on September 27, 2008 at 2:18pm

I fully agree this is where we should be heading. My main concern is not to hit the server too hard. Having many servers for languages puts pressure on the individual maintainers, so a central server takes pressure off the individual teams (and also lets us fetch from one location instead of several with differing reliability). From there, requests could be too broad (a big file to transfer) or too specific (a complex query on the server). The first is "everything for this version of Drupal core", which would be a huge file, especially if we could not use compressed communication. The second is something like "version x of module y, version z of module o", etc. If we do micro-polling, as Freso suggests, each call would put less burden on the server, but the calls would continually hammer it. Imagine hundreds and thousands of Drupal client sites calling back and doing micro-polling. In some cases, the overhead of the request might even be bigger than the useful data.

So anyway, I think we need to think hard about scalability before we launch such a system wholesale :)

Requesting diffs // Multiserver distribution

Posted by miro_dietiker on November 22, 2008 at 1:29am

I'm very interested in a clean distribution workflow for translations, as I'm working on pages that use three or more languages.

Apart from all the other ideas I have in mind for a perfect translation distribution, I think a few key features would give the best results:

For automated processes, we should be able to define a default translation server address and optional server addresses per language.
That way, companies could run their own servers to distribute their own (modified?) translations across their networks.

The incoming queue is a nice tool as an option. For collaboratively confirmed strings from a trusted translation server, we should always be able to accept translations automatically. That way, completed translations are committed to the endpoints automatically.

A client looking for updated strings could send its last lookup date as a request parameter. I'd define all types of requests as "diff for project X since date Y". This way, a single request could be answered very efficiently (e.g. "no changes!") and updates could be transferred in a minimal form. Servers would get minimal load. Another way could be a file-based exchange (precalculated and dumped), as Debian's package manager (dpkg) currently does.
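
A "diff for project X since date Y" request might be answered along these lines. This is only a sketch in Python; the in-memory store, its layout and the function name are assumptions, and the project names and strings are sample data.

```python
from datetime import datetime, timezone

# Hypothetical server-side store: (project, source string) -> (translation, last changed).
STORE = {
    ("views", "Home"): ("Startseite", datetime(2008, 11, 1, tzinfo=timezone.utc)),
    ("views", "User"): ("Benutzer", datetime(2008, 11, 20, tzinfo=timezone.utc)),
    ("cck", "Field"): ("Feld", datetime(2008, 11, 21, tzinfo=timezone.utc)),
}


def diff_since(store: dict, project: str, since: datetime) -> dict:
    """Return only the strings of `project` that changed after `since`."""
    return {
        source: text
        for (proj, source), (text, changed) in store.items()
        if proj == project and changed > since
    }


# The client remembers when it last asked and sends that date along.
last_lookup = datetime(2008, 11, 10, tzinfo=timezone.utc)
changes = diff_since(STORE, "views", last_lookup)
```

An empty result is the cheap "no changes" answer; a real server would presumably use an indexed query rather than scanning everything.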

BTW: I'm arguing for the ability to run our own intermediate servers because the original translations do not always fit customers' needs perfectly. If a customer wants, e.g., the term "member" instead of "user" in a specific language, quite a lot of individual strings are affected. As I understand it, a perfect translation server should also be able to pull translations from a parent server, modify them and redistribute them. The upstream direction would also be an option: committing new translations to an optional parent translation server.
A single core server (drupal.org) that supports user-specific, project-based overrides might be an option too. But this starts to become very complex, and the multiserver approach would make many things much simpler.
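
One way to picture an intermediate server is as a layer of overrides on top of whatever it pulls from its parent. The sketch below (Python, with made-up sample strings and hypothetical names) uses collections.ChainMap so that a lookup checks the customer's overrides first and falls back to the parent's translations.

```python
from collections import ChainMap

# Translations pulled from the parent server (sample data).
parent = {
    "User": "Benutzer",
    "Home": "Startseite",
    "Log out": "Abmelden",
}

# Strings this intermediate server changes for its customer ("member" instead of "user").
overrides = {
    "User": "Mitglied",
}

# Lookups consult `overrides` first, then `parent`; neither dict is modified.
effective = ChainMap(overrides, parent)

# What the intermediate server would redistribute to its client sites.
redistributed = dict(effective)
```

Keeping the overrides separate also means a fresh pull from the parent never wipes out the customer's changes.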

What do you think?

Would you consider a commercial solution for translation?

Posted by icanlocalize on February 27, 2009 at 11:13am

I know that the natural tendency with open source systems would always be to use an open source solution, but I'd like to mention our commercial solution for translation because I truly think that it can help in this situation.

We're developing a system for Drupal translation. It's a commercial system and we're making our living from it. The system handles content translation. It does everything you (and your clients) need including intelligent content change detection, collaborative translation and other critical features.

The idea is to allow Drupal sites that require content translation to run without any effort on the side of the admin.

The drawbacks of using a closed-box solution are easily balanced by the advantages of having a very committed group of developers supporting and developing the system full time.

Although it's a commercial system, we're offering it for free for open source and not-for-profit organizations.

Would it be an unholy thing to talk about commercial systems in this group?

Agreed: Multi-Server and diff sounds fine

Posted by tirsales@drupal.org (not verified) on February 27, 2009 at 8:48am

@miro_dietiker: I would prefer a "translation version" over a "translation date", as comparing versions is somewhat easier to implement and maintain, though dates would work too. An alternative would be to use repository servers (e.g. CVS) for translations; commits could be made automatically (e.g. translations are registered with a server and new PO files are created automatically), as this would make it easier to use existing APIs.
Allowing multiple servers? Perfect. Not only would this allow "user changed translations of modules", but it would also ease the administration of e.g. menu-terms and the distinction between "translator" and "technician".
