A controversial(?) point: store translations as nodes

Posted by gábor hojtsy on November 7, 2006 at 1:56pm

A striking similarity of all examined content translation modules is that they store translated content as separate nodes. Some vocal members of the community have expressed concern about this solution, referring to 'content duplication' as a problem. I suggest that we should store translations as their own nodes, but that does not mean we cannot solve the problems raised. Let's see what the disadvantages of storing translations as nodes are, and what we get as an advantage.

Disadvantages

Translation nodes have their own web address. » I count this as an advantage, I can bookmark a specific translation.
Content translation workflow is cumbersome if you are adding a simple node (which could have the translation fields on the same page). » We can provide a contrib module for this, if this is really needed.
Duplicate content happens. If you add a poll in different languages, votes on those polls will be counted independently. If you add an event, which should have its date and venue information shared among translations, that is not possible. » This is not so simple, see below.
Menu item, taxonomy, user and other associations are only possible between a specific node and a specific menu item/taxonomy term/user. » We can provide a synchronizer; see what tr.module does in my analysis.
Your turn! What other disadvantages should we keep in mind?

Advantages

You can have different URLs for different language content, it can be bookmarked, cached, even indexed in search engines, so you get more traffic.
Permissions! It is possible to give a group permission to translate to French, and let the others only see their work.
Workflow! The workflow/actions modules are bound to nodes, so you can provide a workflow for translations, you can have states (draft, in review, proofread, published) for your translations. You can also have custom actions, like publishing nodes only when certain translations are in the published state.
Views! Views deal with nodes.
Visitors to your different translations are counted separately, they can comment in the language of that node.
You can associate different entities with translation nodes (e.g. related links in that language).
Your turn! What other advantages should we keep in mind?
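
The workflow point above (publishing only when certain translations are in the published state) can be sketched roughly as follows. This is a language-neutral illustration in Python, not workflow/actions module code; the function name and the state strings are assumptions.

```python
def can_publish(states, required_langs):
    # states maps a language code to the workflow state of that translation.
    # The node set may go live only when every required translation is published.
    return all(states.get(lang) == "published" for lang in required_langs)

states = {"en": "published", "fr": "in review", "hu": "published"}
print(can_publish(states, ["en", "hu"]))  # True
print(can_publish(states, ["en", "fr"]))  # False
```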

The fact of the matter is that there is currently no way to store translations of node text within the nodes themselves. The idea popped up that we should investigate working toward a tree-like revision system, so that we can store translations of nodes in that tree. The problem is that we lose all the above advantages in this case. So after thinking about this approach deeply, I think we should keep translations as nodes. Every day more modules appear that add features to nodes; we should take advantage of this.

So how do we solve the disadvantages? The biggest one comes from the fact that there could be multipart nodes, where some of the parts are language independent (e.g. a scenery image in a tourist article to be translated, the date of an event, etc). The easy answer is that this is not a Drupal core problem, since Drupal core has no multi-item content types. Of course this is an arrogant standpoint. I think Drupal core should have a language code associated with a node (or a special "independent" language code, if that node is not bound to a specific language). But we should not expect that the whole node is in a specific language. Contrib modules should be able to specify the language of the items that make up their nodes. A node would always have a language associated even if it shares some of its building blocks with other nodes, since the node title (and sometimes body text) is in some specific language. It is the responsibility of a contrib module, when building up a node for display, to grab the proper items for that node.

OK, ok, but I am interested in the user interface

As you can see in my report, very different interfaces could be built on top of this (admittedly very simple) storage approach. These interfaces can even be swapped for one another, if you wish (and if they live in their own modules). We can provide a Blogger API extension to submit translated versions of nodes, for example.

As far as the Drupal web interface goes, as I see, the current i18n module interface for content translation is quite successful. Even the localizer development effort lifted that up. What do you think about this user interface? Do you have better ideas?

We are NOT talking about taxonomy, menus, URLs

How these nodes fit into the Drupal URL system, taxonomy or menus is NOT the topic of this discussion. We should not try to solve all questions at once. I believe we will be able to solve menu, taxonomy and URL related problems with this node based translation system, but I would not like to go into details on this front yet. Only raise points in these areas if you strongly believe that we are going down the wrong path.

Try to focus on content language association!

Comments

like the revision system

Posted by moshe weitzman on November 7, 2006 at 8:49pm

i think you are onto a good path with your discussion. consider the revision system. a module can choose to care about it or not. book.module chooses to care about it (note that the book table connects to a node with vid and not nid) but term_node does not (it connects with nid and not vid, and thus terms are not revisioned). we could do the same with language. contrib modules that care about both will link on vid and language. in your example of event module, the time info would not care about lid but would probably care about vid.

so yes, in this scenario, language should be a property of a node and should be in core.

seems like a promising avenue to explore, anyway.

I am not sure we are talking about the same avenue

Posted by gábor hojtsy on November 8, 2006 at 8:09pm

What I say is that node should have a lid in the node table. There should be some connection table which connects nodes to each other. This does not give you a super-nid inherently to the node collection you are referring to, and even if you would have one, that is definitely not going to replace the nid. Nids are unique ids and they should stay as is IMHO. So if you query a node in the table, you will get the nid and the associated lid too. You can only know the other language variants by requesting them via an API function or from the connecting table.

The event module use case maps to this by having nodes for each language translation, with node components translated. Some of the node components will be related to all translated nodes (like date or venue information). So this does not seem to be the approach you are talking about (that we would refer to some super-nid which identifies all translations of the content at once).
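
To make the storage idea concrete, here is a minimal sketch of a node table carrying a language code plus a connecting table, with a lookup function for the other language variants. It uses Python with an in-memory SQLite database purely for illustration; the table names, the `lang` column (standing in for the lid) and the function names are assumptions, not Drupal schema or API.

```python
import sqlite3

def make_schema(conn):
    # Hypothetical tables: "node" carries a language code, and
    # "node_relation" connects a node to its translations.
    conn.executescript("""
        CREATE TABLE node (nid INTEGER PRIMARY KEY, lang TEXT, title TEXT);
        CREATE TABLE node_relation (nid INTEGER, related_nid INTEGER);
    """)

def link_translations(conn, nid_a, nid_b):
    # Store the relation in both directions so lookups work from either node.
    conn.executemany(
        "INSERT INTO node_relation (nid, related_nid) VALUES (?, ?)",
        [(nid_a, nid_b), (nid_b, nid_a)],
    )

def get_translations(conn, nid):
    # Return {language: nid} for the other language variants of this node.
    rows = conn.execute(
        "SELECT n.lang, n.nid FROM node_relation r "
        "JOIN node n ON n.nid = r.related_nid WHERE r.nid = ?",
        (nid,),
    ).fetchall()
    return dict(rows)

conn = sqlite3.connect(":memory:")
make_schema(conn)
conn.executemany("INSERT INTO node VALUES (?, ?, ?)",
                 [(1, "en", "Welcome"), (2, "hu", "Udvozoljuk")])
link_translations(conn, 1, 2)
print(get_translations(conn, 1))  # {'hu': 2}
```

Each nid stays unique and there is no super-nid; the other variants are only reachable through the relation table.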

sorry, my bad

Posted by moshe weitzman on November 9, 2006 at 5:23am

i was really not thinking clearly when i wrote that. thanks for setting me straight.. your proposal for a nid => nid relationship table sounds good and is a long time request for drupal. we have a relationships group on this site. i just cross posted this thread there.

indeed

Posted by gábor hojtsy on November 9, 2006 at 8:36am

Indeed, my next discussion point would be to look into implementing that simple relationships API in Drupal core. A next generation book module and the i18n module would instantly fit with it. I will post this up for discussion on the devel list.

Good point about duplicated poll results...

Posted by webchick on November 8, 2006 at 3:03pm

I hadn't considered that angle before, and unlike the rest of the disadvantages I can't immediately think of a way to fix that problem.. Hm...

Though I suppose we will need some means of specifying a "this node is the same, just in this language" relationship. We could use the presence of that relationship to dictate where poll results should be stored. Whatever the first version of the node is becomes the "primary" translation; all others feed into that.
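
A rough sketch of that idea, in Python rather than Drupal code: votes cast on any translation are recorded against the primary node. The `primary_of` map, the node ids and the function name are made up for illustration.

```python
# Hypothetical map: translation nid -> nid of the "primary" translation.
primary_of = {10: 10, 11: 10, 12: 10}
votes = {}  # (primary nid, choice) -> count

def cast_vote(nid, choice):
    # A node without a recorded relationship is treated as its own primary.
    primary = primary_of.get(nid, nid)
    votes[(primary, choice)] = votes.get((primary, choice), 0) + 1

cast_vote(10, "yes")
cast_vote(11, "yes")
cast_vote(12, "no")
print(votes)  # {(10, 'yes'): 2, (10, 'no'): 1}
```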

Anyway, I'm fully behind you on "translations should be nodes." The advantages you listed far outweigh the disadvantages, and contrib modules can go a long way towards customizing the end user experience.

poll results

Posted by jax on November 8, 2006 at 3:24pm

Ideally the option to share results of the polls between languages or not would be a feature of the module.

Slightly OT but maybe it should even be optional for a module to support i18n. Right now modules can support only one kind of database, ignore versioning and even ignore the "t()" function but still work flawlessly. This keeps developing a basic module easy and is IMO one of the reasons there are so many.

this is the idea

Posted by gábor hojtsy on November 8, 2006 at 8:02pm

Well, since Drupal needs to work without locale module enabled, it will certainly work without i18n features enabled. All content will be considered English (or "unknown language", this remains to be seen). We have quite a few people to remind us that performance is key, and i18n should not have a noticeable negative effect if you are not using it.

Share poll results

Posted by Roberto Gerola on November 9, 2006 at 7:48am

I have done it in localizer.
I have patched the original poll module adding also an option to share the results between the different languages.

--
http://www.speedtech.it

Translations as nodes

Posted by Anonymous on November 13, 2006 at 10:14am

The thing is that using different nodes for translations fits nicely into Drupal object model + has some additional features.

For the other option we'd need to rework the whole Drupal + it is more restrictive + it will add some important overhead for single language sites.

About duplicating content, I think the main concern is images, files etc... But these are really linked objects, not nodes, so this shouldn't be an issue as we can have one file linked to more than one node -which btw I discovered recently so I've implemented it into latest i18n using file revisions-. What we need here is better core support for linking files to multiple nodes, nothing else.

For the polls case, I think this is more a feature, you can have results by language + an aggregator to show the totals would be quite simple. Same for statistics, wouldn't you like to see also results per language? For all these cases it's an interface issue. The data is there so we only need some aggregation.
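
As a sketch of how simple such an aggregator could be, assuming the per-language tallies are already available as plain dictionaries (the data shape and function name are assumptions, not poll module structures):

```python
from collections import Counter

def aggregate(results_by_lang):
    # results_by_lang: {language: {choice: votes}}, one entry per translation node.
    total = Counter()
    for tally in results_by_lang.values():
        total.update(tally)  # adds the counts choice by choice
    return dict(total)

results = {"en": {"yes": 3, "no": 1}, "fr": {"yes": 2, "no": 4}}
print(aggregate(results))  # {'yes': 5, 'no': 5}
```

The per-language numbers stay available in `results`, so both views can be shown.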

Anyway, multilingual fields could also be implemented in the CCK, for the cases where they are needed. And this should be fully compatible with having different nodes for translations for some other node types.

But really, my main concern is that we now have a nice simple content object model. There are content objects, nodes, which are the main building blocks and have a number of relationships and properties, like author(users), workflow, terms, etc... If we go for multilingual nodes and then we want to have different author, workflow, terms, etc for each translation we'd need to do away with all the current model for something really more complex, and worse, we'd be losing the 'node' as the basic content object everything else relates to.

About some of the 'disadvantages', why not look at the problem the other way around? Say we have different nodes which are different translations of the same content and we want to do something with all them at the same time -like an url pointing to the translation set- or promote all of them at the same time. Then all we need is some new 'translation' node type which links to all of the translations and selects the right one on browser language, or promotes all of them when promoted, etc... But this would be a new content type -or pseudo content type- which can be handled by a single module, no need to rework everything, and is really only about user interface.
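
A minimal sketch of the selection step such a 'translation' pseudo content type would perform, assuming it holds a non-empty language-to-nid map and receives the visitor's languages in preference order. Names and the fallback rule are assumptions.

```python
def pick_translation(translation_set, preferred_langs, default_lang="en"):
    # translation_set: {language: nid}, assumed non-empty.
    # preferred_langs: ordered list, most wanted language first.
    for lang in preferred_langs:
        if lang in translation_set:
            return translation_set[lang]
    # Nothing matched: fall back to the site default, then to any translation.
    if default_lang in translation_set:
        return translation_set[default_lang]
    return next(iter(translation_set.values()))

tset = {"en": 1, "es": 2, "fr": 3}
print(pick_translation(tset, ["de", "fr", "en"]))  # 3
print(pick_translation(tset, ["de"]))              # 1
```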

Field-oriented instead of node-oriented?

Posted by rellis on November 30, 2006 at 9:12pm

Someone pointed me to this discussion. I wrote the tr module
for a project I was working on in 2005...

In retrospect, I think storing translations as separate nodes is
not such a good idea. The problem of syncing translation nodes
is a serious one -- you have to do that for any module data associated
with the original node, e.g. dates, categories, flexinode fields,
etc. -- it makes things less manageable/extensible, and it works
against Drupal's node-centric design.

Instead, with the forms API and form submit callbacks, you could
add translation fields to edit forms, save the translations,
and then with nodeapi 'load', switch in translated text for just
those fields...

Of course, you also want to be able to filter lists of nodes by language,
but you don't actually need separate language nodes to do that, just a
record of which nodes are translated into which languages -- something
to filter against in db_rewrite_sql().

For translated menus and translated category names, the same approach
of form_alter() + switching text would work. It would require
patches to add hooks to switch the text, but just little ones. :-)
Per language url aliases would be trickier, but still probably easier
without multiple translation nodes.

...

After thinking about this, I got interested again and started working
on a translation module for 4.7 that takes the field-oriented approach.
It's basically working. UI needs fixes/comments, but I could put
it up somewhere for reference...

Rob

what about other node level features?

Posted by gábor hojtsy on November 30, 2006 at 9:46pm

What about associating different workflows with different translations of a node? Giving permissions to different working groups to translate? Possibly putting translation nodes into different menus (other language subsite parts of the site)?

You had very good ideas implemented in tr module, and it is still very inspiring as far as I see. The field based i18n approach might work well for custom content type modules in contrib, but as far as I see, we lose too much if there are no physical nodes for every translation on the content management front. Drupal is so much based on nodes.

Feel free to put up that proof of concept code somewhere, I am very interested!

what about other node level features?

Posted by rellis on December 1, 2006 at 1:40am

I'm not sure about workflows...

You could have separate forms and separate permissions for translations
without having separate nodes. In fact it might be cleaner, because those
forms could just have the text fields that need to be translated.

I put my new module code up:
translate

It's not much tested, and only tested on apache2+php5+postgres.

And it does require some small patches. Patches are for DRUPAL-4-7-4.

thanks for the code

Posted by gábor hojtsy on December 1, 2006 at 10:17pm

Some of the stuff in this code looks nice (i have not been able to review the code all at once). I am happy that you are at this issue again!

Can we clean up this concept, so others don't need to rescan the code to grasp how it is done? What if we have different authors sending in translations of the same node? What if we need to store different revisions of the translations? What if these different authors only have permission to create their language translations? What if I have a node published and some translations work in progress (not published) at the same time?

Text-oriented translation

Posted by rellis on December 3, 2006 at 10:54pm

Can we clean up this concept, so others don't need to rescan the code to grasp how it is done?

The basic idea is that you have two URLs

http://example.org/en/1957
http://example.org/fr/1957

And both load the same node, e.g. 1957, but when the french
version is loaded, the title, body and teaser text are
switched to the french text. This is a simple thing to do
with nodeapi 'load'.

The question is how do you maintain the translation text.
The example module just adds per language field sets to the
regular node edit form, which is easy... but there's no
reason it couldn't be more sophisticated -- e.g., maybe a
separate translation text edit form, similar to a comment
edit form... or whatever.

The main point is that doing it this way, as opposed to
switching between different translation nodes, is that you
don't have to sync non-text data that's associated with the
original node. You just change text, which is really what
translation is about. E.g., the title of an event changes,
but it still happens on the same day, and it's still in the
same category, etc.
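
The text switching described above could look roughly like this. It is a Python illustration of the idea, not the translate module's nodeapi code; the storage dict, field names and sample strings are made up.

```python
# Hypothetical storage: translated text keyed by (nid, language).
translations = {
    (1957, "fr"): {"title": "Fete du village", "body": "Programme complet"},
}

def load_node(node, lang):
    # Copy the node so the base-language data is left untouched, then
    # overlay only the text fields that have a translation for this language.
    loaded = dict(node)
    loaded.update(translations.get((node["nid"], lang), {}))
    return loaded

event = {"nid": 1957, "title": "Village fair", "body": "Full programme",
         "date": "2007-06-01"}
fr = load_node(event, "fr")
print(fr["title"], fr["date"])  # Fete du village 2007-06-01
```

The date is never copied or synced; only text changes with the language.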

Obviously, keeping translations as separate nodes has some
immediate advantages in terms of managing the translations.
But I think this is a bit of a trap. Duplicating nodes is
NOT in line with Drupal's design, which clearly would keep
text translations simply as additional attributes of a node.

translate module

Posted by Roberto Gerola on December 4, 2006 at 9:25am

I've tried your module and briefly studied the code.
Translate module implements some very good solutions and ideas.

Translating only the necessary parts of a node has many advantages, I agree.
I see only two problems yet :

- different languages need, most of the time, different attachments, but I think we can
easily solve this issue by adding a language attribute to each attachment
- support for contributed modules that extend the node base type.
Perhaps it could be solved by providing some hooks that the external module must implement

Minor observations :

I think that the translation block interface should be separated from the main node form
and inserted in a dedicated tab.
In this separated interface will go also the translatable fields of contributed module.
More general : the support for non core modules, like taxonomy , should be , IMHO,
put out from the main module and implemented in a dedicated submodule.
Eventually, in Drupal 6, these external features could be easily incorporated in the
main module.

I am not sure that having only one table to manage all translations is the best way,
also for performance reasons, but I agree that having only one table dramatically reduces
the management effort and offers ready-made basic translation support for
contributed modules.

Thank you for your work.

Roberto

--
http://www.speedtech.it

translate module

Posted by rellis on December 4, 2006 at 4:37pm

Thanks Roberto. I agree...

UI:

a separate revisions-like tab for translations
a separate extensible translation edit form

DB:

table structure needs work

The single table at the moment just stores serialized
arrays of translation fields indexed by id and type.
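
For readers who have not looked at the code, the single-table layout amounts to something like the sketch below. It is Python with JSON standing in for PHP serialization and a dict standing in for the table; all names are illustrative assumptions.

```python
import json

table = {}  # (object type, object id) -> serialized translations

def save_translation(obj_type, obj_id, lang, fields):
    # Unserialize the stored array, set one language, serialize it again.
    stored = json.loads(table.get((obj_type, obj_id), "{}"))
    stored[lang] = fields
    table[(obj_type, obj_id)] = json.dumps(stored)

def load_translation(obj_type, obj_id, lang):
    # Returns None when no translation exists for that language.
    return json.loads(table.get((obj_type, obj_id), "{}")).get(lang)

save_translation("node", 1957, "fr", {"title": "Titre"})
save_translation("term", 4, "fr", {"name": "Nouvelles"})
print(load_translation("node", 1957, "fr"))  # {'title': 'Titre'}
```

One row per object keeps things generic, but individual fields cannot be queried or filtered in SQL.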

I was thinking it would probably make sense to break
out node translations into a separate table.
Maybe {node_translation}, {title_translation},
and then a generic {text_translation} for
non-title module text?

You'd also need {node_locale}, or add a field to {node}
to store the node's base language.

For efficiency, it would be nice if db_rewrite_sql()
supported additions to the SELECT clause, e.g.

$sql['select'] = 'tr.title as title_' . $lang;

I don't know how hard that would be to implement,
or if it's been discussed before.

...

Separate modules:

request language determination + node translation
+ optional per language comments ?
menu translation
taxonomy term translation
per language url aliases
per language site settings (site_title, site_mission...)

Rob

request language determination

Posted by gábor hojtsy on December 5, 2006 at 12:43pm

The request language determination you mention is closely tied to the locale module too, so it is better off in core or at least not tied directly to any translation functionality. I might need a gallery site (no multilanguage content) with different interface languages, detected from the URL and/or from browser preference.

interface locale vs content locale

Posted by rellis on December 5, 2006 at 3:25pm

I agree, language determination could be a separate piece for core.

Note that there is a difference between determining the interface locale
(which can be set by user preference), and the locale for switching
content. They're close, but someone with a german interface might
want to look at french translations.

maybe

Posted by gábor hojtsy on December 5, 2006 at 5:35pm

It can happen that someone would like to view french content on a german interface, this was said several times. (Although I don't see the use case, many people repeat this, so be it). Regardless of this, somehow we need to identify the interface and content language to present when someone gets to a page like the homepage or a taxonomy page when there are multiple language nodes classified under one term. Then the browser language detection, the URL specified language and similar stuff can help a great deal in both interface and content language decisions.

Use-Case: Norway

Posted by pkej on January 31, 2007 at 3:42pm

Content can be in no-nb and/or no-nn. I would prefer my interface to be in no-nb even when reading no-nn content. As I said in a separate post, in most cases I would like to see both no-nb and no-nn content at the same time, but never no-sa (northern Sami) content.

In some browsers you can select the languages you want and set their order of preference. Why not let the interface use the top priority language, and serve content in all the languages the user has asked for?

For example I'd be perfectly happy to read Danish, Swedish and Norwegian (nb and nn) content (most Norwegians would understand most of Danish and Swedish written content), but I'd prefer to have my interface in Norwegian (nb).
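
A sketch of that rule, assuming the preference list comes from the browser's Accept-Language header: the interface takes the highest-priority language the site offers, while content may be served in every listed language. The function names are assumptions, and the language tags follow the notation used in this post.

```python
def parse_accept_language(header):
    # Return language tags ordered by descending q-value (default q is 1.0).
    prefs = []
    for position, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            prefs.append((-q, position, tag))
    return [tag for _, _, tag in sorted(prefs)]

def split_languages(header, interface_langs):
    # Interface: first preferred language the site has an interface for.
    # Content: every language the visitor listed.
    content = parse_accept_language(header)
    interface = next((l for l in content if l in interface_langs), None)
    return interface, content

print(split_languages("no-nb, no-nn;q=0.9, da;q=0.7, sv;q=0.7", {"no-nb", "en"}))
# ('no-nb', ['no-nb', 'no-nn', 'da', 'sv'])
```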

Paul

how would you solve the limitations of this concept?

Posted by gábor hojtsy on December 5, 2006 at 9:58am

Since you are not answering my questions for whatever reason, let me try to pose them another way. It seems like Jose and you are quite far away from each other, and represent very different concepts.

Rob Ellis: let's store translation fields under the node. This way all translations need to be under the same taxonomy term, need to be published all or none at the same time, could only have the same author, will have mixed language comments, etc. Or we need to make taxonomy, comments, views and every single module/feature which connects with nodes aware of the language, so there is not enough to connect to nodes anymore. The inner data still limits all translations to have the same one author, the same one published, promoted, revision handled, etc flags associated.

Jose A Reyero: let's store translations as different nodes and have different taxonomy and menu trees per language: different translations of the same node could have different workflow, different authors, individually published or not, individually revision tracked, can have individual comments, can be under different taxonomy terms, can be under very different menu items, have rss feeds by definition. Polls can be tracked per language, page views can be tracked per language, permissions can be given per language. These are all a given in the current Drupal infrastructure, no need to change everything and make everything language aware if it does not need to be. Drupal modules know about nodes, and provide a tremendous amount of features for nodes. Drupal modules do not yet know about languages under nodes.

Gábor Hojtsy: let's store translations as different nodes but do not allow associating different terms and menu items in core just yet. We can still synchronize the shared fields if need be. (By the way, this is exactly what the often praised Plone does, if anyone is interested. If you turn off the i18n supporting module, everything will be in place and usable, the data will not be hidden somewhere down the data store. Don't misunderstand me, I was surprised myself a few days ago to see that Plone does this, I am not advocating copying Plone.)

Maybe I am missing something?

PS. I have tested your module, and along the way committed a fix to the install file.

Limitations...

Posted by rellis on December 5, 2006 at 3:14pm

Rob Ellis: let's store translation fields under the node. This way all translations need to be under the same taxonomy term, need to be published all or none at the same time, could only have the same author, will have mixed language comments, etc. Or we need to make taxonomy, comments, views and every single module/feature which connects with nodes aware of the language, so there is not enough to connect to nodes anymore. The inner data still limits all translations to have the same one author, the same one published, promoted, revision handled, etc flags associated.

Sorry, I wasn't trying to evade your questions. :-)

= author/approvals

Translations could be more like comments -- associated with a node, but
with different authors, approvals, etc. It would require writing that
interface, but that's maybe not an overwhelming prospect -- both
comments and revision management offer models.

I.e., translations don't have to be simple text field sets with no other
attributes -- I was just doing that because it was easy...

= comments

You don't need to have mixed language comments. If the comment module
just wraps its db call for comments in db_rewrite_sql, you can filter the
comments for the current language (or, optionally, not) -- see the
patch/example in the 'translate' module.
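
In effect the rewritten query does the equivalent of the filter below. This is a plain Python illustration of the outcome, not the SQL rewrite itself; the `lang` key on each comment is an assumption.

```python
def visible_comments(comments, current_lang, filter_by_language=True):
    # With filtering off, every comment is shown regardless of language.
    if not filter_by_language:
        return list(comments)
    return [c for c in comments if c["lang"] == current_lang]

comments = [
    {"cid": 1, "lang": "en", "text": "Nice post"},
    {"cid": 2, "lang": "fr", "text": "Merci"},
]
print([c["cid"] for c in visible_comments(comments, "fr")])         # [2]
print([c["cid"] for c in visible_comments(comments, "fr", False)])  # [1, 2]
```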

= taxonomy

Categories would be the same for all translations of a node, but you can
translate the term names.

If different translations really need to be in different categories,
maybe they are different objects anyway. There's nothing to stop
you from creating separate nodes in different languages.
Or separate menus, taxonomies for that matter.

There's a blurred line between translation and multi-site
management here.

= views

I'm not sure about views.

Rob

nodes already do what we need to do

Posted by gábor hojtsy on December 5, 2006 at 5:53pm

You propose storing translations "like comments" as an example. Then you have authors associated with translations. You completely lose permission management, workflows, revision support etc. All features of nodes. Of course you can replicate them to support different authors and approval mechanisms, but this is already done fine. For nodes. If we are not storing translations as nodes, we lose all these features. Or we need to adapt all node related modules to be able to work with nodes and translations too. See what you needed to patch to migrate away from nodes being identified by nids to nodes being identified by nids and a language code. Since there is significant work done to convert taxonomy to use nodes (category module) and to make user profiles use nodes (usernode) to leverage this tremendous feature set associated with nodes, I don't think this is the right time to introduce another node-like concept for these extensions to work with.

There's a blurred line between translation and multi-site management here.

Yes, this is the main problem. I18n is somewhere between single site and multisite management. Everyone wants different grades of structure sharing. Some want different taxonomies, different menus and different nodes edited by different translation teams. Some want a simple blog where they can post any number of nodes in any language, and these will have no translation in any other language for sure. Some don't even want to think about i18n, and would like to see Drupal all clean without i18n related query slowing stuff injected into the code.

The question is how to support as much of these grades as possible in core and in contrib modules, without patching core. We would like to introduce patches for being able to support some of these grades with contrib modules, and some direct functionality to support a subset in core. The bottom line is that Dries would like to see some "flexible enough" solution in core which supports the "all sharing" or "non-sharing" grades too, because this would be a differentiating factor (a very nice feature) for Drupal.

To be clear: I don't think synchronizing nodes on "node save" is a nice solution, nor do I think duplicating node functionality in some "translation" object type is a nice solution. I do think that synchronizing nodes on node save is still less hackish, since it leaves all node related features and extensions intact. If we can reuse existing features, then why not?

nodes vs nodes

Posted by rellis on December 5, 2006 at 7:16pm

Right, I have been away from Drupal for a while, and so am
probably talking too much, will stop soon...

It might be that comments should be stored as nodes too.
Taxonomy terms as nodes, everything as nodes, I am ok
with that.

What I think is a bad idea is switching between nodes
for display -- i.e., where the translation node for an event is
another event node. The management of data other
than text in that case is ugly.

Maybe store translation text in nodes of type 'translation'...?

Rob

talking is not a problem

Posted by gábor hojtsy on December 5, 2006 at 8:20pm

Right, I have been away from Drupal for a while, and so am probably talking too much, will stop soon...

It is not a problem if you, me or anyone else talks too much. In fact I talk here more than anyone else :) So I would deserve to be shut up if this were a problem. Please don't stop talking, and especially don't stop coding Drupal :)

It might be that comments should be stored as nodes too. Taxonomy terms as nodes, everything as nodes, I am ok with that.

I am unsure if Dries wants this, but the development community is heading in this direction as far as I see, so sooner or later this will probably happen. Not yet.

What I think is a bad idea is switching between nodes for display -- i.e., where the translation node for an event is another event node. The management of data other than text in that case is ugly.

Dries indicated that he sees that we need some solution in core. What I am trying to point out is that I think we would need a forward thinking solution there. Your solution of storing translation components under the node level locks users into that mode of thinking. It is a nice solution for the problem when you don't need to have different authors, you are fine with publishing all translations at once, etc. It is a very nice solution, and an even nicer concept if you have modest needs.

But if you would like to go forward, you need to migrate your data to a different model (if a migration tool is available). You cannot switch to a different model without migration (nor can you reuse functionality as already pointed out). You cannot switch back if you need to. This is why I think that this solution is not for core. If we can support this kind of solution evolving in contrib with some core patches which will get accepted, then I am completely fine with it. But I don't think that a solution which locks the user into a rather limited i18n approach is for core. Dries indicated that we need a solution which scales for different needs.

Reusing nodes for node translation brings us lots of features. It is ugly to manage shared fields, I second that. But this approach is instantly capable of managing different workflows, permissions, status for nodes. It is possible to scale it back, and it is possible to use the full feature set if need be. (E.g. Plone enables you to edit shared fields only on the original content, but copies the new value around to all translations, so if you disable the translation component, you will still have a fully functioning site.) With my current thinking, it does involve ugly details, yes. I hope we can reduce some of the ugliness (possibly not eliminate it altogether).
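
A rough sketch of the copy-on-save behaviour described above, in Python rather than Drupal or Plone code. The list of shared fields, the data structures and the function name are assumptions made for illustration.

```python
SHARED_FIELDS = ("date", "venue")  # assumed language-independent fields

def save_node(nodes, translation_sets, nid, changes):
    # Apply the edit to the node itself, then copy only the shared
    # fields to every other node in its translation set.
    nodes[nid].update(changes)
    shared = {k: v for k, v in changes.items() if k in SHARED_FIELDS}
    for other in translation_sets.get(nid, ()):
        if other != nid:
            nodes[other].update(shared)

nodes = {
    1: {"lang": "en", "title": "Village fair", "date": "2007-06-01"},
    2: {"lang": "fr", "title": "Fete du village", "date": "2007-06-01"},
}
translation_sets = {1: [1, 2], 2: [1, 2]}
save_node(nodes, translation_sets, 1, {"title": "Summer fair", "date": "2007-06-08"})
print(nodes[2])  # {'lang': 'fr', 'title': 'Fete du village', 'date': '2007-06-08'}
```

Every node holds a full copy of the data, so the site keeps working if the synchronizing piece is switched off.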

Maybe store translation text in nodes of type 'translation'...?

The translation node of an event needs to be of type 'event' to have event fields. A node can only have one type. So in the current node type terminology, we cannot have a 'translation' node type.

Let me reiterate that your solution is good for one problem, and conceptually fits the "shared node fields" feature requirement much better than multiple associated nodes. But in doing so, it shares too many node fields, which becomes a problem in many cases (AFAIS). If, as a solution, you stopped sharing those node fields by some custom, newly implemented means, you would actually reimplement most node features outside of node, which is IMHO not desired.

nodes for data storage, not necessarily for editing/display

Posted by gábor hojtsy on December 7, 2006 at 7:16am

Let me stress that I am talking about using nodes for storing data and managing different aspects / states of the nodes. In simple cases, where we only need a limited subset of the feature list, we don't necessarily need to edit a node if we edit a translation. We just store the data there; the UI can be different.

A striking similarity of all

Posted by dries on December 1, 2006 at 6:59am

A striking similarity of all examined content translation modules is that they store translated content as separate nodes.

That is because contributed i18n modules have no other choice. There is no adequate mechanism to let them solve this differently without having to patch core.

Not convinced yet ...

Posted by dries on December 1, 2006 at 7:20am

Goba mentions the following as an advantage for storing translations as nodes: Visitors to your different translations are counted separately, they can comment in the language of that node. To me, that would be a disadvantage. So, depending on what problem you want to solve, and your view point, advantages can become disadvantages -- and vice versa.

The issue comes down to this problem: translated versions of a node still have certain fields to share. Sometimes, you want to share these fields, sometimes you don't want to share them. For example, I want to aggregate the view counters, Goba wants to keep them separate. Same goes for the poll votes. Some people want to aggregate the votes, other people want to look at them separately.

Or how about comments? Do you want to mix comments in English and French, or do you want them in two separate threads? Certainly, the French comments are going to make things messy for English-speaking people (those that don't know French). However, in Brussels, where people are bilingual, mixing the comments might well be the preferred solution. Aggregating poll votes is one thing, mixing comment threads is another ...

Here is the current state of my mind:

Conclusion #1: keeping certain fields separate or merging certain fields are both valid use cases. Ideally, we'd support both.
Conclusion #2: merging or aggregating things on the fly is not a solution. It might work for counters, but it fails in other scenarios (eg. comments).
Conclusion #3: from conclusions #1 and #2 it follows that storing translations as nodes is not the preferred solution. It doesn't provide a framework to solve some of these problems (eg. comments).

So, depending on what

Posted by jose reyero on December 1, 2006 at 11:25am

So, depending on what problem you want to solve, and your view point, advantages can become disadvantages -- and vice versa.

Yes, that's it. And that's why I'd go for the more flexible approach, which IMHO is storing translations as different nodes. Once you have e.g. node counts stored for each translation separately, then it's just a question of UI to aggregate the data if needed.

You want to have mixed-language comments? Ok, then just build a page that shows all the comments for a node along with all the comments for each translation... That's only UI again.

If we keep different translations as different nodes, then we have the full power of the node system for each translation independently -publish, comment, author, revisions....- And whatever 'aggregation' functionality you want can be implemented on top of that.
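A minimal sketch of the aggregation argument above, in Python rather than Drupal code. The `Node` record, its `tnid` field (an id shared by all translations of one piece of content) and the sample numbers are assumptions made up for illustration, not an existing schema or API. Each translation keeps its own counter and comments; the aggregated view is computed on top of that.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    nid: int        # node id
    tnid: int       # hypothetical: id shared by all translations of one content
    language: str
    views: int = 0
    comments: list = field(default_factory=list)  # (timestamp, text) pairs


def aggregate_views(nodes):
    """Sum the per-translation view counters for each translation set."""
    totals = defaultdict(int)
    for node in nodes:
        totals[node.tnid] += node.views
    return dict(totals)


def merged_comments(nodes, tnid):
    """Return one thread with the comments of every translation, oldest first."""
    thread = []
    for node in nodes:
        if node.tnid == tnid:
            thread.extend((ts, node.language, text) for ts, text in node.comments)
    return sorted(thread)


nodes = [
    Node(1, 1, "en", views=120, comments=[(10, "Nice post"), (30, "Agreed")]),
    Node(2, 1, "fr", views=30, comments=[(20, "Merci")]),
    Node(3, 3, "en", views=7),
]
view_totals = aggregate_views(nodes)   # one total per translation set
thread = merged_comments(nodes, 1)     # mixed English/French thread
```

The per-language data stays untouched, so a site that prefers separate counters or separate threads simply does not call the aggregating functions.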

Doing it the other way, you'll need to rework Drupal completely to have at least some of these features, like each translation having a different author, or a different workflow for each language. We now have a nice simple object model that would be gone forever if you start reworking all these relationships/functionality for objects other than nodes.

Let's build complex functionality with simple objects, instead of going the other way around: having these monster multi-faced nodes people are talking about here and then figuring out how to rework everything so we get back the nice features we have now.

no framework?

Posted by gábor hojtsy on December 1, 2006 at 10:11pm

storing translations as nodes is not the preferred solution. It doesn't provide a framework to solve some of these problems

Neither is there another solution which provides a framework to plug into existing node-based solutions and solves shared fields at the same time. If we are not creating nodes, then attaching workflows, node-level permissions, views and all the other benefits I listed above are gone, or we need to work around them, or we need to redesign all of these.

If we are to support shared node fields in core, then we need to abstract sticky, promoted, published, author, etc. information out of nodes, so those can be shared and not associated with one single node if need be. If we are to support synchronization on submit (for these simple fields supported in core), then we don't need to rework the node system (yet).
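The "synchronization on submit" option could look roughly like the sketch below. It is written in Python for illustration only; the dictionary layout, the `sync_on_submit` name and the choice of which fields count as shared are all assumptions, not an existing Drupal API.

```python
SHARED_FIELDS = ("sticky", "promoted")  # assumption: which fields are shared


def sync_on_submit(saved, translation_set, shared_fields=SHARED_FIELDS):
    """Copy the shared fields of the just-saved node to its other translations.

    Returns the nids of the nodes that were actually changed.
    """
    changed = []
    for node in translation_set:
        if node["nid"] == saved["nid"]:
            continue  # the submitted node already holds the new values
        updates = {f: saved[f] for f in shared_fields if node.get(f) != saved[f]}
        if updates:
            node.update(updates)
            changed.append(node["nid"])
    return changed


english = {"nid": 1, "language": "en", "title": "Hello", "sticky": True, "promoted": False}
french = {"nid": 2, "language": "fr", "title": "Bonjour", "sticky": False, "promoted": False}
changed_nids = sync_on_submit(english, [english, french])
```

Because every translation still carries its own copy of the values, switching the synchronizer off leaves fully functioning nodes behind.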

apache language negotiation

Posted by adixon on December 4, 2006 at 8:16pm

Nice to see discussion and real code here. For what it's worth, I've just been reading an old Apache manual and got to the section on language negotiation, which I probably should have already known. In any case, the idea (at least at the time the book was written) was that you could have multiple versions of your document, with language suffixes to distinguish them; the browser would then send its preferred languages, and Apache would do the negotiation to send a 'best' version. So things that weren't language dependent were unchanged, and if you had a file that was relevant to maybe a few languages but not all, you could even append multiple language codes. Dunno if that's at all useful, but it's an example of backporting language capabilities.
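The negotiation idea can be sketched as follows. This is a deliberately simplified Python illustration of matching an `Accept-Language` header against the available variants; it is not Apache's actual algorithm, and the function names are made up for the example.

```python
def parse_accept_language(header):
    """Return (language, quality) pairs, highest quality first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        quality = 1.0  # a language without an explicit q value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((lang.strip().lower(), quality))
    # The sort is stable, so equal qualities keep their order in the header.
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return prefs


def pick_variant(header, available, default):
    """Choose the best available language variant, or the default."""
    by_lower = {lang.lower(): lang for lang in available}
    for lang, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if lang in by_lower:
            return by_lower[lang]
        primary = lang.split("-")[0]  # fall back from "fr-be" to "fr"
        if primary in by_lower:
            return by_lower[primary]
    return default


best = pick_variant("fr-BE, fr;q=0.9, en;q=0.5", ["en", "fr"], "en")
```

Content that is not language dependent would skip this step entirely and be served as-is.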

Also, since I've been agitating for the single-node translation scheme and was the one who pointed Rob to this discussion, +1 for field-level translations!

Comparing the two solutions

Posted by Roberto Gerola on December 5, 2006 at 1:53pm

I have installed and tried Rob Ellis's translate module (which provides translation of only the necessary fields), and at the same time I am working on my own module (localizer), which uses the opposite approach (node duplication), like Jose's Internationalization module.

I am afraid that the first approach (not duplicating) can introduce a lot of management work for contrib modules.
Consider for example a fairly content-rich module like webform: you have to recreate and manage the main interface
for field definitions for every language, which is almost the same as having duplicate nodes.

If we use this approach, every module maintainer must implement support for multilingual content, and in complex
modules this can lead to duplicating all the core content of the module in a separate table, which is the same, from
this point of view, as duplicating the nodes.

I don't see any problem with duplicating the nodes. Having a node duplicated in the node table or having most but not
all of its content stored in another table is, for me, equivalent.
From my point of view translated contents are different contents, so it makes sense that every content has its own
comments and counter.
In any case, a system for sharing comments or counters can be easily implemented.
The only problem with duplication that I can see is the management. It isn't a real problem for simple nodes where
you have to change only the title and the description, but it can become a problem, for example,
in the webform or poll modules.

I am not sure it is possible, but perhaps a mixed approach could be a solution.
For example, for images, it makes no sense to duplicate the nodes: you only have to translate the title and
the description. Duplicating also introduces a maintenance problem: if you want to change the image,
you must repeat the change on every localized node, unless we implement a synchronization system that
updates the image reference on every localized node.

Roberto

--
http://www.speedtech.it

module translation

Posted by rellis on December 5, 2006 at 5:01pm

On the node translation form it would be nice to
be able to optionally specify another url path for the
translation instead of translating text fields.
Then you just need a mechanism to do
a redirect.

For most modules, specifying a few extra text fields
for a translation form wouldn't be onerous.
You could even do a plugin type of thing for different
node types that someone else could maintain.

There's no reason those two approaches
couldn't co-exist.

Rob

Use-Case: Norway

Posted by pkej on January 31, 2007 at 2:27pm

In Norway there are three official (or are there four now?) languages. In certain municipalities you have to be able to respond to people in any of the three languages. All public information (laws, rules, etc.) has to be provided in three languages. In all municipalities you have to be able to respond in at least two languages.

Two languages are understandable by all Norwegians in their written forms (New-Norwegian, no-nn, and Book-Norwegian, no-nb). Some would say they don't understand, but that would mostly be a contrarian position; foreigners learning one Norwegian language can read the other. So comments on a node written in either of those languages could be interchangeable and interesting for readers of both. These two languages are required for all municipalities in Norway.

Things written in the third language, Northern Sami, are not understandable at all for anyone who hasn't learned the language. This is a minority language from a completely different language family (Finno-Ugric, as opposed to the Indo-European, Germanic family which the other two belong to). Only certain municipalities which have opted into this have to be able to respond in this language.

In one municipality (afaik) there is a fourth language, also Finno-Ugric but completely different from Sami, which is also an official language.

A public servant answering a comment posted in no-nn has to answer it in no-nn, but the question and answer are certainly of interest to all no-nb readers as well. Furthermore, they might need to be translated into Sami in certain municipalities, and if they are served at the state level.

All this is required by law.

Of course, in practice this is very hard to do, especially for the fourth language, which has just a few thousand practitioners (Sami has tens of thousands), and I don't think all the departments have translators to translate every public document as it is written...

But all taxonomy and all menus would ideally have a one-to-one relationship between the languages. Try out http://www.odin.no/odin/norsk/dok-bn.html as an example of a public site which is available in three languages; the link is to Book-Norwegian, and in the second menu line there is a link to Nynorsk (New-Norwegian) and to Sámegiella (Sami). Note how the menus change their language when switching between no-nb and no-nn but the articles retain their original language (based on the language used by the author).

From a database normalization perspective I would go for the node version, contrary to what others might believe. The reason is that the extra information in the node is of interest for the current translation. The author info is unique for each translation, and the date of translation differs from the date of writing, etc. (An original might have several collaborators, as in a scientific paper or an investigative journalism piece, to mention two obvious candidates; those would all need to be credited as authors in the author field, and should thus be factored into an author table with a one-to-many relationship with the node.)

Moving this into fields would create a layer which needs this kind of meta-information as well.

The information in an English node, a Sami node and a Norwegian node might be the same in a broad perspective, but don't forget that each language uses its own idioms and ideas, and thus translating a Sami node into English directly would show that even though the information is the same, the understanding would be lacking!

Of course, using nodes might be confusing for a first-time user, but the current i18n with Drupal 5.1 seems very good (from the very few looks I took into it). It even acknowledges the fact that I don't need a copy of an image, just a translation of its description, so it seems it doesn't create unnecessary copies of a file, though I haven't had the opportunity to confirm this 100%. It also lets me upload a new picture in a different language if it is needed. Brilliant.

This path seems to be the right one for polls as well; the terms used in the poll might need translation, but not necessarily a new poll. But the option to create a new poll might well be needed.

For example, a poll asking people about same-sex marriage might be appropriate in many countries, but in the Philippines even posting that poll would probably create an uproar. The same goes for many other social issues.

So that poll might have to be substituted for a different one in that country.

I like the workflow of the i18n, it fits into how I think. I like the possibility of assigning roles to people, you are a French translator, you are a German translator etc. It fits with the real world.

Disadvantages:

Aggregated poll results. If you are polling, breaking the answers down by spoken language, where people live, etc. adds flavour to the analysis and is therefore interesting. That's actually an advantage! As for the date and venue of an event: that is problematic, but just a UI issue.

Then to the advantages you mention.

Clean URLs, that's just a UI thing, therefore it can be solved for the field approach as well as for the node approach.

My advantages:

The author and the translator are different, the date of a translation is valuable information in its own right. I want to control when a translation is published, there are different versions of translations. Information is the keyword.

I can send a link to a node to a translator and she will have the same interface as if she wrote a new node, with her own translated interface, thus clearing up any ambiguity in an interface with several columns of information which need to be checked. Open two different browser windows to see the original and the translation, in their proper context, and in simpler UIs. Usability is the keyword.

The parts we're not discussing are those which have given me most problems in the past.

a few disadvantages

Posted by dynv on December 10, 2007 at 1:33am

I didn't think of all the advantages which you listed and I admit they have considerable weight. I have a few issues with the translations-as-nodes approach:

The main issue is not being able to switch language directly from the node the user is viewing. You need to switch then trail back menus, which you don't have to do if the translation is stored in the node.

Another issue is that if you create a node which will be used as a filter in a view, you will have to add all language instances. For example, I create a source node which I call National_Paper, of the type News_Paper; there will be a synchronized common point (URL, which could change) between the translated nodes (National_Paper-en, National_Paper-fr, ...), and in the view I will have to select all translation instances. I could make a node with the common point and make a view from it, but the user would have to create an extra node and take an extra step when updating, which synchronizing would skip.

I had another issue but I lost it ... Anyway, good article.

Yes, you can

Posted by jose reyero on December 10, 2007 at 10:02am

The main issue is not being able to switch language directly from the node the user is viewing. You need to switch then trail back menus, which you don't have to do if the translation is stored in the node.

That is about the UI, which is built on top of these node translations, and actually you can switch languages and nodes just by clicking on the language icon. This is already implemented in Drupal 6 core.
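As a rough illustration of how such a switcher can sit on top of separate translation nodes (a Python sketch, not the Drupal 6 implementation): the dictionary layout and the `tnid` key marking a translation set are assumptions for the example.

```python
def switch_links(current, nodes):
    """Map language code -> path of the matching translation of `current`."""
    return {
        node["language"]: "/node/%d" % node["nid"]
        for node in nodes
        if node["tnid"] == current["tnid"] and node["nid"] != current["nid"]
    }


nodes = [
    {"nid": 10, "tnid": 10, "language": "en"},
    {"nid": 11, "tnid": 10, "language": "fr"},
    {"nid": 12, "tnid": 10, "language": "hu"},
    {"nid": 20, "tnid": 20, "language": "en"},  # unrelated content
]
links = switch_links(nodes[0], nodes)  # links shown while viewing node 10
```

The visitor stays on the same content and only the node behind the link changes, so no trip back through the menus is needed.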

different URIs

Posted by gábor hojtsy on December 10, 2007 at 10:41am

Well, having different URIs for translations is a feature. Otherwise how can people bookmark them, how can search engines index them, how can we pass on the URL through some chat?

Maybe too late

Posted by andy inman on June 21, 2008 at 12:01am

No further comments here for six months plus, well here's mine:

I'm 100% behind the idea that the language that a node is displayed in is a display function. "I want it displayed in English or French" is the same as "I want it displayed in Pink, or using the Garland theme".

The underlying problem is that machine translation doesn't work. If machine translation were 100%, nobody would be suggesting creating different nodes for the same content.

Suppose we want to store the value "day of week", Monday, Tuesday... Sunday. We all know, surely, that the way to do this is to store a number 1 to 7 (or 0 to 6 if you prefer.) Surely nobody would suggest that we should store "Monday" in the English data storage area, "lunes" in the Spanish data storage area, and so on, up to the number of languages that we want to handle.
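In code, the day-of-week example might look like this minimal Python sketch. The lookup table and function name are made up for illustration; the point is only that the stored value is the number and the language is applied at display time.

```python
# Display-time lookup tables; the stored value is only the number 1 to 7.
DAY_NAMES = {
    "en": ["Monday", "Tuesday", "Wednesday", "Thursday",
           "Friday", "Saturday", "Sunday"],
    "es": ["lunes", "martes", "miércoles", "jueves",
           "viernes", "sábado", "domingo"],
}


def day_name(day_number, language):
    """Render a stored day number (1 = Monday ... 7 = Sunday) in a language."""
    if not 1 <= day_number <= 7:
        raise ValueError("day_number must be between 1 and 7")
    return DAY_NAMES[language][day_number - 1]


stored_day = 1  # what actually goes into storage
english_label = day_name(stored_day, "en")
spanish_label = day_name(stored_day, "es")
```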

Well, that's it :)

Currently part of the team at https://lastcallmedia.com in a senior Drupal specialist role.
