[Dbix-class] Advices needed on creating multilingual application

Wed Sep 19 12:49:43 GMT 2007

Mark Lawrence(nomad at null.net)@Wed, Sep 19, 2007 at 12:58:24PM +0200:
> On Wed Sep 19, 2007 at 11:50:26AM +0300, mbit at ukr.net wrote:
> 
> > Let's assume an easy application. We have a blog. But there are
> > three versions int three languages: en, de and ru for example. But
> > when we don't have a translation for the en part, we show original
> > part. This way first we add original post in de, and then if we
> > translate that part we add translation. And when a user views en
> > version of the site he sees translated part. Otherwize he
> > sees de version. This way in all three languages we have complete
> > website, just not everything is always translated, but we know what
> > changed and was added in all languages.
> 
> The application might be 'easy', but the type of problem is not, otherwise
> I expect we would have seen the major blogging/CMS platforms do this long
> ago. Unfortunately I think the only ones who have made reasonable
> progress are Plone and the Gengo plugin for wordpress.

> Although it won't help with your Perl/DBIx::Class issues, looking at
> their database schemas might give you some ideas. As you have probably
> already found, apart from the (minor) complexity of determining which
> record(s) to retrieve, the biggest issue is efficiency.

Thanx. I'll try that.

> Here is how I do it (database-side only).
> 
> For every type of object that we wish to have translatable components there
> are two tables. The first table holds all data that is non-translatable,
> plus the original language versions of the translatable data. Eg:
> 
>     CREATE TABLE entry (
>         id    INTEGER,
>         ctime INTEGER,
>         lang  VARCHAR(4),
>         title VARCHAR(255),
>         body  TEXT,
>     )
> 
> The second table contains the translations of the translatable columns
> 
>     CREATE TABLE entry_t (
>         id      INTEGER,
>         transof INTEGER REFERENCES entry(id),
>         ctime   INTEGER
>         lang    VARCHAR(4),
>         title   VARCHAR(255),
>         body    TEXT,
>     )

Yeah. I do the same thing.

> To retrieve the desired translation with fallback to the original
> language this query is the most efficient I've found. 
> 
>   "find the last 5 blog entries, in Russian but showing their original
>   language if no Russian translation available":
> 
>     SELECT
>         t1.id,
>         CASCADE(t2.lang, t1.lang)   AS lang,
>         CASCADE(t2.title, t1.title) AS title,
>         CASCADE(t2.body, t1.body)   AS body,
>     FROM
>         entry AS t1
>     LEFT JOIN
>         entry_t AS t2
>     ON
>         t1.id = t2.transof AND t2.lang = 'ru'
>     ORDER BY
>         t1.ctime
>     LIMIT 5

Thanx for the SQL. That's surprising me. Actually i'm not that good in pure SQL.

> There is another way to do this, where all the translations are in the
> same table and using self joins, but I don't think it is as clear or
> efficient as the above. Of course the above is the simplest case. I tend
> to also want columns to specify whether the entry is published, whether
> the translation is complete, whether to show the entry at all if there
> is no translation for the selected language (ie hidden from other
> language groups) and so on.
> 
> >     lang => {
> >         data_type     => 'character varying',
> >         default_value => undef,
> >         size          => 2,
> >         is_nullable   => 0
> 
> By the way, not all language codes are limited to two characters in
> length.

Ok. I see it now.

> > My question is if there is a better approach to do the same stuff. Because i
> > faced a lot of difficulties in prefetching related object like creator where i
> > had to do something like this:
> > 
> > prefetch => { i18n => [ qw/ creator / ] }
> 
> Oh, forgot about that. Something like this works.
> 
>     CREATE TABLE users (
>         id    INTEGER,
>         name  VARCHAR(255),
>     )
> 
>     CREATE TABLE entry (
>         id    INTEGER,
>         ctime INTEGER,
>         author INTEGER REFERENCES users(id),
>         lang  VARCHAR(4),
>         title VARCHAR(255),
>         body  TEXT,
>     )
> 
>     CREATE TABLE entry_t (
>         id      INTEGER,
>         transof INTEGER REFERENCES entry(id),
>         ctime   INTEGER
>         lang    VARCHAR(4),
>         title   VARCHAR(255),
>         body    TEXT,
>     )
> 
>     SELECT
>         t1.id,
>         t3.name,
>         CASCADE(t2.lang, t1.lang)   AS lang,
>         CASCADE(t2.title, t1.title) AS title,
>         CASCADE(t2.body, t1.body)   AS body,
>     FROM
>         entry AS t1
>     LEFT JOIN
>         entry_t AS t2
>     ON
>         t1.id = t2.transof AND t2.lang = 'ru'
>     LEFT JOIN
>         users AS t3
>     ON
>         t3.id = t1.author
>     ORDER BY
>         t1.ctime
>     LIMIT 5
> 
> > Or worse when we have a related object that has a translation
> > too... That's a big mess for me.
> 
> On the SQL side you can juse keep on adding CASCADEs and JOINs as many
> times as you want. Unfortunately this doesn't really translate well to
> pure Object(table)-based mappers, so I'll leave it to others on this list
> to explain how you can do that with DBIC. (I couldn't find a nice way).
> 
> Hope that helps.
> 
> Mark.

Well, thanx a lot for your advises and experience. Now i know that my approach
is not that far from the truth. But i was trying to think `out of the box'
and trying to find a whole anouther way of dealing with maultilingual databases.

Again and again, you are right about difficulties of implementing such things
not in a raw SQL interface. The thing is that i have my application working and
it works not that bad, but i understand it needs refactoring and that involves
searching a better ways of doing the same tasks again and again.

-- 
vti -- Viacheslav Tikhanovskii