Re: lets merge i18n back into trunk

hugo Wed, 02 Nov 2005 10:25:26 -0800

Hi,

>2. Once this hits the trunk we'll need documentation about any
>backwards incompatibilities this introduces and what the upgrade path
>looks like (in the style of the docs on the
>BackwardsIncompatibleChanges page).  The translation document is great


Actually I am not aware of any backward-incompatible changes. I think it
should be possible to just run your project after switching to the i18n
branch without any _required_ changes.

>3. I don't like that the translation context can be specified in the
>GET string (i.e. http://www.example.com/?django_language=en).  I hate
...
>whatever, but I'd rather that be relegated to a simple view (which
>sets the language and then redirects back to the previous page) than
>have it be an aspect of the middleware as it is now.

Done. I added a view that can easily be hooked into the project and
that will redirect to an explicitly given URL, or to the URL in the
referrer header, or to / (if neither is given). This view stores the
language in either the cookie or the session and then just redirects.
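
The fallback order for the redirect target could be sketched roughly
like this. It is plain Python and not the actual view from the branch;
the function name pick_redirect_target and its arguments are made up
for illustration only.

```python
def pick_redirect_target(explicit_url=None, referrer=None):
    """Return the URL to redirect to after the language has been stored.

    Hypothetical helper: prefers an explicitly given URL, then the
    referrer header, then the site root.
    """
    return explicit_url or referrer or "/"


print(pick_redirect_target("/about/", "/news/"))  # explicit URL wins
print(pick_redirect_target(None, "/news/"))       # falls back to the referrer
print(pick_redirect_target())                     # falls back to "/"
```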

>Similarly,
>exposing LANGUAGES in DjangoContext seems like overkill; a template
>tag seems like the best idea::

Done. Without the "load i18n", though - it just went into
defaulttags.py. I added a unit test for it, too.

>4. I don't like that you have to explicitly provide the verbose_name
>for fields if you want the field names translated; we specifically
>removed verbose_name as a required attribute because it was a PITA,
>so if there's a way around doing::
>
>     name = meta.CharField(_('Name'))

Sorry, no can do. At least there is no really clean way to do it - the
reason is that I need the _() hooks for the xgettext tool. It searches
explicitly for function calls with a given name and string contents - so
I can't just use the stuff that's in the assignment to the left of the =
operator.

A change like this would require writing our own xgettext replacement -
but then, what about all those other modules that happen to use the same
syntax but don't require a translation in that place?

If people want translated modules, they need to provide translation
hooks. The only way out would be to do model introspection in addition
to the xgettext run - but that's problematic, too, because you can only
introspect models that are loaded. If I want to create translations, for
example, I might not have an installed and _running_ django, I might
just have a checkout of the subversion tree. So introspection is out,
too.

The main problem here is that I need to be able to pull out all
translation hooks by just inspecting the source code. And so I need
explicit and unambiguous markers.
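
To illustrate why explicit markers matter, here is a deliberately
simplified stand-in for the kind of scan xgettext performs. It is only
a sketch: the real tool parses source far more carefully, and the names
MARKER and extract_msgids are invented for this example.

```python
import re

# Find calls to _() whose single argument is a plain string literal.
# This works on source text alone; nothing has to be imported or running.
MARKER = re.compile(r"""\b_\(\s*(['"])(.*?)\1\s*\)""")


def extract_msgids(source):
    """Return the string literals wrapped in _() calls, in source order."""
    return [match.group(2) for match in MARKER.finditer(source)]


SOURCE = '''
name = meta.CharField(_('Name'))
city = meta.CharField(maxlength=50)
'''

# Only the explicitly marked field yields a translatable string;
# the unmarked "city" field is invisible to the scan.
print(extract_msgids(SOURCE))
```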

>5. Similarly, I don't like the changes to the template code.

Yeah, that part is the biggest problem with translations. The main
problem here is: translation of source code is easy. Translation of
templates is hard. That's because templates are "turned around" with
regard to the code/text ratio. More text, less (hopefully) code.

>     <title>{{ _('Title') }}</title>

Again, this is needed to mark the part that needs to be translated. If
I don't mark that part, but just use HTML, I would need the string
puller to know about HTML. But what if the user writes XHTML templates?
Or what if this is a YAML template? Or a simple CSV format?

>And *I* barely understand what::
>
>     <p>{% i18n ngettext('There is %(count)d file', 'There are %
>(count)d files', files|count) %}</p>

Yes, that one usually puts people off when they first work with
translations. The problem here: string translation is easy.
Pluralization is a real bastard.

The main reason why pluralization needs this rather clumsy format with
two _full_ translations (and not just partial translations) is that
languages handle pluralization differently. Some languages have only
two forms: plural and singular. But even those languages might behave
differently with regard to zero: what if there are _no_ objects?
Is zero a singular, or is it a plural? Depends on your language.

But then there are languages that have a different concept. I call it
"troll counting" after the nice description by Terry Pratchett of how
trolls count: one, two, three, many. :-)

There are many languages around that not only have a singular and
plural, but have special forms for one element, two elements, more than
two elements. So you need to provide the sentence and the count.
gettext provides the singular and plural form because gettext takes the
stand that the source language must be English (or at least a language
whose pluralization is identical to that of English). So you provide
the sentence in singular and plural form and provide the count.
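
As a small sketch of that calling convention, this uses Python's
standard gettext module with no catalog loaded, so it falls back to the
English rule (singular for exactly one, plural otherwise). The helper
name files_message is made up for the example.

```python
import gettext

# NullTranslations has no catalog, so ngettext() returns the first
# string when the count is 1 and the second string otherwise.
translations = gettext.NullTranslations()


def files_message(count):
    """Pick the right full sentence, then interpolate the count into it."""
    template = translations.ngettext(
        "There is %(count)d file",
        "There are %(count)d files",
        count,
    )
    return template % {"count": count}


for n in (0, 1, 2):
    print(files_message(n))
```

With a real catalog installed, the same call would return whichever of
the translator's forms matches the count.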

The translators will provide the forms that are needed by their
language and will provide a tag in the .po file that tells the system
to do the different pluralization. And if you think that's a rather
weird case, have a look at the "sr" language file in
conf/locale/sr/LC_MESSAGES/django.po - it's using pluralization. It
provides the "Plural-Forms" header for that purpose, which defines the
rules for when to pull which plural form. I don't even know whether the
Python gettext library supports that fully, as it looks rather complex
...
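
To give an idea of what such a rule does, here is a hand-written Python
rendering of the three-form rule commonly used for several Slavic
languages. This is an assumption for illustration and is not copied
from the "sr" file mentioned above; the function name is made up as
well.

```python
def slavic_plural_index(n):
    """Return which of three translated forms to use for the count n.

    Assumed rule, written out by hand from the usual C-style expression:
    n%10==1 && n%100!=11 ? 0
      : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
    """
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


for n in (1, 2, 5, 11, 21, 22):
    print(n, slavic_plural_index(n))
```

Note how 21 uses the same form as 1 while 11 does not, which is exactly
the kind of thing a simple singular/plural switch cannot express.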

>is doing, and "leaking" %-style string formatting into the template
>code seems ugly.  Specifically, I'd like the template language to be
>as loosely coupled to Python as possible; I'd like implementations of
>the template language on other platforms to be possible.

We need to pull text flow and named parameters together. Another
problem of translations is: languages order words differently. Ever
heard an Asian guy speak English in a funny Yoda style? It's because
the order of words and concepts is different. It's even very different
between English and German - and those languages are quite close to
each other.

So you need to put parameters into the translation strings in a named
way. That's needed so that translators can reorder the whole sentence.
You can't just construct a translation out of translated parts - that's
one of the biggest problems, you always _must_ take semantic blocks and
translate them in one go. So the only option is for some kind of string
interpolation to take place. Whether the syntax should be Python's
string interpolation or some faked django template interpolation is
irrelevant for execution. But it's relevant for editors, because there
are editors that support the formats. That's why the .po files carry
funny comments like "#, python-format" - that's a tag to tell the
system how to handle those. Strings that have similar text, but
different placeholders, will be handled automatically by some tools.
The gettext tools themselves do that - they produce "fuzzy"
translations when there is a string that is - after removing all
placeholders - similar to some other string. It will just pull the
translation from that other string in those cases.
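
A tiny sketch of why the placeholders have to be named: the translator
can move them around freely and the same parameter dictionary still
fits. The German sentence is just an example written for this sketch,
not a string from any catalog.

```python
params = {"user": "Georg", "count": 3}

english = "%(user)s uploaded %(count)d files"
# Example translation that moves the count to the front of the sentence.
german = "%(count)d Dateien wurden von %(user)s hochgeladen"

# The calling code does not change, whatever order the translator chose.
print(english % params)
print(german % params)
```

With positional placeholders like %s and %d the reordered sentence
would receive its arguments in the wrong order.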

So changing this format might break some nice effects for translators
and maybe make their work harder - it would be a deviation from the
gettext formats, and I am especially reluctant to do _that_ (yes,
gettext has explicit and official support for Python string
interpolation syntax).

>     <title>{% translate "Title" %}</title>

That's already possible by {% i18n _('Title') %} - the {{ _('Title') }}
is just a shortcut for that, because it became a bit tedious to add all
those i18n tag invocations :-)

And since there are other tags that can have string constants, I
thought it would be best to allow the i18n translated string syntax
everywhere where we have strings. At least everywhere where it's easy
to add, like with resolve_variable and resolve_variable_with_filters
(the tag itself would still need to be aware of possible translated
strings - that's a problem that stems from the fact that django
doesn't have a central tag-bit-parser).

>     <p>{% translate %}Hello, {{ name }}, welcome to {{ site }}!{%
>endtranslate %}</p>
>     <p>{% translate %}There are {{ count }} {% pluralize count
>"file" "files" %}{% endtranslate %}</p>

I could do that - I already have a template-transformer that turns the
templates into something grokkable by xgettext. But as I said, the main
problem here would be the deviation from the gettext syntax. And there
would be another problem: the part inside the translate block (or
some i18n block in my case) would have to be collected first _without_
rendering the interpolation elements - because we can only translate
strings with placeholders.
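
Roughly, collecting such a block would mean turning the unrendered
variables into placeholders first. The following is only a sketch under
that assumption, handling plain {{ name }} variables and nothing else;
block_to_msgid is a made-up name and not part of the branch.

```python
import re

# Matches simple template variables such as {{ name }} (no filters).
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def block_to_msgid(block_text):
    """Turn unrendered block text into a %-style translatable string."""
    # Escape literal percent signs so they survive later interpolation.
    escaped = block_text.replace("%", "%%")
    return VARIABLE.sub(r"%(\1)s", escaped)


msgid = block_to_msgid("Hello, {{ name }}, welcome to {{ site }}!")
print(msgid)

# After translation, the (translated) string is interpolated as usual.
print(msgid % {"name": "Georg", "site": "example.com"})
```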

And what should happen if the block tag contains other block tags?
Like this:

{% translate %}
This is a sentence {% if doit %}with something weird{% else %}without
it{% endif %}
{% endtranslate %}

How should I handle this? I would have to pull _all_ text within a
translate block tag and pull it together - uninterpreted! - as a string
and store it in the .po files for translation. And what if that inner
block tag is another translate tag?

And a complete non-option would be the pluralize thingy - in the
light of different pluralizations it _can't_ possibly work. Actually I
really dislike the pluralize tag ;-)

We would still need something like:

{% translate %}
this is the singular case
{% plural %}
this is the plural case
{% endtranslate %}

But then there are still the problems of what to do with the inner
block and possible block tags that are in there. The behaviour of the
translate tag would be weirdly different from other tags: it wouldn't
run its inner nodelists through the template engine, but would first
pull them together into a string, run that string through the
translation engine and then reparse the resulting string and run
_that_ through the template engine. Doesn't sound like an afternoon
project ;-)

>Thoughts?

Yup. Loads of them. ;-)

bye, Georg
