I'm building a 'document centric' internationalised site which, to my mind, Cocoon can't 'quite' do yet. Cocoon's i18n functionality works well on 'webapps', where you have snippets of text to be translated, but not when the content is whole pages.

The most complex part of this is identifying the most appropriate language content, given the combination of the user's desired locales/languages, and the available translations.

The site will cater for a locale provided as a request parameter, as one of the acceptable locales configured within the browser, or as a site default.

When a page is requested, it will first look for a page in the preferred locale (the request parameter, if provided). If that is not found, it will look for a page using each of the other locales in turn. If none are found, the default page is used.

So, say we have three locales to try: pt, es, en. We have resources:
content/pl/foo.xml
content/es/foo.xml
content/en/foo.xml
When the user requests foo.html, Cocoon will look to see if content/pt/foo.xml exists. It doesn't, so it will look for content/es/foo.xml. That it finds, so that is what it uses as the source for the pipeline.


Similarly, this system would be able to handle a file structure such as:
content/foo_pl.xml
content/foo_es.xml
content/foo_en.xml
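
The lookup described above can be sketched in a few lines of plain Java. This is only a rough illustration, not Cocoon code: the class name LocaleFallback, the method firstExisting, the {locale} placeholder handling and the "exists" predicate (standing in for a real source-resolver check) are all assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical helper (not part of Cocoon): finds the first existing localised source. */
final class LocaleFallback {

    /**
     * Tries each locale in order, substituting it for "{locale}" in the
     * pattern, and returns the first candidate the predicate accepts.
     */
    static Optional<String> firstExisting(String pattern,
                                          List<String> locales,
                                          Predicate<String> exists) {
        for (String locale : locales) {
            String candidate = pattern.replace("{locale}", locale);
            if (exists.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        // Nothing matched: the caller would fall back to the default page.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The resources from the example: pl, es and en exist, pt does not.
        Set<String> resources = Set.of(
                "content/pl/foo.xml", "content/es/foo.xml", "content/en/foo.xml");
        List<String> wanted = List.of("pt", "es", "en");
        System.out.println(
                firstExisting("content/{locale}/foo.xml", wanted, resources::contains));
    }
}
```

With the locales pt, es, en this picks content/es/foo.xml. Passing the pattern content/foo_{locale}.xml instead handles the second file layout without any other change.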

Now, working out how to handle this functionality within a Cocoon component really isn't that easy. To achieve it, the component needs to take a configurable path, e.g. content/{locale}/{1}.xml, and needs to be told what to use for finding the locale (request param, accept-language header, default locale for site). Once it has made a decision, it might also want to make its choice of locale available to other components (e.g. the i18nTransformer) so that they can localise any other bits of text on the page, e.g. navigation.
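
To make the "what to use for finding the locale" part concrete, here is a rough sketch of building the ordered list of locales to try. It is plain Java and does not touch any Cocoon API; the class and method names are made up for illustration. It assumes the request parameter wins, then the Accept-Language entries in order of preference, then the site default, and it simplifies each browser entry to its primary language (so pt-BR becomes pt), which may or may not be what a real component should do.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of Cocoon): orders the locales to try. */
final class LocaleCandidates {

    static List<String> candidates(String requestParam,
                                   String acceptLanguage,
                                   String siteDefault) {
        // LinkedHashSet keeps first-seen order and drops duplicates.
        Set<String> ordered = new LinkedHashSet<>();

        // 1. An explicit request parameter takes priority.
        if (requestParam != null && !requestParam.isBlank()) {
            ordered.add(requestParam.trim().toLowerCase(Locale.ROOT));
        }

        // 2. Browser preferences; the JDK parser sorts them by q-value.
        if (acceptLanguage != null && !acceptLanguage.isBlank()) {
            try {
                for (Locale.LanguageRange range : Locale.LanguageRange.parse(acceptLanguage)) {
                    String tag = range.getRange();
                    if (tag.startsWith("*")) {
                        continue; // a wildcard names no particular language
                    }
                    int dash = tag.indexOf('-');
                    // Assumption: content is organised by primary language only.
                    ordered.add(dash < 0 ? tag : tag.substring(0, dash));
                }
            } catch (IllegalArgumentException malformedHeader) {
                // A malformed header is ignored rather than failing the request.
            }
        }

        // 3. The site default is always the last resort.
        ordered.add(siteDefault);
        return new ArrayList<>(ordered);
    }
}
```

For instance, candidates("pt", "es;q=0.8,en;q=0.5", "en") gives the pt, es, en ordering used in the example above.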

I have mulled over whether an input module, a generator or maybe an action would do the job. In fact, I think it is a job for an I18n matcher.

Introducing the I18NMatcher
---------------------------
Here's a sample sitemap snippet:

<map:match pattern="**.html">
 <map:match type="i18n" src="content/*/{1}.xml">
   <map:generate src="{source}"/>
   <map:transform src="foo.xsl"/>
   <map:transform type="i18n">
     <map:parameter name="locale" value="{locale}"/>
   </map:transform>
   <map:serialize type="html"/>
 </map:match>
</map:match>

Once an ordinary wildcard matcher has done its job, in comes the i18n matcher. Its job is to see whether it can find a suitable source document for the requested page. The * is used to symbolise the place where the locale is to be placed. If a match is successful, it will make sitemap variables available for the source that was found, and the locale that matched.
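
As a very rough sketch of that behaviour (plain Java, deliberately not implementing Cocoon's real Matcher interface), the core of such a matcher might look like the following. The names I18nMatchSketch and match are invented here, the pattern is assumed to contain a single * with {1} already expanded by the sitemap, and the "exists" predicate stands in for a proper source lookup. Returning null for "no match" is meant to mirror how sitemap matchers signal failure.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of the i18n matcher's core logic (no Cocoon dependency). */
final class I18nMatchSketch {

    /**
     * Returns the sitemap variables "source" and "locale" for the first
     * locale whose substituted pattern exists, or null if none does.
     */
    static Map<String, String> match(String pattern,
                                     List<String> locales,
                                     Predicate<String> exists) {
        for (String locale : locales) {
            // Assumption: the pattern holds exactly one "*" marking the locale.
            String source = pattern.replace("*", locale);
            if (exists.test(source)) {
                Map<String, String> vars = new LinkedHashMap<>();
                vars.put("source", source); // used as {source} in map:generate
                vars.put("locale", locale); // passed on as {locale} to the i18n transformer
                return vars;
            }
        }
        return null; // no match: the enclosed pipeline is skipped
    }
}
```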

Now, this seems to be quite in keeping with the Cocoon sitemap model, and gives some rather nice, flexible functionality.

What do you all think?

When I've finished implementing this, I'll go on to extend the CLI to be able to work effectively with this kind of i18n site, enabling it to crawl a site for each of a range of locales. But that's for another time.
Regards, Upayavira




