Hi Nicolas, thanks for your very fast reaction!
On Sat, Aug 23, 2008 at 05:43:13AM +0200, Nicolas François wrote:
> On Thu, Aug 21, 2008 at 04:28:08PM +0200, [EMAIL PROTECTED] wrote:
> > I used po4a to convert the Subversion Book
> > (svn source: http://svn.red-bean.com/svnbook/trunk/) from DocBook XML
> > into PO.
>
> Some remarks:
>  * Did you try using a config file? This might ease the process of
>    generating the translations, updating the POT, and updating the POs.
>    Simply calling "po4a --previous po4a/svnbook.cfg" could be sufficient.
>    (example attached)

No, I didn't. I just opened the manpages of the corresponding po4a
subcommands (such as po4a-updatepo), whose names I got from an existing
project (Debian's Release Notes). I didn't spend much time reading the
general po4a manpage :-)

I also tried to reuse as much of the existing build system as possible.
Since it specified the sources in a Makefile variable containing
book/*.xml, I tried not to duplicate the information about where to find
the sources. It sounds strange, but I really needed many hours to get the
current patch against svnbook's build system right: avoiding recursively
defined Makefile variables while still allowing users to override some
parts, ... (and I thought I knew GNU make very well :-)).

Nevertheless, I will consider using a configuration file once it
simplifies things (currently it would only help to reduce the command
output).

>  * Did you try the includeexternal option? (This could also simplify the
>    command lines: call the po4a commands only once, with only the
>    book/book.xml file.)

No, because I read somewhere that it is currently not (well?) supported.
Locale::Po4a::Xml(3pm) says at the end:

  "Support for entities and included files is in the TODO list."

Locale::Po4a::Docbook(3pm):

  "Please note that this module is still under heavy development, and not
  distributed in official po4a release since we don’t feel it to be
  mature enough. If you insist on trying, check the CVS out."

Is this still true?
  "The only known issue is that it doesn’t handle entities yet, and this
  includes the file inclusion entities, but you can translate most of
  those files alone (except the typical entities files), and it’s usually
  better to maintain them separated."

I will try it out (I'm not very motivated at the moment because of the
idiom "Never touch a running system", but if po4a supports it, it should
indeed be used and tested). There is also a single included file
(version.xml), which is generated automatically (it contains only 2
lines) and on which po4a failed, so I excluded it. Let's see ...

> Note: I'm not sure it's working correctly (at least you have to be in
> the correct directory if you want po4a to find the files). Just tell me
> if you think this option would help you (and should be improved).

So it needs to be tested! I will do it (but maybe I will wait for my long
holidays in 3 weeks).

> The ideas/bugs I noted are:
>  * speed improvement (at least I need to understand why it takes so long
>    on my laptop (15 minutes))

I think at least 80% of the time is spent in msgmerge. Try keeping a
"top" session open. To keep users from blaming po4a, I suggest printing
"starting msgmerge (may take some time ...)" or something like this.

As I wrote on the mailing list: "I read in the past (on
debian-i18n at lists.debian.org) a posting from Clytie Siddall where she
wrote about a patch to speed it up; I don't know its current status."

A closer look at msgmerge's performance would probably be useful (but my
TODO list is already much too long :-).

>  * <option> needs to be inline (see the other bug; fixed in CVS)
>  * fix for the nodefault option (fixed in CVS)

Thanks a lot! So I think po4a is now fully capable of handling DocBook
XML. Great!
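To check the 80% guess and to show the kind of message suggested above, a build script could wrap each slow step in a small timer. This is only a sketch: run_with_notice is a hypothetical helper, not something po4a provides.

```python
import subprocess
import sys
import time


def run_with_notice(cmd, label):
    """Run cmd after printing a notice, so a slow step is not blamed on po4a.

    Hypothetical helper for a build script. Returns (returncode, seconds).
    """
    print(f"starting {label} (may take some time ...)", file=sys.stderr, flush=True)
    start = time.monotonic()
    result = subprocess.run(cmd)  # inherits stdout/stderr; no exception on failure
    elapsed = time.monotonic() - start
    print(f"{label} finished in {elapsed:.1f}s", file=sys.stderr)
    return result.returncode, elapsed
```

For example, run_with_notice(["msgmerge", "--previous", "-U", "po4a/de.po", "po4a/svnbook.pot"], "msgmerge") would time the merge step on its own. The PO/POT paths are assumptions, and --previous requires a reasonably recent GNU gettext.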
>  * support for untranslated inline tags (tags not translated if they are
>    found alone in a paragraph, but translated "inline" if they are part of
>    a bigger translated paragraph)

I think this is already possible using the "untranslated" option together
with a (possibly long) list of nested tags such as <bbb><aaa>
(Locale::Po4a::Xml(3pm)). Fully automatic support would nevertheless
simplify it.

> > po4a-gettextize failed multiple times because of the following text:
> >
> >   <figure id="svn.intro.architecture.dia-1">
> >     <title>Subversion's architecture</title>
> >     <graphic fileref="images/ch01dia1.png"/>
> >   </figure>
> >
> > The error I got:
> >
> > po4a gettextization: Structure disparity between original and translated
> > files:
> > msgid (at ../en/book/ch00-preface.xml:902) is of type 'Content of:
> > <preface><sect1><sect2><title>' while
> > msgstr (at book/ch00-preface.xml:1773 book/ch00-preface.xml:1788) is of type
> > 'Content of: <preface><sect1><sect2><figure><title>'.
> > Original text: Subversion's Architecture
> > Translated text: Die Architektur von Subversion
>
> This is "normal".
>
> The problem is that the English version contains:
>   <title>Subversion's Architecture</title>

That's the heading of the section.

>   ...
>   <figure id="svn.intro.architecture.dia-1">
>     <title>Subversion's architecture</title>

That's the caption of the figure.

> (note: two different strings: one with 'A', the other one with 'a' at
> the beginning of "architecture")
>
> The German version has the same string for both:
>   <title>Die Architektur von Subversion</title>

Right. It's nevertheless a valid case, and the translation is correct.

>   <figure id="svn.intro.architecture.dia-1">
>     <title>Die Architektur von Subversion</title>
>
> During the "gettextization", I put the English strings in front of the
> German strings, but the English version has one more string than the
> German.

Ah, I understand.
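A simplified model of the failure described above, not po4a's actual code: if identical translated strings collapse into one PO entry (the two line numbers reported for the msgstr suggest this happens), the German side ends up one entry short, and positional pairing shifts every later pair. The Msg type, the "kind" labels and the paragraph strings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    text: str
    kind: str  # simplified stand-in for po4a's "Content of: <...>" type


def merge_identical(msgs):
    """Collapse identical strings into one entry, as a PO catalog does.

    Assumption of this model: the merged entry keeps its first occurrence's type.
    """
    merged = {}
    for m in msgs:
        merged.setdefault(m.text, m.kind)
    return [Msg(text, kind) for text, kind in merged.items()]


def first_disparity(original, translated):
    """Pair both catalogs by position and return the first type mismatch."""
    for orig, trans in zip(merge_identical(original), merge_identical(translated)):
        if orig.kind != trans.kind:
            return orig, trans
    return None


english = [
    Msg("Subversion's Architecture", "sect2/title"),
    Msg("Subversion's architecture", "figure/title"),
    Msg("An English paragraph.", "para"),  # made-up filler
]
german = [
    Msg("Die Architektur von Subversion", "sect2/title"),
    Msg("Die Architektur von Subversion", "figure/title"),
    Msg("Ein deutscher Absatz.", "para"),  # made-up filler
]

print(first_disparity(english, german))
```

In this model the headings still pair up, but the figure caption is then matched against the German paragraph. Making the two German strings differ restores a one-to-one pairing.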
But I'm really surprised that the files I worked on were affected only in
figure tags (not outside) and only a few times. Maybe you can improve the
diagnostics in this case? I assumed that the two files were not in sync,
but I really failed to see a difference.

PS: What would happen if the English strings were equal but the
translations were not? Would this result in #-#-#-# PO conflicts as
described below?

> I will check if I can fix it by having a less strict check on the type of
> string when a string appears multiple times (if there is a mismatch, it
> should be noticed later), or by storing the type of each occurrence.

Good. It isn't very important, but I simply missed the fact that
identical translations of different English strings could cause trouble.
Now I think I found it in the po4a(7) documentation, in the section
"HOWTO convert a pre-existing translation to po4a?":

  "Sometimes, you get the strong feeling that po4a ate some parts of the
  text, ... So, when the same paragraph appears twice in the original but
  are not translated in the exact same way each time, you will get the
  feeling ..."

It seems I didn't read it carefully enough. But again: considering how
flawlessly the whole transformation worked, I expected a bug :-)

> The goal of the "gettextization" is to speed up the retrieval of existing
> translations; it does not need to be perfect, and there is no need to
> lose 5 minutes just to win one string.

Yep, it does not need to be perfect; that's why I used "Severity: minor"
for this bug. Feel free to close it.

> I'm also sometimes adding something like FIXME1, FIXME2 at the end of
> sentences to force them to be different while still being identifiable in
> the PO file (or just to add missing paragraphs to the translated text).

I will see whether Osamu needs help converting his new Debian Reference
to PO format (there was recently a translation request by a new Korean
translator). Maybe I can apply some of your hints there as well.
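The FIXME1/FIXME2 trick could be scripted before running po4a-gettextize. This is a sketch of one possible way, assuming the translated strings are already available as a list; mark_duplicates is a hypothetical helper, not a po4a feature.

```python
from collections import Counter


def mark_duplicates(strings):
    """Append FIXME1, FIXME2, ... to every occurrence of a repeated string.

    Hypothetical helper: repeated translations become distinct while the
    numbered marker keeps them easy to find (and remove) in the PO file.
    """
    counts = Counter(strings)
    seen = Counter()
    out = []
    for s in strings:
        if counts[s] > 1:
            seen[s] += 1
            out.append(f"{s} FIXME{seen[s]}")
        else:
            out.append(s)
    return out
```

Numbering restarts for each distinct string, which is enough to make the strings differ; marking every occurrence, not just the later ones, makes all of them easy to grep for when cleaning up the PO file.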
If nothing else, converting the Debian Reference would be one more test
for po4a.

Thanks again!
Jens