From: Jens Seidel <[EMAIL PROTECTED]>

> On Thu, Nov 09, 2006 at 03:34:15AM +0100, Danai SAE-HAN wrote:
> > Okay, I checked out the SGML-generated .tex file, and it looks very
> > much as if Perl or something else misinterprets some of the CJK
> > characters.
> > 
> > The command I used to create the .tex file was:
> > debiandoc2latex -l zh_TW.Big5 reference.zh-tw.sgml
> > 
> > Examples of these misinterpreted characters:
> > 
> > in titledoc.sgml: 程 -> 琵{
> > 
> > in preface.sgml:  開 -> 跚}
> >                   閱 -> 閱textbackslash{}
> >                   誤 -> 蓋textasciitilde{}
> 
> Misinterprets? I remember that this was intentional! Extract from
> the fixlatex script, which is applied after the SGML/Perl magic:
> 
> ("zh_TW")
>         perl -p \
> -e 's/([\x80-\xff])\\textbackslash\{\}/$1\\/g;' \
> -e 's/([\x80-\xff])\\textasciitilde\{\}/$1\~/g;' \
> -e 's/([\x80-\xff])\\textasciicircum\{\}/$1\^/g;' \
> -e 's/([\x80-\xff])\\\}/$1\}/g;' \
> -e 's/([\x80-\xff])\\\{/$1\{/g;' \
> -e 's/([\x80-\xff])\\\_/$1_/g;' <$1 |
>         bg5conv >$2
> 
> That's the only code I'm aware of which substitutes characters. I have
> to confess that the code is cryptic and I do not fully understand it.
> But it was provided by a Chinese person and introduced to support zh_TW
> documents, and building did not work without it in the past.
> The funny thing is that I also looked into the bg5conv code and noticed
> that it also does very similar tasks (only a few substitutions). I once
> tried to merge both so that the substitutions could be called together on
> pieces of text inside the LaTeX processor (and not afterwards, for speed
> and simplicity), but ...

The code works, and the output of bg5conv is then saved to
reference.zh-tw.tex-in.  Run latex on this file (not bg5latex) and it
should work.
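For anyone who, like Jens, finds the perl one-liners cryptic: each rule looks for a byte in 0x80-0xff (the first byte of a Big5 character) followed by a LaTeX escape such as \textbackslash{} or \{, and puts the raw ASCII byte back. Here is a minimal Python sketch of the same substitution chain. The names FIXLATEX_RULES and fix_big5_escapes are made up for illustration; it is not part of debiandoc-sgml.

```python
import re

# Hypothetical byte-level port of the zh_TW perl -p chain in fixlatex.
# Each rule: a high byte (Big5 lead byte) followed by a LaTeX escape
# gets the escape replaced by the raw ASCII byte again.
FIXLATEX_RULES = [
    (rb"([\x80-\xff])\\textbackslash\{\}", rb"\1\\"),
    (rb"([\x80-\xff])\\textasciitilde\{\}", rb"\1~"),
    (rb"([\x80-\xff])\\textasciicircum\{\}", rb"\1^"),
    (rb"([\x80-\xff])\\\}", rb"\1}"),
    (rb"([\x80-\xff])\\\{", rb"\1{"),
    (rb"([\x80-\xff])\\_", rb"\1_"),
]


def fix_big5_escapes(data: bytes) -> bytes:
    """Undo LaTeX escaping of Big5 trail bytes, applying the rules in order."""
    for pattern, replacement in FIXLATEX_RULES:
        data = re.sub(pattern, replacement, data)
    return data


# 閱 is Big5 0xBE 0x5C; after escaping, the 0x5C became \textbackslash{}.
escaped = b"\xbe\\textbackslash{}"
print(fix_big5_escapes(escaped) == b"\xbe\\")  # True
```

Because the rules only look at the preceding byte, a genuinely escaped \{ that happens to follow a Big5 character whose second byte is 0x80 or above would also be unescaped. The heuristic relies on that case being rare in the reference text.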


> > And perhaps a few other characters.  I find it strange that not all
> > characters suffer from this problem, but only some.  Perhaps Perl
> > doesn't have a complete map of the Big5 encoding?
> > 
> > A quick fix would perhaps be to put everything in a sed script.
> 
> I would prefer the opposite: deleting the above substitutions. Can you
> please check these? Are they still necessary?

They're necessary.  This weekend I'll experiment with UTF-8.
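The reason only some characters are affected is that the second (trail) byte of a Big5 character may fall in the ASCII range 0x40-0x7E, which includes \ { } ~ ^ _.  Only characters whose trail byte is one of those get mangled by the LaTeX escaping.  Below is a small Python check using the standard "big5" codec (colliding_chars is a hypothetical helper name).  For the characters from the examples above, it should flag 程, 開, 閱 and 誤, but not 中 or 文.

```python
from __future__ import annotations

# ASCII bytes that end up escaped in the LaTeX output (\textbackslash{}, \{, \}, ...).
LATEX_SPECIALS = set(b"\\{}~^_")


def colliding_chars(text: str) -> list[tuple[str, str]]:
    """Return (char, Big5 hex) for each char whose Big5 trail byte is a LaTeX special."""
    hits = []
    for ch in text:
        try:
            encoded = ch.encode("big5")
        except UnicodeEncodeError:
            continue  # character has no Big5 mapping
        # A double-byte Big5 char whose second byte is an ASCII special collides.
        if len(encoded) == 2 and encoded[1] in LATEX_SPECIALS:
            hits.append((ch, encoded.hex()))
    return hits


for ch, code in colliding_chars("程開閱誤中文"):
    print(ch, code)
```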


> BTW: Are you Chinese? Or do you also have to guess what's right and
> what's not?

I'm Chinese, but I only recently learned Mandarin Chinese.  I can read
and write some Chinese, but not proficiently enough to be able to
proofread the output.


> What about the other languages? Japanese doesn't substitute strings but
> failed with the same error that [EMAIL PROTECTED] was not defined ...

$ debiandoc2latex -l ja reference.ja.sgml

The result won't run with "latex" because the HBF fonts are missing.
For Japanese, there's a much better way now to get nice fonts: the use
of DNP Type1 fonts.  You will need a slight change in  though:


--- LaTeX~      2005-02-13 17:04:27.000000000 +0100
+++ LaTeX       2006-11-10 02:03:16.130041713 +0100
@@ -11,7 +11,7 @@
           'copyright notice' => '���ɽ��',
           'pdfhyperref' => 'CJKbookmarks',
           'before begin document' => '\\usepackage{CJK}',
-          'after begin document' => '\\begin{CJK}{JIS}{song}
+          'after begin document' => '\\begin{CJK}[dnp]{JIS}{min}
 \\renewcommand{\\vpageref}[1]{on page \\pageref{#1}}
 \\def\\prefacename{�
 \\def\\refname{ʸ ��}


Now "latex" will run fine (provided latex-cjk-japanese and
latex-cjk-japanese-wadalab are installed; if you depend on
latex-cjk-all, then it should be okay).


> > Because of the many \text* commands it's best to keep using "bg5latex"
> > instead of just "latex", which apparently doesn't work.  Fixing the
> > FTBFS is much more important ATM, so let's stick to bg5latex.
> 
> But it worked in the past. Do you still think it is necessary to use
> bg5latex? This would just make more code language-specific.

I'll try to make it build with plain "latex".  Another solution might be
to make a new zh_TW.UTF-8 version and use the CJKutf8 package.  Then we
wouldn't need converters and such.
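The UTF-8 route also removes the escaping problem at the byte level.  Every byte of a multibyte UTF-8 sequence is 0x80 or above, so no byte of a CJK character can be mistaken for \, {, } or ~ by the LaTeX escaper.  In Big5, the second byte can be plain ASCII.  A quick Python comparison follows (multibyte_chars_contain_ascii is a hypothetical name used only for illustration).

```python
def multibyte_chars_contain_ascii(text: str, encoding: str) -> bool:
    """True if some non-ASCII char encodes to bytes that include an ASCII byte (< 0x80)."""
    for ch in text:
        if ord(ch) < 0x80:
            continue  # plain ASCII is expected to stay ASCII
        if any(byte < 0x80 for byte in ch.encode(encoding)):
            return True
    return False


sample = "程開閱誤"
print(multibyte_chars_contain_ascii(sample, "big5"))   # True: 閱 ends in 0x5C ("\")
print(multibyte_chars_contain_ascii(sample, "utf-8"))  # False
```

This only shows that the byte-level collision goes away.  Whether CJKutf8 then renders everything correctly still has to be tested with the actual document.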

Perhaps doing the same for zh_TW, zh_CN and ja_JP would also move the
HTML output to UTF-8, which would be a good thing IMHO.


Best regards



Danai SAE-HAN
韓達耐

-- 
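Title: "Wang Chongdao Sends Fifty Narcissus Stems"
Author: 黃庭堅 (Huang Tingjian, 1045-1105)

The wave-treading fairy's stockings raise fine dust; light and graceful, she walks the water under a faint moon.
Who summoned this heartbroken soul and planted it as a winter flower to carry its utter sorrow?
Fragrant and pure, beautiful enough to topple a city: the mountain alum is her younger brother, the plum her elder.
Sitting before her, I am truly vexed by the flowers; I step out the door with a laugh, and the great river stretches across.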
