From: Jens Seidel <[EMAIL PROTECTED]>
> On Thu, Nov 09, 2006 at 03:34:15AM +0100, Danai SAE-HAN wrote:
> > Okay, I checked out the SGML-generated .tex file, and it seems very
> > much that Perl or something else misinterprets some of the CJK
> > characters.
> >
> > The command I used to create the .tex file was:
> >   debiandoc2latex -l zh_TW.Big5 reference.zh-tw.sgml
> >
> > Examples of these misinterpreted characters:
> >
> > in titledoc.sgml: 程 -> 琵{
> >
> > in preface.sgml:  開 -> 跚}
> >                   閱 -> 閱textbackslash{}
> >                   誤 -> 蓋textasciitilde{}
>
> Misinterprets? I remember that this was intentional! Extract from the
> fixlatex script, which is applied after the SGML/Perl magic:
>
> ("zh_TW")
>   perl -p \
>     -e 's/([\x80-\xff])\\textbackslash\{\}/$1\\/g;' \
>     -e 's/([\x80-\xff])\\textasciitilde\{\}/$1\~/g;' \
>     -e 's/([\x80-\xff])\\textasciicircum\{\}/$1\^/g;' \
>     -e 's/([\x80-\xff])\\\}/$1\}/g;' \
>     -e 's/([\x80-\xff])\\\{/$1\{/g;' \
>     -e 's/([\x80-\xff])\\\_/$1_/g;' <$1 |
>   bg5conv >$2
>
> That's the only code I'm aware of which substitutes characters. I have
> to confess that the code is cryptic and I do not fully understand it.
> But it was provided by a Chinese person and introduced to support zh_TW
> documents, and building did not work without it in the past.
> The funny thing is that I also looked into the bg5conv code and noticed
> that it also does very similar tasks (only a few substitutions). I once
> tried to merge both so that the substitutions can be called together on
> pieces of text inside the LaTeX processor (and not afterwards, for
> speed-up and simplicity) but ...
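
A minimal sketch in Python of why those six substitutions seem to be needed. It is not taken from debiandoc-sgml: `escape_bytes` is a hypothetical stand-in for the escaping step, which is assumed here to work byte by byte, and `fixlatex` is only a Python rendering of the quoted Perl expressions. The underlying fact is that the second byte of a Big5 character may fall in the ASCII range 0x40-0x7E, so it can coincide with `\`, `^`, `_`, `{`, `}` or `~`.

```python
import re

# Hypothetical stand-in for the LaTeX escaping done before fixlatex runs.
# Assumption: it escapes single bytes and knows nothing about Big5 pairs.
LATEX_ESCAPES = {
    b"\\": b"\\textbackslash{}",
    b"~": b"\\textasciitilde{}",
    b"^": b"\\textasciicircum{}",
    b"{": b"\\{",
    b"}": b"\\}",
    b"_": b"\\_",
}


def escape_bytes(data: bytes) -> bytes:
    """Escape LaTeX special characters one byte at a time, in a single pass."""
    return re.sub(rb"[\\~^{}_]", lambda m: LATEX_ESCAPES[m.group()], data)


# Python rendering of the six substitutions quoted from fixlatex: when an
# escape directly follows a high byte (a possible Big5 lead byte), put the
# raw byte back so the two-byte character is whole again.
FIXES = [
    (rb"([\x80-\xff])\\textbackslash\{\}", b"\\"),
    (rb"([\x80-\xff])\\textasciitilde\{\}", b"~"),
    (rb"([\x80-\xff])\\textasciicircum\{\}", b"^"),
    (rb"([\x80-\xff])\\\}", b"}"),
    (rb"([\x80-\xff])\\\{", b"{"),
    (rb"([\x80-\xff])\\_", b"_"),
]


def fixlatex(data: bytes) -> bytes:
    """Apply the substitutions in the same order as the quoted script."""
    for pattern, raw in FIXES:
        data = re.sub(pattern, lambda m, raw=raw: m.group(1) + raw, data)
    return data


if __name__ == "__main__":
    # The four characters reported as "misinterpreted" in the thread.
    for char in "程開閱誤":
        big5 = char.encode("big5")
        escaped = escape_bytes(big5)
        # Columns: character, Big5 bytes, escaped bytes read back as Big5,
        # and the result after the fixlatex substitutions.
        print(char, big5.hex(), escaped.decode("big5"),
              fixlatex(escaped).decode("big5"))
```

Note that the patterns only test whether the preceding byte is in 0x80-0xff. A Big5 trail byte can be in that range too, so a genuinely escaped character that directly follows such a character would presumably be unescaped as well; the sketch makes no attempt to handle that case.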

The code works, and the output of bg5conv is then saved to
reference.zh-tw.tex-in.  Run latex on this file (not bg5latex) and it
should work.

> > And perhaps a few other characters. I find it strange that not all
> > characters suffer from this problem, but only some. Perhaps Perl
> > doesn't have a complete map of the Big5 encoding?
> >
> > A quick fix would perhaps be to put everything in a sed script.
>
> I would prefer the opposite: deleting the above substitutions. Can you
> please check these? Are they still necessary?

They're necessary.  Over the weekend I'll experiment with UTF-8.

> BTW: Are you Chinese? Or do you also have to guess what's right and
> what not?

I'm Chinese, but I only recently learned Mandarin Chinese.  I can read
and write some Chinese, but not proficiently enough to be able to
proofread the output.

> What about the other languages? Japanese doesn't substitute strings but
> failed with the same error that [EMAIL PROTECTED] was not defined ...

  $ debiandoc2latex -l ja reference.ja.sgml

The result won't run with "latex" because the HBF fonts are missing.
For Japanese, there's a much better way now to get nice fonts: the use
of DNP Type1 fonts.  You will need a slight change in  though:

--- LaTeX~      2005-02-13 17:04:27.000000000 +0100
+++ LaTeX       2006-11-10 02:03:16.130041713 +0100
@@ -11,7 +11,7 @@
        'copyright notice' => '���ɽ��',
        'pdfhyperref' => 'CJKbookmarks',
        'before begin document' => '\\usepackage{CJK}',
-       'after begin document' => '\\begin{CJK}{JIS}{song}
+       'after begin document' => '\\begin{CJK}[dnp]{JIS}{min}
 \\renewcommand{\\vpageref}[1]{on page \\pageref{#1}}
 \\def\\prefacename{�
 \\def\\refname{ʸ ��}

Now "latex" will run fine (provided latex-cjk-japanese and
latex-cjk-japanese-wadalab are installed; if you depend on
latex-cjk-all, then it should be okay).

> > Because of the many \text* commands it's best to keep using "bg5latex"
> > instead of just "latex", which apparently doesn't work.
> > Fixing the FTBFS is much more important ATM, so let's stick to
> > bg5latex.
>
> But it worked in the past. Do you still think it is necessary to use
> bg5latex? This would just make more code language-specific.

I'll try to make it run with "latex".

Another solution might be to make a new zh_TW.UTF-8 version and use the
CJKutf8 package.  Then we don't need to use converters and such.
Perhaps doing so with zh_TW, zh_CN and ja_JP will convert the HTML
output to UTF-8 encoding as well, which would be a good thing IMHO.


Best regards

Danai SAE-HAN
韓達耐

-- 
題目:《王充道送水仙花五十支》
作者:黃庭堅(1045-1105)

凌波仙子生塵襪,水上輕盈步微月。
是誰招此斷腸魂,種作寒花寄愁絕。
含香體素欲傾城,山礬是弟梅是兄。
坐對真成被花惱,出門一笑大江橫。