Subject: mueller7accent-dict: Phonetic Transcription to display correct UTF8 for mueller dict packages
Package: mueller7accent-dict
Version: 2002.02.27-3.2
Severity: normal
Tags: patch

This could also be considered an extension of bug #92351, and is probably just a wishlist item.

The phonetic transcriptions of some words do not display correctly, e.g.

  dog [d█g] 1. _n. 1: собака, пёс; Greater (lesser) Dog ...

should display:

  dog [dɔg] 1. _n. 1: собака, пёс; Greater (lesser) Dog ...

In fact, some transcriptions are even deleted by the to-dict.sh script, e.g.

  bread 1. _n. 1: хлеб; _перен. кусок хлеба ...

whereas keeping the phonetic transcription (simply by removing the --no-trans step from debian/rules) would display:

  bread [bred] 1. _n. 1: хлеб; _перен. кусок хлеба ...

The attached patch applies against the Debian-patched source. It adds a Perl script, debian/scripts/upgrade_trans.pl, that fixes the phonetic transcriptions by replacing the incorrect UTF-8 byte sequences with their (hopefully) correct IPA UTF-8 counterparts.

This is more of a completeness issue, so that transcriptions are actually displayed correctly. I am not sure whether keeping the transcriptions will break any required dict format. The build will also require Perl; the script should work with any Perl from version 5.6 up.

Hopefully someone finds this useful.

Thanks,
Chris Donoghue
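For the dog example, the character that should be ɔ comes out of recode as U+2588 FULL BLOCK, and upgrade_trans.pl swaps its UTF-8 bytes (E2 96 88) for those of U+0254 (C9 94). The sketch below is only an illustration, not part of the patch; it assumes Perl's core Encode module (Perl 5.8 or later) and simply prints both byte sequences.

```perl
#!/usr/bin/perl
# Illustration only: compare the UTF-8 bytes of the character shown in the
# broken output with those of the intended IPA symbol.
use strict;
use warnings;
use Encode qw(encode);

# Return the UTF-8 encoding of a character string as space-separated hex bytes.
sub utf8_hex {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, encode('UTF-8', $s);
}

my $shown    = "\x{2588}";   # FULL BLOCK, as displayed in the broken entry
my $intended = "\x{0254}";   # LATIN SMALL LETTER OPEN O (IPA)

printf "shown:    U+%04X -> %s\n", ord($shown),    utf8_hex($shown);
printf "intended: U+%04X -> %s\n", ord($intended), utf8_hex($intended);
# prints:
# shown:    U+2588 -> E2 96 88
# intended: U+0254 -> C9 94
```

Each `$chf`/`$cht` pair in pronmod is one such byte-sequence swap.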
diff -Naur mueller.orig/debian/rules mueller/debian/rules
--- mueller.orig/debian/rules	2005-03-18 20:32:37.000000000 +1100
+++ mueller/debian/rules	2005-03-22 12:19:50.000000000 +1100
@@ -21,17 +21,22 @@
 
 	# patch does not set executable flag
 	chmod a+x debian/scripts/to-dict.sh
+	chmod a+x debian/scripts/upgrade_trans.pl
 
-	debian/scripts/to-dict.sh --no-trans Mueller7accentGPL.koi mueller7accent.notr
-	debian/scripts/to-dict.sh --src-data mueller7accent.notr mueller7accent.data
+	# Keep the phonetic transcription. Most transcriptions survived --no-trans anyway, so keep them all. to-dict.sh upgrades the transcription to correct UTF-8 with a Perl script.
+	# debian/scripts/to-dict.sh --no-trans Mueller7accentGPL.koi mueller7accent.notr
+	# debian/scripts/to-dict.sh --src-data mueller7accent.notr mueller7accent.data
+	debian/scripts/to-dict.sh --src-data Mueller7accentGPL.koi mueller7accent.data
 	debian/scripts/to-dict.sh --data-dict mueller7accent.data mueller7accent
 	-rm -f mueller7.data mueller7.notr
 	debian/scripts/to-dict.sh --expand-index mueller7accent.index mueller7accent.index.exp
 	sort -k 1,1 mueller7accent.index.exp > mueller7accent.index
 	-rm -f mueller7accent.index.exp
 
-	debian/scripts/to-dict.sh --no-trans Mueller7GPL.koi mueller7.notr
-	debian/scripts/to-dict.sh --src-data mueller7.notr mueller7.data
+	# Keep the phonetic transcription. Most transcriptions survived --no-trans anyway, so keep them all. to-dict.sh upgrades the transcription to correct UTF-8 with a Perl script.
+	# debian/scripts/to-dict.sh --no-trans Mueller7GPL.koi mueller7.notr
+	# debian/scripts/to-dict.sh --src-data mueller7.notr mueller7.data
+	debian/scripts/to-dict.sh --src-data Mueller7GPL.koi mueller7.data
 	debian/scripts/to-dict.sh --data-dict mueller7.data mueller7
 	-rm -f mueller7.data mueller7.notr
 	debian/scripts/to-dict.sh --expand-index mueller7.index mueller7.index.exp
diff -Naur mueller.orig/debian/scripts/to-dict.sh mueller/debian/scripts/to-dict.sh
--- mueller.orig/debian/scripts/to-dict.sh	2005-03-18 20:32:37.000000000 +1100
+++ mueller/debian/scripts/to-dict.sh	2005-03-22 12:19:50.000000000 +1100
@@ -13,6 +13,9 @@
 DICTFMT=`which dictfmt`
 DICTZIP=`which dictzip`
 
+# Perl script that upgrades the phonetic transcription to IPA UTF-8
+UPGTRANS=`dirname $0`/upgrade_trans.pl
+
 INFO () {
 	echo "
 to-dict, version $version ($versiondate).
@@ -166,6 +169,7 @@
 # -s "$TITLE" $3 < $2 || exit 1
 
 	recode -f KOI8-RU..UTF-8 < $2 |\
+	LC_ALL=C $UPGTRANS |\
 	LOCPATH=locale dictfmt -p --allchars --locale ru_RU.utf-8\
 	-u "http://www.chat.ru/~mueller_dic" -s "$TITLE" $3
 
diff -Naur mueller.orig/debian/scripts/upgrade_trans.pl mueller/debian/scripts/upgrade_trans.pl
--- mueller.orig/debian/scripts/upgrade_trans.pl	1970-01-01 10:00:00.000000000 +1000
+++ mueller/debian/scripts/upgrade_trans.pl	2005-03-22 14:31:39.000000000 +1100
@@ -0,0 +1,34 @@
+#!/usr/bin/perl
+
+while(<STDIN>)
+{
+	$linemod=$_;
+	$linemod=~s/\[(.*?)\]/&pronmod($&)/eg;	# fix each [...] transcription
+	print $linemod;
+
+}
+
+sub pronmod
+{
+	$phword=$_[0];
+	$chf=chr(0x51); $cht=chr(0xc3).chr(0xa6); $phword=~s/$chf/$cht/g;	# Q -> æ (U+00E6)
+	$chf=chr(0x41); $cht=chr(0xc9).chr(0x91); $phword=~s/$chf/$cht/g;	# A -> ɑ (U+0251)
+	$chf=chr(0xd0).chr(0xab); $cht=chr(0xcb).chr(0x90); $phword=~s/$chf/$cht/g;	# Ы (U+042B) -> ː length mark (U+02D0)
+	$chf=chr(0xe2).chr(0x95).chr(0x9a); $cht=chr(0xc9).chr(0x99); $phword=~s/$chf/$cht/g;	# ╚ (U+255A) -> ə (U+0259)
+	$chf=chr(0x45); $cht=chr(0xc9).chr(0x9b); $phword=~s/$chf/$cht/g;	# E -> ɛ (U+025B)
+	$chf=chr(0xe2).chr(0x96).chr(0x88); $cht=chr(0xc9).chr(0x94); $phword=~s/$chf/$cht/g;	# █ (U+2588) -> ɔ (U+0254)
+	$chf=chr(0xd1).chr(0x86); $cht=chr(0xca).chr(0x8c); $phword=~s/$chf/$cht/g;	# ц (U+0446) -> ʌ (U+028C)
+	$chf=chr(0x49); $cht=chr(0x69); $phword=~s/$chf/$cht/g;	# I -> i
+	$chf=chr(0x69); $cht=chr(0x69); $phword=~s/$chf/$cht/g;	# i -> i (no change)
+	$chf=chr(0xd1).chr(0x85); $cht=chr(0xcb).chr(0x88); $phword=~s/$chf/$cht/g;	# х (U+0445) -> ˈ primary stress (U+02C8)
+	$chf=chr(0xd0).chr(0xb3); $cht=chr(0xcb).chr(0x8c); $phword=~s/$chf/$cht/g;	# г (U+0433) -> ˌ secondary stress (U+02CC)
+	$chf=chr(0x5a); $cht=chr(0xca).chr(0x92); $phword=~s/$chf/$cht/g;	# Z -> ʒ (U+0292)
+	$chf=chr(0x4e); $cht=chr(0xc5).chr(0x8b); $phword=~s/$chf/$cht/g;	# N -> ŋ (U+014B)
+	$chf=chr(0x53); $cht=chr(0xca).chr(0x83); $phword=~s/$chf/$cht/g;	# S -> ʃ (U+0283)
+	$chf=chr(0x44); $cht=chr(0xc3).chr(0xb0); $phword=~s/$chf/$cht/g;	# D -> ð (U+00F0)
+	$chf=chr(0x54); $cht=chr(0xce).chr(0xb8); $phword=~s/$chf/$cht/g;	# T -> θ (U+03B8)
+
+	return $phword;
+}
+
+
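The sixteen near-identical substitution lines in pronmod could also be written as one lookup table driven by a single alternation regex. The sketch below is a hypothetical variant, not the submitted script: `%ipa_for` and `fix_brackets` are illustrative names, it works on raw UTF-8 bytes like the original (no `use utf8`), and it drops the i -> i no-op. For these particular mappings a single left-to-right pass should give the same result as the original's sequential substitutions, since no replacement's bytes form part of another key.

```perl
#!/usr/bin/perl
# Hypothetical table-driven variant of upgrade_trans.pl (not the submitted script).
# Works on raw UTF-8 bytes, like the original, so there is no "use utf8".
use strict;
use warnings;

# Byte sequence as it arrives from recode => intended IPA symbol (UTF-8 bytes).
my %ipa_for = (
    'Q'            => "\xc3\xa6",   # æ  U+00E6
    'A'            => "\xc9\x91",   # ɑ  U+0251
    "\xd0\xab"     => "\xcb\x90",   # Ы -> ː length mark U+02D0
    "\xe2\x95\x9a" => "\xc9\x99",   # ╚ -> ə  U+0259
    'E'            => "\xc9\x9b",   # ɛ  U+025B
    "\xe2\x96\x88" => "\xc9\x94",   # █ -> ɔ  U+0254
    "\xd1\x86"     => "\xca\x8c",   # ц -> ʌ  U+028C
    'I'            => 'i',
    "\xd1\x85"     => "\xcb\x88",   # х -> ˈ primary stress U+02C8
    "\xd0\xb3"     => "\xcb\x8c",   # г -> ˌ secondary stress U+02CC
    'Z'            => "\xca\x92",   # ʒ  U+0292
    'N'            => "\xc5\x8b",   # ŋ  U+014B
    'S'            => "\xca\x83",   # ʃ  U+0283
    'D'            => "\xc3\xb0",   # ð  U+00F0
    'T'            => "\xce\xb8",   # θ  U+03B8
);

# One alternation over all keys. No key is a prefix of another,
# so sorting only keeps the pattern deterministic.
my $alt = join '|', map { quotemeta } sort keys %ipa_for;
my $re  = qr/($alt)/;

# Rewrite the contents of every [...] span; text outside brackets is untouched.
sub fix_brackets {
    my ($line) = @_;
    $line =~ s{(\[.*?\])}{
        (my $span = $1) =~ s/$re/$ipa_for{$1}/g;
        $span;
    }ge;
    return $line;
}

# Example: the dog entry from the report.
print fix_brackets("dog [d\xe2\x96\x88g] 1. _n.\n");   # prints "dog [dɔg] 1. _n." on a UTF-8 terminal
```

Used as a filter in place of the original, it would only need the same input loop, e.g. `print fix_brackets($_) while <STDIN>;`.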