Subject: mueller7accent-dict: Phonetic Transcription to display correct UTF8 
for mueller dict packages
Package: mueller7accent-dict
Version: 2002.02.27-3.2
Severity: normal
Tags: patch

*** Please type your report below this line ***
This could also be considered an extension of bug #92351 and is probably
just a wishlist item.

The phonetic transcription of some words does not display correctly.

e.g.

  dog
     [d█g]
     1. _n.
        1: собака, пёс; Greater (lesser) Dog ..

Should display:

  dog
     [dɔg]
     1. _n.
        1: собака, пёс; Greater (lesser) Dog ..
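
To make the difference concrete, here is a small sketch that decodes the
two byte sequences which the upgrade_trans.pl script in the patch treats as
the wrong and the corrected form of this vowel. It is written in Python
purely for illustration (the patch itself uses Perl), and the variable names
are my own.

```python
# Byte sequences copied from the upgrade_trans.pl mapping in the patch.
wrong = bytes([0xE2, 0x96, 0x88])  # sequence the script searches for
right = bytes([0xC9, 0x94])        # sequence the script writes instead

for label, raw in (("before", wrong), ("after", right)):
    ch = raw.decode("utf-8")
    # Show the code point and the transcription of "dog" using it.
    print(f"{label}: U+{ord(ch):04X} [d{ch}g]")
```

The first sequence decodes to U+2588 (a block-drawing character), the second
to U+0254 (the IPA open o).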

In fact, some transcriptions are even deleted in the to-dict.sh script.

e.g.

  bread
     1. _n.
        1: хлеб; _перен. кусок хлеба ....

Whereas keeping the phonetic transcription would display (by simply
removing the --no-trans section from the debian/rules file)

  bread
     [bred]
     1. _n.
        1: хлеб; _перен. кусок хлеба ....

Included is a patch against the Debian-patched source.

Also added is a Perl script that fixes the phonetic transcriptions by
replacing the incorrect UTF-8 byte sequences with their (hopefully)
correct IPA UTF-8 counterparts.
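
For readers who would rather see the mapping as characters than as raw
bytes, here is a rough sketch of the same idea. It is in Python for
illustration only and is not part of the patch. The names IPA_MAP and
upgrade_transcriptions are made up, and unlike the Perl script it works on
decoded text rather than on bytes.

```python
import re

# Mapping decoded from the chr() byte pairs in upgrade_trans.pl.
# Keys are what appears in the recoded dictionary, values are IPA symbols.
IPA_MAP = {
    "Q": "\u00e6",       # ae ligature
    "A": "\u0251",       # script a
    "\u042b": "\u02d0",  # Cyrillic yeru -> length mark
    "\u255a": "\u0259",  # box-drawing corner -> schwa
    "E": "\u025b",       # open e
    "\u2588": "\u0254",  # full block -> open o
    "\u0446": "\u028c",  # Cyrillic tse -> turned v
    "I": "i",
    "\u0445": "\u02c8",  # Cyrillic ha -> primary stress
    "\u0433": "\u02cc",  # Cyrillic ghe -> secondary stress
    "Z": "\u0292",       # ezh
    "N": "\u014b",       # eng
    "S": "\u0283",       # esh
    "D": "\u00f0",       # eth
    "T": "\u03b8",       # theta
}
_TABLE = str.maketrans(IPA_MAP)


def upgrade_transcriptions(line: str) -> str:
    """Rewrite characters only inside [...] spans, as the Perl script does."""
    return re.sub(r"\[.*?\]", lambda m: m.group(0).translate(_TABLE), line)


print(upgrade_transcriptions("dog [d\u2588g] 1. _n."))
```

Text outside the brackets is left alone, but any bracketed text that is not
a transcription would be rewritten too; the Perl script has the same
limitation.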

This is more a "completeness" issue so that transcriptions
are in fact displayed correctly.

Not sure if keeping the transcriptions will break any required dict
format.  Also there will be the additional requirement of needing perl
to build successfully. It should work for any perl version 5.6 and up.

Hopefully someone might find this useful.


Thanks


Chris Donoghue

diff -Naur mueller.orig/debian/rules mueller/debian/rules
--- mueller.orig/debian/rules   2005-03-18 20:32:37.000000000 +1100
+++ mueller/debian/rules        2005-03-22 12:19:50.000000000 +1100
@@ -21,17 +21,22 @@
 
        # patch does not set executable flag
        chmod a+x debian/scripts/to-dict.sh
+       chmod a+x debian/scripts/upgrade_trans.pl
 
-       debian/scripts/to-dict.sh --no-trans Mueller7accentGPL.koi mueller7accent.notr
-       debian/scripts/to-dict.sh --src-data mueller7accent.notr mueller7accent.data
+       # Keep the phonetic transcription. Most stayed anyway, so let's just keep them all. The phonetic transcription is upgraded to correct UTF8 encoding in to-dict.sh using a perl program
+       # debian/scripts/to-dict.sh --no-trans Mueller7accentGPL.koi mueller7accent.notr
+       # debian/scripts/to-dict.sh --src-data mueller7accent.notr mueller7accent.data
+       debian/scripts/to-dict.sh --src-data Mueller7accentGPL.koi mueller7accent.data
        debian/scripts/to-dict.sh --data-dict mueller7accent.data mueller7accent
        -rm -f mueller7.data mueller7.notr
        debian/scripts/to-dict.sh --expand-index mueller7accent.index mueller7accent.index.exp
        sort -k 1,1 mueller7accent.index.exp > mueller7accent.index     
        -rm -f mueller7accent.index.exp
 
-       debian/scripts/to-dict.sh --no-trans Mueller7GPL.koi mueller7.notr
-       debian/scripts/to-dict.sh --src-data mueller7.notr mueller7.data
+       # Keep the phonetic transcription. Most stayed anyway, so let's just keep them all. The phonetic transcription is upgraded to correct UTF8 encoding in to-dict.sh using a perl program
+       # debian/scripts/to-dict.sh --no-trans Mueller7GPL.koi mueller7.notr
+       # debian/scripts/to-dict.sh --src-data mueller7.notr mueller7.data
+       debian/scripts/to-dict.sh --src-data Mueller7GPL.koi mueller7.data
        debian/scripts/to-dict.sh --data-dict mueller7.data mueller7
        -rm -f mueller7.data mueller7.notr
        debian/scripts/to-dict.sh --expand-index mueller7.index mueller7.index.exp
diff -Naur mueller.orig/debian/scripts/to-dict.sh mueller/debian/scripts/to-dict.sh
--- mueller.orig/debian/scripts/to-dict.sh      2005-03-18 20:32:37.000000000 +1100
+++ mueller/debian/scripts/to-dict.sh   2005-03-22 12:19:50.000000000 +1100
@@ -13,6 +13,9 @@
 DICTFMT=`which dictfmt`
 DICTZIP=`which dictzip`
 
+# and upgrade phonetics transcription perl script
+UPGTRANS=`dirname $0`/upgrade_trans.pl
+
 INFO () {
   echo "
 to-dict, version $version ($versiondate).
@@ -166,6 +169,7 @@
 #          -s "$TITLE" $3 < $2 || exit 1
 
        recode -f KOI8-RU..UTF-8 < $2 |\
+        LC_ALL=C $UPGTRANS |\
         LOCPATH=locale dictfmt -p  --allchars --locale ru_RU.utf-8\
          -u "http://www.chat.ru/~mueller_dic"  -s "$TITLE" $3
 
diff -Naur mueller.orig/debian/scripts/upgrade_trans.pl mueller/debian/scripts/upgrade_trans.pl
--- mueller.orig/debian/scripts/upgrade_trans.pl        1970-01-01 10:00:00.000000000 +1000
+++ mueller/debian/scripts/upgrade_trans.pl     2005-03-22 14:31:39.000000000 +1100
@@ -0,0 +1,34 @@
+#!/usr/bin/perl 
+
+while(<STDIN>)
+{
+  $linemod=$_;
+  $linemod=~s/\[(.*?)\]/&pronmod($&)/eg;
+  print $linemod;
+
+}
+
+sub pronmod
+{
+    $phword=$_[0]; $word=$phword;
+    $chf=chr(0x51); $cht=chr(0xc3).chr(0xa6); $phword=~s/$chf/$cht/g;
+    $chf=chr(0x41); $cht=chr(0xc9).chr(0x91); $phword=~s/$chf/$cht/g;
+    $chf=chr(0xd0).chr(0xab); $cht=chr(0xcb).chr(0x90); $phword=~s/$chf/$cht/g;
+    $chf=chr(0xe2).chr(0x95).chr(0x9a); $cht=chr(0xc9).chr(0x99); $phword=~s/$chf/$cht/g;
+    $chf=chr(0x45); $cht=chr(0xc9).chr(0x9b); $phword=~s/$chf/$cht/g;
+    $chf=chr(0xe2).chr(0x96).chr(0x88); $cht=chr(0xc9).chr(0x94); $phword=~s/$chf/$cht/g;
+    $chf=chr(0xd1).chr(0x86); $cht=chr(0xca).chr(0x8c); $phword=~s/$chf/$cht/g;
+    $chf=chr(0x49); $cht=chr(0x69); $phword=~s/$chf/$cht/g;
+    $chf=chr(0x69); $cht=chr(0x69); $phword=~s/$chf/$cht/g;
+    $chf=chr(0xd1).chr(0x85); $cht=chr(0xcb).chr(0x88); $phword=~s/$chf/$cht/g;
+    $chf=chr(0xd0).chr(0xb3); $cht=chr(0xcb).chr(0x8c); $phword=~s/$chf/$cht/g;
+    $chf=chr(0x5a); $cht=chr(0xca).chr(0x92); $phword=~s/$chf/$cht/g;
+    $chf=chr(0x4e); $cht=chr(0xc5).chr(0x8b); $phword=~s/$chf/$cht/g;
+    $chf=chr(0x53); $cht=chr(0xca).chr(0x83); $phword=~s/$chf/$cht/g;
+    $chf=chr(0x44); $cht=chr(0xc3).chr(0xb0); $phword=~s/$chf/$cht/g;
+    $chf=chr(0x54); $cht=chr(0xce).chr(0xb8); $phword=~s/$chf/$cht/g;
+
+    return $phword;
+}
+
+
