We definitely need a flag though, off by default, so that the current
(no tone number) behavior continues.

Many people will want to make romanizations for street name
signs, etc., where tone numbers are not appropriate, and just doing
s/\d//g on the output would also zap any innocent numbers that were
in the input.
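
To make the point concrete, here is a minimal sketch in Python. The
romanize function, its tone_numbers parameter, and the two-entry table
are hypothetical, made up only for this illustration; they are not
Text::Unidecode's API.

```python
import re

# Hypothetical two-entry table, just enough for the example below.
SYLLABLES = {"諷": ("feng", "4"), "刺": ("ci", "4")}

def romanize(text, tone_numbers=False):
    """Romanize known characters; pass everything else through untouched."""
    out = []
    for ch in text:
        if ch in SYLLABLES:
            base, tone = SYLLABLES[ch]
            out.append(base + (tone if tone_numbers else "") + " ")
        else:
            out.append(ch)
    return "".join(out)

with_tones = romanize("諷刺 250", tone_numbers=True)
print(with_tones)                     # tone numbers kept
print(re.sub(r"\d", "", with_tones))  # stripping digits also eats the 250
print(romanize("諷刺 250"))            # flag off: no tone numbers, 250 survives
```

With the flag, the caller never has to guess which digits were tone
numbers and which came from the input.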

While we are here, let's take a look at some Bopomofo etc. issues.
$ perl -ln011we 'print if /諷/' /usr/share/libchewing3/chewing/dict.bat|
perl -C -Mutf8 -MText::Unidecode -wnle 'print "$_\n",unidecode($_),"\n";'

   反諷 49 ㄈㄢˇ ㄈㄥ
   Fan Feng  49 FANV FENG

Better would be
   fan3 feng4  49 fan3 feng1
but Bopomofo has no mark for tone 1, so
   fan feng  49 fan3 feng
V for tone 3 is too weird, too.

   諷 152 ㄈㄥˋ
   Feng  152 FENG\

   feng4  152 feng4

\ etc. are neat, but 4 is more scientific.
(P.S. this char is actually feng3 in most dictionaries, so actually a
bad example for this bug report.)
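
A minimal sketch of the digit idea, in Python, assuming the tone marks
are the usual spacing modifier characters U+02CA, U+02C7 and U+02CB.
The function name is made up for this example, and only the three
marks that show up in this report are covered.

```python
# Bopomofo tone marks seen in this report, mapped to tone numbers.
# Tone 1 is unmarked in Bopomofo, so nothing is appended for it.
TONE_DIGITS = {"\u02ca": "2", "\u02c7": "3", "\u02cb": "4"}  # ˊ ˇ ˋ

def tone_marks_to_digits(text):
    """Replace each Bopomofo tone mark with its tone number."""
    return "".join(TONE_DIGITS.get(ch, ch) for ch in text)

print(tone_marks_to_digits("FENGˋ GUˇ SHUO JIN"))
```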

   諷語 0 ㄈㄥˋ ㄩˇ
   Feng Yu  0 FENG\ IUV

IU: Please use standard Hanyu pinyin... hmmm, interesting case, as we don't
want to keep track of its surroundings... still: yu best. Wait: u best.

   諷古說今 0 ㄈㄥˋ ㄍㄨˇ ㄕㄨㄛ ㄐㄧㄣ
   Feng Gu Shuo Jin  0 FENG\ GUV SHUO JIEN

JIEN: jin best. n is the best match for ㄣ.

   諷刺 250 ㄈㄥˋ ㄘˋ
   Feng Ci  250 FENG\ C\

hmmm... seems c is indeed best...

   諷刺畫 1 ㄈㄥˋ ㄘˋ ㄏㄨㄚˋ
   Feng Ci Hua  1 FENG\ C\ HUA\

   諷刺性 6 ㄈㄥˋ ㄘˋ ㄒㄧㄥˋ
   Feng Ci Xing  6 FENG\ C\ XIENG\

X I NG: xing

   諷刺詩 3 ㄈㄥˋ ㄘˋ ㄕ
   Feng Ci Shi  3 FENG\ C\ SH

   諷誦 1 ㄈㄥˋ ㄙㄨㄥˋ
   Feng Song  1 FENG\ SUENG\

Ah, very interesting... S U NG: sung is best, but when combined it is
really song. But we don't want to know this much about surrounding
characters; one could write a post filter if one wanted to fix them up.
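
Such a post filter could look roughly like this Python sketch. The
function name is hypothetical, and it handles only the ung/ong case
from this example. It would also rewrite ordinary lowercase words
ending in "ung" that happen to be in the input, so treat it as an
illustration only.

```python
import re

def fix_ung(romanized):
    """Rewrite a syllable-final 'ung' after a letter as 'ong' (sung4 -> song4)."""
    # (?![a-z]) stops the match when more letters follow, so a trailing
    # tone digit, a space, or the end of the string all count as syllable end.
    return re.sub(r"([a-z])ung(?![a-z])", r"\1ong", romanized)

print(fix_ung("feng4 sung4"))
```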

   冷嘲熱諷 17 ㄌㄥˇ ㄔㄠˊ ㄖㄜˋ ㄈㄥ
   Leng Chao Re Feng  17 LENGV CHAU/ RE\ FENG

AU: ao

   借古諷今 2 ㄐㄧㄝˋ ㄍㄨˇ ㄈㄥˋ ㄐㄧㄣ
   Jie Gu Feng Jin  2 JIEH\ GUV FENG\ JIEN

Jie, no H.

Wait, just use all the *single* mappings in
http://www.cnpedia.com/pages/knowledge/baserule.htm
Never mind the strings of more than one Bopomofo symbol.
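
As a rough Python sketch of what single mappings mean in practice: the
table below is partial and hand-made from the values used or proposed
in this message (it is not copied from the page above), and the names
are hypothetical. Syllables like ㄈㄥ, where the wanted "feng" needs a
vowel that no symbol spells out, are left out on purpose.

```python
# Partial table: only symbols whose romanization this message uses or proposes.
SINGLE = {
    "ㄐ": "j", "ㄒ": "x", "ㄙ": "s", "ㄔ": "ch", "ㄘ": "c",
    "ㄧ": "i", "ㄨ": "u", "ㄩ": "u",
    "ㄣ": "n", "ㄥ": "ng", "ㄠ": "ao", "ㄝ": "e",
}

def map_symbols(bopomofo):
    """Map each Bopomofo symbol on its own; unknown characters pass through."""
    return "".join(SINGLE.get(ch, ch) for ch in bopomofo)

for syllable in ("ㄐㄧㄣ", "ㄒㄧㄥ", "ㄙㄨㄥ", "ㄔㄠ", "ㄐㄧㄝ"):
    print(syllable, map_symbols(syllable))
```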

