Bug#692241: libexttextcat: botched encoding in Polish LM

Jakub Wilk Sat, 03 Nov 2012 16:03:17 -0700

Source: libexttextcat
Version: 3.2.0-2

Encoding of the Polish language model is broken. For example, line 25 of pl.lm has:


³        1649

which should be:

ł        1649

You can recover the encoding by filtering the file through:

iconv -f UTF-8 -t Windows-1252 | iconv -f ISO-8859-2 -t UTF-8
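
The same two-step recovery can be sketched in Python, which makes it easier to see why the pipeline works: the first step turns the wrongly decoded characters back into the original bytes, and the second decodes those bytes as ISO-8859-2. This is only an illustration under the assumption that the damage is exactly "ISO-8859-2 bytes read as Windows-1252"; the function name repair_mojibake is hypothetical and is not part of libexttextcat.

```python
def repair_mojibake(text: str) -> str:
    """Undo ISO-8859-2 bytes that were mis-decoded as Windows-1252.

    Assumption: every character in `text` is representable in
    Windows-1252; otherwise encode() raises UnicodeEncodeError.
    """
    # Step 1: encode back to Windows-1252 to recover the original bytes
    # (mirrors `iconv -f UTF-8 -t Windows-1252`).
    raw = text.encode("cp1252")
    # Step 2: decode the bytes with the encoding they were really in
    # (mirrors `iconv -f ISO-8859-2 -t UTF-8`).
    return raw.decode("iso-8859-2")


# The broken entry quoted above: SUPERSCRIPT THREE is byte 0xB3 in
# Windows-1252, and 0xB3 is "ł" in ISO-8859-2.
broken_line = "³\t1649"
print(repair_mojibake(broken_line))
```

Text that is already correct (for example, one containing a real "ł") cannot be encoded to Windows-1252 and raises UnicodeEncodeError, so a sketch like this should only be applied to a file known to be damaged in this way.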

However, I wonder if the language models shouldn't be somehow automatically rebuilt from the ShortTexts/*.txt files. (Encoding of pl.txt appears to be correct.)


--
Jakub Wilk


