Source: libexttextcat
Version: 3.2.0-2

The encoding of the Polish language model is broken. For example, line 25 of pl.lm reads:

³        1649

which should be:

ł        1649

You can recover the encoding by filtering the file through:

iconv -f UTF-8 -t Windows-1252 | iconv -f ISO-8859-2 -t UTF-8
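
To illustrate why this pipeline works, here is a minimal Python sketch of the same round trip. It assumes the file's bytes were originally ISO-8859-2 but were decoded as Windows-1252 before being saved as UTF-8. The function name recover is made up for this example, and the sketch only handles characters that Windows-1252 can represent.

```python
def recover(text: str) -> str:
    """Undo the mis-decoding (assumed: ISO-8859-2 bytes read as Windows-1252)."""
    # Step 1: turn the wrong characters back into the original raw bytes.
    raw = text.encode("cp1252")
    # Step 2: decode those bytes with the encoding they were really in.
    return raw.decode("iso8859_2")


# Line 25 of pl.lm as it currently appears (character, tab, count).
broken_line = "\u00b3\t1649"
fixed_line = recover(broken_line)
print(fixed_line)
```

Byte 0xB3 is "³" in Windows-1252 and "ł" in ISO-8859-2, which is why the two characters get swapped.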

However, I wonder whether the language models should instead be rebuilt automatically from the ShortTexts/*.txt files. (The encoding of pl.txt appears to be correct.)

--
Jakub Wilk

