Thanks, fixed upstream now as http://cgit.freedesktop.org/libreoffice/libexttextcat/commit/?id=07ad459ca83bddde8dcfad5535b3260386d222ff
> I wonder if the language models shouldn't be somehow automatically rebuilt from the ShortTexts/*.txt files. (Encoding of pl.txt appears to be correct.)

Not all of the .LM files can be generated from the short texts; e.g. some of the very similar languages need a bit of extra training and tweaking to distinguish them from each other. The short texts are used in the test suite.

FWIW, nearly all the new language models that *I* added are derived from those ShortTexts (see the README), but lots of the older ones, including the Polish one, came from libtextcat originally and were trained on some unknown data. Generally I've assumed that the preexisting ones are of better quality than one trained on the text of the UDHR, and left them alone.

C.
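For context on what "generating an .LM from a short text" involves: libtextcat follows the Cavnar and Trenkle n-gram approach, where a language model is a ranked list of the most frequent character n-grams in a training text, and a document is assigned to the model with the smallest "out-of-place" rank distance. The sketch below illustrates that idea only. The helper names (`ngram_profile`, `out_of_place`, `classify`) and the parameters (n-grams of length 1 to 5, 400 kept, a missing-n-gram penalty of 400, `_` as word padding) are assumptions for illustration and are not taken from libexttextcat's source. The sketch also does not reproduce the manual training and tweaking mentioned above.

```python
from __future__ import annotations

import re
from collections import Counter

MAX_N = 5           # assumed maximum n-gram length (illustrative, not libexttextcat's value)
PROFILE_SIZE = 400  # assumed number of ranked n-grams kept per profile (illustrative)


def ngram_profile(text: str, max_n: int = MAX_N, size: int = PROFILE_SIZE) -> list[str]:
    """Return the `size` most frequent character n-grams of `text`, most frequent first.

    Hypothetical helper. Each word is padded with '_' so n-grams capture word boundaries.
    """
    counts = Counter()
    for word in re.findall(r"[^\W\d_]+", text.lower()):  # runs of Unicode letters
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending count, then alphabetical, so ties are ordered deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:size]]


def out_of_place(doc: list[str], model: list[str], penalty: int = PROFILE_SIZE) -> int:
    """Cavnar-Trenkle style out-of-place distance between two ranked profiles.

    An n-gram present in both profiles costs the difference of its ranks;
    an n-gram missing from the model costs the fixed (assumed) `penalty`.
    """
    rank = {gram: r for r, gram in enumerate(model)}
    return sum(abs(r - rank[g]) if g in rank else penalty for r, g in enumerate(doc))


def classify(text: str, models: dict[str, list[str]]) -> str:
    """Return the name of the model whose profile is closest to that of `text`."""
    doc = ngram_profile(text)
    return min(models, key=lambda name: out_of_place(doc, models[name]))
```

Under this scheme, two closely related languages yield profiles that share most of their top n-grams, so their distances to a given text end up close together. That is one plausible reason why a model built mechanically from a single short text may not separate them reliably without further tuning.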