Hi Jeffrey,

first of all, thanks for the quick reply! I appreciate you taking the time, and even more so that you did it within only a few hours of my report.

Jeffrey Ratcliffe wrote:
> 2009/3/2 Thomas Viehmann <t...@beamnet.de>:
>> However, providing the data only in the preprocessed form is (bordering
>> on the) problematic w.r.t. DFSG and inconvenient to the users. Please
>> use the source upstream data for building the packages.
>
> How are the language packs problematic with respect to DFSG? Why are
> they inconvenient to the users?
>
> The training data is mostly not available, and even if it were, I
> don't see the advantage of building the language packs from it.

Well, on the contrary, it seems that the training data is available in the boxtiff files[1].

The DFSG ask to ship source for your packages. The training data is arguably compiled from source, in particular when we see that both the compiler is provided and the source appears to be available.

It is inconvenient when one wants to produce an enhanced dataset by training on more characters while incorporating previous data. (Mind you, the hard limit on the number of .tr files for the training programs - MAX_NUM_CONFIGS - also is a bit tedious, but this is a different matter.) Maybe there is another way, but looking at the training instructions[2], I was thinking "well, I should just use my .tif and .box alongside the available ones".

I've put the severity as important (as opposed to serious) because the source requirement is not as clear cut with statistical data, but here I ended up in a situation of "the source would come in handy here and I'm using free software to not have to look for it" rather than actively systematically looking into DFSG compliance. Also, it seems to be prudent to not deliberately stay in the more shady areas of source requirement when the source data sets are even available.

But then, what do I know. You are a Debian maintainer and I am not, so you are the undisputed authority on the DFSG in this bug report.

Thanks for your consideration.

Kind regards

T.

1. http://code.google.com/p/tesseract-ocr/downloads/list
2. http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract

-- 
Thomas Viehmann, http://thomas.viehmann.net/