On 2019/02/16 16:58, Raphael Graf wrote:
> On Fri, Feb 15, 2019 at 08:51:32PM +0000, Stuart Henderson wrote:
> > On 2019/02/15 19:28, Raphael Graf wrote:
> > > There are lots of changes since 3.04.00:
> > > https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes
> > >
> > > I have tested on amd64 and macppc, the result look very good.
> > > The following dependent ports still compile:
> > > graphics/pdfsandwich
> > > mail/p5-FuzzyOcr
> > > multimedia/ogmrip
> > > x11/gnome/ocrfeeder
> > > (at least pdfsandwich and ocrfeeder seem to work)
> > >
> > > The tessdata packages use the language data files from the 'tessdata_fast'
> > > repository: https://github.com/tesseract-ocr/tessdata_fast
> > > From the README:
> > > "Most users will want to use these traineddata files to do OCR and these will
> > > be shipped as part of Linux distributions .."
> > >
> > > Unfortunately, the git-submodules required for running the tests are not
> > > included in the distfile, so NO_TEST is set to Yes.
> > >
> > > I am unsure if the PLIST-* files need a @conflict line, can anyone tell?
> > >
> > > The DESCR-* text could be improved, but I find the 1995 sentence kind of
> > > funny.
> > >
> > > Any comments?
> >
> > From a read-through:
> >
> > > +# The tests require additional git submodules
> > > +NO_TEST= Yes
> >
> > It should be possible to fetch those as supplemental distfiles -
> > it's useful to have tests where possible ..
>
> I think it is not possible without self-hosting the distfiles.
> The problem is that some of the submodules are not versioned, like this one:
> https://github.com/tesseract-ocr/test

That's the easy bit ;)

DISTFILES=	${DISTNAME}${EXTRACT_SUFX} \
		tesseract-test-6dd816c{6dd816cdaf3e76153271daf773e562e24c928bf5}.tar.gz:0 \
		google-test-bf07131{bf07131c1d0a4e001daeee8936089f8b438b7f30}.tar.gz:1 \
		abseil-e21380d{e821380d69a549dc64900693942789d21aa4df5e}.tar.gz:2
MASTER_SITES0=	https://github.com/tesseract-ocr/test/archive/
MASTER_SITES1=	https://github.com/google/googletest/archive/
MASTER_SITES2=	https://github.com/abseil/abseil-cpp/archive/

post-extract:
.for i in test googletest abseil
	rmdir ${WRKSRC}/$i; mv ${WRKDIR}/$i-* ${WRKSRC}/$i
.endfor

We can also get it to unpack the tessdata_fast files (that it also
needs) from the tessdata port:

TEST_DEPENDS=	graphics/tesseract/tessdata:patch

And link them to the directory where it says it wants them:

pre-test:
	ln -s ${WRKDIR}/graphics/tesseract/tessdata/tessdata_fast-4.0.0 ${WRKDIR}/tessdata_fast

But then it starts getting silly: some tests want tessdata,
tessdata_best, and especially langdata_lstm, which is a 1.2GB set of
wordlists, training text, etc.

Something doesn't go quite right, as it is supposed to look for test
files in $top_srcdir/test/testing and data in
$top_srcdir/../langdata_lstm and $top_srcdir/../tessdata, but for some
reason it doesn't set the dir correctly and actually looks in /:
/test, /tessdata_data, etc. I haven't figured out what's going on there,
but by bodging things with symlinks in / I can at least get tests to
run.

There are failures in a couple of tests, but given the above, there's
obviously something wrong with how I've got the tests set up which might
account for those problems. And the majority of them are successful.

So at this point I think I'm OK to forget adding the tests for now.
Could you revise the comment though please? e.g. something like

# tests require 1GB+ extra files and some fiddling to get them to run
NO_TEST=	Yes

Otherwise OK sthen@ for the update.