On 2019/02/16 16:58, Raphael Graf wrote:
> On Fri, Feb 15, 2019 at 08:51:32PM +0000, Stuart Henderson wrote:
> > On 2019/02/15 19:28, Raphael Graf wrote:
> > > There are lots of changes since 3.04.00:
> > > https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes
> > > 
> > > I have tested on amd64 and macppc, the results look very good.
> > > The following dependent ports still compile:
> > > graphics/pdfsandwich
> > > mail/p5-FuzzyOcr
> > > multimedia/ogmrip
> > > x11/gnome/ocrfeeder
> > > (at least pdfsandwich and ocrfeeder seem to work)
> > > 
> > > The tessdata packages use the language data files from the 'tessdata_fast'
> > > repository: https://github.com/tesseract-ocr/tessdata_fast
> > > From the README:
> > > "Most users will want to use these traineddata files to do OCR and these
> > > will be shipped as part of Linux distributions .."
> > > 
> > > Unfortunately, the git-submodules required for running the tests are not
> > > included in the distfile, so NO_TEST is set to Yes.
> > > 
> > > I am unsure if the PLIST-* files need a @conflict line, can anyone tell?
> > > 
> > > The DESCR-* text could be improved, but I find the 1995 sentence kind of
> > > funny.
> > > 
> > > Any comments?
> > 
> > From a read-through:
> > 
> > > +# The tests require additional git submodules
> > > +NO_TEST= Yes
> > 
> > It should be possible to fetch those as supplemental distfiles -
> > it's useful to have tests where possible ..
> 
> I think it is not possible without self-hosting the distfiles.
> The problem is that some of the submodules are not versioned, like this one:
> https://github.com/tesseract-ocr/test

That's the easy bit ;)

DISTFILES=      ${DISTNAME}${EXTRACT_SUFX} \
                tesseract-test-6dd816c{6dd816cdaf3e76153271daf773e562e24c928bf5}.tar.gz:0 \
                google-test-bf07131{bf07131c1d0a4e001daeee8936089f8b438b7f30}.tar.gz:1 \
                abseil-e821380{e821380d69a549dc64900693942789d21aa4df5e}.tar.gz:2
MASTER_SITES0=  https://github.com/tesseract-ocr/test/archive/
MASTER_SITES1=  https://github.com/google/googletest/archive/
MASTER_SITES2=  https://github.com/abseil/abseil-cpp/archive/

post-extract:
.for i in test googletest abseil
        rmdir ${WRKSRC}/$i; mv ${WRKDIR}/$i-* ${WRKSRC}/$i
.endfor
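In case the `localname{remotename}.sufx:N` form in DISTFILES is unfamiliar: as I understand bsd.port.mk(5), the part inside the braces (plus the suffix) is what gets fetched, the part outside is the name the file is saved under, and `:N` selects MASTER_SITESN. A rough Python model of that mapping, using the tesseract test entry as input. `resolve` and `MASTER_SITES_N` are made-up names for illustration; this is not how bsd.port.mk actually implements it:

```python
import re

# Hypothetical model (not bsd.port.mk itself) of how a DISTFILES entry
# written as "localname{remotename}.sufx:N" maps to a saved name and a URL.
MASTER_SITES_N = {
    "0": "https://github.com/tesseract-ocr/test/archive/",
    "1": "https://github.com/google/googletest/archive/",
    "2": "https://github.com/abseil/abseil-cpp/archive/",
}

BRACED = re.compile(r"([^{]*)\{([^}]*)\}(.*)")


def resolve(entry, sites):
    """Return (saved_name, fetch_url) for one DISTFILES entry.

    Only entries with an explicit ":N" site suffix are modelled; an entry
    without one would need a "" key in `sites` standing for MASTER_SITES.
    """
    name, _, site = entry.partition(":")
    m = BRACED.fullmatch(name)
    if m:
        # Text outside the braces names the local file, text inside is fetched.
        saved = m.group(1) + m.group(3)
        remote = m.group(2) + m.group(3)
    else:
        # No braces: the same name is used locally and remotely.
        saved = remote = name
    return saved, sites[site] + remote


if __name__ == "__main__":
    entry = ("tesseract-test-6dd816c"
             "{6dd816cdaf3e76153271daf773e562e24c928bf5}.tar.gz:0")
    for part in resolve(entry, MASTER_SITES_N):
        print(part)
```

Running it prints the saved filename (tesseract-test-6dd816c.tar.gz) and then the GitHub archive URL it would be fetched from; the short local name just keeps the distfile recognisable in /usr/ports/distfiles.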

We can also get it to unpack the tessdata_fast files (that it also
needs) from the tessdata port:

TEST_DEPENDS=           graphics/tesseract/tessdata:patch

And link them to the directory where it says it wants them:

pre-test:
        ln -s ${WRKDIR}/graphics/tesseract/tessdata/tessdata_fast-4.0.0 \
                ${WRKDIR}/tessdata_fast

But then it starts getting silly: some tests want tessdata and
tessdata_best as well, and especially langdata_lstm, which is a 1.2GB
set of wordlists, training text, etc.

Something doesn't go quite right: it is supposed to look for test
files in $top_srcdir/test/testing and data in $top_srcdir/../langdata_lstm
and $top_srcdir/../tessdata, but for some reason it doesn't set the
directory correctly and actually looks in /: /test, /tessdata_data, etc.
I haven't figured out what's going on there, but by bodging things with
symlinks in / I can at least get the tests to run.

There are failures in a couple of tests, but given the above there's
obviously something wrong with how I've got the tests set up, which
might account for those problems. The majority of the tests do pass.

So at this point I think I'm OK to forget adding the tests for now.
Could you revise the comment though please? e.g. something like

# tests require 1GB+ extra files and some fiddling to get them to run
NO_TEST=        Yes

Otherwise OK sthen@ for the update.
