All,
  I finished the regression tests, and the reports are available here:
http://162.242.228.174/reports/reports_tika_1.22_vs_1.23-pre-rc1.tgz
  My takeaways:
  a) we need to fix the new code in the PDFParser that sets whether or not
there is a digital signature.  That should be a set, not an add.
  b) we are getting a few new exceptions from exceeding the safety maximum
for byte array allocation in POI.  We can make this configurable at the
Tika level.
  c) there are a few new problems with EMF parsing, but these won't harm
parsing the rest of the file.
  d) both runs (1.22 and 1.23-pre-rc1) only processed ~250k files, but
there were ~500k in the list...I need to figure out what went wrong.

  If I find nothing concerning on d), are we ready to roll 1.23-rc1?

              Cheers,

                           Tim

On Fri, Nov 22, 2019 at 8:25 AM Tim Allison <[email protected]> wrote:

> All,
>   I started the regression tests on a random set of 500k files.  I found
> this morning that it was _still_ going.  It turns out I had accidentally
> enabled image extraction for PDFs, which adds to the processing time and
> leads to more OOMs.
>   I restarted the regression tests this morning with that feature turned
> off.
>
>        Best,
>
>                    Tim
>
