Just a word of caution: I've been bitten by this bug, which affects Tika 0.6:
https://issues.apache.org/jira/browse/PDFBOX-541
It causes the parser to go into an infinite loop, which isn't exactly great
for server stability. Tika 0.4 is not affected in the same way - as far as I
remember, the p
I just copied in the newer .jars and got rid of the old ones and
everything seemed to work smoothly enough.
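
For what it's worth, a minimal sketch of one way to limit the damage from a parser that never returns: run the parse on a worker thread and stop waiting after a timeout. This is an assumption on my part, not something Solr or Tika does for you. GuardedParse, runWithTimeout and parseTask are hypothetical names, and the Callable stands in for whatever actually invokes the parser.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: not part of Solr or Tika.
final class GuardedParse {
    // Daemon threads, so a stuck parse cannot block JVM shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "guarded-parse");
        t.setDaemon(true);
        return t;
    });

    // Runs parseTask and returns its result, or null if it does not
    // finish within timeoutMillis.
    static String runWithTimeout(Callable<String> parseTask, long timeoutMillis) {
        Future<String> future = POOL.submit(parseTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Only requests interruption; a loop that never checks the
            // interrupt flag will keep its thread busy regardless.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            throw new RuntimeException("parse failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return null;
        }
    }
}
```

The caller gets control back after the timeout, but a parser spinning in a tight loop may still burn a core until the process is restarted, so this limits the symptom and does not replace upgrading the .jars.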
Liam
On Tue, 2010-02-16 at 13:11 -0500, Grant Ingersoll wrote:
> I've got a task open to upgrade to 0.6. Will try to get to it this week.
> Upgrading is usually pretty trivial.
I've got a task open to upgrade to 0.6. Will try to get to it this week.
Upgrading is usually pretty trivial.
On Feb 14, 2010, at 12:37 AM, Liam O'Boyle wrote:
> Afternoon,
>
> I've got a large collection of documents which I'm attempting to add to
> a Solr index using Tika via the Extracti
Afternoon,
I've got a large collection of documents which I'm attempting to add to
a Solr index using Tika via the ExtractingRequestHandler, but there are
a large number that it has problems with (PDFs, PPTX and XLS documents
mainly).
I've tried them with the most recent standalone version of