Re: Debugging on Tika

Oleg Tikhonov Fri, 03 Feb 2012 02:37:19 -0800

Hi Arkadi,

You can try extracting text from your documents with Tika's CLI (more
details: http://tika.apache.org/0.7/gettingstarted.html).
If the extracted text contains the words you expect, the problem is most
likely in the indexing step rather than in Tika. Tika only extracts text
and metadata from the documents and passes that text to Lucene; Lucene
itself builds the index. You can inspect that index with Luke
(http://code.google.com/p/luke/).
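
To make the first check less manual, here is a rough sketch in plain Java
(no Tika or Lucene dependency). It assumes you have already dumped the
extracted text to a file, for example with something like
"java -jar tika-app.jar --text document.pdf > extracted.txt", where the jar
and file names are placeholders for whatever you have locally. The class
name ExtractedTextCheck and the method missingTerms are made up for this
example, and it assumes the dump is UTF-8. It only reports which of your
search terms do not appear in the extracted text at all.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Tika or Lucene.
public class ExtractedTextCheck {

    // Returns the terms that do not occur anywhere in the extracted text.
    // Matching is a case-insensitive substring test, which is cruder than
    // what a Lucene analyzer does, but enough to spot missing words.
    static List<String> missingTerms(String extractedText, List<String> terms) {
        String haystack = extractedText.toLowerCase(Locale.ROOT);
        List<String> missing = new ArrayList<String>();
        for (String term : terms) {
            if (!haystack.contains(term.toLowerCase(Locale.ROOT))) {
                missing.add(term);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ExtractedTextCheck <extracted.txt> <term> [<term> ...]");
            return;
        }
        // Assumption: the text dump is encoded as UTF-8.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(bytes, StandardCharsets.UTF_8);
        List<String> terms = Arrays.asList(args).subList(1, args.length);

        List<String> missing = missingTerms(text, terms);
        if (missing.isEmpty()) {
            System.out.println("All terms found in the extracted text.");
        } else {
            System.out.println("Not found in the extracted text: " + missing);
        }
    }
}
```

If terms are reported as missing here, the text never came out of the
extraction step. If they are all found but still do not match in a search,
look at the indexing side (analyzer, field settings) and at the index in
Luke.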


Hope it helps.

Oleg


On Fri, Feb 3, 2012 at 10:43 AM, Arkadi Colson <ark...@smartbit.be> wrote:

> Hi
>
> I'm using Tika 0.10 for indexing my documents, but I am not getting the
> expected results when searching, even after deleting the index and
> starting over.
> Some of the words in, for example, a PDF document can be found, but most
> of them cannot. Is it perhaps related to some language setting? How can I
> start debugging Tika? Any tips?
>
> Thx!
>
> --
> Smartbit bvba
> Hoogstraat 13
> B-3670 Meeuwen
> T: +32 11 64 08 80
> F: +32 89 46 81 10
> W: http://www.smartbit.be
> E: ark...@smartbit.be
>
>
