Re: Problem with Pdf, Solr 1.4.1 Cell

Tommaso Teofili Mon, 26 Jul 2010 05:46:11 -0700

Hi,
I think there is an open bug for this:
https://issues.apache.org/jira/browse/SOLR-1902
Using Solr 1.4.1 and upgrading the Tika libraries to a 0.8 snapshot, I also had
to upgrade pdfbox, fontbox and jempbox to 1.2.1. I got no errors and the PDFs
appear to be indexed (I can query them by id:doc1, for example), but no text
or other metadata was extracted from them.
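
To see which jars would need swapping, something like the following could help. It is only a sketch: the directory you pass in is your own choice (I am not asserting where Solr 1.4.1 keeps these jars), and find_jars is a made-up helper name, not part of Solr or Tika.

```python
from pathlib import Path

# Library name prefixes mentioned in this thread.
PREFIXES = ("tika", "pdfbox", "fontbox", "jempbox")


def find_jars(lib_dir, prefixes=PREFIXES):
    """Return the sorted names of .jar files in lib_dir whose names
    start with one of the given prefixes (case-insensitive).

    lib_dir is whatever directory holds the extraction libraries in
    your install; this helper only lists files, it changes nothing.
    """
    return sorted(
        p.name
        for p in Path(lib_dir).iterdir()
        if p.is_file()
        and p.suffix == ".jar"
        and p.name.lower().startswith(prefixes)
    )
```

Running it before and after the swap shows whether any old versions are still sitting next to the new ones.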
Building a new Solr distribution from trunk (ant dist) and using the Tika 0.8
snapshot (with pdfbox, fontbox and jempbox 1.2.1), it seems to work.
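
For checking whether text was actually extracted, here is a rough sketch of the two request URLs involved: one to post a PDF to the extracting handler with an explicit id, one to fetch that document back with all stored fields. The base URL in the usage note is an assumption (the default of the example Jetty setup), and the two function names are made up for illustration.

```python
from urllib.parse import urlencode


def extract_url(base, doc_id):
    """URL for posting a file to Solr Cell (/update/extract).

    literal.id sets the document id; commit=true makes the document
    visible to searches right after the request.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return base.rstrip("/") + "/update/extract?" + params


def select_url(base, doc_id):
    """URL for fetching the indexed document back by id, all stored fields."""
    params = urlencode({"q": "id:" + doc_id, "fl": "*"})
    return base.rstrip("/") + "/select?" + params
```

For example, extract_url("http://localhost:8983/solr", "doc1") gives the URL to POST the PDF to. If the response from select_url(...) shows only the id and no text or metadata fields, extraction silently failed as described above.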
My 2 cents,
Tommaso


2010/7/23 Alessandro Benedetti <benedetti.ale...@gmail.com>

> Hi all,
> as I saw in this discussion [1], there were many issues with PDF indexing in
> Solr 1.4 due to the Tika library (version 0.4).
> In Solr 1.4.1 the Tika library is the same, so I guess the issues are the
> same.
> Could anyone, who contributed to the previous thread, help me in resolving
> these issues?
> I need a simple tutorial that could help me to upgrade Solr Cell!
>
> Something like this:
> 1) download Tika core from trunk
> 2) create a jar with the Maven dependencies
> 3) unjar Solr 1.4.1 and change the Tika library
> 4) jar the patched Solr 1.4.1 and enjoy!
>
> [1]
>
> http://markmail.org/message/zbkplnzqho7mxwy3#query:+page:1+mid:gamcxdx34ayt6ccg+state:results
>
> Best regards
>
> --
> --------------------------
>
> Benedetti Alessandro
> Personal Page: http://tigerbolt.altervista.org
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
