Hi,

I think there is an open bug for this at:
https://issues.apache.org/jira/browse/SOLR-1902

Using Solr 1.4.1 and upgrading the Tika libraries to a 0.8 snapshot, I also had to upgrade pdfbox, fontbox and jempbox to 1.2.1. I got no errors and Solr appears to index PDFs (I can query them by id:doc1, for example), but it did not extract text or other metadata from them.

Building a new Solr distribution from trunk (ant dist) and using the Tika 0.8 snapshot (with pdfbox, fontbox and jempbox 1.2.1), it seems to work.

My 2 cents,
Tommaso

2010/7/23 Alessandro Benedetti <benedetti.ale...@gmail.com>

> Hi all,
> as I saw in this discussion [1], there were many issues with PDF indexing in
> Solr 1.4 due to the Tika library (version 0.4).
> In Solr 1.4.1 the Tika library is the same, so I guess the issues are the
> same.
> Could anyone who contributed to the previous thread help me resolve
> these issues?
> I need a simple tutorial that could help me upgrade Solr Cell!
>
> Something like this:
> 1) download Tika core from trunk
> 2) create a jar with Maven dependencies
> 3) unjar Solr 1.4.1 and change the Tika library
> 4) jar the patched Solr 1.4.1 and enjoy!
>
> [1]
> http://markmail.org/message/zbkplnzqho7mxwy3#query:+page:1+mid:gamcxdx34ayt6ccg+state:results
>
> Best regards
>
> --
> --------------------------
>
> Benedetti Alessandro
> Personal Page: http://tigerbolt.altervista.org
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience - 1794 England
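
Whichever way the Tika jars end up being swapped, it may help to check that text is really being extracted, since a PDF can be indexed by id while its body stays empty. Below is a minimal sketch of such a check, not a tested procedure. It assumes a local Solr at http://localhost:8983/solr with the ExtractingRequestHandler mapped to /update/extract (as in the example configuration); the helper names build_extract_url and extract_pdf, the host, and the file name are illustrative only. With extractOnly=true, Solr Cell should return what Tika extracted without indexing the document.

```python
import urllib.parse
import urllib.request

# Assumption: local Solr with Solr Cell mapped at /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(base_url, doc_id, extract_only=True):
    """Build the request URL for the extracting handler (hypothetical helper)."""
    params = {"literal.id": doc_id}
    if extract_only:
        # Return the extracted content instead of indexing it.
        params["extractOnly"] = "true"
    else:
        # Index the document and commit so it becomes searchable.
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def extract_pdf(pdf_path, doc_id, base_url=SOLR_EXTRACT_URL):
    """POST a PDF as the raw request body and return Solr's response text."""
    with open(pdf_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        build_extract_url(base_url, doc_id),
        data=body,
        headers={"Content-Type": "application/pdf"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "sample.pdf" is a placeholder; use any PDF with known text in it.
    print(extract_pdf("sample.pdf", "doc1"))
```

If the response contains only metadata fields and no body text for a PDF known to contain text, the symptom matches the one described above for Solr 1.4.1 with the upgraded jars.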