Re: Apache solr not indexing complete pdf file using tikka

2012-04-03 Thread Ravish Bhagdev
I'd also suggest extracting the text with tika-app (shipped with the Tika distribution as an executable jar) from the PDF(s) in question, to see whether the problem is with extraction or with indexing.

Rav

On Mon, Apr 2, 2012 at 1:55 PM, Erick Erickson wrote:
> You can index 2B tokens, so upping maxFieldLength
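
One way to run that check is sketched below: call tika-app directly on the PDF and count how much text comes back, with Solr out of the picture. This is only a sketch under assumptions. The jar and PDF paths are placeholders you would supply, `java` is assumed to be on the PATH, and the token count is a plain whitespace split, not what Solr's analyzer would produce.

```python
import subprocess


def tika_text_command(tika_jar: str, pdf_path: str) -> list[str]:
    """Build the command line that asks tika-app for plain-text output."""
    # --text tells tika-app to print the extracted plain text to stdout.
    return ["java", "-jar", tika_jar, "--text", pdf_path]


def extract_text(tika_jar: str, pdf_path: str) -> str:
    """Run tika-app and return the text it extracted from the PDF."""
    result = subprocess.run(
        tika_text_command(tika_jar, pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise if tika-app exits with an error
    )
    return result.stdout


def count_tokens(text: str) -> int:
    """Rough token count: whitespace-separated words (an approximation)."""
    return len(text.split())
```

Calling `count_tokens(extract_text(jar, pdf))` with your own paths gives a rough size for the extracted text. If the end of the document is already missing from that output, the problem is on the extraction side; if the text is complete, look at the indexing side.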

Re: Apache solr not indexing complete pdf file using tikka

2012-04-02 Thread Erick Erickson
You can index 2B tokens, so upping maxFieldLength should have fixed your problem at least as far as Solr is concerned. How many tokens get indexed? I'm not as familiar with Tika, but there may be some kind of parameter there (although I don't remember this coming up before)... Did you restart Solr
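
To confirm that the raised limit is actually in the configuration Solr loads, a small script can list every maxFieldLength value in solrconfig.xml. This is a hedged sketch: it assumes a Solr release in which maxFieldLength is set in solrconfig.xml, and the helper names are made up for illustration. The setting can appear in more than one section, so each value is reported together with its parent element.

```python
import xml.etree.ElementTree as ET


def max_field_length_settings(solrconfig_xml: str) -> dict[str, int]:
    """Map each parent element name to the maxFieldLength value it contains."""
    root = ET.fromstring(solrconfig_xml)
    settings: dict[str, int] = {}
    for parent in root.iter():
        for child in parent:
            if child.tag == "maxFieldLength" and child.text:
                settings[parent.tag] = int(child.text.strip())
    return settings


def would_truncate(token_count: int, limit: int) -> bool:
    """True if a field with token_count tokens exceeds the configured limit."""
    return token_count > limit
```

Read the file with `open(path).read()` and pass the contents in. If any section still shows a low value, that setting may be the one taking effect, and Solr only picks up a changed solrconfig.xml after a restart or core reload.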