Re: Apache solr not indexing complete pdf file using tikka

2012-04-03 Thread Ravish Bhagdev
I'd also suggest extracting the text with tika-app (shipped with the Tika distribution as an executable jar) from the PDF(s) in question, to see whether the problem is with extraction or with indexing.
Rav
On Mon, Apr 2, 2012 at 1:55 PM, Erick Erickson wrote:
> You can index 2B tokens, so upping maxFieldLength
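The tika-app check suggested in this message can be sketched roughly as follows. This is only an illustration: it assumes Java is installed and a tika-app jar is available locally, and it relies on tika-app's `--text` option for plain-text output. The function names, the jar path, and the PDF name are placeholders, not anything from the thread.

```python
import subprocess


def tika_text_command(jar_path, pdf_path):
    """Build the tika-app command line that prints the extracted plain text."""
    return ["java", "-jar", jar_path, "--text", pdf_path]


def extract_text(jar_path, pdf_path):
    """Run tika-app on one PDF and return whatever text it extracted.

    Requires Java and the tika-app jar; raises CalledProcessError if
    tika-app exits with a non-zero status.
    """
    result = subprocess.run(
        tika_text_command(jar_path, pdf_path),
        capture_output=True,
        check=True,
    )
    # Decode leniently, since the output encoding depends on the platform.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling something like `extract_text("tika-app-1.0.jar", "sample.pdf")` (both names hypothetical) and checking whether the end of the document appears in the result should show whether the text is lost during extraction or later, during indexing.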

Re: Apache solr not indexing complete pdf file using tikka

2012-04-02 Thread Erick Erickson
You can index 2B tokens, so upping maxFieldLength should have fixed your problem at least as far as Solr is concerned. How many tokens get indexed? I'm not as familiar with Tika, but there may be some kind of parameter there (although I don't remember this coming up before)... Did you restart Solr
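One rough way to approach the "how many tokens get indexed?" question is to count the tokens in the extracted text and compare that count with the configured maxFieldLength. The sketch below assumes plain whitespace tokenization, which will not match the real analyzer exactly, and the function name and the limit passed in are illustrative only.

```python
def field_length_report(text, max_field_length):
    """Estimate how much of a document survives a per-field token limit.

    Tokens are split on whitespace, so the counts are only an
    approximation of what the field's analyzer would produce.
    """
    tokens = text.split()
    total = len(tokens)
    return {
        "total_tokens": total,
        "indexed_tokens": min(total, max_field_length),
        "truncated": total > max_field_length,
    }
```

If `truncated` comes back true for the limit actually set in solrconfig.xml, the field limit is a likely cause of the missing content; if not, the extraction step is the more likely suspect.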

Apache solr not indexing complete pdf file using tikka

2012-04-02 Thread Manoj Saini
Hello Guys, I am using Apache Solr 3.3.0 with Tika 1.0. I have PDF files which I am pushing into Solr for content searching. Apache Solr is indexing the PDF files and I can see them in the Apache Solr admin interface for search. But the issue is that Apache Solr is not indexing the whole file content. It is index