I'd also suggest extracting the text with tika-app (shipped with the Tika
distribution as an executable jar) from the PDF(s) in question, to see whether
the problem is with extraction or with indexing.
Rav
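For reference, a minimal sketch of that check in Python. It assumes Java is on the PATH; the jar name and PDF path passed in are placeholders, not anything from this thread. `--text` is tika-app's plain-text output option.

```python
import subprocess

def build_tika_command(jar_path, pdf_path):
    """Build the tika-app command line that prints plain text to stdout."""
    return ["java", "-jar", jar_path, "--text", pdf_path]

def extract_text(jar_path, pdf_path):
    """Run tika-app and return the extracted text (needs Java and the jar)."""
    result = subprocess.run(
        build_tika_command(jar_path, pdf_path),
        capture_output=True,  # collect stdout instead of printing it
        check=True,           # raise if tika-app exits with an error
    )
    # Decode leniently: the output encoding depends on the platform default.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text("tika-app-1.0.jar", "sample.pdf")` (hypothetical paths) and looking at the tail of the returned string should show whether the end of the document survives extraction. If it does, the truncation is more likely happening on the Solr side.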
On Mon, Apr 2, 2012 at 1:55 PM, Erick Erickson wrote:
You can index 2B tokens, so upping maxFieldLength should have
fixed your problem at least as far as Solr is concerned. How
many tokens get indexed? I'm not as familiar with Tika, but
there may be some kind of parameter there (although I
don't remember this coming up before)...
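To put a rough number on "how many tokens", here is a small sketch that counts whitespace-separated tokens in already-extracted text and compares the count with a limit. The default of 10000 is an assumption (it is the value I recall from the example solrconfig.xml shipped with Solr 3.x, so check your own config), and whitespace splitting only approximates what the field's analyzer actually produces.

```python
def count_tokens(text):
    """Approximate token count: split on any run of whitespace."""
    return len(text.split())

def exceeds_limit(text, max_field_length=10000):
    """True if the text has more tokens than max_field_length would keep."""
    return count_tokens(text) > max_field_length
```

If `exceeds_limit` is true for the extracted text and the indexed content stops at around that many tokens, the field-length limit is a plausible suspect. If the count is well under the limit, the cause probably lies elsewhere.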
Did you restart Solr
Hello Guys,
I am using Apache Solr 3.3.0 with Tika 1.0.
I have PDF files which I am pushing into Solr for content searching. Solr
is indexing the PDF files and I can see them in the Solr admin interface
for search. But the issue is that Solr is not indexing the whole file content.
It is index