I'd also suggest extracting the text with tika-app (shipped with the Tika
distribution as an executable jar) from the PDF(s) in question, to see whether
the problem is with extraction or with indexing.
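
For example, here is a rough Python sketch of that check. It assumes java is
on your PATH, and the jar and PDF paths are placeholders you pass on the
command line; the helper names are made up for illustration. It runs tika-app
with --text and prints how much text came back. The token count is a plain
whitespace split, so it will only approximate what Solr's analyzers produce.

```python
import subprocess
import sys


def build_tika_command(jar_path, pdf_path):
    # --text asks tika-app to print the extracted plain text to stdout.
    return ["java", "-jar", jar_path, "--text", pdf_path]


def count_tokens(text):
    # Rough whitespace token count; Solr's analyzers will tokenize differently.
    return len(text.split())


def extract_text(jar_path, pdf_path):
    # Run tika-app and return whatever it wrote to stdout.
    result = subprocess.run(
        build_tika_command(jar_path, pdf_path),
        capture_output=True,
        text=True,
        errors="replace",
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical usage: python check_tika.py <path to tika-app jar> <path to PDF>
    jar, pdf = sys.argv[1], sys.argv[2]
    text = extract_text(jar, pdf)
    print(f"{len(text)} characters, ~{count_tokens(text)} whitespace tokens")
    # Show the tail so you can see whether the end of the document made it out.
    print(text[-500:])
```

If the tail of the output matches the end of the PDF, extraction is probably
fine and the truncation is happening on the indexing side.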
Rav
On Mon, Apr 2, 2012 at 1:55 PM, Erick Erickson wrote:
You can index 2B tokens, so upping maxFieldLength should have
fixed your problem at least as far as Solr is concerned. How
many tokens get indexed? I'm not as familiar with Tika, but
there may be some kind of parameter there (although I
don't remember this coming up before)...
Did you restart Solr