I'd also suggest extracting the text with tika-app (shipped with the Tika distribution as an executable jar) from the PDF(s) in question, to see whether the problem is with extraction or with indexing.
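A rough sketch of that check, in case it helps. It assumes the jar is named tika-app-1.0.jar and sits in the current directory, that `java` is on the PATH, and that splitting on whitespace is a good enough stand-in for Solr's tokenizer (it is only an approximation; the real count depends on your analyzer). The helper names are made up for illustration.

```python
import subprocess

# Default maxFieldLength mentioned in Erick's reply; adjust if yours differs.
DEFAULT_MAX_FIELD_LENGTH = 10000


def extract_text(pdf_path, tika_jar="tika-app-1.0.jar"):
    """Run tika-app in plain-text mode and return what it extracted.

    The jar name and location are assumptions; point tika_jar at your copy.
    """
    result = subprocess.run(
        ["java", "-jar", tika_jar, "--text", pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def count_tokens(text):
    """Approximate the token count by splitting on whitespace."""
    return len(text.split())


def exceeds_default_limit(token_count, limit=DEFAULT_MAX_FIELD_LENGTH):
    """True if the document has more tokens than the given field limit."""
    return token_count > limit
```

Something like `count_tokens(extract_text("mydoc.pdf"))` (hypothetical file name) gives a ballpark figure. If Tika's output contains the whole document and the count is well above 10,000, extraction is probably fine and the limit is being applied at indexing time; if the output itself is cut short, the problem is on the Tika side.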
Rav

On Mon, Apr 2, 2012 at 1:55 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> You can index 2B tokens, so upping maxFieldLength should have
> fixed your problem at least as far as Solr is concerned. How
> many tokens get indexed? I'm not as familiar with Tika, but
> there may be some kind of parameter there (although I
> don't remember this coming up before)...
>
> Did you restart Solr after making the change to solrconfig.xml?
>
> If you're seeing 10,000 tokens or so, that's the default for
> maxFieldLength....
>
> I'd recommend stopping Solr, "rm -rf <solr home>/data/index"
> and restarting Solr just to be sure you're not seeing leftover
> junk, you'll have to re-index your docs after changing
> the maxLength param.
>
> Best
> Erick
>
> On Mon, Apr 2, 2012 at 7:19 AM, Manoj Saini <manoj.sa...@stigasoft.com> wrote:
> > Hello Guys,
> >
> > I am using apache solr 3.3.0 with Tika 1.0.
> >
> > I have pdf files which I am pushing into solr for content searching. Apache
> > solr is indexing pdf files and I can see them in apache solr admin interface
> > for search. But the issue is apache solr is not indexing whole file content.
> > It is indexing up to only limited size.
> >
> > Am I missing something, some configuration, or this is the behavior of
> > apache solr?
> >
> > I have tried to update solrconfig.xml. I have updated ramBufferSizeMB,
> > maxFieldLength.
> >
> > Thanks
> > Manoj Saini
> >
> > Thanks,
> > Best Regards,
> >
> > Manoj Saini | Sr. Software Engineer | Stigasoft
> > m: +91 98 1034 1281 |
> > e: manoj.sa...@stigasoft.com | w: www.stigasoft.com