I'd also suggest extracting the text with tika-app (shipped with the Tika
distribution as an executable jar) from the PDF(s) in question to see whether
the problem is with extraction or with indexing.
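
Something along these lines could automate that check. It is only a rough
sketch: the jar file name and PDF path are placeholders you would pass in, the
helper names are made up for illustration, the whitespace token count is just
an approximation of what Solr's analyzers would produce, and the 10,000 figure
is the default maxFieldLength mentioned in the quoted message below.

```python
import subprocess
import sys

# Default maxFieldLength cited in the quoted message below (assumed here).
DEFAULT_MAX_FIELD_LENGTH = 10000


def tika_text_command(jar_path, pdf_path):
    """Build the tika-app command that prints a document's plain text."""
    return ["java", "-jar", jar_path, "--text", pdf_path]


def count_tokens(text):
    """Rough whitespace token count; Solr's analyzers tokenize differently."""
    return len(text.split())


def diagnose(token_count, limit=DEFAULT_MAX_FIELD_LENGTH):
    """Hypothetical helper: guess where to look next from the token count."""
    if token_count == 0:
        return "extraction"   # Tika returned no text at all
    if token_count > limit:
        return "indexing"     # more text than the field limit would keep
    return "inconclusive"     # compare the extracted text with the PDF by eye


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: check_pdf.py <path to tika-app jar> <path to pdf>")
    else:
        result = subprocess.run(
            tika_text_command(sys.argv[1], sys.argv[2]),
            capture_output=True,
            check=True,
        )
        # Assumes UTF-8 output; undecodable bytes are replaced, not fatal.
        text = result.stdout.decode("utf-8", errors="replace")
        tokens = count_tokens(text)
        print("tokens extracted:", tokens)
        print("look first at:", diagnose(tokens))
```

If the token count comes out well above the limit and the extracted text looks
complete, extraction is probably fine and the truncation is more likely
happening on the Solr side.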

Rav

On Mon, Apr 2, 2012 at 1:55 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> You can index 2B tokens, so upping maxFieldLength should have
> fixed your problem at least as far as Solr is concerned. How
> many tokens get indexed? I'm not as familiar with Tika, but
> there may be some kind of parameter there (although I
> don't remember this coming up before)...
>
> Did you restart Solr after making the change to solrconfig.xml?
>
> If you're seeing 10,000 tokens or so, that's the default for
> maxFieldLength....
>
> I'd recommend stopping Solr, running "rm -rf <solr home>/data/index",
> and restarting Solr just to be sure you're not seeing leftover
> junk. You'll have to re-index your docs after changing
> the maxFieldLength param.
>
>
> Best
> Erick
>
>
> On Mon, Apr 2, 2012 at 7:19 AM, Manoj Saini <manoj.sa...@stigasoft.com>
> wrote:
> > Hello Guys,
> >
> > I am using Apache Solr 3.3.0 with Tika 1.0.
> >
> > I have PDF files which I am pushing into Solr for content searching.
> > Apache Solr is indexing the PDF files and I can see them in the Apache
> > Solr admin interface for search. But the issue is that Apache Solr is
> > not indexing the whole file content. It indexes only up to a limited
> > size.
> >
> > Am I missing something, some configuration, or is this the behavior of
> > Apache Solr?
> >
> > I have tried updating solrconfig.xml: I changed ramBufferSizeMB and
> > maxFieldLength.
> >
> > Thanks
> > Manoj Saini
> >
> > Best Regards,
> >
> > Manoj Saini | Sr. Software Engineer | Stigasoft
> > m: +91 98 1034 1281
> > e: manoj.sa...@stigasoft.com | w: www.stigasoft.com
> >
>
