> To achieve this I have to keep the document field set to "stored", right?

Yes, the field needs to be stored to return snippets.


> When I do this my index becomes huge (10 GB), because I have 10K
> docs and each is very lengthy HTML.  Is there any better solution?
> Why is the index created by Nutch so small in comparison (about
> 27 MB), yet it still returns snippets?

Are you storing the complete HTML? If so, I think you should strip out
the markup first and then index the document's plain text.
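
For what it's worth, here is a minimal sketch of stripping the markup
before the text is handed to the indexer, using only Python's standard
library. The names TextExtractor and strip_html are made up for this
example; they are not part of Solr or Nutch, and a real crawler may
well do this differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores the contents of <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style elements.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks with spaces and collapse runs of whitespace.
        # Note: a word split by inline tags ("wor<b>ld</b>") comes out
        # as two tokens; that is usually acceptable for indexing.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The result of strip_html(page) is what you would put in the stored
field, so the snippets come from plain text rather than from markup.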




>
> On 10/9/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote:
> > Late reply on this but I just wanted to say thanks for the
> > suggestions. I went through my whole schema and was storing things
> > that didn't need to be stored and indexing a lot of things that didn't
> > need to be indexed. Just completed a full reindex and it's a much more
> > reasonable size now.
> >
> > Kevin
> >
> > On 8/20/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> > >
> > > On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote:
> > >
> > > > Are there any tips on reducing the index size or what factors most
> > > > impact index size?
> > > >
> > > > My index has 2.7 million documents and is 200 gigabytes and growing.
> > > > Most documents are around 2-3kb and there are about 30 indexed fields.
> > >
> > > > An "ls -sh" will tell you roughly where the space is being
> > > occupied.  There is something strange going on: 2.5kB * 2.7m is only
> > > 6GB, and I have trouble imagining where the 30-fold index size
> > > expansion is coming from.
> > >
> > > -Mike
> > >
> >
>
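
The back-of-the-envelope estimate in Mike's quoted message can be
reproduced in a few lines. This is only a sketch of the arithmetic: it
assumes 2.5 kB as the average document size (the middle of the quoted
2-3kb range) and decimal units (1 GB = 10^9 bytes).

```python
# Figures taken from the quoted thread; the average size is an assumption.
docs = 2_700_000
avg_doc_bytes = 2.5 * 1000        # assumed midpoint of "2-3kb"
index_bytes = 200 * 1000**3       # reported index size, 200 GB

raw_bytes = docs * avg_doc_bytes  # total size of the raw documents
raw_gb = raw_bytes / 1000**3
expansion = index_bytes / raw_bytes

print(f"raw text: {raw_gb:.2f} GB, index is {expansion:.1f}x larger")
```

The raw text comes to a little under 7 GB, which is where the roughly
30-fold expansion figure comes from.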
