Hi All,

I'm facing a similar problem.  I want to index an entire document as a
single field, but I also want to be able to retrieve snippets (like the
ones Google/Nutch return on the results page below each link).

To achieve this I have to keep the document field "stored", right?
When I do that my index grows to about 10 GB, because I have 10K
docs but each one is very lengthy HTML.  Is there a better solution?
Why is the index created by Nutch so small in comparison (roughly
27 MB), yet it still returns snippets?
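One common way to keep the stored data small (a sketch only, assuming a Solr-style schema.xml; the field names here are hypothetical): index the raw HTML without storing it, and store a stripped-down plain-text copy just for highlighting/snippets. Solr also supports compressing stored fields via the compressed attribute:

```xml
<!-- Hypothetical field names; adjust to your own schema. -->
<!-- Raw HTML: indexed so it is searchable, but NOT stored. -->
<field name="html" type="text" indexed="true" stored="false"/>
<!-- Stripped plain text: stored (compressed) only for snippet generation. -->
<field name="content" type="text" indexed="true" stored="true" compressed="true"/>
```

Nutch, as far as I understand it, keeps the parsed plain text in its own segment data rather than as stored fields in the Lucene index, which is one reason its index can stay small while still serving snippets.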

Ravish

On 10/9/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote:
> Late reply on this but I just wanted to say thanks for the
> suggestions. I went through my whole schema and was storing things
> that didn't need to be stored and indexing a lot of things that didn't
> need to be indexed. Just completed a full reindex and it's a much more
> reasonable size now.
>
> Kevin
>
> On 8/20/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> >
> > On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote:
> >
> > > Are there any tips on reducing the index size or what factors most
> > > impact index size?
> > >
> > > My index has 2.7 million documents and is 200 gigabytes and growing.
> > > Most documents are around 2-3kb and there are about 30 indexed fields.
> >
> > An "ls -sh" will tell you roughly where the space is being
> > occupied.  There is something strange going on: 2.5kB * 2.7m is only
> > 6GB, and I have trouble imagining where the 30-fold index size
> > expansion is coming from.
> >
> > -Mike
> >
>
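Mike's back-of-envelope estimate above can be checked with a quick calculation (assuming ~2.5 kB average document size, as stated in the thread):

```python
# Back-of-envelope check: total raw text volume for the corpus Mike describes.
docs = 2_700_000             # documents in the index
avg_doc_bytes = 2.5 * 1024   # ~2.5 kB average document size

total_gb = docs * avg_doc_bytes / 1024**3
print(f"{total_gb:.1f} GB of raw text")  # roughly 6.4 GB, vs. a 200 GB index
```

So the raw text accounts for well under 10 GB, which is why a 200 GB index points at over-storing or over-indexing rather than document size.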