> To achieve this I have to keep the document field to "stored" right?
Yes, the field needs to be stored to return snippets.

> When I do this my index becomes huge 10 GB index, cause I have 10K
> docs but each is very lengthy HTML. Is there any better solution?
> Why is index created by nutch so small in comparison (about 27 mb
> approx) but it still returns snippets!

Are you storing the complete HTML? If so, I think you should strip out
the HTML first and then index the document.

>
> On 10/9/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote:
> > Late reply on this but I just wanted to say thanks for the
> > suggestions. I went through my whole schema and was storing things
> > that didn't need to be stored and indexing a lot of things that didn't
> > need to be indexed. Just completed a full reindex and it's a much more
> > reasonable size now.
> >
> > Kevin
> >
> > On 8/20/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> > >
> > > On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote:
> > >
> > > > Are there any tips on reducing the index size or what factors most
> > > > impact index size?
> > > >
> > > > My index has 2.7 million documents and is 200 gigabytes and growing.
> > > > Most documents are around 2-3kb and there are about 30 indexed fields.
> > >
> > > An "ls -sh" will tell you roughly where the space is being
> > > occupied. There is something strange going on: 2.5kB * 2.7m is only
> > > 6GB, and I have trouble imagining where the 30-fold index size
> > > expansion is coming from.
> > >
> > > -Mike
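
For what it's worth, here is a minimal sketch of the "strip out the HTML, then index" suggestion above. It assumes the documents are preprocessed in Python before they are handed to the indexer; `TextExtractor` and `strip_html` are hypothetical names made up for this example, not part of any search library, and only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content and drops tags, scripts, and styles.

    Hypothetical helper for illustration only.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace into one space.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the plain text of an HTML string (assumed preprocessing step)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><p>Very <b>lengthy</b> HTML &amp; markup</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(strip_html(page))
```

The stored field then holds only the text needed for snippets rather than the full markup. One caveat of this sketch: pieces are joined with a space, so a word split by an inline tag (for example `<b>in</b>dex`) comes out as two tokens.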