Muhammed,

It sounds like you are talking about the ratio of original data size vs. index 
size.  The exact ratio depends on things such as:
- whether you store fields or just index them
- whether you compress fields if you store them
- whether you have term vectors enabled or not
- what your analyzers do - stemming or removing tokens (e.g. stop words) 
shrinks the index, while inserting synonyms grows it
- nature of the input text - term distribution/variance
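
If you want to track that ratio as you change the settings above, you can 
measure it directly on disk. Here is a minimal sketch in Python, assuming the 
original mail and the Solr index each live in their own directory. The function 
names are made up for illustration and are not part of Solr.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip broken symlinks and other non-regular entries.
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def index_to_source_ratio(source_dir, index_dir):
    """Return index size divided by original data size."""
    source_size = dir_size_bytes(source_dir)
    if source_size == 0:
        raise ValueError("source directory contains no data")
    return dir_size_bytes(index_dir) / source_size
```

Call index_to_source_ratio() with your mail directory and the index directory 
under your Solr data dir. For the numbers you quoted (800 MB of index for 1 GB 
of mail) it would come out at about 0.8. Measure after an optimize if you can, 
since the on-disk size fluctuates while segments are being merged.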

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Muhammed Sameer <samix_...@yahoo.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, May 25, 2009 1:22:15 PM
> Subject: Re: Index size concerns
> 
> 
> Salaam,
> 
> Sorry about that. Here is the big picture.
> 
> We use Solr to index all the mail that comes to us so that we can allow for
> faster lookups.
> 
> We have seen that after our mail server accepts, say, 1 GB of mail, the index
> size goes up to 800 MB.
> 
> I hope that this time I am clear in conveying the problem.
> 
> What I wanted to know is whether this index size is normal.
> 
> Regards,
> Muhammed Sameer
> 
> --- On Mon, 5/25/09, Shalin Shekhar Mangar wrote:
> 
> > From: Shalin Shekhar Mangar 
> > Subject: Re: Index size concerns
> > To: solr-user@lucene.apache.org
> > Date: Monday, May 25, 2009, 11:19 AM
> > On Mon, May 25, 2009 at 3:53 PM,
> > Muhammed Sameer wrote:
> > 
> > >
> > > We are using apache-solr to index our files for faster searches. Everything
> > > works without a problem; my only concern is the size of the cache.
> > >
> > > The trend seems to be that if I cache 1 GB of files, the index goes to
> > > 800 MB, i.e. we are seeing an 80% cache size.
> > >
> > > Is this normal, or am I missing something in the configuration of Solr?
> > >
> > 
> > I'm sorry I do not understand your question. Which files are you talking
> > about? The Solr cache has got nothing to do with files. It caches the
> > query/filter results and solr documents.
> > 
> > -- 
> > Regards,
> > Shalin Shekhar Mangar.
> > 
