Muhammed,

It sounds like you are talking about the ratio of original data size vs. index size. The exact ratio depends on things such as:

- whether you store fields or just index them
- whether you compress fields if you store them
- whether you have term vectors enabled or not
- analyzers and what they do - they could stem tokens, remove them, etc., but they could also insert synonyms, and so on
- nature of the input text - term distribution/variance
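
If you want to track that ratio while you change those settings, something like the following would do. This is only a rough sketch in Python, not anything Solr ships: the function names `dir_size_bytes` and `index_ratio` and the two paths in the usage comment are made up for illustration, and it assumes both the raw mail and the index sit in ordinary directories on disk.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path`, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def index_ratio(source_bytes, index_bytes):
    """Return index size as a fraction of the original data size."""
    if source_bytes <= 0:
        raise ValueError("source size must be positive")
    return index_bytes / source_bytes


# Hypothetical usage; both paths are placeholders, not real defaults:
#   mail = dir_size_bytes("/path/to/mail/spool")
#   index = dir_size_bytes("/path/to/solr/data/index")
#   print("index is %.0f%% of the source" % (100 * index_ratio(mail, index)))
```

With the numbers from your mail (roughly 1 GB of mail and an 800MB index), `index_ratio` comes out at about 0.8. Measure before and after each change, and reindex in between, since the index on disk only reflects the settings it was built with.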

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Muhammed Sameer <samix_...@yahoo.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, May 25, 2009 1:22:15 PM
> Subject: Re: Index size concerns
>
> Salaam,
>
> Sorry for this here is the big picture
>
> Actually we use solr to index all the mails that come to us so that we can
> allow for faster look ups.
>
> We have seen that after our mail server accepts say a GB of mails the index
> size goes upto 800MB
>
> I hope that this time I am clear in conveying the problem
>
> What I wanted to know is that is this index size normal ?
>
> Regards,
> Muhammed Sameer
>
> --- On Mon, 5/25/09, Shalin Shekhar Mangar wrote:
>
> > From: Shalin Shekhar Mangar
> > Subject: Re: Index size concerns
> > To: solr-user@lucene.apache.org
> > Date: Monday, May 25, 2009, 11:19 AM
> > On Mon, May 25, 2009 at 3:53 PM, Muhammed Sameer wrote:
> >
> > > We are using apache-solr to index our files for faster searches, all things
> > > happen without a problem, my only concern is the size of the cache.
> > >
> > > It seems that the trend is that the if I cache 1 GB of files the index goes
> > > to 800MB ie we are seeing a 80% cache size.
> > >
> > > Is this normal or am I missing something in the configuration of solr
> >
> > I'm sorry I do not understand your question. Which files are you talking
> > about? The Solr cache has got nothing to do with files. It caches the
> > query/filter results and solr documents.
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.