Re: How to reduce the Solr index size..

Grant Ingersoll Thu, 20 Aug 2009 08:19:45 -0700


On Aug 20, 2009, at 11:00 AM, Silent Surfer wrote:

Hi,

I am a newbie to Solr. We recently started using Solr.
We are using Solr to process server logs. We create an index entry for each line of the logs, so that users can do a fine-grained search down to the second/ms.
What we are observing is that the index being created is almost double the size of the actual logs, i.e. if the logs are, say, 1 MB, the index is around 2 MB.
Could anyone let us know what can be done to reduce the index size? Do we need to change any configuration, or delete any files that are created during indexing but not required for searching?
Our schema is as follows:
  <field name="pkey" type="string" indexed="true" stored="true" required="false"/>
  <field name="date" type="date" indexed="true" stored="true" omitNorms="true"/>
  <field name="level" type="string" indexed="true" stored="true"/>
  <field name="app" type="string" indexed="true" stored="true"/>
  <field name="server" type="string" indexed="true" stored="true"/>
  <field name="port" type="string" indexed="true" stored="true"/>
  <field name="class" type="string" indexed="true" stored="true"/>
  <field name="method" type="string" indexed="true" stored="true"/>
  <field name="filename" type="string" indexed="true" stored="true"/>
  <field name="linenumber" type="string" indexed="true" stored="true"/>
  <field name="message" type="text" indexed="true" stored="true"/>

The message field holds the actual log text.


There are a couple of things you can do:

1. stored="true" only needs to be on if you are going to use that value later in your application (i.e. for display). Storage is not needed for search.
2. You can set omitNorms and omitTermFreqAndPositions on any fields that you aren't searching (but just displaying).
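
As a rough sketch of how those two points might look in your schema.xml: the fragment below assumes (purely for illustration) that your application only displays pkey, date, level and message, and that the other fields are used for exact-match filtering only. Which fields you actually need stored is something only you can decide, and omitTermFreqAndPositions needs Solr 1.4 or later.

```xml
<!-- Illustrative only: the choice of which fields stay stored="true"
     is an assumption, not something taken from your application. -->
<field name="pkey" type="string" indexed="true" stored="true" required="false"/>
<field name="date" type="date" indexed="true" stored="true" omitNorms="true"/>
<field name="level" type="string" indexed="true" stored="true" omitNorms="true"/>

<!-- Assumed filter-only fields: searchable, but the value is not kept
     for display, and norms/term frequencies/positions are dropped. -->
<field name="class" type="string" indexed="true" stored="false"
       omitNorms="true" omitTermFreqAndPositions="true"/>
<field name="method" type="string" indexed="true" stored="false"
       omitNorms="true" omitTermFreqAndPositions="true"/>
<field name="linenumber" type="string" indexed="true" stored="false"
       omitNorms="true" omitTermFreqAndPositions="true"/>

<!-- Left unchanged: phrase queries on the log text need positions. -->
<field name="message" type="text" indexed="true" stored="true"/>
```

Schema changes like these only take effect for documents indexed after the change, so you would need to reindex to see the difference in size.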

A doubling in size seems a bit much. However, 1 MB is likely not enough to show whether this holds true for a larger index. Often, the growth of the index is sublinear, since the same terms appear over and over again and Lucene can obtain pretty high levels of compression.

Also, are you adding any other content to what comes in (synonyms, etc.)?

I would open up the index in Luke, too, and make sure everything looks right.



--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
