Re: index size tripled during optimization

Qingdi Wed, 28 Jan 2009 09:43:27 -0800

Hi Ryuuichi,

Thanks for your quick reply.
I checked the <useCompoundFile> setting in solrconfig.xml, and the value
is 'false'. Here is what we have in our solrconfig.xml:
=======================================================================
  <indexDefaults>
   <!-- Values here affect all index writers and act as a default unless
overridden. -->
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>1000</mergeFactor> <!-- was 10 -->
    <maxBufferedDocs>10000</maxBufferedDocs> <!-- was 1000 -->
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>100000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
    <!--
      As long as Solr is the only process modifying your index, it is
      safe to use Lucene's in process locking mechanism.  But you may
      specify one of the other Lucene LockFactory implementations in
      the event that you have a custom situation.


      none = NoLockFactory (typically only used with read only indexes)
      single = SingleInstanceLockFactory (suggested)
      native = NativeFSLockFactory
      simple = SimpleFSLockFactory

      ('simple' is the default for backwards compatibility with Solr 1.2)
    -->
    <lockType>single</lockType>
  </indexDefaults>

  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <maxBufferedDocs>1000</maxBufferedDocs>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>100000</maxFieldLength>

    <!-- If true, unlock any held write or commit locks on startup.
         This defeats the locking mechanism that allows multiple
         processes to safely access a lucene index, and should be
         used with care.
         This is not needed if lock type is 'none' or 'single'
     -->
    <unlockOnStartup>false</unlockOnStartup>

    <useRAMDirectory>false</useRAMDirectory>
  </mainIndex>
=======================================================================

Could there be any other reason for the index size to triple?
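
One way to narrow this down might be to snapshot the index directory while
the optimize is running and see which file types account for the extra
space. A minimal sketch in Python, assuming the index sits in a single flat
directory; the function names and the path in the usage note are
hypothetical, not part of Solr or Lucene:

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Return {extension: total_bytes} for the files directly inside index_dir."""
    totals = defaultdict(int)
    for entry in os.scandir(index_dir):
        if entry.is_file():
            # Segment files look like _a.fdt or _a.cfs; files without an
            # extension (e.g. segments_N) are grouped under "(none)".
            ext = os.path.splitext(entry.name)[1] or "(none)"
            totals[ext] += entry.stat().st_size
    return dict(totals)


def report(index_dir):
    """Print per-extension totals, largest first, followed by the grand total."""
    totals = size_by_extension(index_dir)
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ext:>8}  {size / 2**30:8.2f} GiB")
    print(f"{'total':>8}  {sum(totals.values()) / 2**30:8.2f} GiB")
```

Calling report("/path/to/solr/data/index") (a placeholder path) before,
during, and after an optimize would show where the extra space goes. A
large .cfs total, for instance, would suggest compound files are being
written despite the setting above.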

Thanks.

Qingdi


Ryuuichi KUMAI wrote:
> 
> Hello Qingdi,
> 
> Have you changed the "<useCompoundFile>" setting in solrconfig.xml?
> In my experience, when using compound-file index
> ("<useCompoundFile>true</useCompoundFile>"),
> the size of the index grows to as much as triple during optimization.
> My understanding is that when writing a new segment in compound format,
> Lucene writes the multifile format first and then creates the compound
> index.
> So just before optimization finishes, the index size has almost
> tripled.
> 
> Regards,
> Ryuuichi Kumai.
> 
> 2009/1/28 Qingdi <qin...@nextbio.com>:
>>
>>
>> Hi,
>>
>> Starting about one week ago, our index size has been tripling during
>> optimization.
>>
>> The current index statistics are:
>> numDocs : 192702132
>> size: 76G
>> We run an optimization after every 6M document updates.
>>
>> Since we keep getting new data, the index size increases every day.
>> Before, the index size only doubled during optimization.
>>
>> Why does the index size triple instead of double during optimization?
>> Is there anything we can do to keep the index from growing beyond
>> double during optimization?
>>
>> Thanks.
>>
>> Qingdi
>> --
>> View this message in context:
>> http://www.nabble.com/index-size-tripled-during-optimization-tp21691596p21691596.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/index-size-tripled-during-optimization-tp21691596p21710810.html
Sent from the Solr - User mailing list archive at Nabble.com.
