Re: Solr indexing configuration help

Gaku Mak Wed, 28 May 2008 19:31:12 -0700

I used the admin GUI to get the Java info:
java.vm.specification.vendor = Sun Microsystems Inc.

Any suggestions?  Thanks a lot for your help!!

-Gaku


Yonik Seeley wrote:
> 
> Not sure why you would be getting an OOM from just indexing, and with
> the 1.5G heap you've given the JVM.
> Have you tried Sun's JVM?
> 
> -Yonik
> 
> On Wed, May 28, 2008 at 7:35 PM, gaku113 <[EMAIL PROTECTED]> wrote:
>>
>> Hi all Solr users/developers/experts,
>>
>> I have the following scenario and would appreciate any advice on tuning
>> my Solr master server.
>>
>> I have a field in my schema that indexes (but does not store) about
>> 10000 ids for each document.  This field is expected to govern the size
>> of the document.  Each id can contain up to 6 characters.  I figure that
>> there are two alternatives for this field: one is to use a multi-valued
>> string field, and the other is to pass a whitespace-delimited string to
>> Solr and have Solr tokenize it on whitespace (the text_ws fieldType).
>> The master server is expected to receive a constant stream of updates.
>>
>> The estimated document size ranges from 50k to 100k for a single
>> document.  (I know this is quite large.)  The number of documents is
>> expected to be around 200,000 on each master server, and there can be
>> multiple master servers (sharding).  I would like the master to handle
>> more docs too, if I can figure out a way.
>>
>> Currently, I'm performing some basic stress tests to simulate the
>> indexing side on the master server.  The stress test continuously adds
>> new documents at a rate of about 10 documents every 30 seconds.
>> Autocommit is being used (50 docs and 180 seconds constraints), but I
>> have no idea if this is the preferred way.  The goal is to keep adding
>> new documents until we get at least 200,000 documents (or about 20GB of
>> index) on the master (or even more if the server can handle it).
>>
>> What I saw in the indexing stress test is that the master server
>> stopped responding after a while (for example, it could no longer be
>> pinged) at about 30k documents.  The log entries are mostly:
>> java.lang.OutOfMemoryError: Java heap space
>> OR
>> Ping query caused exception: null (this is probably caused by the OOM
>> problem)
>>
>> There were also a few cases where the java process went away entirely.
>>
>> Questions:
>> 1)      Is it better to use the multi-valued string field or the text_ws
>>         field for this large field?
>> 2)      Is it better to have more outstanding docs per commit or more
>>         frequent commits, in terms of maximizing server resources?  What
>>         is the preferred way to commit documents, assuming that the Solr
>>         master receives updates frequently?  How many updated docs
>>         should there be before issuing a commit?
>> 3)      How do I avoid the OOM problem in my case?  I'm already using
>>         (-Xms1536M -Xmx1536M) on a 2-GB machine.  Is that not enough?
>>         I'm concerned that adding more RAM would just delay the OOM
>>         problem.  Any additional JVM options to consider?
>> 4)      Any recommendations for the master server configuration, so that
>>         I can maximize the number of indexed docs?
>> 5)      How can I disable caching on the master altogether, as queries
>>         won't hit the master?
>> 6)      For an average doc size of 50k-100k, is that too large for Solr,
>>         or is Solr even the right tool?  If not, any alternatives?  If
>>         we are able to reduce the size of docs, can we expect to index
>>         more documents?
>>
>> The following is info related to software/hardware/configuration:
>>
>> Solr version (solr nightly build on 5/23/2008)
>>        Solr Specification Version: 1.2.2008.05.23.08.06.59
>>        Solr Implementation Version: nightly
>>        Lucene Specification Version: 2.3.2
>>        Lucene Implementation Version: 2.3.2 652650
>>        Jetty: 6.1.3
>>
>> Schema.xml (the sections that I think are relevant to the master server.)
>>
>>    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
>>               omitNorms="true"/>
>>    <fieldType name="text_ws" class="solr.TextField"
>>               positionIncrementGap="100">
>>      <analyzer>
>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>      </analyzer>
>>    </fieldType>
>>
>>    <field name="id" type="string" indexed="true" stored="true"
>>           required="true"/>
>>    <field name="hex_id_multi" type="string" indexed="true" stored="false"
>>           multiValued="true" omitNorms="true"/>
>>    <field name="hex_id_string" type="text_ws" indexed="true"
>>           stored="false" omitNorms="true"/>
>>
>> <uniqueKey>id</uniqueKey>
>>
>> Solrconfig.xml
>>  <indexDefaults>
>>    <useCompoundFile>false</useCompoundFile>
>>    <mergeFactor>10</mergeFactor>
>>    <maxBufferedDocs>500</maxBufferedDocs>
>>    <ramBufferSizeMB>50</ramBufferSizeMB>
>>    <maxMergeDocs>5000</maxMergeDocs>
>>    <maxFieldLength>20000</maxFieldLength>
>>    <writeLockTimeout>1000</writeLockTimeout>
>>    <commitLockTimeout>10000</commitLockTimeout>
>>    <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
>>    <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
>>    <lockType>single</lockType>
>>  </indexDefaults>
>>
>>  <mainIndex>
>>    <useCompoundFile>false</useCompoundFile>
>>    <ramBufferSizeMB>50</ramBufferSizeMB>
>>    <mergeFactor>10</mergeFactor>
>>    <!-- Deprecated -->
>>    <maxBufferedDocs>500</maxBufferedDocs>
>>    <maxMergeDocs>5000</maxMergeDocs>
>>    <maxFieldLength>20000</maxFieldLength>
>>    <unlockOnStartup>false</unlockOnStartup>
>>  </mainIndex>
>>  <updateHandler class="solr.DirectUpdateHandler2">
>>
>>    <autoCommit>
>>      <maxDocs>50</maxDocs>
>>      <maxTime>180000</maxTime>
>>    </autoCommit>
>>    <listener event="postCommit" class="solr.RunExecutableListener">
>>      <str name="exe">solr/bin/snapshooter</str>
>>      <str name="dir">.</str>
>>      <bool name="wait">true</bool>
>>    </listener>
>>  </updateHandler>
>>
>>  <query>
>>    <maxBooleanClauses>50</maxBooleanClauses>
>>    <filterCache
>>      class="solr.LRUCache"
>>      size="0"
>>      initialSize="0"
>>      autowarmCount="0"/>
>>    <queryResultCache
>>      class="solr.LRUCache"
>>      size="0"
>>      initialSize="0"
>>      autowarmCount="0"/>
>>    <documentCache
>>      class="solr.LRUCache"
>>      size="0"
>>      initialSize="0"
>>      autowarmCount="0"/>
>>    <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>
>>    <queryResultWindowSize>1</queryResultWindowSize>
>>    <queryResultMaxDocsCached>1</queryResultMaxDocsCached>
>>    <HashDocSet maxSize="1000" loadFactor="0.75"/>
>>    <listener event="newSearcher" class="solr.QuerySenderListener">
>>      <arr name="queries">
>>        <lst> <str name="q">user_id</str> <str name="start">0</str>
>>              <str name="rows">1</str> </lst>
>>        <lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
>>      </arr>
>>    </listener>
>>    <listener event="firstSearcher" class="solr.QuerySenderListener">
>>      <arr name="queries">
>>        <lst> <str name="q">fast_warm</str> <str name="start">0</str>
>>              <str name="rows">10</str> </lst>
>>        <lst><str name="q">static firstSearcher warming query from solrconfig.xml</str></lst>
>>      </arr>
>>    </listener>
>>    <useColdSearcher>false</useColdSearcher>
>>    <maxWarmingSearchers>4</maxWarmingSearchers>
>>  </query>
>>
>> Replication:
>>        The snappuller is scheduled to run every 15 mins for now.
>>
>> Hardware:
>>        AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive
>>
>> OS:
>>        Fedora 8 (64-bit)
>>
>> JVM version:
>>        java version "1.7.0"
>>        IcedTea Runtime Environment (build 1.7.0-b21)
>>        IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>>
>> Java options:
>>        java  -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M
>>        -XX:+UseParallelGC -jar start.jar
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
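
As a rough sanity check on the figures in the quoted message, the arithmetic can be sketched in Python. This is only a back-of-the-envelope estimate under stated assumptions: ids are ASCII, each id is followed by one separator character, and documents arrive at a perfectly steady rate. The function and constant names are hypothetical and are not part of Solr.

```python
# Figures taken from the quoted message; everything else is an assumption.
IDS_PER_DOC = 10_000          # "about ~10000 ids for each document"
MAX_ID_CHARS = 6              # "up to 6 characters"
DOCS_PER_BATCH = 10           # "10 documents every 30 seconds"
BATCH_INTERVAL_S = 30
TARGET_DOCS = 200_000         # "at least 200,000 documents"
AUTOCOMMIT_MAX_DOCS = 50      # <maxDocs>50</maxDocs>
AUTOCOMMIT_MAX_TIME_S = 180   # <maxTime>180000</maxTime> (milliseconds)


def field_size_bytes(ids=IDS_PER_DOC, id_chars=MAX_ID_CHARS):
    """Upper bound for the whitespace-delimited field.

    Assumes ASCII ids, each followed by a single separator character.
    """
    return ids * (id_chars + 1)


def seconds_to_reach(target_docs=TARGET_DOCS):
    """Time to add target_docs at the steady stress-test rate."""
    return target_docs * BATCH_INTERVAL_S / DOCS_PER_BATCH


def seconds_per_maxdocs_commit():
    """Average time to accumulate AUTOCOMMIT_MAX_DOCS at the steady rate."""
    return AUTOCOMMIT_MAX_DOCS * BATCH_INTERVAL_S / DOCS_PER_BATCH


def commits_to_reach(target_docs=TARGET_DOCS):
    """Number of maxDocs-triggered commits needed to reach target_docs."""
    return target_docs // AUTOCOMMIT_MAX_DOCS


if __name__ == "__main__":
    print("field size upper bound (bytes):", field_size_bytes())
    print("days to reach target:", seconds_to_reach() / 86_400)
    print("seconds per maxDocs commit:", seconds_per_maxdocs_commit())
    print("maxDocs fires before maxTime:",
          seconds_per_maxdocs_commit() < AUTOCOMMIT_MAX_TIME_S)
    print("commits to reach target:", commits_to_reach())
```

Under these assumptions the id field is at most 70,000 bytes, which fits the quoted 50k-100k document size. Reaching 200,000 documents takes 600,000 seconds, a little under 7 days. On average, 50 documents accumulate every 150 seconds, so the maxDocs limit would fire before the 180-second maxTime limit, giving about 4,000 commits over the run. Since the quoted solrconfig.xml runs snapshooter on every postCommit event, that would also mean roughly as many snapshots.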

-- 
View this message in context: 
http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17526135.html
Sent from the Solr - User mailing list archive at Nabble.com.
