Re: Solr indexing configuration help

Yonik Seeley Thu, 29 May 2008 19:19:35 -0700

It's most likely a
1) hardware issue: bad memory
 OR
2) incompatible libraries (most likely libc version for the JVM).


If you have another box around, try running the same test on it.

-Yonik

On Thu, May 29, 2008 at 9:51 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
>
> Hi Yonik and others,
>
> I'm getting this Java error after switching to JVM 1.6.0_03.  The error
> occurs after the stress test has been running for a while; it failed at the
> 12K-doc level and again at 18K.  Am I doing something wrong?  Please help!
>
> Thanks!
>
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00002aaaaadfbf6d, pid=25030, tid=1079175504
> #
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.6.0_03-b05 mixed mode)
> # Problematic frame:
> # V  [libjvm.so+0x230f6d]
> #
> # An error report file with more information is saved as hs_err_pid25030.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
>
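
The crash header above can be picked apart mechanically. What follows is a small Python sketch, assuming the header lines are pasted in verbatim. The names CRASH_HEADER, FRAME_TYPES, and summarize are hypothetical, and the frame-type letters follow the legend that HotSpot error reports use.

```python
import re

# Lines copied from the crash report quoted above.
CRASH_HEADER = """\
#  SIGSEGV (0xb) at pc=0x00002aaaaadfbf6d, pid=25030, tid=1079175504
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.6.0_03-b05 mixed mode)
# Problematic frame:
# V  [libjvm.so+0x230f6d]
"""

# Frame-type letters as they appear in HotSpot error reports.
FRAME_TYPES = {
    "C": "native code",
    "V": "VM code",
    "v": "VM-generated stub",
    "J": "compiled Java code",
    "j": "interpreted Java code",
}


def summarize(header):
    """Extract the signal, JVM build, and faulting frame from a crash header."""
    sig = re.search(r"(SIG\w+) \(0x[0-9a-f]+\) at pc=(0x[0-9a-f]+)", header)
    vm = re.search(r"# Java VM: (.+?) \(([^ )]+)", header)
    frame = re.search(r"# ([CVJjv])\s+\[([^+\]]+)\+(0x[0-9a-f]+)\]", header)
    pc = int(sig.group(2), 16)
    offset = int(frame.group(3), 16)
    return {
        "signal": sig.group(1),
        "jvm_build": vm.group(2),
        "frame_type": FRAME_TYPES[frame.group(1)],
        "library": frame.group(2),
        # pc = load address + offset, so the library's load address follows.
        "library_base": hex(pc - offset),
    }


info = summarize(CRASH_HEADER)
for key, value in info.items():
    print(f"{key}: {value}")
```

A "V" frame inside libjvm.so means the fault happened in the JVM's own code, not in Java code or a third-party native library, which fits a diagnosis at the hardware or system-library level.
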
> -Gaku
>
>
> Yonik Seeley wrote:
>>
>> On Wed, May 28, 2008 at 10:30 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
>>> I used the admin GUI to get the java info.
>>> java.vm.specification.vendor = Sun Microsystems Inc.
>> Well, your original email listed IcedTea... but that is mostly Sun
>> code,  so maybe that's why the vendor is still listed as Sun.
>>
>> I'd recommend downloading 1.6.0_03 from java.sun.com and trying that.
>>
>> Later versions (1.6.0_04+) have a JVM bug that bites Lucene, so stick
>> with 1.6.0_03 for now.
>>
>> -Yonik
>>
>>
>>> Any suggestion?  Thanks a lot for your help!!
>>>
>>> -Gaku
>>>
>>>
>>> Yonik Seeley wrote:
>>>>
>>>> Not sure why you would be getting an OOM from just indexing, and with
>>>> the 1.5G heap you've given the JVM.
>>>> Have you tried Sun's JVM?
>>>>
>>>> -Yonik
>>>>
>>>> On Wed, May 28, 2008 at 7:35 PM, gaku113 <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Hi all Solr users/developers/experts,
>>>>>
>>>>> I have the following scenario and would appreciate any advice on tuning
>>>>> my Solr master server.
>>>>>
>>>>> I have a field in my schema that would index (but not store) about
>>>>> 10,000 ids for each document.  This field is expected to account for most
>>>>> of the document's size.  Each id can contain up to 6 characters.  I figure
>>>>> that there are two alternatives for this field: one is to use a
>>>>> multi-valued string field, and the other is to pass a
>>>>> whitespace-delimited string to Solr and have Solr tokenize it on
>>>>> whitespace (the text_ws fieldType).
>>>>> The master server is expected to receive a constant stream of updates.
>>>>>
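
For reference, the two alternatives described above could be sent to Solr's XML update handler roughly as follows. This is a minimal Python sketch, not the poster's actual client code: the field names hex_id_multi and hex_id_string come from the schema quoted further down, while the function name add_doc_xml, the document id, and the sample ids are made up for illustration.

```python
import xml.etree.ElementTree as ET


def add_doc_xml(doc_id, hex_ids, multi_valued):
    """Build a Solr <add> message carrying the ids in one of the two forms."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    if multi_valued:
        # Alternative 1: one <field> element per id (multi-valued string field).
        for hex_id in hex_ids:
            ET.SubElement(doc, "field", name="hex_id_multi").text = hex_id
    else:
        # Alternative 2: a single whitespace-delimited value for text_ws.
        joined = " ".join(hex_ids)
        ET.SubElement(doc, "field", name="hex_id_string").text = joined
    return ET.tostring(add, encoding="unicode")


sample_ids = ["0a1b2c", "ffee01", "7d"]  # hypothetical ids, up to 6 characters
multi = add_doc_xml("doc-1", sample_ids, multi_valued=True)
ws = add_doc_xml("doc-1", sample_ids, multi_valued=False)
print(multi)
print(ws)
```

The multi-valued form repeats the field element once per id, so the request body is larger on the wire. That alone says nothing about which form indexes more efficiently.
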
>>>>> The estimated document size ranges from 50k to 100k for a single
>>>>> document.  (I know this is quite large.)  The number of documents is
>>>>> expected to be around 200,000 on each master server, and there can be
>>>>> multiple master servers (sharding).  I would like the master to handle
>>>>> more docs too, if I can figure out a way.
>>>>>
>>>>> Currently, I'm performing some basic stress tests to simulate the
>>>>> indexing side on the master server.  The stress test continuously adds
>>>>> new documents at a rate of about 10 documents every 30 seconds.
>>>>> Autocommit is being used (50 docs and 180 seconds constraints), but I
>>>>> have no idea if this is the preferred way.  The goal is to keep adding
>>>>> new documents until we reach at least 200,000 documents (or about 20GB
>>>>> of index) on the master (or even more if the server can handle it).
>>>>>
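
The figures in the stress-test description can be turned into a rough timeline. The Python sketch below rests on several assumptions that are not confirmed in the thread: documents arrive at a steady rate, the maxDocs trigger counts adds since the last commit, maxTime is measured from the first uncommitted add, and "50k to 100k" means kilobytes of raw document. The constant and function names are hypothetical.

```python
# Figures taken from the message above; the steady arrival rate is an assumption.
DOCS_PER_BATCH = 10
BATCH_SECONDS = 30
AUTOCOMMIT_MAX_DOCS = 50
AUTOCOMMIT_MAX_TIME_S = 180
TARGET_DOCS = 200_000


def seconds_to_add(n_docs):
    """Seconds needed to add n_docs at the stress-test rate."""
    return n_docs * BATCH_SECONDS / DOCS_PER_BATCH


# Whichever autocommit limit is reached first starts the commit.
secs_to_max_docs = seconds_to_add(AUTOCOMMIT_MAX_DOCS)
trigger = "maxDocs" if secs_to_max_docs <= AUTOCOMMIT_MAX_TIME_S else "maxTime"
secs_per_commit = min(secs_to_max_docs, AUTOCOMMIT_MAX_TIME_S)

total_seconds = seconds_to_add(TARGET_DOCS)
total_days = total_seconds / 86_400
total_commits = total_seconds / secs_per_commit

# Raw input volume for 50 KB and 100 KB documents, in decimal gigabytes.
raw_gb = [TARGET_DOCS * kb / 1_000_000 for kb in (50, 100)]

print(f"commit trigger: {trigger}, one commit every {secs_per_commit:.0f} s")
print(f"time to reach target: {total_days:.2f} days, {total_commits:.0f} commits")
print(f"raw document volume: {raw_gb[0]:.0f} to {raw_gb[1]:.0f} GB")
```

Under these assumptions the document-count limit fires before the time limit, and since the configuration quoted below runs snapshooter on every postCommit event, each of those commits also takes a snapshot. The raw-volume range is input size only; the size of the resulting index may differ because the large field is not stored.
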
>>>>> What I experienced from the indexing stress test is that the master
>>>>> server stopped responding after a while (for example, it was no longer
>>>>> pingable) at about 30k documents.  The log entries are mostly:
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>> OR
>>>>> Ping query caused exception: null (this is probably caused by the OOM
>>>>> problem)
>>>>>
>>>>> There were also a few cases where the Java process disappeared entirely.
>>>>>
>>>>> Questions:
>>>>> 1)      Is it better to use the multi-valued string field or the text_ws
>>>>>         field for this large field?
>>>>> 2)      Is it better to have more outstanding docs per commit or more
>>>>>         frequent commits, in terms of maximizing server resources?  What
>>>>>         is the preferred way to commit documents, assuming that the Solr
>>>>>         master receives updates frequently?  How many updated docs should
>>>>>         there be before issuing a commit?
>>>>> 3)      How can I avoid the OOM problem in my case?  I'm already using
>>>>>         (-Xms1536M -Xmx1536M) on a 2-GB machine.  Is that not enough?  I'm
>>>>>         concerned that adding more RAM would just delay the OOM problem.
>>>>>         Any additional JVM options to consider?
>>>>> 4)      Any recommendations for the master server configuration, so that
>>>>>         I can maximize the number of indexed docs?
>>>>> 5)      How can I disable caching on the master altogether, as queries
>>>>>         won't hit the master?
>>>>> 6)      For an average doc size of 50k-100k, is that too large for Solr,
>>>>>         or is Solr even the right tool?  If not, any alternatives?  If we
>>>>>         are able to reduce the size of docs, can we expect to index more
>>>>>         documents?
>>>>>
>>>>> The following is info related to software/hardware/configuration:
>>>>>
>>>>> Solr version (solr nightly build on 5/23/2008)
>>>>>        Solr Specification Version: 1.2.2008.05.23.08.06.59
>>>>>        Solr Implementation Version: nightly
>>>>>        Lucene Specification Version: 2.3.2
>>>>>        Lucene Implementation Version: 2.3.2 652650
>>>>>        Jetty: 6.1.3
>>>>>
>>>>> Schema.xml (the sections that I think are relevant to the master server)
>>>>>
>>>>>    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
>>>>>    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
>>>>>      <analyzer>
>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>      </analyzer>
>>>>>    </fieldType>
>>>>>
>>>>>    <field name="id" type="string" indexed="true" stored="true" required="true"/>
>>>>>    <field name="hex_id_multi" type="string" indexed="true" stored="false" multiValued="true" omitNorms="true"/>
>>>>>    <field name="hex_id_string" type="text_ws" indexed="true" stored="false" omitNorms="true"/>
>>>>>
>>>>> <uniqueKey>id</uniqueKey>
>>>>>
>>>>> Solrconfig.xml
>>>>>  <indexDefaults>
>>>>>    <useCompoundFile>false</useCompoundFile>
>>>>>    <mergeFactor>10</mergeFactor>
>>>>>    <maxBufferedDocs>500</maxBufferedDocs>
>>>>>    <ramBufferSizeMB>50</ramBufferSizeMB>
>>>>>    <maxMergeDocs>5000</maxMergeDocs>
>>>>>    <maxFieldLength>20000</maxFieldLength>
>>>>>    <writeLockTimeout>1000</writeLockTimeout>
>>>>>    <commitLockTimeout>10000</commitLockTimeout>
>>>>>
>>>>>    <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
>>>>>    <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
>>>>>    <lockType>single</lockType>
>>>>>  </indexDefaults>
>>>>>
>>>>>  <mainIndex>
>>>>>    <useCompoundFile>false</useCompoundFile>
>>>>>    <ramBufferSizeMB>50</ramBufferSizeMB>
>>>>>    <mergeFactor>10</mergeFactor>
>>>>>    <!-- Deprecated -->
>>>>>    <maxBufferedDocs>500</maxBufferedDocs>
>>>>>    <maxMergeDocs>5000</maxMergeDocs>
>>>>>    <maxFieldLength>20000</maxFieldLength>
>>>>>    <unlockOnStartup>false</unlockOnStartup>
>>>>>  </mainIndex>
>>>>>  <updateHandler class="solr.DirectUpdateHandler2">
>>>>>
>>>>>    <autoCommit>
>>>>>      <maxDocs>50</maxDocs>
>>>>>      <maxTime>180000</maxTime>
>>>>>    </autoCommit>
>>>>>    <listener event="postCommit" class="solr.RunExecutableListener">
>>>>>      <str name="exe">solr/bin/snapshooter</str>
>>>>>      <str name="dir">.</str>
>>>>>      <bool name="wait">true</bool>
>>>>>    </listener>
>>>>>  </updateHandler>
>>>>>
>>>>>  <query>
>>>>>    <maxBooleanClauses>50</maxBooleanClauses>
>>>>>    <filterCache
>>>>>      class="solr.LRUCache"
>>>>>      size="0"
>>>>>      initialSize="0"
>>>>>      autowarmCount="0"/>
>>>>>    <queryResultCache
>>>>>      class="solr.LRUCache"
>>>>>      size="0"
>>>>>      initialSize="0"
>>>>>      autowarmCount="0"/>
>>>>>    <documentCache
>>>>>      class="solr.LRUCache"
>>>>>      size="0"
>>>>>      initialSize="0"
>>>>>      autowarmCount="0"/>
>>>>>    <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>>>>
>>>>>    <queryResultWindowSize>1</queryResultWindowSize>
>>>>>    <queryResultMaxDocsCached>1</queryResultMaxDocsCached>
>>>>>    <HashDocSet maxSize="1000" loadFactor="0.75"/>
>>>>>    <listener event="newSearcher" class="solr.QuerySenderListener">
>>>>>      <arr name="queries">
>>>>>        <lst> <str name="q">user_id</str> <str name="start">0</str> <str name="rows">1</str> </lst>
>>>>>        <lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
>>>>>      </arr>
>>>>>    </listener>
>>>>>    <listener event="firstSearcher" class="solr.QuerySenderListener">
>>>>>      <arr name="queries">
>>>>>        <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
>>>>>        <lst><str name="q">static firstSearcher warming query from solrconfig.xml</str></lst>
>>>>>      </arr>
>>>>>    </listener>
>>>>>    <useColdSearcher>false</useColdSearcher>
>>>>>    <maxWarmingSearchers>4</maxWarmingSearchers>
>>>>>  </query>
>>>>>
>>>>> Replication:
>>>>>        The snappuller is scheduled to run every 15 mins for now.
>>>>>
>>>>> Hardware:
>>>>>        AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive
>>>>>
>>>>> OS:
>>>>>        Fedora 8 (64-bit)
>>>>>
>>>>> JVM version:
>>>>>        java version "1.7.0"
>>>>> IcedTea Runtime Environment (build 1.7.0-b21)
>>>>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>>>>>
>>>>> Java options:
>>>>>        java -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M -XX:+UseParallelGC -jar start.jar
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17526135.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17550056.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
