Re: Solr indexing configuration help

Otis Gospodnetic Wed, 28 May 2008 21:20:47 -0700

Gaku,

But what's this then:


>> JVM version:
>>        java version "1.7.0"
>> IcedTea Runtime Environment (build 1.7.0-b21)
>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)


Get the JVM from Sun.  Also, why do you have autoCommit on if all you are 
testing is indexing?  I'd turn that off.  The Java process going away sounds 
bad and smells like a Java/JVM problem more than a Solr problem.
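
As a rough sketch only, turning autoCommit off could look like the fragment
below.  It assumes the layout of the stock example solrconfig.xml, where the
autoCommit block sits inside the updateHandler element; the element names come
from that example config rather than from your file, and the 50 / 180000
values are the ones from your message.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- autoCommit disabled for the indexing-only stress test
       (assumed layout, taken from the example solrconfig.xml)
  <autoCommit>
    <maxDocs>50</maxDocs>
    <maxTime>180000</maxTime>
  </autoCommit>
  -->
</updateHandler>
```

With that block commented out, added documents only become visible once the
client posts an explicit <commit/> message to the update handler.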

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Gaku Mak <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, May 28, 2008 10:30:39 PM
> Subject: Re: Solr indexing configuration help
> 
> 
> I used the admin GUI to get the java info.
> java.vm.specification.vendor = Sun Microsystems Inc.
> 
> Any suggestion?  Thanks a lot for your help!!
> 
> -Gaku
> 
> 
> Yonik Seeley wrote:
> > 
> > Not sure why you would be getting an OOM from just indexing, and with
> > the 1.5G heap you've given the JVM.
> > Have you tried Sun's JVM?
> > 
> > -Yonik
> > 
> > On Wed, May 28, 2008 at 7:35 PM, gaku113 wrote:
> >>
> >> Hi all Solr users/developers/experts,
> >>
> >> I have the following scenario and would appreciate any advice on tuning
> >> my Solr master server.
> >>
> >> I have a field in my schema that would be indexed (but not stored) and
> >> holds about 10000 ids per document.  This field is expected to govern
> >> the size of the document.  Each id can contain up to 6 characters.  I
> >> figure that there are two alternatives for this field: one is to use a
> >> multi-valued string field, and the other is to pass a
> >> whitespace-delimited string to Solr and have Solr tokenize it on
> >> whitespace (the text_ws fieldType).  The master server is expected to
> >> receive a constant stream of updates.
> >>
> >> The expected/estimated document size can range from 50k to 100k for a
> >> single document.  (I know this is quite large.)  The number of documents
> >> is expected to be around 200,000 on each master server, and there can be
> >> multiple master servers (sharding).  I would like the master to handle
> >> more docs too, if I can figure out a way.
> >>
> >> Currently, I'm performing some basic stress tests to simulate the
> >> indexing side on the master server.  The stress test continuously adds
> >> new documents at a rate of about 10 documents every 30 seconds.
> >> Autocommit is being used (50 docs and 180 seconds constraints), but I
> >> have no idea if this is the preferred way.  The goal is to keep adding
> >> new documents until we get at least 200,000 documents (or about 20GB of
> >> index) on the master (or even more if the server can handle it).
> >>
> >> What I experienced from the indexing stress test is that the master
> >> server stopped responding after a while (it was no longer pingable) at
> >> about 30k documents.  The log entries are mostly:
> >> java.lang.OutOfMemoryError: Java heap space
> >> OR
> >> Ping query caused exception: null (this is probably caused by the OOM
> >> problem)
> >>
> >> There were also a few cases where the Java process went away entirely.
> >>
> >> Questions:
> >> 1) Is it better to use the multi-valued string field or the text_ws
> >>    field for this large field?
> >> 2) Is it better to have more outstanding docs per commit or more frequent
> >>    commits, in terms of maximizing server resources?  What is the
> >>    preferred way to commit documents, assuming that the Solr master
> >>    receives updates frequently?  How many updated docs should there be
> >>    before issuing a commit?
> >> 3) How can I avoid the OOM problem in my case?  I'm already using
> >>    (-Xms1536M -Xmx1536M) on a 2-GB machine.  Is that not enough?  I'm
> >>    concerned that adding more RAM would just delay the OOM problem.  Are
> >>    there any additional JVM options to consider?
> >> 4) Any recommendation for the master server configuration, so that I can
> >>    maximize the number of indexed docs?
> >> 5) How can I disable caching on the master altogether, as queries won't
> >>    hit the master?
> >> 6) For an average doc size of 50k-100k, is that too large for Solr, or is
> >>    Solr even the right tool?  If not, any alternative?  If we are able to
> >>    reduce the size of docs, can we expect to index more documents?
> >>
> >> The following is info related to software/hardware/configuration:
> >>
> >> Solr version (solr nightly build on 5/23/2008)
> >>        Solr Specification Version: 1.2.2008.05.23.08.06.59
> >>        Solr Implementation Version: nightly
> >>        Lucene Specification Version: 2.3.2
> >>        Lucene Implementation Version: 2.3.2 652650
> >>        Jetty: 6.1.3
> >>
> >> Schema.xml (the sections that I think are relevant to the master server.)
> >>
> >>    
> >> omitNorms="true"/>
> >>    
> >> positionIncrementGap="100">
> >>      
> >>        
> >>      
> >>    
> >>
> >> 
> >> required="true"
> >> />
> >> 
> >> multiValued="true" omitNorms="true"/>
> >>        
> >> stored="false"
> >> omitNorms="true"/>
> >>
> >> id
> >>
> >> Solrconfig.xml
> >>  
> >>    false
> >>    10
> >>    500
> >>    50
> >>    5000
> >>    20000
> >>    1000
> >>    10000
> >>
> >> org.apache.lucene.index.LogByteSizeMergePolicy
> >> org.apache.lucene.index.ConcurrentMergeScheduler
> >>    single
> >>  
> >>
> >>  
> >>    false
> >>    50
> >>    10
> >>    
> >>    500
> >>    5000
> >>    20000
> >>    false
> >>  
> >>  
> >>
> >>    
> >>      50
> >>      180000
> >>    
> >>    
> >>      solr/bin/snapshooter
> >>      .
> >>      true
> >>    
> >>  
> >>
> >>  
> >>    50
> >>    
> >>      class="solr.LRUCache"
> >>      size="0"
> >>      initialSize="0"
> >>      autowarmCount="0"/>
> >>    
> >>      class="solr.LRUCache"
> >>      size="0"
> >>      initialSize="0"
> >>      autowarmCount="0"/>
> >>    
> >>      class="solr.LRUCache"
> >>      size="0"
> >>      initialSize="0"
> >>      autowarmCount="0"/>
> >>    true
> >>
> >>    1
> >>    1
> >>    
> >>    
> >>      
> >>        user_id 0 
> >> name="rows">1 
> >>        static newSearcher warming query from
> >> solrconfig.xml
> >>      
> >>    
> >>    
> >>      
> >>        fast_warm 0 
> >> name="rows">10 
> >>        static firstSearcher warming query from
> >> solrconfig.xml
> >>      
> >>    
> >>    false
> >>    4
> >>  
> >>
> >> Replication:
> >>        The snappuller is scheduled to run every 15 mins for now.
> >>
> >> Hardware:
> >>        AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive
> >>
> >> OS:
> >>        Fedora 8 (64-bit)
> >>
> >> JVM version:
> >>        java version "1.7.0"
> >> IcedTea Runtime Environment (build 1.7.0-b21)
> >> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
> >>
> >> Java options:
> >>        java  -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M
> >> -XX:+UseParallelGC -jar start.jar
> >>
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17526135.html
> Sent from the Solr - User mailing list archive at Nabble.com.
