I used the admin GUI to get the Java info:

java.vm.specification.vendor = Sun Microsystems Inc.
Any suggestions? Thanks a lot for your help!!

-Gaku


Yonik Seeley wrote:
>
> Not sure why you would be getting an OOM from just indexing, and with
> the 1.5G heap you've given the JVM.
> Have you tried Sun's JVM?
>
> -Yonik
>
> On Wed, May 28, 2008 at 7:35 PM, gaku113 <[EMAIL PROTECTED]> wrote:
>>
>> Hi all Solr users/developers/experts,
>>
>> I have the following scenario and I would appreciate any advice on
>> tuning my Solr master server.
>>
>> I have a field in my schema that would index (but not store) about
>> 10000 ids for each document. This field is expected to govern the size
>> of the document. Each id can contain up to 6 characters. I figure that
>> there are two alternatives for this field: one is to use a multi-valued
>> string field, and the other is to pass a whitespace-delimited string to
>> Solr and have Solr tokenize that string on whitespace (the text_ws
>> fieldType). The master server is expected to receive a constant stream
>> of updates.
>>
>> The expected/estimated document size can range from 50k to 100k for a
>> single document. (I know this is quite large.) The number of documents
>> is expected to be around 200,000 on each master server, and there can be
>> multiple master servers (sharding). I would like the master to handle
>> more docs too, if I can figure out a way.
>>
>> Currently, I'm performing some basic stress tests to simulate the
>> indexing side on the master server. The stress test continuously adds
>> new documents at a rate of about 10 documents every 30 seconds.
>> Autocommit is being used (50 docs and 180 seconds constraints), but I
>> have no idea if this is the preferred way. The goal is to keep adding
>> new documents until we get at least 200,000 documents (or about 20GB of
>> index) on the master (or even more if the server can handle it).
>>
>> What I experienced from the indexing stress test is that the master
>> server stopped responding after a while (for example, it could no longer
>> be pinged) once there were about 30k documents. The log entries are
>> mostly:
>>   java.lang.OutOfMemoryError: Java heap space
>> OR
>>   Ping query caused exception: null (this is probably caused by the OOM
>>   problem)
>>
>> There were also a few cases where the java process went away entirely.
>>
>> Questions:
>> 1) Is it better to use the multi-valued string field or the text_ws
>>    field for this large field?
>> 2) Is it better to have more outstanding docs per commit or more
>>    frequent commits, in terms of maximizing server resources? What is
>>    the preferred way to commit documents, assuming that the Solr master
>>    receives updates frequently? How many updated docs should there be
>>    before issuing a commit?
>> 3) How can I avoid the OOM problem in my case? I'm already using
>>    (-Xms1536M -Xmx1536M) on a 2-GB machine. Is that not enough? I'm
>>    concerned that adding more RAM would just delay the OOM problem. Are
>>    there any additional JVM options to consider?
>> 4) Any recommendation for the master server configuration, in the
>>    sense that I can maximize the number of indexed docs?
>> 5) How can I disable caching on the master altogether, since queries
>>    won't hit the master?
>> 6) Is an average doc size of 50k-100k too large for Solr, and is Solr
>>    even the right tool? If not, is there any alternative? If we are able
>>    to reduce the size of the docs, can we expect to index more documents?
>>
>> The following is info related to software/hardware/configuration:
>>
>> Solr version (solr nightly build on 5/23/2008)
>>   Solr Specification Version: 1.2.2008.05.23.08.06.59
>>   Solr Implementation Version: nightly
>>   Lucene Specification Version: 2.3.2
>>   Lucene Implementation Version: 2.3.2 652650
>>   Jetty: 6.1.3
>>
>> Schema.xml (the sections that I think are relevant to the master server)
>>
>> <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
>> <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <field name="id" type="string" indexed="true" stored="true" required="true" />
>> <field name="hex_id_multi" type="string" indexed="true" stored="false" multiValued="true" omitNorms="true"/>
>> <field name="hex_id_string" type="text_ws" indexed="true" stored="false" omitNorms="true"/>
>>
>> <uniqueKey>id</uniqueKey>
>>
>> Solrconfig.xml
>>
>> <indexDefaults>
>>   <useCompoundFile>false</useCompoundFile>
>>   <mergeFactor>10</mergeFactor>
>>   <maxBufferedDocs>500</maxBufferedDocs>
>>   <ramBufferSizeMB>50</ramBufferSizeMB>
>>   <maxMergeDocs>5000</maxMergeDocs>
>>   <maxFieldLength>20000</maxFieldLength>
>>   <writeLockTimeout>1000</writeLockTimeout>
>>   <commitLockTimeout>10000</commitLockTimeout>
>>   <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
>>   <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
>>   <lockType>single</lockType>
>> </indexDefaults>
>>
>> <mainIndex>
>>   <useCompoundFile>false</useCompoundFile>
>>   <ramBufferSizeMB>50</ramBufferSizeMB>
>>   <mergeFactor>10</mergeFactor>
>>   <!-- Deprecated -->
>>   <maxBufferedDocs>500</maxBufferedDocs>
>>   <maxMergeDocs>5000</maxMergeDocs>
>>   <maxFieldLength>20000</maxFieldLength>
>>   <unlockOnStartup>false</unlockOnStartup>
>> </mainIndex>
>>
>> <updateHandler class="solr.DirectUpdateHandler2">
>>   <autoCommit>
>>     <maxDocs>50</maxDocs>
>>     <maxTime>180000</maxTime>
>>   </autoCommit>
>>   <listener event="postCommit" class="solr.RunExecutableListener">
>>     <str name="exe">solr/bin/snapshooter</str>
>>     <str name="dir">.</str>
>>     <bool name="wait">true</bool>
>>   </listener>
>> </updateHandler>
>>
>> <query>
>>   <maxBooleanClauses>50</maxBooleanClauses>
>>   <filterCache
>>     class="solr.LRUCache"
>>     size="0"
>>     initialSize="0"
>>     autowarmCount="0"/>
>>   <queryResultCache
>>     class="solr.LRUCache"
>>     size="0"
>>     initialSize="0"
>>     autowarmCount="0"/>
>>   <documentCache
>>     class="solr.LRUCache"
>>     size="0"
>>     initialSize="0"
>>     autowarmCount="0"/>
>>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>   <queryResultWindowSize>1</queryResultWindowSize>
>>   <queryResultMaxDocsCached>1</queryResultMaxDocsCached>
>>   <HashDocSet maxSize="1000" loadFactor="0.75"/>
>>   <listener event="newSearcher" class="solr.QuerySenderListener">
>>     <arr name="queries">
>>       <lst> <str name="q">user_id</str> <str name="start">0</str> <str name="rows">1</str> </lst>
>>       <lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
>>     </arr>
>>   </listener>
>>   <listener event="firstSearcher" class="solr.QuerySenderListener">
>>     <arr name="queries">
>>       <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
>>       <lst><str name="q">static firstSearcher warming query from solrconfig.xml</str></lst>
>>     </arr>
>>   </listener>
>>   <useColdSearcher>false</useColdSearcher>
>>   <maxWarmingSearchers>4</maxWarmingSearchers>
>> </query>
>>
>> Replication:
>>   The snappuller is scheduled to run every 15 mins for now.
>>
>> Hardware:
>>   AMD (2.1GHz) dual core with 2GB RAM
>>   160GB SATA hard drive
>>
>> OS:
>>   Fedora 8 (64-bit)
>>
>> JVM version:
>>   java version "1.7.0"
>>   IcedTea Runtime Environment (build 1.7.0-b21)
>>   IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>>
>> Java options:
>>   java -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M -XX:+UseParallelGC -jar start.jar
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>

--
View this message in context: http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17526135.html
Sent from the Solr - User mailing list archive at Nabble.com.
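
For questions 2 and 5 in the quoted message, here is a rough sketch of what a master-only solrconfig.xml fragment might look like. It is untested and not a confirmed fix for the OOM. The larger autoCommit values are assumptions chosen only to illustrate "fewer, larger commits" (the 15-minute maxTime simply mirrors the snappuller interval mentioned above), and it assumes that leaving out the cache elements and the QuerySenderListener warming queries is acceptable because no queries hit the master. All other elements are copied from the posted config.

```xml
<!-- Hypothetical master-only fragment: the changed values are assumptions, not tested settings. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Assumed value: commit much less often than every 50 docs, so that
         fewer new searchers are opened and fewer snapshots are taken. -->
    <maxDocs>5000</maxDocs>
    <!-- Assumed value: 15 minutes, in milliseconds. -->
    <maxTime>900000</maxTime>
  </autoCommit>
  <!-- Unchanged from the posted config. -->
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">solr/bin/snapshooter</str>
    <str name="dir">.</str>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>

<query>
  <maxBooleanClauses>50</maxBooleanClauses>
  <!-- filterCache, queryResultCache and documentCache are left out here
       instead of being declared with size="0". -->
  <!-- The newSearcher/firstSearcher warming listeners are left out too,
       on the assumption that nothing queries the master. -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <useColdSearcher>false</useColdSearcher>
  <maxWarmingSearchers>4</maxWarmingSearchers>
</query>
```

Whether this helps would have to be checked by rerunning the same stress test and watching heap usage. Heap settings and the choice of JVM are separate questions that this fragment does not address.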