On Wed, May 28, 2008 at 10:30 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
> I used the admin GUI to get the java info.
> java.vm.specification.vendor = Sun Microsystems Inc.

Well, your original email listed IcedTea... but that is mostly Sun code,
so maybe that's why the vendor is still listed as Sun.
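
If you want to confirm which runtime Jetty actually picked up (beyond the
admin GUI properties page), one option is a throwaway class like the sketch
below (the class name is just an example), compiled and run with the same
java binary that starts Solr:

  // JvmInfo.java -- prints the properties that distinguish IcedTea from Sun's JVM
  public class JvmInfo {
      public static void main(String[] args) {
          // On IcedTea these should report the IcedTea runtime/VM names;
          // on Sun's JVM they should report the HotSpot names.
          System.out.println("java.runtime.name = " + System.getProperty("java.runtime.name"));
          System.out.println("java.vm.name      = " + System.getProperty("java.vm.name"));
          System.out.println("java.vm.vendor    = " + System.getProperty("java.vm.vendor"));
          System.out.println("java.version      = " + System.getProperty("java.version"));
      }
  }
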
I'd recommend downloading 1.6.0_03 from java.sun.com and trying that.
Later versions (1.6.0_04+) have a JVM bug that bites Lucene, so stick
with 1.6.0_03 for now.

-Yonik

> Any suggestion?  Thanks a lot for your help!!
>
> -Gaku
>
>
> Yonik Seeley wrote:
>>
>> Not sure why you would be getting an OOM from just indexing, and with
>> the 1.5G heap you've given the JVM.
>> Have you tried Sun's JVM?
>>
>> -Yonik
>>
>> On Wed, May 28, 2008 at 7:35 PM, gaku113 <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi all Solr users/developers/experts,
>>>
>>> I have the following scenario and I appreciate any advice for tuning
>>> my solr master server.
>>>
>>> I have a field in my schema that would index (but not store) about
>>> ~10000 ids for each document. This field is expected to govern the
>>> size of the document. Each id can contain up to 6 characters. I
>>> figure that there are two alternatives for this field: one is to use
>>> a multi-valued string field, and the other is to pass a
>>> whitespace-delimited string to solr and have solr tokenize that
>>> string on whitespace (the text_ws fieldType). The master server is
>>> expected to receive a constant stream of updates.
>>>
>>> The expected/estimated document size can range from 50k to 100k for
>>> a single document. (I know this is quite large.) The number of
>>> documents is expected to be around 200,000 on each master server,
>>> and there can be multiple master servers (sharding). I wish the
>>> master could handle more docs too, if I can figure out a way.
>>>
>>> Currently, I'm performing some basic stress tests to simulate the
>>> indexing side on the master server. This stress test continuously
>>> adds new documents at a rate of about 10 documents every 30 seconds.
>>> Autocommit is being used (50 docs and 180 seconds constraints), but
>>> I have no idea if this is the preferred way. The goal is to keep
>>> adding new documents until we get at least 200,000 documents (or
>>> about 20GB of index) on the master (or even more if the server can
>>> handle it).
>>>
>>> What I experienced from the indexing stress test is that the master
>>> server failed to respond after a while, such as becoming
>>> non-pingable when there are about 30k documents. When looking at the
>>> log, the errors are mostly:
>>> java.lang.OutOfMemoryError: Java heap space
>>> OR
>>> Ping query caused exception: null (this is probably caused by the
>>> OOM problem)
>>>
>>> There were also a few cases where the java process went away entirely.
>>>
>>> Questions:
>>> 1) Is it better to use the multi-valued string field or the text_ws
>>> field for this large field?
>>> 2) Is it better to have more outstanding docs per commit or more
>>> frequent commits, in terms of maximizing server resources? What is
>>> the preferred way to commit documents assuming that the solr master
>>> receives updates frequently? How many updated docs should there be
>>> before issuing a commit?
>>> 3) How do I avoid the OOM problem in my case? I'm already using
>>> -Xms1536M -Xmx1536M on a 2-GB machine. Is that not enough? I'm
>>> concerned that adding more RAM would just delay the OOM problem. Any
>>> additional JVM options to consider?
>>> 4) Any recommendation for the master server configuration, in the
>>> sense that I can maximize the number of indexed docs?
>>> 5) How can I disable caching on the master altogether, as queries
>>> won't hit the master?
>>> 6) For an average doc size of 50k-100k, is that too large for solr,
>>> or is solr even the right tool? If not, any alternative? If we are
>>> able to reduce the size of docs, can we expect to index more
>>> documents?
>>>
>>> The following is info related to software/hardware/configuration:
>>>
>>> Solr version (solr nightly build on 5/23/2008)
>>> Solr Specification Version: 1.2.2008.05.23.08.06.59
>>> Solr Implementation Version: nightly
>>> Lucene Specification Version: 2.3.2
>>> Lucene Implementation Version: 2.3.2 652650
>>> Jetty: 6.1.3
>>>
>>> Schema.xml (the sections that I think are relevant to the master server):
>>>
>>> <fieldType name="string" class="solr.StrField" sortMissingLast="true"
>>> omitNorms="true"/>
>>> <fieldType name="text_ws" class="solr.TextField"
>>> positionIncrementGap="100">
>>>   <analyzer>
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>> <field name="id" type="string" indexed="true" stored="true"
>>> required="true"/>
>>> <field name="hex_id_multi" type="string" indexed="true" stored="false"
>>> multiValued="true" omitNorms="true"/>
>>> <field name="hex_id_string" type="text_ws" indexed="true" stored="false"
>>> omitNorms="true"/>
>>>
>>> <uniqueKey>id</uniqueKey>
>>>
>>> Solrconfig.xml
>>>
>>> <indexDefaults>
>>>   <useCompoundFile>false</useCompoundFile>
>>>   <mergeFactor>10</mergeFactor>
>>>   <maxBufferedDocs>500</maxBufferedDocs>
>>>   <ramBufferSizeMB>50</ramBufferSizeMB>
>>>   <maxMergeDocs>5000</maxMergeDocs>
>>>   <maxFieldLength>20000</maxFieldLength>
>>>   <writeLockTimeout>1000</writeLockTimeout>
>>>   <commitLockTimeout>10000</commitLockTimeout>
>>>   <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
>>>   <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
>>>   <lockType>single</lockType>
>>> </indexDefaults>
>>>
>>> <mainIndex>
>>>   <useCompoundFile>false</useCompoundFile>
>>>   <ramBufferSizeMB>50</ramBufferSizeMB>
>>>   <mergeFactor>10</mergeFactor>
>>>   <!-- Deprecated -->
>>>   <maxBufferedDocs>500</maxBufferedDocs>
>>>   <maxMergeDocs>5000</maxMergeDocs>
>>>   <maxFieldLength>20000</maxFieldLength>
>>>   <unlockOnStartup>false</unlockOnStartup>
>>> </mainIndex>
>>>
>>> <updateHandler class="solr.DirectUpdateHandler2">
>>>   <autoCommit>
>>>     <maxDocs>50</maxDocs>
>>>     <maxTime>180000</maxTime>
>>>   </autoCommit>
>>>   <listener event="postCommit" class="solr.RunExecutableListener">
>>>     <str name="exe">solr/bin/snapshooter</str>
>>>     <str name="dir">.</str>
>>>     <bool name="wait">true</bool>
>>>   </listener>
>>> </updateHandler>
>>>
>>> <query>
>>>   <maxBooleanClauses>50</maxBooleanClauses>
>>>   <filterCache
>>>     class="solr.LRUCache"
>>>     size="0"
>>>     initialSize="0"
>>>     autowarmCount="0"/>
>>>   <queryResultCache
>>>     class="solr.LRUCache"
>>>     size="0"
>>>     initialSize="0"
>>>     autowarmCount="0"/>
>>>   <documentCache
>>>     class="solr.LRUCache"
>>>     size="0"
>>>     initialSize="0"
>>>     autowarmCount="0"/>
>>>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>>   <queryResultWindowSize>1</queryResultWindowSize>
>>>   <queryResultMaxDocsCached>1</queryResultMaxDocsCached>
>>>   <HashDocSet maxSize="1000" loadFactor="0.75"/>
>>>   <listener event="newSearcher" class="solr.QuerySenderListener">
>>>     <arr name="queries">
>>>       <lst> <str name="q">user_id</str> <str name="start">0</str> <str name="rows">1</str> </lst>
>>>       <lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
>>>     </arr>
>>>   </listener>
>>>   <listener event="firstSearcher" class="solr.QuerySenderListener">
>>>     <arr name="queries">
>>>       <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
>>>       <lst><str name="q">static firstSearcher warming query from solrconfig.xml</str></lst>
>>>     </arr>
>>>   </listener>
>>>   <useColdSearcher>false</useColdSearcher>
>>>   <maxWarmingSearchers>4</maxWarmingSearchers>
>>> </query>
>>>
>>> Replication:
>>> The snappuller is scheduled to run every 15 mins for now.
>>>
>>> Hardware:
>>> AMD (2.1GHz) dual core with 2GB RAM, 160GB SATA hard drive
>>>
>>> OS:
>>> Fedora 8 (64-bit)
>>>
>>> JVM version:
>>> java version "1.7.0"
>>> IcedTea Runtime Environment (build 1.7.0-b21)
>>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>>>
>>> Java options:
>>> java -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M
>>> -XX:+UseParallelGC -jar start.jar
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17526135.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>