Hi,

Can you try a single Solr instance with plenty of RAM (so that you can set ramBufferSizeMB=8192, for instance) and mergeFactor=10? A single Solr instance is fast enough (more than 100 Tomcat client threads; this is configurable). I usually prefer a single instance on a single "writable" box with a large RAM allocation and good I/O.
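A minimal sketch of that setup in Python: several client threads post batches of documents to one Solr instance's XML update handler, and a single commit is sent only after every batch is in. The URL, batch size, thread count, and all function names here are assumptions for illustration, not taken from the original setup.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape, quoteattr

# Assumption: update handler URL of the single Solr instance; adjust host, port, and path.
SOLR_UPDATE_URL = "http://localhost:8080/solr/update"


def post_xml(body, url=SOLR_UPDATE_URL):
    """POST one XML update message to Solr and return the HTTP status code."""
    req = Request(url, data=body.encode("utf-8"),
                  headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req) as resp:
        return resp.status


def add_message(docs):
    """Build an <add> message from a list of {field name: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append("<field name=%s>%s</field>"
                         % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def batches(docs, size):
    """Split docs into consecutive lists of at most `size` items."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def index_all(docs, batch_size=100, threads=8, post=post_xml):
    """Send all batches from several client threads, then commit exactly once."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for every batch to finish (and surfaces errors) before the commit.
        list(pool.map(lambda b: post(add_message(b)), batches(docs, batch_size)))
    post("<commit/>")
```

Against a real instance you would call index_all(docs) with the default post function; passing a different post callable makes it possible to dry-run the batching without a server.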
Merging 15 indexes and ending up with a 4-times larger index could happen, for instance, because of differences between the Solr schema and your Lucene setup; make sure the schema is the same (using Luke, for instance). Solr 1.4 also has some powerful new features, such as a document->term cache stored somewhere (the uninverted index; Yonik), as well as term vectors, stored=true, copyField, etc.

Do not commit every 100 documents; commit once at the end...

-----Original Message-----
From: engy.ali [mailto:omeshm...@hotmail.com]
Sent: August-25-09 3:31 PM
To: solr-user@lucene.apache.org
Subject: Solr index - Size and indexing speed

Summary
===============
I had about 120,000 objects with a total size of 71.2 GB; those objects are already indexed using Lucene. The index size is about 111 GB.

I tried to use a Solr 1.4 nightly build to index the same collection. I divided the collection across three servers; each server had 5 Solr instances (not Solr cores) up and running. After the collection had been indexed, I merged the 15 indexes.

Problems
==============
1. The new merged index size is about 411 GB (i.e., about 4 times larger than the old index built with Lucene). I tried indexing only one object using Lucene and the same object using Solr to verify the size, and the result was that the new index is about twice the size of the old index. Do you have any idea what might be the reason?

2. The indexing speed is slow. 100 objects on a single Solr instance were indexed in 1 hour, so I estimated that 1000 on a single instance could be done in 10 hours, but that was not the case: the indexing time exceeded the estimated time by about 12 hours. Might that be related to the growth of the index? If not, what might be the reason?

Note: I do a commit every 100 objects and an optimize at the end of the whole operation. I also changed the mergeFactor from 10 to 15.

3. I googled and found out that Solr uses an inverted index, but I want to know what the internal structure of the Solr index is. For example, if I have a word and its stems, how will they be stored in the index?

Thanks,
Engy

--
View this message in context: http://www.nabble.com/Solr-index---Size-and-indexing-speed-tp25140702p25140702.html
Sent from the Solr - User mailing list archive at Nabble.com.