Hi guys, I have set up a Solr instance, and indexing documents into it is painfully slow. I will try to put as much info as I can in this mail. Please feel free to ask me for anything else that might be required.
I am sending documents in batches of at most 2,000. The size of each batch varies but is usually around 10-15 MiB. My indexing script tells me that Solr took T seconds to add N documents of size S. For the same data, the QTime of the add request in the Solr log is QT (in milliseconds). Some sample data:

N (docs) | S (bytes)  | T (s) | QT (ms)
---------+------------+-------+--------
     390 |  3,478,804 |  14.5 |    2297
     852 |  6,039,535 |  25.3 |    4237
    1345 | 11,147,512 |  47   |    8543
    1147 |  9,457,717 |  44   |    2297
    1096 | 13,058,204 |  54.3 |    8782

The time T includes converting an array of Hash objects into XML, POSTing it to Solr, and receiving the acknowledgement from Solr. Clearly, there is a huge difference between T and QT. After a lot of effort, I still have no clue why these times do not match.

The server has 16 cores and 48 GiB RAM. The JVM options are:

-Xms5000M -Xmx5000M -XX:+UseParNewGC

I believe it is my indexing that is slow. On a related note, every document has one dynamic field. At this rate, a full index of my database takes me ~30 hrs. I would really appreciate the community's help in making this indexing faster. The relevant portions of my Solr configuration are included at the end of this mail.
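To put numbers on the gap between T and QT, here is a minimal Python sketch (not my actual indexing script) that splits each sample into the time reported by Solr and the time spent elsewhere. It assumes QTime is in milliseconds; the `breakdown` function and its field names are made up for illustration.

```python
# Sample measurements from this mail: (docs, bytes, T in seconds, QTime in ms).
SAMPLES = [
    (390, 3_478_804, 14.5, 2297),
    (852, 6_039_535, 25.3, 4237),
    (1345, 11_147_512, 47.0, 8543),
    (1147, 9_457_717, 44.0, 2297),
    (1096, 13_058_204, 54.3, 8782),
]


def breakdown(docs, size_bytes, total_s, qtime_ms):
    """Split the client-side time T into Solr's QTime and everything else."""
    solr_s = qtime_ms / 1000.0      # assumption: QTime is reported in ms
    outside_s = total_s - solr_s    # XML generation, HTTP transfer, etc.
    return {
        "docs_per_s": docs / total_s,
        "bytes_per_s": size_bytes / total_s,
        "solr_s": solr_s,
        "outside_s": outside_s,
        "outside_share": outside_s / total_s,
    }


if __name__ == "__main__":
    for sample in SAMPLES:
        r = breakdown(*sample)
        print(
            f"{sample[0]:>5} docs: {r['docs_per_s']:.1f} docs/s, "
            f"Solr {r['solr_s']:.1f}s, outside Solr {r['outside_s']:.1f}s "
            f"({r['outside_share']:.0%} of T)"
        )
```

For every sample above, more than 80% of T is not accounted for by QTime, which is why I suspect the time is going somewhere other than the add itself.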
<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">10</int>
    <int name="maxThreadCount">10</int>
  </mergeScheduler>
  <ramBufferSizeMB>2048</ramBufferSizeMB>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>3000000</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <maxBufferedDocs>50000</maxBufferedDocs>
  <termIndexInterval>256</termIndexInterval>
  <mergeFactor>10</mergeFactor>
  <useCompoundFile>false</useCompoundFile>
  <!--
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnceExplicit">19</int>
    <int name="segmentsPerTier">9</int>
  </mergePolicy>
  -->
</indexDefaults>

<mainIndex>
  <unlockOnStartup>true</unlockOnStartup>
  <reopenReaders>true</reopenReaders>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>
  <infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100000</maxDocs>
  </autoCommit>
</updateHandler>

*Pranav Prakash*
"temet nosce"
Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> | Google <http://www.google.com/profiles/pranny>