Hi guys,

I have set up a Solr instance, and when I try to index documents the
whole process is painfully slow. I will try to put as much information as I
can in this mail. Please feel free to ask for anything else that might be
required.

I am sending documents in batches of at most 2,000. The size of each batch
varies but is usually around 10-15 MiB. My indexing script reports that Solr
took T seconds to add N documents of total size S. For the same request, the
QTime that Solr logs for the add is QT (in milliseconds). Some sample data:

    N      |         S          |    T    |  QT
-----------+--------------------+---------+-------
 390 docs  |   3,478,804 bytes  |  14.5s  |  2297
 852 docs  |   6,039,535 bytes  |  25.3s  |  4237
1345 docs  |  11,147,512 bytes  |  47s    |  8543
1147 docs  |   9,457,717 bytes  |  44s    |  2297
1096 docs  |  13,058,204 bytes  |  54.3s  |  8782
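
To make the gap concrete, here is a small Python sketch (not my actual indexing script) that takes the sample rows above and works out how much of T is spent outside Solr's add handler. It assumes QT is in milliseconds, which is how Solr reports QTime; the function and key names are only for illustration.

```python
# Sample rows from the table above:
# (docs, size in bytes, T in seconds, QTime in milliseconds)
samples = [
    (390, 3_478_804, 14.5, 2297),
    (852, 6_039_535, 25.3, 4237),
    (1345, 11_147_512, 47.0, 8543),
    (1147, 9_457_717, 44.0, 2297),
    (1096, 13_058_204, 54.3, 8782),
]


def breakdown(docs, size_bytes, t_seconds, qtime_ms):
    """Split the client-side time T into the part Solr reports and the rest."""
    qt_seconds = qtime_ms / 1000.0          # assumption: QTime is in milliseconds
    outside = t_seconds - qt_seconds        # time not accounted for by QTime
    return {
        "docs": docs,
        "size_bytes": size_bytes,
        "qt_seconds": qt_seconds,
        "outside_seconds": outside,
        "outside_fraction": outside / t_seconds,
        "docs_per_second": docs / t_seconds,
    }


rows = [breakdown(*sample) for sample in samples]

for row in rows:
    print(
        f"{row['docs']:>5} docs: QT={row['qt_seconds']:.3f}s, "
        f"outside Solr={row['outside_seconds']:.3f}s "
        f"({row['outside_fraction']:.0%} of T), "
        f"{row['docs_per_second']:.1f} docs/s"
    )
```

In every sample, more than 80% of T falls outside the time Solr itself reports for the add.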

T includes the time to convert an array of Hash objects into XML, POST it
to Solr, and receive Solr's acknowledgement. Clearly, there is a huge
difference between T and QT. Despite a lot of effort, I have no clue why
these times do not match.
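
One way to narrow this down would be to time each phase separately instead of measuring T as a whole. The following is a minimal Python sketch of that idea, not my real script: the update URL, the helper names and the document shape are all assumptions, and the XML follows Solr's standard `<add><doc><field name="...">` update format.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint; adjust host, port and core path to the actual instance.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Serialize a list of dicts into a Solr <add> XML payload (bytes)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def post_to_solr(payload, url=SOLR_UPDATE_URL):
    """POST the XML payload and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def index_batch(docs):
    """Index one batch and report how long each phase took."""
    payload, build_seconds = timed(build_add_xml, docs)
    _, post_seconds = timed(post_to_solr, payload)
    return {
        "bytes": len(payload),
        "build_seconds": build_seconds,   # client-side XML conversion
        "post_seconds": post_seconds,     # network transfer + Solr + response
    }
```

If `build_seconds` dominates, the time is going into the client-side conversion; if `post_seconds` is far larger than QT, the remainder is being spent in transfer, request parsing or waiting on the server rather than in the add itself.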

The server has 16 cores and 48 GiB RAM. JVM options are -Xms5000M -Xmx5000M
-XX:+UseParNewGC

I believe my indexing itself is slow. The relevant portions of my Solr
configuration (solrconfig.xml) are as follows. On a related note, every
document has one dynamic field. At this rate, a full index of my database
takes ~30 hours. I would really appreciate the community's help in making
indexing faster.

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">10</int>
    <int name="maxThreadCount">10</int>
  </mergeScheduler>
  <ramBufferSizeMB>2048</ramBufferSizeMB>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>3000000</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <maxBufferedDocs>50000</maxBufferedDocs>
  <termIndexInterval>256</termIndexInterval>
  <mergeFactor>10</mergeFactor>
  <useCompoundFile>false</useCompoundFile>
  <!-- <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnceExplicit">19</int>
    <int name="segmentsPerTier">9</int>
  </mergePolicy> -->
</indexDefaults>

<mainIndex>
  <unlockOnStartup>true</unlockOnStartup>
  <reopenReaders>true</reopenReaders>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>
  <infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100000</maxDocs>
  </autoCommit>
</updateHandler>


*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
Google <http://www.google.com/profiles/pranny>
