On 5/15/2012 3:42 PM, Jon Drukman wrote:
> I fixed it for now by upping the wait_timeout on the mysql server.
> Apparently Solr doesn't like having its connection yanked out from under
> it and/or isn't smart enough to reconnect if the server goes away. I'll
> set it back the way it was and try your readOnly option.

I use DIH with MySQL. The only time I ran into timeouts while importing
was related to segment merging. A first-level merge happens when the
number of segments reaches mergeFactor. A second-level merge happens
when the number of segments produced by first-level merges reaches
mergeFactor. A third-level merge happens when enough segments have been
created by second-level merges. It's probably possible for this to
extend to a fourth level and beyond, though I have not seen that
personally.
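
To make the cascade concrete, here is a minimal Python sketch of it. This is a simplified counting model, not Lucene's actual merge policy code (the real policies also consider segment sizes), and the names simulate_merges and flushes_for_level are hypothetical helpers made up for illustration.

```python
def simulate_merges(flushes, merge_factor):
    """Count merges per level in a simplified cascading-merge model.

    levels[i] is the number of segments currently sitting at level i
    (level 0 = freshly flushed segments). Whenever a level collects
    merge_factor segments, they are merged into one segment at the
    next level up. Assumption: merges are instantaneous and triggered
    purely by segment count.
    """
    levels = [0]
    merges = {}  # level -> number of merges that produced a segment there
    for _ in range(flushes):
        levels[0] += 1
        i = 0
        while levels[i] == merge_factor:
            levels[i] = 0
            if i + 1 == len(levels):
                levels.append(0)
            levels[i + 1] += 1
            merges[i + 1] = merges.get(i + 1, 0) + 1
            i += 1
    return merges


def flushes_for_level(merge_factor, level):
    """Flushed segments needed before the first merge at the given level."""
    return merge_factor ** level


if __name__ == "__main__":
    print(simulate_merges(1000, 10))   # {1: 100, 2: 10, 3: 1}
    print(simulate_merges(1000, 35))   # {1: 28}
    print(flushes_for_level(35, 3))    # 42875
```

Under this model, 1000 flushed segments with a mergeFactor of 10 already trigger a third-level merge, while a mergeFactor of 35 would need 42875 flushed segments to get there.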

When there are multiple merges happening at the same time (on 3.4 and
earlier; 3.5 may have changed this), only one of them actually runs and
the others are paused. Eventually, if you have a slow I/O system (SATA
RAID1 or slower) and a big enough index, your full-import can reach a
state where merges at all three levels are happening at the same time.
When this happens, indexing stops. If it stops for long enough, the
MySQL server will close the idle connection and DIH will fail once it
begins indexing again.

Since my DIH config consists of a single SELECT statement that runs for
the entire three-hour duration of the import, adding reconnect
capability to DIH would not help. The only way to make it work right is
to configure things such that Solr never stops indexing. I did this by
increasing my mergeFactor and, when I installed Solr 3.5, by using
maxMergeAtOnce, segmentsPerTier, and maxMergeAtOnceExplicit. I also
increased maxMergeCount under mergeScheduler. Here's my current
indexDefaults section:

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">4</int>
  </mergeScheduler>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>10000</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>

Thanks,
Shawn