On 4/1/2017 4:17 PM, marotosg wrote:
> I am trying to load a big table into Solr using DataImportHandler and
> MySQL.  I am getting an OutOfMemoryError because Solr is trying to load
> the full table.  I have been reading different posts and tried
> batchSize="-1".
> https://wiki.apache.org/solr/DataImportHandlerFaq
>
> Do you have any idea what could be the issue?
> Completely lost here.
>
> Solr.6.4.1
> mysql-connector-java-5.1.41-bin.jar
>
> data-config 
>
> <dataSource type="JdbcDataSource" 
>             driver="com.mysql.jdbc.Driver"
>             url="jdbc:mysql://188.68.190.85:3306/jobsdb" 
>             user="suer" 
>             password="passowrd"/>
> <document>
>   <entity name="jobsearch"  
>     pk="id"
>       batchSize="-1"

Setting batchSize to -1 is the proper solution, but you've got it in the
wrong place.  It goes on dataSource, not on entity.

https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F
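Concretely, the batchSize attribute moves up onto the dataSource element
and comes off the entity, like this (connection details copied from your
config):

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://188.68.190.85:3306/jobsdb"
              user="suer"
              password="passowrd"
              batchSize="-1"/>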

When batchSize is -1, DIH executes setFetchSize(Integer.MIN_VALUE) on
the JDBC statement.  That is the documented signal for the MySQL JDBC
driver to stream results row by row instead of buffering the entire
result set in memory.
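If you want to see the mechanism in plain JDBC, it is roughly this -- a
sketch of what DIH does for you, not DIH's actual code, and it assumes
your entity's query is a simple SELECT over a jobsearch table:

  import java.sql.*;

  public class StreamingFetch {
      public static void main(String[] args) throws SQLException {
          Connection conn = DriverManager.getConnection(
              "jdbc:mysql://188.68.190.85:3306/jobsdb", "suer", "passowrd");
          // Connector/J only streams when the statement is forward-only,
          // read-only, and the fetch size is Integer.MIN_VALUE.
          Statement stmt = conn.createStatement(
              ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
          stmt.setFetchSize(Integer.MIN_VALUE);
          ResultSet rs = stmt.executeQuery("SELECT * FROM jobsearch");
          while (rs.next()) {
              // Rows arrive from the server as you iterate, so the
              // whole table never has to fit in the client heap.
          }
          rs.close();
          stmt.close();
          conn.close();
      }
  }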

You should upgrade to 6.4.2 or 6.5.0.  6.4.0 and 6.4.1 have a serious
performance bug.

https://issues.apache.org/jira/browse/SOLR-10130

You may also want to raise maxMergeCount in the mergeScheduler config
to at least 6.  I ran into a problem with the database disconnecting
while importing millions of rows with DIH from MySQL, and this was the
solution.  See this thread:

http://lucene.472066.n3.nabble.com/Closed-connection-issue-while-doing-dataimport-td4327116.html
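In solrconfig.xml that would look roughly like this.  The indexConfig
wrapper and the ConcurrentMergeScheduler class are the stock Solr 6.x
defaults; the maxMergeCount value is the only change:

  <indexConfig>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <!-- let more merges queue up before Solr stalls incoming updates -->
      <int name="maxMergeCount">6</int>
    </mergeScheduler>
  </indexConfig>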

Thanks,
Shawn
