On 4/1/2017 4:17 PM, marotosg wrote:
> I am trying to load a big table into Solr using DataImportHandler and
> MySQL. I am getting an OutOfMemory error because Solr is trying to load
> the full table. I have been reading different posts and tried
> batchSize="-1".
> https://wiki.apache.org/solr/DataImportHandlerFaq
>
> Do you have any idea what could be the issue?
> Completely lost here.
>
> Solr 6.4.1
> mysql-connector-java-5.1.41-bin.jar
>
> data-config:
>
> <dataSource type="JdbcDataSource"
>             driver="com.mysql.jdbc.Driver"
>             url="jdbc:mysql://188.68.190.85:3306/jobsdb"
>             user="suer"
>             password="passowrd"/>
> <document>
>   <entity name="jobsearch"
>           pk="id"
>           batchSize="-1"
Setting batchSize to -1 is the proper solution, but you've got it in the
wrong place.  It goes on dataSource, not on entity (there's a corrected
example in the P.S. below).

https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F

When batchSize is -1, DIH executes setFetchSize(Integer.MIN_VALUE) on the
JDBC statement.  This causes the MySQL JDBC driver to stream results one
row at a time instead of buffering the entire result set in memory.

You should also upgrade to 6.4.2 or 6.5.0.  6.4.0 and 6.4.1 have a
serious performance bug:

https://issues.apache.org/jira/browse/SOLR-10130

You may also want to raise the maxMergeCount setting in the
mergeScheduler config to at least 6 (a config sketch is in the P.S.
below).  I ran into a problem with the database disconnecting while
importing millions of rows with DIH from MySQL, and raising
maxMergeCount was the solution.  See this thread:

http://lucene.472066.n3.nabble.com/Closed-connection-issue-while-doing-dataimport-td4327116.html

Thanks,
Shawn
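
P.S. In case a concrete example helps, here is roughly what the
corrected config would look like, with batchSize moved up to the
dataSource element.  This is a sketch based on the fragment you posted;
your entity's query and field mappings are elided since they weren't
included in your message:

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://188.68.190.85:3306/jobsdb"
              batchSize="-1"
              user="suer"
              password="passowrd"/>
  <document>
    <entity name="jobsearch"
            pk="id"
            query="...">
      ...
    </entity>
  </document>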
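
The mergeScheduler change goes in the <indexConfig> section of
solrconfig.xml, something like the following.  The maxThreadCount value
of 1 is my assumption, appropriate for spinning disks; a higher value
works better on SSDs:

  <indexConfig>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <!-- allow more merges to be queued so indexing isn't stalled -->
      <int name="maxMergeCount">6</int>
      <!-- merges running at once; keep at 1 for spinning disks -->
      <int name="maxThreadCount">1</int>
    </mergeScheduler>
  </indexConfig>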