Hi All,

I have a requirement to import a large amount of data from a MySQL database and index it (about 1000 documents). During the indexing process I need to do special processing on a field by sending enhancement requests to an external Apache Stanbol server. I have configured my dataimport handler in solrconfig.xml to use the StanbolContentProcessor in the update chain, as below:
<updateRequestProcessorChain name="stanbolInterceptor">
  <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">stanbolInterceptor</str>
  </lst>
</requestHandler>

My sample data-config.xml is as below:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/solrTest"
              user="test" password="test123" batchSize="1"/>
  <document name="stanboldata">
    <entity name="stanbolrequest" query="SELECT * FROM documents">
      <field column="id" name="id"/>
      <field column="content" name="content"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>

When running a large import of about 1000 documents, my Stanbol server goes down, I suspect due to the heavy load from the StanbolInterceptor above. I would like to throttle the data import into batches, so that Stanbol has to process only a manageable number of requests concurrently. Is this achievable using the batchSize parameter of the dataSource element in data-config.xml? Can someone please give me some ideas on how to throttle the dataimport load in Solr?

Thanks,
Dileepa
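P.S. To illustrate the kind of throttling I have in mind, here is a minimal, Solr-independent sketch of a blocking rate limiter. The RateLimiter class and the chosen rate are my own placeholders, not anything from the Stanbol or Solr APIs; the idea is that a processor like StanbolContentProcessorFactory could call acquire() before each Stanbol request so that requests are spaced out instead of arriving all at once:

```java
import java.util.concurrent.TimeUnit;

// Sketch: a simple blocking rate limiter (plain JDK, no Solr dependencies).
// A custom UpdateRequestProcessor could call acquire() once per document,
// before issuing its enhancement request to Stanbol.
public class RateLimiter {
    private final long minIntervalNanos; // minimum spacing between permits
    private long nextFreeAt = 0L;        // earliest time the next permit is free

    public RateLimiter(double permitsPerSecond) {
        this.minIntervalNanos = (long) (1_000_000_000L / permitsPerSecond);
    }

    // Blocks until a permit is available, spacing calls evenly over time.
    public synchronized void acquire() {
        long now = System.nanoTime();
        long waitNanos = nextFreeAt - now;
        if (waitNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            now = System.nanoTime();
        }
        nextFreeAt = Math.max(now, nextFreeAt) + minIntervalNanos;
    }
}
```

With, say, new RateLimiter(5.0), the processor would send at most about five enhancement requests per second to Stanbol regardless of how fast the dataimport feeds documents in.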