Hi All,

I have a requirement to import a large amount of data from a MySQL database
and index the documents (about 1000 documents).
During the indexing process I need to do special processing on a field by
sending enhancement requests to an external Apache Stanbol server.
I have configured my dataimport handler in solrconfig.xml to use the
StanbolContentProcessor in the update chain, as below:

  <updateRequestProcessorChain name="stanbolInterceptor">
    <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <str name="update.chain">stanbolInterceptor</str>
    </lst>
  </requestHandler>

My sample data-config.xml is as below:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost:3306/solrTest" user="test" password="test123"
      batchSize="1"/>
  <document name="stanboldata">
    <entity name="stanbolrequest" query="SELECT * FROM documents">
      <field column="id" name="id"/>
      <field column="content" name="content"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>

When running a large import with about 1000 documents, my Stanbol server
goes down, I suspect due to heavy load from the Solr StanbolInterceptor
above.
I would like to throttle the dataimport into batches, so that Stanbol only
has to process a manageable number of requests concurrently.
Is this achievable using the batchSize parameter of the dataSource element
in data-config.xml?
Can someone please give some ideas on how to throttle the dataimport load
in Solr?
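One idea I have been considering is to bound the number of concurrent Stanbol
calls inside my processor itself, independently of any DIH batching. A rough,
standalone sketch using a java.util.concurrent.Semaphore (the StanbolThrottle
class and its names are hypothetical, not part of my actual
StanbolContentProcessorFactory):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical helper: allow at most maxConcurrent enhancement requests
// to be in flight at once; further callers block until a permit frees up.
public class StanbolThrottle {
    private final Semaphore permits;

    public StanbolThrottle(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Wrap each Stanbol enhancement call in this method from the
    // UpdateRequestProcessor; blocks when the concurrency limit is reached.
    public <T> T call(Callable<T> stanbolRequest) throws Exception {
        permits.acquire();
        try {
            return stanbolRequest.call();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) throws Exception {
        StanbolThrottle throttle = new StanbolThrottle(2);
        // Stand-in for a real enhancement request to the Stanbol server.
        String result = throttle.call(() -> "enhanced");
        System.out.println(result);
    }
}
```

The idea would be to share one such throttle across the processor instances
created by the factory, so the whole import never exceeds the limit.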

Thanks,
Dileepa
