I actually tweaked the Stanbol server to handle more results and successfully ran 10K imports within 30 minutes with no server issues. I'm now looking to further improve the results with regard to efficiency and NLP accuracy.
Thanks,
Dileepa

On Sun, Dec 1, 2013 at 8:17 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:

> Thanks all, for your valuable ideas on this matter. I will try them. :)
>
> Regards,
> Dileepa
>
> On Sun, Dec 1, 2013 at 6:05 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>
>> There is no support for throttling built into DIH. You can probably write
>> a Transformer which sleeps a while after every N requests to simulate
>> throttling.
>>
>> On 26 Nov 2013 14:21, "Dileepa Jayakody" <dileepajayak...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I have a requirement to import a large amount of data from a MySQL
>>> database and index the documents (about 1000 documents).
>>> During the indexing process I need to do special processing of a field by
>>> sending an enhancement request to an external Apache Stanbol server.
>>> I have configured my dataimport handler in solrconfig.xml to use the
>>> StanbolContentProcessor in the update chain, as below:
>>>
>>> <updateRequestProcessorChain name="stanbolInterceptor">
>>>   <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactory"/>
>>>   <processor class="solr.RunUpdateProcessorFactory" />
>>> </updateRequestProcessorChain>
>>>
>>> <requestHandler name="/dataimport" class="solr.DataImportHandler">
>>>   <lst name="defaults">
>>>     <str name="config">data-config.xml</str>
>>>     <str name="update.chain">stanbolInterceptor</str>
>>>   </lst>
>>> </requestHandler>
>>>
>>> My sample data-config.xml is as below:
>>>
>>> <dataConfig>
>>>   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
>>>       url="jdbc:mysql://localhost:3306/solrTest" user="test"
>>>       password="test123" batchSize="1" />
>>>   <document name="stanboldata">
>>>     <entity name="stanbolrequest" query="SELECT * FROM documents">
>>>       <field column="id" name="id" />
>>>       <field column="content" name="content" />
>>>       <field column="title" name="title" />
>>>     </entity>
>>>   </document>
>>> </dataConfig>
>>>
>>> When running a large import with about 1000 documents, my Stanbol server
>>> goes down, I suspect due to heavy load from the above Solr
>>> StanbolInterceptor.
>>> I would like to throttle the dataimport in batches, so that Stanbol can
>>> process a manageable number of requests concurrently.
>>> Is this achievable using the batchSize parameter of the dataSource
>>> element in the data-config?
>>> Can someone please give some ideas to throttle the dataimport load in
>>> Solr?
>>>
>>> Thanks,
>>> Dileepa
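For anyone following this thread, a minimal sketch of the sleep-after-every-N-rows throttling that Shalin suggests might look like the class below. The class name, batch size, and pause interval are illustrative assumptions, not anything from the thread; in a real deployment this logic would live inside a custom `org.apache.solr.handler.dataimport.Transformer` subclass whose `transformRow(row, context)` calls `maybePause()` and returns the row unchanged, registered on the entity via its `transformer` attribute.

```java
/**
 * Hypothetical throttling helper for a DIH Transformer: after every
 * batchSize rows it sleeps for pauseMillis, giving a downstream service
 * (e.g. a Stanbol enhancer) time to drain its request queue.
 *
 * Wiring sketch (names are assumptions):
 *   <entity name="stanbolrequest" query="SELECT * FROM documents"
 *           transformer="com.example.ThrottleTransformer">
 * where ThrottleTransformer.transformRow() calls maybePause() per row.
 */
public final class ImportThrottle {

    private final int batchSize;     // rows per batch before pausing
    private final long pauseMillis;  // how long to sleep at a batch boundary
    private long rowCount;           // rows seen so far

    public ImportThrottle(int batchSize, long pauseMillis) {
        this.batchSize = batchSize;
        this.pauseMillis = pauseMillis;
    }

    /** Counts one row; returns true if a batch boundary was hit and a pause taken. */
    public synchronized boolean maybePause() throws InterruptedException {
        if (++rowCount % batchSize == 0) {
            Thread.sleep(pauseMillis);
            return true;
        }
        return false;
    }
}
```

Tuning `batchSize`/`pauseMillis` to match Stanbol's throughput is the real knob here; `batchSize` on the JDBC dataSource only controls fetch size from MySQL, not the rate at which documents flow through the update chain.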