However, you can create multiple DIH configs under a core/collection, each
querying its own slice of the table. You can run them in parallel and commit
once at the end.

SELECT *
 FROM existingtable
 WHERE column >= 1 AND column <= 2000;
SELECT *
 FROM existingtable
 WHERE column >= 2001 AND column <= 4000;


Something like that works for us to speed up indexing.
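
A minimal sketch in Python of how the range slices and the kick-off URLs
could be generated. It assumes a numeric key column; the core URL and the
handler names /dih1 and /dih2 are hypothetical. command=full-import, clean
and commit are the usual DIH request parameters; the final commit is a
separate request to /update.

```python
from urllib.parse import urlencode


def id_ranges(min_id, max_id, chunk):
    """Yield inclusive (low, high) pairs that cover min_id..max_id."""
    low = min_id
    while low <= max_id:
        high = min(low + chunk - 1, max_id)
        yield low, high
        low = high + 1


def slice_query(table, column, low, high):
    """Build the SELECT that goes into one DIH config's query attribute."""
    return f"SELECT * FROM {table} WHERE {column} >= {low} AND {column} <= {high}"


def import_url(core_url, handler):
    """URL that starts a full import without wiping or committing the index."""
    params = urlencode(
        {"command": "full-import", "clean": "false", "commit": "false"}
    )
    return f"{core_url}{handler}?{params}"


def commit_url(core_url):
    """URL for the single commit issued after every import has finished."""
    return f"{core_url}/update?commit=true"


if __name__ == "__main__":
    core = "http://localhost:8983/solr/mycore"  # hypothetical core URL
    handlers = ["/dih1", "/dih2"]               # hypothetical handler names
    for handler, (low, high) in zip(handlers, id_ranges(1, 4000, 2000)):
        print(slice_query("existingtable", "column", low, high))
        print(import_url(core, handler))
    print(commit_url(core))
```

Each handler needs its own config file holding its own query. Before
requesting the commit URL, poll each handler with command=status until none
of them reports busy.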

On Wed, Jan 25, 2017 at 4:01 PM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> DIH is not multi-threaded, and so the idea of "queueing" up requests is a
> misnomer.   You might be better off using something other than
> DataImportHandler.
> LogStash can pull what it calls "events" from a database and then push
> them into Solr, and you have some of the same row transformation
> capabilities that DataImportHandler has.
>
> This is also the bread and butter of ETL tools such as
> Kettle/Talend/MuleSoft/etc.
>
> That said, what I have done in the past is to take different streams of
> data and divide them into different requestHandlers, all using
> DataImportHandler.
> Each of these request handlers has its own context as to whether it is
> busy or not, and so each can be separately active/inactive.
>
>   <!-- Data Import Handler for Health Topics -->
>   <requestHandler name="/import/health-topics" class="solr.DataImportHandler">
>     <lst name="defaults">
>       <str name="config">health-topics-conf.xml</str>
>     </lst>
>   </requestHandler>
>
>   <!-- Data Import Handler for Drugs and Supplements -->
>   <requestHandler name="/import/drugs" class="solr.DataImportHandler">
>     <lst name="defaults">
>       <str name="config">drugs-conf.xml</str>
>     </lst>
>   </requestHandler>
>
>
> Both of the above are XML imports, but with database imports, I also
> once implemented a sort of multithreading by having 4 request handlers
> and 4 data-config files, each taking its own slice of data:
>
> data-config-0.xml:
>     ...
>     <entity name="medsite" dataSource="proddb" rootEntity="true"
>             query="SELECT * FROM (SELECT t.*, Mod(RowNum, 4) threadid FROM my_data_view t) WHERE threadid = 0"
>             transformer="TemplateTransformer,LogTransformer"
>             logTemplate="topic thread 0">
>     ...
>
> data-config-1.xml:
>     ...
>     <entity name="medsite" dataSource="proddb" rootEntity="true"
>             query="SELECT * FROM (SELECT t.*, Mod(RowNum, 4) threadid FROM my_data_view t) WHERE threadid = 1"
>             transformer="TemplateTransformer,LogTransformer"
>             logTemplate="topic thread 1" logLevel="debug">
>     ...
>
> And so on...
>
> -----Original Message-----
> From: William Bell [mailto:billnb...@gmail.com]
> Sent: Wednesday, January 25, 2017 5:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Does DIH queues up requests
>
> What we do is:
>
> Run a URL to delete *:*, but do not commit.
>
> 1. Kick off indexing on DIH1, clean=false, commit=false.
> 2. Kick off indexing on DIH2, clean=false, commit=false.
>
> Then we manually commit.
>
> On Wed, Jan 25, 2017 at 2:57 PM, Nkeet Shah <nkeet.s...@mathworks.com>
> wrote:
>
> > Hi,
> > I have a multi-threaded application that makes DIH requests to perform
> > indexing. What I could not gather from the documentation is whether
> > DIH requests are queued up.
> >
> > In essence, suppose I made a request to, say, DIH1, and it has accepted
> > the request and is working on the indexing. What would happen if another
> > request is made to the same DIH1? Will it be queued or rejected?
> >
> > Thanks
> > Ankit!
> >
> >
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076
