DIH is also not designed to multi-thread very well. One way I've handled this is to have a DIH XML that breaks up a database query into multiple processes by taking the modulo of the row number, as follows:
<entity name="medsite" dataSource="oltp01_prod" rootEntity="true"
        query="SELECT * FROM (SELECT t.*, mod(RowNum, 4) threadid FROM your_table t) WHERE threadid = 0"
        transformer="TemplateTransformer,LogTransformer"
        logTemplate="topic thread 0" logLevel="debug">

This allows me to do sub-queries within the entity, but it is often better to just write a small program to get this data from the database, and ETL processors such as Pentaho DI (Kettle) and Talend DI do this quite well.

If you can express what you want in a database view, even a complicated one, then your best way to get it into Solr, IMO, is to use Logstash with the jdbc input plugin. It can do some transformation, but you'll need your database view to process the data.

> -----Original Message-----
> From: Shawn Heisey <elyog...@elyograg.org>
> Sent: Friday, January 4, 2019 12:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: [solr-solrcloud] How does DIH work when there are multiple
> nodes?
>
> On 1/4/2019 1:04 AM, 유정인 wrote:
> > The reader was looking for a way to do 'DIH' automatically.
> >
> > The reason was for HA configuration.
>
> If you send a DIH request to the collection (as opposed to a specific
> core), that request will be load balanced across the cloud. You won't
> know which replica/core actually handles it. This means that an import
> command may be handled by a different host than a status command. In
> that situation, the status command will not know about the import,
> because it will be running on a different Solr core.
>
> When doing DIH on SolrCloud, you should send your requests directly to a
> specific core on a specific node. It's the only way to be sure what's
> happening. High availability would have to be handled in your
> application.
>
> Thanks,
> Shawn
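P.S. For anyone following along: the modulo trick above generalizes to one entity per partition, each started by its own import request so the partitions run in parallel. A sketch only — the `_t0`..`_t3` entity names and the choice of 4 partitions are illustrative, not a tested config:

```xml
<!-- Sketch: one entity per partition of mod(RowNum, 4).
     Entity names are made up; dataSource/query follow the example above.
     Each entity is driven by a separate import request so the four
     partitions can be pulled from the database concurrently. -->
<entity name="medsite_t0" dataSource="oltp01_prod" rootEntity="true"
        query="SELECT * FROM (SELECT t.*, mod(RowNum, 4) threadid FROM your_table t) WHERE threadid = 0"/>
<entity name="medsite_t1" dataSource="oltp01_prod" rootEntity="true"
        query="SELECT * FROM (SELECT t.*, mod(RowNum, 4) threadid FROM your_table t) WHERE threadid = 1"/>
<entity name="medsite_t2" dataSource="oltp01_prod" rootEntity="true"
        query="SELECT * FROM (SELECT t.*, mod(RowNum, 4) threadid FROM your_table t) WHERE threadid = 2"/>
<entity name="medsite_t3" dataSource="oltp01_prod" rootEntity="true"
        query="SELECT * FROM (SELECT t.*, mod(RowNum, 4) threadid FROM your_table t) WHERE threadid = 3"/>
```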
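And to illustrate Shawn's point about collection-level vs. core-level requests — a sketch with hypothetical host, collection, and core names (the replica core name depends on your cluster; check the Cores screen in the admin UI):

```shell
# Collection-level: load balanced across the cloud, so a later status
# request may land on a different core than the import did.
curl "http://solr1:8983/solr/mycollection/dataimport?command=full-import"

# Core-level: import and status are guaranteed to hit the same core.
curl "http://solr1:8983/solr/mycollection_shard1_replica_n1/dataimport?command=full-import"
curl "http://solr1:8983/solr/mycollection_shard1_replica_n1/dataimport?command=status"
```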