Okay, but please clarify further - do you simply wish to run DIH externally,
but still send each document to SolrCloud for indexing, or... are you
expecting to generate the index completely external to the cluster and then
somehow "merge" that DIH "index" into the SolrCloud index?
It would be great to have a "standalone DIH" that runs as a separate server
and then sends standard Solr update requests to a Solr cluster.
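As a rough illustration of what such an external indexer boils down to, here
is a minimal Python sketch: do the preprocessing outside the cluster, then
POST the resulting documents to Solr's JSON update handler. The base URL, the
collection name "collection1", the field names, and the preprocess() helper
are all hypothetical placeholders, not anything from an actual setup.

```python
import json
import urllib.request


def preprocess(row):
    """Hypothetical stand-in for the DB fetch/join/preprocessing step."""
    return {"id": str(row["id"]), "title": row["title"].strip()}


def build_update_request(base_url, collection, docs, commit=False):
    """Build (but do not send) a POST to Solr's JSON update handler.

    base_url and collection are assumptions for illustration, e.g.
    "http://localhost:8983/solr" and "collection1".
    """
    url = f"{base_url.rstrip('/')}/{collection}/update"
    if commit:
        # Ask Solr to commit as part of this request.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_update(request):
    """Send the request to a running Solr node and return the HTTP status."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

To use it, map preprocess() over the source rows, pass the resulting list to
build_update_request(), and hand the request to send_update(). In SolrCloud
the request can go to any node in the cluster.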
-- Jack Krupansky
-----Original Message-----
From: Lee Chunki
Sent: Sunday, August 31, 2014 8:55 PM
To: solr-user@lucene.apache.org
Subject: Re: external indexer for Solr Cloud
Hi Shawn and Jack,
Thank you for your reply.
Yes, I want to run the data import handler independently and sync it to Solr
Cloud, because my current DIH node does not only the DB fetch & join but also
a lot of preprocessing.
Thanks,
Chunki.
On Aug 30, 2014, at 1:34 AM, Jack Krupansky <j...@basetechnology.com> wrote:
My other thought was that maybe he wants to do index updates outside of
the cluster that is handling queries, and then copy in the completed
index. Or... maybe take replicas out of the query rotation while they are
updated. Or... maybe this is yet another X-Y problem!
-- Jack Krupansky
-----Original Message-----
From: Shawn Heisey
Sent: Friday, August 29, 2014 11:19 AM
To: solr-user@lucene.apache.org
Subject: Re: external indexer for Solr Cloud
On 8/29/2014 5:21 AM, Lee Chunki wrote:
Is there any way to run an external indexer for Solr Cloud?
Jack asked an excellent question. What do you mean by this? Unless
you're using the dataimport handler, all indexing is external to Solr.
My situation is:
* running two indexers (for failover) and two searchers.
* only the two searchers are used for serving queries.
* planning to move to Solr Cloud.
However, I wonder whether, if I run the indexing job on one of the Solr Cloud
servers, that server's load would be higher than the other nodes'.
So, I want to build the index outside of Solr Cloud, but….
In SolrCloud, every shard replica will be indexing -- it's not like
old-style replication, where the master indexes everything and the
slaves copy the completed index. The leader of each shard will be
working slightly harder than the other replicas, but you really don't
need to worry too much about sending all your updates to one server --
those requests get duplicated to the other servers and they all index
them, almost in parallel.
For my setup (non-cloud, but sharded), I use Pacemaker to ensure that
only one of my servers is running my indexing program and haproxy (plus
its shared IP address).
Thanks,
Shawn