@Andrea: I agree with you. Do you know if there is a way to initialize
CloudSolrClient directly from information that I can get
from the SolrQueryRequest or AddUpdateCommand object?

@Erick: Thank you for the information about
StatelessScriptUpdateProcessorFactory.

"In your situation, add this _before_ the update is distributed and instead
of
coreB, ask for collectionB."

Right, but how do I ask for collectionB?

"Next, you want to get the value from “coreB”. Don’t do that, get it from
_collection_ B."

Right, but how do I get the value from _collection_ B?
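
To make the question concrete, here is a minimal sketch of the shape I am after: the processor asks a collection, not a core, for the source document by id. It is deliberately free of Solr classes so that it compiles on its own. `CollectionLookup`, `InMemoryLookup`, `PopularityEnricher`, the field name "popularity" and the collection name "collectionB" are all hypothetical stand-ins, not Solr APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical abstraction: fetch a document's fields by id from a whole collection. */
interface CollectionLookup {
    Optional<Map<String, Object>> getById(String collection, String id);
}

/** In-memory stand-in for collection B, only here so the sketch runs without Solr. */
class InMemoryLookup implements CollectionLookup {
    private final Map<String, Map<String, Map<String, Object>>> collections = new HashMap<>();

    void put(String collection, String id, Map<String, Object> fields) {
        collections.computeIfAbsent(collection, c -> new HashMap<>()).put(id, fields);
    }

    @Override
    public Optional<Map<String, Object>> getById(String collection, String id) {
        Map<String, Map<String, Object>> docs = collections.get(collection);
        return docs == null ? Optional.empty() : Optional.ofNullable(docs.get(id));
    }
}

/** Mirrors the shape of processAdd: enrich the incoming doc before passing it on. */
class PopularityEnricher {
    private final CollectionLookup lookup;
    private final String sourceCollection;

    PopularityEnricher(CollectionLookup lookup, String sourceCollection) {
        this.lookup = lookup;
        this.sourceCollection = sourceCollection;
    }

    /** Returns a copy of doc with "popularity" added when the source collection has it. */
    Map<String, Object> enrich(String uniqueId, Map<String, Object> doc) {
        Map<String, Object> enriched = new HashMap<>(doc);
        lookup.getById(sourceCollection, uniqueId)
              .map(src -> src.get("popularity"))
              .ifPresent(score -> enriched.put("popularity", score));
        return enriched;
    }
}
```

In the real plugin I assume the lookup would be backed by something like a SolrJ client doing a get-by-id against collection B instead of a map. That wiring is not shown here, and how to obtain such a client inside the update processor is exactly what I am asking about.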



On Thu, Aug 29, 2019 at 2:17 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> Have you looked at using one of the update processors?
>
> Consider StatelessScriptUpdateProcessorFactory for instance. You can do
> anything you’d like to do in a script (Groovy, JavaScript, Python I think,
> and others). See:
> ./example/files/conf/update-script.js for one example.
>
> You put it in your solrconfig file in the update handler, then put the
> script in your conf directory and push it to ZK and the rest is
> automagical.
>
> There are a bunch of other update processors that you can use that are also
> pretty much by configuration, but the one I referenced is the one that is
> the most general-purpose.
>
> In your situation, add this _before_ the update is distributed and instead
> of coreB, ask for collectionB.
>
> Distributed updates go like this:
> 1. the doc gets routed to a leader for a shard
> 2. the doc gets forwarded to each replica.
>
> Now, depending on where you put the update processor (and you’ll have to
> dig a bit; much of this distribution logic is implicit, but you can
> explicitly define it in solrconfig.xml), this either happens _before_ the
> docs are sent to the rest of the replicas or _after_ the docs arrive at
> each replica. From what you’ve described, you want to do this before
> distribution so all copies have the new field. You don’t care what replica
> is the leader. You don’t care how many other replicas exist or where they
> are. You don’t even care if there’s any replica hosting this particular
> collection on the node that does this; it happens before distribution.
>
> Next, you want to get the value from “coreB”. Don’t do that, get it from
> _collection_ B. Since you have the doc ID (presumably the <uniqueKey>),
> using get-by-id instead of a standard query will be very efficient. I can
> imagine under very heavy load this might introduce too much overhead, but
> it’s where I’d start.
>
> Best,
> Erick
>
> > On Aug 29, 2019, at 1:45 PM, Arnold Bronley <arnoldbron...@gmail.com>
> wrote:
> >
> > I can't use CloudSolrClient because I need to intercept the incoming
> > indexing request and then add one more field to it. All this happens on
> > the Solr side, not the client side.
> >
> > On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini <a.gazzar...@sease.io>
> > wrote:
> >
> >> Hi Arnold,
> >> why don't you use SolrJ (in this case a CloudSolrClient) instead of
> >> dealing with such low-level details? The actual location of the document
> >> you are looking for would be completely abstracted.
> >>
> >> Best,
> >> Andrea
> >>
> >> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, <arnoldbron...@gmail.com>
> >> wrote:
> >>
> >>> So, here is the problem that I am trying to solve. I am moving from the
> >>> Solr master-slave architecture to the SolrCloud architecture. I have one
> >>> custom Solr plugin that does the following:
> >>>
> >>> 1. When a document (say, a document with unique id doc1) is being indexed
> >>> to a core, say core A, this plugin adds one more field to the indexing
> >>> request. It fetches this new field from core B. Core B in our case
> >>> maintains a popularity score field for each document, which gets
> >>> calculated in a different project. It fetches the popularity score from
> >>> core B for doc1 and adds it to the indexing request.
> >>> 2. In the following code, dataInfo.dataSource is the name of core B.
> >>>
> >>> I can use the name of core B, like collection_shard1_replica_n21, and
> >>> it works. But it is not a good solution. What if I had multiple shards
> >>> for core B? In that case the doc1 that I am trying to find might not be
> >>> present in collection_shard1_replica_n21.
> >>>
> >>> So is there something like,
> >>>
> >>> SolrCollection dataCollection = getCollection(dataInfo.dataSource);
> >>>
> >>> @Override
> >>> public void processAdd(AddUpdateCommand cmd) throws IOException {
> >>>   SolrInputDocument doc = cmd.getSolrInputDocument();
> >>>   String uniqueId = getUniqueId(doc);
> >>>
> >>>   SolrCore dataCore =
> >>> req.getCore().getCoreContainer().getCore(dataInfo.dataSource);
> >>>
> >>>   if (dataCore == null){
> >>>       LOG.error("Solr core '{}' to use as data source could not be found!  "
> >>>               + "Please check if it is loaded.", dataInfo.dataSource);
> >>>   } else{
> >>>
> >>>          Document sourceDoc = getSourceDocument(dataCore, uniqueId);
> >>>
> >>>          if (sourceDoc != null){
> >>>
> >>>              populateDocToBeAddedFromSourceDoc(doc,sourceDoc);
> >>>          }
> >>>   }
> >>>
> >>>   // pass it up the chain
> >>>   super.processAdd(cmd);
> >>> }
> >>>
> >>>
> >>> On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson <
> erickerick...@gmail.com>
> >>> wrote:
> >>>
> >>>> No, you cannot just use the collection name. Replicas are just cores.
> >>>> You can host many replicas of a single collection on a single Solr
> >>>> node in a single CoreContainer (there’s only one per Solr JVM). If you
> >>>> just specified a collection name, how would the code have any clue
> >>>> which of the possibilities to return?
> >>>>
> >>>> The name is in the form collection_shard1_replica_n21
> >>>>
> >>>> How do you know where the doc you’re working on lives? Put the ID
> >>>> through the hashing mechanism.
> >>>>
> >>>> This isn’t the same at all if you’re running stand-alone; then there’s
> >>>> only one name.
> >>>>
> >>>> But as I indicated above, your ask for just using the collection name
> >>>> isn’t going to work by definition.
> >>>>
> >>>> So perhaps this is an XY problem. You’re asking about getCore, which
> >>>> is a very specific, low-level concept. What are you trying to do at a
> >>>> higher level? Why do you think you need to get a core? What do you want
> >>>> to _do_ with the doc that you need the core it resides in?
> >>>>
> >>>> Best,
> >>>> Erick
> >>>>
> >>>>> On Aug 28, 2019, at 5:28 PM, Arnold Bronley <arnoldbron...@gmail.com
> >>>
> >>>> wrote:
> >>>>>
> >>>>> Wait, would I need to use a core name like
> >>>>> collection1_shard1_replica_n4 etc.? Can't I use the collection name?
> >>>>> What if I have multiple shards? How would I know where the document
> >>>>> that I am working with currently lives?
> >>>>> I would prefer to use the collection name and expect the core
> >>>>> information to be abstracted away.
> >>>>>
> >>>>> On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <
> >>> erickerick...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hmmm, should work. What is your core_name? There are strings like
> >>>>>> collection1_shard1_replica_n4 and core_node6. Are you sure you’re
> >>>>>> using the right one?
> >>>>>>
> >>>>>>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley <
> >> arnoldbron...@gmail.com
> >>>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> In custom Solr plugin code,
> >>>>>>> req.getCore().getCoreContainer().getCore(core_name) is returning
> >>>>>>> null even though the core named core_name is loaded and up in Solr.
> >>>>>>> req is an object of the SolrQueryRequest class. I am using Solr 8.2.0
> >>>>>>> in SolrCloud mode.
> >>>>>>>
> >>>>>>> Any ideas on why this might be the case?
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
>
>
