Just like any other SolrCloud request. The simplest approach is to fire an HTTP request from the update processor, just as you would from a browser.
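For what it's worth, here is a minimal sketch of what that HTTP call might look like in Java, using Solr's real-time get handler (/get) addressed at the collection rather than at a core. The class name CollectionLookup, the base URL http://localhost:8983/solr, and the names collectionB and doc1 are placeholders assumed for illustration, not taken from your setup. Parsing the JSON response and copying the field into the SolrInputDocument are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; none of these names come from Solr itself.
class CollectionLookup {

    /** Builds a real-time get URL against a collection (not a core). */
    static String buildGetUrl(String solrBaseUrl, String collection, String id) {
        try {
            return solrBaseUrl + "/" + collection + "/get?wt=json&id="
                    + URLEncoder.encode(id, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM
            throw new IllegalStateException(e);
        }
    }

    /** Issues the GET and returns the raw response body as a string. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(2000); // assumed timeouts; tune for your load
        conn.setReadTimeout(5000);
        try {
            int status = conn.getResponseCode();
            if (status != 200) {
                throw new IOException("Unexpected HTTP status " + status + " from " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Only builds and prints the URL; calling fetch(url) needs a running Solr.
        String url = buildGetUrl("http://localhost:8983/solr", "collectionB", "doc1");
        System.out.println(url);
    }
}
```

Inside processAdd you would call fetch(buildGetUrl(...)) with the document's unique id before passing the command up the chain. Because the request names the collection, SolrCloud takes care of finding whichever shard holds the document.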
> On Aug 29, 2019, at 3:31 PM, Arnold Bronley <arnoldbron...@gmail.com> wrote:
>
> @Andrea: I agree with you. Do you know if there is a way to initialize
> CloudSolrClient directly from some information that I get
> from SolrQueryRequest or from AddUpdateCommand object?
>
> @Erick: Thank you for the information about
> StatelessScriptUpdateProcessorFactory.
>
> "In your situation, add this _before_ the update is distributed and instead
> of coreB, ask for collectionB."
>
> Right, but how do I ask for collectionB?
>
> "Next, you want to get the value from “coreB”. Don’t do that, get it from
> _collection_ B."
>
> Right, but how do I get the value from _collection_ B?
>
> On Thu, Aug 29, 2019 at 2:17 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Have you looked at using one of the update processors?
>>
>> Consider StatelessScriptUpdateProcessorFactory for instance. You can do
>> anything you’d like to do in a script (Groovy, JavaScript, Python I think,
>> and others). See: ./example/files/conf/update-script.js for one example.
>>
>> You put it in your solrconfig file in the update handler, then put the
>> script in your conf directory and push it to ZK and the rest is automagical.
>>
>> There are a bunch of other update processors that you can use that are also
>> pretty much by configuration, but the one I referenced is the one that is
>> the most general-purpose.
>>
>> In your situation, add this _before_ the update is distributed and instead
>> of coreB, ask for collectionB.
>>
>> Distributed updates go like this:
>> 1. the doc gets routed to a leader for a shard
>> 2. the doc gets forwarded to each replica.
>>
>> Now, depending on where you put the update processor (and you’ll have to
>> dig a bit. Much of this distribution logic is implicit, but you can
>> explicitly define it in solrconfig.xml), this either happens _before_ the
>> docs are sent to the rest of the replicas or _after_ the docs arrive at
>> each replica.
>> From what you’ve described, you want to do this before distribution so all
>> copies have the new field. You don’t care what replica is the leader. You
>> don’t care how many other replicas exist or where they are. You don’t even
>> care if there’s any replica hosting this particular collection on the node
>> that does this, it happens before distribution.
>>
>> Next, you want to get the value from “coreB”. Don’t do that, get it from
>> _collection_ B. Since you have the doc ID (presumably the <uniqueKey>),
>> using get-by-id instead of a standard query will be very efficient. I can
>> imagine under very heavy load this might introduce too much overhead, but
>> it’s where I’d start.
>>
>> Best,
>> Erick
>>
>>> On Aug 29, 2019, at 1:45 PM, Arnold Bronley <arnoldbron...@gmail.com>
>>> wrote:
>>>
>>> I can't use CloudSolrClient because I need to intercept the incoming
>>> indexing request and then add one more field to it. All this happens on
>>> Solr side and not client side.
>>>
>>> On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini <a.gazzar...@sease.io>
>>> wrote:
>>>
>>>> Hi Arnold,
>>>> why don't you use solrj (in this case a CloudSolrClient) instead of
>>>> dealing with such low-level details? The actual location of the document
>>>> you are looking for would be completely abstracted.
>>>>
>>>> Best,
>>>> Andrea
>>>>
>>>> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, <arnoldbron...@gmail.com>
>>>> wrote:
>>>>
>>>>> So, here is the problem that I am trying to solve. I am moving from Solr
>>>>> master-slave architecture to SolrCloud architecture. I have one custom
>>>>> Solr plugin that does following:
>>>>>
>>>>> 1. When a document (say document with unique id doc1) is getting indexed
>>>>> to a core say core A then this plugin adds one more field to the indexing
>>>>> request. It fetches this new field from core B.
>>>>> Core B in our case maintains popularity score field for each document
>>>>> which gets calculated in a different project. It fetches the popularity
>>>>> score from core B for doc1 and adds it to indexing request.
>>>>> 2. In following code, dataInfo.dataSource is the name of the core B.
>>>>>
>>>>> I can use the name of the core B like collection_shard1_replica_n21 and
>>>>> it works. But it is not a good solution. What if I had multiple shards
>>>>> for core B? In that case the doc1 that I am trying to find might not be
>>>>> present in collection_shard1_replica_n21.
>>>>>
>>>>> So is there something like,
>>>>>
>>>>> SolrCollection dataCollection = getCollection(dataInfo.dataSource);
>>>>>
>>>>> @Override
>>>>> public void processAdd(AddUpdateCommand cmd) throws IOException {
>>>>>     SolrInputDocument doc = cmd.getSolrInputDocument();
>>>>>     String uniqueId = getUniqueId(doc);
>>>>>
>>>>>     // getCore() increments the core's reference count, so close it when done
>>>>>     SolrCore dataCore =
>>>>>         req.getCore().getCoreContainer().getCore(dataInfo.dataSource);
>>>>>
>>>>>     if (dataCore == null) {
>>>>>         LOG.error("Solr core '{}' to use as data source could not be found! "
>>>>>             + "Please check if it is loaded.", dataInfo.dataSource);
>>>>>     } else {
>>>>>         try {
>>>>>             Document sourceDoc = getSourceDocument(dataCore, uniqueId);
>>>>>             if (sourceDoc != null) {
>>>>>                 populateDocToBeAddedFromSourceDoc(doc, sourceDoc);
>>>>>             }
>>>>>         } finally {
>>>>>             dataCore.close();
>>>>>         }
>>>>>     }
>>>>>
>>>>>     // pass it up the chain
>>>>>     super.processAdd(cmd);
>>>>> }
>>>>>
>>>>> On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson <erickerick...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> No, you cannot just use the collection name. Replicas are just cores.
>>>>>> You can host many replicas of a single collection on a single Solr node
>>>>>> in a single CoreContainer (there’s only one per Solr JVM). If you just
>>>>>> specified a collection name how would the code have any clue which
>>>>>> of the possibilities to return?
>>>>>>
>>>>>> The name is in the form collection_shard1_replica_n21
>>>>>>
>>>>>> How do you know where the doc you’re working on is? Put the ID through
>>>>>> the hashing mechanism.
>>>>>>
>>>>>> This isn’t the same at all if you’re running stand-alone, then there’s
>>>>>> only one name.
>>>>>>
>>>>>> But as I indicated above, your ask for just using the collection name
>>>>>> isn’t going to work by definition.
>>>>>>
>>>>>> So perhaps this is an XY problem. You’re asking about getCore, which is
>>>>>> a very specific, low-level concept. What are you trying to do at a
>>>>>> higher level? Why do you think you need to get a core? What do you want
>>>>>> to _do_ with the doc that you need the core it resides in?
>>>>>>
>>>>>> Best,
>>>>>> Erick
>>>>>>
>>>>>>> On Aug 28, 2019, at 5:28 PM, Arnold Bronley <arnoldbron...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Wait, would I need to use core name like collection1_shard1_replica_n4
>>>>>>> etc.? Can't I use collection name? What if I have multiple shards, how
>>>>>>> would I know where the document that I am working with currently
>>>>>>> lives? I would rather use the collection name and expect the core
>>>>>>> information to be abstracted out that way.
>>>>>>>
>>>>>>> On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <erickerick...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hmmm, should work. What is your core_name? There’s strings like
>>>>>>>> collection1_shard1_replica_n4 and core_node6. Are you sure you’re
>>>>>>>> using the right one?
>>>>>>>>
>>>>>>>>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley <arnoldbron...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> In a custom Solr plugin code,
>>>>>>>>> req.getCore().getCoreContainer().getCore(core_name) is returning null
>>>>>>>>> even if core by name core_name is loaded and up in Solr. req is object
>>>>>>>>> of SolrQueryRequest class.
>>>>>>>>> I am using Solr 8.2.0 in SolrCloud mode.
>>>>>>>>>
>>>>>>>>> Any ideas on why this might be the case?