@Andrea: I agree with you. Do you know if there is a way to initialize CloudSolrClient directly from information that I can get from the SolrQueryRequest or the AddUpdateCommand object?
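To make the shape of this question concrete, here is a minimal sketch, assuming the processor depends on a collection-level get-by-id rather than on a specific core. The `CollectionLookup` interface, the `PopularityEnricher` class, and the field names `id` and `popularity` are all hypothetical and are not Solr APIs. In a real plugin the lookup might be backed by a SolrJ CloudSolrClient, but how to construct one inside Solr is exactly the open question here, so the sketch uses an in-memory map instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Enriches a document with a field fetched by id from another collection. */
final class PopularityEnricher {

    /** Hypothetical stand-in for a collection-level get-by-id; this is NOT a Solr API. */
    interface CollectionLookup {
        Optional<Map<String, Object>> getById(String collection, String id);
    }

    private final CollectionLookup lookup;
    private final String sourceCollection;

    PopularityEnricher(CollectionLookup lookup, String sourceCollection) {
        this.lookup = lookup;
        this.sourceCollection = sourceCollection;
    }

    /** Returns a copy of doc with "popularity" added when the source collection has it. */
    Map<String, Object> enrich(Map<String, Object> doc) {
        Map<String, Object> enriched = new HashMap<>(doc);
        Object id = doc.get("id"); // assumed uniqueKey field name
        if (id != null) {
            lookup.getById(sourceCollection, id.toString())
                  .map(source -> source.get("popularity")) // assumed field name
                  .ifPresent(score -> enriched.put("popularity", score));
        }
        return enriched;
    }

    public static void main(String[] args) {
        // In-memory stand-in for collection B, keyed by "collection/id".
        Map<String, Map<String, Object>> store = new HashMap<>();
        Map<String, Object> sourceDoc = new HashMap<>();
        sourceDoc.put("id", "doc1");
        sourceDoc.put("popularity", 42);
        store.put("collectionB/doc1", sourceDoc);

        PopularityEnricher enricher = new PopularityEnricher(
            (collection, id) -> Optional.ofNullable(store.get(collection + "/" + id)),
            "collectionB");

        Map<String, Object> incoming = new HashMap<>();
        incoming.put("id", "doc1");
        System.out.println(enricher.enrich(incoming));
    }
}
```

The point of the design is that the enrichment logic only ever names the collection, so it does not matter which shard or replica actually holds doc1.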
@Erick: Thank you for the information about StatelessScriptUpdateProcessorFactory.

"In your situation, add this _before_ the update is distributed and instead of coreB, ask for collectionB."
Right, but how do I ask for collectionB?

"Next, you want to get the value from “coreB”. Don’t do that, get it from _collection_ B."
Right, but how do I get the value from _collection_ B?

On Thu, Aug 29, 2019 at 2:17 PM Erick Erickson <erickerick...@gmail.com> wrote:

> Have you looked at using one of the update processors?
>
> Consider StatelessScriptUpdateProcessorFactory for instance. You can do anything
> you’d like to do in a script (Groovy, JavaScript, Python I think, and others). See:
> ./example/files/conf/update-script.js for one example.
>
> You put it in your solrconfig file in the update handler, then put the
> script in your conf directory and push it to ZK and the rest is automagical.
>
> There are a bunch of other update processors that you can use that are also
> pretty much by configuration, but the one I referenced is the one that is the
> most general-purpose.
>
> In your situation, add this _before_ the update is distributed and instead of
> coreB, ask for collectionB.
>
> Distributed updates go like this:
> 1. the doc gets routed to a leader for a shard
> 2. the doc gets forwarded to each replica.
>
> Now, depending on where you put the update processor (and you’ll have to
> dig a bit. Much of this distribution logic is implicit, but you can explicitly
> define it in solrconfig.xml), this either happens _before_ the docs are sent
> to the rest of the replicas or _after_ the docs arrive at each replica. From what
> you’ve described, you want to do this before distribution so all copies have
> the new field. You don’t care what replica is the leader. You don’t care how many
> other replicas exist or where they are.
> You don’t even care if there’s any replica hosting this particular collection
> on the node that does this; it happens before distribution.
>
> Next, you want to get the value from “coreB”. Don’t do that, get it from
> _collection_ B. Since you have the doc ID (presumably the <uniqueKey>),
> using get-by-id instead of a standard query will be very efficient. I can imagine
> under very heavy load this might introduce too much overhead, but it’s
> where I’d start.
>
> Best,
> Erick
>
> > On Aug 29, 2019, at 1:45 PM, Arnold Bronley <arnoldbron...@gmail.com> wrote:
> >
> > I can't use CloudSolrClient because I need to intercept the incoming
> > indexing request and then add one more field to it. All this happens on
> > Solr side and not client side.
> >
> > On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini <a.gazzar...@sease.io> wrote:
> >
> >> Hi Arnold,
> >> why don't you use solrj (in this case a CloudSolrClient) instead of dealing
> >> with such low-level details? The actual location of the document you are
> >> looking for would be completely abstracted.
> >>
> >> Best,
> >> Andrea
> >>
> >> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, <arnoldbron...@gmail.com> wrote:
> >>
> >>> So, here is the problem that I am trying to solve. I am moving from Solr
> >>> master-slave architecture to SolrCloud architecture. I have one custom Solr
> >>> plugin that does the following:
> >>>
> >>> 1. When a document (say document with unique id doc1) is getting indexed to
> >>> a core say core A then this plugin adds one more field to the indexing
> >>> request. It fetches this new field from core B. Core B in our case
> >>> maintains popularity score field for each document which gets calculated in
> >>> a different project. It fetches the popularity score from core B for doc1
> >>> and adds it to indexing request.
> >>> 2. In the following code, dataInfo.dataSource is the name of the core B.
> >>>
> >>> I can use the name of the core B like collection_shard1_replica_n21 and it
> >>> works. But it is not a good solution. What if I had multiple shards for
> >>> core B? In that case the doc1 that I am trying to find might not be
> >>> present in collection_shard1_replica_n21.
> >>>
> >>> So is there something like,
> >>>
> >>> SolrCollection dataCollection = getCollection(dataInfo.dataSource);
> >>>
> >>> @Override
> >>> public void processAdd(AddUpdateCommand cmd) throws IOException {
> >>>     SolrInputDocument doc = cmd.getSolrInputDocument();
> >>>     String uniqueId = getUniqueId(doc);
> >>>
> >>>     SolrCore dataCore =
> >>>         req.getCore().getCoreContainer().getCore(dataInfo.dataSource);
> >>>
> >>>     if (dataCore == null) {
> >>>         LOG.error("Solr core '{}' to use as data source could not be found! "
> >>>             + "Please check if it is loaded.", dataInfo.dataSource);
> >>>     } else {
> >>>         Document sourceDoc = getSourceDocument(dataCore, uniqueId);
> >>>
> >>>         if (sourceDoc != null) {
> >>>             populateDocToBeAddedFromSourceDoc(doc, sourceDoc);
> >>>         }
> >>>     }
> >>>
> >>>     // pass it up the chain
> >>>     super.processAdd(cmd);
> >>> }
> >>>
> >>>
> >>> On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson <erickerick...@gmail.com> wrote:
> >>>
> >>>> No, you cannot just use the collection name. Replicas are just cores.
> >>>> You can host many replicas of a single collection on a single Solr node
> >>>> in a single CoreContainer (there’s only one per Solr JVM). If you just
> >>>> specified a collection name how would the code have any clue which
> >>>> of the possibilities to return?
> >>>>
> >>>> The name is in the form collection_shard1_replica_n21
> >>>>
> >>>> How do you know where the doc you’re working on is? Put the ID through
> >>>> the hashing mechanism.
> >>>>
> >>>> This isn’t the same at all if you’re running stand-alone, then there’s only
> >>>> one name.
> >>>>
> >>>> But as I indicated above, your ask for just using the collection name isn’t
> >>>> going to work by definition.
> >>>>
> >>>> So perhaps this is an XY problem. You’re asking about getCore, which is
> >>>> a very specific, low-level concept. What are you trying to do at a higher
> >>>> level? Why do you think you need to get a core? What do you want to _do_
> >>>> with the doc that you need the core it resides in?
> >>>>
> >>>> Best,
> >>>> Erick
> >>>>
> >>>>> On Aug 28, 2019, at 5:28 PM, Arnold Bronley <arnoldbron...@gmail.com> wrote:
> >>>>>
> >>>>> Wait, would I need to use a core name like collection1_shard1_replica_n4
> >>>>> etc.? Can't I use the collection name? What if I have multiple shards, how
> >>>>> would I know where the document that I am working with currently lives?
> >>>>> I would prefer to use the collection name and expect the core
> >>>>> information to be abstracted out that way.
> >>>>>
> >>>>> On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <erickerick...@gmail.com> wrote:
> >>>>>
> >>>>>> Hmmm, should work. What is your core_name? There’s strings like
> >>>>>> collection1_shard1_replica_n4 and core_node6. Are you sure you’re using the
> >>>>>> right one?
> >>>>>>
> >>>>>>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley <arnoldbron...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> In a custom Solr plugin code,
> >>>>>>> req.getCore().getCoreContainer().getCore(core_name) is returning null even
> >>>>>>> if core by name core_name is loaded and up in Solr. req is object
> >>>>>>> of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.
> >>>>>>>
> >>>>>>> Any ideas on why this might be the case?
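
For readers following the core-name discussion in this thread, the sketch below splits a name of the form collection_shard1_replica_n21 into its collection, shard, and replica parts. The regular expression is an assumption inferred only from the names quoted in the thread, not an authoritative statement of Solr's naming rules, and `CoreNameParts` is a hypothetical helper. It illustrates why a core name pins a lookup to one shard and one replica, while a string like core_node6 does not follow that form at all.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a SolrCloud core name into collection, shard, replica. */
final class CoreNameParts {
    // Assumed pattern, inferred from names such as collection_shard1_replica_n21.
    private static final Pattern CORE_NAME =
        Pattern.compile("^(.+)_(shard\\d+)_replica_([a-z]?\\d+)$");

    final String collection;
    final String shard;
    final String replica;

    private CoreNameParts(String collection, String shard, String replica) {
        this.collection = collection;
        this.shard = shard;
        this.replica = replica;
    }

    /** Returns the parsed parts, or empty if the name does not match the assumed form. */
    static Optional<CoreNameParts> parse(String coreName) {
        Matcher m = CORE_NAME.matcher(coreName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new CoreNameParts(m.group(1), m.group(2), m.group(3)));
    }

    public static void main(String[] args) {
        String[] names = {"collection1_shard1_replica_n4", "core_node6"};
        for (String name : names) {
            String described = parse(name)
                .map(p -> p.collection + " / " + p.shard + " / " + p.replica)
                .orElse("not a replica core name");
            System.out.println(name + " -> " + described);
        }
    }
}
```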