@Andrea: Yeah, I would try to avoid getting that information from System.getProperty. I am also looking for some class that will give this information.
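For reference, a minimal sketch of what the System.getProperty route involves. It assumes the node was started with a zkHost system property of the form host:port,host:port/chroot (the property name is an assumption; the admin console shows the actual one). ZkHostString is a hypothetical helper, not a Solr class. It only splits the string into the host list and optional chroot, which is the shape that CloudSolrClient.Builder(List<String>, Optional<String>) takes in SolrJ 8.x. In SolrCloud mode, req.getCore().getCoreContainer().getZkController().getZkServerAddress() should return the same string without touching system properties, but that is an assumption to verify against the Solr 8.x javadocs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not part of Solr): splits a ZooKeeper connection
// string such as "zk1:2181,zk2:2181/solr" into hosts and an optional chroot.
final class ZkHostString {
    final List<String> hosts;
    final Optional<String> chroot;

    private ZkHostString(List<String> hosts, Optional<String> chroot) {
        this.hosts = hosts;
        this.chroot = chroot;
    }

    static ZkHostString parse(String zkHost) {
        String trimmed = zkHost.trim();
        // Everything from the first '/' onward is the chroot, if there is one.
        int slash = trimmed.indexOf('/');
        String hostPart = slash < 0 ? trimmed : trimmed.substring(0, slash);
        Optional<String> chroot =
                slash < 0 ? Optional.empty() : Optional.of(trimmed.substring(slash));
        return new ZkHostString(Arrays.asList(hostPart.split(",")), chroot);
    }

    public static void main(String[] args) {
        // "zkHost" is an assumed property name; the fallback value is a placeholder.
        String raw = System.getProperty("zkHost", "localhost:2181/solr");
        ZkHostString zk = parse(raw);
        System.out.println("hosts=" + zk.hosts + " chroot=" + zk.chroot.orElse("(none)"));
    }
}
```

The two parts could then be handed to a SolrJ client created once in the factory and reused, rather than per document.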
@Erick: Is there any way to get the current Solr endpoint or ZK ensemble info from inside StatelessScriptUpdateProcessorFactory, so that I can make that HTTP request?

On Thu, Aug 29, 2019 at 5:18 PM Andrea Gazzarini <a.gazzar...@sease.io> wrote:

> I remember ZK coordinates (hosts, ports and root) are set as system
> properties in Solr nodes (please open the admin console and see their
> names). So, it would be just a matter of
>
> System.getProperty(ZK ensemble coordinates|root)
>
> Before going in that direction: I don't know/remember if there's some
> Solr-specific ZK class where they can be asked. If that class exists, it
> would be a better way, otherwise you can go with the system property
> approach.
>
> Andrea
>
> On Thu, 29 Aug 2019, 21:32 Arnold Bronley, <arnoldbron...@gmail.com>
> wrote:
>
> > @Andrea: I agree with you. Do you know if there is a way to initialize
> > CloudSolrClient directly from some information that I get
> > from SolrQueryRequest or from AddUpdateCommand object?
> >
> > @Erick: Thank you for the information about
> > StatelessScriptUpdateProcessorFactory.
> >
> > "In your situation, add this _before_ the update is distributed and
> > instead of coreB, ask for collectionB."
> >
> > Right, but how do I ask for collectionB?
> >
> > "Next, you want to get the value from “coreB”. Don’t do that, get it from
> > _collection_ B."
> >
> > Right, but how do I get the value from _collection_ B?
> >
> > On Thu, Aug 29, 2019 at 2:17 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > Have you looked at using one of the update processors?
> > >
> > > Consider StatelessScriptUpdateProcessorFactory for instance. You can do
> > > anything you’d like to do in a script (Groovy, JavaScript, Python I
> > > think, and others). See:
> > > ./example/files/conf/update-script.js for one example.
> > >
> > > You put it in your solrconfig file in the update handler, then put the
> > > script in your conf directory and push it to ZK and the rest is
> > > automagical.
> > >
> > > There are a bunch of other update processors that you can use that are
> > > also pretty much by configuration, but the one I referenced is the one
> > > that is the most general-purpose.
> > >
> > > In your situation, add this _before_ the update is distributed and
> > > instead of coreB, ask for collectionB.
> > >
> > > Distributed updates go like this:
> > > 1. the doc gets routed to a leader for a shard
> > > 2. the doc gets forwarded to each replica.
> > >
> > > Now, depending on where you put the update processor (and you’ll have to
> > > dig a bit. Much of this distribution logic is implicit, but you can
> > > explicitly define it in solrconfig.xml), this either happens _before_
> > > the docs are sent to the rest of the replicas or _after_ the docs arrive
> > > at each replica. From what you’ve described, you want to do this before
> > > distribution so all copies have the new field. You don’t care what
> > > replica is the leader. You don’t care how many other replicas exist or
> > > where they are. You don’t even care if there’s any replica hosting this
> > > particular collection on the node that does this, it happens before
> > > distribution.
> > >
> > > Next, you want to get the value from “coreB”. Don’t do that, get it from
> > > _collection_ B. Since you have the doc ID (presumably the <uniqueKey>),
> > > using get-by-id instead of a standard query will be very efficient. I
> > > can imagine under very heavy load this might introduce too much
> > > overhead, but it’s where I’d start.
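A minimal sketch of the get-by-id suggestion above, addressed to a collection rather than a core. It only builds the request URL for Solr's real-time get handler (/get); sending it is left to whatever HTTP client is available. The base URL, the collection name collectionB and the field name popularity are placeholders, and CollectionGetById is a hypothetical helper, not a Solr class.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Solr): builds a real-time get URL that
// targets a collection, so the caller never needs a core or replica name.
final class CollectionGetById {

    // Example result:
    // http://localhost:8983/solr/collectionB/get?id=doc1&fl=popularity&wt=json
    static URI realTimeGetUri(String solrBaseUrl, String collection, String id, String field) {
        // Tolerate a trailing slash on the base URL.
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        // URL-encode the id and field; ids may contain characters such as '/' or '!'.
        String query = "id=" + URLEncoder.encode(id, StandardCharsets.UTF_8)
                + "&fl=" + URLEncoder.encode(field, StandardCharsets.UTF_8)
                + "&wt=json";
        return URI.create(base + "/" + collection + "/get?" + query);
    }

    public static void main(String[] args) {
        // All four arguments are placeholder values for illustration.
        System.out.println(realTimeGetUri(
                "http://localhost:8983/solr", "collectionB", "doc1", "popularity"));
    }
}
```

Because the URL names the collection, any node can receive the request and Solr locates the shard that holds the document, which is the abstraction asked for earlier in the thread.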
> > >
> > > Best,
> > > Erick
> > >
> > > > On Aug 29, 2019, at 1:45 PM, Arnold Bronley <arnoldbron...@gmail.com>
> > > > wrote:
> > > >
> > > > I can't use CloudSolrClient because I need to intercept the incoming
> > > > indexing request and then add one more field to it. All this happens on
> > > > Solr side and not client side.
> > > >
> > > > On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini <a.gazzar...@sease.io>
> > > > wrote:
> > > >
> > > >> Hi Arnold,
> > > >> why don't you use solrj (in this case a CloudSolrClient) instead of
> > > >> dealing with such low-level details? The actual location of the
> > > >> document you are looking for would be completely abstracted.
> > > >>
> > > >> Best,
> > > >> Andrea
> > > >>
> > > >> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, <arnoldbron...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> So, here is the problem that I am trying to solve. I am moving from
> > > >>> Solr master-slave architecture to SolrCloud architecture. I have one
> > > >>> custom Solr plugin that does the following:
> > > >>>
> > > >>> 1. When a document (say document with unique id doc1) is getting
> > > >>> indexed to a core, say core A, then this plugin adds one more field
> > > >>> to the indexing request. It fetches this new field from core B. Core
> > > >>> B in our case maintains a popularity score field for each document,
> > > >>> which gets calculated in a different project. It fetches the
> > > >>> popularity score from core B for doc1 and adds it to the indexing
> > > >>> request.
> > > >>> 2. In the following code, dataInfo.dataSource is the name of core B.
> > > >>>
> > > >>> I can use the name of core B like collection_shard1_replica_n21 and
> > > >>> it works. But it is not a good solution. What if I had multiple
> > > >>> shards for core B?
> > > >>> In that case the doc1 that I am trying to find might not be
> > > >>> present in collection_shard1_replica_n21.
> > > >>>
> > > >>> So is there something like,
> > > >>>
> > > >>> SolrCollecton dataCollection = getCollection(dataInfo.dataSource);
> > > >>>
> > > >>> @Override
> > > >>> public void processAdd(AddUpdateCommand cmd) throws IOException {
> > > >>>     SolrInputDocument doc = cmd.getSolrInputDocument();
> > > >>>     String uniqueId = getUniqueId(doc);
> > > >>>
> > > >>>     SolrCore dataCore =
> > > >>>         req.getCore().getCoreContainer().getCore(dataInfo.dataSource);
> > > >>>
> > > >>>     if (dataCore == null) {
> > > >>>         LOG.error("Solr core '{}' to use as data source could not be found! "
> > > >>>             + "Please check if it is loaded.", dataInfo.dataSource);
> > > >>>     } else {
> > > >>>         Document sourceDoc = getSourceDocument(dataCore, uniqueId);
> > > >>>
> > > >>>         if (sourceDoc != null) {
> > > >>>             populateDocToBeAddedFromSourceDoc(doc, sourceDoc);
> > > >>>         }
> > > >>>     }
> > > >>>
> > > >>>     // pass it up the chain
> > > >>>     super.processAdd(cmd);
> > > >>> }
> > > >>>
> > > >>> On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson <erickerick...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> No, you cannot just use the collection name. Replicas are just cores.
> > > >>>> You can host many replicas of a single collection on a single Solr node
> > > >>>> in a single CoreContainer (there’s only one per Solr JVM). If you just
> > > >>>> specified a collection name how would the code have any clue which
> > > >>>> of the possibilities to return?
> > > >>>>
> > > >>>> The name is in the form collection_shard1_replica_n21
> > > >>>>
> > > >>>> How do you know where the doc you’re working on is? Put the ID through
> > > >>>> the hashing mechanism.
> > > >>>>
> > > >>>> This isn’t the same at all if you’re running stand-alone, then there’s
> > > >>>> only one name.
> > > >>>>
> > > >>>> But as I indicated above, your ask for just using the collection name
> > > >>>> isn’t going to work by definition.
> > > >>>>
> > > >>>> So perhaps this is an XY problem. You’re asking about getCore, which is
> > > >>>> a very specific, low-level concept. What are you trying to do at a
> > > >>>> higher level? Why do you think you need to get a core? What do you want
> > > >>>> to _do_ with the doc that you need the core it resides in?
> > > >>>>
> > > >>>> Best,
> > > >>>> Erick
> > > >>>>
> > > >>>>> On Aug 28, 2019, at 5:28 PM, Arnold Bronley <arnoldbron...@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> Wait, would I need to use a core name like
> > > >>>>> collection1_shard1_replica_n4 etc.? Can't I use the collection name?
> > > >>>>> What if I have multiple shards, how would I know where the document
> > > >>>>> that I am working with currently lives?
> > > >>>>> I would rather use the collection name and expect the core
> > > >>>>> information to be abstracted out that way.
> > > >>>>>
> > > >>>>> On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <erickerick...@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hmmm, should work. What is your core_name? There’s strings like
> > > >>>>>> collection1_shard1_replica_n4 and core_node6. Are you sure you’re
> > > >>>>>> using the right one?
> > > >>>>>>
> > > >>>>>>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley <arnoldbron...@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > >>>>>>> In a custom Solr plugin code,
> > > >>>>>>> req.getCore().getCoreContainer().getCore(core_name) is returning
> > > >>>>>>> null even if core by name core_name is loaded and up in Solr. req
> > > >>>>>>> is an object of SolrQueryRequest class. I am using Solr 8.2.0 in
> > > >>>>>>> SolrCloud mode.
> > > >>>>>>>
> > > >>>>>>> Any ideas on why this might be the case?