Just like any other SolrCloud request. The simplest case is to fire an HTTP
request from the update processor, just like you would from a browser.
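
A rough sketch of what that HTTP call could look like, in case it helps. It
is addressed to the collection (not a core) and uses the real-time get
handler (/get) so the lookup is by id. The class and method names
(CollectionLookup, getByIdUri, fetchById) are made up for illustration, and
the base URL is an assumption; your processor would need to get it from its
own configuration. Not tested against a live cluster.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Solr): fetches one document by id over HTTP. */
final class CollectionLookup {

    // One shared client; creating it does not open any connection.
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    private CollectionLookup() {}

    /**
     * Builds a real-time get URI addressed to a collection.
     * solrBaseUrl is assumed to look like "http://localhost:8983/solr".
     */
    static URI getByIdUri(String solrBaseUrl, String collection, String id) {
        // Drop a trailing slash so the path does not end up with "//".
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        // Only the id is encoded; the collection name is assumed to be URL-safe.
        String encodedId = URLEncoder.encode(id, StandardCharsets.UTF_8);
        return URI.create(base + "/" + collection + "/get?id=" + encodedId + "&wt=json");
    }

    /** Sends the request and returns the raw JSON body; parsing is left to the caller. */
    static String fetchById(String solrBaseUrl, String collection, String id)
            throws IOException, InterruptedException {
        HttpRequest request =
                HttpRequest.newBuilder(getByIdUri(solrBaseUrl, collection, id)).GET().build();
        HttpResponse<String> response =
                HTTP.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Lookup failed with HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

From processAdd you would call something like
CollectionLookup.fetchById(baseUrl, "collectionB", uniqueId) and pull the
popularity field out of the returned JSON. Since the request names the
collection, SolrCloud takes care of finding the shard that holds the doc.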



> On Aug 29, 2019, at 3:31 PM, Arnold Bronley <arnoldbron...@gmail.com> wrote:
> 
> @Andrea: I agree with you. Do you know if there is a way to initialize
> CloudSolrClient directly from some information that I get
> from SolrQueryRequest or from the AddUpdateCommand object?
> 
> @Erick: Thank you for the information about
> StatelessScriptUpdateProcessorFactory.
> 
> "In your situation, add this _before_ the update is distributed and
> instead of coreB, ask for collectionB."
> 
> Right, but how do I ask for collectionB?
> 
> "Next, you want to get the value from “coreB”. Don’t do that, get it from
> _collection_ B."
> 
> Right, but how do I get the value from _collection_ B?
> 
> 
> 
> On Thu, Aug 29, 2019 at 2:17 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> Have you looked at using one of the update processors?
>> 
>> Consider StatelessScriptUpdateProcessorFactory for instance. You can do
>> anything you’d like to do in a script (JavaScript, Groovy, Python I think,
>> and others). See ./example/files/conf/update-script.js for one example.
>> 
>> You configure it in your solrconfig file as part of an update processor
>> chain, then put the script in your conf directory and push it to ZK, and
>> the rest is automagical.
>> 
>> There are a bunch of other update processors that you can use that are
>> also pretty much by configuration, but the one I referenced is the one
>> that is the most general-purpose.
>> 
>> In your situation, add this _before_ the update is distributed and
>> instead of coreB, ask for collectionB.
>> 
>> Distributed updates go like this:
>> 1. the doc gets routed to a leader for a shard
>> 2. the doc gets forwarded to each replica.
>> 
>> Now, depending on where you put the update processor (you’ll have to dig
>> a bit: much of this distribution logic is implicit, but you can define it
>> explicitly in solrconfig.xml), it runs either _before_ the docs are sent
>> to the rest of the replicas or _after_ the docs arrive at each replica.
>> From what you’ve described, you want to do this before distribution so
>> all copies have the new field. You don’t care which replica is the leader.
>> You don’t care how many other replicas exist or where they are. You don’t
>> even care if there’s any replica hosting this particular collection on the
>> node that does this; it happens before distribution.
>> 
>> Next, you want to get the value from “coreB”. Don’t do that, get it from
>> _collection_ B. Since you have the doc ID (presumably the <uniqueKey>),
>> using get-by-id instead of a standard query will be very efficient. I can
>> imagine under very heavy load this might introduce too much overhead, but
>> it’s where I’d start.
>> 
>> Best,
>> Erick
>> 
>>> On Aug 29, 2019, at 1:45 PM, Arnold Bronley <arnoldbron...@gmail.com>
>> wrote:
>>> 
>>> I can't use CloudSolrClient because I need to intercept the incoming
>>> indexing request and then add one more field to it. All this happens on
>>> Solr side and not client side.
>>> 
>>> On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini <a.gazzar...@sease.io>
>>> wrote:
>>> 
>>>> Hi Arnold,
>>>> why don't you use SolrJ (in this case a CloudSolrClient) instead of
>>>> dealing with such low-level details? The actual location of the document
>>>> you are looking for would be completely abstracted.
>>>> 
>>>> Best,
>>>> Andrea
>>>> 
>>>> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, <arnoldbron...@gmail.com>
>>>> wrote:
>>>> 
>>>>> So, here is the problem that I am trying to solve. I am moving from a
>>>>> Solr master-slave architecture to a SolrCloud architecture. I have one
>>>>> custom Solr plugin that does the following:
>>>>>
>>>>> 1. When a document (say a document with unique id doc1) is being indexed
>>>>> to a core, say core A, this plugin adds one more field to the indexing
>>>>> request. It fetches this new field from core B. Core B in our case
>>>>> maintains a popularity score field for each document, which gets
>>>>> calculated in a different project. The plugin fetches the popularity
>>>>> score from core B for doc1 and adds it to the indexing request.
>>>>> 2. In the following code, dataInfo.dataSource is the name of core B.
>>>>> 
>>>>> I can use the name of core B, like collection_shard1_replica_n21, and
>>>>> it works. But it is not a good solution. What if I had multiple shards
>>>>> for core B? In that case the doc1 that I am trying to find might not be
>>>>> present in collection_shard1_replica_n21.
>>>>> 
>>>>> So is there something like,
>>>>> 
>>>>> SolrCollection dataCollection = getCollection(dataInfo.dataSource);
>>>>> 
>>>>> @Override
>>>>> public void processAdd(AddUpdateCommand cmd) throws IOException {
>>>>>   SolrInputDocument doc = cmd.getSolrInputDocument();
>>>>>   String uniqueId = getUniqueId(doc);
>>>>>
>>>>>   // getCore() increments the core's reference count, so the core must be
>>>>>   // closed when done; try-with-resources skips close() for a null core.
>>>>>   try (SolrCore dataCore =
>>>>>       req.getCore().getCoreContainer().getCore(dataInfo.dataSource)) {
>>>>>     if (dataCore == null) {
>>>>>       LOG.error("Solr core '{}' to use as data source could not be found! "
>>>>>           + "Please check if it is loaded.", dataInfo.dataSource);
>>>>>     } else {
>>>>>       Document sourceDoc = getSourceDocument(dataCore, uniqueId);
>>>>>       if (sourceDoc != null) {
>>>>>         populateDocToBeAddedFromSourceDoc(doc, sourceDoc);
>>>>>       }
>>>>>     }
>>>>>   }
>>>>>
>>>>>   // pass it up the chain
>>>>>   super.processAdd(cmd);
>>>>> }
>>>>> 
>>>>> 
>>>>> On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson <
>> erickerick...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> No, you cannot just use the collection name. Replicas are just cores.
>>>>>> You can host many replicas of a single collection on a single Solr node
>>>>>> in a single CoreContainer (there’s only one per Solr JVM). If you just
>>>>>> specified a collection name, how would the code have any clue which
>>>>>> of the possibilities to return?
>>>>>> 
>>>>>> The name is in the form collection_shard1_replica_n21
>>>>>> 
>>>>>> How do you know where the doc you’re working on is? Put the ID through
>>>>>> the hashing mechanism.
>>>>>> 
>>>>>> This isn’t the same at all if you’re running stand-alone; then there’s
>>>>>> only one name.
>>>>>> 
>>>>>> But as I indicated above, your ask for just using the collection name
>>>>>> isn’t going to work by definition.
>>>>>> 
>>>>>> So perhaps this is an XY problem. You’re asking about getCore, which is
>>>>>> a very specific, low-level concept. What are you trying to do at a
>>>>>> higher level? Why do you think you need to get a core? What do you want
>>>>>> to _do_ with the doc that you need the core it resides in?
>>>>>> 
>>>>>> Best,
>>>>>> Erick
>>>>>> 
>>>>>>> On Aug 28, 2019, at 5:28 PM, Arnold Bronley <arnoldbron...@gmail.com
>>>>> 
>>>>>> wrote:
>>>>>>> 
>>>>>>> Wait, would I need to use a core name like
>>>>>>> collection1_shard1_replica_n4 etc.? Can't I use the collection name?
>>>>>>> What if I have multiple shards? How would I know where the document
>>>>>>> that I am working with currently lives?
>>>>>>> I would prefer to use the collection name and expect the core
>>>>>>> information to be abstracted away that way.
>>>>>>> 
>>>>>>> On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <
>>>>> erickerick...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hmmm, should work. What is your core_name? There are strings like
>>>>>>>> collection1_shard1_replica_n4 and core_node6. Are you sure you’re
>>>>>>>> using the right one?
>>>>>>>> 
>>>>>>>>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley <
>>>> arnoldbron...@gmail.com
>>>>>> 
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> In custom Solr plugin code,
>>>>>>>>> req.getCore().getCoreContainer().getCore(core_name) is returning
>>>>>>>>> null even if a core named core_name is loaded and up in Solr. req is
>>>>>>>>> an object of the SolrQueryRequest class. I am using Solr 8.2.0 in
>>>>>>>>> SolrCloud mode.
>>>>>>>>> 
>>>>>>>>> Any ideas on why this might be the case?
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>> 
