@Andrea: Yeah, I would try to avoid getting that information from
System.getProperty. I am also looking for some class that will give this
information.
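
For reference, a minimal sketch of the system-property fallback Andrea
describes in the quoted message below. It assumes the connect string is
exposed under the property name zkHost (check the admin console for the
actual name on your nodes). ZkConnectString is a hypothetical helper, not
a Solr class, and the fallback value in main is made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not a Solr class): splits a ZooKeeper connect string. */
final class ZkConnectString {
    final List<String> hosts;  // e.g. ["zk1:2181", "zk2:2181"]
    final String chroot;       // e.g. "/solr", or "" when no chroot is given

    private ZkConnectString(List<String> hosts, String chroot) {
        this.hosts = Collections.unmodifiableList(hosts);
        this.chroot = chroot;
    }

    /** Parses "host1:port,host2:port/chroot"; the chroot part is optional. */
    static ZkConnectString parse(String connectString) {
        if (connectString == null || connectString.trim().isEmpty()) {
            throw new IllegalArgumentException("empty ZooKeeper connect string");
        }
        String value = connectString.trim();
        int slash = value.indexOf('/');
        String hostPart = slash < 0 ? value : value.substring(0, slash);
        String chroot = slash < 0 ? "" : value.substring(slash);
        List<String> hosts = new ArrayList<>();
        for (String host : hostPart.split(",")) {
            if (!host.trim().isEmpty()) {
                hosts.add(host.trim());
            }
        }
        return new ZkConnectString(hosts, chroot);
    }

    public static void main(String[] args) {
        // Assumption: the node was started with -DzkHost=...; the default is illustrative only.
        String raw = System.getProperty("zkHost", "localhost:2181/solr");
        ZkConnectString zk = parse(raw);
        System.out.println(zk.hosts + " chroot=" + zk.chroot);
    }
}
```

Splitting hosts from the chroot keeps the parsing in one place if the
property approach ends up being used; a Solr-provided class, if one turns
out to exist, would still be preferable.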

@Erick: Is there any way to get the current Solr endpoint or ZK ensemble
information from inside StatelessScriptUpdateProcessorFactory, so that I
can make that HTTP request?
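
A minimal sketch of the get-by-id request Erick suggests in the quoted
message below, assuming the base URL of some Solr node is already known
(how to discover it from inside the processor is the open question here).
It only builds the URL for the real-time get handler (/get with an id
parameter); RealTimeGetUrl and the example host are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a real-time get URL against a collection. */
final class RealTimeGetUrl {

    /**
     * @param solrBaseUrl base URL of any Solr node, e.g. "http://localhost:8983/solr" (assumed known)
     * @param collection  collection name, e.g. "collectionB" (not a core/replica name)
     * @param docId       value of the uniqueKey field
     */
    static String build(String solrBaseUrl, String collection, String docId) {
        // Drop a trailing slash so the pieces join with exactly one "/".
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return base + "/" + collection + "/get?id=" + encode(docId);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "collectionB", "doc1"));
    }
}
```

Because the request names the collection rather than a specific replica,
Solr decides which shard holds the document, so the caller does not need
to know a core name such as collection_shard1_replica_n21.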

On Thu, Aug 29, 2019 at 5:18 PM Andrea Gazzarini <a.gazzar...@sease.io>
wrote:

> I remember ZK coordinates (hosts, ports and root) are set as system
> properties in Solr nodes (please open the admin console and see their
> names). So, it would be just a matter of
>
> System.getProperty(ZK ensemble coordinates|root)
>
> Before going in that direction: I don't know/remember whether there is a
> Solr-specific ZK class that can be asked for them. If that class exists, it
> would be a better way; otherwise you can go with the system property approach.
>
> Andrea
>
> On Thu, 29 Aug 2019, 21:32 Arnold Bronley, <arnoldbron...@gmail.com>
> wrote:
>
> > @Andrea: I agree with you. Do you know if there is a way to initialize
> > CloudSolrClient directly from some information that I get from the
> > SolrQueryRequest or AddUpdateCommand object?
> >
> > @Erick: Thank you for the information about
> > StatelessScriptUpdateProcessorFactory.
> >
> > "In your situation, add this _before_ the update is distributed and
> > instead of coreB, ask for collectionB."
> >
> > Right, but how do I ask for collectionB?
> >
> > "Next, you want to get the value from “coreB”. Don’t do that, get it from
> > _collection_ B."
> >
> > Right, but how do I get the value from _collection_ B?
> >
> >
> >
> > On Thu, Aug 29, 2019 at 2:17 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > Have you looked at using one of the update processors?
> > >
> > > Consider StatelessScriptUpdateProcessorFactory for instance. You can do
> > > anything you’d like to do in a script (JavaScript, Groovy, Python I think,
> > > and others). See ./example/files/conf/update-script.js for one example.
> > >
> > > You put it in your solrconfig file in the update handler, then put the
> > > script in your conf directory and push it to ZK and the rest is
> > > automagical.
> > >
> > > There are a bunch of other update processors that you can use that are
> > > also pretty much by configuration, but the one I referenced is the one
> > > that is the most general-purpose.
> > >
> > > In your situation, add this _before_ the update is distributed and
> > > instead of coreB, ask for collectionB.
> > >
> > > Distributed updates go like this:
> > > 1. the doc gets routed to a leader for a shard
> > > 2. the doc gets forwarded to each replica.
> > >
> > > Now, depending on where you put the update processor (and you’ll have to
> > > dig a bit. Much of this distribution logic is implicit, but you can
> > > explicitly define it in solrconfig.xml), this either happens _before_ the
> > > docs are sent to the rest of the replicas or _after_ the docs arrive at
> > > each replica. From what you’ve described, you want to do this before
> > > distribution so all copies have the new field. You don’t care what replica
> > > is the leader. You don’t care how many other replicas exist or where they
> > > are. You don’t even care if there’s any replica hosting this particular
> > > collection on the node that does this, it happens before distribution.
> > >
> > > Next, you want to get the value from “coreB”. Don’t do that, get it from
> > > _collection_ B. Since you have the doc ID (presumably the <uniqueKey>),
> > > using get-by-id instead of a standard query will be very efficient. I can
> > > imagine under very heavy load this might introduce too much overhead, but
> > > it’s where I’d start.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Aug 29, 2019, at 1:45 PM, Arnold Bronley <arnoldbron...@gmail.com>
> > > > wrote:
> > > >
> > > > > I can't use CloudSolrClient because I need to intercept the incoming
> > > > > indexing request and then add one more field to it. All this happens on
> > > > > Solr side and not client side.
> > > >
> > > > > On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini <a.gazzar...@sease.io>
> > > > > wrote:
> > > >
> > > >> Hi Arnold,
> > > > >> why don't you use solrj (in this case a CloudSolrClient) instead of
> > > > >> dealing with such low-level details? The actual location of the document
> > > > >> you are looking for would be completely abstracted.
> > > >>
> > > >> Best,
> > > >> Andrea
> > > >>
> > > > >> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, <arnoldbron...@gmail.com>
> > > > >> wrote:
> > > >>
> > > > >>> So, here is the problem that I am trying to solve. I am moving from a
> > > > >>> Solr master-slave architecture to a SolrCloud architecture. I have one
> > > > >>> custom Solr plugin that does the following:
> > > > >>>
> > > > >>> 1. When a document (say, a document with unique id doc1) is getting
> > > > >>> indexed to a core, say core A, this plugin adds one more field to the
> > > > >>> indexing request. It fetches this new field from core B. Core B in our
> > > > >>> case maintains a popularity score field for each document, which gets
> > > > >>> calculated in a different project. It fetches the popularity score from
> > > > >>> core B for doc1 and adds it to the indexing request.
> > > > >>> 2. In the following code, dataInfo.dataSource is the name of core B.
> > > > >>>
> > > > >>> I can use the name of core B, like collection_shard1_replica_n21, and it
> > > > >>> works. But it is not a good solution. What if I had multiple shards for
> > > > >>> core B? In that case the doc1 that I am trying to find might not be
> > > > >>> present in collection_shard1_replica_n21.
> > > >>>
> > > >>> So is there something like,
> > > >>>
> > > > >>> SolrCollection dataCollection = getCollection(dataInfo.dataSource);
> > > >>>
> > > >>> @Override
> > > >>> public void processAdd(AddUpdateCommand cmd) throws IOException {
> > > >>>   SolrInputDocument doc = cmd.getSolrInputDocument();
> > > >>>   String uniqueId = getUniqueId(doc);
> > > >>>
> > > > >>>   SolrCore dataCore =
> > > > >>>       req.getCore().getCoreContainer().getCore(dataInfo.dataSource);
> > > > >>>
> > > > >>>   if (dataCore == null){
> > > > >>>       LOG.error("Solr core '{}' to use as data source could not be found!  "
> > > > >>>               + "Please check if it is loaded.", dataInfo.dataSource);
> > > > >>>   } else{
> > > > >>>
> > > > >>>          Document sourceDoc = getSourceDocument(dataCore, uniqueId);
> > > >>>
> > > >>>          if (sourceDoc != null){
> > > >>>
> > > >>>              populateDocToBeAddedFromSourceDoc(doc,sourceDoc);
> > > >>>          }
> > > >>>   }
> > > >>>
> > > >>>   // pass it up the chain
> > > >>>   super.processAdd(cmd);
> > > >>> }
> > > >>>
> > > >>>
> > > > >>> On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson <erickerick...@gmail.com>
> > > > >>> wrote:
> > > >>>
> > > > >>>> No, you cannot just use the collection name. Replicas are just cores.
> > > > >>>> You can host many replicas of a single collection on a single Solr node
> > > > >>>> in a single CoreContainer (there’s only one per Solr JVM). If you just
> > > > >>>> specified a collection name how would the code have any clue which
> > > > >>>> of the possibilities to return?
> > > >>>>
> > > >>>> The name is in the form collection_shard1_replica_n21
> > > >>>>
> > > > >>>> How do you know where the doc you’re working on lives? Put the ID
> > > > >>>> through the hashing mechanism.
> > > > >>>>
> > > > >>>> This isn’t the same at all if you’re running stand-alone; then there’s
> > > > >>>> only one name.
> > > > >>>>
> > > > >>>> But as I indicated above, your ask for just using the collection name
> > > > >>>> isn’t going to work by definition.
> > > > >>>>
> > > > >>>> So perhaps this is an XY problem. You’re asking about getCore, which is
> > > > >>>> a very specific, low-level concept. What are you trying to do at a
> > > > >>>> higher level? Why do you think you need to get a core? What do you want
> > > > >>>> to _do_ with the doc that you need the core it resides in?
> > > >>>>
> > > >>>> Best,
> > > >>>> Erick
> > > >>>>
> > > > >>>>> On Aug 28, 2019, at 5:28 PM, Arnold Bronley <arnoldbron...@gmail.com>
> > > > >>>>> wrote:
> > > >>>>>
> > > > >>>>> Wait, would I need to use a core name like
> > > > >>>>> collection1_shard1_replica_n4 etc.? Can't I use the collection name?
> > > > >>>>> What if I have multiple shards? How would I know where the document
> > > > >>>>> that I am working with currently lives?
> > > > >>>>> I would prefer to use the collection name and expect the core
> > > > >>>>> information to be abstracted out that way.
> > > >>>>>
> > > > >>>>> On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <erickerick...@gmail.com>
> > > > >>>>> wrote:
> > > >>>>>
> > > > >>>>>> Hmmm, should work. What is your core_name? There are strings like
> > > > >>>>>> collection1_shard1_replica_n4 and core_node6. Are you sure you’re
> > > > >>>>>> using the right one?
> > > >>>>>>
> > > > >>>>>>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley <arnoldbron...@gmail.com>
> > > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > > >>>>>>> In custom Solr plugin code,
> > > > >>>>>>> req.getCore().getCoreContainer().getCore(core_name) is returning null
> > > > >>>>>>> even though a core named core_name is loaded and up in Solr. req is an
> > > > >>>>>>> object of the SolrQueryRequest class. I am using Solr 8.2.0 in
> > > > >>>>>>> SolrCloud mode.
> > > >>>>>>>
> > > >>>>>>> Any ideas on why this might be the case?
> > > >>>>>>
> > > >>>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>
