Erick,
The 200 million docs are all large, since they are content indexed, and
it would also be hard to convince the customer to rebuild their index.
But more than that, I want to clear up my understanding of this topic:
is it expected behaviour for the distributed update processor to skip
all further custom processors and invoke only the run update processor
in standalone mode? Alternatively, is there a way to get a handle on the
complete document once it has been reconstructed from an atomic update?
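
For context, here is a simplified sketch of what my processAdd() is
meant to do (the factory is omitted; the class and variable names are
made up, and this assumes the stitched document is available on the
AddUpdateCommand at that point in the chain):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class RemoveUndefinedFieldsProcessor extends UpdateRequestProcessor {

  private final IndexSchema schema;

  public RemoveUndefinedFieldsProcessor(IndexSchema schema,
                                        UpdateRequestProcessor next) {
    super(next);
    this.schema = schema;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    // Collect fields that no longer resolve against the schema
    // (e.g., values that only matched the removed "*" dynamic field).
    List<String> undefined = new ArrayList<>();
    for (String name : doc.getFieldNames()) {
      if (schema.getFieldOrNull(name) == null) {
        undefined.add(name);
      }
    }
    // Drop them so the stitched-back document can be indexed.
    for (String name : undefined) {
      doc.removeField(name);
    }
    super.processAdd(cmd); // hand off to the next processor in the chain
  }
}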

Thanks,
Rahul

On Thu, Sep 19, 2019 at 7:06 AM Erick Erickson <erickerick...@gmail.com>
wrote:

> _Why_ is reindexing not an option? 200M docs isn't that many.
> Since you have atomic updates working, you could easily
> write a little program that pulls the docs from your existing
> collection and pushes them to a new one with the new schema.
>
> Do use CursorMark if you try that.... You have to be ready to
> reindex as time passes anyway, either to upgrade to a major version
> two greater than what you're using now or because the requirements
> change yet again.
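>
> Untested sketch of that pull-and-push loop with SolrJ (the URLs,
> core names, and batch size are placeholders; you'd probably also
> want to skip copyField targets, not just _version_):
>
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.solr.client.solrj.SolrQuery;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.response.QueryResponse;
> import org.apache.solr.common.SolrDocument;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.solr.common.params.CursorMarkParams;
>
> public class Reindex {
>   public static void main(String[] args) throws Exception {
>     try (HttpSolrClient src = new HttpSolrClient.Builder(
>              "http://localhost:8983/solr/old_core").build();
>          HttpSolrClient dst = new HttpSolrClient.Builder(
>              "http://localhost:8983/solr/new_core").build()) {
>       SolrQuery q = new SolrQuery("*:*");
>       q.setRows(1000);
>       // CursorMark requires a sort on the uniqueKey field.
>       q.setSort(SolrQuery.SortClause.asc("id"));
>       String cursor = CursorMarkParams.CURSOR_MARK_START;
>       while (true) {
>         q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
>         QueryResponse rsp = src.query(q);
>         List<SolrInputDocument> batch = new ArrayList<>();
>         for (SolrDocument d : rsp.getResults()) {
>           SolrInputDocument in = new SolrInputDocument();
>           for (String f : d.getFieldNames()) {
>             if (!"_version_".equals(f)) {  // don't copy internal fields
>               in.addField(f, d.getFieldValue(f));
>             }
>           }
>           batch.add(in);
>         }
>         if (!batch.isEmpty()) dst.add(batch);
>         String next = rsp.getNextCursorMark();
>         if (cursor.equals(next)) break;    // cursor stopped moving: done
>         cursor = next;
>       }
>       dst.commit();
>     }
>   }
> }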
>
> Best,
> Erick
>
> On Thu, Sep 19, 2019 at 12:36 AM Rahul Goswami <rahul196...@gmail.com>
> wrote:
> >
> > Erick, Markus,
> > Thank you for your inputs. I made sure that the jar file is found
> > correctly, since the core reloads fine and also prints the log lines
> > from my processor during an update request (from the getInstance()
> > method of the update factory). The reason I want to insert the
> > processor between the distributed update processor (DUP) and the run
> > update processor (RUP) is that certain fields were indexed against a
> > dynamic field “*”, and the schema was later patched to remove the *
> > field, causing atomic updates to fail for such documents. Reindexing
> > is not an option since the index has nearly 200 million docs. My
> > understanding is that an atomic update is stitched back into a
> > complete document in the DUP before being indexed by the RUP. Hence,
> > if I can access the document before it is indexed and check for
> > fields which are not defined in the schema, I can remove them from
> > the stitched-back document so that the atomic update can succeed for
> > such docs. The documentation below mentions that even if I don’t
> > include the DUP in my chain, it is automatically inserted just
> > before the RUP.
> >
> > https://lucene.apache.org/solr/guide/7_2/update-request-processors.html#custom-update-request-processor-chain
> >
> >
> > I tried both approaches, viz. explicitly specifying my processor
> > after the DUP in the chain, and using the “post-processor” request
> > option to have the custom processor execute after the DUP. Either
> > way, the processor still looks like it is simply short-circuited. I
> > have defined my logic in the processAdd() of the processor. Is this
> > expected behavior?
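> >
> > For reference, the two variants I tried look roughly like this (the
> > factory class name is mine):
> >
> > <!-- variant 1: explicitly placed between DUP and RUP in the chain -->
> > <updateRequestProcessorChain name="mychain">
> >   <processor class="solr.LogUpdateProcessorFactory"/>
> >   <processor class="solr.DistributedUpdateProcessorFactory"/>
> >   <processor class="com.example.RemoveUndefinedFieldsProcessorFactory"/>
> >   <processor class="solr.RunUpdateProcessorFactory"/>
> > </updateRequestProcessorChain>
> >
> > <!-- variant 2: a named processor referenced from the request, e.g.
> >      /update?update.chain=mychain&post-processor=cleanup -->
> > <updateProcessor class="com.example.RemoveUndefinedFieldsProcessorFactory"
> >                  name="cleanup"/>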
> >
> > Regards,
> > Rahul
> >
> >
> > On Wed, Sep 18, 2019 at 5:28 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > It Depends (tm). This is a little confused. Why do you have the
> > > distributed processor in stand-alone Solr? Stand-alone doesn't,
> > > well, distribute updates, so that seems odd. Do try switching it
> > > around and putting it on top; this should be OK since distributed
> > > is irrelevant.
> > >
> > > You can also just set a breakpoint and step through the code; see,
> > > for instance, the instructions in the "IntelliJ" section here:
> > > https://cwiki.apache.org/confluence/display/solr/HowToContribute
> > >
> > > One thing I'd do is make very, very sure that my jar file is being
> > > found. IIRC, the -v startup option will log exactly where Solr
> > > looks for jar files. Be sure your custom jar is in one of those
> > > locations and is picked up. I've set a lib directive to one place
> > > only to discover that there's an old copy lying around someplace
> > > else....
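> > >
> > > For reference, a lib directive in solrconfig.xml looks something
> > > like this (the path here is just an example):
> > >
> > > <lib dir="/opt/solr/custom-plugins" regex=".*\.jar"/>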
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Sep 18, 2019 at 5:08 PM Markus Jelsma
> > > <markus.jel...@openindex.io> wrote:
> > > >
> > > > Hello Rahul,
> > > >
> > > > I don't know why you don't see your log lines, but if I remember
> > > > correctly, you must put all custom processors above Log,
> > > > Distributed and Run; at least I remember reading that somewhere a
> > > > long time ago.
> > > >
> > > > We put all our custom processors on top of the three default
> > > > processors and they run just fine.
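> > > >
> > > > For example (the custom factory class name here is made up):
> > > >
> > > > <updateRequestProcessorChain name="custom" default="true">
> > > >   <processor class="com.example.MyCustomProcessorFactory"/>
> > > >   <processor class="solr.LogUpdateProcessorFactory"/>
> > > >   <processor class="solr.DistributedUpdateProcessorFactory"/>
> > > >   <processor class="solr.RunUpdateProcessorFactory"/>
> > > > </updateRequestProcessorChain>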
> > > >
> > > > Try it.
> > > >
> > > > Regards,
> > > > Markus
> > > >
> > > > -----Original message-----
> > > > > From:Rahul Goswami <rahul196...@gmail.com>
> > > > > Sent: Wednesday 18th September 2019 22:20
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Custom update processor not kicking in
> > > > >
> > > > > Hello,
> > > > >
> > > > > I am using Solr 7.2.1 in standalone mode. I created a custom
> > > > > update request processor and placed it between the distributed
> > > > > update processor and the run update processor in my chain. I
> > > > > made sure the chain is invoked, since I see log lines from the
> > > > > getInstance() method of my processor factory, but I don’t see
> > > > > any log lines from the processAdd() method.
> > > > >
> > > > > Any inputs on why the processor is getting skipped if placed
> > > > > after the distributed processor?
> > > > >
> > > > > Thanks,
> > > > > Rahul
> > > > >
> > >
>
