_Why_ is reindexing not an option? 200M docs isn't that many.
Since you have atomic updates working, you could easily
write a little program that pulled the docs from your existing
collection and pushed them to a new one with the new schema.

Do use cursorMark if you try that. You have to be ready to
reindex as time passes anyway, either to upgrade to a major version
two greater than the one you're using now or because the requirements
change yet again.
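
A minimal sketch of that copy loop, in plain Java so it runs without Solr on the classpath. Page, PagedSource and ListSource are hypothetical stand-ins, not SolrJ classes: in a real program fetch() would query the old collection with the cursorMark parameter and a sort on the uniqueKey field, and the sink would add each document to the new collection. The loop shape is the part that carries over: start from cursorMark "*" and stop when the cursor you get back equals the one you sent. As with atomic updates, this assumes every field can be read back from the index (stored or docValues).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in (not a SolrJ class): one page of results plus the
// cursor to send on the next request.
final class Page {
    final List<Map<String, Object>> docs;
    final String nextCursorMark;

    Page(List<Map<String, Object>> docs, String nextCursorMark) {
        this.docs = docs;
        this.nextCursorMark = nextCursorMark;
    }
}

// Hypothetical: "query the old collection, sorted on the uniqueKey, with this cursorMark".
interface PagedSource {
    Page fetch(String cursorMark, int rows);
}

final class Reindexer {
    // Solr's initial cursorMark value.
    static final String CURSOR_START = "*";

    // Pulls every document from source and hands it to sink; returns how many were copied.
    static long copyAll(PagedSource source, Consumer<Map<String, Object>> sink, int rows) {
        String cursor = CURSOR_START;
        long copied = 0;
        while (true) {
            Page page = source.fetch(cursor, rows);
            for (Map<String, Object> doc : page.docs) {
                sink.accept(doc);
                copied++;
            }
            // The end of the result set is signalled by getting back the same cursor we sent.
            if (page.nextCursorMark.equals(cursor)) {
                return copied;
            }
            cursor = page.nextCursorMark;
        }
    }
}

// In-memory test double standing in for the old collection; its "cursor" is just an offset.
final class ListSource implements PagedSource {
    private final List<Map<String, Object>> all;

    ListSource(List<Map<String, Object>> all) {
        this.all = all;
    }

    @Override
    public Page fetch(String cursorMark, int rows) {
        int start = cursorMark.equals(Reindexer.CURSOR_START) ? 0 : Integer.parseInt(cursorMark);
        int end = Math.min(start + rows, all.size());
        return new Page(new ArrayList<>(all.subList(start, end)), String.valueOf(end));
    }
}
```

At 200M docs you would also want to batch the adds to the new collection rather than send one document per request.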

Best,
Erick

On Thu, Sep 19, 2019 at 12:36 AM Rahul Goswami <rahul196...@gmail.com> wrote:
>
> Erick, Markus,
> Thank you for your inputs. I made sure that the jar file is found correctly,
> since the core reloads fine and also prints the log lines from my processor
> during the update request (from the getInstance() method of the update
> processor factory). The reason I want to insert the processor between the
> distributed update processor (DUP) and the run update processor (RUP) is that
> certain fields were indexed against a dynamic field “*”, and the schema was
> later patched to remove the * field, causing atomic updates to fail for such
> documents. Reindexing is not an option since the index has nearly 200 million
> docs. My understanding is that the atomic updates are stitched back into a
> complete document in the DUP before being indexed by the RUP. Hence, if I am
> able to access the document before it is indexed and check for fields that
> are not defined in the schema, I can remove them from the stitched-back
> document so that the atomic update can succeed for such docs.
> The documentation below mentions that even if I don’t include the DUP in my
> chain, it is automatically inserted just before the RUP.
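
A minimal sketch of just the stripping step, assuming the stitched-back document can be treated as a map of field name to value. UndefinedFieldStripper and its pattern matching are hypothetical simplifications of the schema; inside a real processor you would more likely ask the live schema, e.g. req.getSchema().getFieldOrNull(name), which as far as I know also resolves dynamic fields, and drop fields from the SolrInputDocument with removeField(). It covers only the stripping itself, not where in the chain the stitched document becomes visible.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the schema: explicit field names plus
// dynamic-field patterns (a leading or trailing '*', or a bare "*").
final class UndefinedFieldStripper {
    private final Set<String> explicitFields;
    private final List<String> dynamicPatterns;

    UndefinedFieldStripper(Set<String> explicitFields, List<String> dynamicPatterns) {
        this.explicitFields = explicitFields;
        this.dynamicPatterns = dynamicPatterns;
    }

    // Dynamic-field style match: "*" matches anything, "*_s" is a suffix, "attr_*" a prefix.
    static boolean matches(String pattern, String name) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }

    boolean isDefined(String name) {
        if (explicitFields.contains(name)) {
            return true;
        }
        for (String pattern : dynamicPatterns) {
            if (matches(pattern, name)) {
                return true;
            }
        }
        return false;
    }

    // Removes fields the schema no longer defines, in place; returns their names for logging.
    List<String> strip(Map<String, Object> doc) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Object>> it = doc.entrySet().iterator();
        while (it.hasNext()) {
            String name = it.next().getKey();
            if (!isDefined(name)) {
                removed.add(name);
                it.remove();
            }
        }
        return removed;
    }
}
```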
>
> https://lucene.apache.org/solr/guide/7_2/update-request-processors.html#custom-update-request-processor-chain
>
>
> I tried both approaches, viz. explicitly specifying my processor after the
> DUP in the chain and using the “post-processor” option in the chain, to have
> the custom processor execute after the DUP. It still looks like the
> processor is just short-circuited. I have defined my logic in the
> processAdd() method of the processor. Is this expected behavior?
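
In Solr each processor's processAdd() is responsible for handing the command on; the base UpdateRequestProcessor.processAdd() simply calls next.processAdd(). So a processor only sees an add if every processor ahead of it forwards the call, even though its factory's getInstance() typically still runs when the chain is built. The model below uses hypothetical ChainLink classes, not Solr's API, to show how that produces the symptom described: a link that throws or returns before forwarding hides everything after it. It is only a model and does not show that the DUP is what stops the call in this case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of an update processor chain (not Solr's classes).
abstract class ChainLink {
    protected final ChainLink next;

    ChainLink(ChainLink next) {
        this.next = next;
    }

    // Mirrors the default in Solr's UpdateRequestProcessor: just forward to the next link.
    void processAdd(String docId, List<String> trace) {
        if (next != null) {
            next.processAdd(docId, trace);
        }
    }
}

// Records that it saw the add, then forwards it.
final class TracingLink extends ChainLink {
    private final String name;

    TracingLink(String name, ChainLink next) {
        super(next);
        this.name = name;
    }

    @Override
    void processAdd(String docId, List<String> trace) {
        trace.add(name);
        super.processAdd(docId, trace); // drop this call and every later link is skipped
    }
}

// Stands in for any earlier link that throws before forwarding.
final class FailingLink extends ChainLink {
    FailingLink(ChainLink next) {
        super(next);
    }

    @Override
    void processAdd(String docId, List<String> trace) {
        trace.add("failing");
        throw new IllegalStateException("rejected " + docId + " before forwarding");
    }
}

final class ChainDemo {
    // Sends one add through the chain; returns the links that saw it, plus any error.
    static List<String> run(ChainLink head, String docId) {
        List<String> trace = new ArrayList<>();
        try {
            head.processAdd(docId, trace);
        } catch (IllegalStateException e) {
            trace.add("error: " + e.getMessage());
        }
        return trace;
    }
}
```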
>
> Regards,
> Rahul
>
>
> On Wed, Sep 18, 2019 at 5:28 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > It Depends (tm). This is a little confused. Why do you have the
> > distributed processor in stand-alone Solr? Stand-alone doesn't, well,
> > distribute updates, so that seems odd. Do try switching it around and
> > putting your processor on top; this should be OK since distributed is irrelevant.
> >
> > You can also just set a breakpoint and step through it; see, for instance,
> > the instructions in the "IntelliJ" section here:
> > https://cwiki.apache.org/confluence/display/solr/HowToContribute
> >
> > One thing I'd do is make very, very sure that my jar file was being
> > found. IIRC, the -v startup option will log exactly where solr looks
> > for jar files. Be sure your custom jar is in one of them and is picked
> > up. I've set a lib directive to one place only to discover that
> > there's an old copy lying around someplace else....
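
A complementary check from inside the plugin itself: log the location the JVM actually loaded your factory class from, for example in the factory's init(). JarLocator is a hypothetical helper built only on the JDK's getProtectionDomain().getCodeSource(); a stale copy elsewhere on the classpath would show up as an unexpected path.

```java
import java.security.CodeSource;

final class JarLocator {
    // Returns the jar or directory URL the JVM loaded cls from, or a fallback when
    // no code source is recorded (for example, JDK bootstrap classes).
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<no code source>";
        }
        return source.getLocation().toString();
    }
}
```

Usage, with a hypothetical SLF4J logger in your factory: log.info("factory loaded from {}", JarLocator.locationOf(getClass()));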
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 18, 2019 at 5:08 PM Markus Jelsma
> > <markus.jel...@openindex.io> wrote:
> > >
> > > Hello Rahul,
> > >
> > > I don't know why you don't see your log lines, but if I remember
> > > correctly, you must put all custom processors above Log, Distributed and
> > > Run; at least, I remember reading that somewhere a long time ago.
> > >
> > > We put all our custom processors on top of the three default processors
> > > and they run just fine.
> > >
> > > Try it.
> > >
> > > Regards,
> > > Markus
> > >
> > > -----Original message-----
> > > > From: Rahul Goswami <rahul196...@gmail.com>
> > > > Sent: Wednesday 18th September 2019 22:20
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Custom update processor not kicking in
> > > >
> > > > Hello,
> > > >
> > > > I am using Solr 7.2.1 in standalone mode. I created a custom update
> > > > request processor and placed it between the distributed processor and
> > > > the run update processor in my chain. I made sure the chain is invoked,
> > > > since I see log lines from the getInstance() method of my processor
> > > > factory. But I don’t see any log lines from the processAdd() method.
> > > >
> > > > Any inputs on why the processor is getting skipped if placed after
> > > > the distributed processor?
> > > >
> > > > Thanks,
> > > > Rahul
> > > >
> >
