Erick,

The 200 million docs are all large, since they are content indexed. It would also be hard to convince the customer to rebuild their index. But more than that, I want to firm up my understanding of this topic: is it expected behaviour in standalone mode for the distributed update processor to call only the run update processor and skip any further custom processors? Alternatively, is there a way to get a handle on the complete document once it has been reconstructed from an atomic update?
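For reference, here is a stripped-down sketch of what I have; the package and class names are simplified, and the schema check is the gist of my processAdd() logic:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// solrconfig.xml wiring I tried (processor placed between DUP and RUP):
//   <updateRequestProcessorChain name="removeUndefined">
//     <processor class="solr.LogUpdateProcessorFactory"/>
//     <processor class="solr.DistributedUpdateProcessorFactory"/>
//     <processor class="com.example.RemoveUndefinedFieldsProcessorFactory"/>
//     <processor class="solr.RunUpdateProcessorFactory"/>
//   </updateRequestProcessorChain>
public class RemoveUndefinedFieldsProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    // I see log lines from here, so the factory is clearly loaded.
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        // Never reached when the processor is placed after the
        // distributed update processor; this is the problem.
        IndexSchema schema = cmd.getReq().getSchema();
        SolrInputDocument doc = cmd.getSolrInputDocument();
        List<String> undefined = new ArrayList<>();
        for (String name : doc.getFieldNames()) {
          // getFieldOrNull() resolves explicit and dynamic fields; with the
          // "*" dynamic field removed, these names no longer match anything.
          if (schema.getFieldOrNull(name) == null) {
            undefined.add(name);
          }
        }
        undefined.forEach(doc::removeField);
        super.processAdd(cmd);
      }
    };
  }
}

If there is a supported hook that runs after the atomic update is stitched back together, I would be happy to move this logic there instead.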
Thanks,
Rahul

On Thu, Sep 19, 2019 at 7:06 AM Erick Erickson <erickerick...@gmail.com> wrote:

> _Why_ is reindexing not an option? 200M docs isn't that many.
> Since you have atomic updates working, you could easily
> write a little program that pulled the docs from your existing
> collection and pushed them to a new one with the new schema.
>
> Do use CursorMark if you try that.... You have to be ready to
> reindex as time passes, either to upgrade to a major version
> two greater than what you're using now or because the requirements
> change yet again.
>
> Best,
> Erick
>
> On Thu, Sep 19, 2019 at 12:36 AM Rahul Goswami <rahul196...@gmail.com> wrote:
> >
> > Erick, Markus,
> > Thank you for your inputs. I made sure that the jar file is found
> > correctly, since the core reloads fine and also prints the log lines
> > from my processor during an update request (from the getInstance()
> > method of the update factory). The reason I want to insert the
> > processor between the distributed update processor (DUP) and the run
> > update processor (RUP) is that certain fields were indexed against a
> > dynamic field "*", and the schema was later patched to remove the "*"
> > field, causing atomic updates to fail for such documents. Reindexing
> > is not an option since the index has nearly 200 million docs. My
> > understanding is that atomic updates are stitched back into a complete
> > document in the DUP before being indexed by the RUP. Hence, if I can
> > access the document before it is indexed and check for fields that are
> > not defined in the schema, I can remove them from the stitched-back
> > document so that the atomic update succeeds for such docs.
> > The documentation below mentions that even if I don't include the DUP
> > in my chain, it is automatically inserted just before the RUP.
> >
> > https://lucene.apache.org/solr/guide/7_2/update-request-processors.html#custom-update-request-processor-chain
> >
> > I tried both approaches, viz. explicitly specifying my processor after
> > the DUP in the chain, and using the "post-processor" option in the
> > chain to have the custom processor execute after the DUP. Either way,
> > the processor appears to be short-circuited. I have defined my logic
> > in the processAdd() of the processor. Is this expected behavior?
> >
> > Regards,
> > Rahul
> >
> >
> > On Wed, Sep 18, 2019 at 5:28 PM Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > It Depends (tm). This is a little confused. Why do you have the
> > > distributed processor in stand-alone Solr? Stand-alone doesn't, well,
> > > distribute updates, so that seems odd. Do try switching it around and
> > > putting your processor on top; this should be OK since distributed is
> > > irrelevant.
> > >
> > > You can also just set a breakpoint and step through; see, for
> > > instance, the instructions in the "IntelliJ" section here:
> > > https://cwiki.apache.org/confluence/display/solr/HowToContribute
> > >
> > > One thing I'd do is make very, very sure that my jar file was being
> > > found. IIRC, the -v startup option will log exactly where Solr looks
> > > for jar files. Be sure your custom jar is in one of them and is picked
> > > up. I've set a lib directive to one place only to discover that
> > > there's an old copy lying around someplace else....
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Sep 18, 2019 at 5:08 PM Markus Jelsma <markus.jel...@openindex.io> wrote:
> > > >
> > > > Hello Rahul,
> > > >
> > > > I don't know why you don't see your log lines, but if I remember
> > > > correctly, you must put all custom processors above Log, Distributed
> > > > and Run; at least I remember reading that somewhere a long time ago.
> > > >
> > > > We put all our custom processors on top of the three default
> > > > processors and they run just fine.
> > > >
> > > > Try it.
> > > >
> > > > Regards,
> > > > Markus
> > > >
> > > > -----Original message-----
> > > > > From: Rahul Goswami <rahul196...@gmail.com>
> > > > > Sent: Wednesday 18th September 2019 22:20
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Custom update processor not kicking in
> > > > >
> > > > > Hello,
> > > > >
> > > > > I am using Solr 7.2.1 in standalone mode. I created a custom update
> > > > > request processor and placed it between the distributed processor
> > > > > and the run update processor in my chain. I made sure the chain is
> > > > > invoked, since I see log lines from the getInstance() method of my
> > > > > processor factory, but I don't see any log lines from the
> > > > > processAdd() method.
> > > > >
> > > > > Any inputs on why the processor is getting skipped if placed after
> > > > > the distributed processor?
> > > > >
> > > > > Thanks,
> > > > > Rahul
> > > > >
> > > >
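A rough SolrJ sketch of the CursorMark pull-and-push reindex Erick suggests above; the core URLs, uniqueKey name ("id"), and batch size are illustrative:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class PullAndPushReindex {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient src = new HttpSolrClient.Builder("http://localhost:8983/solr/old_core").build();
         HttpSolrClient dst = new HttpSolrClient.Builder("http://localhost:8983/solr/new_core").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(1000);
      q.setSort("id", SolrQuery.ORDER.asc); // cursorMark requires a sort on the uniqueKey
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = src.query(q);
        for (SolrDocument d : rsp.getResults()) {
          SolrInputDocument in = new SolrInputDocument();
          for (String name : d.getFieldNames()) {
            // Skip internal fields; stored copyField targets may also need skipping.
            if (!"_version_".equals(name)) {
              in.addField(name, d.getFieldValue(name));
            }
          }
          dst.add(in);
        }
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) break; // cursor unchanged means no more documents
        cursor = next;
      }
      dst.commit();
    }
  }
}

Note this only works if all fields needed to rebuild the documents are stored (or have docValues); batching the adds rather than sending one document at a time would also speed things up considerably.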