Re: Reindexing using dataimporthandler

Bjarke Buur Mortensen Mon, 27 Apr 2020 06:16:43 -0700

Wow, thanks, Erick. That's actually much better :-)
You live and you learn.
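
For anyone reading along later, here is a minimal sketch of what kicking off
REINDEXCOLLECTION could look like from Python. It only builds the Collections
API URL and does not send anything. The helper name reindex_url is made up for
illustration, the base URL is the localhost one from my earlier mail, and the
optional target parameter should be checked against the reference guide for
your Solr version before relying on it.

```python
from urllib.parse import urlencode


def reindex_url(solr_base, collection, target=None):
    """Build a Collections API URL that requests REINDEXCOLLECTION.

    solr_base  -- e.g. "http://localhost:8983/solr" (assumed local install)
    collection -- name of the source collection to reindex
    target     -- optional name for the new collection (assumption: left
                  unset, Solr picks a name and aliases the source name to it)
    """
    params = {"action": "REINDEXCOLLECTION", "name": collection}
    if target is not None:
        params["target"] = target
    return f"{solr_base}/admin/collections?{urlencode(params)}"


# Print the URL instead of calling it, so nothing is changed by accident.
print(reindex_url("http://localhost:8983/solr", "mycollection"))
```

Issuing a GET against the printed URL (with curl, for instance) is what would
actually start the reindex, so try it on a test collection first.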


Cheers,
Bjarke

On Mon, 27 Apr 2020 at 15:00, Erick Erickson <erickerick...@gmail.com> wrote:

> What about the Collections API REINDEXCOLLECTION? It has the
> advantage of being officially supported, it puts the source
> collection into read-only mode, it uses a much more efficient query
> process (streaming, actually), etc.
>
> It has the disadvantage of producing a new collection under the
> covers and aliasing to it. But you can always rename the collection
> later.
>
> Best,
> Erick
>
> > On Apr 27, 2020, at 8:23 AM, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> >
> > Thanks for the reply.
> > I'm on Solr 8.2, so cursorMark is there.
> >
> > Doing this from one collection to another and then using a
> > collection alias is probably the way to go, but actually my suggestion
> > was a little bolder:
> >
> > I'm indexing on top of the same core, i.e. from
> > http://localhost:8983/solr/mycollection to
> > http://localhost:8983/solr/mycollection
> >
> > (This is why I suggested adding a version:[* TO <current_highest_version>]
> > to ensure it terminates for large imports.)
> >
> > With this in mind, do you still think this is a safe approach?
> >
> > Thanks,
> > Bjarke
> >
> >
> > On Mon, 27 Apr 2020 at 13:46, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Bjarke,
> >> I don’t see a problem with that approach if you have enough resources to
> >> handle both cores at the same time, especially if you are doing that while
> >> serving production queries. The only issue is that if you plan to do that,
> >> you have to have all fields stored. Also note that cursorMark support
> >> was added to the entity processor a bit later, so if you are running a
> >> somewhat older version of Solr, you might not have cursors. I found that
> >> out the hard way.
> >>
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 27 Apr 2020, at 13:11, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> >>>
> >>> Hi list,
> >>>
> >>> Let's say I add a copyField to my Solr schema, change the analysis chain
> >>> of a field, or make some other change.
> >>> It seems to me an alluring choice to use a very simple
> >>> dataimporthandler to reindex all documents, by using a SolrEntityProcessor
> >>> that points to itself. I have just done this for a very small collection,
> >>> but I was wondering what the caveats are, since this is not the recommended
> >>> practice. What can go wrong using this approach?
> >>>
> >>> <document>
> >>>   <entity name="all_from_self" processor="SolrEntityProcessor"
> >>>           url="http://localhost:8983/solr/mycollection" qt="lucene"
> >>>           query="*:*" wt="javabin" rows="1000" cursorMark="true"
> >>>           sort="id asc" fl="*,orig_version_l:_version_"/>
> >>> </document>
> >>>
> >>> PS: (It is probably necessary to add a version:[* TO
> >>> <current_highest_version>] to ensure it terminates for large imports)
> >>> PPS: (Obviously you shouldn't add the clean parameter)
> >>>
> >>> /Bjarke
> >>
> >>
>
>
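
A footnote on the termination guard from the original post (the
version:[* TO <current_highest_version>] filter): a rough sketch of how the
bound could be derived. It assumes the field meant is Solr's _version_ and
that your schema allows sorting on it. The idea, as I understand it, is that
re-indexed documents get a newer _version_ and therefore fall outside the
range. The helper names and the example version number are made up for
illustration.

```python
from urllib.parse import urlencode


def highest_version_params():
    """Parameters for a select request returning the largest _version_.

    Assumption: _version_ is sortable in the schema at hand.
    """
    return {"q": "*:*", "sort": "_version_ desc", "rows": 1, "fl": "_version_"}


def bounded_query(highest_version):
    """Range query matching only documents that existed before the import."""
    return f"_version_:[* TO {highest_version}]"


# Step 1: query string to append to .../solr/mycollection/select?
print(urlencode(highest_version_params()))

# Step 2: value for the entity's query attribute (example number only).
print(bounded_query(1665123456789012345))
```

The string from step 2 would replace query="*:*" in the entity definition.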
