Did you read the long explanation of segment merging earlier in this thread? If so, can you ask specific questions about the information in it?

Best,
Erick

> On Oct 17, 2020, at 8:23 AM, Vinay Rajput <vinayrajput4...@gmail.com> wrote:
>
> Sorry to jump into this discussion. I also get confused whenever I see this
> strange Solr/Lucene behaviour. Probably, as @Erick said in his talk last
> year, this is how it has been designed, to avoid many problems that are
> hard or impossible to solve.
>
> That said, I want to come back to the same question one more time: why
> can't Solr/Lucene handle this when we are updating all the documents?
> Let's take a couple of examples:
>
> *Ex 1:*
> Let's say I have only 10 documents in my index and all of them are in a
> single segment (Segment 1). Now I change the schema (update a field type in
> this case) and reindex all of them.
> This is what (according to me) should happen internally:
>
> 1st update req: Solr will mark the 1st doc as deleted and index it again
> (might run the analyser chain based on config)
> 2nd update req: Solr will mark the 2nd doc as deleted and index it again
> (might run the analyser chain based on config)
> And so on...
> Based on the autoSoftCommit/autoCommit configuration, all new documents will
> be indexed and probably flushed to disk as part of a new segment (Segment 2).
>
> Now, whenever segment merging happens (during commit or later in time),
> Lucene will create a new segment (Segment 3) and can discard all the docs
> present in Segment 1, as there are no live docs in it. And there would *NOT*
> be any situation where it has to choose between the old config and the new
> config, as there is not even a single live document with the old config.
> Isn't that so?
>
> *Ex 2:*
> I see that it can be an issue if we think about reindexing millions of
> docs. In that case, merging can be triggered when indexing is half
> way through, and since there are some live docs in the old segment (with
> the old config), things will blow up. Please correct me if I am wrong.
>
> I am *NOT* a Solr/Lucene expert and have just started learning how things
> work internally.
> In the above example, I may be wrong in many
> places. Can someone confirm whether scenarios like Ex 2 are the reason
> that even re-indexing all documents doesn't help when some
> incompatible schema changes are made? Any other insight would also be
> helpful.
>
> Thanks,
> Vinay
>
> On Sat, Oct 17, 2020 at 5:48 AM Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 10/16/2020 2:36 PM, David Hastings wrote:
>>> sorry, i was thinking just using the
>>> <delete><query>*:*</query></delete>
>>> method for clearing the index would leave them still
>>
>> In theory, if you delete all documents at the Solr level, Lucene will
>> delete all the segment files on the next commit, because they are empty.
>> I have not confirmed with testing whether this actually happens.
>>
>> It is far safer to use a new index as Erick has said, or to delete the
>> index directories completely and restart Solr ... so you KNOW the index
>> has nothing in it.
>>
>> Thanks,
>> Shawn
>>
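
For readers following the thread, the two scenarios in Vinay's message (Ex 1 and Ex 2) can be sketched with a toy model. It assumes only the behaviour described in the thread: an update marks the old copy of a document as deleted and indexes a new copy, and a merge copies live documents into a new segment. `ToyIndex`, `Doc`, `Segment`, `schemas_in`, and `reindex` are hypothetical names made up for illustration, not Lucene or Solr APIs, and the model says nothing about how real Lucene reacts to mixed field configurations.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: int
    schema: str        # schema version that indexed this copy: "old" or "new"
    live: bool = True  # set to False when the copy is marked deleted


@dataclass
class Segment:
    name: str
    docs: list = field(default_factory=list)

    def live_docs(self):
        return [d for d in self.docs if d.live]


class ToyIndex:
    """Hypothetical toy model of segments; NOT Lucene's real data structures."""

    def __init__(self):
        self.segments = []
        self.buffer = []   # docs indexed but not yet flushed to a segment
        self.counter = 0   # used only to name segments

    def update(self, doc_id, schema):
        # An update is a delete plus an add: mark any existing copy as deleted...
        flushed = [d for seg in self.segments for d in seg.docs]
        for d in flushed + self.buffer:
            if d.doc_id == doc_id:
                d.live = False
        # ...then index a fresh copy under the current schema.
        self.buffer.append(Doc(doc_id, schema))

    def flush(self):
        # Write buffered docs out as a new segment (as a commit might).
        if self.buffer:
            self.counter += 1
            self.segments.append(Segment(f"segment_{self.counter}", self.buffer))
            self.buffer = []

    def merge(self):
        # Copy only the live docs of all segments into one new segment.
        self.flush()
        self.counter += 1
        live = [Doc(d.doc_id, d.schema)
                for seg in self.segments for d in seg.live_docs()]
        merged = Segment(f"segment_{self.counter}", live)
        self.segments = [merged]
        return merged


def schemas_in(segment):
    # Schema versions present among the live docs of a segment.
    return sorted({d.schema for d in segment.live_docs()})


def reindex(total, merge_after=None):
    """Index `total` docs under the old schema, then reindex all of them.

    If `merge_after` is set, a merge is forced after that many updates (Ex 2).
    Returns (schemas in the mid-way merged segment or None, final merged segment).
    """
    index = ToyIndex()
    for i in range(total):
        index.update(i, "old")
    index.flush()  # Segment 1 holds every doc under the old schema

    mid_schemas = None
    for n in range(1, total + 1):
        index.update(n - 1, "new")
        if n == merge_after:
            mid_schemas = schemas_in(index.merge())
    return mid_schemas, index.merge()


if __name__ == "__main__":
    mid, final = reindex(10)                 # Ex 1: merge only after all updates
    print(mid, schemas_in(final), final.name)
    mid, final = reindex(10, merge_after=5)  # Ex 2: merge half way through
    print(mid, schemas_in(final), final.name)
```

In this model, `reindex(10)` ends with only new-schema documents in the merged segment (Ex 1), while `reindex(10, merge_after=5)` produces an intermediate merged segment that holds both old-schema and new-schema documents (Ex 2). Whether such a mix actually causes a failure depends on the kind of schema change, which the sketch does not model.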