Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Erick Erickson Tue, 19 May 2020 15:00:32 -0700

Jean-Louis:

One of the great advantages of open source is that it allows people to look at 
a problem with “fresh eyes” and add to the project in a way that helps other 
people who aren’t steeped in the arcana of Lucene/Solr. So it’d be great if you 
could go ahead and make a patch and JIRA to put this information in a place 
that makes the most sense to someone coming in fresh.


And I fully appreciate that “it’s in the reference guide” isn’t adequate; it’s 
over 1,300 pages last I knew. So putting this information somewhere that 
someone like yourself is likely to find it is the best option…

If you create a JIRA and patch, use “@erick” in the comment and I’ll see it and 
we can incorporate the info.

Best,
Erick.

> On May 19, 2020, at 2:57 PM, VILA Jean-Louis 
> <jean-louis.v...@sword-group.com> wrote:
> 
> Erick
> 
> I just suggest a dedicated page on the upgrade path. Reading the page about 
> IndexUpgraderTool, we understand well that we can’t upgrade in one phase and 
> that 6->7->8 must be done step by step, but nowhere is it specified that, 
> from Lucene 6 on, the segments are marked V6 forever. 
> Naively, by transitivity, the upgrade path 6->7->8 seems quite natural. From 
> my point of view, we should say “since Lucene 6, a version is compatible with 
> the previous major version of an index” rather than speak of upgrading. The 
> term is ambiguous.
> Things must be made clear. I understand the problem :-)
> Jean-Louis 
> 
>> On May 19, 2020, at 19:03, Erick Erickson <erickerick...@gmail.com> wrote:
>> 
>> Jean-Louis:
>> 
>> One explanation is here: 
>> https://lucene.apache.org/solr/guide/8_5/indexupgrader-tool.html,
>>  but then again the reference guide is very long, I’m not sure how to make 
>> it more findable. Or, for that matter, whether it should be part of the 
>> IndexUpgraderTool section or not. Please feel free to suggest (even better, 
>> submit a patch) if you can think of a place it’d be more easily findable. 
>> It’s always useful to have someone with fresh eyes weigh in.
>> 
>> Optimize won’t work. Under the covers, optimize is just a merge. It uses the 
>> exact same low-level merging code that background merging uses, including 
>> preserving the markers in the segment files. That’s why the Lucene devs use 
>> “forceMerge” rather than “optimize”, the latter is easy to interpret as 
>> something that does more than it really does.
>> 
>> This is the same code that IndexUpgraderTool uses too, for that matter. 
>> IndexUpgraderTool is, really, just a forceMerge down to one segment, which 
>> is all optimize is (assuming you specify maxSegments=1).
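
For reference, an optimize with maxSegments=1 is usually requested through Solr's update handler. The following is a minimal sketch in Python that only builds the request URL and sends nothing; the host, port and collection name are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def optimize_url(base_url: str, collection: str, max_segments: int = 1) -> str:
    """Build the update-handler URL that asks Solr for an optimize (forceMerge).

    base_url and collection are placeholders supplied by the caller.
    """
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/solr/{collection}/update?{params}"


# Hypothetical host and collection, for illustration only.
url = optimize_url("http://localhost:8983", "mycollection")
print(url)
```

As described above, such a request rewrites the segments but keeps their version markers, so it does not make an index that was created with 6.x readable by 8.x.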
>> 
>> Best,
>> Erick
>> 
>>> On May 19, 2020, at 11:42 AM, Walter Underwood <wun...@wunderwood.org> 
>>> wrote:
>>> 
>>> Hmm, might be able to hack this with optimize (forced merge).
>>> 
>>> First, you would have to add enough extra documents to force a rewrite of 
>>> all segments. That might be as many documents as are already in the index. 
>>> You could set a “fake:true” field and filter them out with an fq. Or make 
>>> sure they have no searchable text.
>>> 
>>> After adding all those, run optimize. This should rewrite all the segments 
>>> in the new format.
>>> 
>>> Finally, delete all the extra documents. Might want to do another optimize 
>>> after that.
>>> 
>>> No guarantee that this desperate hack will work.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>>> On May 19, 2020, at 6:21 AM, VILA Jean-Louis 
>>>>> <jean-louis.v...@sword-group.com> wrote:
>>>> 
>>>> Many thanks for your answers, Erick. 
>>>> 
>>>> Indeed, I've read in many different threads that the migration path is 
>>>> not guaranteed, but what's strange is that there's no formal information 
>>>> on this impossibility, because clearly we can't migrate to v8 if the 
>>>> indexes are not "pure" v7 indexes. I understand the reason (y = f(x)), but 
>>>> at least a simple piece of documentation stating that Lucene 6 segments 
>>>> can't be upgraded to Lucene 8 would be appreciated.
>>>> 
>>>> Moreover, the check tool just shows a v7.7.3 index and there is no mention 
>>>> of the "real" segment version, which is v6! So refusing to open v7 Lucene 
>>>> indexes upgraded from v6 is quite brutal, and the rule that we can migrate 
>>>> only from the previous major version is not completely true :-(
>>>> I'll stay on v7.7.3.
>>>> 
>>>> Thanks again,
>>>> Jean-Louis
>>>> 
>>>> -----Original Message-----
>>>> From: Erick Erickson <erickerick...@gmail.com> 
>>>> Sent: Tuesday, May 19, 2020 15:00
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6
>>>> 
>>>> This will not work. Lucene has never promised this upgrade path would 
>>>> work; the “one major version back-compat” means that Lucene X has special 
>>>> handling for X-1, but for X-2, all bets are off. Starting with Solr 6, a 
>>>> marker is written into the segments recording the version of Lucene the 
>>>> segment was written with. That marker is preserved through all 
>>>> merges/upgrades/whatever.
>>>> 
>>>> Starting with Lucene 8, if any segment has a marker for Lucene 6 (or no 
>>>> marker at all for earlier versions), then Lucene will refuse to open the 
>>>> index.
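
The rule described above can be sketched as a toy model in Python. This is not Lucene's actual code: the class and function names are made up for illustration, and the "no marker at all" case is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    created_major: int  # marker: major version the data was first written with
    written_major: int  # major version that last rewrote the segment


def force_merge(segments: List[Segment], writer_major: int) -> Segment:
    """Toy merge down to one segment: new format, oldest marker preserved."""
    oldest = min(s.created_major for s in segments)
    return Segment(created_major=oldest, written_major=writer_major)


def can_open(segments: List[Segment], reader_major: int) -> bool:
    """Toy check: refuse any marker older than reader_major - 1."""
    return all(s.created_major >= reader_major - 1 for s in segments)


index = [Segment(created_major=6, written_major=6)]
index = [force_merge(index, writer_major=7)]  # upgrade tool or optimize on 7.x
print(index[0].written_major, index[0].created_major)  # 7 6
print(can_open(index, reader_major=7), can_open(index, reader_major=8))  # True False
```

The merged segment reports 7 as the version that wrote it while still carrying the 6 marker, which is the situation reported in this thread.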
>>>> 
>>>> IndexUpgraderTool and the like simply cannot synthesize the new index 
>>>> format. The most succinct explanation I’ve seen is from Robert Muir:
>>>> 
>>>> “I think the key issue here is Lucene is an index not a database. Because 
>>>> it is a lossy index and does not retain all of the user's data, its not 
>>>> possible to safely migrate some things automagically. In the norms case 
>>>> IndexWriter needs to re-analyze the text ("re-index") and compute stats to 
>>>> get back the value, so it can be re-encoded. The function is y = f(x) and 
>>>> if x is not available its not possible, so lucene can't do it.”
>>>> 
>>>> So you’ll have to re-index your corpus with Solr 8 I’m afraid.
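
The y = f(x) argument can be illustrated with a small Python sketch. Both encodings below are invented for the example and are not the formats Lucene uses; the point is only that a lossy stored value cannot be converted into a different encoding without the original text.

```python
import math


def old_norm(text: str) -> int:
    """Hypothetical old encoding: 1/sqrt(token count) squeezed into 0..15."""
    tokens = max(len(text.split()), 1)
    return round(15 / math.sqrt(tokens))


def new_norm(text: str) -> int:
    """Hypothetical new encoding: the token count itself."""
    return len(text.split())


a = "the quick brown fox jumps over the lazy dog"        # 9 tokens
b = "one two three four five six seven eight nine ten"   # 10 tokens

# Both texts collapse to the same stored value under the old encoding...
print(old_norm(a), old_norm(b))  # 5 5
# ...but need different values under the new one, so the old value alone
# cannot be converted: x (the text) is required to recompute y.
print(new_norm(a), new_norm(b))  # 9 10
```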
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>> 
>>>>> On May 19, 2020, at 4:19 AM, VILA Jean-Louis 
>>>>> <jean-louis.v...@sword-group.com> wrote:
>>>>> 
>>>>> Dear all,
>>>>> 
>>>>> We are starting to upgrade a huge SolrCloud cluster from 5.4.1 to the 
>>>>> latest version, 8.5.1.
>>>>>            Context:
>>>>> . Ubuntu 16.04, 64-bit, Oracle JVM 8u101 and now OpenJDK 8u252
>>>>> . We can't reindex the documents because the old ones don't exist 
>>>>> anymore, so we have no choice other than upgrading the indexes.
>>>>> 
>>>>> Our upgrade strategy is based on the IndexUpgrader tool.
>>>>>            5.4.1 -> 5.5.5 : OK
>>>>>            5.5.5 -> 6.6.6 : OK
>>>>>            6.6.6 -> 7.7.3 : OK
>>>>>            7.7.3 -> 8.5.1 : unable to upgrade. Here is my problem using 
>>>>> the 8.5.1 IndexUpgrader:
>>>>> 
>>>>> Exception in thread "main" 
>>>>> org.apache.lucene.index.IndexFormatTooOldException: Format version is not 
>>>>> supported (resource 
>>>>> BufferedChecksumIndexInput(MMapIndexInput(path="/data2/solr/nodes/node1/solr/insight_dw_shard3_replica_n69/data/index/segments_2nz0"))):
>>>>>  This index was initially created with Lucene 6.x while the current 
>>>>> version is 8.5.1 and Lucene only supports reading the current and 
>>>>> previous major versions.. This version of Lucene only supports indexes 
>>>>> created with release 7.0 and later.
>>>>>    at 
>>>>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:318)
>>>>>    at 
>>>>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
>>>>>    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:432)
>>>>>    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:429)
>>>>>    at 
>>>>> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:680)
>>>>>    at 
>>>>> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:632)
>>>>>    at 
>>>>> org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:434)
>>>>>    at 
>>>>> org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:285)
>>>>>    at 
>>>>> org.apache.lucene.index.IndexUpgrader.upgrade(IndexUpgrader.java:158)
>>>>>    at 
>>>>> org.apache.lucene.index.IndexUpgrader.main(IndexUpgrader.java:78)
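
When triaging many shards, the two versions can be pulled out of this message with a small helper. This is a sketch in Python: the function name is hypothetical, and the pattern assumes the exact wording quoted in this thread, which may differ in other Lucene releases. Pass it the raw message, without the ">" quote prefixes.

```python
import re
from typing import Optional, Tuple

# Matches the wording of the message quoted in this thread.
_PATTERN = re.compile(
    r"initially created with Lucene (\d+)\.x "
    r"while the current version is (\d+)\.\d+\.\d+"
)


def created_and_current_major(message: str) -> Optional[Tuple[int, int]]:
    """Return (created major, current major), or None if the text does not match."""
    # Collapse line wraps so a message split across lines still matches.
    flat = " ".join(message.split())
    match = _PATTERN.search(flat)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


message = """This index was initially created with Lucene 6.x while the current
version is 8.5.1 and Lucene only supports reading the current and
previous major versions."""
print(created_and_current_major(message))  # (6, 8)
```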
>>>>> 
>>>>> But when I check the index version with 7.7.3, the segment seems to be 
>>>>> 7.7.3!
>>>>> 0.00% total deletions; 50756501 documents; 0 deleteions Segments 
>>>>> file=segments_2nz0 numSegments=1 version=7.7.3 
>>>>> id=ay2stfke7hwy9gippl8k77tdd userData={commitTimeMSec=1589314850951}
>>>>> 1 of 1: name=_2rr9t maxDoc=50756501
>>>>> version=7.7.3
>>>>> id=9pubpiwgt38rzyxr7litvgcu5
>>>>> codec=Lucene70
>>>>> compound=false
>>>>> numFiles=10
>>>>> size (MB)=338,143.905
>>>>> diagnostics = {os=Linux, java.vendor=Oracle Corporation, 
>>>>> java.version=1.8.0_101, java.vm.version=25.101-b13, lucene.version=7.7.3, 
>>>>> mergeMaxNumSegments=1, os.arch=amd64, java.runtime.version=1.8.0_101-b13, 
>>>>> source=merge, mergeFactor=2, os.version=3.13.0-147-generic, 
>>>>> timestamp=1589484981711}
>>>>> no deletions
>>>>> test: open reader.........OK [took 2.779 sec]
>>>>> 
>>>>> When I read the different threads, some people say that when a segment is 
>>>>> "marked as a v6 Lucene index", this mark remains across upgrades, so we 
>>>>> are stuck on version 7.7.3.
>>>>> 
>>>>> What are my options?
>>>>> 
>>>>> Many many thanks for your help,
>>>>> Jean-Louis
>>>>> 
>>>>> 
>>>>> 
>>>>> Jean-Louis Vila, PhD
>>>>> Technical Director
>>>>> Sword SAS
>>>>> 
>>>>> d         +33 4 72 85 37 60
>>>>> m        +33 6 17 81 14 69
>>>>> t          +33 4 72 85 37 40
>>>>> e         jean-louis.v...@sword-group.com
>>>>> 
>>>>> 9 avenue Charles de Gaulle
>>>>> 69771, Saint Didier au Mont d'Or
>>>>> France
>>>>> http://www.sword-group.com/
>>>>> 
>>>> 
>>> 
>>
