Jean-Louis: One of the great advantages of open source is that it allows people to look at a problem with “fresh eyes” and add to the project in a way that helps other people who aren’t steeped in the arcana of Lucene/Solr. So it’d be great if you could go ahead and make a patch and JIRA to put this information in a place that makes the most sense to someone coming in fresh.
And I fully appreciate that “it’s in the reference guide” isn’t adequate; it’s over 1,300 pages, last I knew. So putting this information somewhere that someone like yourself is likely to find it is the best option… If you create a JIRA and patch, use “@erick” in the comment and I’ll see it and we can incorporate the info.

Best,
Erick

> On May 19, 2020, at 2:57 PM, VILA Jean-Louis <jean-louis.v...@sword-group.com> wrote:
>
> Erick,
>
> I am just suggesting a dedicated page on the upgrade path. Reading the page
> about indexUpgraderTool, we understand well that we can’t upgrade in one
> step and that 6->7->8 must be done, but nowhere is it specified that, from
> Lucene 6 on, the segments are marked as v6 forever.
> Naively, by transitivity, the upgrade path 6->7->8 seems quite natural. From
> my point of view, we should speak of “since Lucene 6, a version is compatible
> with the previous major version of an index”, not of upgrading. The term is
> ambiguous.
> Things must be clear. I understand the problem :-)
>
> Jean-Louis
>
>> On May 19, 2020, at 19:03, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> Jean-Louis:
>>
>> One explanation is here:
>> https://lucene.apache.org/solr/guide/8_5/indexupgrader-tool.html
>> but then again the reference guide is very long, and I’m not sure how to make
>> it more findable. Or, for that matter, whether it should be part of the
>> IndexUpgraderTool section or not. Please feel free to suggest (even better,
>> submit a patch) if you can think of a place it’d be more easily findable.
>> It’s always useful to have someone with fresh eyes weigh in.
>>
>> Optimize won’t work. Under the covers, optimize is just a merge.
>> It uses the exact same low-level merging code that background merging uses,
>> including preserving the markers in the segment files. That’s why the Lucene
>> devs use “forceMerge” rather than “optimize”; the latter is easy to interpret
>> as something that does more than it really does.
>>
>> This is also the same code that IndexUpgraderTool uses, for that matter.
>> IndexUpgraderTool is, really, just a forceMerge down to one segment, which
>> is all optimize is (assuming you specify maxSegments=1).
>>
>> Best,
>> Erick
>>
>>> On May 19, 2020, at 11:42 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>
>>> Hmm, might be able to hack this with optimize (forced merge).
>>>
>>> First, you would have to add enough extra documents to force a rewrite of
>>> all segments. That might be as many documents as are already in the index.
>>> You could set a “fake:true” field and filter them out with an fq. Or make
>>> sure they have no searchable text.
>>>
>>> After adding all those, run optimize. This should rewrite all the segments
>>> in the new format.
>>>
>>> Finally, delete all the extra documents. Might want to do another optimize
>>> after that.
>>>
>>> No guarantee that this desperate hack will work.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>> On May 19, 2020, at 6:21 AM, VILA Jean-Louis <jean-louis.v...@sword-group.com> wrote:
>>>>
>>>> Many thanks for your answers, Erick.
>>>>
>>>> Indeed, I've read in many different threads that the migration path is not
>>>> guaranteed, but what's strange is that there's no formal information on this
>>>> impossibility, because clearly we can't migrate to v8 if the indexes are not
>>>> "pure" v7 indexes. I understand the reason (y = f(x)), but at least some
>>>> simple documentation stating that Lucene 6 segments can't be upgraded to
>>>> Lucene 8 would be appreciated.
>>>>
>>>> Moreover, the check tool just shows a v7.7.3 index, and there is no mention
>>>> of the "real" segment version, which is v6! So refusing to open v7 Lucene
>>>> indexes upgraded from v6 is quite brutal, and the rule that we can migrate
>>>> only from the previous major version is not completely true :-(
>>>> I'll stay on v7.7.3.
>>>>
>>>> Thanks again,
>>>> Jean-Louis
>>>>
>>>> -----Original Message-----
>>>> From: Erick Erickson <erickerick...@gmail.com>
>>>> Sent: Tuesday, May 19, 2020 15:00
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6
>>>>
>>>> This will not work. Lucene has never promised this upgrade path would
>>>> work; the “one major version back-compat” means that Lucene X has special
>>>> handling for X-1, but for X-2, all bets are off. Starting with Solr 6, a
>>>> marker is written into the segments recording the version of Lucene the
>>>> segment was written with. That marker is preserved through all
>>>> merges/upgrades/whatever.
>>>>
>>>> Starting with Lucene 8, if any segment has a marker for Lucene 6 (or no
>>>> marker at all for earlier versions), then Lucene will refuse to open the
>>>> index.
>>>>
>>>> IndexUpgraderTool and the like simply cannot synthesize the new index
>>>> format. The most succinct explanation I’ve seen is from Robert Muir:
>>>>
>>>> “I think the key issue here is Lucene is an index not a database.
>>>> Because it is a lossy index and does not retain all of the user's data, its
>>>> not possible to safely migrate some things automagically. In the norms case
>>>> IndexWriter needs to re-analyze the text ("re-index") and compute stats to
>>>> get back the value, so it can be re-encoded. The function is y = f(x) and
>>>> if x is not available its not possible, so lucene can't do it.”
>>>>
>>>> So you’ll have to re-index your corpus with Solr 8, I’m afraid.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>>> On May 19, 2020, at 4:19 AM, VILA Jean-Louis <jean-louis.v...@sword-group.com> wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> We have started to upgrade a huge SolrCloud cluster from 5.4.1 to the
>>>>> latest version, 8.5.1.
>>>>> Context:
>>>>> - Ubuntu 16.04, 64-bit, Oracle JVM 8u101 and now OpenJDK 8u252.
>>>>> - We can't reindex the documents because the old ones don't exist anymore,
>>>>>   so we have no choice other than upgrading the indexes.
>>>>>
>>>>> Our upgrade strategy is based on the indexUpgrader Tool.
>>>>> 5.4.1 -> 5.5.5 : OK
>>>>> 5.5.5 -> 6.6.6 : OK
>>>>> 6.6.6 -> 7.7.3 : OK
>>>>> Unable to upgrade 7.7.3 to 8.5.1. Here is my problem when using the 8.5.1
>>>>> indexUpgrader:
>>>>>
>>>>> Exception in thread "main"
>>>>> org.apache.lucene.index.IndexFormatTooOldException: Format version is not
>>>>> supported (resource
>>>>> BufferedChecksumIndexInput(MMapIndexInput(path="/data2/solr/nodes/node1/solr/insight_dw_shard3_replica_n69/data/index/segments_2nz0"))):
>>>>> This index was initially created with Lucene 6.x while the current
>>>>> version is 8.5.1 and Lucene only supports reading the current and
>>>>> previous major versions.. This version of Lucene only supports indexes
>>>>> created with release 7.0 and later.
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:318)
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
>>>>>     at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:432)
>>>>>     at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:429)
>>>>>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:680)
>>>>>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:632)
>>>>>     at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:434)
>>>>>     at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:285)
>>>>>     at org.apache.lucene.index.IndexUpgrader.upgrade(IndexUpgrader.java:158)
>>>>>     at org.apache.lucene.index.IndexUpgrader.main(IndexUpgrader.java:78)
>>>>>
>>>>> But when I check the index version with 7.7.3, the segment seems to be
>>>>> 7.7.3!
>>>>> 0.00% total deletions; 50756501 documents; 0 deleteions
>>>>> Segments file=segments_2nz0 numSegments=1 version=7.7.3 id=ay2stfke7hwy9gippl8k77tdd userData={commitTimeMSec=1589314850951}
>>>>>   1 of 1: name=_2rr9t maxDoc=50756501
>>>>>     version=7.7.3
>>>>>     id=9pubpiwgt38rzyxr7litvgcu5
>>>>>     codec=Lucene70
>>>>>     compound=false
>>>>>     numFiles=10
>>>>>     size (MB)=338,143.905
>>>>>     diagnostics = {os=Linux, java.vendor=Oracle Corporation,
>>>>>       java.version=1.8.0_101, java.vm.version=25.101-b13, lucene.version=7.7.3,
>>>>>       mergeMaxNumSegments=1, os.arch=amd64, java.runtime.version=1.8.0_101-b13,
>>>>>       source=merge, mergeFactor=2, os.version=3.13.0-147-generic,
>>>>>       timestamp=1589484981711}
>>>>>     no deletions
>>>>>     test: open reader.........OK [took 2.779 sec]
>>>>>
>>>>> When I read the different threads, some people say that when a segment is
>>>>> "marked as a v6 Lucene index", this mark remains across upgrades, so we
>>>>> are stuck on version 7.7.3.
>>>>>
>>>>> What are my options?
>>>>>
>>>>> Many many thanks for your help,
>>>>> Jean-Louis
>>>>>
>>>>> Jean-Louis Vila, PhD
>>>>> Technical Director
>>>>> Sword SAS
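
The rule discussed in this thread (a creation marker that survives every merge, plus "current and previous major version only") can be illustrated with a small toy model. This is only a sketch of the behaviour as described above, not Lucene's actual code: `ToyIndex`, `open_index`, `force_merge` and `IndexFormatTooOld` are hypothetical names invented for the illustration, and the assumption that the marker check starts at major version 8 is taken from Erick's description.

```python
from dataclasses import dataclass, replace


class IndexFormatTooOld(Exception):
    """Toy stand-in for Lucene's IndexFormatTooOldException."""


@dataclass(frozen=True)
class ToyIndex:
    created_major: int  # marker: major version that first wrote the index
    written_major: int  # major version that last rewrote the segments


def open_index(index: ToyIndex, lucene_major: int) -> ToyIndex:
    """Refuse anything older than the previous major version."""
    oldest_supported = lucene_major - 1
    if index.written_major < oldest_supported:
        raise IndexFormatTooOld(
            f"segments written by {index.written_major}.x, "
            f"Lucene {lucene_major} reads {oldest_supported}.x and later"
        )
    # Assumption taken from the thread: from Lucene 8 on, the creation
    # marker is checked as well, not just the segment format.
    if lucene_major >= 8 and index.created_major < oldest_supported:
        raise IndexFormatTooOld(
            f"index created with {index.created_major}.x, "
            f"Lucene {lucene_major} reads {oldest_supported}.x and later"
        )
    return index


def force_merge(index: ToyIndex, lucene_major: int) -> ToyIndex:
    """Model optimize / IndexUpgrader: rewrite segments, keep the marker."""
    open_index(index, lucene_major)
    return replace(index, written_major=lucene_major)


if __name__ == "__main__":
    idx = ToyIndex(created_major=6, written_major=6)
    idx = force_merge(idx, 7)  # works: segments are now 7.x, marker still 6
    print(idx)
    try:
        force_merge(idx, 8)    # refused: the marker survived the merge
    except IndexFormatTooOld as err:
        print("refused:", err)
```

In this model the 6 -> 7 step succeeds and the 7 -> 8 step is refused, which matches the behaviour reported above: rewriting the segments changes the version that a check tool shows, but not the creation marker. An index that was first written by 7.x passes the same 7 -> 8 step.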