My understanding from our previous discussion about upgrading Lucene was that we talked about pausing the asynchronous indexing process during the rolling upgrade. I don't remember a discussion that it was ok to not allow queries during the upgrade. But this is what we added to the docs:

"All cluster members must be running the same major Lucene version in order to execute Lucene queries."

What happens if a user runs a query during the rolling upgrade, and why do we need this restriction? It seems to me that, at a minimum, we need to allow queries during the upgrade. We should also consider what will happen to users with server-side query or indexing code: will they be able to upgrade, or are they likely to hit breaking changes in the Lucene API?

-Dan
________________________________
From: Nabarun Nag <n...@vmware.com>
Sent: Tuesday, September 28, 2021 7:13 AM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

But Mario, just for my clarification: if we re-enable the queries in the tests in mixed-version server mode, it goes into a stack overflow. That is what I saw when I set hasLuceneVersionMismatch(host) to false in the test so that it runs the query.

Regards
Naba
________________________________
From: Mario Kevo <mario.k...@est.tech>
Sent: Tuesday, September 28, 2021 4:49 AM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

Hi all,

Just a small clarification of the reverted PR.

There were a lot of changes between Lucene versions 6.x and 7.x. There is an article about that: Upgrading to Lucene 7.1.0 <https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0>.
The first larger change was in the scoring mechanism. We adapted it to one that is correct for us (verified by DistributedScoringJUnitTest).
The main change was in the Lucene index format.
There we ran into a problem with our tests: Lucene 6.x cannot read the index format of Lucene 7.x.
Through the PRs we decided to include the Lucene uplift in Geode 1.15.0 and to add a check that all members are on version 1.15.0 or higher (after uplifting Lucene to a newer version with index format changes, this check should be changed).
If the check passes, Lucene queries are allowed; if not, a log message is printed saying that not all members are on version 1.15.0 or higher.

Also, you can find a discussion on the dev list from 2 years ago about the Lucene upgrade: Lucene Upgrade <https://markmail.org/message/qwooctuz7ekaezor?q=list:org.apache.geode.dev+order:date-backward+Lucene+upgrade&page=4#query:list%3Aorg.apache.geode.dev%20order%3Adate-backward%20Lucene%20upgrade+page:4+mid:ygjhsuikdrbuihap+state:results>

BR,
Mario
________________________________
From: Udo Kohlmeyer <u...@vmware.com>
Sent: September 28, 2021 1:44
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

Might I propose something here.

There is currently a significant amount of work going into completing GEODE-8705, which is the classloader isolation. We are currently targeting getting this released in Geode 1.16.

My proposal is that we use the capability that Patrick demo’d at the Community meeting (on this topic), where one can unload and load extensions (like our integration with Lucene) at runtime. This means that one could possibly do a rolling upgrade on the existing system and keep the versions of the Lucene integration stable.
Once the whole system has been upgraded, the existing Lucene extension component is unloaded and the newer version of the extension component is loaded.

What this means is that, at runtime, there will be a period of time where Lucene queries will not be available, and as part of the “load” lifecycle of the extension there needs to be an initialization step that initializes the extension component safely. Once initialized, Lucene queries can become available again.

This of course requires some work around the lifecycles of extension components and making sure that one can add the extension at runtime and safely initialize it.

I think this approach allows for a more seamless (lower downtime) upgrade of system and extension components.

Thoughts?

--Udo

From: Nabarun Nag <n...@vmware.com>
Date: Tuesday, September 28, 2021 at 7:33 AM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

The solution for preventing query execution in mixed-version mode also caused some problems: the query function executions get executed repeatedly, which results in a stack overflow.
________________________________
From: Nabarun Nag <n...@vmware.com>
Sent: Monday, September 27, 2021 2:30 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

In simple words, if Lucene indexes are created by the new version (7.1.0) and then replicated to members that are still on the older version, those members won't understand the index, and the event processors start throwing exceptions.
This can be seen simply by re-enabling the query execution in the DUnit tests and commenting out the check blocks:
[develop SHA: 68629356f561a932f5dfbace70b01d9971a42473]

In LuceneEventListener:

    if (cache.hasMemberOlderThan(KnownVersion.GEODE_1_15_0)) {
      logger.info("Some members are older than " + KnownVersion.GEODE_1_15_0.getName());
      return false;
    }

In IndexRepositoryFactory:

    if (userRegion.getCache() != null
        && userRegion.getCache().hasMemberOlderThan(KnownVersion.GEODE_1_15_0)) {
      logger.info("Some members are older than " + KnownVersion.GEODE_1_15_0.getName());
      return null;
    }

This is the exception that will be encountered:

[Exception]
[vm2_v1.2.0] [warn 2021/09/27 14:24:42.251 PDT <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_5> tid=102] An Exception occurred. The dispatcher will continue.
[vm2_v1.2.0] org.apache.geode.InternalGemFireError: Unable to create index repository
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.lambda$computeRepository$0(AbstractPartitionedRepositoryManager.java:118)
[vm2_v1.2.0] at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.computeRepository(AbstractPartitionedRepositoryManager.java:108)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.getRepository(AbstractPartitionedRepositoryManager.java:137)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.getRepository(AbstractPartitionedRepositoryManager.java:76)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.LuceneEventListener.process(LuceneEventListener.java:87)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.LuceneEventListener.processEvents(LuceneEventListener.java:64)
[vm2_v1.2.0] at org.apache.geode.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:154)
[vm2_v1.2.0] at org.apache.geode.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:80)
[vm2_v1.2.0] at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue(AbstractGatewaySenderEventProcessor.java:609)
[vm2_v1.2.0] at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.run(AbstractGatewaySenderEventProcessor.java:1051)
[vm2_v1.2.0] Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be between 4 and 6)
[vm2_v1.2.0] at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:216)
[vm2_v1.2.0] at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:302)
[vm2_v1.2.0] at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:286)
[vm2_v1.2.0] at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:84)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:42)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.lambda$computeRepository$0(AbstractPartitionedRepositoryManager.java:116)
[vm2_v1.2.0] ... 10 more

Also:
[vm2_v1.2.0] [warn 2021/09/27 14:24:42.134 PDT <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_7> tid=106] An Exception occurred. The dispatcher will continue.
[vm2_v1.2.0] org.apache.geode.InternalGemFireError: Unable to create index repository
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.lambda$computeRepository$0(AbstractPartitionedRepositoryManager.java:118)
[vm2_v1.2.0] at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.computeRepository(AbstractPartitionedRepositoryManager.java:108)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.getRepository(AbstractPartitionedRepositoryManager.java:137)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.getRepository(AbstractPartitionedRepositoryManager.java:76)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.LuceneEventListener.process(LuceneEventListener.java:87)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.LuceneEventListener.processEvents(LuceneEventListener.java:64)
[vm2_v1.2.0] at org.apache.geode.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:154)
[vm2_v1.2.0] at org.apache.geode.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:80)
[vm2_v1.2.0] at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue(AbstractGatewaySenderEventProcessor.java:609)
[vm2_v1.2.0] at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.run(AbstractGatewaySenderEventProcessor.java:1051)
[vm2_v1.2.0] Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be between 4 and 6)
[vm2_v1.2.0] at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:216)
[vm2_v1.2.0] at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:302)
[vm2_v1.2.0] at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:286)
[vm2_v1.2.0] at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:84)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:42)
[vm2_v1.2.0] at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.lambda$computeRepository$0(AbstractPartitionedRepositoryManager.java:116)
[vm2_v1.2.0] ... 10 more
________________________________
From: Jacob Barrett <jabarr...@vmware.com>
Sent: Monday, September 27, 2021 2:08 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0

> On Sep 27, 2021, at 11:48 AM, nabarun nag <n...@apache.org> wrote:
>
> Recently, a commit was pushed to develop which upgraded the Lucene
> version used in Apache Geode to 7.1.0. These new Lucene indexes are
> not compatible with the previous versions hence it breaks the rolling
> upgrade contract. We are no longer able to execute Lucene queries when
> there are servers of mixed versions in the cluster.

Can you describe the problem in a little more detail? Does this mean that while there is a mix, the execution throws an exception on all servers, or is there a subset for which it works? If there is a subset for which it works, are those instances sufficient to provide accurate results if the instances that fail are ignored?

-Jake
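
For readers following the thread, here is a minimal, dependency-free sketch of the two checks discussed in these messages: the member-version gate (all members on 1.15.0 or higher before Lucene queries are allowed) and the format range check behind the IndexFormatTooNewException text (7, needs to be between 4 and 6). All names below (LuceneUpgradeGateSketch, Member, queriesAllowed, canRead) are hypothetical and are not Geode or Lucene APIs; the actual check in the thread is cache.hasMemberOlderThan(KnownVersion.GEODE_1_15_0). Versions are simplified to major and minor numbers as an assumption for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model only: none of these types or methods exist in Geode or Lucene.
final class LuceneUpgradeGateSketch {

  // Simplified stand-in for a cluster member, reduced to a name and a Geode major.minor version.
  static final class Member {
    final String name;
    final int major;
    final int minor;

    Member(String name, int major, int minor) {
      this.name = name;
      this.major = major;
      this.minor = minor;
    }

    // True when this member's version is strictly lower than otherMajor.otherMinor.
    boolean isOlderThan(int otherMajor, int otherMinor) {
      return major < otherMajor || (major == otherMajor && minor < otherMinor);
    }
  }

  // Mirrors the idea of hasMemberOlderThan: is any member below the given version?
  static boolean hasMemberOlderThan(List<Member> members, int major, int minor) {
    for (Member member : members) {
      if (member.isOlderThan(major, minor)) {
        return true;
      }
    }
    return false;
  }

  // The gate described in the thread: queries run only if every member is on 1.15 or higher.
  static boolean queriesAllowed(List<Member> members) {
    return !hasMemberOlderThan(members, 1, 15);
  }

  // Range check corresponding to the "needs to be between 4 and 6" message in the stack trace.
  static boolean canRead(int indexFormatVersion, int minSupported, int maxSupported) {
    return indexFormatVersion >= minSupported && indexFormatVersion <= maxSupported;
  }

  public static void main(String[] args) {
    // One upgraded member plus one member still on 1.2, as in the vm2_v1.2.0 log lines.
    List<Member> mixed = Arrays.asList(new Member("vm1", 1, 15), new Member("vm2", 1, 2));
    List<Member> upgraded = Arrays.asList(new Member("vm1", 1, 15), new Member("vm2", 1, 15));

    System.out.println("mixed cluster allows queries: " + queriesAllowed(mixed));
    System.out.println("upgraded cluster allows queries: " + queriesAllowed(upgraded));
    System.out.println("reader supporting 4..6 can open format 7: " + canRead(7, 4, 6));
  }
}
```

In this model a single member on 1.2 is enough to block queries for the whole cluster, which is the restriction being questioned above. The sketch does not address whether a subset of upgraded members could still return accurate results.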