[DISCUSS] Upgrading to Lucene 7.1.0
Hi dev team,

Recently, a commit was pushed to develop that upgraded the Lucene version used in Apache Geode to 7.1.0. Indexes written by the new Lucene version are not compatible with the previous versions, which breaks the rolling upgrade contract: we are no longer able to execute Lucene queries while there are servers of mixed versions in the cluster.

One solution that was provided was to disallow Lucene queries while servers of mixed versions are present in the cluster. After a discussion, it was put forward that this is not an optimal user experience, and that this change of default behavior had not been discussed on the dev list.

Two alternatives were put forward. One is that Geode allows the user to pick the version of Lucene to be used. The other is that, since this is a major upgrade to Lucene with some breaking API changes, we sync this upgrade with the release of Apache Geode 2.0.

We would love to hear your thoughts or alternative solutions to this issue.

Regards
Nabarun
Re: [DISCUSS] Upgrading to Lucene 7.1.0
Would it be possible to allow an operator to choose which version of Lucene to use, so that operators who are prepared for the issues you describe could still go ahead and upgrade to 7.1.0? Or are there breaking API changes that would make that hard or impossible to accommodate in our code base?

--Jens

From: nabarun nag
Date: Monday, September 27, 2021 at 11:49 AM
To: dev@geode.apache.org
Subject: [DISCUSS] Upgrading to Lucene 7.1.0
Re: [DISCUSS] Upgrading to Lucene 7.1.0
Does anyone have more context on why Lucene queries won't work during the rolling upgrade? I can see that the upgrade commit added a line to the documentation and changed the tests not to run queries, but I'm not sure why we needed to do that.

-Dan

From: nabarun nag
Sent: Monday, September 27, 2021 11:48 AM
To: dev@geode.apache.org
Subject: [DISCUSS] Upgrading to Lucene 7.1.0
Re: [DISCUSS] Upgrading to Lucene 7.1.0
> On Sep 27, 2021, at 11:48 AM, nabarun nag wrote:
>
> Recently, a commit was pushed to develop which upgraded the Lucene
> version used in Apache Geode to 7.1.0. These new Lucene indexes are
> not compatible with the previous versions hence it breaks the rolling
> upgrade contract. We are no longer able to execute Lucene queries when
> there are servers of mixed versions in the cluster.

Can you describe the problem in a little more detail? Does this mean that while there is a mix, the execution throws an exception on all servers, or is there a subset for which it works? If there is a subset for which it works, are those instances sufficient to provide accurate results if the instances that fail are ignored?

-Jake
Re: [DISCUSS] Upgrading to Lucene 7.1.0
Put simply: if Lucene indexes are created by the new version (7.1.0) and then replicated to members that are still on the older version, those members cannot read the index, and their event processors start throwing exceptions.

This can be seen by re-enabling the query execution in the DUnit tests and commenting out the following check blocks [develop SHA: 68629356f561a932f5dfbace70b01d9971a42473].

In LuceneEventListener:

    if (cache.hasMemberOlderThan(KnownVersion.GEODE_1_15_0)) {
      logger.info("Some members are older than " + KnownVersion.GEODE_1_15_0.getName());
      return false;
    }

In IndexRepositoryFactory:

    if (userRegion.getCache() != null
        && userRegion.getCache().hasMemberOlderThan(KnownVersion.GEODE_1_15_0)) {
      logger.info("Some members are older than " + KnownVersion.GEODE_1_15_0.getName());
      return null;
    }

This is the exception that will be encountered:

[Exception]
[vm2_v1.2.0] [warn 2021/09/27 14:24:42.251 PDT tid=102] An Exception occurred. The dispatcher will continue.
[vm2_v1.2.0] org.apache.geode.InternalGemFireError: Unable to create index repository
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.lambda$computeRepository$0(AbstractPartitionedRepositoryManager.java:118)
[vm2_v1.2.0]   at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.computeRepository(AbstractPartitionedRepositoryManager.java:108)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.getRepository(AbstractPartitionedRepositoryManager.java:137)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.getRepository(AbstractPartitionedRepositoryManager.java:76)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.LuceneEventListener.process(LuceneEventListener.java:87)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.LuceneEventListener.processEvents(LuceneEventListener.java:64)
[vm2_v1.2.0]   at org.apache.geode.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:154)
[vm2_v1.2.0]   at org.apache.geode.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:80)
[vm2_v1.2.0]   at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue(AbstractGatewaySenderEventProcessor.java:609)
[vm2_v1.2.0]   at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.run(AbstractGatewaySenderEventProcessor.java:1051)
[vm2_v1.2.0] Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be between 4 and 6)
[vm2_v1.2.0]   at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:216)
[vm2_v1.2.0]   at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:302)
[vm2_v1.2.0]   at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:286)
[vm2_v1.2.0]   at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:84)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:42)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.lambda$computeRepository$0(AbstractPartitionedRepositoryManager.java:116)
[vm2_v1.2.0]   ... 10 more

Also:

[vm2_v1.2.0] [warn 2021/09/27 14:24:42.134 PDT tid=106] An Exception occurred. The dispatcher will continue.
[vm2_v1.2.0] org.apache.geode.InternalGemFireError: Unable to create index repository
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.lambda$computeRepository$0(AbstractPartitionedRepositoryManager.java:118)
[vm2_v1.2.0]   at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.computeRepository(AbstractPartitionedRepositoryManager.java:108)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.getRepository(AbstractPartitionedRepositoryManager.java:137)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.AbstractPartitionedRepositoryManager.getRepository(AbstractPartitionedRepositoryManager.java:76)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.LuceneEventListener.process(LuceneEventListener.java:87)
[vm2_v1.2.0]   at org.apache.geode.cache.lucene.internal.LuceneEventListener.processEvents(LuceneEventListener.java:64)
[vm2_v1.2.0]   at org.apache.geode.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:154)
[vm2_v1
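The "Caused by" line in the trace above reports a bounded range check: the older member found format version 7 in the segments file but only accepts versions 4 through 6. A minimal sketch of that kind of check follows. It is not Lucene's code; the class name, method name, and messages are hypothetical, and only the numbers 7, 4, and 6 are taken from the exception message.

```java
import java.util.Optional;

/** Hypothetical stand-in for a reader-side format version check; not Lucene's implementation. */
final class FormatVersionCheck {
    // Range reported by the older member in the trace: "needs to be between 4 and 6".
    static final int OLD_MEMBER_MIN = 4;
    static final int OLD_MEMBER_MAX = 6;

    /** Returns a rejection message if the reader cannot open the format, or empty if it can. */
    static Optional<String> check(int found, int min, int max) {
        if (found > max) {
            return Optional.of("format too new: " + found
                + " (needs to be between " + min + " and " + max + ")");
        }
        if (found < min) {
            return Optional.of("format too old: " + found
                + " (needs to be between " + min + " and " + max + ")");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // An index written with format 7 reaches a member that only reads 4 through 6.
        System.out.println(check(7, OLD_MEMBER_MIN, OLD_MEMBER_MAX).orElse("readable"));
        // An index written with format 6 is still readable by that member.
        System.out.println(check(6, OLD_MEMBER_MIN, OLD_MEMBER_MAX).orElse("readable"));
    }
}
```

Because the check rejects anything above the reader's maximum, an older member has no way to open a segment written by a newer one, which is why the failure shows up as soon as new-format index data is replicated to it.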
Re: [DISCUSS] Upgrading to Lucene 7.1.0
The solution of preventing query execution in mixed-version mode also caused a problem: the query function executions are re-executed repeatedly, which results in a stack overflow.

From: Nabarun Nag
Sent: Monday, September 27, 2021 2:30 PM
To: dev@geode.apache.org
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0
Re: [DISCUSS] Upgrading to Lucene 7.1.0
Might I propose something here?

There is currently a significant amount of work going into completing GEODE-8705, which is the classloader isolation. We are currently targeting this for release in Geode 1.16.

My proposal is that we use the capability that Patrick demoed at the community meeting (on this topic), where one can unload and load extensions (like our integration with Lucene) at runtime. This means that one could possibly do a rolling upgrade on the existing system while keeping the version of the Lucene integration stable. Once the whole system has been upgraded, the existing Lucene extension component is unloaded and the newer version of the extension component is loaded.

What this means is that, at runtime, there will be a period of time during which Lucene queries are not available. As part of the “load” lifecycle of the extension, there needs to be an initialization step that initializes the extension component safely. Once it is initialized, Lucene queries become available again.

This of course requires some work around the lifecycles of extension components, to make sure that an extension can be added at runtime and initialized safely.

I think this approach allows for a more seamless (lower-downtime) upgrade of system and extension components.

Thoughts?

--Udo

From: Nabarun Nag
Date: Tuesday, September 28, 2021 at 7:33 AM
To: dev@geode.apache.org
Subject: Re: [DISCUSS] Upgrading to Lucene 7.1.0
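The unload, load, and initialize lifecycle proposed in this message can be pictured as a small state machine in which queries are rejected unless an extension is fully initialized. The sketch below is only an illustration under that assumption: `ExtensionSlot`, its states, and the version labels are hypothetical and do not correspond to any existing Geode or classloader-isolation API.

```java
/** Hypothetical holder for one swappable extension; not Geode's API. */
final class ExtensionSlot {
    enum State { UNLOADED, INITIALIZING, AVAILABLE }

    private State state = State.UNLOADED;
    private String version; // null while nothing is loaded

    /** Loads an extension version; queries are only served after init completes. */
    synchronized void load(String newVersion, Runnable init) {
        if (state != State.UNLOADED) {
            throw new IllegalStateException("unload the current extension first");
        }
        state = State.INITIALIZING;
        try {
            init.run(); // the initialization step of the "load" lifecycle
            version = newVersion;
            state = State.AVAILABLE;
        } catch (RuntimeException e) {
            state = State.UNLOADED; // a failed init leaves the slot empty
            throw e;
        }
    }

    /** Unloads the current extension; queries are rejected until the next load. */
    synchronized void unload() {
        version = null;
        state = State.UNLOADED;
    }

    /** Serves a query only while an initialized extension is present. */
    synchronized String query(String text) {
        if (state != State.AVAILABLE) {
            throw new IllegalStateException("Lucene queries unavailable, extension is " + state);
        }
        return "results for '" + text + "' from " + version;
    }

    public static void main(String[] args) {
        ExtensionSlot slot = new ExtensionSlot();
        slot.load("old-lucene-extension", () -> {});
        System.out.println(slot.query("geode"));  // served by the old extension
        slot.unload();                            // after all members are upgraded
        try {
            slot.query("geode");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());   // the window with no Lucene queries
        }
        slot.load("lucene-7.1.0-extension", () -> {});
        System.out.println(slot.query("geode"));  // served by the new extension
    }
}
```

The methods are synchronized only to keep the sketch short, so a caller on another thread would block during initialization rather than be rejected. A real design would also need to decide what happens to in-flight queries and to existing index data when the old extension is unloaded.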