Re: Lucene upgrade
Hi Dan,

thanks for the suggestions. I didn't find a way to make Lucene write indexes in the older format. Newer versions only support reading old-format indexes, via the lucene-backward-codecs module.

Regarding freezing writes to the Lucene index: that means we need to start the locators and servers, create the Lucene index on the server, roll everything to the current version, and only then do the puts. With that ordering the tests pass. Is that acceptable?

BR,
Mario

On Mon, 2019-11-04 at 17:07 -0800, Dan Smith wrote:
> I think the issue probably has to do with doing a rolling upgrade from an
> old version of geode (with an old version of lucene) to the new version of
> geode.
>
> Geode's lucene integration works by writing the lucene index to a colocated
> region. So lucene index data that was generated on one server can be
> replicated or rebalanced to other servers.
>
> I think what may be happening is that data written by a geode member with a
> newer version is being read by a geode member with an old version. Because
> this is a rolling upgrade test, members with multiple versions will be
> running as part of the same cluster.
>
> I think to really fix this rolling upgrade issue we would need to somehow
> configure the new version of lucene to write data in the old format, at
> least until the rolling upgrade is complete. I'm not sure if that is
> possible with lucene or not - but perhaps? Another option might be to
> freeze writes to the lucene index during the rolling upgrade process.
> Lucene indexes are asynchronous, so this wouldn't necessarily require
> blocking all puts. But it would require queueing up a lot of updates.
>
> -Dan
>
> On Mon, Nov 4, 2019 at 12:05 AM Mario Kevo wrote:
>
> > Hi geode dev,
> >
> > I'm working on upgrading lucene to a newer version. (
> > https://issues.apache.org/jira/browse/GEODE-7309)
> >
> > I followed the instructions from
> > https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0
> > and also made some other changes needed for lucene 8.2.0.
> >
> > I found some problems with these tests:
> >
> > * geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/distributed/DistributedScoringJUnitTest.java
> > * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver.java
> > * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled.java
> > * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java
> > * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion.java
> >
> > -> failed due to
> >
> > Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
> > version is not supported (resource BufferedChecksumIndexInput(segments_1)):
> > 6 (needs to be between 7 and 9). This version of Lucene only supports
> > indexes created with release 6.0 and later.
> > at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:213)
> > at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:305)
> > at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
> > at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
> > at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finishComputingRepository(IndexRepositoryFactory.java:123)
> > at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:66)
> > at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:151)
> > at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.lambda$computeRepository$1(PartitionedRepositoryManager.java:170)
> > ... 16 more
> >
> > * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.java
> >
> > -> failed with the same exception as the previous tests
> >
> > I found this on the web:
> > https://stackoverflow.com/questions/47454434/solr-indexing-issue-after-upgrading-from-4-7-to-7-1
> > but I don't have an idea how to proceed with it.
> >
> > Does anyone have any idea how to fix it?
> >
> > BR,
> > Mario
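For context on what the backward codecs do and do not cover: with the org.apache.lucene:lucene-backward-codecs artifact (same version as lucene-core) on the classpath, an 8.x reader can open 7.x segments, but nothing lets an 8.x IndexWriter produce old-format segments, and 6.x segments stay unreadable either way, which is exactly the IndexFormatTooOldException above. A minimal illustrative sketch (the index path argument is hypothetical):

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class BackwardCodecsCheck {
      public static void main(String[] args) throws Exception {
        // Opening a 7.x index with lucene 8.x succeeds only if
        // lucene-backward-codecs is on the classpath; a 6.x index fails
        // with IndexFormatTooOldException either way, matching the
        // rolling-upgrade test failures above.
        try (Directory dir = FSDirectory.open(Paths.get(args[0]));
            DirectoryReader reader = DirectoryReader.open(dir)) {
          System.out.println("maxDoc=" + reader.maxDoc());
        }
      }
    }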
Re: bug fix needed for release/1.11.0
The fix for this problem is in the CI pipeline today:
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341

On 11/5/19 10:49 AM, Owen Nichols wrote:
> +1 for bringing this fix to release/1.11.0 (after it has passed Benchmarks
> on develop)
>
>> On Nov 5, 2019, at 10:45 AM, Bruce Schuchardt wrote:
>>
>> The PR for GEODE-6661 introduced a problem in SSL communications that needs
>> to be fixed. It changed SSL handshakes to use a temporary buffer that's
>> discarded when the handshake completes, but sometimes this buffer contains
>> application data that must be retained. This seems to be causing our
>> Benchmark SSL test failures in CI.
>>
>> I'm preparing a fix. We can either revert the PR for GEODE-6661 on that
>> branch or cherry-pick the correction when it's ready.
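For readers following the buffer issue: the failure mode Bruce describes is the classic TLS hand-off problem. The read that completes the handshake can also pull in the first application bytes, which land in the temporary handshake buffer, so discarding that buffer silently drops data. A minimal sketch of the general shape of such a fix, with illustrative names (this is not Geode's actual NioSslEngine code):

    import java.nio.ByteBuffer;

    final class HandshakeBufferHandoff {

      /**
       * Before the temporary handshake buffer is discarded, move any bytes
       * the peer sent after the final handshake message into the buffer the
       * connection keeps using for application data.
       *
       * @param handshakeBuffer temporary buffer, positioned at its unread bytes
       * @param connectionBuffer long-lived read buffer that survives the handshake
       */
      static void preserveLeftovers(ByteBuffer handshakeBuffer, ByteBuffer connectionBuffer) {
        if (handshakeBuffer.hasRemaining()) {
          // Assumes connectionBuffer has enough room; a real fix would have
          // to expand it first when capacity is short.
          connectionBuffer.put(handshakeBuffer);
        }
      }
    }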
Re: bug fix needed for release/1.11.0
Any other votes? I have 2 people in favor.

Voting will close at noon.

Thanks,
Mark

> On Nov 6, 2019, at 8:00 AM, Bruce Schuchardt wrote:
>
> The fix for this problem is in the CI pipeline today:
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341
Re: bug fix needed for release/1.11.0
+1 for bringing this fix to release/1.11.0

From: Mark Hanson
Sent: 6 November 2019 18:28
To: dev@geode.apache.org
Subject: Re: bug fix needed for release/1.11.0

Any other votes? I have 2 people in favor.

Voting will close at noon.

Thanks,
Mark

> On Nov 6, 2019, at 8:00 AM, Bruce Schuchardt wrote:
>
> The fix for this problem is in the CI pipeline today:
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341
Re: bug fix needed for release/1.11.0
Thanks Mario. Your vote reminded me that not all voters are in the PST time zone. Pardon my thoughtlessness.

Voting closes at 12pm PST.

> On Nov 6, 2019, at 9:33 AM, Mario Ivanac wrote:
>
> +1 for bringing this fix to release/1.11.0
Re: bug fix needed for release/1.11.0
Also, perhaps some people are waiting to see how the fix actually fares in the develop pipeline, which we won't know until around 4 or 5 PM today...

> On Nov 6, 2019, at 9:44 AM, Mark Hanson wrote:
>
> Thanks Mario. Your vote reminded me that not all voters are in the PST time
> zone. Pardon my thoughtlessness.
>
> Voting closes at 12pm PST.
Re: bug fix needed for release/1.11.0
we could just use GMT (Geode Mean Time)

On Wed, Nov 6, 2019 at 9:45 AM Mark Hanson wrote:

> Thanks Mario. Your vote reminded me that not all voters are in the PST time
> zone. Pardon my thoughtlessness.
>
> Voting closes at 12pm PST.
IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs
IncrementalBackupDistributedTest.testMissingMemberInBaseline is hanging intermittently in the DistributedTest job of CI and precheckin.

I filed GEODE-7411 with all the involved thread stacks that I could find:
https://issues.apache.org/jira/browse/GEODE-7411

If anyone knows of any recent changes to backups, diskstore locking, or the locking of diskstores during cache close, please let Mark or me know.

Thanks,
Kirk
Re: IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs
I'm working on a fix and have a PR up for another hang in the same test that I think fixes this issue.

https://github.com/apache/geode/pull/4255

On Wed, Nov 6, 2019 at 10:47 AM Kirk Lund wrote:

> IncrementalBackupDistributedTest.testMissingMemberInBaseline is hanging
> intermittently in the DistributedTest job of CI and precheckin.
>
> I filed GEODE-7411 with all the involved thread stacks that I could find:
> https://issues.apache.org/jira/browse/GEODE-7411
>
> If anyone knows of any recent changes to backups, diskstore locking, or the
> locking of diskstores during cache close, please let Mark or me know.
>
> Thanks,
> Kirk
Re: Lucene upgrade
Hi Mario,

I think there are a few ways to accomplish what Dan was suggesting... Dan or others, please chime in with more options/solutions.

1.) We add some product code/lucene listener to detect whether we have old versions of geode and, if so, do not write to lucene on the newly updated node until all versions are up to date.

2.) We document it and provide instructions (and a way) to pause lucene indexing before someone attempts a rolling upgrade.

I'd prefer option 1 or some other robust solution, because I think option 2 has many possible issues.

-Jason

On Wed, Nov 6, 2019 at 1:03 AM Mario Kevo wrote:

> Hi Dan,
>
> thanks for the suggestions. I didn't find a way to make Lucene write indexes
> in the older format. Newer versions only support reading old-format indexes,
> via the lucene-backward-codecs module.
>
> Regarding freezing writes to the Lucene index: that means we need to start
> the locators and servers, create the Lucene index on the server, roll
> everything to the current version, and only then do the puts. With that
> ordering the tests pass. Is that acceptable?
>
> BR,
> Mario
Re: Lucene upgrade
What about “versioning” the region that backs the indexes? Old servers with the old lucene would continue to read/write the old region. New servers would start re-indexing with the new version. Given the async nature of the indexing, would the mismatch in indexing for some period of time have an impact?

Not an ideal solution, but it’s something.

In my previous life we just deleted the indexes and rebuilt them on upgrade, but that was specific to our application.

-Jake

> On Nov 6, 2019, at 11:18 AM, Jason Huynh wrote:
>
> Hi Mario,
>
> I think there are a few ways to accomplish what Dan was suggesting... Dan or
> others, please chime in with more options/solutions.
>
> 1.) We add some product code/lucene listener to detect whether we have old
> versions of geode and, if so, do not write to lucene on the newly updated
> node until all versions are up to date.
>
> 2.) We document it and provide instructions (and a way) to pause lucene
> indexing before someone attempts a rolling upgrade.
>
> I'd prefer option 1 or some other robust solution, because I think option 2
> has many possible issues.
>
> -Jason
Re: Lucene upgrade
He tried to upgrade the lucene version from the current 6.6.4 to 8.2. There are some challenges. One challenge is that the codec changed, which means the index format also changed.

That's why we did not implement it.

If he resolves the coding challenges, then rolling upgrade will probably need option-2 to work around it.

Regards
Gester

On Wed, Nov 6, 2019 at 11:47 AM Jacob Barrett wrote:

> What about “versioning” the region that backs the indexes? Old servers with
> the old lucene would continue to read/write the old region. New servers
> would start re-indexing with the new version. Given the async nature of the
> indexing, would the mismatch in indexing for some period of time have an
> impact?
>
> Not an ideal solution, but it’s something.
>
> In my previous life we just deleted the indexes and rebuilt them on upgrade,
> but that was specific to our application.
>
> -Jake
Re: IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs
Thanks Jason! I reviewed your PR and added a comment/question.

On Wed, Nov 6, 2019 at 10:59 AM Jason Huynh wrote:

> I'm working on a fix and have a PR up for another hang in the same test
> that I think fixes this issue.
>
> https://github.com/apache/geode/pull/4255
Re: Lucene upgrade
Jake - from my understanding, the implementation detail of geode-lucene is that we use a partitioned region as a "file system" for the lucene files. As new servers are rolled, the issue is that the new servers have the new codec. As puts occur on the user's data region, the async listeners are processing on new and old servers alike. If a new server writes using the new codec, the file is written into the partitioned region, but if an old server with the old codec needs to read that file, it will blow up because it doesn't know about the new codec.

Option 1 is to not have the new servers process/write if they detect members on older geode versions (pre-codec change).
Option 2 is similar but requires users to pause the aeq/lucene listeners.

Deleting the indexes and recreating them can be quite expensive, mostly due to tombstone creation when creating a new lucene index, but could be considered option 3. It would also probably require https://issues.apache.org/jira/browse/GEODE-3924 to be completed.

Gester - I may be wrong, but I think option 1 is still doable. We just need to not write using the new codec until after all servers are upgraded.

There was also some upgrade challenge with scoring from what I remember, but that's a different topic...

On Wed, Nov 6, 2019 at 1:00 PM Xiaojian Zhou wrote:

> He tried to upgrade the lucene version from the current 6.6.4 to 8.2. There
> are some challenges. One challenge is that the codec changed, which means
> the index format also changed.
>
> That's why we did not implement it.
>
> If he resolves the coding challenges, then rolling upgrade will probably
> need option-2 to work around it.
>
> Regards
> Gester
Re: Lucene upgrade
> 1.) We add some product code/lucene listener to detect whether we have old
> versions of geode and, if so, do not write to lucene on the newly updated
> node until all versions are up to date.

Elaborating on this option a little more, this might be as simple as something like the below at the beginning of LuceneEventListener.process. Maybe there is a better way to cache/check whether there are old members. The danger with this approach is that the queues will grow until the upgrade is complete. But maybe that is the only way to successfully do a rolling upgrade with lucene indexes.

    // Skip processing while any member still runs a pre-1.11.0 (old codec) version.
    boolean hasOldMember = cache.getMembers().stream()
        .map(InternalDistributedMember.class::cast)
        .map(InternalDistributedMember::getVersionObject)
        .anyMatch(version -> version.compareTo(Version.GEODE_1_11_0) < 0);

    if (hasOldMember) {
      // An unprocessed batch stays on the AEQ and is redelivered later,
      // which is why the queues grow during the upgrade.
      return false;
    }

On Wed, Nov 6, 2019 at 2:16 PM Jason Huynh wrote:

> Jake - from my understanding, the implementation detail of geode-lucene is
> that we use a partitioned region as a "file system" for the lucene files.
> As new servers are rolled, the issue is that the new servers have the new
> codec. As puts occur on the user's data region, the async listeners are
> processing on new and old servers alike. If a new server writes using the
> new codec, the file is written into the partitioned region, but if an old
> server with the old codec needs to read that file, it will blow up because
> it doesn't know about the new codec.
>
> Option 1 is to not have the new servers process/write if they detect
> members on older geode versions (pre-codec change).
> Option 2 is similar but requires users to pause the aeq/lucene listeners.
>
> Deleting the indexes and recreating them can be quite expensive, mostly due
> to tombstone creation when creating a new lucene index, but could be
> considered option 3. It would also probably require
> https://issues.apache.org/jira/browse/GEODE-3924 to be completed.
>
> Gester - I may be wrong, but I think option 1 is still doable. We just need
> to not write using the new codec until after all servers are upgraded.
>
> There was also some upgrade challenge with scoring from what I remember,
> but that's a different topic...
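For context on why returning false makes the queues grow: Geode's public AsyncEventListener contract treats a false return from processEvents as "batch not processed", so the batch stays queued and is redelivered. A sketch of where such a check could sit, using a standalone listener for illustration (the real LuceneEventListener is internal and structured differently; hasOldMember is a placeholder for Dan's check above):

    import java.util.List;
    import org.apache.geode.cache.asyncqueue.AsyncEvent;
    import org.apache.geode.cache.asyncqueue.AsyncEventListener;

    public class UpgradeAwareListener implements AsyncEventListener {

      @Override
      public boolean processEvents(List<AsyncEvent> events) {
        if (hasOldMember()) {
          // false = batch not processed; it remains on the queue and is
          // redelivered later, which is why queues grow during the upgrade.
          return false;
        }
        // ... normal lucene index writes would happen here ...
        return true;
      }

      // Placeholder for the version check from Dan's snippet above.
      private boolean hasOldMember() {
        return false;
      }

      @Override
      public void close() {}
    }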
Re: Lucene upgrade
> On Nov 6, 2019, at 2:16 PM, Jason Huynh wrote:
>
> Jake - from my understanding, the implementation detail of geode-lucene is
> that we use a partitioned region as a "file system" for the lucene files.

Yeah, I didn't explain well. I meant to say literally create a new region for the new version of lucene and effectively start over. Yes this is expensive, but it's also functional. So new members would create region `lucene-whatever-v8` and start over there. Then when all nodes are upgraded, the old `lucene-whatever` region could be deleted.

Just tossing out alternatives to what's already been posed.

-Jake
Re: Lucene upgrade
Dan - LGTM, check it in! ;-) (kidding of course)

Jake - there is a side effect to this in that the user would have to re-import all their data into the user-defined region too. Client apps would also have to know which of the regions to put into... also, I may be misunderstanding this suggestion completely. In either case, I'll support whoever implements the changes :-P

On Wed, Nov 6, 2019 at 2:53 PM Jacob Barrett wrote:

> Yeah, I didn't explain well. I meant to say literally create a new region
> for the new version of lucene and effectively start over. Yes this is
> expensive, but it's also functional. So new members would create region
> `lucene-whatever-v8` and start over there. Then when all nodes are
> upgraded, the old `lucene-whatever` region could be deleted.
>
> Just tossing out alternatives to what's already been posed.
>
> -Jake
Re: Lucene upgrade
> On Nov 6, 2019, at 3:36 PM, Jason Huynh wrote:
>
> Jake - there is a side effect to this in that the user would have to
> re-import all their data into the user-defined region too. Client apps
> would also have to know which of the regions to put into... also, I may be
> misunderstanding this suggestion completely. In either case, I'll support
> whoever implements the changes :-P

Ah… there isn't a way to re-index the existing data. Eh… just a thought.

-Jake
Re: bug fix needed for release/1.11.0
+1 to cherry-picking the fix.

The sha hasn't made it to benchmarks yet due to an issue with CI losing resource refs that were needed to keep it moving through the pipeline. The next commit is still about an hour away from triggering benchmarks.

In my manual benchmarking of this change, I found that it resolved the issue with SSL and passed the benchmarks. Obviously we still need to confirm that it works through the main pipeline, but I feel confident that it will pass the benchmark job.

Thanks,
Helena Bales (they/them)

On Wed, Nov 6, 2019 at 9:28 AM Mark Hanson wrote:

> Any other votes? I have 2 people in favor.
>
> Voting will close at noon.
>
> Thanks,
> Mark
Re: Lucene upgrade
In my memory, re-creating the region and index is usually expensive, and customers are reluctant to do it.

We do have offline reindex scripts or steps (written by Barry?). If that could be an option, they can try that offline tool.

I saw from Mario's email that he said: "I didn't find a way to make Lucene write indexes in the older format. Newer versions only support reading old-format indexes, via the lucene-backward-codecs module."

That's why I think option-1 is not feasible.

Option-2 will cause the queue to fill up. But customers will usually hold off, quiesce, or reduce their business throughput while doing a rolling upgrade. I wonder if that's a reasonable assumption.

Overall, after comparing all three options, I still think option-2 is the best bet.

Regards
Gester

On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett wrote:

> Ah… there isn't a way to re-index the existing data. Eh… just a thought.
>
> -Jake
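If option-2 is pursued, "pause lucene indexing" could amount to pausing the gateway senders that back the lucene AEQs for the duration of the roll, then resuming them once every member is current. A rough sketch using the public GatewaySender pause/resume API; the "AsyncEventQueue_" id prefix is an internal implementation detail and an assumption here, not supported API:

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.wan.GatewaySender;

    final class LuceneQueuePauser {

      // Pausing stops dispatch to the lucene listeners while puts keep queueing,
      // so the queues grow for the duration of the rolling upgrade.
      static void pauseLuceneSenders(Cache cache) {
        for (GatewaySender sender : cache.getGatewaySenders()) {
          // ASSUMPTION: lucene AEQs are backed by senders whose ids carry
          // the internal "AsyncEventQueue_" prefix.
          if (sender.getId().startsWith("AsyncEventQueue_")) {
            sender.pause();
          }
        }
      }

      // Resume once all members are on the new version.
      static void resumeLuceneSenders(Cache cache) {
        for (GatewaySender sender : cache.getGatewaySenders()) {
          if (sender.getId().startsWith("AsyncEventQueue_")) {
            sender.resume();
          }
        }
      }
    }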