Re: Lucene upgrade
Hi Dan,

thanks for the suggestions. I didn't find a way to make Lucene write indexes in the older format. Newer versions only support reading old-format indexes, via the lucene-backward-codecs module.

Regarding freezing writes to the Lucene index: that means we need to start the locators and servers, create the Lucene index on the server, roll everything to the current version, and only then do the puts. With that ordering the tests pass. Is that acceptable?

BR,
Mario

On Mon, 2019-11-04 at 17:07 -0800, Dan Smith wrote:
> I think the issue probably has to do with doing a rolling upgrade from an
> old version of geode (with an old version of lucene) to the new version of
> geode.
>
> Geode's lucene integration works by writing the lucene index to a colocated
> region. So lucene index data that was generated on one server can be
> replicated or rebalanced to other servers.
>
> I think what may be happening is that data written by a geode member with a
> newer version is being read by a geode member with an old version. Because
> this is a rolling upgrade test, members with multiple versions will be
> running as part of the same cluster.
>
> I think to really fix this rolling upgrade issue we would need to somehow
> configure the new version of lucene to write data in the old format, at
> least until the rolling upgrade is complete. I'm not sure if that is
> possible with lucene or not - but perhaps? Another option might be to
> freeze writes to the lucene index during the rolling upgrade process.
> Lucene indexes are asynchronous, so this wouldn't necessarily require
> blocking all puts. But it would require queueing up a lot of updates.
>
> -Dan
>
> On Mon, Nov 4, 2019 at 12:05 AM Mario Kevo wrote:
>
> > Hi geode dev,
> >
> > I'm working on upgrading lucene to a newer version. (
> > https://issues.apache.org/jira/browse/GEODE-7309)
> >
> > I followed the instructions from
> > https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0
> > and also made some other changes needed for lucene 8.2.0.
> >
> > I found some problems with these tests:
> >
> > * geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/distributed/DistributedScoringJUnitTest.java
> > * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver.java
> > * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled.java
> > * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java
> > * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion.java
> >
> > -> failed due to
> >
> > Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
> > version is not supported (resource BufferedChecksumIndexInput(segments_1)):
> > 6 (needs to be between 7 and 9). This version of Lucene only supports
> > indexes created with release 6.0 and later.
> > at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:213)
> > at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:305)
> > at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
> > at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
> > at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finishComputingRepository(IndexRepositoryFactory.java:123)
> > at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:66)
> > at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:151)
> > at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.lambda$computeRepository$1(PartitionedRepositoryManager.java:170)
> > ... 16 more
> >
> > * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.java
> >
> > -> failed with the same exception as the previous tests
> >
> > I found this on the web:
> > https://stackoverflow.com/questions/47454434/solr-indexing-issue-after-upgrading-from-4-7-to-7-1
> > but I don't have an idea how to proceed with it.
> >
> > Does anyone have any idea how to fix it?
> >
> > BR,
> > Mario
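For context on what the backward codecs do and do not cover: with the org.apache.lucene:lucene-backward-codecs artifact (same version as lucene-core) on the classpath, an 8.x reader can open 7.x segments, but nothing lets an 8.x IndexWriter produce old-format segments, and 6.x segments stay unreadable either way, which is exactly the IndexFormatTooOldException above. A minimal illustrative sketch (the index path argument is hypothetical):

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class BackwardCodecsCheck {
      public static void main(String[] args) throws Exception {
        // Opening a 7.x index with lucene 8.x succeeds only if
        // lucene-backward-codecs is on the classpath; a 6.x index fails
        // with IndexFormatTooOldException either way, matching the
        // rolling-upgrade test failures above.
        try (Directory dir = FSDirectory.open(Paths.get(args[0]));
            DirectoryReader reader = DirectoryReader.open(dir)) {
          System.out.println("maxDoc=" + reader.maxDoc());
        }
      }
    }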
Re: bug fix needed for release/1.11.0
The fix for this problem is in the CI pipeline today:
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341

On 11/5/19 10:49 AM, Owen Nichols wrote:
> +1 for bringing this fix to release/1.11.0 (after it has passed Benchmarks
> on develop)
>
>> On Nov 5, 2019, at 10:45 AM, Bruce Schuchardt wrote:
>>
>> The PR for GEODE-6661 introduced a problem in SSL communications that needs
>> to be fixed. It changed SSL handshakes to use a temporary buffer that's
>> discarded when the handshake completes, but sometimes this buffer contains
>> application data that must be retained. This seems to be causing our
>> Benchmark SSL test failures in CI.
>>
>> I'm preparing a fix. We can either revert the PR for GEODE-6661 on that
>> branch or cherry-pick the correction when it's ready.
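For readers following the buffer issue: the failure mode Bruce describes is the classic TLS hand-off problem. The read that completes the handshake can also pull in the first application bytes, which land in the temporary handshake buffer, so discarding that buffer silently drops data. A minimal sketch of the general shape of such a fix, with illustrative names (this is not Geode's actual NioSslEngine code):

    import java.nio.ByteBuffer;

    final class HandshakeBufferHandoff {

      /**
       * Before the temporary handshake buffer is discarded, move any bytes
       * the peer sent after the final handshake message into the buffer the
       * connection keeps using for application data.
       *
       * @param handshakeBuffer temporary buffer, positioned at its unread bytes
       * @param connectionBuffer long-lived read buffer that survives the handshake
       */
      static void preserveLeftovers(ByteBuffer handshakeBuffer, ByteBuffer connectionBuffer) {
        if (handshakeBuffer.hasRemaining()) {
          // Assumes connectionBuffer has enough room; a real fix would have
          // to expand it first when capacity is short.
          connectionBuffer.put(handshakeBuffer);
        }
      }
    }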
Re: bug fix needed for release/1.11.0
Any other votes? I have 2 people in favor.

Voting will close at noon.

Thanks,
Mark

> On Nov 6, 2019, at 8:00 AM, Bruce Schuchardt wrote:
>
> The fix for this problem is in the CI pipeline today:
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341
Re: bug fix needed for release/1.11.0
+1 for bringing this fix to release/1.11.0

From: Mark Hanson
Sent: 6 November 2019 18:28
To: dev@geode.apache.org
Subject: Re: bug fix needed for release/1.11.0

Any other votes? I have 2 people in favor.

Voting will close at noon.

Thanks,
Mark

> On Nov 6, 2019, at 8:00 AM, Bruce Schuchardt wrote:
>
> The fix for this problem is in the CI pipeline today:
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341
Re: bug fix needed for release/1.11.0
Thanks Mario. Your vote reminded me that not all voters are in the PST time zone. Pardon my thoughtlessness.

Voting closes at 12pm PST.

> On Nov 6, 2019, at 9:33 AM, Mario Ivanac wrote:
>
> +1 for bringing this fix to release/1.11.0
Re: bug fix needed for release/1.11.0
Also, perhaps some people are waiting to see how the fix actually fares in the develop pipeline, which we won't know until around 4 or 5 PM today...

> On Nov 6, 2019, at 9:44 AM, Mark Hanson wrote:
>
> Thanks Mario. Your vote reminded me that not all voters are in the PST time
> zone. Pardon my thoughtlessness.
>
> Voting closes at 12pm PST.
Re: bug fix needed for release/1.11.0
we could just use GMT (Geode Mean Time)

On Wed, Nov 6, 2019 at 9:45 AM Mark Hanson wrote:

> Thanks Mario. Your vote reminded me that not all voters are in the PST time
> zone. Pardon my thoughtlessness.
>
> Voting closes at 12pm PST.
IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs
IncrementalBackupDistributedTest.testMissingMemberInBaseline is hanging intermittently in the DistributedTest job of CI and precheckin.

I filed GEODE-7411 with all the involved thread stacks that I could find:
https://issues.apache.org/jira/browse/GEODE-7411

If anyone knows of any recent changes to backups, diskstore locking, or the locking of diskstores during cache close, please let Mark or me know.

Thanks,
Kirk
Re: IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs
I'm working on a fix and have a PR up for another hang in the same test that I think fixes this issue.

https://github.com/apache/geode/pull/4255

On Wed, Nov 6, 2019 at 10:47 AM Kirk Lund wrote:

> IncrementalBackupDistributedTest.testMissingMemberInBaseline is hanging
> intermittently in the DistributedTest job of CI and precheckin.
>
> I filed GEODE-7411 with all the involved thread stacks that I could find:
> https://issues.apache.org/jira/browse/GEODE-7411
>
> If anyone knows of any recent changes to backups, diskstore locking, or the
> locking of diskstores during cache close, please let Mark or me know.
>
> Thanks,
> Kirk
Re: Lucene upgrade
Hi Mario,

I think there are a few ways to accomplish what Dan was suggesting... Dan or others, please chime in with more options/solutions.

1.) We add some product code/lucene listener to detect whether we have old versions of geode and, if so, do not write to lucene on the newly updated node until all versions are up to date.

2.) We document it and provide instructions (and a way) to pause lucene indexing before someone attempts a rolling upgrade.

I'd prefer option 1 or some other robust solution, because I think option 2 has many possible issues.

-Jason

On Wed, Nov 6, 2019 at 1:03 AM Mario Kevo wrote:

> Hi Dan,
>
> thanks for the suggestions. I didn't find a way to make Lucene write indexes
> in the older format. Newer versions only support reading old-format indexes,
> via the lucene-backward-codecs module.
>
> Regarding freezing writes to the Lucene index: that means we need to start
> the locators and servers, create the Lucene index on the server, roll
> everything to the current version, and only then do the puts. With that
> ordering the tests pass. Is that acceptable?
>
> BR,
> Mario
Re: Lucene upgrade
What about “versioning” the region that backs the indexes? Old servers with the old lucene would continue to read/write the old region. New servers would start re-indexing with the new version. Given the async nature of the indexing, would the mismatch in indexing for some period of time have an impact?

Not an ideal solution, but it’s something.

In my previous life we just deleted the indexes and rebuilt them on upgrade, but that was specific to our application.

-Jake

> On Nov 6, 2019, at 11:18 AM, Jason Huynh wrote:
>
> Hi Mario,
>
> I think there are a few ways to accomplish what Dan was suggesting... Dan or
> others, please chime in with more options/solutions.
>
> 1.) We add some product code/lucene listener to detect whether we have old
> versions of geode and, if so, do not write to lucene on the newly updated
> node until all versions are up to date.
>
> 2.) We document it and provide instructions (and a way) to pause lucene
> indexing before someone attempts a rolling upgrade.
>
> I'd prefer option 1 or some other robust solution, because I think option 2
> has many possible issues.
>
> -Jason
Re: Lucene upgrade
He tried to upgrade the lucene version from the current 6.6.4 to 8.2. There are some challenges. One challenge is that the codec changed, which means the index format also changed.

That's why we did not implement it.

If he resolves the coding challenges, then rolling upgrade will probably need option-2 to work around it.

Regards
Gester

On Wed, Nov 6, 2019 at 11:47 AM Jacob Barrett wrote:

> What about “versioning” the region that backs the indexes? Old servers with
> the old lucene would continue to read/write the old region. New servers
> would start re-indexing with the new version. Given the async nature of the
> indexing, would the mismatch in indexing for some period of time have an
> impact?
>
> Not an ideal solution, but it’s something.
>
> In my previous life we just deleted the indexes and rebuilt them on upgrade,
> but that was specific to our application.
>
> -Jake
Re: IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs
Thanks Jason! I reviewed your PR and added a comment/question.

On Wed, Nov 6, 2019 at 10:59 AM Jason Huynh wrote:

> I'm working on a fix and have a PR up for another hang in the same test
> that I think fixes this issue.
>
> https://github.com/apache/geode/pull/4255
Re: Lucene upgrade
Jake - from my understanding, the implementation detail of geode-lucene is that we use a partitioned region as a "file system" for the lucene files. As new servers are rolled, the issue is that the new servers have the new codec. As puts occur on the user's data region, the async listeners are processing on new and old servers alike. If a new server writes using the new codec, the file is written into the partitioned region, but if an old server with the old codec needs to read that file, it will blow up because it doesn't know about the new codec.

Option 1 is to not have the new servers process/write if they detect members on older geode versions (pre-codec change).
Option 2 is similar but requires users to pause the aeq/lucene listeners.

Deleting the indexes and recreating them can be quite expensive, mostly due to tombstone creation when creating a new lucene index, but could be considered option 3. It would also probably require https://issues.apache.org/jira/browse/GEODE-3924 to be completed.

Gester - I may be wrong, but I think option 1 is still doable. We just need to not write using the new codec until after all servers are upgraded.

There was also some upgrade challenge with scoring from what I remember, but that's a different topic...

On Wed, Nov 6, 2019 at 1:00 PM Xiaojian Zhou wrote:

> He tried to upgrade the lucene version from the current 6.6.4 to 8.2. There
> are some challenges. One challenge is that the codec changed, which means
> the index format also changed.
>
> That's why we did not implement it.
>
> If he resolves the coding challenges, then rolling upgrade will probably
> need option-2 to work around it.
>
> Regards
> Gester
Re: Lucene upgrade
> 1.) We add some product code/lucene listener to detect whether we have old
> versions of geode and, if so, do not write to lucene on the newly updated
> node until all versions are up to date.

Elaborating on this option a little more, this might be as simple as something like the below at the beginning of LuceneEventListener.process. Maybe there is a better way to cache/check whether there are old members. The danger with this approach is that the queues will grow until the upgrade is complete. But maybe that is the only way to successfully do a rolling upgrade with lucene indexes.

    // Skip processing while any member still runs a pre-1.11.0 (old codec) version.
    boolean hasOldMember = cache.getMembers().stream()
        .map(InternalDistributedMember.class::cast)
        .map(InternalDistributedMember::getVersionObject)
        .anyMatch(version -> version.compareTo(Version.GEODE_1_11_0) < 0);

    if (hasOldMember) {
      // An unprocessed batch stays on the AEQ and is redelivered later,
      // which is why the queues grow during the upgrade.
      return false;
    }

On Wed, Nov 6, 2019 at 2:16 PM Jason Huynh wrote:

> Jake - from my understanding, the implementation detail of geode-lucene is
> that we use a partitioned region as a "file system" for the lucene files.
> As new servers are rolled, the issue is that the new servers have the new
> codec. As puts occur on the user's data region, the async listeners are
> processing on new and old servers alike. If a new server writes using the
> new codec, the file is written into the partitioned region, but if an old
> server with the old codec needs to read that file, it will blow up because
> it doesn't know about the new codec.
>
> Option 1 is to not have the new servers process/write if they detect
> members on older geode versions (pre-codec change).
> Option 2 is similar but requires users to pause the aeq/lucene listeners.
>
> Deleting the indexes and recreating them can be quite expensive, mostly due
> to tombstone creation when creating a new lucene index, but could be
> considered option 3. It would also probably require
> https://issues.apache.org/jira/browse/GEODE-3924 to be completed.
>
> Gester - I may be wrong, but I think option 1 is still doable. We just need
> to not write using the new codec until after all servers are upgraded.
>
> There was also some upgrade challenge with scoring from what I remember,
> but that's a different topic...
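For context on why returning false makes the queues grow: Geode's public AsyncEventListener contract treats a false return from processEvents as "batch not processed", so the batch stays queued and is redelivered. A sketch of where such a check could sit, using a standalone listener for illustration (the real LuceneEventListener is internal and structured differently; hasOldMember is a placeholder for Dan's check above):

    import java.util.List;
    import org.apache.geode.cache.asyncqueue.AsyncEvent;
    import org.apache.geode.cache.asyncqueue.AsyncEventListener;

    public class UpgradeAwareListener implements AsyncEventListener {

      @Override
      public boolean processEvents(List<AsyncEvent> events) {
        if (hasOldMember()) {
          // false = batch not processed; it remains on the queue and is
          // redelivered later, which is why queues grow during the upgrade.
          return false;
        }
        // ... normal lucene index writes would happen here ...
        return true;
      }

      // Placeholder for the version check from Dan's snippet above.
      private boolean hasOldMember() {
        return false;
      }

      @Override
      public void close() {}
    }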
Re: Lucene upgrade
> On Nov 6, 2019, at 2:16 PM, Jason Huynh wrote:
>
> Jake - from my understanding, the implementation detail of geode-lucene is
> that we use a partitioned region as a "file system" for the lucene files.

Yeah, I didn't explain well. I meant to say literally create a new region for the new version of lucene and effectively start over. Yes this is expensive, but it's also functional. So new members would create region `lucene-whatever-v8` and start over there. Then when all nodes are upgraded, the old `lucene-whatever` region could be deleted.

Just tossing out alternatives to what's already been posed.

-Jake
Re: Lucene upgrade
Dan - LGTM, check it in! ;-) (kidding of course)

Jake - there is a side effect to this in that the user would have to re-import all their data into the user-defined region too. Client apps would also have to know which of the regions to put into... also, I may be misunderstanding this suggestion completely. In either case, I'll support whoever implements the changes :-P

On Wed, Nov 6, 2019 at 2:53 PM Jacob Barrett wrote:

> Yeah, I didn't explain well. I meant to say literally create a new region
> for the new version of lucene and effectively start over. Yes this is
> expensive, but it's also functional. So new members would create region
> `lucene-whatever-v8` and start over there. Then when all nodes are
> upgraded, the old `lucene-whatever` region could be deleted.
>
> Just tossing out alternatives to what's already been posed.
>
> -Jake
Re: Lucene upgrade
> On Nov 6, 2019, at 3:36 PM, Jason Huynh wrote:
>
> Jake - there is a side effect to this in that the user would have to
> re-import all their data into the user-defined region too. Client apps
> would also have to know which of the regions to put into... also, I may be
> misunderstanding this suggestion completely. In either case, I'll support
> whoever implements the changes :-P

Ah… there isn't a way to re-index the existing data. Eh… just a thought.

-Jake
Re: bug fix needed for release/1.11.0
+1 to cherry-picking the fix.

The sha hasn't made it to benchmarks yet due to an issue with CI losing resource refs that were needed to keep it moving through the pipeline. The next commit is still about an hour away from triggering benchmarks.

In my manual benchmarking of this change, I found that it resolved the issue with SSL and passed the benchmarks. Obviously we still need to confirm that it works through the main pipeline, but I feel confident that it will pass the benchmark job.

Thanks,
Helena Bales (they/them)

On Wed, Nov 6, 2019 at 9:28 AM Mark Hanson wrote:

> Any other votes? I have 2 people in favor.
>
> Voting will close at noon.
>
> Thanks,
> Mark
Re: Lucene upgrade
In my memory, re-creating the region and index is usually expensive, and customers are reluctant to do it.

We do have offline reindex scripts or steps (written by Barry?). If that could be an option, they can try that offline tool.

I saw from Mario's email that he said: "I didn't find a way to make Lucene write indexes in the older format. Newer versions only support reading old-format indexes, via the lucene-backward-codecs module."

That's why I think option-1 is not feasible.

Option-2 will cause the queue to fill up. But customers will usually hold off, quiesce, or reduce their business throughput while doing a rolling upgrade. I wonder if that's a reasonable assumption.

Overall, after comparing all three options, I still think option-2 is the best bet.

Regards
Gester

On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett wrote:

> Ah… there isn't a way to re-index the existing data. Eh… just a thought.
>
> -Jake
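If option-2 is pursued, "pause lucene indexing" could amount to pausing the gateway senders that back the lucene AEQs for the duration of the roll, then resuming them once every member is current. A rough sketch using the public GatewaySender pause/resume API; the "AsyncEventQueue_" id prefix is an internal implementation detail and an assumption here, not supported API:

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.wan.GatewaySender;

    final class LuceneQueuePauser {

      // Pausing stops dispatch to the lucene listeners while puts keep queueing,
      // so the queues grow for the duration of the rolling upgrade.
      static void pauseLuceneSenders(Cache cache) {
        for (GatewaySender sender : cache.getGatewaySenders()) {
          // ASSUMPTION: lucene AEQs are backed by senders whose ids carry
          // the internal "AsyncEventQueue_" prefix.
          if (sender.getId().startsWith("AsyncEventQueue_")) {
            sender.pause();
          }
        }
      }

      // Resume once all members are on the new version.
      static void resumeLuceneSenders(Cache cache) {
        for (GatewaySender sender : cache.getGatewaySenders()) {
          if (sender.getId().startsWith("AsyncEventQueue_")) {
            sender.resume();
          }
        }
      }
    }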