Hi Mario,

I made a PR against your branch with some of the changes I had to make to get
past the IndexFormatTooNewException. Summary: repo creation, even if no
writes occur, appears to create some metadata that the old node attempts to
read and then blows up on.

The PR against your branch just prevents the repo from being constructed
until all old members are upgraded.
This requires changing the tests so they no longer try to validate using
queries (since we prevent draining and repo creation, the query will just
wait).
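
To illustrate the idea of holding back repo construction, here is a minimal
sketch. It is not the code from the PR: DeferredRepository, getIfAllowed and
the allMembersUpgraded check are hypothetical names, and the real membership
check would have to come from the cluster.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch only; none of these names are Geode APIs.
final class DeferredRepository<R> {
  private final BooleanSupplier allMembersUpgraded; // assumed cluster version check
  private final Supplier<R> factory;                // builds the real repository
  private R repository;                             // stays null until creation is allowed

  DeferredRepository(BooleanSupplier allMembersUpgraded, Supplier<R> factory) {
    this.allMembersUpgraded = allMembersUpgraded;
    this.factory = factory;
  }

  // Returns the repository, or null while any old member is still present.
  // The factory is never invoked before the check passes, so nothing is
  // written to disk that an old member could try to read.
  synchronized R getIfAllowed() {
    if (repository == null && allMembersUpgraded.getAsBoolean()) {
      repository = factory.get();
    }
    return repository;
  }
}
```

A caller that gets null back has to wait or retry, which is why a query
issued in the middle of the roll just waits.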

The reason you were probably seeing unsuccessful dispatches is that we more
or less intended that with the oldMember check. In between the server rolls,
the test was trying to verify, but because not all servers had been
upgraded, the LuceneEventListener wasn't allowing the queue to drain on
the new member.
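
As a rough sketch of that behavior (hypothetical names, not the actual
LuceneEventListener code): the listener refuses a batch while any old member
is present, and the batch stays queued to be retried later. The sketch
assumes that returning false is how the listener signals "not processed".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch only; GatedListener is not a Geode class.
final class GatedListener {
  private final BooleanSupplier oldMembersPresent; // assumed cluster version check
  private final List<String> indexed = new ArrayList<>();

  GatedListener(BooleanSupplier oldMembersPresent) {
    this.oldMembersPresent = oldMembersPresent;
  }

  // Returns false while any old member remains, leaving the batch for the
  // caller to retry; only after the roll completes are the events indexed.
  boolean process(List<String> batch) {
    if (oldMembersPresent.getAsBoolean()) {
      return false;
    }
    indexed.addAll(batch);
    return true;
  }

  List<String> indexed() {
    return indexed;
  }
}
```

Each refused batch would show up as an unsuccessful dispatch even though no
event is lost.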

I am not sure whether the changes I added are acceptable or not. If this
ends up working, maybe we can discuss it on the dev list.

There will probably be other gotchas along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo <mario.k...@est.tech> wrote:

> Hi Jason,
>
> I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:
>
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not 
> supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be 
> between 4 and 6)
>
> It looks like the fix is not good.
>
> What I see (from
> *RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java*)
> is that when the *locator* is upgraded, it is shut down and restarted on the
> newer version. The problem is that *server2* becomes the lead and cannot
> read the Lucene index from the newer version (the Lucene index format
> changed between versions 6 and 7).
>
> Another problem appears after the rolling upgrade of *locator* and *server1*,
> when verifying the region size on the VMs. For example:
>
>     expectedRegionSize += 5;
>     putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName,
>         expectedRegionSize, 5, 15, server2, server3);
>
> First it checks whether the region has the expected size on the VMs, and
> that check passes (it has 15 entries). The problem occurs while executing
> verifyLuceneQueryResults: VM1 (server2) has 13 entries and the assertion
> fails. From the logs it can be seen that two batches are unsuccessfully
> dispatched:
>
>     [vm0] [warn 2019/12/06 08:31:39.956 CET <Event Processor for
>     GatewaySender_AsyncEventQueue_index#_aRegion_0> tid=0x42] During normal
>     processing, unsuccessfully dispatched 1 events (batch #0)
>
>     [vm0] [warn 2019/12/06 08:31:40.103 CET <Event Processor for
>     GatewaySender_AsyncEventQueue_index#_aRegion_2> tid=0x46] During normal
>     processing, unsuccessfully dispatched 1 events (batch #0)
>
> VM0 (server1) and VM2 (server3) each have 14 entries; one event is
> unsuccessfully dispatched.
>
> I don't know why some events are successfully dispatched and some are not.
> Do you have any idea?
>
> BR,
> Mario
>
>
> ------------------------------
> *From:* Jason Huynh <jhu...@pivotal.io>
> *Sent:* December 2, 2019, 18:32
> *To:* geode <dev@geode.apache.org>
> *Subject:* Re: Odg: Lucene upgrade
>
> Hi Mario,
>
> Sorry, I reread the original email and see that the exception points to a
> different problem. I think your fix addresses an old version seeing an
> unknown new Lucene format, which looks good. The following exception looks
> like the new Lucene library not being able to read the older files
> (just a guess from the message)...
>
> Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
> version is not supported (resource
> BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
> 9). This version of Lucene only supports indexes created with release
> 6.0 and later.
>
> The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
> incorrect (stating needs to be release 6.0 and later) or if it requires an
> intermediate upgrade between 6.6.2 -> 7.x -> 8.
>
>
>
>
>
> On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo <mario.k...@est.tech> wrote:
>
> >
> > I started with the implementation of Option-1.
> > As I understood it, the idea is to block all puts (put them in the queue)
> > until all members are upgraded. After that it will process all queued
> > events.
> >
> > I tried Dan's proposal to check at the start of
> > LuceneEventListener.process() whether all members are upgraded, and also
> > changed the test to verify Lucene indexes only after all members are
> > upgraded, but got the same error about incompatibilities between Lucene
> > versions.
> > Changes are visible at https://github.com/apache/geode/pull/4198.
> >
> > Please add comments and suggestions.
> >
> > BR,
> > Mario
> >
> >
> > ________________________________
> > From: Xiaojian Zhou <gz...@pivotal.io>
> > Sent: November 7, 2019, 18:27
> > To: geode <dev@geode.apache.org>
> > Subject: Re: Lucene upgrade
> >
> > Oh, I misunderstood option-1 and option-2. What I vote for is Jason's
> > option-1.
> >
> > On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh <jhu...@pivotal.io> wrote:
> >
> > > Gester, I don't think we need to write in the old format; we just need
> > > the new format not to be written while old members can potentially read
> > > the Lucene files. Option 1 can be very similar to Dan's snippet of code.
> > >
> > > I think Option 2 is going to leave a lot of people unhappy when they get
> > > stuck with what Mario is experiencing right now and all we can say is
> > > "you should have read the doc". That's not to say Option 2 isn't valid,
> > > and it's definitely the least amount of work to do, but I still vote
> > > option 1.
> > >
> > > On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
> > >
> > > > Re-creating the region and index is usually expensive, and customers
> > > > are reluctant to do it, as far as I remember.
> > > >
> > > > We do have offline reindex scripts or steps (written by Barry?). If
> > > > that could be an option, they can try that offline tool.
> > > >
> > > > I saw in Mario's email that he said: "I didn't found a way to write
> > > > lucene in older format. They only support reading old format indexes
> > > > with newer version by using lucene-backward-codec."
> > > >
> > > > That's why I think option-1 is not feasible.
> > > >
> > > > Option-2 will cause the queue to fill up. But customers will usually
> > > > pause, quiet down, or reduce their business throughput when doing a
> > > > rolling upgrade. I wonder if that's a reasonable assumption.
> > > >
> > > > Overall, after comparing all 3 options, I still think option-2 is
> > > > the best bet.
> > > >
> > > > Regards
> > > > Gester
> > > >
> > > >
> > > > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jbarr...@pivotal.io> wrote:
> > > >
> > > > >
> > > > >
> > > > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jhu...@pivotal.io> wrote:
> > > > > >
> > > > > > Jake - there is a side effect to this in that the user would have
> > > > > > to reimport all their data into the user-defined region too.
> > > > > > Client apps would also have to know which of the regions to put
> > > > > > into. Also, I may be misunderstanding this suggestion completely.
> > > > > > In either case, I'll support whoever implements the changes :-P
> > > > >
> > > > > Ah… there isn’t a way to re-index the existing data. Eh… just a
> > > > > thought.
> > > > >
> > > > > -Jake
> > > > >
> > > > >
> > > >
> > >
> >
>
