[Proposal] Adding Micrometer to Apache Geode

2019-01-15 Thread Mark Hanson
Hello All,

I would like to propose that we incorporate Micrometer into Geode to allow us 
to collect statistics and make them more easily available to external services 
should someone choose to implement support for that.

In some basic testing, hooking this in does not appear to have a significant 
impact on performance. I think that in the long run it will really improve the 
visibility of the system's stats in real time. This will also allow some 
extensibility for people who want to hook into their tracking infrastructure, 
such as New Relic or Datadog, though we are not putting that in ourselves.
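For the curious, here is a rough sketch of what publishing a stat through Micrometer looks like. The registry type, meter name, and description below are illustrative assumptions, not a proposed Geode naming scheme.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class MicrometerSketch {
    // Record `times` gets against a hypothetical counter and return its value.
    static long recordGets(MeterRegistry registry, int times) {
        Counter gets = Counter.builder("geode.cache.gets")
                .description("Number of gets performed on the cache")
                .register(registry);
        for (int i = 0; i < times; i++) {
            gets.increment();
        }
        return (long) gets.count();
    }

    public static void main(String[] args) {
        // In-memory registry; a deployment could swap in a Prometheus, JMX,
        // or other backend registry without changing the instrumentation code.
        System.out.println(recordGets(new SimpleMeterRegistry(), 1)); // prints 1
    }
}
```

The key point is that the instrumentation code only talks to the `MeterRegistry` abstraction; which external service (if any) receives the metrics is decided by the registry implementation plugged in at startup.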


What do people think?

Thanks,
Mark




Re: [Proposal] Adding Micrometer to Apache Geode

2019-01-15 Thread Mark Hanson
Yes, deprecating stats and VSD would be the direction we are heading…


> On Jan 15, 2019, at 9:43 AM, Jacob Barrett  wrote:
> 
> I am good with this proposal as long as it includes the deprecation of all 
> the current stats APIs and VSD such that Micrometer is the only go forward 
> stats definition and collection API in 2.0.
> 
>> On Jan 15, 2019, at 9:37 AM, Mark Hanson  wrote:
>> 
>> Hello All,
>> 
>> I would like to propose that we incorporate Micrometer into Geode to allow 
>> us to collect statistics and make them more easily available to external 
>> services should someone choose to implement support for that.
>> 
>> In some basic testing, it does not appear to have a significant impact on 
>> performance to hook this in. I think that in the long run it will really 
>> improve the visibility of the system's stats in real time. This will also 
>> allow some extensibility for people who want to hook into their tracking 
>> infrastructure, such as New Relic or Datadog, though we are not putting 
>> that in.
>> 
>> 
>> What do people think?
>> 
>> Thanks,
>> Mark
>> 
>> 
> 



Re: [Proposal] Adding Micrometer to Apache Geode

2019-01-15 Thread Mark Hanson
Yes, we will be working towards tagged dimensions and JMX parity as well.
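For anyone unfamiliar with Micrometer's dimensional model, the sketch below shows how tags turn a single metric name into multiple queryable series (the metric and tag names are illustrative). Micrometer also ships a JmxMeterRegistry, which is one possible route toward JMX parity.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class TaggedMetricsSketch {
    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();

        // The same metric name with different tag values yields distinct
        // series, so a backend can aggregate or filter by region.
        Counter orders = registry.counter("geode.region.puts", "region", "orders");
        Counter customers = registry.counter("geode.region.puts", "region", "customers");

        orders.increment();
        orders.increment();
        customers.increment();

        System.out.println((long) orders.count());    // prints 2
        System.out.println((long) customers.count()); // prints 1
    }
}
```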



> On Jan 15, 2019, at 10:06 AM, Udo Kohlmeyer  wrote:
> 
> I agree with Jacob here. Great to see such great strides forward
> 
> +1 deprecating old Stats APIs
> 
> It would be good to see the new Micrometer stats have a logical grouping, 
> that makes it easier for users to search for metrics.
> 
> Does this mean that Geode will now support fully tagged/dimension metrics? 
> What about JMX?
> 
> --Udo
> 
> On 1/15/19 09:43, Jacob Barrett wrote:
>> I am good with this proposal as long as it includes the deprecation of all 
>> the current stats APIs and VSD such that Micrometer is the only go forward 
>> stats definition and collection API in 2.0.
>> 
>>> On Jan 15, 2019, at 9:37 AM, Mark Hanson  wrote:
>>> 
>>> Hello All,
>>> 
>>> I would like to propose that we incorporate Micrometer into Geode to allow 
>>> us to collect statistics and make them more easily available to external 
>>> services should someone choose to implement support for that.
>>> 
>>> In some basic testing, it does not appear to have a significant impact on 
>>> performance to hook this in. I think that in the long run it will really 
>>> improve the visibility of the system's stats in real time. This will also 
>>> allow some extensibility for people who want to hook into their tracking 
>>> infrastructure, such as New Relic or Datadog, though we are not putting 
>>> that in.
>>> 
>>> 
>>> What do people think?
>>> 
>>> Thanks,
>>> Mark
>>> 
>>> 



Re: Remove mavenLocal from Geode gradle files

2019-05-08 Thread Mark Hanson
Hi Patrick,

If our build should not use a local .m2 cache, then we should remove 
mavenLocal from the build to avoid these possible friction points. I, for one, 
like the idea of reducing the opportunity for problems in the build.

If there is a meaningful benefit to having mavenLocal, then we should see if 
we can come up with something that reduces the likelihood of problems.


Thanks,
Mark
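For context, the change under discussion amounts to roughly this (a sketch of a typical Gradle repositories block, not Geode's actual build files):

```groovy
repositories {
    // mavenLocal() makes Gradle consult ~/.m2/repository first; stale or
    // partially installed artifacts there can cause hard-to-diagnose failures.
    // mavenLocal()   // the entry this thread proposes removing
    mavenCentral()
}
```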

> On May 8, 2019, at 12:40 PM, Patrick Rhomberg  wrote:
> 
> It is strange to me that your build is *only* looking in Maven-local.
> You're not building with --offline, are you?  Does running with
> --refresh-dependencies resolve this issue when you have it?
> 
> Is anything fiddling with your ~/.m2/repository without updating the
> corresponding maven xml entries?  It seems to me like your local Maven xml
> records believes it should have an artifact, but then the expected file
> isn't present on disk.
> 
> I don't think the forum thread you found is still relevant.  That thread
> points to GRADLE-2709, which has been fixed since Gradle 1.9.
> 
> All that said, if you don't want to use your local ~/.m2, that's your
> business.  If that folder doesn't exist, the presence of mavenLocal() 
> shouldn't
> have any effect.  But some of us are making use of it.
> 
> On Wed, May 8, 2019 at 12:24 PM Robert Houghton 
> wrote:
> 
>> Jake, my understanding is that benchmarks does not need geode to be able to
>> pull resources from Maven local. Benchmarks is an external wrapper that
>> executes geode commands, right?
>> 
>> On Wed, May 8, 2019, 12:12 Jacob Barrett  wrote:
>> 
>>> Maven local is necessary for some of our other build processes like
>>> benchmarks.
>>> 
>>> Is there no other way to correct this issue. I have never run into this
>>> issue.
>>> 
>>> -jake
>>> 
>>> 
 On May 8, 2019, at 10:13 AM, Kirk Lund  wrote:
 
 I'd like to remove mavenLocal from the Geode gradle files.
 
 GEODE-6753: Use of mavenLocal in gradle may cause build to fail with
 missing tests dependencies
 https://issues.apache.org/jira/browse/GEODE-6753
 
 Thanks,
 Kirk
>>> 
>> 



Re: [DISCUSS] changing geode 32-bit counter stats to 64-bit

2019-06-10 Thread Mark Hanson
+1 for Jake’s approach.

Jake’s approach seems to address the only concern that I know of, which is 
backward compatibility.
Thanks,
Mark

> On Jun 10, 2019, at 11:20 AM, Robert Houghton  wrote:
> 
> +1 to @Udo Kohlmeyer  comment. If the memory has
> been used, might as well allow us to query the values
> 
> On Fri, Jun 7, 2019 at 2:18 PM Udo Kohlmeyer  wrote:
> 
>> +1, if the LongAdders have already been added and the additional memory usage
>> has already been dealt with, then adding the accessors does not really
>> make a difference anymore.
>> 
>> On 6/7/19 13:47, Jacob Barrett wrote:
>>> I like this!
>>> 
>>> I’d go ahead and change all the usage of the int methods to the long
>> methods. I’d deprecate the int methods to make it very clear.
>>> 
>>> If some consumer is using the int methods they will still work with the
>> same rollover issues but perhaps with the deprecated warning they will
>> update their code. Without the warning they may never know.
>>> 
>>> -jake
>>> 
>>> 
 On Jun 7, 2019, at 1:32 PM, Darrel Schneider 
>> wrote:
 
 We have had a couple of tickets that have problems with 32-bit counters
 changing too fast and causing them to be hard to understand when they
>> wrap
 around (see GEODE-6425 and GEODE-6834). We have also had problems with
>> some
 stats being broken because they were changing the 32-bit one when they
 should have been changing the 64-bit one (see GEODE-6776).
 The current implementation has one array of values for the 32-bit stats
>> and
 another array of values for the 64-bit stats. We use indexes into these
 arrays when changing a stat. Given an int "id" used to index these
>> arrays,
 we can not tell if we should be indexing the 32-bit array or 64-bit
>> array.
 The first 32-bit stat for a type will have an id of 0 and the first
>> 64-bit
 stat on that type will also have an id of 0. But our current
>> implementation
 has the same type of value in both arrays (LongAdder
 see: StripedStatisticsImpl fields intAdders and longAdders). So if we
 simply change our implementation to have a single array, then the ids
>> will
 no longer be overloaded.
 
 Changing this internal implementation also improves backward
>> compatibility.
 Currently when we change one of our counters from 32-bit to 64-bit it is
 possible we would break existing applications that are using the
>> Statistics
 interface. It has methods on it that allow stats to be read given an id:
 getInt(int id) and getLong(int id). If they are currently reading a
>> 32-bit
 stat using getInt(id) and we change that stat to be 64-bit (like we did
>> in
 GEODE-6425) then they will now be reading the wrong stat when they call
 getInt(id). If they used getInt(String) or getInt(StatisticDescriptor)
>> they
 will be okay. But if we make this simple change to have 32-bit and
>> 64-bit
 stats stored in the same array then getInt(id) will do the right thing
>> when
 we change a stat from 32-bit to 64-bit.
 
 Does anyone see a problem with making this change?
 
 After we do it I think we should change all of our counters to 64-bit
>> since
 they are always stored in a LongAdder anyway and we should deprecate the
 methods that support 32-bit stats.
>> 
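A minimal sketch of the single-array idea might look like the following. The class and method names are illustrative, not Geode's actual internals; the point is that every stat, whether declared as 32-bit or 64-bit, is backed by one LongAdder slot, so an id unambiguously names one stat.

```java
import java.util.concurrent.atomic.LongAdder;

class SingleArrayStats {
    // One backing array for all stats of a type: ids are no longer overloaded
    // between a 32-bit array and a 64-bit array.
    private final LongAdder[] adders;

    SingleArrayStats(int statCount) {
        adders = new LongAdder[statCount];
        for (int i = 0; i < statCount; i++) {
            adders[i] = new LongAdder();
        }
    }

    void incLong(int id, long delta) {
        adders[id].add(delta);
    }

    long getLong(int id) {
        return adders[id].sum();
    }

    /**
     * Narrowing read kept for compatibility; still subject to rollover for
     * values past Integer.MAX_VALUE, hence the deprecation suggested above.
     */
    @Deprecated
    int getInt(int id) {
        return (int) adders[id].sum();
    }
}
```

With this shape, `getInt(id)` and `getLong(id)` read the same slot, which is what makes changing a stat from 32-bit to 64-bit non-breaking for id-based readers.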



Proposal: For PR reviews and change requests can we have a 7 day turn around on re-reviews?

2019-07-09 Thread Mark Hanson
Hi All,

TL;DR

Can we have a norm (preferred, but not required) of providing feedback within 
seven days of the last check-in to a PR?

Long version

I have just spent a bit of time reviewing PRs that have been open for a while 
and sent some emails to reviewers of the ones that have been open the longest. 
In my humble opinion, it would be very nice if we could close out some of the 
older PRs where the requester has made changes but reviewers have not 
re-reviewed. An ideal norm would seem to be 7 days. One might notice that I 
have a PR that I requested a change on, that I have not provided feedback on, 
so I am in the same boat...

Thoughts?

Thanks,
Mark

Re: Proposal: For PR reviews and change requests can we have a 7 day turn around on re-reviews?

2019-07-09 Thread Mark Hanson
In GitHub there is a request re-review option. I just learned more about that 
today. I think that people should probably be using that option to interact 
with reviewers. I do like the assignee idea. I worry that things might pile up 
on certain people, but that is already kind of happening because certain 
people are doing more reviews.

Thanks,
Mark


> On Jul 9, 2019, at 11:09 AM, Benjamin Ross  wrote:
> 
> +1
> 
> I think having an assignee would help set better expectations between
> committer and reviewer.
> 
> On Tue, Jul 9, 2019 at 11:05 AM Dan Smith  wrote:
> 
>> +1
>> 
>> What do you think about assigning someone to each PR to make sure it gets
>> through the process? We don't currently seem to be using github's
>> "assignee" field. Committers can make themselves the assignee, but for
>> contributors we could assign a committer who will make sure the PR gets
>> reviewed and merged in a timely fashion.
>> 
>> -Dan
>> 
>> On Tue, Jul 9, 2019 at 10:34 AM Mark Hanson  wrote:
>> 
>>> Hi All,
>>> 
>>> TL;DR
>>> 
>>> Can we have a norm (preferred, but not required) of providing feedback
>>> within seven days of the last checkin to a PR?
>>> 
>>> Long version
>>> 
>>> I have just spent a bit of time reviewing PRs that have been open for a
>>> while and sent some emails to reviewers of the ones that are open the
>>> longest. In my humble opinion, it would be very nice if we could close
>> out
>>> some of the older PRs where the requester has made changes, but
>>> reviewers have not re-reviewed. An ideal norm would seem to be 7 days.
>> One
>>> might notice that I have a PR that I requested a change on, that I have
>> not
>>> provided feedback on, so I am in the same boat...
>>> 
>>> Thoughts?
>>> 
>>> Thanks,
>>> Mark
>> 



[Proposal] Refactor the Cache and Region perf stats structure.

2019-07-10 Thread Mark Hanson
Hi All,

As many of you may know, our structure for our perf stats is not great. I would 
like to propose we refactor the code to have the following inheritance model, 
which Kirk and I came up with.

It is my belief that fixing this will allow future features to be implemented 
in a much less painful way.

Thoughts?

Thanks,
Mark


Re: [Proposal] Refactor the Cache and Region perf stats structure.

2019-07-10 Thread Mark Hanson
PartitionRegionStatsImpl can contain one to many RegionStats

> On Jul 10, 2019, at 4:53 PM, Dan Smith  wrote:
> 
> Seems reasonable. I'm guessing that CachePerfImpl contains many RegionStats. 
> Does PartitionRegionStatsImpl just contain a single RegionStats?
> 
> On Wed, Jul 10, 2019 at 4:49 PM Mark Hanson  <mailto:mhan...@pivotal.io>> wrote:
> Hi All,
> 
> As many of you may know our structure for our perf stats is not great. I 
> would like to propose we refactor the code to have the following inheritance 
> model, which Kirk and I came up with.
> 
> It is my belief that fixing this will allow future features to be implemented 
> in a much less painful way.
> 
> Thoughts?
> 
> Thanks,
> Mark
> 



Re: [Proposal] Refactor the Cache and Region perf stats structure.

2019-07-11 Thread Mark Hanson
Correct.

> On Jul 11, 2019, at 9:23 AM, Darrel Schneider  wrote:
> 
> Why would a PartitionedRegionStatsImpl contain more than one RegionStats?
> Are these representing the local buckets?
> 
> On Wed, Jul 10, 2019 at 4:57 PM Mark Hanson  wrote:
> 
>> PartitionRegionStatsImpl can contain one to many RegionStats
>> 
>>> On Jul 10, 2019, at 4:53 PM, Dan Smith  wrote:
>>> 
>>> Seems reasonable. I'm guessing that CachePerfImpl contains many
>> RegionStats. Does PartitionRegionStatsImpl just contain a single
>> RegionStats?
>>> 
>>> On Wed, Jul 10, 2019 at 4:49 PM Mark Hanson > mhan...@pivotal.io>> wrote:
>>> Hi All,
>>> 
>>> As many of you may know our structure for our perf stats is not great. I
>> would like to propose we refactor the code to have the following
>> inheritance model, which Kirk and I came up with.
>>> 
>>> It is my belief that fixing this will allow future features to be
>> implemented in a much less painful way.
>>> 
>>> Thoughts?
>>> 
>>> Thanks,
>>> Mark
>>> 
>> 
>> 



Re: [Proposal] Refactor the Cache and Region perf stats structure.

2019-07-11 Thread Mark Hanson
It depends, to be honest. This is just a plan. I would hope to roll them up, but 
I can see a case where you have buckets on different servers, which would seem 
to necessitate that.

> On Jul 11, 2019, at 9:26 AM, Jacob Barrett  wrote:
> 
> There isn’t currently a partition stat instance per bucket. Are you saying 
> you’re making that a thing now?
> 
>> On Jul 11, 2019, at 9:24 AM, Mark Hanson  wrote:
>> 
>> Correct.
>> 
>>> On Jul 11, 2019, at 9:23 AM, Darrel Schneider  wrote:
>>> 
>>> Why would a PartitionedRegionStatsImpl contain more than one RegionStats?
>>> Are these representing the local buckets?
>>> 
>>>> On Wed, Jul 10, 2019 at 4:57 PM Mark Hanson  wrote:
>>>> 
>>>> PartitionRegionStatsImpl can contain one to many RegionStats
>>>> 
>>>>> On Jul 10, 2019, at 4:53 PM, Dan Smith  wrote:
>>>>> 
>>>>> Seems reasonable. I'm guessing that CachePerfImpl contains many
>>>> RegionStats. Does PartitionRegionStatsImpl just contain a single
>>>> RegionStats?
>>>>> 
>>>>> On Wed, Jul 10, 2019 at 4:49 PM Mark Hanson >>> mhan...@pivotal.io>> wrote:
>>>>> Hi All,
>>>>> 
>>>>> As many of you may know our structure for our perf stats is not great. I
>>>> would like to propose we refactor the code to have the following
>>>> inheritance model, which Kirk and I came up with.
>>>>> 
>>>>> It is my belief that fixing this will allow future features to be
>>>> implemented in a much less painful way.
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Thanks,
>>>>> Mark
>>>>> 
>>>> 
>>>> 
>> 



Re: [Proposal] Refactor the Cache and Region perf stats structure.

2019-07-11 Thread Mark Hanson
I accept that point. I would certainly like to minimize the work, and per-bucket 
stats would seem undesirable. I can totally agree to back-burner that until some 
point in the future, at which point we agree to move forward on it. I was just 
anticipating that it would fall out necessarily. That said, let’s consider that 
back-burnered for this proposal.

The main classes I really want to cleanup with this proposal are CachePerfStats 
and RegionPerfStats.

Thanks for the good constructive feedback.

> On Jul 11, 2019, at 9:36 AM, Jacob Barrett  wrote:
> 
> So is the root of this proposal really about expanding our current stats 
> collection to include per-bucket stats. If not I would really separate these 
> two ideas. First focus on refactoring these stats classes to be more 
> manageable and readable. Then propose adding per-bucket stats as a thing 
> separately. If this discussion is really about per-bucket stats then let’s 
> focus the subject on that and not really worry about any internal refactoring 
> that must happen to make it work.
> 
>> On Jul 11, 2019, at 9:29 AM, Mark Hanson  wrote:
>> 
>> It depends to be honest. This is just a plan. I would hope to roll them up, 
>> but I can see a case where you have buckets on different servers, that would 
>> seem to necessitate that.
>> 
>>> On Jul 11, 2019, at 9:26 AM, Jacob Barrett  wrote:
>>> 
>>> There isn’t currently a partition stat instance per bucket. Are you saying 
>>> you’re making that a thing now?
>>> 
>>>> On Jul 11, 2019, at 9:24 AM, Mark Hanson  wrote:
>>>> 
>>>> Correct.
>>>> 
>>>>> On Jul 11, 2019, at 9:23 AM, Darrel Schneider  
>>>>> wrote:
>>>>> 
>>>>> Why would a PartitionedRegionStatsImpl contain more than one RegionStats?
>>>>> Are these representing the local buckets?
>>>>> 
>>>>>> On Wed, Jul 10, 2019 at 4:57 PM Mark Hanson  wrote:
>>>>>> 
>>>>>> PartitionRegionStatsImpl can contain one to many RegionStats
>>>>>> 
>>>>>>> On Jul 10, 2019, at 4:53 PM, Dan Smith  wrote:
>>>>>>> 
>>>>>>> Seems reasonable. I'm guessing that CachePerfImpl contains many
>>>>>> RegionStats. Does PartitionRegionStatsImpl just contain a single
>>>>>> RegionStats?
>>>>>>> 
>>>>>>> On Wed, Jul 10, 2019 at 4:49 PM Mark Hanson >>>>> mhan...@pivotal.io>> wrote:
>>>>>>> Hi All,
>>>>>>> 
>>>>>>> As many of you may know our structure for our perf stats is not great. I
>>>>>> would like to propose we refactor the code to have the following
>>>>>> inheritance model, which Kirk and I came up with.
>>>>>>> 
>>>>>>> It is my belief that fixing this will allow future features to be
>>>>>> implemented in a much less painful way.
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mark
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>> 
> 



Re: [Proposal] Refactor the Cache and Region perf stats structure.

2019-07-11 Thread Mark Hanson
See my comments inline.

> On Jul 11, 2019, at 9:36 AM, Darrel Schneider  wrote:
> 
> Currently we do not make visible per bucket stats. Is the above proposal
> just to change the internal implementation but the stats visible in tools
> like vsd are unchanged?
> 
As with my previous e-mail exchange with Jake, I would like to back-burner per 
bucket stats. If it turns out to be a problem, I will bring it back before the 
group.

My goal is that these are internal changes. I would see it as a problem 
requiring re-evaluation if this were a meaningful breaking change, e.g. one 
that breaks VSD or changes the public API. An important note: fixing a bug in 
a stat is not a meaningful breaking change.

> Currently we have an internal CachePerfStats which the internal
> RegionPerfStats extends. Does your CacheStats replace CachePerfStats and
> your RegionStats replace RegionPerfStats?
> 

You are 100% correct.

> Currently we have an internal PartitionedRegionStats class. Does
> your PartitionedRegionStats replace it?
> 

Yes. I considered that under the “RegionPerfStats” umbrella.

> Are your "*Stats" interfaces and your "*StatsImpl" classes?

Yes.
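Putting the answers above together, the proposed split might look roughly like this sketch. The interface and method names are invented for illustration, not the final design.

```java
import java.util.ArrayList;
import java.util.List;

// "*Stats" are interfaces, "*StatsImpl" are classes, per the answers above.
interface CacheStats {
    void incRegions(int delta);
}

interface RegionStats {
    void incPuts(long delta);
    long getPuts();
}

class RegionStatsImpl implements RegionStats {
    private long puts;

    @Override
    public void incPuts(long delta) {
        puts += delta;
    }

    @Override
    public long getPuts() {
        return puts;
    }
}

// PartitionedRegionStatsImpl becomes just another RegionStats implementation,
// one that can contain one-to-many RegionStats as discussed in the thread.
class PartitionedRegionStatsImpl implements RegionStats {
    private final List<RegionStats> delegates = new ArrayList<>();

    void addDelegate(RegionStats stats) {
        delegates.add(stats);
    }

    @Override
    public void incPuts(long delta) {
        for (RegionStats s : delegates) {
            s.incPuts(delta);
        }
    }

    @Override
    public long getPuts() {
        long total = 0;
        for (RegionStats s : delegates) {
            total += s.getPuts();
        }
        return total;
    }
}
```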

> 
> On Thu, Jul 11, 2019 at 9:29 AM Mark Hanson  wrote:
> 
>> It depends to be honest. This is just a plan. I would hope to roll them
>> up, but I can see a case where you have buckets on different servers, that
>> would seem to necessitate that.
>> 
>>> On Jul 11, 2019, at 9:26 AM, Jacob Barrett  wrote:
>>> 
>>> There isn’t currently a partition stat instance per bucket. Are you
>> saying you’re making that a thing now?
>>> 
>>>> On Jul 11, 2019, at 9:24 AM, Mark Hanson  wrote:
>>>> 
>>>> Correct.
>>>> 
>>>>> On Jul 11, 2019, at 9:23 AM, Darrel Schneider 
>> wrote:
>>>>> 
>>>>> Why would a PartitionedRegionStatsImpl contain more than one
>> RegionStats?
>>>>> Are these representing the local buckets?
>>>>> 
>>>>>> On Wed, Jul 10, 2019 at 4:57 PM Mark Hanson 
>> wrote:
>>>>>> 
>>>>>> PartitionRegionStatsImpl can contain one to many RegionStats
>>>>>> 
>>>>>>> On Jul 10, 2019, at 4:53 PM, Dan Smith  wrote:
>>>>>>> 
>>>>>>> Seems reasonable. I'm guessing that CachePerfImpl contains many
>>>>>> RegionStats. Does PartitionRegionStatsImpl just contain a single
>>>>>> RegionStats?
>>>>>>> 
>>>>>>> On Wed, Jul 10, 2019 at 4:49 PM Mark Hanson > >>>>> mhan...@pivotal.io>> wrote:
>>>>>>> Hi All,
>>>>>>> 
>>>>>>> As many of you may know our structure for our perf stats is not
>> great. I
>>>>>> would like to propose we refactor the code to have the following
>>>>>> inheritance model, which Kirk and I came up with.
>>>>>>> 
>>>>>>> It is my belief that fixing this will allow future features to be
>>>>>> implemented in a much less painful way.
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mark
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>> 
>> 



Re: [Proposal] Refactor the Cache and Region perf stats structure.

2019-07-11 Thread Mark Hanson
My personal thinking was:

Step 1: Fix the object structure this round, and try to keep the data output 
the same. The downside here is that a chunk of stuff from CachePerfStats would 
then be duplicated directly into RegionPerfStats, where it effectively already 
was. This also means that PartitionedRegionStats stops being a standalone 
class; it becomes an implementation of RegionStats (note the name).

Step 2: Add in a Metrics facade to track the existing stats.

Step 3: Move existing stats calls to use the Metrics decorator.

Step 4: Optional, but desirable: remove unnecessary metrics from RegionStats 
(note the name change) that really are CacheStats. This would break backward 
compatibility.

Thanks,
Mark
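A rough sketch of the facade/decorator idea from Steps 2 and 3, using an assumed legacy stats interface and an illustrative Micrometer meter name (neither is the real Geode API):

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Hypothetical stand-in for a legacy stats class such as CachePerfStats.
interface LegacyCacheStats {
    void incGets();
}

// The decorator keeps the legacy stat up to date while also publishing
// through Micrometer, so call sites can migrate incrementally.
class MeteredCacheStats implements LegacyCacheStats {
    private final LegacyCacheStats delegate;
    private final Counter gets;

    MeteredCacheStats(LegacyCacheStats delegate, MeterRegistry registry) {
        this.delegate = delegate;
        this.gets = registry.counter("geode.cache.gets"); // name is illustrative
    }

    @Override
    public void incGets() {
        delegate.incGets(); // legacy stat keeps working unchanged
        gets.increment();   // new metrics pipeline sees the same event
    }
}
```

The appeal of this shape is that Step 3 is mechanical: each existing stats call site is pointed at the decorator, and the legacy output stays byte-for-byte the same until Step 4 deliberately changes it.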


> On Jul 11, 2019, at 10:33 AM, Darrel Schneider  wrote:
> 
> Originally we just had a single instance of CachePerfStats per jvm. So all
> the region related stats were always rolled up onto the single
> CachePerfStats. Later we added RegionPerfStats so that users could see what
> was happening on a per region basis. When RegionPerfStats was added it was
> made to extend CachePerfStats but no new stats were added to it.
> 
> I think we now have some stats that only make sense on a Cache (like
> "regions" and "partitionedRegions") but we also have them on
> RegionPerfStats. I thought these used to always return zero on the region
> but I just looked at the implementation and it looks like they just return
> 1 or 0 on the region (depending on if it is partitioned or not). The help
> text says it will be the number of regions on the cache (so the help is
> correct for CachePerfStats but wrong for RegionPerfStats). I would be okay
> with us dropping the stats that only make sense at the cache level from the
> per region stats.
> 
> Something I could not tell from you diagram is if you are cleaning this up.
> Does CacheStats also have everything that is on RegionStats? Or will it now
> just have the stats that are unique to a cache?
> 
> If you change the internal names (like CachePerfStats -> CacheStats) keep
> in mind that you should use the same type name when calling "createType"
> (in this case "CachePerfStats") for backwards compatibility.
> 
> On Thu, Jul 11, 2019 at 9:45 AM Mark Hanson  wrote:
> 
>> See my comments inline..
>> 
>>> On Jul 11, 2019, at 9:36 AM, Darrel Schneider 
>> wrote:
>>> 
>>> Currently we do not make visible per bucket stats. Is the above proposal
>>> just to change the internal implementation but the stats visible in tools
>>> like vsd are unchanged?
>>> 
>> As with my previous e-mail exchange with Jake, I would like to back burner
>> per bucket stats. If it turns out to be a problem, I will bring it back
>> before the group.
>> 
>> My goal is these are internal changes. I would see it as a problem
>> requiring re-evaluation, if this were a meaningful breaking change. E.g.
>> breaks vsd, changes public API
>> An important note would be that fixing a bug in a stat, is not a
>> meaningful breaking change.
>> 
>>> Currently we have an internal CachePerfStats which the internal
>>> RegionPerfStats extends. Does your CacheStats replace CachePerfStats and
>>> your RegionStats replace RegionPerfStats?
>>> 
>> 
>> You are 100% correct.
>> 
>>> Currently we have an internal PartitionedRegionStats class. Does
>>> your PartitionedRegionStats replace it?
>>> 
>> 
>> Yes. I considered that under the “RegionPerfStats” umbrella.
>> 
>>> Are your "*Stats" interfaces and your "*StatsImpl" classes?
>> 
>> Yes.
>> 
>>> 
>>> On Thu, Jul 11, 2019 at 9:29 AM Mark Hanson  wrote:
>>> 
>>>> It depends to be honest. This is just a plan. I would hope to roll them
>>>> up, but I can see a case where you have buckets on different servers,
>> that
>>>> would seem to necessitate that.
>>>> 
>>>>> On Jul 11, 2019, at 9:26 AM, Jacob Barrett 
>> wrote:
>>>>> 
>>>>> There isn’t currently a partition stat instance per bucket. Are you
>>>> saying you’re making that a thing now?
>>>>> 
>>>>>> On Jul 11, 2019, at 9:24 AM, Mark Hanson  wrote:
>>>>>> 
>>>>>> Correct.
>>>>>> 
>>>>>>> On Jul 11, 2019, at 9:23 AM, Darrel Schneider >> 
>>>> wrote:
>>>>>>> 
>>>>>>> Why would a PartitionedRegionStatsImpl contain more than one
>>>> RegionStats?
>>>>>>> Are these representing the local buckets?

Re: [Proposal] Refactor the Cache and Region perf stats structure.

2019-07-11 Thread Mark Hanson
Hello All,

Since there is no one disagreeing, we are going to start moving forward with 
this.

Thanks,
Mark


> On Jul 11, 2019, at 10:33 AM, Darrel Schneider  wrote:
> 
> Originally we just had a single instance of CachePerfStats per jvm. So all
> the region related stats were always rolled up onto the single
> CachePerfStats. Later we added RegionPerfStats so that users could see what
> was happening on a per region basis. When RegionPerfStats was added it was
> made to extend CachePerfStats but no new stats were added to it.
> 
> I think we now have some stats that only make sense on a Cache (like
> "regions" and "partitionedRegions") but we also have them on
> RegionPerfStats. I thought these used to always return zero on the region
> but I just looked at the implementation and it looks like they just return
> 1 or 0 on the region (depending on if it is partitioned or not). The help
> text says it will be the number of regions on the cache (so the help is
> correct for CachePerfStats but wrong for RegionPerfStats). I would be okay
> with us dropping the stats that only make sense at the cache level from the
> per region stats.
> 
> Something I could not tell from your diagram is if you are cleaning this up.
> Does CacheStats also have everything that is on RegionStats? Or will it now
> just have the stats that are unique to a cache?
> 
> If you change the internal names (like CachePerfStats -> CacheStats) keep
> in mind that you should use the same type name when calling "createType"
> (in this case "CachePerfStats") for backwards compatibility.
> 
> On Thu, Jul 11, 2019 at 9:45 AM Mark Hanson  wrote:
> 
>> See my comments inline..
>> 
>>> On Jul 11, 2019, at 9:36 AM, Darrel Schneider 
>> wrote:
>>> 
>>> Currently we do not make visible per bucket stats. Is the above proposal
>>> just to change the internal implementation but the stats visible in tools
>>> like vsd are unchanged?
>>> 
>> As with my previous e-mail exchange with Jake, I would like to back burner
>> per bucket stats. If it turns out to be a problem, I will bring it back
>> before the group.
>> 
>> My goal is these are internal changes. I would see it as a problem
>> requiring re-evaluation, if this were a meaningful breaking change. E.g.
>> breaks vsd, changes public API
>> An important note would be that fixing a bug in a stat, is not a
>> meaningful breaking change.
>> 
>>> Currently we have an internal CachePerfStats which the internal
>>> RegionPerfStats extends. Does your CacheStats replace CachePerfStats and
>>> your RegionStats replace RegionPerfStats?
>>> 
>> 
>> You are 100% correct.
>> 
>>> Currently we have an internal PartitionedRegionStats class. Does
>>> your PartitionedRegionStats replace it?
>>> 
>> 
>> Yes. I considered that under the “RegionPerfStats” umbrella.
>> 
>>> Are your "*Stats" interfaces and your "*StatsImpl" classes?
>> 
>> Yes.
>> 
>>> 
>>> On Thu, Jul 11, 2019 at 9:29 AM Mark Hanson  wrote:
>>> 
>>>> It depends to be honest. This is just a plan. I would hope to roll them
>>>> up, but I can see a case where you have buckets on different servers,
>> that
>>>> would seem to necessitate that.
>>>> 
>>>>> On Jul 11, 2019, at 9:26 AM, Jacob Barrett 
>> wrote:
>>>>> 
>>>>> There isn’t currently a partition stat instance per bucket. Are you
>>>> saying you’re making that a thing now?
>>>>> 
>>>>>> On Jul 11, 2019, at 9:24 AM, Mark Hanson  wrote:
>>>>>> 
>>>>>> Correct.
>>>>>> 
>>>>>>> On Jul 11, 2019, at 9:23 AM, Darrel Schneider >> 
>>>> wrote:
>>>>>>> 
>>>>>>> Why would a PartitionedRegionStatsImpl contain more than one
>>>> RegionStats?
>>>>>>> Are these representing the local buckets?
>>>>>>> 
>>>>>>>> On Wed, Jul 10, 2019 at 4:57 PM Mark Hanson 
>>>> wrote:
>>>>>>>> 
>>>>>>>> PartitionRegionStatsImpl can contain one to many RegionStats
>>>>>>>> 
>>>>>>>>> On Jul 10, 2019, at 4:53 PM, Dan Smith  wrote:
>>>>>>>>> 
>>>>>>>>> Seems reasonable. I'm guessing that CachePerfImpl contains many
>>>>>>>> RegionStats. Does PartitionRegionStatsImpl just contain a single
>>>>>>>> RegionStats?
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 10, 2019 at 4:49 PM Mark Hanson >>> >>>>>>> mhan...@pivotal.io>> wrote:
>>>>>>>>> Hi All,
>>>>>>>>> 
>>>>>>>>> As many of you may know our structure for our perf stats is not
>>>> great. I
>>>>>>>> would like to propose we refactor the code to have the following
>>>>>>>> inheritance model, which Kirk and I came up with.
>>>>>>>>> 
>>>>>>>>> It is my belief that fixing this will allow future features to be
>>>>>>>> implemented in a much less painful way.
>>>>>>>>> 
>>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Mark
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 



Re: [DISCUSS] RFC 0: Lightweight RFC Process

2019-07-12 Thread Mark Hanson
Thanks for taking the initiative, Dan!

> On Jul 12, 2019, at 12:57 PM, Dan Smith  wrote:
> 
> Following up on this, I took a stab at organizing our old proposals into
> the buckets on the wiki. We now have:
> 
> Under Discussion - Draft and In Discussion proposals
> In Development - proposals under active development
> Active - Proposals that are completely implemented
> Dropped - Proposals that were not approved or development stalled out.
> 
> If I moved your proposal to "Dropped" erroneously, please feel free to move
> it back! I moved things there that did not appear to have been implemented
> or have any recent activity.
> 
> I put a few things in "Unknown State." If you know what state these
> proposals are in, please move them!
> 
> https://cwiki.apache.org/confluence/display/GEODE/Project+Proposals+and+Specifications
> 
> On Wed, Jun 26, 2019 at 11:20 AM Alexander Murmann 
> wrote:
> 
>> Given discussion here and previous discussion on the PR, I consider this
>> proposal approved and updated its state accordingly.
>> 
>> I also incorporated Dan's suggestion of moving deprecated proposals and
>> added a reference to the new process at the top of the Project Proposals
>> and Specifications page
>> <
>> https://cwiki.apache.org/confluence/display/GEODE/Project+Proposals+and+Specifications
>>> 
>> .
>> 
>> Thank you all for you great feedback throughout this process!
>> 
>> On Tue, Jun 25, 2019 at 10:07 AM Dan Smith  wrote:
>> 
 
 Will moving the page around on the wiki result in dead links to the
>> draft
 version?
 
>>> 
>>> No. If you use the share button in the wiki, you get a permanent link to
>>> the page. Even if you just copy the URL from the address bar it doesn't
>>> include the folder the page is in.
>>> 
>>> -Dan
>>> 
>> 
>> 
>> --
>> Alexander J. Murmann
>> (650) 283-1933
>> 



Re: Need reviewers for #3762: prevent creation of unnecessary PartitionedRegion callback events

2019-07-17 Thread Mark Hanson
Hi Kirk,

I pinged Darrel and Jason about it as they reviewed part 1.

Thanks,
Mark

> On Jul 17, 2019, at 1:41 PM, Kirk Lund  wrote:
> 
> We need reviewers for PR #3762 -- no one has reviewed it yet:
> https://github.com/apache/geode/pull/3762
> 
> Looks like it involves optimizing PartitionedRegion callback events to
> prevent creation of events that will just become garbage because there are
> no listeners to receive those events.
> 
> Jira ticket is GEODE-6762:
> BucketRegion ops do not
> always need to call createEventForPR
> 
> 
> Thanks!
> Kirk



Volunteers for PR Review?

2019-08-06 Thread Mark Hanson
Hi All, 

PR GEODE-3632 throw NotAuthorizedException for getAll 
https://github.com/apache/geode/pull/3765 


This PR currently has no reviewers. Could we get a few people to review it?

Thanks,
Mark

Reviewers for PR

2019-08-06 Thread Mark Hanson
Hi All,

Here is another PR that could use reviewers.

GEODE-6748: First part of solution #3854 
https://github.com/apache/geode/pull/3854 


Thanks,
Mark

Re: geode-native ipv6

2019-08-08 Thread Mark Hanson
The latest ACE framework seems to have support, but I don’t know how far 
behind latest we are. I don’t think we test anything in an IPv6 context, so I 
would say no, we don’t officially support it in the client. Given some time, I 
could do some testing.

Thanks,
Mark

> On Aug 8, 2019, at 7:35 AM, Blake Bender  wrote:
> 
> I'm sure someone will chime in with a more definitive answer, but I'm
> pretty certain the answer is no, sorry.
> 
> Thanks,
> 
> Blake
> 
> 
> On Thu, Aug 8, 2019 at 4:28 AM Mario Ivanac  wrote:
> 
>> Hi,
>> 
>> 
>> can you tell me does geode-native client support ipv6?
>> 
>> 
>> BR,
>> 
>> Mario
>> 



Re: Fix for ClassCastException when using Logback for 1.10.0

2019-08-08 Thread Mark Hanson
+1 

I think it is valuable to make life easier for Spring Boot users.
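For anyone following along, the classpath probe Kirk describes below boils down to checking whether a bridging class can be loaded. Here is a rough JDK-only sketch — the class name probed is an illustrative assumption, not necessarily the one Geode's Log4jAgent actually checks:

```java
// Sketch: decide whether a Log4j-specific agent should be enabled by probing
// the classpath for the log4j-to-slf4j bridge. The probed class name is an
// illustrative assumption, not necessarily what Geode checks.
class Log4jProviderCheck {
    static boolean isOnClasspath(String className) {
        try {
            // Load without initializing, so probing has no side effects.
            Class.forName(className, false, Log4jProviderCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // If a bridge reroutes Log4j to SLF4J, casting loggers to
        // org.apache.logging.log4j.core.Logger would throw ClassCastException,
        // so the Log4j-specific agent should be disabled.
        boolean bridgePresent = isOnClasspath("org.apache.logging.slf4j.SLF4JLoggerContextFactory");
        System.out.println(bridgePresent ? "disable Log4jAgent" : "Log4jAgent ok");
    }
}
```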

Thanks,
Mark

> On Aug 8, 2019, at 11:24 AM, Kirk Lund  wrote:
> 
> This is the last logging related fix that I'd like to propose adding to
> 1.10.0
> release branch.
> 
> Spring Boot adds Logback and log4j-to-slf4j to the classpath. This
> results in ClassCastExceptions if log4j-core is not excluded.
> 
> This change prevents Geode from using Log4jAgent if Log4j Core is
> present but not using Log4jProvider.
> 
> For example, Log4j uses SLF4JProvider instead of Log4jProvider when
> log4j-to-slf4j is in the classpath.
> 
> By disabling Log4jAgent when other Log4j Providers are in use, this
> prevents problems such as ClassCastExceptions when attempting to cast
> loggers from org.apache.logging.slf4j.SLF4JLogger to
> org.apache.logging.log4j.core.Logger to get the LoggerConfig or
> LoggerContext.
> 
> PR: https://github.com/apache/geode/pull/3892
> GEODE-7050: Use Log4jAgent only if Log4j is using Log4jProvider
> https://issues.apache.org/jira/browse/GEODE-7050
> 
> Thanks,
> Kirk and Aaron



Re: geode-native ipv6

2019-08-08 Thread Mark Hanson
I just tried to connect to Geode with the native client and it did not go well. 
It threw an illegal-argument error.

That said, it “seems” like it might not be complicated to make it IPv6 
compliant.

Thanks,
Mark

> On Aug 8, 2019, at 9:56 AM, Mark Hanson  wrote:
> 
> The latest ACE framework seems to have support, but I don’t know how far off 
> latest we are. I don’t think we test anything in an IPv6 context, so I would 
> say no that we don’t officially support it in the client. Given some time, I 
> could do some testing..
> 
> Thanks,
> Mark
> 
>> On Aug 8, 2019, at 7:35 AM, Blake Bender  wrote:
>> 
>> I'm sure someone will chime in with a more definitive answer, but I'm
>> pretty certain the answer is no, sorry.
>> 
>> Thanks,
>> 
>> Blake
>> 
>> 
>> On Thu, Aug 8, 2019 at 4:28 AM Mario Ivanac  wrote:
>> 
>>> Hi,
>>> 
>>> 
>>> can you tell me does geode-native client support ipv6?
>>> 
>>> 
>>> BR,
>>> 
>>> Mario
>>> 
> 



Re: Propose fix for 1.10 release: Prevent NPE in getLocalSize()

2019-08-14 Thread Mark Hanson
+1 to include the fix

> On Aug 14, 2019, at 9:06 AM, Kirk Lund  wrote:
> 
> +1 to include this fix in 1.10.0
> 
> FYI: The race condition for this code path to throw NPE (which is
> catastrophic and requires restarting the server) was introduced by commit
> 279fa0 on July 31 for GEODE-7001.
> 
> On Tue, Aug 13, 2019 at 6:22 PM Anthony Baker  wrote:
> 
>> Given that we’re trying to stabilize the release branch and this fix seems
>> to *help* that I’m in favor of merging it.
>> 
>> Anthony
>> 
>> 
>>> On Aug 13, 2019, at 5:32 PM, Udo Kohlmeyer  wrote:
>>> 
>>> @Aaron, is this an existing issue (i.e., this was not introduced in a
>> current refactor)?
>>> 
>>> If the answer is anything other than "This will make the system stop
>> working", I would vote: -1
>>> 
>>> If this is an existing issue and has been around for a while, I think we
>> should hold off including this.
>>> 
>>> I think the boat has sailed on the inclusion of issues into the 1.10
>> release. Sorry...
>>> 
>>> --Udo
>>> 
>>> On 8/13/19 4:58 PM, Aaron Lindsey wrote:
 I’d like to propose including
>> https://github.com/apache/geode/pull/3913/commits/6f1814d1f719cc06b13769c40a9d6d01f99f927c
>> <
>> https://github.com/apache/geode/pull/3913/commits/6f1814d1f719cc06b13769c40a9d6d01f99f927c>
>> in the Geode 1.10 release.
 
 This commit fixes an issue where a NullPointerException is thrown from
>> PartitionedRegion.getLocalSize() when the statistics callback sampler is
>> invoked before a PartitionedRegion is initialized.
 
 - Aaron
 
 
 
>> 
>> 



Re: Odg: Need PR reviews

2019-08-27 Thread Mark Hanson
Hi Jake and Blake,

Could you take a look at this? I will test it as well.

Thanks,
Mark

> On Aug 27, 2019, at 9:15 AM, Mario Ivanac  wrote:
> 
> Hi,
> 
> just to remind you.
> 
> Thanks.
> 
> Šalje: Mario Ivanac 
> Poslano: 26. kolovoza 2019. 11:37
> Prima: dev@geode.apache.org 
> Predmet: Need PR reviews
> 
> Hi Geode dev,
> 
> we need review for following PRs:
> 
> Jira ticket:
> 
> https://issues.apache.org/jira/browse/GEODE-7086
> PR:
> https://github.com/apache/geode-native/pull/510
> 
> Jira ticket: 
> https://issues.apache.org/jira/browse/GEODE-7039
> PR:
> https://github.com/apache/geode/pull/3955
> 
> Thanks,
> Mario



Re: [DISCUSS] Release Geode 1.9.1 with logging improvements

2019-08-28 Thread Mark Hanson
+1 for log4j changes etc.

Mark


Re: [VOTE] Apache Geode 1.9.1 RC2

2019-08-29 Thread Mark Hanson
+1

> On Aug 29, 2019, at 5:11 PM, Ryan McMahon  wrote:
> 
> +1
> 
> On Thu, Aug 29, 2019 at 5:11 PM Kirk Lund  wrote:
> 
>> +1
>> 
>> On Thu, Aug 29, 2019 at 5:02 PM Owen Nichols  wrote:
>> 
>>> Hello Geode dev community,
>>> 
>>> This is a release candidate for Apache Geode, version 1.9.1.RC2.
>>> Thanks to all the community members for their contributions to this
>>> release!
>>> 
>>> Please do a review and give your feedback. The deadline is 3PM PST Wed,
>>> September 04 2019.
>>> Release notes can be found at:
>>> 
>> https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.9.1
>>> <
>>> 
>> https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.9.1
>>> 
>>> 
>>> 
>>> Please note that we are voting upon the source tags: rel/v1.9.1.RC2
>>> 
>>> Apache Geode:
>>> https://github.com/apache/geode/tree/rel/v1.9.1.RC2 <
>>> https://github.com/apache/geode/tree/rel/v1.9.1.RC2>
>>> Apache Geode examples:
>>> https://github.com/apache/geode-examples/tree/rel/v1.9.1.RC2 <
>>> https://github.com/apache/geode-examples/tree/rel/v1.9.1.RC2>
>>> Apache Geode native:
>>> https://github.com/apache/geode-native/tree/rel/v1.9.1.RC2 <
>>> https://github.com/apache/geode-native/tree/rel/v1.9.1.RC2>
>>> 
>>> Source and binary files:
>>> https://dist.apache.org/repos/dist/dev/geode/1.9.1.RC2/ <
>>> https://dist.apache.org/repos/dist/dev/geode/1.9.1.RC2/>
>>> 
>>> Maven staging repo:
>>> https://repository.apache.org/content/repositories/orgapachegeode-1056 <
>>> https://repository.apache.org/content/repositories/orgapachegeode-1056>
>>> 
>>> Geode's KEYS file containing PGP keys we use to sign the release:
>>> https://github.com/apache/geode/blob/develop/KEYS <
>>> https://github.com/apache/geode/blob/develop/KEYS>
>>> 
>>> PS: Command to run geode-examples: ./gradlew -PgeodeReleaseUrl=
>>> https://dist.apache.org/repos/dist/dev/geode/1.9.1.RC2
>>> -PgeodeRepositoryUrl=
>>> https://repository.apache.org/content/repositories/orgapachegeode-1056
>>> build runAll
>>> 
>>> Regards
>>> Owen & Kirk
>> 



[Proposal] Make gfsh "stop server" command synchronous

2019-09-10 Thread Mark Hanson
Hello All,

I would like to propose that we make the gfsh “stop server” command 
synchronous. It is causing issues in some tests because the rest of the gfsh 
calls block, while stop, by comparison, returns immediately.
This causes issues as shown specifically in GEODE-7017.

GEODE-7017 CI failure: 
org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest > 
startupReportsOnlineOnlyAfterRedundancyRestored
https://issues.apache.org/jira/browse/GEODE-7017 



What do people think?

Thanks,
Mark

Re: [Proposal] Make gfsh "stop server" command synchronous

2019-09-10 Thread Mark Hanson
Hi John, 

Kirk and I found that in our testing it was returning before the server was 
fully stopped. I have a change that seems viable: it waits for the pid file to 
disappear from the server's subdirectory. I am not a fan. I would prefer to 
wait for the process itself to disappear, but that doesn’t seem like it will be 
cross-platform friendly.
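A rough sketch of what waiting on the pid file could look like — the file location and timeout here are hypothetical, and this only illustrates the polling idea, not gfsh's actual implementation:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of "stop server" waiting for the server's pid file to disappear.
class WaitForStop {
    static boolean waitForPidFileDeletion(Path pidFile, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (!Files.exists(pidFile)) {
                return true; // file gone: treat the server as stopped
            }
            Thread.sleep(100); // poll; there is no portable "wait for delete"
        }
        return false; // timed out; the server may still be shutting down
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical pid file location under the server's working directory.
        Path pidFile = Paths.get("server1", "vf.gf.server.pid");
        System.out.println(waitForPidFileDeletion(pidFile, 500));
    }
}
```

As noted above, this only confirms that the process cleaned up its file, not that the process itself is gone, which is the cross-platform gap.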

Thanks,
Mark

> On Sep 10, 2019, at 3:31 PM, John Blum  wrote:
> 
> `stop server` is synchronous (with an option to break out of the wait using
> CTRL^C) AFAIR.
> 
> Way deep down inside, it simply relies on GemFireCache.close() to return
> (in-process).
> 
> As Darrel mentioned, there is no "true" signal that the server was
> successfully stopped.
> 
> -j
> 
> 
> On Tue, Sep 10, 2019 at 3:23 PM Darrel Schneider 
> wrote:
> 
>> I think it would be good for stop server to confirm in some way that the
>> server has stopped before returning.
>> 
>> On Tue, Sep 10, 2019 at 3:08 PM Mark Hanson  wrote:
>> 
>>> Hello All,
>>> 
>>> I would like to propose that we make the gfsh “stop server” command
>>> synchronous. It is causing some issues with some tests as the rest of the
>>> calls are blocking. Stop on the other hand immediately returns by
>>> comparison.
>>> This causes issues as shown in GEODE-7017 specifically.
>>> 
>>> GEODE:7017 CI failure:
>>> org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest >
>>> startupReportsOnlineOnlyAfterRedundancyRestored
>>> https://issues.apache.org/jira/browse/GEODE-7017 <
>>> https://issues.apache.org/jira/browse/GEODE-7017>
>>> 
>>> 
>>> What do people think?
>>> 
>>> Thanks,
>>> Mark
>> 
> 
> 
> -- 
> -John
> john.blum10101 (skype)



Re: [Proposal] Make gfsh "stop server" command synchronous

2019-09-10 Thread Mark Hanson
I think this cache.close() discussion is off topic. I’m not sure that’s the 
case, but it’s not at the root of my question.

The problem: using gfsh -e, from a gfsh rule in a test, stop server does not 
properly block as the rest of the API does. 

I’m looking for a better understanding of the desired interface.

The suggestion put forth is to use the existing indicators to make it 
synchronous. That seems right to me, but I could be wrong.

Thanks,
Mark

Sent from my iPhone

> On Sep 10, 2019, at 7:12 PM, John Blum  wrote:
> 
> @Mike - Who said anything about...
> 
> "*masking it in an early return from the shutdown command doesn't seem like
> the appropriate action.*"
> 
> I think you missed the point.  You explicitly have to break out of the
> wait, which is a conscious decision when *Gfsh* is run interactively.
> 
> The command as I previously stated, is "blocking", or "synchronous" with
> respect to cache.close(), which is "ultimately" what gets called whether
> the stop happens in-process or out-of-process (for that matter).  So, in a
> non-interactive mode, the real issue is, why is the cache not completely
> and properly closed/shutdown after a cache.close() call then?
> 
> Fix cache.close() then!  Don't simply bandaid this thing with yet another
> unreliable means to ascertain whether the cache was completely and properly
> shutdown.  And, don't put responsibility on the user to have register and
> receive notification on complete shutdown, or some other ridiculous means,
> either.
> 
> 
>> On Tue, Sep 10, 2019 at 6:15 PM Michael Stolz  wrote:
>> 
>> I understand that issue John, but masking it in an early return from the
>> shutdown command doesn't seem like the appropriate action.
>> Maybe we should consider that nearly all gfsh commands are not blocking,
>> and rather, have a way to determine which ones are still waiting for
>> completion?
>> 
>> --
>> Mike Stolz
>> Principal Engineer, Pivotal Cloud Cache
>> Mobile: +1-631-835-4771
>> 
>> 
>> 
>>> On Tue, Sep 10, 2019 at 9:13 PM John Blum  wrote:
>>> 
>>> @Anil-  I hear your argument when you are "scripting" with *Gfsh*, but
>>> blocking absolutely may be less desirable when using *Gfsh* interactively.
>>> There are, after all, many non-cluster based commands.
>>> 
>>> @Mark - I see.  I have generally found in my own testing purposes,
>>> especially automated, that a cache instance is not fully closed and has
>> not
>>> properly released all its resources even after cache.close() returns.
>>> 
>>> -j
>>> 
>>>> On Tue, Sep 10, 2019 at 5:02 PM Mark Hanson  wrote:
>>>> 
>>>> Hi John,
>>>> 
>>>> Kirk and I found that in our testing it was returning before it was
>> fully
>>>> stopped. I have a change that seems viable that waits for the pid file
>> to
>>>> disappear from the subdirectory of the server. I am not a fan. I would
>>>> prefer to wait for the pid to disappear, but that doesn’t seem like it
>>> will
>>>> be cross-platform friendly.
>>>> 
>>>> Thanks,
>>>> Mark
>>>> 
>>>>> On Sep 10, 2019, at 3:31 PM, John Blum  wrote:
>>>>> 
>>>>> `stop server` is synchronous (with an option to break out of the wait
>>>> using
>>>>> CTRL^C) AFAIR.
>>>>> 
>>>>> Way deep down inside, it simply relies on GemFireCache.close() to
>>> return
>>>>> (in-process).
>>>>> 
>>>>> As Darrel mentioned, there is no "true" signal that the server was
>>>>> successfully stopped.
>>>>> 
>>>>> -j
>>>>> 
>>>>> 
>>>>> On Tue, Sep 10, 2019 at 3:23 PM Darrel Schneider <
>>> dschnei...@pivotal.io>
>>>>> wrote:
>>>>> 
>>>>>> I think it would be good for stop server to confirm in some way that
>>> the
>>>>>> server has stopped before returning.
>>>>>> 
>>>>>> On Tue, Sep 10, 2019 at 3:08 PM Mark Hanson 
>>> wrote:
>>>>>> 
>>>>>>> Hello All,
>>>>>>> 
>>>>>>> I would like to propose that we make the gfsh “stop server” command
>>>>>>> synchronous. It is causing some issues with some tests as the rest
>> of
>>>> the
>>>>>>> calls are blocking. Stop on the other hand immediately returns by
>>>>>>> comparison.
>>>>>>> This causes issues as shown in GEODE-7017 specifically.
>>>>>>> 
>>>>>>> GEODE:7017 CI failure:
>>>>>>> 
>>> org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest >
>>>>>>> startupReportsOnlineOnlyAfterRedundancyRestored
>>>>>>> https://issues.apache.org/jira/browse/GEODE-7017 <
>>>>>>> https://issues.apache.org/jira/browse/GEODE-7017>
>>>>>>> 
>>>>>>> 
>>>>>>> What do people think?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mark
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> -John
>>>>> john.blum10101 (skype)
>>>> 
>>>> 
>>> 
>>> --
>>> -John
>>> john.blum10101 (skype)
>>> 
>> 
> 
> 
> -- 
> -John
> john.blum10101 (skype)


Re: [Proposal] Make gfsh "stop server" command synchronous

2019-09-11 Thread Mark Hanson
The idea I am working with at the moment, which Kirk pointed me at, is to use 
the pid file in the server's directory as the indicator: once that file 
disappears, the server is stopped.

That seems to work in my testing.

Thoughts?

Thanks,
Mark

> On Sep 11, 2019, at 10:23 AM, Dan Smith  wrote:
> 
> It does seem like we should make stop synchronous, or at least make start
> wait for the old process to die as Bruce suggested. Otherwise it is
> difficult for someone to script the restart of a server.
> 
> Looking at the code, it does look like gfsh stop is asynchronous. There are
> multiple ways to stop a server:
> * gfsh stop --dir - it looks like we write out some stop file and return
> immediately. Or, if we can connect over JMX, we invoke the
> MemberMBean.shutDownMember method, which launches a thread to close the
> cache, which is also asynchronous.
> * gfsh stop --pid - this seems to be similar to --dir
> * With a member name - this appears to go to the MemberMBean.shutDownMember
> method as well.
> 
> I think one issue is that the JMX methods to stopping the server may be
> hard to ensure the process is really gone, because they can be invoked
> remotely. That may be why they are asynchronous - they need to return
> something to the caller before shutting down. So maybe Bruce's suggestion
> is better.
> 
> As Jens pointed out - tests should generally just use port 0 for servers.
> 
> -Dan
> 
> On Wed, Sep 11, 2019 at 8:46 AM Jens Deppe  wrote:
> 
>> To circle back to the original test failure that prompted this discussion -
>> the failing test was getting intermittent bind exceptions on subsequent
>> server restarts.
>> 
>> I believe it's quite likely that a process' ports will remain unavailable
>> even after it is gone (I'm not sure if we create listening sockets with
>> SO_REUSEADDR). So, as to John's comment that gfsh is already synchronous, I
>> don't think that adding extra functionality to gfsh, to ultimately just
>> wait longer before exiting, is really solving the problem. I'd suggest you
>> adjust the tests to always start servers with `--server-port=0` so that
>> there are no port conflicts and let the OS handle it.
>> 
>> --Jens
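A minimal JDK-only illustration of the `--server-port=0` suggestion above (not Geode code): binding to port 0 asks the OS to assign a free ephemeral port, which the process can then report, so restarted servers never collide on a fixed port.

```java
import java.net.ServerSocket;

// Port 0 means "let the OS pick any free ephemeral port".
class EphemeralPort {
    public static void main(String[] args) throws Exception {
        try (ServerSocket socket = new ServerSocket(0)) {
            int port = socket.getLocalPort(); // the port actually bound
            System.out.println(port > 0);     // a real, positive port number
        }
    }
}
```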
>> 
>> On Wed, Sep 11, 2019 at 8:17 AM Bruce Schuchardt 
>> wrote:
>> 
>>> Blocking or non-blocking, I don't have a strong opinion.  What I'd
>>> really like to have gfsh ensure, though, is that no-one is able to start
>>> a new instance of a server while the old process is still around.  Maybe
>>> the PID file is the way to do that.
>>> 
>>> On 9/10/19 3:08 PM, Mark Hanson wrote:
>>>> Hello All,
>>>> 
>>>> I would like to propose that we make the gfsh “stop server” command
>>> synchronous. It is causing some issues with some tests as the rest of the
>>> calls are blocking. Stop on the other hand immediately returns by
>>> comparison.
>>>> This causes issues as shown in GEODE-7017 specifically.
>>>> 
>>>> GEODE:7017 CI failure:
>>> org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest >
>>> startupReportsOnlineOnlyAfterRedundancyRestored
>>>> https://issues.apache.org/jira/browse/GEODE-7017 <
>>> https://issues.apache.org/jira/browse/GEODE-7017>
>>>> 
>>>> 
>>>> What do people think?
>>>> 
>>>> Thanks,
>>>> Mark
>>> 
>> 



Re: Question about excluding serialized classes

2019-09-11 Thread Mark Hanson
They would be the specific functions, but this doesn’t get us around the other 
problem. I think this approach is not going to work, so we are about to start 
looking into other ways.
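For the record, the "replace the decorator's serialized form with the wrapped function's" idea that Aaron raised can be sketched with plain Java serialization and writeReplace. The classes below are illustrative stand-ins, not Geode's Function API:

```java
import java.io.*;

// Sketch: a decorator whose serialized form is the wrapped object itself,
// so the wire format matches what the user originally registered.
class DecoratorSerialization {
    interface Fn extends Serializable { String id(); }

    static class MyFunction implements Fn {
        public String id() { return "MyFunction"; }
    }

    static class TimingDecorator implements Fn {
        private final Fn delegate;
        TimingDecorator(Fn delegate) { this.delegate = delegate; }
        public String id() { return delegate.id(); }

        // writeReplace substitutes the wrapped function during serialization.
        private Object writeReplace() throws ObjectStreamException {
            return delegate;
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new TimingDecorator(new MyFunction()));
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            // The decorator disappears; only the inner function crosses the wire.
            System.out.println(in.readObject().getClass().getSimpleName());
        }
    }
}
```

This addresses wire format only; it would not help with the FunctionService.getFunction cast problem discussed earlier in the thread.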

Thanks,
Mark

> On Sep 11, 2019, at 12:14 PM, Anthony Baker  wrote:
> 
> I think the Decorator approach you outlined could have other impacts as well. 
>  Would I still be able to see specific function executions in statistics or 
> would they all become “TimingFunction”?
> 
> Anthony
> 
> 
>> On Sep 11, 2019, at 12:00 PM, Aaron Lindsey  wrote:
>> 
>> Thanks for your response, Dan.
>> 
>> The second scenario you mentioned (i.e. users calling
>> FunctionService.getFunction(String)) worries me because our PR changes the
>> FunctionService so they would now get back an instance of the decorator
>> function instead of the specific instance they registered by calling
>> FunctionService.registerFunction(Function). Therefore, any explicit casts
>> to a Function implementation like (MyFunction)
>> FunctionService.getFunction("MyFunction") would fail. Does that mean this
>> be a breaking change? The FunctionService class does not specify that
>> getFunction must return the same type function as the one passed to
>> registerFunction, but I could see how users might be relying on that
>> behavior since there is no other way to get a specific function type out of
>> the FunctionService without doing a cast.
>> 
>> - Aaron
>> 
>> 
>> On Wed, Sep 11, 2019 at 10:52 AM Dan Smith  wrote:
>> 
>>> Functions are serialized when you call Execution.execute(Function) instead
>>> of Execution.execute(String). Invoking execute on a function object
>>> serializes the function and executes it on the remote side. Functions
>>> executed this way don't have be registered.
>>> 
>>> Users can also get registered function objects directly from the function
>>> service using FunctionService.getFunction(String) and do whatever they want
>>> with them, which I guess could include serializing them.
>>> 
>>> Hope that helps!
>>> -Dan
>>> 
>>> On Wed, Sep 11, 2019 at 10:27 AM Aaron Lindsey 
>>> wrote:
>>> 
 As part of a PR to add Micrometer timers for function executions, we
 implemented a decorator
 Function that wraps all registered non-internal functions and adds
 instrumentation. This PR is
 failing AnalyzeSerializablesJUnitTest.testSerializables because the
 decorator class is a new Serializable.
 
 I'm not sure if it would be OK to add this class to excludedClasses.txt
 because I don't know whether this function will ever be serialized. If it
 will be serialized, then I'm concerned that this might break backwards
 compatibility because we're changing the serialized form of registered
 functions. If this is the case, then we could implement custom logic for
 serializing the decorator class which would replace its serialized form
 with the serialized form of the inner class. Again, I'm not sure if that
 would be necessary because I don't know the conditions under which a
 function would be serialized.
 
 Could someone help me understand when functions would be persisted or
>>> sent
 over the wire so I can determine if this change would break
>>> compatibility?
 
 Thanks,
 Aaron
 
>>> 
> 



Re: [Proposal] Make gfsh "stop server" command synchronous

2019-09-11 Thread Mark Hanson
Good question. I will have to look into that.

Thanks,
Mark

> On Sep 11, 2019, at 10:53 AM, Dan Smith  wrote:
> 
>> The idea I am working with at the moment that Kirk pointed me at was to
> use the pid file in the directory as indicator. Once that file disappears
> the server is stopped.
> 
> How will this work if stop server --member is invoked from a different
> machine than the member that is being stopped?
> 
> -Dan
> 
> On Wed, Sep 11, 2019 at 10:28 AM Mark Hanson  wrote:
> 
>> The idea I am working with at the moment that Kirk pointed me at was to
>> use the pid file in the directory as indicator. Once that file disappears
>> the server is stopped.
>> 
>> That seems to work in my testing.
>> 
>> Thoughts?
>> 
>> Thanks,
>> Mark
>> 
>>> On Sep 11, 2019, at 10:23 AM, Dan Smith  wrote:
>>> 
>>> It does seem like we should make stop synchronous, or at least make start
>>> wait for the old process to die as Bruce suggested. Otherwise it is
>>> difficult for someone to script the restart of a server.
>>> 
>>> Looking at the code, it does look like gfsh stop is asynchronous. There
>> are
>>> multiple ways to stop a server:
>>> * gfsh stop --dir - it looks like we write out some stop file and return
>>> immediately. Or, if we can connect over JMX, we invoke the
>>> MemberMBean.shutDownMember method, which launches a thread to close the
>>> cache, which is also asynchronous.
>>> * gfsh stop --pid - this seems to be similar to --dir
>>> * With a member name - this appears to go to the
>> MemberMBean.shutDownMember
>>> method as well.
>>> 
>>> I think one issue is that the JMX methods to stopping the server may be
>>> hard to ensure the process is really gone, because they can be invoked
>>> remotely. That may be why they are asynchronous - they need to return
>>> something to the caller before shutting down. So maybe Bruce's suggestion
>>> is better.
>>> 
>>> As Jens pointed out - tests should generally just use port 0 for servers.
>>> 
>>> -Dan
>>> 
>>> On Wed, Sep 11, 2019 at 8:46 AM Jens Deppe  wrote:
>>> 
>>>> To circle back to the original test failure that prompted this
>> discussion -
>>>> the failing test was getting intermittent bind exceptions on subsequent
>>>> server restarts.
>>>> 
>>>> I believe it's quite likely that a process' ports will remain
>> unavailable
>>>> even after it is gone (I'm not sure if we create listening sockets with
>>>> SO_REUSEADDR). So, as to John's comment that gfsh is already
>> synchronous, I
>>>> don't think that adding extra functionality to gfsh, to ultimately just
>>>> wait longer before exiting, is really solving the problem. I'd suggest
>> you
>>>> adjust the tests to always start servers with `--server-port=0` so that
>>>> there are no port conflicts and let the OS handle it.
>>>> 
>>>> --Jens
>>>> 
>>>> On Wed, Sep 11, 2019 at 8:17 AM Bruce Schuchardt <
>> bschucha...@pivotal.io>
>>>> wrote:
>>>> 
>>>>> Blocking or non-blocking, I don't have a strong opinion.  What I'd
>>>>> really like to have gfsh ensure, though, is that no-one is able to
>> start
>>>>> a new instance of a server while the old process is still around.
>> Maybe
>>>>> the PID file is the way to do that.
>>>>> 
>>>>> On 9/10/19 3:08 PM, Mark Hanson wrote:
>>>>>> Hello All,
>>>>>> 
>>>>>> I would like to propose that we make the gfsh “stop server” command
>>>>> synchronous. It is causing some issues with some tests as the rest of
>> the
>>>>> calls are blocking. Stop on the other hand immediately returns by
>>>>> comparison.
>>>>>> This causes issues as shown in GEODE-7017 specifically.
>>>>>> 
>>>>>> GEODE:7017 CI failure:
>>>>> org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest >
>>>>> startupReportsOnlineOnlyAfterRedundancyRestored
>>>>>> https://issues.apache.org/jira/browse/GEODE-7017 <
>>>>> https://issues.apache.org/jira/browse/GEODE-7017>
>>>>>> 
>>>>>> 
>>>>>> What do people think?
>>>>>> 
>>>>>> Thanks,
>>>>>> Mark
>>>>> 
>>>> 
>> 
>> 



Re: Please review PR #4024

2019-09-19 Thread Mark Hanson
Hi All, 

This has been merged. It got three reviews already so we merged it.

Thanks,
Mark

> On Sep 18, 2019, at 4:15 PM, Kirk Lund  wrote:
> 
> Please review PR #4024
> https://github.com/apache/geode/pull/4024
> 
> The purpose of this PR is to reduce flaky failures involving ServerLauncher
> tests.



[VOLUNTEER] I am volunteering to handle the 1.11 Apache Geode release.

2019-10-28 Thread Mark Hanson
Hi All,

Since we need a release manager for 1.11 and no one has volunteered yet, I 
thought I would volunteer.

Thanks,
Mark

[Proposal] Cut 1.11.0 branch on November 4th, 2019.

2019-10-28 Thread Mark Hanson
Hi All,

Looks like I won the contest to be release manager by default, so let's get 
started….

I think we should cut the branch on November 4th, 2019.  Assuming that is 
agreed to, I would pick the most stable build during that day.

Please get back to me if you have any concerns.

Thanks,
Mark

The 1.11.0 branch will be cut on November 4th, 2019.

2019-10-28 Thread Mark Hanson
Please ensure that your changes are in by November 3rd, 2019 at the latest. I 
will pick the SHA of the first successful build on November 4th, 2019.


If you realize your change will not make the cut and think it should, please 
let me know.


Thanks,
Mark Hanson

Re: [DISCUSS] is overriding a PR check ever justified?

2019-10-31 Thread Mark Hanson
-1 for the "break glass" approach. Needing a break-glass approach is a sign. I 
wonder how hard that would be to build. I think our break-glass approach is to 
have someone with the power to disable the restrictions in GitHub for the 
window that we must, and then we put them back.

Thanks,
Mark

> On Oct 31, 2019, at 9:26 AM, Benjamin Ross  wrote:
> 
> I would agree with udo, +1 to having an emergency 'break glass' override
> but -1 to having the ability to do it routinely or for convenience.
> 
> The main use case in my mind is for infrastructure related issues that
> block a PR behind unrelated changes which can be really frustrating and
> inconvenient (StressNewTest is a big culprit here) but I'm hoping that if
> constant issues with this arise it will lead to fixing the offending
> infrastructure, whether that means changing test itself or the architecture
> in which it runs, to make our pipelines more reliable. If we smooth over
> PR's that run into issues every time Stress Tests break a test which only
> had logging changes (or similar situations) then there's no incentive for
> us to improve the Stress Tests job.
> 
> Having said all that, being completely without the ability to perform an
> emergency override if everything goes sideways and the only way to fix it
> is to push a commit which can't get a green pipeline seems like a pretty
> good idea to me as long as we all agree on what circumstances qualify as an
> emergency.
> 
> On Thu, Oct 31, 2019 at 2:43 AM Ju@N  wrote:
> 
>> -1 for allowing overrides.
>> If there's an edge case on which it's required, then we could use it at the
>> very last resources *if and only if* it's been discussed and approved
>> through the dev list for that particular case.
>> Cheers.
>> 
>> 
>> On Wed, Oct 30, 2019 at 11:35 PM Robert Houghton 
>> wrote:
>> 
>>> Any committer has the 'status' permission on apache/geode.git. Some API
>>> tokens have it as well. Its curl + github API reasoning to do this.
>>> 
>>> On Wed, Oct 30, 2019 at 2:02 PM Jason Huynh  wrote:
>>> 
 If we are going to allow overrides, then maybe what Owen is describing
 should occur.  Make a request on the dev list and explain the
>> reasoning.
 
 I don't think this has been done and a few have already been
>> overridden.
 
 Also, who has the capability to override, and who knows how? How is that
 determined?
 
 On Wed, Oct 30, 2019 at 1:59 PM Owen Nichols 
>>> wrote:
 
>> How do you override a check, anyway?
> 
> Much like asking for jira permissions, wiki permissions, etc, just
>> ask
>>> on
> the dev list ;)
> 
> Presumably this type of request would be made as a “last resort”
 following
> a dev list discussion wherein all other reasonable options had been
> exhausted (reworking or splitting up the PR, pushing empty commits,
> rebasing the PR, etc)
> 
>> On Oct 30, 2019, at 1:42 PM, Dan Smith  wrote:
>> 
>> +1 for allowing overrides. I think we should avoid backing
>> ourselves
> into a
>> corner where we can't get anything into develop without talking to
 apache
>> infra. Some infrastructure things we can't even fix without
>> pushing a
>> change develop!
>> 
>> How do you override a check, anyway?
>> 
>> -Dan
>> 
>> On Wed, Oct 30, 2019 at 12:58 PM Donal Evans 
 wrote:
>> 
>>> -1 to overriding from me.
>>> 
>>> The question I have here is what's the rush? Is anything ever so
>>> time-sensitive that you can't even wait the 15 minutes it takes
>> for
>>> it
> to
>>> build and run unit tests? If some infrastructure problem is
>>> preventing
>>> builds or tests from completing then that should be fixed before
>> any
 new
>>> changes are added, otherwise what's the point in even having the
>> pre
>>> check-in process?
>>> 
>>> -Donal
>>> 
>>> On Wed, Oct 30, 2019 at 11:44 AM Nabarun Nag 
>>> wrote:
>>> 
 @Aaron
 It's okay to wait for at least the build, and unit tests to
>>> complete,
> to
 cover all the bases. [There may have been commits in between
>> which
 may
 result in failure because of the revert]  And it's not hard to
>> get
>>> a
 PR
 approval.
 
 -1 on overriding. If the infrastructure is down, which is the
>> test
 framework designed to ensure that we are not checking in unwanted
> changes
 into Apache Geode, wait for the infrastructure to be up, get your
> changes
 verified, get the review from a fellow committer and then
>> check-in
 your
 changes.
 
 I still don't understand why anyone would not wait for unit tests
>>> and
>>> build
 to be successful.
 
 Regards
 Nabarun Nag
 
 On Wed, Oct 30, 2019 at 11:32 AM Aaron Lindsey <
>>> alind...@pivotal.io>
 wrote:
 
> One case when it migh

Re: upgrading concourse to 5.7.0

2019-11-01 Thread Mark Hanson
Hi Sean,

Should I be concerned? I have a bunch of queued-up jobs waiting to run 
distributed unit tests. 

Thanks,
Mark

> On Nov 1, 2019, at 10:06 AM, Sean Goller  wrote:
> 
> After testing, we're going to upgrade the concourse infrastructure to 5.7.0
> this morning. We do not anticipate any issues during this upgrade.
> 
> -Sean.



1.11.0 Release branch has been cut

2019-11-04 Thread Mark Hanson
Hello Geode Dev Community,

We have created a new release branch for Apache Geode 1.1100 - “release/1.11.0"

Please do review and raise any concern with the release branch.
If no concerns are raised, we will start with the voting for the release 
candidate soon.

Regards
Mark Hanson



Re: 1.11.0 Release branch has been cut

2019-11-04 Thread Mark Hanson
Argh... Corrected release version below…

> On Nov 4, 2019, at 4:47 PM, Mark Hanson  wrote:
> 
> Hello Geode Dev Community,
> 
> We have created a new release branch for Apache Geode 1.11.0 - 
> “release/1.11.0"
> 
> Please do review and raise any concern with the release branch.
> If no concerns are raised, we will start with the voting for the release 
> candidate soon.
> 
> Regards
> Mark Hanson
> 



Re: Review for #4204

2019-11-05 Thread Mark Hanson
I took care of it.

Best regards,
Mark

> On Oct 28, 2019, at 8:05 AM, Alberto Gomez  wrote:
> 
> Hi,
> 
> Could I ask for a review on https://github.com/apache/geode/pull/4204?
> 
> This PR is about GEODE-7157: 
> (https://jira.apache.org/jira/browse/GEODE-7157).
> 
> Thanks,
> 
> /Alberto G.
> 
> 
> 



Re: Review for #4204

2019-11-05 Thread Mark Hanson
Hi Alberto, 

Rebasing and merging your fix broke the build. I am going to revert it. Once 
you resolve the issues, we can re-merge it.

Thanks,
Mark

> On Nov 5, 2019, at 9:06 AM, Alberto Gomez  wrote:
> 
> Hi,
> 
> Any volunteer to merge this PR that has already been approved?
> 
> Thanks,
> 
> /Alberto G.
> 
> On 28/10/19 16:05, Alberto Gómez wrote:
>> Hi,
>> 
>> Could I ask for a review on https://github.com/apache/geode/pull/4204?
>> 
>> This PR is about GEODE-7157: 
>> (https://jira.apache.org/jira/browse/GEODE-7157).
>> 
>> Thanks,
>> 
>> /Alberto G.
>> 
>> 
>> 



Re: [REQUEST] Squash merge please

2019-11-05 Thread Mark Hanson
Yup. I am a little annoyed at myself about that..

Thanks for the reminder though.

Mark

> On Nov 5, 2019, at 11:48 AM, Owen Nichols  wrote:
> 
> +1
> 
> When merging a PR from GitHub, if the green button does not already say 
> “Squash and merge”, click the little triangle and select “Squash and merge”.
> 
> In addition to all the reasons already listed in this thread, it makes 
> reverts a lot cleaner too.
> 
>> On Oct 28, 2019, at 12:47 PM, Jacob Barrett  wrote:
>> 
>> +1
>> 
>> 
>>> On Oct 22, 2019, at 5:12 PM, Nabarun Nag  wrote:
>>> 
>>> Hi Geode Committers,
>>> 
>>> A kind request for using squash commit instead of using merge. 
>>> This will really help us in our bisect operations when a 
>>> regression/flakiness in the product is introduced. We can automate and go 
>>> through fewer commits faster, avoiding commits like "spotless fix" and 
>>> "re-trigger precheck-in" or other minor commits in the merged branch. 
>>> 
>>> Also, please use the commit format : (helps us to know who worked on it, 
>>> what is the history)
>>>   GEODE-: 
>>> 
>>>   * explanation line 1
>>>   * explanation line 2
>>> 
>>> This is not a rule or anything, but a request to help out your fellow 
>>> committers in quickly detecting a problem.
>>> 
>>> For inspiration, we can look into Apache Kafka / Spark where they have a 
>>> complete linear graph for their main branch HEAD [see attachment]
>>> 
>>> Regards
>>> Naba.
>>> 
>>> 
>> 
> 



Re: bug fix needed for release/1.11.0

2019-11-06 Thread Mark Hanson
Any other votes? I have 2 people in favor.

Voting will close at noon.

Thanks,
Mark

> On Nov 6, 2019, at 8:00 AM, Bruce Schuchardt  wrote:
> 
> The fix for this problem is in the CI pipeline today: 
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341
> 
> On 11/5/19 10:49 AM, Owen Nichols wrote:
>> +1 for bringing this fix to release/1.11.0 (after it has passed Benchmarks 
>> on develop)
>> 
>>> On Nov 5, 2019, at 10:45 AM, Bruce Schuchardt  
>>> wrote:
>>> 
>>> The PR for GEODE-6661 introduced a problem in SSL communications that needs 
>>> to be fixed.  It changed SSL handshakes to use a temporary buffer that's 
>>> discarded when the handshake completes, but sometimes this buffer contains 
>>> application data that must be retained.  This seems to be causing our 
>>> Benchmark SSL test failures in CI.
>>> 
>>> I'm preparing a fix.  We can either revert the PR for GEODE-6661 on that 
>>> branch or cherry-pick the correction when it's ready.
>>> 
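The handshake-buffer problem Bruce describes corresponds to a general java.nio pattern: when a TLS handshake completes, the temporary handshake buffer may still hold application bytes that arrived in the same network read, and they must be carried over rather than discarded. Below is a hedged, self-contained sketch of that carry-over step using plain ByteBuffers; class and method names are illustrative, not Geode's actual code.

```java
import java.nio.ByteBuffer;

// Illustrative sketch: after a handshake finishes, the temporary handshake
// buffer may still contain unconsumed application bytes. Discarding the
// buffer loses them; they must be copied into the long-lived read buffer.
public class HandshakeBufferDemo {

  // Copy whatever is left unconsumed in the handshake buffer into the
  // buffer that will be used for application reads from now on.
  static ByteBuffer preserveRemaining(ByteBuffer handshakeBuffer, ByteBuffer appBuffer) {
    if (handshakeBuffer.hasRemaining()) {
      appBuffer.put(handshakeBuffer); // carry over the trailing bytes
    }
    return appBuffer;
  }

  public static void main(String[] args) {
    ByteBuffer handshake = ByteBuffer.allocate(16);
    handshake.put(new byte[] {1, 2, 3, 4});
    handshake.flip();
    handshake.position(2); // pretend the handshake consumed the first 2 bytes

    ByteBuffer app = ByteBuffer.allocate(16);
    preserveRemaining(handshake, app);
    app.flip();
    System.out.println(app.remaining()); // prints 2: the leftover application bytes
  }
}
```

The same idea applies whether the leftover bytes come from `SSLEngine.unwrap` or a raw socket read: the ownership of unconsumed bytes must follow the connection, not the temporary buffer.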



Re: Odg: bug fix needed for release/1.11.0

2019-11-06 Thread Mark Hanson
Thanks Mario. Your vote reminded me not all voters are in the PST time zone.. 
Pardon my thoughtlessness.. 

Voting closes at 12pm PST


> On Nov 6, 2019, at 9:33 AM, Mario Ivanac  wrote:
> 
> +1 for bringing this fix to release/1.11.0
> ____
> Šalje: Mark Hanson 
> Poslano: 6. studenog 2019. 18:28
> Prima: dev@geode.apache.org 
> Predmet: Re: bug fix needed for release/1.11.0
> 
> Any other votes? I have 2 people in favor.
> 
> Voting will close at noon.
> 
> Thanks,
> Mark
> 
>> On Nov 6, 2019, at 8:00 AM, Bruce Schuchardt  wrote:
>> 
>> The fix for this problem is in the CI pipeline today: 
>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341
>> 
>> On 11/5/19 10:49 AM, Owen Nichols wrote:
>>> +1 for bringing this fix to release/1.11.0 (after it has passed Benchmarks 
>>> on develop)
>>> 
>>>> On Nov 5, 2019, at 10:45 AM, Bruce Schuchardt  
>>>> wrote:
>>>> 
>>>> The PR for GEODE-6661 introduced a problem in SSL communications that 
>>>> needs to be fixed.  It changed SSL handshakes to use a temporary buffer 
>>>> that's discarded when the handshake completes, but sometimes this buffer 
>>>> contains application data that must be retained.  This seems to be causing 
>>>> our Benchmark SSL test failures in CI.
>>>> 
>>>> I'm preparing a fix.  We can either revert the PR for GEODE-6661 on that 
>>>> branch or cherry-pick the correction when it's ready.
>>>> 
> 



Re: bug fix needed for release/1.11.0

2019-11-07 Thread Mark Hanson
Based on the lack of “-1” votes, I have cherry-picked the change…

Thanks,
Mark

> On Nov 7, 2019, at 8:54 AM, Bruce Schuchardt  wrote:
> 
> The change passed in SSL benchmark testing and can be cherry-picked into the 
> release branch.  The Benchmark job as a whole failed due to perf degradation 
> with the Security Manager, but that's a different set of tests.
> 
> 
> On 11/6/19 3:57 PM, Helena Bales wrote:
>> +1 to cherry-picking the fix.
>> 
>> The sha hasn't made it to benchmarks yet due to an issue with CI losing
>> resource refs that were needed to keep it moving through the pipeline. The
>> next commit is still about an hour away from triggering benchmarks.
>> In my manual benchmarking of this change, I found that it resolved the
>> issue with SSL and passed the benchmarks. Obviously we still need to
>> confirm that it works through the main pipeline, but I feel confident that
>> it will pass the benchmark job.
>> 
>> Thanks,
>> Helena Bales (they/them)
>> 
>> On Wed, Nov 6, 2019 at 9:28 AM Mark Hanson  wrote:
>> 
>>> Any other votes? I have 2 people in favor.
>>> 
>>> Voting will close at noon.
>>> 
>>> Thanks,
>>> Mark
>>> 
>>>> On Nov 6, 2019, at 8:00 AM, Bruce Schuchardt 
>>> wrote:
>>>> The fix for this problem is in the CI pipeline today:
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341
>>>> On 11/5/19 10:49 AM, Owen Nichols wrote:
>>>>> +1 for bringing this fix to release/1.11.0 (after it has passed
>>> Benchmarks on develop)
>>>>>> On Nov 5, 2019, at 10:45 AM, Bruce Schuchardt 
>>> wrote:
>>>>>> The PR for GEODE-6661 introduced a problem in SSL communications that
>>> needs to be fixed.  It changed SSL handshakes to use a temporary buffer
>>> that's discarded when the handshake completes, but sometimes this buffer
>>> contains application data that must be retained.  This seems to be causing
>>> our Benchmark SSL test failures in CI.
>>>>>> I'm preparing a fix.  We can either revert the PR for GEODE-6661 on
>>> that branch or cherry-pick the correction when it's ready.
>>> 



Re: Adding GEODE-7412 to 1.11 release

2019-11-08 Thread Mark Hanson
Hi,

Can someone give me a known good sha?

I will add it in.

Thanks,
Mark

> On Nov 8, 2019, at 2:20 PM, Jens Deppe  wrote:
> 
> Hmm, I thought this was implicitly fixed by various build refactorings that
> went into 1.11.0. I see that we're creating local maven artifacts for
> geode-pulse-1.11.0.war. The next question is then whether we're actually
> publishing those.
> 
> --Jens
> 
> On Fri, Nov 8, 2019 at 12:56 PM Owen Nichols  wrote:
> 
>> +1
>> 
>>> On Nov 8, 2019, at 12:54 PM, Udo Kohlmeyer  wrote:
>>> 
>>> @Owen,
>>> 
>>> I did not specifically check all web archives for this issue. Yes, it
>> *should* have been caught in that ticket.
>>> 
>>> On 11/8/19 11:03 AM, Owen Nichols wrote:
 I’m curious, how was this missed in
>> https://issues.apache.org/jira/browse/GEODE-7241 <
>> https://issues.apache.org/jira/browse/GEODE-7241> ?
 
> On Nov 8, 2019, at 10:56 AM, Udo Kohlmeyer  wrote:
> 
> Hi there Geode Dev,
> 
> I would like to request that we add
>> https://issues.apache.org/jira/browse/GEODE-7412 <
>> https://issues.apache.org/jira/browse/GEODE-7412> to the 1.11 release.
> 
> This change is a build change that has crept in since 1.8. The issue
>> that is to be fixed is that the web archive, (geode-pulse) has since 1.8
>> been uploaded as a jar file to maven central.
> 
> I would like to fix this by having the WAR artifact being pushed to
>> maven central again.
> 
> --Udo
> 
 
>> 
>> 



Release candidate target date...

2019-11-12 Thread Mark Hanson
Hello Geode Dev Community,

We have a release branch for Apache Geode 1.11.0 - “release/1.11.0”. People 
have been voting to pick up fixes and I have been cherry picking them 
accordingly. I think it is important that we look forward to a release date and 
I have picked November 19th, 2019 as a time to try for. As with any process, if 
a sufficiently scary issue comes up we will reschedule, but this seems like a 
reasonable date.

As always, please contact me with any concerns you might have. If no concerns 
are raised, we will start voting on the release on November 20th, 2019.

Regards
Mark Hanson



Re: Release candidate target date...

2019-11-12 Thread Mark Hanson
What Owen said basically... I prefer to try for an early date and reschedule to 
a later date if I must. Aiming for a later date leaves little window to deal 
with surprises other than to miss the target date. It's almost never bad to come 
in a few days early, but it is rarely good to come in late.

Finally, Thanksgiving in the US is very unpredictable for vacations and thus a 
few days earlier guarantees more eyes.

Thanks,
Mark

Sent from my iPhone

> On Nov 12, 2019, at 4:09 PM, Owen Nichols  wrote:
> 
> Could be:
> * not very many fixes have come in, so maybe there’s actually nothing further 
> we’re waiting for?
> * to give us a better chance of hitting Dec 2 goal, in case it takes more 
> than one RC?
> * some people might prefer to give feedback against a formal RC rather than 
> informally against the branch?
> * impact of Thanksgiving?
> 
>> On Nov 12, 2019, at 4:05 PM, Alexander Murmann  wrote:
>> 
>> I think it's really great to agree on the date by which we want to have a
>> RC1 early. However, I thought the target release date is beginning of
>> December, which would probably bring us to December 2nd, given the first is
>> a Sunday. The voting period is 3 days. I would have expected that we'd
>> create the RC around 11/26 (with the caveat that everything looks fine by
>> then). What's the reasoning behind cutting it so early?
>> 
>>> On Tue, Nov 12, 2019 at 3:47 PM Owen Nichols  wrote:
>>> 
>>> +1 for cutting RC1 within two weeks of creating the release branch :)
>>> 
>>>> On Nov 12, 2019, at 3:00 PM, Mark Hanson  wrote:
>>>> 
>>>> Hello Geode Dev Community,
>>>> 
>>>> We have a release branch for Apache Geode 1.11.0 - “release/1.11.0”.
>>> People have been voting to pick up fixes and I have been cherry picking
>>> them accordingly. I think it is important that we look forward to a release
>>> date and I have picked November 19th, 2019 as a time to try for. As with
>>> any process, if a sufficiently scary issue comes up we will reschedule, but
>>> this seems like a reasonable date.
>>>> 
>>>> As always, please contact me with any concerns you might have. If no
>>> concerns are raised, we will start voting on the release on November 20th,
>>> 2019.
>>>> 
>>>> Regards
>>>> Mark Hanson
>>>> 
>>> 
>>> 
>> 
>> -- 
>> Alexander J. Murmann
>> (650) 283-1933
> 


Re: Propose adding GEODE-7400 fix to 1.11 release

2019-11-13 Thread Mark Hanson
FYI, this has been added to the release/1.11.0 branch.

Thanks,
Mark

> On Nov 11, 2019, at 9:42 AM, Jason Huynh  wrote:
> 
> +1
> 
> On Mon, Nov 11, 2019 at 9:41 AM Kirk Lund  wrote:
> 
>> I propose merging the fix for GEODE-7400 (merged to develop today) to the
>> 1.11 release branch.
>> 
>> My fix for GEODE-7330 (merged to develop in late October) introduced
>> GEODE-7400 which is the potential for RejectedExecutionException to be
>> thrown within FederatingManager.
>> 
>> Thanks,
>> Kirk
>> 
>> commit 3c5a6ccf40b03c345f53f28214513a9d76a1e024
>> Author: Aaron Lindsey 
>> Date:   Mon Nov 11 09:36:24 2019 -0800
>> 
>>GEODE-7400: Prevent RejectedExecutionException in FederatingManager
>> (#4270)
>> 
>>Commit f0c96db73263bb1b3cb04558f2a720d70f43421f changed the
>>FederatingManager class so that it reuses the same ExecutorService
>>between restarts. After that change, if we start the manager after
>>previously starting and stopping it, we get RejectedExecutionException
>>because it tries to invoke a task on the same ExecutorService which has
>>been shut down.
>> 
>>This commit changes the FederatingManager so that it invokes a supplier
>>to get a new ExecutorService each time it is started to prevent the
>>RejectedExecutionException.
>> 
>>Co-authored-by: Aaron Lindsey 
>>Co-authored-by: Kirk Lund 
>> 
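The supplier-based fix described in the commit message can be illustrated with plain JDK classes. This is a hedged sketch of the pattern only; the class and method names below are illustrative, not Geode's actual FederatingManager code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Sketch of the pattern from the commit message: instead of holding one
// ExecutorService across restarts (which throws RejectedExecutionException
// once it has been shut down), ask a supplier for a fresh one on each start.
class RestartableManager {
  private final Supplier<ExecutorService> executorSupplier;
  private ExecutorService executor;

  RestartableManager(Supplier<ExecutorService> executorSupplier) {
    this.executorSupplier = executorSupplier;
  }

  void start() {
    executor = executorSupplier.get(); // a new pool each start, never a stale one
    executor.submit(() -> System.out.println("started"));
  }

  void stop() {
    executor.shutdown(); // a shut-down pool rejects any further tasks
  }
}

public class RestartDemo {
  public static void main(String[] args) {
    RestartableManager manager =
        new RestartableManager(() -> Executors.newSingleThreadExecutor());
    manager.start();
    manager.stop();
    manager.start(); // safe: the second start gets a fresh executor
    manager.stop();
  }
}
```

Reusing the shut-down executor in `start()` would throw `RejectedExecutionException` on the second `submit`, which is exactly the failure mode the commit describes.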



Re: Adding GEODE-7412 to 1.11 release

2019-11-13 Thread Mark Hanson
Thanks Udo.

> On Nov 13, 2019, at 10:32 AM, Udo Kohlmeyer  wrote:
> 
> @Mark,
> 
> According to investigation that has been done, GEODE-7412 is a non-issue.. 
> Fixed by another ticket, that is already part of /1.11/ and /develop/
> 
> --Udo
> 
> On 11/8/19 3:35 PM, Mark Hanson wrote:
>> Hi,
>> 
>> Can someone give me a known good sha?
>> 
>> I will add it in.
>> 
>> Thanks,
>> Mark
>> 
>>> On Nov 8, 2019, at 2:20 PM, Jens Deppe  wrote:
>>> 
>>> Hmm, I thought this was implicitly fixed by various build refactorings that
>>> went into 1.11.0. I see that we're creating local maven artifacts for
>>> geode-pulse-1.11.0.war. The next question is then whether we're actually
>>> publishing those.
>>> 
>>> --Jens
>>> 
>>> On Fri, Nov 8, 2019 at 12:56 PM Owen Nichols  wrote:
>>> 
>>>> +1
>>>> 
>>>>> On Nov 8, 2019, at 12:54 PM, Udo Kohlmeyer  wrote:
>>>>> 
>>>>> @Owen,
>>>>> 
>>>>> I did not specifically check all web archives for this issue. Yes, it
>>>> *should* have been caught in that ticket.
>>>>> On 11/8/19 11:03 AM, Owen Nichols wrote:
>>>>>> I’m curious, how was this missed in
>>>> https://issues.apache.org/jira/browse/GEODE-7241 <
>>>> https://issues.apache.org/jira/browse/GEODE-7241> ?
>>>>>>> On Nov 8, 2019, at 10:56 AM, Udo Kohlmeyer  wrote:
>>>>>>> 
>>>>>>> Hi there Geode Dev,
>>>>>>> 
>>>>>>> I would like to request that we add
>>>> https://issues.apache.org/jira/browse/GEODE-7412 <
>>>> https://issues.apache.org/jira/browse/GEODE-7412> to the 1.11 release.
>>>>>>> This change is a build change that has crept in since 1.8. The issue
>>>> that is to be fixed is that the web archive, (geode-pulse) has since 1.8
>>>> been uploaded as a jar file to maven central.
>>>>>>> I would like to fix this by having the WAR artifact being pushed to
>>>> maven central again.
>>>>>>> --Udo
>>>>>>> 
>>>> 



Re: Release candidate target date...

2019-11-19 Thread Mark Hanson
Hello Geode Dev Community,

I will be building a release candidate in about 5 hours. I am not aware of any 
serious issues at this moment. If you have any last-minute check-ins for 
1.11.0, now is the time.

Thanks,
Mark

> On Nov 12, 2019, at 3:00 PM, Mark Hanson  wrote:
> 
> Hello Geode Dev Community,
> 
> We have a release branch for Apache Geode 1.11.0 - “release/1.11.0”. People 
> have been voting to pick up fixes and I have been cherry picking them 
> accordingly. I think it is important that we look forward to a release date 
> and I have picked November 19th, 2019 as a time to try for. As with any 
> process, if a sufficiently scary issue comes up we will reschedule, but this 
> seems like a reasonable date.
> 
> As always, please contact me with any concerns you might have. If no concerns 
> are raised, we will start voting on the release on November 20th, 2019.
> Regards
> Mark Hanson
> 



Access to upload Apache Geode to Docker Hub.

2019-11-19 Thread Mark Hanson
Hi, 

I would like to have access to upload to Docker Hub, so I can release Apache 
Geode to Docker Hub.

My DockerHub ID is mhansonp.

Thanks,
Mark

[VOTE] Apache Geode 1.11.0.RC2

2019-11-20 Thread Mark Hanson
Hello Geode Dev Community,

This is a release candidate for Apache Geode version 1.11.0.RC2.
Thanks to all the community members for their contributions to this release!

Please do a review and give your feedback, including the checks you performed.

Voting deadline:
3PM PST Mon, November 25 2019.

Please note that we are voting upon the source tag:
rel/v1.11.0.RC2

Release notes:
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.11.0

Source and binary distributions:
https://dist.apache.org/repos/dist/dev/geode/1.11.0.RC2/

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachegeode-1062

GitHub:
https://github.com/apache/geode/tree/rel/v1.11.0.RC2
https://github.com/apache/geode-examples/tree/rel/v1.11.0.RC2
https://github.com/apache/geode-native/tree/rel/v1.11.0.RC2

Pipelines:
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-main
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-rc

Geode's KEYS file containing PGP keys we use to sign the release:
https://github.com/apache/geode/blob/develop/KEYS

Command to run geode-examples:
./gradlew 
-PgeodeReleaseUrl=https://dist.apache.org/repos/dist/dev/geode/1.11.0.RC2 
-PgeodeRepositoryUrl=https://repository.apache.org/content/repositories/orgapachegeode-1062
 build runAll




Access to edit the Geode Wiki

2019-11-20 Thread Mark Hanson
Hi All,

Can I have access to edit the Geode Wiki to add release notes? My confluence ID 
is "mhanson".

Thanks,
Mark

Re: [VOTE] Apache Geode 1.11.0.RC2

2019-11-21 Thread Mark Hanson
Agreed. 


> On Nov 21, 2019, at 11:38 AM, Dan Smith  wrote:
> 
> -1 based on Blakes feedback. If we need to change the native client tag and
> source zip, we need to create RC3.
> 
> -Dan
> 
> On Thu, Nov 21, 2019 at 11:31 AM Blake Bender  wrote:
> 
>> We could also just tag the current NC develop branch.  I've run all my
>> integration tests from NC develop branch against 1.11.0.RC2, and it looks
>> good on Mac, Windows and Linux.
>> 
>> 
>> On Thu, Nov 21, 2019 at 11:25 AM Blake Bender  wrote:
>> 
>>> If it's a requirement that the native client runs properly, I'm a -1.
>>> There's a segfault bug in NC logging (see
>>> https://github.com/apache/geode-native/pull/545), and the release tag is
>>> prior to the fix.  I'd prefer to pick up NC from this commit:
>>> 
>> https://github.com/apache/geode-native/commit/c932a1d765809dcd8d7a0c0f14a3aa6c98e18a5c
>>> .
>>> 
>>> Thanks,
>>> 
>>> Blake
>>> 
>>> 
>>> On Wed, Nov 20, 2019 at 5:16 PM Mark Hanson  wrote:
>>> 
>>>> Hello Geode Dev Community,
>>>> 
>>>> This is a release candidate for Apache Geode version 1.11.0.RC2.
>>>> Thanks to all the community members for their contributions to this
>>>> release!
>>>> 
>>>> Please do a review and give your feedback, including the checks you
>>>> performed.
>>>> 
>>>> Voting deadline:
>>>> 3PM PST Mon, November 25 2019.
>>>> 
>>>> Please note that we are voting upon the source tag:
>>>> rel/v1.11.0.RC2
>>>> 
>>>> Release notes:
>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.11.0
>>>> 
>>>> Source and binary distributions:
>>>> https://dist.apache.org/repos/dist/dev/geode/1.11.0.RC2/
>>>> 
>>>> Maven staging repo:
>>>> https://repository.apache.org/content/repositories/orgapachegeode-1062
>>>> 
>>>> GitHub:
>>>> https://github.com/apache/geode/tree/rel/v1.11.0.RC2
>>>> https://github.com/apache/geode-examples/tree/rel/v1.11.0.RC2
>>>> https://github.com/apache/geode-native/tree/rel/v1.11.0.RC2
>>>> 
>>>> Pipelines:
>>>> 
>>>> 
>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-main
>>>> 
>>>> 
>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-rc
>>>> 
>>>> Geode's KEYS file containing PGP keys we use to sign the release:
>>>> https://github.com/apache/geode/blob/develop/KEYS
>>>> 
>>>> Command to run geode-examples:
>>>> ./gradlew -PgeodeReleaseUrl=
>>>> https://dist.apache.org/repos/dist/dev/geode/1.11.0.RC2
>>>> -PgeodeRepositoryUrl=
>>>> https://repository.apache.org/content/repositories/orgapachegeode-1062
>>>> build runAll
>>>> 
>>>> 
>>>> 
>> 



Re: Cache.close is not synchronous?

2019-11-25 Thread Mark Hanson
+1 to fix.

> On Nov 25, 2019, at 2:02 PM, John Blum  wrote:
> 
> +1 ^ 64!
> 
> I found this out the hard way some time ago and is why STDG exists in the
> first place (i.e. usability issues, particularly with testing).
> 
> On Mon, Nov 25, 2019 at 1:41 PM Kirk Lund  wrote:
> 
>> I found a test that closes the cache and then recreates the cache multiple
>> times with 2 second sleep between each. I tried to remove the Thread.sleep
>> and found that recreating the cache
>> throws DistributedSystemDisconnectedException (see below).
>> 
>> This seems like a usability nightmare. Anyone have any ideas WHY it's this
>> way?
>> 
>> Personally, I want Cache.close() to block until both Cache and
>> DistributedSystem are closed and the API is ready to create a new Cache.
>> 
>> org.apache.geode.distributed.DistributedSystemDisconnectedException: This
>> connection to a distributed system has been disconnected.
>>at
>> 
>> org.apache.geode.distributed.internal.InternalDistributedSystem.checkConnected(InternalDistributedSystem.java:945)
>>at
>> 
>> org.apache.geode.distributed.internal.InternalDistributedSystem.getDistributionManager(InternalDistributedSystem.java:1665)
>>at
>> 
>> org.apache.geode.internal.cache.GemFireCacheImpl.<init>(GemFireCacheImpl.java:791)
>>at
>> 
>> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:187)
>>at
>> 
>> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:158)
>>at
>> org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
>> 
> 
> 
> -- 
> -John
> john.blum10101 (skype)



[VOTE] Release candidate for Apache Geode version 1.11.0.RC3.

2019-11-25 Thread Mark Hanson
Hello Geode Dev Community,

This is a release candidate for Apache Geode version 1.11.0.RC3.
Thanks to all the community members for their contributions to this release!

Please do a review and give your feedback, including the checks you performed.

Voting deadline:
11AM PST Monday December 2 2019.

Please note that we are voting upon the source tag:
rel/v1.11.0.RC3

Release notes:
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.11.0

Source and binary distributions:
https://dist.apache.org/repos/dist/dev/geode/1.11.0.RC3/

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachegeode-1063

GitHub:
https://github.com/apache/geode/tree/rel/v1.11.0.RC3
https://github.com/apache/geode-examples/tree/rel/v1.11.0.RC3
https://github.com/apache/geode-native/tree/rel/v1.11.0.RC3

Pipelines:
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-main
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-rc

Geode's KEYS file containing PGP keys we use to sign the release:
https://github.com/apache/geode/blob/develop/KEYS

Command to run geode-examples:
./gradlew 
-PgeodeReleaseUrl=https://dist.apache.org/repos/dist/dev/geode/1.11.0.RC3 
-PgeodeRepositoryUrl=https://repository.apache.org/content/repositories/orgapachegeode-1063
 build runAll

Regards
Mark Hanson

Re: Request for addition to 1.11 RC: GEODE-7454: Docs for Cluster Management Service

2019-12-03 Thread Mark Hanson
Done.

> On Dec 3, 2019, at 9:24 AM, Dave Barnes  wrote:
> 
> Docs for a feature that's already implemented - no code changes.
> Can be cherry-picked from the develop branch as-is with no modifications.
> https://github.com/apache/geode/commit/e48f34048c574440ed7e0640f42e9c82d789becb



Re: [VOTE] Release candidate for Apache Geode version 1.11.0.RC3.

2019-12-04 Thread Mark Hanson
Just an update…

1.11.0.RC3 is not going out. We are in a holding pattern on RC4 due to the 
issue that Lynn mentioned and other issues found.

This is another strike against that RC3 release. 

If the contributors deem the fix necessary (I assume they would), we will put 
in a fix for that as well.

I will provide the full list of outstanding issues shortly.

Thanks,
Mark 

> On Dec 4, 2019, at 11:16 AM, John Blum  wrote:
> 
> I am changing my vote to -1!
> 
> I have filed GEODE-7531 [1],
> which is a serious blocking issue for all things *Spring* for Apache
> Geode.  This issue alone is currently preventing me from upgrading *Spring
> Boot for Apache Geode* (SBDG) to Apache Geode 1.10, which I plan to do in
> SBDG 1.3, which is based on *Spring Data for Apache Geode* (SDG)
> Neumann/2.3, which is currently already pulling in Apache Geode 1.10, soon
> to be upgraded to 1.11 once this issue is resolved and 1.11 is available.
> 
> If you need further explanation, you don't need to look any further than
> the description as well as my comment
> 
> [2].
> 
> -j
> 
> [1] https://issues.apache.org/jira/browse/GEODE-7531
> [2]
> https://issues.apache.org/jira/browse/GEODE-7531?focusedCommentId=16988096&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16988096
> 
> 
> On Wed, Dec 4, 2019 at 9:24 AM John Blum  wrote:
> 
>> Indeed, both dependencies (geode-logging & geode-serialization) are
>> listed as runtime dependencies.
>> 
>> *> Is SDG creating its dependencies manually?*
>> 
>> I am not quite following your thinking on this question.  Of course SDG
>> uses transitive dependencies. SDG must declare direct dependencies on
>> geode-core, geode-cq, geode-lucene and geode-wan, as it is using those
>> features' APIs to implement the functionality provided by SDG.
>> 
>> Anyway, it is because Apache Geode's public API is broken/incomplete
>> (especially from a framework/tooling perspective, but even an application
>> perspective in many cases) that SDG must rely on certain (non-protected)
>> "internal" APIs.  It turns out that those "internal" classes have hard
>> (i.e. compile-time) dependencies on geode-logging & geode-serialization
>> to even build a project (application, framework or otherwise) using those
>> classes with Apache Geode.
>> 
>> I am currently exploring whether I can alter the use of the "internal"
>> class(es) to avoid forcing a compile-time dependency.
>> 
>> -j
>> 
>> 
>> On Mon, Dec 2, 2019 at 12:42 PM Jacob Barrett  wrote:
>> 
>>> 
>>> 
 On Dec 1, 2019, at 2:40 PM, John Blum  wrote:
 
 After some modifications to Spring Data for Apache Geode (Spring Data
 Geode; SDG), I was finally able to build SDG with Apache Geode 1.11.
 
 While I support the modularization effort, I would make it very clear
>>> (in
 documentation) now that both geode-logging and geode-serialization are
 required on the application classpath when using Apache Geode.
 
 Technically, I am not entirely certain about geode-serialization (yet),
>>> but
 geode-logging is strictly required to use Apache Geode.  I need to run
>>> some
 more tests.
>>> 
>>> Both are properly listed as runtime scope in the geode-core POM. Is SDG
>>> creating its dependencies manually or using the transitive dependencies
>>> from the Geode POMs?
>>> 
>>> -Jake
>>> 
>>> 
>>> 
>>> 
>> 
>> --
>> -John
>> john.blum10101 (skype)
>> 
> 
> 
> -- 
> -John
> Spring Data Team



Re: [VOTE] Release candidate for Apache Geode version 1.11.0.RC3.

2019-12-04 Thread Mark Hanson
So, outstanding issues that I see right now are 

GEODE-7531
GEODE-7537
GEODE-7538

Thanks,
Mark

> On Dec 4, 2019, at 2:11 PM, John Blum  wrote:
> 
> This is not a test failure in SDG.  SDG builds fine with Apache Geode 1.11
> (and all tests pass), as I indicated above in my origin +0 vote.
> 
> This is a definitive problem for SBDG when using STDG to mock Apache Geode
> resources/objects, which is caused by GEODE-7531.
> 
> Either way, the design/code changes made in GEODE-6759 were poor and should
> be fixed regardless which GEODE-7531 describes.
> 
> -j
> 
> 
> On Wed, Dec 4, 2019 at 2:04 PM Dan Smith  wrote:
> 
>> On Wed, Dec 4, 2019 at 11:16 AM John Blum  wrote:
>> 
>>> I am changing my vote to -1!
>>> 
>>> I have filed GEODE-7531 <
>> https://issues.apache.org/jira/browse/GEODE-7531>
>>> [1],
>>> which is a serious blocking issue for all things *Spring* for Apache
>>> Geode.  This issue alone is currently preventing me from upgrading
>> *Spring
>>> Boot for Apache Geode* (SBDG) to Apache Geode 1.10, which I plan to do in
>>> SBDG 1.3, which is based on *Spring Data for Apache Geode* (SDG)
>>> Neumann/2.3, which is currently already pulling in Apache Geode 1.10,
>> soon
>>> to be upgraded to 1.11 once this issue is resolved and 1.11 is available.
>>> 
>> 
>> 
>> I commented on the above JIRA. While we definitely don't want to break
>> spring data geode, it looks like maybe the issue is just with one unit test
>> in Spring Data Geode using an internal geode API to inject something into a
>> singleton? If that's the case, I think it would be better to fix that one
>> test in SDG.
>> 
>> -Dan
>> 
> 
> 
> -- 
> -John
> Spring Data Team



Re: Request GEODE-7510/GEODE-7538 be cherry-picked into release 1.11

2019-12-11 Thread Mark Hanson
Can I get the SHA of the commit?

Thanks,
Mark

> On Dec 11, 2019, at 11:02 AM, Jason Huynh  wrote:
> 
> Hello,
> 
> GEODE-7538 was highlighted as blocking the 1.11 release.  This has now been
> addressed and propose that this gets merged over to release 1.11.
> 
> This issue solves a few things, most notably: GEODE-7510 shows
> inconsistency between secondaries and primaries.  GEODE-7538 showed
> operations not consistently being applied.  The code change is a revert of
> a change that modified profile calculation.  It may affect other areas that
> were showing up as flaky, as it is timing-related.
> 
> Thanks,
> -Jason



[DISCUSS] Adding a couple user APIs dealing with RegionFactory.

2019-12-11 Thread Mark Hanson
Hi All,

There was a suggestion that since I am making a couple of user-visible API changes, 
I should ask the dev list.

Basically, I was migrating code from AttributesFactory to RegionFactory and hit 
a few snags along the way.

Please see https://github.com/apache/geode/pull/4409, specifically Cache.java, 
RegionFactory.java, and for extra credit GemfireCacheImpl.java.

I have commented on the relevant changes.

Thanks,
Mark

Re: [DISCUSS] Adding a couple user APIs dealing with RegionFactory.

2019-12-11 Thread Mark Hanson


In Cache.java

+   <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory);

In RegionFactory.java
 + public RegionFactory(InternalCache cache, RegionFactory<K, V> regionFactory)
and
+ public RegionFactory(RegionFactory<K, V> regionFactory)

Lastly in GemfireCacheImpl.java
+ public <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory)

Thanks,
Mark
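
[Editor's note] A rough sketch of how the proposed overload might be used, for readers following the thread. This assumes the `createRegionFactory(RegionFactory)` copy method from PR #4409, which is a proposal under discussion here, not a released Geode API; the surrounding calls (`CacheFactory`, `createRegionFactory(RegionShortcut)`) are existing Geode APIs.

```java
// Sketch only: the copy overload below is the *proposed* API from PR #4409.
Cache cache = new CacheFactory().create();

// Build a "template" factory with common settings (existing API).
RegionFactory<String, String> template =
    cache.<String, String>createRegionFactory(RegionShortcut.PARTITION);

// Proposed: copy the template so per-region tweaks don't mutate it.
RegionFactory<String, String> copy = cache.createRegionFactory(template);
Region<String, String> region = copy.create("example");
```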


> On Dec 11, 2019, at 3:25 PM, Dan Smith  wrote:
> 
> I see a lot of PR comments on those PRs. What is the new API you added?
> 
> -Dan
> 
> On Wed, Dec 11, 2019 at 2:57 PM Mark Hanson  wrote:
> 
>> Hi All,
>> 
>> There was a suggestion that since I am making a couple user visible API
>> changes that I might want to ask the dev list.
>> 
>> Basically I was migrating code from AttributesFactory to RegionFactory and
>> hit a few snags along the way.
>> 
>> Please see https://github.com/apache/geode/pull/4409 <
>> https://github.com/apache/geode/pull/4409> specifically Cache.java,
>> RegionFactory.java, and for extra credit GemfireCacheImpl.java
>> 
>> I have commented at the relevant changes.
>> 
>> Thanks,
>> Mark



Re: [DISCUSS] Adding a couple user APIs dealing with RegionFactory.

2019-12-11 Thread Mark Hanson
In discussion with Dan, I made a few realizations, so I have new API coming. 

Basically dropping the use of InternalCache below.

Please hold off on reviewing.

Thanks,
Mark

> On Dec 11, 2019, at 3:34 PM, Mark Hanson  wrote:
> 
> 
> In Cache.java
> 
> +   <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory);
> 
> In RegionFactory.java
> + public RegionFactory(InternalCache cache, RegionFactory<K, V> regionFactory)
> and
> + public RegionFactory(RegionFactory<K, V> regionFactory)
> 
> Lastly in GemfireCacheImpl.java
> + public <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory)
> 
> Thanks,
> Mark
> 
> 
>> On Dec 11, 2019, at 3:25 PM, Dan Smith  wrote:
>> 
>> I see a lot of PR comments on those PRs. What is the new API you added?
>> 
>> -Dan
>> 
>> On Wed, Dec 11, 2019 at 2:57 PM Mark Hanson  wrote:
>> 
>>> Hi All,
>>> 
>>> There was a suggestion that since I am making a couple user visible API
>>> changes that I might want to ask the dev list.
>>> 
>>> Basically I was migrating code from AttributesFactory to RegionFactory and
>>> hit a few snags along the way.
>>> 
>>> Please see https://github.com/apache/geode/pull/4409 <
>>> https://github.com/apache/geode/pull/4409> specifically Cache.java,
>>> RegionFactory.java, and for extra credit GemfireCacheImpl.java
>>> 
>>> I have commented at the relevant changes.
>>> 
>>> Thanks,
>>> Mark
> 



Re: [DISCUSS] Adding a couple user APIs dealing with RegionFactory.

2019-12-11 Thread Mark Hanson
Basically the point is to allow a user to copy a RegionFactory, because under 
certain circumstances it is necessary. I found that when migrating from 
AttributesFactory.

Thanks,
Mark

> On Dec 11, 2019, at 3:48 PM, Anthony Baker  wrote:
> 
> Mark,
> 
> Can you share how the API changes will help the user?  
> 
> Thanks,
> Anthony
> 
> 
>> On Dec 11, 2019, at 2:57 PM, Mark Hanson  wrote:
>> 
>> Hi All,
>> 
>> There was a suggestion that since I am making a couple user visible API 
>> changes that I might want to ask the dev list.
>> 
>> Basically I was migrating code from AttributesFactory to RegionFactory and 
>> hit a few snags along the way.
>> 
>> Please see https://github.com/apache/geode/pull/4409 
>> <https://github.com/apache/geode/pull/4409> specifically Cache.java, 
>> RegionFactory.java, and for extra credit GemfireCacheImpl.java
>> 
>> I have commented at the relevant changes.
>> 
>> Thanks,
>> Mark
> 



Re: [DISCUSS] Adding a couple user APIs dealing with RegionFactory.

2019-12-11 Thread Mark Hanson
Yup, fixing that.

> On Dec 11, 2019, at 3:38 PM, Darrel Schneider  wrote:
> 
> Don't expose "InternalCache" on RegionFactory. You probably want it to be
> "Cache".
> 
> On Wed, Dec 11, 2019 at 3:35 PM Mark Hanson  wrote:
> 
>> 
>> In Cache.java
>> 
>> +   <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory);
>> 
>> In RegionFactory.java
>> + public RegionFactory(InternalCache cache, RegionFactory<K, V> regionFactory)
>> and
>> + public RegionFactory(RegionFactory<K, V> regionFactory)
>> 
>> Lastly in GemfireCacheImpl.java
>> + public <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory)
>> 
>> Thanks,
>> Mark
>> 
>> 
>>> On Dec 11, 2019, at 3:25 PM, Dan Smith  wrote:
>>> 
>>> I see a lot of PR comments on those PRs. What is the new API you added?
>>> 
>>> -Dan
>>> 
>>> On Wed, Dec 11, 2019 at 2:57 PM Mark Hanson  wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> There was a suggestion that since I am making a couple user visible API
>>>> changes that I might want to ask the dev list.
>>>> 
>>>> Basically I was migrating code from AttributesFactory to RegionFactory
>> and
>>>> hit a few snags along the way.
>>>> 
>>>> Please see https://github.com/apache/geode/pull/4409 <
>>>> https://github.com/apache/geode/pull/4409> specifically Cache.java,
>>>> RegionFactory.java, and for extra credit GemfireCacheImpl.java
>>>> 
>>>> I have commented at the relevant changes.
>>>> 
>>>> Thanks,
>>>> Mark
>> 
>> 



Re: [VOTE] Release candidate for Apache Geode version 1.11.0.RC3.

2019-12-11 Thread Mark Hanson
Hi All,

It does not look like we have an assignee for GEODE-7531. Any takers?

Thanks,
Mark

> On Dec 4, 2019, at 2:35 PM, Mark Hanson  wrote:
> 
> So, outstanding issues that I see right now are 
> 
> GEODE-7531
> GEODE-7537
> GEODE-7538
> 
> Thanks,
> Mark
> 
>> On Dec 4, 2019, at 2:11 PM, John Blum  wrote:
>> 
>> This is not a test failure in SDG.  SDG builds fine with Apache Geode 1.11
>> (and all tests pass), as I indicated above in my origin +0 vote.
>> 
>> This is a definitive problem for SBDG when using STDG to mock Apache Geode
>> resources/objects, which is caused by GEODE-7531.
>> 
>> Either way, the design/code changes made in GEODE-6759 were poor and should
>> be fixed regardless which GEODE-7531 describes.
>> 
>> -j
>> 
>> 
>> On Wed, Dec 4, 2019 at 2:04 PM Dan Smith  wrote:
>> 
>>> On Wed, Dec 4, 2019 at 11:16 AM John Blum  wrote:
>>> 
>>>> I am changing my vote to -1!
>>>> 
>>>> I have filed GEODE-7531 <
>>> https://issues.apache.org/jira/browse/GEODE-7531>
>>>> [1],
>>>> which is a serious blocking issue for all things *Spring* for Apache
>>>> Geode.  This issue alone is currently preventing me from upgrading
>>> *Spring
>>>> Boot for Apache Geode* (SBDG) to Apache Geode 1.10, which I plan to do in
>>>> SBDG 1.3, which is based on *Spring Data for Apache Geode* (SDG)
>>>> Neumann/2.3, which is currently already pulling in Apache Geode 1.10,
>>> soon
>>>> to be upgraded to 1.11 once this issue is resolved and 1.11 is available.
>>>> 
>>> 
>>> 
>>> I commented on the above JIRA. While we definitely don't want to break
>>> spring data geode, it looks like maybe the issue is just with one unit test
>>> in Spring Data Geode using an internal geode API to inject something into a
>>> singleton? If that's the case, I think it would be better to fix that one
>>> test in SDG.
>>> 
>>> -Dan
>>> 
>> 
>> 
>> -- 
>> -John
>> Spring Data Team
> 



Re: [DISCUSS] Adding a couple user APIs dealing with RegionFactory.

2019-12-13 Thread Mark Hanson
Hi Udo,

I disagree. I believe the currently published best practice is to use 
RegionFactory. As part of the work I have been doing, I have been migrating 
code from the AttributesFactory pattern to the RegionFactory pattern. To 
support that, I believe a copy constructor for RegionFactory is necessary, and 
thus a createRegionFactory method on Cache.

Hence my changes.

Thanks,
Mark
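
[Editor's note] For context, the migration described above looks roughly like the following. This is an illustrative sketch, not code from the PR; `cache` is assumed to be an existing `Cache`, and the region names are hypothetical. Both patterns shown are existing Geode APIs (the first is deprecated).

```java
// Old, deprecated pattern: build RegionAttributes, then create the region.
AttributesFactory<String, String> af = new AttributesFactory<>();
af.setDataPolicy(DataPolicy.REPLICATE);
Region<String, String> oldStyle =
    cache.createRegion("oldStyle", af.create());   // deprecated API

// Current pattern: RegionFactory configures and creates in one chain.
Region<String, String> newStyle = cache
    .<String, String>createRegionFactory(RegionShortcut.REPLICATE)
    .create("newStyle");
```

The snag motivating the copy constructor: an AttributesFactory can be freely copied via its attributes, while RegionFactory (without the proposed change) offers no equivalent way to clone a configured factory.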

> On Dec 11, 2019, at 4:41 PM, Udo Kohlmeyer  wrote:
> 
> I think at this point I'd be looking at the new V2 Management API's for 
> Regions.
> 
> I think any new "public" effort that we'd be adding to the product should be 
> done through the Management API's for Regions, rather than exposing new 
> public API's that in reality should not be made "public".
> 
> --Udo
> 
> On 12/11/19 3:53 PM, Mark Hanson wrote:
>> Basically the point is to allow a use to copy a RegionFactory because under 
>> certain circumstances it is necessary. I found that when migrating from 
>> AttributesFactory.
>> 
>> Thanks,
>> Mark
>> 
>>> On Dec 11, 2019, at 3:48 PM, Anthony Baker  wrote:
>>> 
>>> Mark,
>>> 
>>> Can you share how the API changes will help the user?
>>> 
>>> Thanks,
>>> Anthony
>>> 
>>> 
>>>> On Dec 11, 2019, at 2:57 PM, Mark Hanson  wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> There was a suggestion that since I am making a couple user visible API 
>>>> changes that I might want to ask the dev list.
>>>> 
>>>> Basically I was migrating code from AttributesFactory to RegionFactory and 
>>>> hit a few snags along the way.
>>>> 
>>>> Please see https://github.com/apache/geode/pull/4409 
>>>> <https://github.com/apache/geode/pull/4409> specifically Cache.java, 
>>>> RegionFactory.java, and for extra credit GemfireCacheImpl.java
>>>> 
>>>> I have commented at the relevant changes.
>>>> 
>>>> Thanks,
>>>> Mark



Re: [REQUEST] Squash merge please

2019-12-16 Thread Mark Hanson
I would strongly prefer commits that are as small as possible and as large as 
necessary. I am less particular when it comes to PR sizes. Depending on 
what is done in a PR, I don’t think it makes sense to issue a blanket statement 
that all PRs are one commit. There is a strong reason to make them one 
commit, but one size does not fit all. A great example is a refactor and a bug 
fix in one PR.

Thanks,
Mark
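
[Editor's note] The squash merge requested in this thread can be demonstrated locally. This is an illustrative demo: the repository, branch name, and ticket number are made up.

```shell
# Demo: squash-merging a multi-commit branch keeps develop's history linear.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name dev
git checkout -qb develop
echo base > file.txt && git add file.txt && git commit -qm "base"
git checkout -qb feature/GEODE-1234
echo refactor >> file.txt && git commit -qam "GEODE-1234: refactor"
echo fix >> file.txt && git commit -qam "GEODE-1234: fix"
git checkout -q develop
git merge --squash -q feature/GEODE-1234   # stages the combined diff, no commit yet
git commit -qm "GEODE-1234: one squashed commit"
git log --oneline --graph                  # linear: two commits, no merge commit
```

Because the result is a single commit on develop, `git bisect` and `git revert` operate on the whole change at once, which is the benefit Naba describes.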

> On Dec 16, 2019, at 9:47 AM, Kirk Lund  wrote:
> 
> Whenever I submit a PR with multiple commits that I intend to rebase to
> develop, I always try to ensure that each commit stands alone as is
> (compiles and passes tests). Separating file renames and refactoring from
> behavior changes into different commits seems very valuable to me, and I've
> had trouble getting people to review PRs without this separation (but it
> could be squashed as it's merged to develop).
> 
> It sounds to me like the real problem is (a) some PRs have multiple commits
> that don't compile or don't pass tests, and (b) some PRs that should be
> merged with squash are not (by accident most likely).
> 
> I can submit multiple PRs instead of one PR with multiple commits. So I'll
> change my response to -0 if that helps prevent commits to develop that
> don't compile or pass tests. Without preventing rebase or merge commits
> from github, I'm not sure how we can really enforce this or prevent (b)
> above.
> 
> On Fri, Dec 13, 2019 at 3:38 PM Alexander Murmann 
> wrote:
> 
>> I wonder if Kirk's and Naba's statements are necessarily at odds.
>> 
>> Make the change easy (warning: this may be hard), then make the easy
>>> change."
>> 
>> -Kent Beck
>> 
>> Following Kent Beck's advise might reasonably split into two commits. One
>> refactor commit and a separate commit that introduces the actual change.
>> They should be individually revertible and might be easier understood if
>> split out. I vividly remember a change on our code base where someone did a
>> huge amount of refactoring that resulted than in one parameter changing in
>> order to get the desired functionality change. If that was in one commit,
>> it would be hard to see the actual change. If split out, it's beautiful and
>> crystal clear.
>> 
>> I am unsure how that would be reflected in terms of JIRA ticket references.
>> Usually we assume that if there is a commit with the ticket number, the
>> issue is resolved. Maybe the key here is to create a separate refactoring
>> ticket.
>> 
>> Would that allow us to have our cake and eat it too?
>> 
>> On Fri, Dec 13, 2019 at 3:16 PM Nabarun Nag  wrote:
>> 
>>> It is to help with bisect operations when things start failing ... helps
>> us
>>> it revert and build faster.
>>> also with cherry picking features / fixes to previous versions .
>>> And keeping the git history clean with no unnecessary “merge commits”.
>>> 
>>> Regards
>>> Naba
>>> 
>>> 
>>> On Fri, Dec 13, 2019 at 2:25 PM Kirk Lund  wrote:
>>> 
 -1 I really like to sometimes have more than 1 commit in a PR and keep
>>> them
 separate when they merge to develop
 
 On Tue, Oct 22, 2019 at 5:12 PM Nabarun Nag  wrote:
 
> Hi Geode Committers,
> 
> A kind request for using squash commit instead of using merge.
> This will really help us in our bisect operations when a
> regression/flakiness in the product is introduced. We can automate
>> and
>>> go
> through fewer commits faster, avoiding commits like "spotless fix"
>> and
> "re-trigger precheck-in" or other minor commits in the merged branch.
> 
> Also, please use the commit format : (helps us to know who worked on
>>> it,
> what is the history)
> 
> 
> 
> *GEODE-: 
> * explanation line 1
> * explanation line 2*
> 
> This is not a rule or anything, but a request to help out your fellow
> committers in quickly detecting a problem.
> 
> For inspiration, we can look into Apache Kafka / Spark where they
>> have
>>> a
> complete linear graph for their main branch HEAD [see attachment]
> 
> Regards
> Naba.
> 
> 
> 
 
>>> 
>> 



[VOTE] Adding a couple user APIs dealing with RegionFactory.

2019-12-16 Thread Mark Hanson
Actually, I would say that it would not be necessary to have a copy constructor 
if it were not for the way the tests are written that assume an 
AttributesFactory. I think the discussion boils down to this…

Do we migrate to the RegionFactory API from AttributesFactory, or do we wait for 
the Management V2 API? I would heartily support the V2 API; I have used the 
REST version of it and it was great. That said, when will it arrive? 

We currently have a deprecated API and a current (wrapper) API. The Management 
V2 is an expected, but not realized API at this point. 

As a community, I would like us to decide whether I should revert my changes that 
modernize the tests by going to RegionFactory, and just put in the core fixes. 
I can do that; I am fine with that. I just don’t want to sit in the middle. 
Without the copy constructor, I don’t see how I can move to the RegionFactory 
API without significant test restructuring.

VOTE SUBJECT:

Stop migrating from AttributesFactory to RegionFactory and wait for Management 
V2 API.  

Again, I am 100% fine either way, I just want a clear direction. :)

This would mean reverting my modernization changes (totally fine) and then putting 
in only the fixes. In the near future, we would also stop migrating from 
AttributesFactory to RegionFactory while we wait for the Management V2 API.


Thanks,
Mark



> On Dec 16, 2019, at 10:47 AM, Udo Kohlmeyer  wrote:
> 
> -1 to adding a copy constructor onto any /**Factory*/ classes.
> 
> I have never seen a copy constructor on a Factory pattern in the wild.
> 
> That said, having done some Googling, this is something that people talk 
> about, so it cannot be THAT foreign.
> 
> There is one thing I want to point out here. There is work currently being 
> done on the new Management/Configuration V2 API's. One of the goals of this 
> project is to have a single API to create/update/delete and management the 
> configuration and creation (realization of configuration). I'm advocating 
> that new Management/Configuration v2 API's need to be interacting with it, 
> rather than a user directly. This, adding a "copy constructor" of 
> configuration should be added onto the v2 API's rather than on the 
> /**Factory*/ classes.
> 
> That said, having */*Factories/* as public API's that are use to create a 
> /*Region*/, that is correct for the module. But any utility capability to 
> copy configuration needs to be addressed by the Management API and not the 
> /*Factory*/ classes.
> 
> --Udo
> 
> 
> On 12/13/19 10:01 AM, Darrel Schneider wrote:
>> The v2 management api allows regions to be created remotely in cluster
>> configuration and on running servers. But it does not allow you to create a
>> region on a client. Non-spring applications can use RegionFactory to create
>> the region on the client side.
>> 
>> 
>> On Fri, Dec 13, 2019 at 9:56 AM Dan Smith  wrote:
>> 
>>> +1 to adding a way to copy the RegionAttributes.
>>> 
>>> BTW, I really wish this RegionFactory was an interface. I don't know if
>>> adding a copy constructor makes it harder to migrate to an interface later,
>>> but maybe just having this single public method might be better?
>>> 
>>> +   <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory);
>>> 
>>> On Fri, Dec 13, 2019 at 9:35 AM Mark Hanson  wrote:
>>> 
>>>> Hi Udo,
>>>> 
>>>> I disagree. I believe the currently published best practice is to use
>>>> RegionFactory. As a part of the work I have been doing,  I have been
>>>> migrating code from the AttributesFactory pattern to the RegionFactory
>>>> pattern. To support that, I believe a copy constructor for RegionFactory
>>> is
>>>> necessary. And thus a createRegionFactory
>>>> 
>>>> Hence my changes.
>>>> 
>>>> Thanks,
>>>> Mark
>>>> 
>>>>> On Dec 11, 2019, at 4:41 PM, Udo Kohlmeyer 
>>>> wrote:
>>>>> I think at this point I'd be looking at the new V2 Management API's for
>>>> Regions.
>>>>> I think any new "public" effort that we'd be adding to the product
>>>> should be done through the Management API's for Regions, rather than
>>>> exposing new public API's that in reality should not be made "public".
>>>>> --Udo
>>>>> 
>>>>> On 12/11/19 3:53 PM, Mark Hanson wrote:
>>>>>> Basically the point is to allow a use to copy a RegionFactory because
>>>> under certain circumstances it is necessary. I found that when migrating
>&

Re: [VOTE] Adding a couple user APIs dealing with RegionFactory.

2019-12-16 Thread Mark Hanson
It has been pointed out that my previous vote subject was framed so that agreement required a negative vote, which is counter-intuitive. Restating:

VOTE SUBJECT:

Should we continue migrating from AttributesFactory usage to RegionFactory 
usage and merge the RegionFactory copy constructor.


+1 to Migrate to RegionFactory from AttributesFactory and merge the 
RegionFactory copy constructor 
 0  don’t care
-1 stop migrating from AttributesFactory to RegionFactory and wait for 
Management V2 API.  


> On Dec 16, 2019, at 11:08 AM, Mark Hanson  wrote:
> 
> Actually, I would say that it would not be necessary to have a copy 
> constructor if it were not for the way the tests are written that assume an 
> AttributesFactory. I think the discussion boils down to this…
> 
> Do we migrate to the RegionFactory API from AttributesFactory or do we wait 
> for the Management V2 API. I would heartily support the V2 API, I have used 
> the REST version of it and it was great. That said, when will that arrive. 
> 
> We currently have a deprecated API and a current (wrapper) API. The 
> Management V2 is an expected, but not realized API at this point. 
> 
> As a community, I would like to decide if I should revert my changes to 
> modernize the test, but going to RegionFactory and just put in the core 
> fixes. I can do that. I am fine with that. I just don’t want to sit in the 
> middle. I don’t see how I can move to RegionFactory API without a significant 
> test restructuring without the copy constructor.
> 
> VOTE SUBJECT:
> 
> Stop migrating from AttributesFactory to RegionFactory and wait for 
> Management V2 API.  
> 
> Again, I am 100% fine either way, I just want a clear direction. :)
> 
> This would mean reverting my changes to update, (totally fine) and then 
> putting in only the fixes. In the near future, we would also stop migrating 
> from AttributesFactory to RegionFactory while we wait for the Management V2 
> API.
> 
> 
> Thanks,
> Mark
> 
> 
> 
>> On Dec 16, 2019, at 10:47 AM, Udo Kohlmeyer > <mailto:ukohlme...@pivotal.io>> wrote:
>> 
>> -1 to adding a copy constructor onto any /**Factory*/ classes.
>> 
>> I have never seen a copy constructor on a Factory pattern in the wild.
>> 
>> That said, having done some Googling, this is something that people talk 
>> about, so it cannot be THAT foreign.
>> 
>> There is one thing I want to point out here. There is work currently being 
>> done on the new Management/Configuration V2 API's. One of the goals of this 
>> project is to have a single API to create/update/delete and management the 
>> configuration and creation (realization of configuration). I'm advocating 
>> that new Management/Configuration v2 API's need to be interacting with it, 
>> rather than a user directly. This, adding a "copy constructor" of 
>> configuration should be added onto the v2 API's rather than on the 
>> /**Factory*/ classes.
>> 
>> That said, having */*Factories/* as public API's that are use to create a 
>> /*Region*/, that is correct for the module. But any utility capability to 
>> copy configuration needs to be addressed by the Management API and not the 
>> /*Factory*/ classes.
>> 
>> --Udo
>> 
>> 
>> On 12/13/19 10:01 AM, Darrel Schneider wrote:
>>> The v2 management api allows regions to be created remotely in cluster
>>> configuration and on running servers. But it does not allow you to create a
>>> region on a client. Non-spring applications can use RegionFactory to create
>>> the region on the client side.
>>> 
>>> 
>>> On Fri, Dec 13, 2019 at 9:56 AM Dan Smith >> <mailto:dsm...@pivotal.io>> wrote:
>>> 
>>>> +1 to adding a way to copy the RegionAttributes.
>>>> 
>>>> BTW, I really wish this RegionFactory was an interface. I don't know if
>>>> adding a copy constructor makes it harder to migrate to an interface later,
>>>> but maybe just having this single public method might be better?
>>>> 
>>>> +   <K, V> RegionFactory<K, V> createRegionFactory(RegionFactory<K, V> regionFactory);
>>>> 
>>>> On Fri, Dec 13, 2019 at 9:35 AM Mark Hanson >>> <mailto:mhan...@pivotal.io>> wrote:
>>>> 
>>>>> Hi Udo,
>>>>> 
>>>>> I disagree. I believe the currently published best practice is to use
>>>>> RegionFactory. As a part of the work I have been doing,  I have been
>>>>> migrating code from the AttributesFactory pattern to the RegionFactory
>>>>> pattern. To support that, I believe a copy constructor for RegionFactory
>>>> i

Re: Propose bringing GEODE-7537 to release/1.11.0

2019-12-16 Thread Mark Hanson
Yes, I agree, but I am waiting on a fix. Eric is supposed to let me know when 
he feels it's good to go.

Thanks,
Mark

> On Dec 16, 2019, at 2:32 PM, Owen Nichols  wrote:
> 
> Sorry, the correct ticket is GEODE-7537.  This was on the short 
> list previously posted as fixes 1.11.0 is waiting for.
> 
>> On Dec 16, 2019, at 2:12 PM, Owen Nichols  wrote:
>> 
>> I propose backporting this from develop to the current release branch.
>> 
>> This fix has been on develop since Dec 5 with no issues.  It is critical 
>> because without it, a hang in GII/rebalance is possible when a persistent AEQ is 
>> enabled.  This issue is believed to have been present since 1.10.0.
> 



Re: [VOTE] Adding a couple user APIs dealing with RegionFactory.

2019-12-17 Thread Mark Hanson
Hi All, 

In working with Udo, I moved the needed public API into a helper class. The 
RegionFactory copy constructor still remains in RegionFactory but is protected. 
The only way to use the method now is by creating a helper class that extends 
RegionFactory.

Thanks for all your help. I think this resolves the discussion.

Thanks,
Mark
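
[Editor's note] The helper-class approach Mark describes might look like the following. This is a sketch only: it assumes the protected copy constructor on RegionFactory that resulted from this discussion, and the helper class name is hypothetical.

```java
// Sketch only: assumes RegionFactory gained a protected copy constructor,
// as described above. The helper class name is hypothetical.
public class RegionFactoryCopyHelper<K, V> extends RegionFactory<K, V> {
  public RegionFactoryCopyHelper(RegionFactory<K, V> original) {
    super(original);   // the protected copy constructor under discussion
  }
}

// Usage: copy a configured "template" factory without mutating the original.
RegionFactory<String, String> copy = new RegionFactoryCopyHelper<>(template);
```

Keeping the constructor protected means ordinary callers cannot copy a factory accidentally; only code that deliberately opts in via a subclass (such as test infrastructure) gains the capability.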

> On Dec 16, 2019, at 12:56 PM, Owen Nichols  wrote:
> 
> A -1 vote on a code change should be framed as a “request for change”.  Udo, 
> you’ve made it clear what you don’t want, but not what it would take to make 
> PR #4409 acceptable to you.
> 
> “Management V2 API” is unlikely to solve all problems in the near term, and 
> even to do so, it needs a sound underlying API to build on.  Can you suggest 
> a way to resolve this that makes sense both for the current codebase as well 
> as paving the eventual way for the future?
> 
>> On Dec 16, 2019, at 12:13 PM, Mark Hanson  wrote:
>> 
>> It has been said I have a negative vote which is counter intuitive.
>> 
>> VOTE SUBJECT:
>> 
>> Should we continue migrating from AttributesFactory usage to RegionFactory 
>> usage and merge the RegionFactory copy constructor.
>> 
>> 
>> +1 to Migrate to RegionFactory from AttributesFactory and merge the 
>> RegionFactory copy constructor 
>> 0  don’t care
>> -1 stop migrating from AttributesFactory to RegionFactory and wait for 
>> Management V2 API.  
>> 
>> 
>>> On Dec 16, 2019, at 11:08 AM, Mark Hanson >> <mailto:mhan...@pivotal.io>> wrote:
>>> 
>>> Actually, I would say that it would not be necessary to have a copy 
>>> constructor if it were not for the way the tests are written that assume an 
>>> AttributesFactory. I think the discussion boils down to this…
>>> 
>>> Do we migrate to the RegionFactory API from AttributesFactory or do we wait 
>>> for the Management V2 API. I would heartily support the V2 API, I have used 
>>> the REST version of it and it was great. That said, when will that arrive. 
>>> 
>>> We currently have a deprecated API and a current (wrapper) API. The 
>>> Management V2 is an expected, but not realized API at this point. 
>>> 
>>> As a community, I would like to decide if I should revert my changes to 
>>> modernize the test, but going to RegionFactory and just put in the core 
>>> fixes. I can do that. I am fine with that. I just don’t want to sit in the 
>>> middle. I don’t see how I can move to RegionFactory API without a 
>>> significant test restructuring without the copy constructor.
>>> 
>>> VOTE SUBJECT:
>>> 
>>> Stop migrating from AttributesFactory to RegionFactory and wait for 
>>> Management V2 API.  
>>> 
>>> Again, I am 100% fine either way, I just want a clear direction. :)
>>> 
>>> This would mean reverting my changes to update, (totally fine) and then 
>>> putting in only the fixes. In the near future, we would also stop migrating 
>>> from AttributesFactory to RegionFactory while we wait for the Management V2 
>>> API.
>>> 
>>> 
>>> Thanks,
>>> Mark
>>> 
>>> 
>>> 
>>>> On Dec 16, 2019, at 10:47 AM, Udo Kohlmeyer >>> <mailto:ukohlme...@pivotal.io> <mailto:ukohlme...@pivotal.io 
>>>> <mailto:ukohlme...@pivotal.io>>> wrote:
>>>> 
>>>> -1 to adding a copy constructor onto any /**Factory*/ classes.
>>>> 
>>>> I have never seen a copy constructor on a Factory pattern in the wild.
>>>> 
>>>> That said, having done some Googling, this is something that people talk 
>>>> about, so it cannot be THAT foreign.
>>>> 
>>>> There is one thing I want to point out here. There is work currently being 
>>>> done on the new Management/Configuration V2 API's. One of the goals of 
>>>> this project is to have a single API to create/update/delete and 
>>>> management the configuration and creation (realization of configuration). 
>>>> I'm advocating that new Management/Configuration v2 API's need to be 
>>>> interacting with it, rather than a user directly. This, adding a "copy 
>>>> constructor" of configuration should be added onto the v2 API's rather 
>>>> than on the /**Factory*/ classes.
>>>> 
>>>> That said, having */*Factories/* as public API's that are use to create a 
>>>> /*Region*/, that is correct for the module. But any utility capability to 
>>>

[DISCUSS] Does anyone know of any other issues for 1.11.0?

2019-12-19 Thread Mark Hanson
Hi All,

I believe at this point that all outstanding fixes have been included in 
1.11.0. I would like to do an RC build unless someone advises me otherwise by 
3pm PST. Everyone will still have a few days to try the release candidate, as 
usual.

Thanks,
Mark

Re: Proposal to including GEODE-7593 in release/1.11.0

2019-12-19 Thread Mark Hanson
I have added the SHA to the branch.

Thanks,
Mark

> On Dec 19, 2019, at 12:25 PM, Ivan Godwin  wrote:
> 
> +1
> 
> On Thu, Dec 19, 2019 at 11:43 AM Dick Cavender  wrote:
> 
>> +1
>> 
>> On Thu, Dec 19, 2019 at 11:27 AM Udo Kohlmeyer  wrote:
>> 
>>> +1
>>> 
>>> On 12/19/19 10:05 AM, Owen Nichols wrote:
 GEODE-7593 fixes a memory leak where indexes could retain references to
>>> pdx values when eviction should have released that memory.
 
 This is not a new issue, but is critical because system stability is
>>> threatened when eviction does not release memory as expected.
 
 The SHA is 1beec9e3930a071031b960f045874fb337e72e7c.
>>> 
>> 



Re: [DISCUSS] Proposal to require linear commit history on develop

2019-12-19 Thread Mark Hanson
+1

> On Dec 19, 2019, at 4:27 PM, Jinmei Liao  wrote:
> 
> +1
> 
> On Thu, Dec 19, 2019, 4:05 PM Owen Nichols  wrote:
> 
>> I’d like to advance this topic from an informal request/discussion to a
>> discussion of a concrete proposal.
>> 
>> To recap, it sounds like there is general agreement that commit history on
>> develop should be linear (no merge commits), and that the biggest obstacle
>> to this is that the PR merge button in GitHub creates a merge commit by
>> default.
>> 
>> I propose the following changes:
>> 1) Change GitHub settings to remove the ability to create a merge commit.
>> This will make Squash & Merge the default.
>> 
>> 2) Change GitHub settings to require linear history on develop.  This
>> prevents merge commits via command-line (not recommended, but wiki still
>> has instructions for merging PRs this way).
>> 
>> 3) Update the PR template to change the text "Is your initial contribution
>> a single, squashed commit” to “Is your initial contribution squashed for
>> ease of review (e.g. a single commit, or a ‘refactoring’ commit plus a
>> ‘fix’ commit)"
>> 
>> For clarity, I am proposing no-change in the following areas:
>> i) Leave Rebase & Merge as an option for PRs that have been structured to
>> benefit from it (this can make it easier in a bisect to see whether the
>> refactoring or the “fix” broke something).
>> ii) Leave existing wording in the wiki as-is [stating that squashing is
>> preferred].
>> 
>> 
>> Please comment via this email thread.
>> -Owen
>> 
>> 
>> 
>>> On Dec 16, 2019, at 10:49 AM, Kirk Lund  wrote:
>>> 
>>> I think it's already resolved Udo ;)
>>> 
>>> Here's the problem, if I fixup a dunit test by removing all uses of
>> "this."
>>> and I rename the dunit test, then git doesn't remember that the file is a
>>> rename -- it forever afterwards interprets it as a new file that I
>> created
>>> if I touch more than 50% of the lines (which "this." can easily do). If
>> we
>>> squash two commits: the rename and the cleanup of that dunit test -- then
>>> we effectively lose the history of that file and it shows that I created
>> a
>>> new file.
>>> 
>>> Also for the record, I've been working on Geode since the beginning and I
>>> was never made aware of this change in our process. I never voted on it.
>>> I'm not a big fan of changing various details in our process every single
>>> week. It's very easy to miss these discussions unless someone points it
>> out
>>> to me.
>>> 
>>> On Mon, Dec 16, 2019 at 10:34 AM Udo Kohlmeyer 
>>> wrote:
>>> 
 I'm not sure what this discussion is about... WE, as a community, have
 agreed on common practices, in two places no less...
 
 1) Quoting our PR template
 
 
 For all changes:
 
 * Is there a JIRA ticket associated with this PR? Is it referenced in
   the commit message?
 
 * Has your PR been rebased against the latest commit within the target
   branch (typically `develop`)?
 
 * *Is your initial contribution a single, squashed commit?*
 
 * Does `gradlew build` run cleanly?
 
 * Have you written or updated unit tests to verify your changes?
 
 * If adding new dependencies to the code, are these dependencies
   licensed in a way that is compatible for inclusion under ASF 2.0?
 
 On our PR template we call out that the initial PR commit should be
 squashed.
 
 2) https://cwiki.apache.org/confluence/display/GEODE/Code+contributions
 
 -- See "Accepting a PR Using the Command Line" - Point #3.
 
 
 @Kirk, if each of your commits "stands alone", I commend you on the
 diligence, but in reality, they should either be stand-alone PRs or
 they are just extra work created for yourself.
 
 If we want to change the agreed-upon way we submit/commit/merge
 changes back into develop, then that is another discussion thread.
 Until then, I think we should all remind ourselves of our agreed
 contribution code of conduct.
 
 --Udo
 
 On 12/16/19 9:59 AM, Nabarun Nag wrote:
> Kirk, I believe that creating a Pull Request with multiple commits is
>> ok.
> It's just in the end that when it's being pushed to develop branch, it
> needs to be squash merged. I believe that is what you have mentioned in
 the
> first paragraph, and I am more than happy with that.
> If you can see in the first screenshot comparison that I had attached
>> in
> the first email in this thread is what I want to avoid.
> 
> Thank you for your feedback.
> 
> Regards
> Naba
> 
> 
> On Mon, Dec 16, 2019 at 9:47 AM Kirk Lund  wrote:
> 
>> Whenever I submit a PR with multiple commits that I intend to rebase
>> to
>> develop, 

[VOTE] Apache Geode 1.11.0.RC4

2019-12-20 Thread Mark Hanson
Subject: [VOTE] Apache Geode 1.11.0.RC4
Hello Geode Dev Community,

This is a release candidate for Apache Geode version 1.11.0.RC4.
Thanks to all the community members for their contributions to this release!

Please do a review and give your feedback, including the checks you performed.

Voting deadline:
3PM PST Wed, December 26 2019.

Please note that we are voting upon the source tag:
rel/v1.11.0.RC4

Release notes:
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.11.0

Source and binary distributions:
https://dist.apache.org/repos/dist/dev/geode/1.11.0.RC4/

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachegeode-1064

GitHub:
https://github.com/apache/geode/tree/rel/v1.11.0.RC4
https://github.com/apache/geode-examples/tree/rel/v1.11.0.RC4
https://github.com/apache/geode-native/tree/rel/v1.11.0.RC4

Pipelines:
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-main
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-rc

Geode's KEYS file containing PGP keys we use to sign the release:
https://github.com/apache/geode/blob/develop/KEYS

Command to run geode-examples:
./gradlew 
-PgeodeReleaseUrl=https://dist.apache.org/repos/dist/dev/geode/1.11.0.RC4 
-PgeodeRepositoryUrl=https://repository.apache.org/content/repositories/orgapachegeode-1064
 build runAll

Regards
Mark Hanson



Re: [VOTE] Apache Geode 1.11.0.RC4

2019-12-20 Thread Mark Hanson
Given it is the holidays, perhaps more time is in order. I am bumping the 
voting deadline to Friday, December 27th, 2019.

Thanks,
Mark



> On Dec 20, 2019, at 2:46 PM, Mark Hanson  wrote:
> 
> Subject: [VOTE] Apache Geode 1.11.0.RC4
> Hello Geode Dev Community,
> 
> This is a release candidate for Apache Geode version 1.11.0.RC4.
> Thanks to all the community members for their contributions to this release!
> 
> Please do a review and give your feedback, including the checks you performed.
> 
> Voting deadline:
> 3PM PST Friday, December 27 2019.
> 
> Please note that we are voting upon the source tag:
> rel/v1.11.0.RC4
> 
> Release notes:
> https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.11.0
> 
> Source and binary distributions:
> https://dist.apache.org/repos/dist/dev/geode/1.11.0.RC4/
> 
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachegeode-1064
> 
> GitHub:
> https://github.com/apache/geode/tree/rel/v1.11.0.RC4
> https://github.com/apache/geode-examples/tree/rel/v1.11.0.RC4
> https://github.com/apache/geode-native/tree/rel/v1.11.0.RC4
> 
> Pipelines:
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-main
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-rc
> 
> Geode's KEYS file containing PGP keys we use to sign the release:
> https://github.com/apache/geode/blob/develop/KEYS
> 
> Command to run geode-examples:
> ./gradlew 
> -PgeodeReleaseUrl=https://dist.apache.org/repos/dist/dev/geode/1.11.0.RC4 
> -PgeodeRepositoryUrl=https://repository.apache.org/content/repositories/orgapachegeode-1064
>  build runAll
> 
> Regards
> Mark Hanson
> 



Re: [VOTE] Apache Geode 1.11.0.RC4

2019-12-27 Thread Mark Hanson
Hi All,

It seems that we have the 3 PMC member votes needed plus extra votes, so I will 
begin the release process for Apache Geode 1.11.0.RC4, once cwiki.apache.org 
comes back online.
Final tallies appear to be:

 5 +1 (Yes, release it.)
 1 -0 (Releasing is OK, but…)

Thanks,
Mark

> On Dec 27, 2019, at 2:08 PM, Dan Smith  wrote:
> 
> This is the most recent fix on develop for the benchmark issues. Also tends
> to indicate that the tests are flaky.
> 
> commit 278c2470a5cac2c332d13914b935f0618b820a91
> Author: Helena Bales 
> Date:   Mon Dec 16 11:16:55 2019 -0800
> 
>GEODE-7554: Add retry mechanism for failed tests (#4461)
> 
>* GEODE-7554: Add retry mechanism for failed tests
> 
>There are still some flaky benchmarks. Retry tests that have failed, up
>to 5 times, to determine if the failure is legitimate or just a flaky
>test. If the test fails 5 times in a row, we know that it is a
>legitimate failure.
> 
> On Fri, Dec 27, 2019 at 1:44 PM Dan Smith  wrote:
> 
>> On Fri, Dec 27, 2019 at 1:28 PM Owen Nichols  wrote:
>> 
>>> It looks like the Benchmark <
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-release-1-11-0-main/jobs/Benchmark/builds/11>
>>> job in the release pipeline has failed for the last 4 runs.  At various
>>> times it has flagged PartitionedGet, PartitionedPutAllLong,
>>> PartitionedFunctionExecution, PartitionedNonIndexedQuery, and
>>> ReplicatedPutAllBenchmark (2x), but there is no discernible pattern, so
>>> maybe this is random noise?
>>> 
>>> Random noise is what it looks like to me. Those same tests you mention
>> look better than 1.10 in some runs and worse in others. We recently made
>> the benchmark job on develop non-blocking because it was too noisy.
>> 



Re: [DISCUSS] Proposal to require linear commit history on develop

2019-12-30 Thread Mark Hanson
This change to disable all but squash-merge would be really easy to revert. How 
about we try it for a while and see? If people decide it is really limiting 
them, we can change it back. Let’s do it for 1 month and see how it goes. Does 
that sound reasonable?

Thanks,
Mark

> On Dec 30, 2019, at 5:25 PM, Owen Nichols  wrote:
> 
> Given that we adopted 
> 
>  and still wish to continue 
> 
>  having branch protection rules to ensure every commit landing in develop has 
> passed the required PR checks, it seems like that decision alone mandates 
> that we disable all merge mechanisms other than squash-and-merge.
> 
> Allowing merge commits or rebase merges bypasses branch protection for all 
> commits other than the final one in the merge or rebase set.  Given that we 
> decided 
> 
>  that bypassing PR checks should never be allowed, keeping this loophole open 
> seems untenable.
> 
> This is not just hypothetical — this loophole is causing real problems.  We 
> now have commits on develop that don’t compile.  For example:
> git checkout 19eee7821322a1301f16bdcd31fd3b8c872a41b6
> ./gradlew devBuild
> ...spotlessJava FAILED
> We implemented branch protections to make this impossible, right?  
> 
> We can very easily close this loophole by disabling all but the Squash&Merge 
> button for PRs.  This will not make more work for any developer.  If you want 
> to get multiple commits onto develop, simply submit each as a separate PR — 
> that is the only way to assure that required PR checks have passed for each.
> 
> On the other hand, if we as a Geode community feel strongly that bypassing 
> branch protection via merge commits and rebase commits is important to allow, 
> why not also allow arbitrary overrides (or get rid of branch protection 
> entirely)?
> 
> -Owen
> 
>> On Dec 20, 2019, at 12:31 PM, Blake Bender  wrote:
>> 
>> Just FWIW, the situation described of having multiple commits in a single
>> PR with separate associated JIRA tickets is still kind of problematic.  It
>> could well be the case that the commits are interdependent, thus when
>> bisecting etc it's still not possible to revert the fix for a single
>> bug/feature/whatever atomically.  It's all good, though, I'm satisfied no
>> one's forcing me to adopt practice I'm opposed to.  Apologies for getting
>> my feathers a little ruffled, or if I ruffled anyone else's in return.
>> 
>> Thanks,
>> 
>> Blake
>> 
>> 
>> On Fri, Dec 20, 2019 at 12:18 PM Nabarun Nag  wrote:
>> 
>>> Hi Dan,
>>> 
>>> When we squash merge, all the commit messages are preserved, and the
>>> Co-authored-by tag works as well.
>>> So authorship and history are preserved.
>>> 
>>> In my own personal experience, reverts and pinpointing regression
>>> failures are tough when the commits are spread out. Also, reverts are
>>> easier when it is just one commit while we are bisecting failures.
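Naba's bisect-and-revert point can be sketched in a throwaway repo (commit contents invented): when each change lands as a single commit, `git bisect run` pinpoints the offending commit automatically, and backing it out is one `git revert`:

```shell
# Sketch only: a tiny linear history where one commit breaks a check,
# located automatically by git bisect run. All contents are invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev

echo PASS > check.txt
git add . && git commit -qm "good: add check"
echo a > other.txt
git add . && git commit -qm "good: unrelated change"
echo FAIL > check.txt
git commit -qam "bad: breaks the check"
echo b >> other.txt
git commit -qam "good: another unrelated change"

git bisect start HEAD HEAD~3           # HEAD is bad, HEAD~3 is known good
git bisect run grep -q PASS check.txt  # exit 0 = good, nonzero = bad
git bisect reset
```

The `bisect run` output names the offending commit ("... is the first bad commit"); with a squashed, one-commit-per-change history that commit is also the exact unit to revert.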
>>> 
>>> 
>>> Regards
>>> Naba
>>> 
>>> On Fri, Dec 20, 2019 at 12:07 PM Dan Smith  wrote:
>>> 
 I'll change to -0.
 
 I think merge commits are a nice way to record history in some cases, and
 can also be a way to avoid messy conflicts that come with rebase. Merge
 commits also preserve authorship of commits (compared to squash-merge),
 which I think is valuable as an open source community that is trying to
 keep track of outside contributions.
 
 That said, if the rest of y'all feel it will help to disable the button,
>>> I
 won't stand in the way.
 
 -Dan
 
 On Fri, Dec 20, 2019 at 11:50 AM Anthony Baker 
>>> wrote:
 
> Whether we are talking about the geode/ repository or the geode-native/
> repository we are all one GEODE community.
> 
> The idea of a native client team may matter in some contexts but in
>>> this
> space we are all GEODE contributors.
> 
> Adopting a common approach across our repos will make it easier for new
> contributors to engage, learn our norms, and join our efforts.
> 
> $0.02,
> Anthony
> 
> 
>> On Dec 20, 2019, at 11:32 AM, Blake Bender 
>>> wrote:
>> 
>> Is this a policy the native client team must abide by, as well?  It
> varies
>> slightly with what we've been doing, and all other things being
>>> equal I
> see
>> no reason for us to change that.  If I am to have any measure of
 control
>> over the nc repository, I will definitely enforce a 1:1
>>> correspondence
>> between commits to develop and JIRA tickets.  IMO, if your
>>> refactoring
> in a
>> PR is sufficiently large or complex that it's difficult to tease i

[ANNOUNCE] Apache Geode 1.11.0

2019-12-31 Thread Mark Hanson
The Apache Geode community is pleased to announce the availability of
Apache Geode 1.11.0.

Apache Geode is a data management platform that provides a database-like
consistency model, reliable transaction processing and a shared-nothing
architecture to maintain very low latency performance with high concurrency
processing.

Geode 1.11.0 contains a number of improvements and bug fixes.

For the full list of changes please review the release notes:
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.11.0

The release artifacts can be downloaded from the project website:
http://geode.apache.org/releases/

The release documentation is available at:
http://geode.apache.org/docs/guide/111/about_geode.html

We would like to thank all the contributors that made the release possible.
Regards,
Mark Hanson on behalf of the Apache Geode team


[DISCUSS] What should we do with @Ignore tests?

2019-12-31 Thread Mark Hanson
Hi All,

As part of what I am doing to fix flaky tests, I periodically come across tests 
that are @Ignore’d. I am curious what we would like to do with them generally 
speaking. We could fix them, which would seem obvious, but we are struggling to 
fix flaky tests as it is.  We could delete them, but those tests were written 
for a reason (I hope).  Or we could leave them, but that pollutes searches and 
leaves inactive code requiring upkeep.

I don’t have an easy answer. Some have suggested deleting them. I tend to lean 
that direction, but I thought I would consult the community for a broader 
perspective.

Thanks,
Mark
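One low-effort way to ground this decision is to audit which @Ignore annotations carry a tracking ticket and which do not. A grep-based sketch (file contents and ticket numbers invented):

```shell
# Sketch only: separates @Ignore'd tests that reference a JIRA ticket
# from bare ones. The Java file and ticket ID below are invented.
set -e
src=$(mktemp -d)
cat > "$src/ExampleTest.java" <<'EOF'
@Ignore("GEODE-1234: flaky, tracked for a fix")
public void trackedTest() {}

@Ignore
public void untrackedTest() {}
EOF

echo "tracked (have a JIRA ticket):"
grep -rn --include='*.java' '@Ignore(.*GEODE-[0-9]' "$src" || true
echo "untracked (candidates for deletion):"
grep -rn --include='*.java' '@Ignore *$' "$src" || true
```

Run against the real source tree, the second list would give a concrete set to discuss, rather than deciding about all @Ignore'd tests wholesale.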

Re: [DISCUSS] What should we do with @Ignore tests?

2019-12-31 Thread Mark Hanson
Hi Naba,

While I think what you are suggesting sounds reasonable, I think what you are 
proposing is a more painful process than leaving them in. I am encountering 
maybe two of them at a time when addressing a flaky test. If we do big bulk 
removes, the necessary research becomes less likely to happen. Just a thought.

Thanks,
Mark

Sent from my iPhone

> On Dec 31, 2019, at 6:31 PM, Nabarun Nag  wrote:
> 
> +1 to Dan's suggestions.
> 
> - Remove in batches.
> - Send review requests for those PRs to relevant committers (authors of
> those tests etc.)
> - A brief explanation on why these tests are being deleted, and there is no
> loss of test coverage as it is covered by these other tests (or some other
> reason).
> 
> Regards
> Nabarun Nag
> 
>> On Tue, Dec 31, 2019 at 5:32 PM Dan Smith  wrote:
>> 
>> Some of these tests have been ignored for a long time. However, looking at
>> the history, I see we have ignored some tests as recently as in the last
>> month, for what seem like some questionable reasons.
>> 
>> I'm concerned that this could be a two step process to losing test coverage
>> - someone who thinks the test is still valuable but intends to fix it later
>> ignores it, and then someone else deletes it.
>> 
>> So what I would suggest is that if we are going to delete them, let's do it
>> in batches and get folks that have context on the code being tested to
>> review the changes. Make sense?
>> 
>> Also +1 to not ignoring any more tests - it would be nice to get down to 0
>> Ignored tests and enforce that!
>> 
>> -Dan
>> 
>> On Tue, Dec 31, 2019 at 4:52 PM Aaron Lindsey 
>> wrote:
>> 
>>> I’m in favor of deleting all except the ones that have JIRA tickets open
>>> for them, like Bruce said.
>>> 
>>> Also going forward I’d like to see us not be checking in @Ignored tests —
>>> just delete them instead. If we need to get it back we have revision
>>> history. Just my two cents.
>>> 
>>> Aaron
>>> 
>>>> On Dec 31, 2019, at 2:53 PM, Bruce Schuchardt 
>>> wrote:
>>>> 
>>>> I agree with deleting @Ignored tests except for the few that have JIRA
>>> tickets open for them.  There are less than 1/2 dozen of these and we
>>> should consider keeping them since we have a way of tracking them.
>>>> 
>>>> On 12/31/19 2:07 PM, Alexander Murmann wrote:
>>>>> Here are a few things that are true for me or I believe are true in
>>> general:
>>>>> 
>>>>>   - Our test suite is more flaky than we'd like it to be
>>>>>   - I don't believe that adding more Unit tests that follow existing
>>>>>   patterns buys us that much. I'd rather see something similar to
>> what
>>> some
>>>>>   folks are doing with Membership right now where we isolate the code
>>> and
>>>>>   test it more systematically
>>>>>   - We have other testing gaps: We have benchmarks 👏🎉, but we are
>>> still
>>>>>   lacking coverage in that area; our community is still lacking HA
>>> tests. I'd
>>>>>   rather fill those than bring back old DUnit tests that are chosen
>>> somewhat
>>>>>   at random.
>>>>>   - I'd rather be deliberate about what tests we introduce than
>>> wholesale
>>>>>   bring back a set of tests, since any of these re-introduced tests
>>> has a
>>>>>   potential to be flaky. Let's make sure our tests carry their
>> weight.
>>>>>   - If we delete these tests, we can always go back to a SHA from
>> today
>>>>>   and bring them back at a later date
>>>>>   - These tests have been ignored for a very long time and we've
>>> shipped
>>>>>   without them and nobody has missed them enough to bring them back.
>>>>> 
>>>>> Given all the above, my vote is for less noise in our code, which
>> means
>>>>> deleting all ignored tests. If we want to keep them, I'd love to hear
>> a
>>>>> plan of action on how we bring them back. Having a bunch of dead code
>>> helps
>>>>> nobody.
>>>>> 
>>>>> On Tue, Dec 31, 2019 at 1:50 PM Mark Hanson 
>> wrote:
>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> As part of what I am doing to fix flaky tests, I periodically come
>>> across
>>>>>> tests that are @Ignore’d. I am curious what we would like to do with
>>> them
>>>>>> generally speaking. We could fix them, which would seem obvious, but
>>> we are
>>>>>> struggling to fix flaky tests as it is.  We could delete them, but
>>> those
>>>>>> tests were written for a reason (I hope).  Or we could leave them.
>> This
>>>>>> pollutes searches etc as inactive code requiring upkeep at least.
>>>>>> 
>>>>>> I don’t have an easy answer. Some have suggested deleting them. I
>> tend
>>> to
>>>>>> lean that direction, but I thought I would consult the community for
>> a
>>>>>> broader perspective.
>>>>>> 
>>>>>> Thanks,
>>>>>> Mark
>>> 
>>> 
>> 


Re: [DISCUSS] Proposal to require linear commit history on develop

2020-01-02 Thread Mark Hanson
;>> other branches). It could be done, but is it worth the hassle?
>>>>>> 
>>>>>> One more point about rebase-and-merge is that it seems like this
>>> would
>>>>>> make bisecting a failure easier since there would be smaller
>> commits.
>>>>>> 
>>>>>> Aaron
>>>>>> 
>>>>>>> On Dec 31, 2019, at 5:41 PM, Owen Nichols 
>>>> wrote:
>>>>>>> 
>>>>>>> Can you elaborate on why you would have to deal with conflicts if
>>> the
>>>>>>> refactoring work was kept as a separate PR from the fix?
>>>>>>> 
>>>>>>> As with many git workflows, there’s an easy way and a hard way to
>>>>> manage
>>>>>>> interdependent PRs (tl;dr only merge, never rebase!). I wonder if
>>>> this
>>>>>>> points out an opportunity for a “tips and tricks” page on the
>> Geode
>>>>> wiki.
>>>>>>> 
>>>>>>> On Tue, Dec 31, 2019 at 5:22 PM Aaron Lindsey <
>>>> aaronlind...@apache.org
>>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> -0.9
>>>>>>>> 
>>>>>>>> I’m not in favor of the revised proposal that disallows
>>>>>> rebase-and-merge.
>>>>>>>> Say I am working on a PR and I have a refactor commit and
>> another
>>>>> commit
>>>>>>>> which implements a new feature. I don’t want those commits to
>> get
>>>>>> squashed
>>>>>>>> because that makes it hard to understand the diff. However, if I
>>>> make
>>>>>> those
>>>>>>>> commits as two separate PRs then I am going to have to deal with
>>>>>> conflicts.
>>>>>>>> 
>>>>>>>> I’m not sure when we made it a rule that every commit in develop
>>> had
>>>>> to
>>>>>>>> compile and pass tests. I know we implemented a rule that all
>> PRs
>>>> had
>>>>> to
>>>>>>>> pass certain checks, but I never thought that rule implied all
>>>> commits
>>>>>> had
>>>>>>>> to pass those checks.
>>>>>>>> 
>>>>>>>> In general I just don’t see the problem with rebase-and-merge
>> and
>>>> this
>>>>>>>> feels like an unnecessary restriction, but I will go with it if
>>>> that’s
>>>>>> what
>>>>>>>> everyone wants to do.
>>>>>>>> 
>>>>>>>> Aaron
>>>>>>>> 
>>>>>>>>> On Dec 31, 2019, at 3:09 PM, Owen Nichols >> 
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> To recap, this proposal is now revised to remove 2 of the 3
>> merge
>>>>>> options
>>>>>>>>> from GitHub, leaving only Squash and Merge.  PR #4513
>>>>>>>>> <https://github.com/apache/geode/pull/4513> serves as an
>> exhibit
>>>> of
>>>>>>>> what is
>>>>>>>>> proposed (it is not to be merged unless discussion leads to
>>>> consensus
>>>>>>>> and a
>>>>>>>>> successful vote).  Please use the dev list (not the PR) for all
>>>>>>>> discussion
>>>>>>>>> (and voting, if we get that far).
>>>>>>>>> 
>>>>>>>>> Squash and merge is already used almost exclusively by the
>> Geode
>>>>>>>> community,
>>>>>>>>> with any exceptions tending to be accidental more often than
>>>>>> intentional.
>>>>>>>>> However, some have raised the concern that implementing this
>>>>>> restriction
>>>>>>>>> could result in harm or wasted time.  Can someone give an
>> example
>>>> of
>>>>> a
>>>>>>>> such
>>>>>>>>> a scenario?
>>>>>>>>> 
>>>>>>>>> It seems there is a divide here between junior and senior
>>> community
>>

[DISCUSS] Someone to update 3rd-party libraries used by GEODE

2020-01-03 Thread Mark Hanson
Hello Apache Geode Community, 

It is time to update the 3rd-party libraries used by GEODE. To get that done, I 
am requesting a volunteer to take on the responsibility.

This consists of going through the libraries we depend on and updating each to 
the latest version that works with our code.

We would need to get this done within the next week or two, so that we have 
time to shake out issues before the next release.

Regards,
Mark Hanson on behalf of the Apache Geode team



Re: [DISCUSS] What should we do with @Ignore tests?

2020-01-03 Thread Mark Hanson
-- I guess we should reenable this test and see
>> how flaky it is?! After all that's what it says to do.
>> 
>> I quick glance over all of the results of $ git grep "@Ignore" shows that
>> the above is pretty much the sort of things we're looking at.
>> 
>> I'm 100% against deleting them. If everyone else wants to delete them, then
>> I propose reassigning me to do nothing but fix ALL @Ignored tests and I'll
>> jump on it.
>> 
>> And just to be very clear:
>> 
>> -1 to all proposals to delete all tests with @Ignore (I can't think of any
>> argument to change me to -0)
>> 
>> -Kirk
>> 
>> On Tue, Dec 31, 2019 at 2:08 PM Alexander Murmann 
>> wrote:
>> 
>>> Here are a few things that are true for me or I believe are true in
>>> general:
>>> 
>>>   - Our test suite is more flaky than we'd like it to be
>>>   - I don't believe that adding more Unit tests that follow existing
>>>   patterns buys us that much. I'd rather see something similar to what
>>> some
>>>   folks are doing with Membership right now where we isolate the code
>> and
>>>   test it more systematically
>>>   - We have other testing gaps: We have benchmarks 👏🎉, but we are
>> still
>>>   lacking coverage in that area; our community is still lacking HA
>> tests.
>>> I'd
>>>   rather fill those than bring back old DUnit tests that are chosen
>>> somewhat
>>>   at random.
>>>   - I'd rather be deliberate about what tests we introduce than
>> wholesale
>>>   bring back a set of tests, since any of these re-introduced tests has
>> a
>>>   potential to be flaky. Let's make sure our tests carry their weight.
>>>   - If we delete these tests, we can always go back to a SHA from today
>>>   and bring them back at a later date
>>>   - These tests have been ignored for a very long time and we've
>> shipped
>>>   without them and nobody has missed them enough to bring them back.
>>> 
>>> Given all the above, my vote is for less noise in our code, which means
>>> deleting all ignored tests. If we want to keep them, I'd love to hear a
>>> plan of action on how we bring them back. Having a bunch of dead code
>> helps
>>> nobody.
>>> 
>>> On Tue, Dec 31, 2019 at 1:50 PM Mark Hanson  wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> As part of what I am doing to fix flaky tests, I periodically come
>> across
>>>> tests that are @Ignore’d. I am curious what we would like to do with
>> them
>>>> generally speaking. We could fix them, which would seem obvious, but we
>>> are
>>>> struggling to fix flaky tests as it is.  We could delete them, but
>> those
>>>> tests were written for a reason (I hope).  Or we could leave them. This
>>>> pollutes searches etc as inactive code requiring upkeep at least.
>>>> 
>>>> I don’t have an easy answer. Some have suggested deleting them. I tend
>> to
>>>> lean that direction, but I thought I would consult the community for a
>>>> broader perspective.
>>>> 
>>>> Thanks,
>>>> Mark
>>> 
>> 
> 
> 
> -- 
> -John
> Spring Data Team



Release Manager for 1.12 (February 3rd)

2020-01-14 Thread Mark Hanson
Hello All,

It is that time again. It is time to cut a new release branch for 1.12 on 
February 3rd.

We need a volunteer! No experience required. Committer status would be helpful, 
but not required.

In the meantime, we should focus on ensuring the CI is stable and start 
planning to cut the branch.
I would recommend our handy wiki as a great place to see what you are in for 
should you choose to volunteer (please do):
https://cwiki.apache.org/confluence/display/GEODE/Releasing+Apache+Geode 


If we could have a volunteer by the end of the week, that would be amazing!

Thanks,
Mark



Re: [DISCUSS] include geode-benchmarks in 1.12 release

2020-01-15 Thread Mark Hanson
Just my two cents.

I think that we should probably strip CI into a separate repo. The key 
indicator is that if something were wrong in the CI yaml, would I hold a 
release for that? I think no. So that suggests to me it is a separate thing. 
Same goes for benchmarks. If we were failing a benchmark I would be concerned, 
but if the script were broken, would I hold the release? I think no as well.

I think that says that the CI code should also be a separate repo.

Thanks,
Mark

> On Jan 14, 2020, at 10:21 PM, Jacob Barrett  wrote:
> 
> Until someone outside of the geode ci community is asking for it I just don’t 
> see utility in going through the motions of making a release for it. 
> 
>> On Jan 14, 2020, at 10:13 PM, Owen Nichols  wrote:
>> 
>> The source is already public, so on some level a source release is no 
>> different from a git tag.  Benchmarks has matured enough that I think it 
>> makes sense to at least start branching and tagging the geode-benchmarks 
>> repo to capture exactly what was used in that Geode release.
>> 
>> Others in the dev and user community may find the benchmarks useful in other 
>> ways than we use them.  While our focus for CI is on tuning for 
>> repeatability, someone else might just want a load generator to break in a 
>> new cluster or get some rough numbers.  Some might want to get under the 
>> hood and tinker and tune, or contribute their own benchmarks, with the 
>> understanding that it’s not a turnkey or standalone product, but a tool that 
>> requires getting your hands dirty.
>> 
>> Would a “1 page” readme with a few tips on “how to run on a laptop” be 
>> enough to let other interested contributors help get geode-benchmarks to a 
>> “better state”?
>> 
>>> On Jan 14, 2020, at 9:38 PM, Jacob Barrett  wrote:
>>> 
>>> I don’t think the benchmarks provide any material benefit to a user in 
>>> their current state. They are heavily tuned for our CI process which relies 
>>> on very beefy machines. Usage on other hardware will require more tuning. I 
>>> don’t think it’s worth putting the source in the release until they are in 
>>> a better state.
>>> 
>>> -Jake
>>> 
>>> 
> On Jan 14, 2020, at 4:14 PM, Dan Smith  wrote:
 
 On Tue, Jan 14, 2020 at 4:11 PM Owen Nichols  wrote:
 
> I believe the desire is to include the source code for geode-benchmarks as
> part of the official geode release, much like how we include 
> geode-examples.
> 
 
 Oh! I thought you meant running the benchmarks in the release pipeline - I
 think last release we were running them but decided they were too flaky to
 use.
 
 +1 to including the benchmark source in the source release.
 
 -Dan
>> 



A Release Hero/Manager is needed for 1.12 (February 3rd)

2020-01-15 Thread Mark Hanson
Just a reminder,  still looking for that release manager…  Please don’t all 
volunteer at once!

Glory and street cred await the intrepid release manager volunteer.

Hello All,

It is that time again. It is time to cut a new release branch for 1.12 on 
February 3rd.

We need a volunteer! No experience required. Committer status would be helpful, 
but not required.

In the meantime, we should focus on ensuring the CI is stable and start 
planning to cut the branch.
I would recommend our handy wiki as a great place to see what you are in for 
should you choose to volunteer (please do):
https://cwiki.apache.org/confluence/display/GEODE/Releasing+Apache+Geode 


If we could have a volunteer by the end of the week, that would be amazing!

Thanks,
Mark

Release Manager for 1.12.0: We have a volunteer!

2020-01-17 Thread Mark Hanson
Hi All,

Ernest Burghardt (@echobravopapa on github) has volunteered to take on the 
Release Manager responsibilities for 1.12.0.

Thank you Ernie!

Best regards,
Mark

Question regarding a ConnectionPoolDUnitTest failure

2020-01-22 Thread Mark Hanson
Hi All, 

I could use a little help understanding the expected behavior for a test case.

We are registering for events and we are getting destroys for destroys, creates 
for creates etc. However, in the case of a recreate, it seems like we are 
getting an entry event of LOCAL_LOAD_CREATE and some INVALIDATE's as well.

I am not sure why this is the case. 
ConnectionPoolDUnitTest.test014InvalidateAndDestroyPropagation seems to expect 
only creates as well, which is mostly what it gets. Sometimes, though, it gets 
these unexpected invalidates.

An example is this 
EntryEventImpl[op=INVALIDATE;region=/root/test014InvalidateAndDestroyPropagation;key=2;callbackArg=null;originRemote=true;originMember=612f3aa1e54b(:loner):53030:faeb21ce;callbacksInvoked;id=EventID[id=25
 bytes;threadID=2;sequenceID=556];isFromServer],

Note the callbackArg=null, this is odd, usually it is populated.

Any help someone could provide would be a great help. This seems like a bug.

Thanks,
Mark



Re: [Vote] Include GEODE-7752 into 1.12

2020-02-05 Thread Mark Hanson
Hi Ernie,

Thanks for the heads up. This is breaking DistributedTestOpenJDK; Udo 
confirmed the area of concern and is going to revert this.

Thanks,

Mark

> On Feb 5, 2020, at 4:14 PM, Ernest Burghardt  wrote:
> 
> Udo,
> the PR has some failing tests that are in the "non-mandatory" category for
> merge, TBH I'd prefer to have solid green changes being added at this point
> forward...
> 
> Thanks,
> EB
> 
> On Wed, Feb 5, 2020 at 3:01 PM Alexander Murmann 
> wrote:
> 
>> Udo, you say "refactor", but this sounds more like a feature or user facing
>> improvement. Do you mind elaborating a little more why this should be in
>> this release?
>> 
>> On Wed, Feb 5, 2020 at 1:54 PM Patrick Johnson >> 
>> wrote:
>> 
>>> +1
>>> 
>>> On 2/5/20, 1:53 PM, "Udo Kohlmeyer"  wrote:
>>> 
>>>Hi there Geode dev,
>>> 
>>>I would like to request that GEODE-7752
>>>(7028f601680fee3f57cbdff63951128d7180ca13) gets included into 1.12.
>>> 
>>>This piece of code is a refactor of work done in GEODE-7715. This
>>>refactor specializes Builders and simplifies Builder behavior for
>>> better
>>>user experience.
>>> 
>>>--Udo
>>> 
>>> 
>>> 
>>> 
>> 



[Question] I had a PR that didn't run a StressNewTestOpenJDK11(really), but passed?

2020-02-12 Thread Mark Hanson
Hi,

Why is that?

It gave the error message in the log of "
17:26:13 66 changed tests
17:26:13 66 is too many changed tests to stress test. Allowing this job to pass 
without stress testing.”

I didn’t touch 66 files; I touched 5 classes. It may be that there are 66 tests 
therein, but if that is the bar, StressNewTest is not very useful. I never have 
just one new @Test that needs stress testing; I have been refactoring whole 
classes. This seems like a bug.

Thanks,
Mark

[PROPOSAL] StressNewTest in Pull Request Pipeline to be made optional.

2020-02-28 Thread Mark Hanson
Hi All,

Proposal: Force StressNewTest to fail a change with 25 or more files rather 
than pass it without running it.

Currently, the StressNewTest job in the pipeline will simply pass any change 
that has more than 25 files changed. It is marked green with no work done. 
There are reasons for this, relating to the run time being too long to be 
tracked by Concourse, so we just let it through as a pass. I think this is a 
bad signal, and that it should automatically be a failure in the short term. As 
a result, I also think the job should not be required. By making it a fail, we 
will at least not give ourselves a false sense of security. I understand that 
this opens the floodgates, so to speak, but I don’t think that is a big problem 
for the community, because we can still say that you should not merge if 
StressNewTest fails because of your test.

I personally find the false sense of security more troubling than anything. 
Hence the reason I would like this to be changed.

Let me know what you think.

Thanks,
Mark

Re: [PROPOSAL] StressNewTest in Pull Request Pipeline to be made optional.

2020-02-28 Thread Mark Hanson
Just to make sure we are clear: I am not suggesting that we disable 
StressNewTest, but that we make it not required. It would still run and provide 
feedback, but under my approach it would not give us an unwarranted green.

> On Feb 28, 2020, at 10:49 AM, Ju@N  wrote:
> 
> +1 to what Owen said, I don't think disabling *StressNewTest* is a
> good idea.
> 
> On Fri, 28 Feb 2020 at 18:35, Owen Nichols  wrote:
> 
>> -1 to making StressNew not required
>> 
>> +1 to eliminating the current loophole — StressNew should never give a
>> free pass.
>> 
>> Any time your PR is having trouble passing StressNew, please bring it up
>> on the dev list. We can review on a case-by-case basis and decide whether
>> to try increasing the timeout, changing the repeat count, refactoring the
>> PR, or as an absolute last resort requesting authorization for an override
>> (for example, a change to spotless rules might touch a huge number of files
>> but carry no risk).
>> 
>> One bug we should fix is that StressNew sometimes counts more files
>> touched than really were, especially if you had many commits or merges or
>> rebases on your PR branch.  Possible workarounds there include squashing
>> and/or creating a new PR and/or splitting into multiple PRs.  I’ve spent
>> some time trying to reproduce why files are mis-counted, with no success,
>> but perhaps someone cleverer with git could provide a fix.
>> 
>> Another issue is that StressNew is only in the PR pipeline, not the main
>> develop pipeline.  This feels like an asymmetry where PRs must pass a
>> “higher” standard.  We should consider adding some form of StressNew to the
>> main pipeline as well (maybe compare to the previous SHA that passed?).
>> 
>> The original motivation for the 25-file limit was an attempt to limit how
>> long StressNew might run for.  Since concourse already applies a timeout,
>> that check is unnecessary.  However, a compromise solution might be to use
>> the number of files changed to dial back the number of repeats, e.g. stay
>> with 50 repeats if fewer than 25 files changed, otherwise compute 1250 /
>> <#-files-changed> and do only that many repeats (e.g. if 50 files changed,
>> run all tests 25x instead of 50x).
>> 
>> While StressNew is intended to catch new flaky tests, it can also catch
>> poorly-designed tests that fail just by running twice in the same VM.  This
>> may be a sign that the test does not clean up properly and could be
>> polluting other tests in unexpected ways?  It might be useful to run a
>> “StressNew” with repeats=2 over a much broader scope, maybe even all tests,
>> at least once in a while?
>> 
>> 
>>> On Feb 28, 2020, at 9:51 AM, Mark Hanson  wrote:
>>> 
>>> Hi All,
>>> 
>>> Proposal: Force StressNewTest to fail a change with 25 or more files
>> rather than pass it without running it.
>>> 
>>> Currently, the StressNewTest job in the pipeline will just pass a job
>> that has more than 25 files changed. It will be marked as green with no
>> work done. There are reasons, relating to run time being too long to be
>> tracked by concourse, so we just let it through as a pass. I think this is
>> a bad signal. I think that this should automatically be a failure in the
>> short term. As a result, I also think it should not be required. It is a
>> bad signal, and I think that by making it a fail, this will at least not
>> give us a false sense of security. I understand that this opens the flood
>> gates so to speak, but I don’t think as a community it is a big problem
>> because we can still say that you should not merge if the StressNewTest
>> fails because of your test.
>>> 
>>> I personally find the false sense of security more troubling than
>> anything. Hence the reason, I would like this to be
>>> 
>>> Let me know what you think..
>>> 
>>> Thanks,
>>> Mark
>> 
>> 
> 
> -- 
> Ju@N
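Owen's repeat-scaling compromise can be sketched as follows; the 50-repeat default, 25-file threshold, and 1250 budget come from his numbers, while the function itself is a hypothetical illustration:

```python
def stress_repeat_count(files_changed, base_repeats=50, budget=1250, threshold=25):
    """Sketch of the proposed compromise: keep the full repeat count for
    small changes, otherwise scale repeats down so that
    files_changed * repeats stays near a fixed budget."""
    if files_changed < threshold:
        return base_repeats
    # e.g. 50 changed files -> 1250 // 50 = 25 repeats per test
    return max(1, budget // files_changed)
```

Under this scheme a 50-file change would run each test 25 times instead of 50, matching the example in the proposal.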



Re: [PROPOSAL] StressNewTest in Pull Request Pipeline to be made optional.

2020-02-28 Thread Mark Hanson
Alright, so basically it seems people are not OK with making StressNewTest not 
required. So I guess that means we need to identify the -1 voters who are 
willing to help resolve this problem…

Who would like to help?

Thanks,
Mark

> On Feb 28, 2020, at 11:12 AM, Ernest Burghardt  wrote:
> 
> -1 to limiting any tests... if there are issues with the tests, let's fix
> that. We have too many commits coming in with little or no testing over
> new/changed code, so I can't see how removing any existing test coverage is
> a good idea
> 
> Best,
> EB
> 
> On Fri, Feb 28, 2020 at 10:58 AM Mark Hanson  wrote:
> 
>> Just to make sure we are clear, I am not suggesting that we disable
>> stressnewtest, but that we make it not required. It would still run and
>> provide feedback, but it would not give us an unwarranted green in my
>> approach.
>> 
>>> On Feb 28, 2020, at 10:49 AM, Ju@N  wrote:
>>> 
>>> +1 to what Owen said, I don't think disabling *StressNewTest* is a
>>> good idea.
>>> 
>>> On Fri, 28 Feb 2020 at 18:35, Owen Nichols  wrote:
>>> 
>>>> -1 to making StressNew not required
>>>> 
>>>> +1 to eliminating the current loophole — StressNew should never give a
>>>> free pass.
>>>> 
>>>> Any time your PR is having trouble passing StressNew, please bring it up
>>>> on the dev list. We can review on a case-by-case basis and decide
>> whether
>>>> to try increasing the timeout, changing the repeat count, refactoring
>> the
>>>> PR, or as an absolute last resort requesting authorization for an
>> override
>>>> (for example, a change to spotless rules might touch a huge number of
>> files
>>>> but carry no risk).
>>>> 
>>>> One bug we should fix is that StressNew sometimes counts more files
>>>> touched than really were, especially if you had many commits or merges
>> or
>>>> rebases on your PR branch.  Possible workarounds there include squashing
>>>> and/or creating a new PR and/or splitting into multiple PRs.  I’ve spent
>>>> some time trying to reproduce why files are mis-counted, with no
>> success,
>>>> but perhaps someone cleverer with git could provide a fix.
>>>> 
>>>> Another issue is that StressNew is only in the PR pipeline, not the main
>>>> develop pipeline.  This feels like an asymmetry where PRs must pass a
>>>> “higher” standard.  We should consider adding some form of StressNew to
>> the
>>>> main pipeline as well (maybe compare to the previous SHA that passed?).
>>>> 
>>>> The original motivation for the 25-file limit was an attempt to limit
>> how
>>>> long StressNew might run for.  Since concourse already applies a
>> timeout,
>>>> that check is unnecessary.  However, a compromise solution might be to
>> use
>>>> the number of files changed to dial back the number of repeats, e.g.
>> stay
>>>> with 50 repeats if fewer than 25 files changed, otherwise compute 1250 /
>>>> <#-files-changed> and do only that many repeats (e.g. if 50 files
>> changed,
>>>> run all tests 25x instead of 50x).
>>>> 
>>>> While StressNew is intended to catch new flaky tests, it can also catch
>>>> poorly-designed tests that fail just by running twice in the same VM.
>> This
>>>> may be a sign that the test does not clean up properly and could be
>>>> polluting other tests in unexpected ways?  It might be useful to run a
>>>> “StressNew” with repeats=2 over a much broader scope, maybe even all
>> tests,
>>>> at least once in a while?
>>>> 
>>>> 
>>>>> On Feb 28, 2020, at 9:51 AM, Mark Hanson  wrote:
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> Proposal: Force StressNewTest to fail a change with 25 or more files
>>>> rather than pass it without running it.
>>>>> 
>>>>> Currently, the StressNewTest job in the pipeline will just pass a job
>>>> that has more than 25 files changed. It will be marked as green with no
>>>> work done. There are reasons, relating to run time being too long to be
>>>> tracked by concourse, so we just let it through as a pass. I think this
>> is
>>>> a bad signal. I think that this should automatically be a failure in the
>>>> short term. As a result, I also think it should not be required. It is a
>>>> bad signal, and I think that by making it a fail, this will at least not
>>>> give us a false sense of security. I understand that this opens the
>> flood
>>>> gates so to speak, but I don’t think as a community it is a big problem
>>>> because we can still say that you should not merge if the StressNewTest
>>>> fails because of your test.
>>>>> 
>>>>> I personally find the false sense of security more troubling than
>>>> anything. Hence the reason, I would like this to be
>>>>> 
>>>>> Let me know what you think..
>>>>> 
>>>>> Thanks,
>>>>> Mark
>>>> 
>>>> 
>>> 
>>> --
>>> Ju@N
>> 
>> 



Re: PR Titles

2020-03-02 Thread Mark Hanson
If I may add one caveat: if a pull request is in a ready-to-review state, the 
title rules should apply. If it is prefaced with "Do Not Review:", then they 
need not apply.

Thanks,
Mark

> On Mar 2, 2020, at 9:30 AM, Alexander Murmann  wrote:
> 
>> 
>> Don't we have a published checklist or guideline or something for this
>> already?  Including stuff like 'always prefix the PR title with a JIRA
>> ticket #,' etc?
> 
> 
> Yes, we do. However, it only requests the addition of the JIRA ticket #. It
> does not call out having a descriptive title.
> 
> I like the idea of adding a point to the checklist to call this out.
> 
> 
> On Mon, Mar 2, 2020 at 9:02 AM Blake Bender  wrote:
> 
>> Don't we have a published checklist or guideline or something for this
>> already?  Including stuff like 'always prefix the PR title with a JIRA
>> ticket #,' etc?
>> 
>> On Mon, Mar 2, 2020 at 8:47 AM Owen Nichols  wrote:
>> 
>>> +1
>>> 
>>> It’s easy to fix too!
>>> 
>>> On Mon, Mar 2, 2020 at 8:34 AM Jacob Barrett 
>> wrote:
>>> 
 Please use meaningful PR titles. When browsing the PRs or reading the
 notifications, one shouldn’t have to decode your cryptic title.
>> Consider
 the title as the first line of your commit message.
 
 -Jake
 
 
>>> 
>> 



Re: PR Titles

2020-03-02 Thread Mark Hanson
The reason I prefer the rules not apply in draft is quite simple: people 
sometimes use the PR pipeline to test stuff. It’s easy enough to just put in 
GEODE-, but sometimes it is just “Test” because there is no GEODE ticket 
associated. Either way, I don’t care enough to debate it. I will abide by what 
the devs agree to.

I disagree with the statement about "Do Not Review". I think it is particularly 
useful. I have had people review PRs that were labelled "Do Not Review", so the 
grayed-out icon is not sufficient. People wouldn’t label PRs that way if it did 
not serve a purpose.

I would like to say that people need to be more flexible and supportive of 
opinions that differ from their own, because the alternative is an uninviting 
environment. That said, I am not that attached to any of my opinions above. 
They are merely suggestions, to be taken or not.

Thanks,
Mark

> On Mar 2, 2020, at 10:47 AM, Ernest Burghardt  wrote:
> 
> I agree with Jake and do appreciate everyone using the "DRAFT" feature. One
> downfall is that this is not visible outside GitHub; i.e. if you see
> an email for a PR you won't know it's in "DRAFT" mode until you go look at
> it...
> 
> a related part of this, to me, seems to be that the PR checklist is a bit
> out of date... much of it is now automatically enforced (awesome), but
> arguably the most important part of the checklist "[*] Have you written or
> updated unit tests to verify your changes?" is buried at the bottom and is
> an important signal being lost in the noise...
> 
> On Mon, Mar 2, 2020 at 10:12 AM Jacob Barrett  wrote:
> 
>> 
>> 
>>> On Mar 2, 2020, at 10:10 AM, Mark Hanson  wrote:
>>> 
>>> If I may add one caveat: if a pull request is in a ready-to-review state,
>> the title rules should apply. If it is prefaced with Do Not Review: then they need not apply.
>> -1
>> Use the “Draft” state of the PR and use good titles from the beginning.
>> Using “DO NO REVIEW” in the title is super annoying now that GitHub has a
>> feature for draft PRs.
>> 
>> 



Re: Help needed to get my PR to pass the stressNewTest

2020-03-04 Thread Mark Hanson
Hi Eric, 

When you have a moment, ping me. I can help. I think…

Thanks,
Mark

> On Mar 4, 2020, at 1:27 PM, Eric Shu  wrote:
> 
> My PR (https://github.com/apache/geode/pull/4709) continues to fail in
> stressNewTest. I have been retriggering the test, and every run failed with
> the same issue. I need help figuring out how to get this PR to pass the target.
> 
> Thanks!
> -Eric
> 
>> Task :geode-assembly:repeatDistributedTest
> 
> 15:59:04  Do not allow more than 24 test workers
> 16:50:44  Cannot abort process 'Gradle Test Executor 110' because it is not in started or detached state
> 16:50:44  java.lang.IllegalStateException: Cannot abort process 'Gradle Test Executor 110' because it is not in started or detached state
>     at com.pedjak.gradle.plugins.dockerizedtest.DockerizedExecHandle.abort(DockerizedExecHandle.java:329)
>     at org.gradle.process.internal.ExecHandle$abort$53.call(Unknown Source)
>     at com.pedjak.gradle.plugins.dockerizedtest.ExitCodeTolerantExecHandle.abort(ExitCodeTolerantExecHandle.groovy:32)
>     at org.gradle.process.internal.worker.DefaultWorkerProcess$2.stop(DefaultWorkerProcess.java:224)
>     at org.gradle.internal.concurrent.CompositeStoppable.stop(CompositeStoppable.java:103)
>     at org.gradle.process.internal.worker.DefaultWorkerProcess.cleanup(DefaultWorkerProcess.java:232)
>     at org.gradle.process.internal.worker.DefaultWorkerProcess.start(DefaultWorkerProcess.java:166)
>     at org.gradle.process.internal.worker.DefaultWorkerProcessBuilder$MemoryRequestingWorkerProcess.start(DefaultWorkerProcessBuilder.java:221)
>     at org.gradle.api.internal.tasks.testing.worker.ForkingTestClassProcessor.forkProcess(ForkingTestClassProcessor.java:108)
>     at org.gradle.api.internal.tasks.testing.worker.ForkingTestClassProcessor.processTestClass(ForkingTestClassProcessor.java:84)
>     at org.gradle.api.internal.tasks.testing.processors.RestartEveryNTestClassProcessor.processTestClass(RestartEveryNTestClassProcessor.java:52)
>     at sun.reflect.GeneratedMethodAccessor569.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>     at org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:29)
>     at org.gradle.internal.dispatch.AsyncDispatch.dispatchMessages(AsyncDispatch.java:87)
>     at org.gradle.internal.dispatch.AsyncDispatch.access$000(AsyncDispatch.java:36)
>     at org.gradle.internal.dispatch.AsyncDispatch$1.run(AsyncDispatch.java:71)
>     at org.gradle.internal.concurrent.InterruptibleRunnable.run(InterruptibleRunnable.java:42)
>     at org.gradle.internal.operations.CurrentBuildOperationPreservingRunnable.run(CurrentBuildOperationPreservingRunnable.java:42)
>     at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
>     at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
>     at java.lang.Thread.run(Thread.java:748)

