[Proposal] Thread monitoring mechanism

2018-02-20 Thread Gregory Vortman
Hello team,
One of the most severe issues hitting our real-time application is stuck 
threads, which have multiple causes: long-held locks, deadlocks, threads that 
wait forever for a reply after a packet drop, and so on.
Stuck threads of this kind go under the radar of the existing system health 
check methods.
In mission-critical applications, this results in an immediate outage.

As a short-term solution, we are implementing an internal watchdog mechanism to 
detect stuck threads:
   There is a registration object.
   A function executor has start/end hooks that register/unregister 
the thread via the registration object.
   A customized, scheduled monitoring thread is spawned on startup. It wakes 
up every N seconds, scans the registration map and detects threads that have 
stayed registered for longer than a configurable time.
   Once such threads have been detected, a process stack dump is taken and a 
thread stack statistic metric is provided.

This helps us to monitor, detect and quickly decide which action should be 
taken. Usually the decision is to bounce the member (a consistency issue is 
possible, but in our case that is better than denial of service).
The above solution does not touch GEODE core code; it is implemented entirely 
within our customized code.
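
A minimal Java sketch of the watchdog described above, for illustration only. 
The class and method names (StuckThreadWatchdog, register, unregister, 
findStuckThreads, startMonitor) are assumptions made for this example; they are 
not taken from the implementation described here or from Geode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names exist in Geode.
class StuckThreadWatchdog {
  // Registration map: thread -> time (ms) at which it entered the monitored call.
  private final Map<Thread, Long> startTimes = new ConcurrentHashMap<>();
  private final long maxRunMillis;

  StuckThreadWatchdog(long maxRunMillis) {
    this.maxRunMillis = maxRunMillis;
  }

  // Start hook: called by the function executor before the work begins.
  void register() {
    startTimes.put(Thread.currentThread(), System.currentTimeMillis());
  }

  // End hook: called (ideally in a finally block) when the work completes.
  void unregister() {
    startTimes.remove(Thread.currentThread());
  }

  // One scan: returns the threads that have stayed registered past the threshold.
  List<Thread> findStuckThreads(long nowMillis) {
    List<Thread> stuck = new ArrayList<>();
    for (Map.Entry<Thread, Long> entry : startTimes.entrySet()) {
      if (nowMillis - entry.getValue() > maxRunMillis) {
        stuck.add(entry.getKey());
      }
    }
    return stuck;
  }

  // Spawns a daemon monitoring thread that scans every periodSeconds
  // and prints the stack of each suspect thread.
  ScheduledExecutorService startMonitor(long periodSeconds) {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
          Thread t = new Thread(r, "stuck-thread-watchdog");
          t.setDaemon(true);
          return t;
        });
    scheduler.scheduleAtFixedRate(() -> {
      for (Thread t : findStuckThreads(System.currentTimeMillis())) {
        StringBuilder sb = new StringBuilder("Possibly stuck: " + t.getName());
        for (StackTraceElement frame : t.getStackTrace()) {
          sb.append("\n    at ").append(frame);
        }
        System.err.println(sb);
      }
    }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    return scheduler;
  }
}
```

In this sketch a function executor would call register() before running the 
function and unregister() when it finishes, and startMonitor(N) would be called 
once on startup. Publishing a statistic instead of printing is left out.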

I would like to propose introducing a long-term, generic thread monitoring 
mechanism to detect threads that are stuck for any reason.
It would maintain a monitoring object with start/end methods, invoked 
similarly to FunctionStats.startFunctionExecution and 
FunctionStats.endFunctionExecution.

Your feedback would be appreciated.

Thank you for your cooperation.
Best regards!

Gregory Vortman



RE: Monitor the neighbour JVM using neighbour's member-timeout

2018-02-20 Thread Aravind Musigumpula
Hi Community,

Any comments on the proposal below?


Thanks,
Aravind Musigumpula 


-Original Message-
From: Bruce Schuchardt [mailto:bschucha...@pivotal.io] 
Sent: Thursday, January 18, 2018 11:58 PM
To: dev@geode.apache.org
Subject: Re: Monitor the neighbour JVM using neighbour's member-timeout

We don't use JGroups for membership anymore.  We rewrote all of it and now only 
use JGroups for UDP messaging.  We have complete control over the use of the 
member-timeout setting.

Aravind's idea is relevant to this group.
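
The idea under discussion (quoted below) could be sketched roughly as follows. 
This is only an illustration: FinalCheckTimeouts and its methods are 
hypothetical names, not Geode membership classes, and it assumes each member's 
member-timeout is made known to the member that monitors it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; these names are not Geode's membership classes.
class FinalCheckTimeouts {
  // member name -> member-timeout (ms) advertised by that member
  private final Map<String, Integer> advertised = new HashMap<>();
  private final int ownTimeoutMillis;

  FinalCheckTimeouts(int ownTimeoutMillis) {
    this.ownTimeoutMillis = ownTimeoutMillis;
  }

  // Records the member-timeout another member was configured with.
  void advertise(String member, int memberTimeoutMillis) {
    advertised.put(member, memberTimeoutMillis);
  }

  // Behaviour described in the thread as current: the monitoring member
  // applies its own member-timeout during the final check.
  int currentFinalCheckTimeout(String suspect) {
    return ownTimeoutMillis;
  }

  // Proposed behaviour: use the suspect's member-timeout during the final
  // check, falling back to the monitor's own value if none is known.
  int proposedFinalCheckTimeout(String suspect) {
    return advertised.getOrDefault(suspect, ownTimeoutMillis);
  }
}
```

With equal timeouts on every member the two methods return the same value, 
which is the backward-compatibility argument made in the thread.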

On 1/17/18 3:39 PM, Michael Stolz wrote:
> Pardon my ignorance, but is this something that should be brought up 
> on the JGroups community?
>
> --
> Mike Stolz
> Principal Engineer, GemFire Product Lead
> Mobile: +1-631-835-4771
>
> On Wed, Jan 17, 2018 at 2:37 AM, Aravind Musigumpula < 
> aravind.musigump...@amdocs.com> wrote:
>
>> Hi Everyone,
>>
>> Consider a Geode cluster in which some nodes host a large amount of 
>> data that is critical to the business, while other nodes host a 
>> smaller amount of data that is not critical.
>>
>> If both types of node are going through some operation that makes 
>> them unresponsive, the former type may take a couple of seconds 
>> longer than the latter to respond.
>>
>> In this scenario, is it fair to give the same member-timeout to all 
>> the members?
>> What if we want to wait a little longer for such nodes?
>>
>> In the present Geode configuration, we cannot wait a little longer 
>> for some nodes than for others, even though we can configure a 
>> different member-timeout on each node. I think no one will ever 
>> configure different timeouts per node, because each node uses its 
>> own member-timeout to monitor its neighbors.
>>
>> In this solution, all we do is wait for the suspected member's 
>> member-timeout, instead of the monitoring member's own timeout, 
>> during the final check.
>> It also has no backward-compatibility implications: anyone who wants 
>> the existing behavior can continue to use the same member-timeout 
>> for all the nodes, so the behavior of the system is preserved.
>>
>> If you have any concerns about this solution, please let me know.
>>
>>
>> Thanks,
>> Aravind Musigumpula
>>
>>
>> -Original Message-
>> From: Aravind Musigumpula
>> Sent: Monday, December 18, 2017 6:55 PM
>> To: dev@geode.apache.org
>> Subject: RE: Monitor the neighbour JVM using neighbour's 
>> member-timeout
>>
>> Hi Community,
>>
>> Can you please give your suggestions on the solution below?
>>
>> I have raised a pull request for it: 
>> https://github.com/apache/geode/pull/1075 .
>>
>>
>> Thanks,
>> Aravind Musigumpula
>>
>> -Original Message-
>> From: Aravind Musigumpula
>> Sent: Friday, November 03, 2017 3:23 PM
>> To: dev@geode.apache.org
>> Subject: RE: Monitor the neighbour JVM using neighbour's 
>> member-timeout
>>
>> Thanks, Bruce, for the suggestions. I will move the new variables 
>> from InternalDistributedMember to NetView and make the changes 
>> related to backward compatibility.
>>
>> Now I know that there is another way a member can be removed from 
>> the view: if a member sends a message and waits for 
>> ack-wait-threshold without a response from the target, the sender 
>> does a final check and removes the target from the view if there is 
>> still no response.
>> But I don't understand how deprecating the settings member-timeout, 
>> ack-wait-threshold and ack-severe-alert-threshold in favor of a 
>> single setting will solve the problem. The main problem is that we 
>> want one member to survive in the view for a longer time than others.
>>
>> If we merge the settings into one and pass that setting to the 
>> monitoring member (say A), then A will use the timeout of the target 
>> member (say B, which we want to survive in the view for longer) for 
>> health monitoring, and ack-wait-threshold to wait for the response 
>> to any message before doing the final check.
>> But what if some other member (say C), which is monitoring another 
>> member (say D), has smaller values for member-timeout and 
>> ack-wait-threshold? If member C sends a message to B, C uses the 
>> smaller ack-wait-threshold (that of member D) to wait for a response 
>> and then does the final check on the basis of the smaller 
>> member-timeout. So member B can still be kicked out of the view in a 
>> short amount of time.
>>
>> I think this can be solved simply if we use the member-timeout of 
>> the suspected member in the final check, where we establish a TCP 
>> connection. We don't need to combine those three settings either. We 
>> can set the member-timeout of a particular member to a higher value; 
>> the member which monitors it uses its own member-timeout as it does 
>> now, but during the final check it uses the suspected member-t

Save the date: ApacheCon North America, September 24-27 in Montréal

2018-02-20 Thread Rich Bowen

Dear Apache Enthusiast,

(You’re receiving this message because you’re subscribed to a user@ or 
dev@ list of one or more Apache Software Foundation projects.)


We’re pleased to announce the upcoming ApacheCon [1] in Montréal, 
September 24-27. This event is all about you — the Apache project community.


We’ll have four tracks of technical content this time, as well as lots 
of opportunities to connect with your project community, hack on the 
code, and learn about other related (and unrelated!) projects across the 
foundation.


The Call For Papers (CFP) [2] and registration are now open. Register 
early to take advantage of the early bird prices and secure your place 
at the event hotel.


Important dates
March 30: CFP closes
April 20: CFP notifications sent
August 24: Hotel room block closes (please do not wait until the last minute)


Follow @ApacheCon on Twitter to be the first to hear announcements about 
keynotes, the schedule, evening events, and everything you can expect to 
see at the event.


See you in Montréal!

Sincerely, Rich Bowen, V.P. Events,
on behalf of the entire ApacheCon team

[1] http://www.apachecon.com/acna18
[2] https://cfp.apachecon.com/conference.html?apachecon-north-america-2018


[PROPOSAL] Deprecate running JMX Manager on non-locator members

2018-02-20 Thread Jens Deppe
TL;DR I'd like to propose that we deprecate and ultimately remove the
ability to run a JMX manager on a non-locator member.

I'm not sure about the history for this capability and the original use
cases but I imagine it would have been driven, to a large part, by the
ability for member discovery without locators. Since we have removed that
(and always require locators) it seems that at least one requirement is no
longer valid.

Does anyone have any other use cases where having a JMX manager on a
non-locator would be a requirement?

Thanks
--Jens


[Spring CI] Spring Data GemFire > Nightly-ApacheGeode > #834 was SUCCESSFUL (with 2331 tests)

2018-02-20 Thread Spring CI

---
Spring Data GemFire > Nightly-ApacheGeode > #834 was successful.
---
Scheduled
2333 tests in total.

https://build.spring.io/browse/SGF-NAG-834/





--
This message is automatically generated by Atlassian Bamboo

Re: [PROPOSAL] Deprecate running JMX Manager on non-locator members

2018-02-20 Thread John Blum
So long as we **never** remove the ability to start an "embedded"
*Locator/Manager* in an [application] *CacheServer* process, which is
highly valuable during "*development*".

For instance, I may want to spin up an application peer *CacheServer* instance
with an embedded *Locator/Manager*, like this...

https://github.com/jxblum/simplifying-apache-geode-with-spring-data/blob/master/simplifying-apachegeode-springdata-complete/src/main/java/example/app/server/SpringBootApacheGeodeServerApplication.java

Given this simple *Spring Boot* application class, it is really easy to form
a "*cluster of servers*" just by changing the run profile
configuration in my IDE.

Optionally, I should not have to start a *Manager* in my application-based
*CacheServer* even if I do start an embedded *Locator*.  Additionally,
while it might be less likely to occur, it is also nice to be able to start
an application-based *CacheServer* process with an embedded *Manager*
without having to start a *Locator* if I don't want to, since it is still
possible to connect to this node using *Gfsh*, like so...

gfsh> connect --jmx-manager=localhost[1099]
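
For reference, an embedded Locator and Manager are typically switched on
through Geode properties. The sketch below only builds such a property set;
the class name EmbeddedManagerConfig, the member name and the port values are
illustrative assumptions, and actually starting the cache (for example with
new CacheFactory(props).create()) requires Geode on the classpath.

```java
import java.util.Properties;

// Sketch only: builds the configuration, does not start a cache.
class EmbeddedManagerConfig {
  static Properties embeddedLocatorAndManager() {
    Properties props = new Properties();
    props.setProperty("name", "DevServer");                 // illustrative member name
    props.setProperty("start-locator", "localhost[10334]"); // embedded Locator
    props.setProperty("jmx-manager", "true");               // member may act as Manager
    props.setProperty("jmx-manager-start", "true");         // start the Manager at boot
    props.setProperty("jmx-manager-port", "1099");          // port used in the gfsh command above
    return props;
  }
}
```

Dropping the start-locator entry gives the Manager-only variant, and dropping
the jmx-manager entries gives the Locator-only variant.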

However, the real point is, as an application developer, I should not have
to go outside my IDE and use *Gfsh* to spin up separate *Locator/Manager(s)*
and *CacheServer*(s), which is a non-starter for quick, iterative
development, particularly if I want to enable *debugging*, which is
significantly more difficult to do with *Gfsh* (not impossible, but
definitely more difficult).  *Gfsh* is not a "development" tool.

-j


Re: [DISCUSS] changes to registerInterest API

2018-02-20 Thread Jason Huynh
While doing this work, I noticed that we have an equivalent
unregisterInterest(K key) that accepts a List of keys or a single key.  To
keep things consistent with registerInterest, I propose that we
deprecate the behavior of passing in a list as the key and introduce:

unregisterInterestForKeys(Iterable keys)
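
To illustrate the split being proposed, here is a toy model in Java.
InterestRegistry is a hypothetical stand-in, not Geode's Region API; it only
shows how a single-key method plus a separately named Iterable method removes
the ambiguity when the key itself happens to be a List.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the client-side interest list; not Geode code.
class InterestRegistry<K> {
  private final Set<K> interests = new HashSet<>();

  void registerInterest(K key) {
    interests.add(key);
  }

  // Single-key form: the argument is always treated as one key,
  // even if it is itself a List or another Iterable.
  void unregisterInterest(K key) {
    interests.remove(key);
  }

  // Multi-key form: every element of the Iterable is treated as a key.
  void unregisterInterestForKeys(Iterable<? extends K> keys) {
    for (K key : keys) {
      interests.remove(key);
    }
  }

  int size() {
    return interests.size();
  }
}
```

With this shape, passing a List to unregisterInterest removes only an interest
registered on that List as a key, while unregisterInterestForKeys removes the
interests registered on its elements.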


On Fri, Dec 1, 2017 at 7:55 AM Anthony Baker  wrote:

> I think Dan’s suggestion clarifies the intent.
>
> Anthony
>
>
> > On Nov 30, 2017, at 3:54 PM, Dan Smith  wrote:
> >
> > I think it should be registerInterestAll(Iterable keys) to mirror
> > Collection.addAll.
> >
> > I could easily see a user creating their own Tuple class or using one
> > that happens to also implement Iterable as their key.
> >
> > We're not doing a very good job of discouraging people from using
> > iterable objects as their key if they work everywhere else and then
> > break with this one API.
> >
> > -Dan
> >
> > On Thu, Nov 30, 2017 at 2:45 PM, John Blum  wrote:
> >
> >> No, no... good question.
> >>
> >> I just think it would be wiser if users created a single CompositeKey
> >> class type, with properly implemented equals and hashCode methods, as I
> >> pointed out.
> >>
> >> I don't see any advantage in using a java.util.Collection as a key over
> >> implementing a CompositeKey type.
> >>
> >> As such, anything we can do to discourage users from using Collection
> >> types as a key, I think, is a good thing.
> >>
> >>
> >> On Thu, Nov 30, 2017 at 2:35 PM, Jason Huynh  wrote:
> >>
> >>> Yeah, I am not sure if anyone does it, and I don't think it would be a
> >>> good idea to use a collection as a key, but just thought I'd ask the
> >>> question...
> >>>
> >>> On Thu, Nov 30, 2017 at 2:33 PM John Blum  wrote:
> >>>
>  For instance, this came up recently...
> 
> 
>  https://stackoverflow.com/questions/46551278/gemfire-composite-key-pojo-as-gemfire-key
> 
>  I have seen other similar posts too!
> 
> 
> 
>  On Thu, Nov 30, 2017 at 2:30 PM, John Blum  wrote:
> 
> > Does anyone actually do this in practice?  If so, yikes!
> >
> > Even if the List is immutable, the elements may not be, so using a
> > List as a key starts to open one up to a lot of problems.
> >
> > As others have pointed out in SO and other channels, information
> > should not be kept in the key.
> >
> > It is perfectly fine to have a "Composite" Key, but then define a
> > CompositeKey class type with properly implemented equals(:Object) and
> > hashCode():int methods.
> >
> > For the most part, Keys should really only ever be simple Scalar
> > values (e.g. Long, String, etc).
> >
> > -j
> >
> >
> >
> >
> > On Thu, Nov 30, 2017 at 2:25 PM, Jason Huynh 
> >>> wrote:
> >
> >> I started work on the following plan:
> >> - deprecate current "ALL_KEYS" and List passing behavior in
> >>   registerInterest()
> >> - add registerInterestForAllKeys();
> >> - add registerInterest(T... keys)
> >> - add registerInterest(Iterable keys)
> >>
> >> I might be missing something here but:
> >> With the addition of registerInterest(Iterable keys), I think we would
> >> not be able to register interest in a List as the key itself.  A list
> >> would be iterated over due to the addition of registerInterest(Iterable
> >> keys).  A list in a list would be passed into registerInterest and again
> >> be iterated over.  I could change the newly created registerInterest call
> >> and explicitly name it something else, or are we ok with Iterables not
> >> being able to be registered as individual keys?
> >>
> >>
> >>
> >>
> >>
> >> On Mon, Nov 20, 2017 at 9:05 AM Kirk Lund  wrote:
> >>
> >>> John's approach looks best for when you need to specify keys.
> >>>
> >>> For ALL_KEYS, what about an API that doesn't require a token or
> >>> all keys:
> >>>
> >>> public void registerInterestForAllKeys();
> >>>
> >>> On Fri, Nov 17, 2017 at 1:24 PM, Jason Huynh 
>  wrote:
> >>>
>  Thanks John for the clarification!
> 
>  On Fri, Nov 17, 2017 at 1:12 PM John Blum 
> >>> wrote:
> 
> > This...
> >
> >> The Iterable version would handle any collection type by having
> >> the user pass in the iterator for the collection.
> >
> > Is not correct.
> >
> > The Collection interface itself "extends" the java.lang.Iterable
> > interface (see here...
> > https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html
> > under "*All Superinterfaces*").
> >
> > Therefore a user can simply do this...
> >
> > *List* keys = ...
> >
> >>>

Re: [DISCUSS] changes to registerInterest API

2018-02-20 Thread Dan Smith
>
> unregisterInterestForKeys(Iterable keys)
>

+1


Re: [DISCUSS] changes to registerInterest API

2018-02-20 Thread John Blum
+1

On Tue, Feb 20, 2018 at 5:17 PM, Dan Smith  wrote:

> >
> > unregisterInterestForKeys(Iterable keys)
> >
>
> +1
>



-- 
-John
john.blum10101 (skype)


Geode unit tests completed in 'develop/DistributedTest' with non-zero exit code

2018-02-20 Thread apachegeodeci
Pipeline results can be found at:

Concourse: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/DistributedTest/builds/154