InternalCacheTransactionManager2PC removed from public API

2019-03-26 Thread Mario Ivanac
Hi,


Can you tell me why this header was removed from the geode-native public API 
(https://issues.apache.org/jira/browse/GEODE-3421),

and how we can use XA transactions (2PC) in geode-native?


BR,

Mario


Request access to JIRA

2019-03-29 Thread Mario Ivanac
Hi Geode Dev,


Could you give me access to JIRA, so that I can assign tickets to myself?

My JIRA username is "mivanac".


Thanks,

Mario


Re: InternalCacheTransactionManager2PC removed from public API

2019-04-12 Thread Mario Ivanac
Hi,


Just to reformulate the question: what is the way forward if we want to use 2PC 
with the C++ geode-native client?


Regards,

Mario


From: Mario Ivanac
Sent: 26 March 2019 8:42:57
To: dev@geode.apache.org
Subject: InternalCacheTransactionManager2PC removed from public API

Hi,


Can you tell me why is this header removed from geode native public API 
(https://issues.apache.org/jira/browse/GEODE-3421),

and how we can use XA transactions (2PC) in geode native?


BR,

Mario


Gateway-sender alert-threshold function not working for gateway sender

2019-07-08 Thread Mario Ivanac
Hi, according to the implementation, events that exceed the alert threshold are 
reported only for AsyncEventQueue,

but not for GatewaySender 
(https://issues.apache.org/jira/browse/GEODE-6933).


Is there any reason for this behavior?


Re: Gateway-sender alert-threshold function not working for gateway sender

2019-07-11 Thread Mario Ivanac
Description from documentation


--alert-threshold
Maximum number of milliseconds that a region event can remain in the gateway 
sender queue before Geode logs an alert.
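
The documented check can be sketched roughly as follows. This is an illustration only: the class and method names are hypothetical and are not taken from the Geode sources, and treating a threshold of 0 or less as "alerting disabled" is an assumption.

```java
final class AlertThresholdSketch {
  /**
   * Returns true when an event has waited in the queue longer than the
   * configured alert threshold (all values in milliseconds).
   */
  static boolean exceedsThreshold(long enqueuedAtMillis, long nowMillis,
      long alertThresholdMillis) {
    // Assumption: a threshold of 0 or less means alerting is switched off.
    if (alertThresholdMillis <= 0) {
      return false;
    }
    return nowMillis - enqueuedAtMillis > alertThresholdMillis;
  }
}
```

A sender would run such a check when it takes an event off the queue and log an alert if it returns true.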



From: Mario Ivanac
Sent: 8 July 2019 10:52:20
To: dev@geode.apache.org
Subject: Gateway-sender alert-threshold function not working for gateway sender

Hi, according to implementation, events which exceed alert threshold are 
reported only for AsyncEventQueue,

but not for GatewaySender 
(https://issues.apache.org/jira/browse/GEODE-6933).


Is there any reason for this behavior?


Server recovery severely degrades client read traffic

2019-08-01 Thread Mario Ivanac
Hi,

we are observing severe throttling from the cluster when getting data from a 
partitioned region (no SH nor TX) while a server hosting one of the redundant 
buckets is recovering (see ticket 
https://issues.apache.org/jira/browse/GEODE-7039).


Currently, Get operations that do not land on a server hosting the bucket 
are proxied to other members that host the bucket, chosen at random.

The question is: can we use the bucketProfile.isInitializing flag in the member 
selection algorithm, or do you have a better idea?


Re: Server recovery severely degrades client read traffic

2019-08-06 Thread Mario Ivanac
Hi,


we see the problem in BucketAdvisor.getPreferredNode, in the random selection of a member:


// Pick one at random.
int i = myRand.nextInt(locProfiles.length);
return locProfiles[i].peerMemberId;
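
One possible shape of the change, as a minimal sketch: prefer members whose bucket is not initializing, and fall back to the full list only when every host is still initializing. ProfileSketch and PreferredNodeSketch are hypothetical stand-ins and are not Geode's real classes; only the isInitializing flag and peerMemberId field names are taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for a bucket profile; not Geode's real class. */
final class ProfileSketch {
  final String peerMemberId;
  final boolean isInitializing;

  ProfileSketch(String peerMemberId, boolean isInitializing) {
    this.peerMemberId = peerMemberId;
    this.isInitializing = isInitializing;
  }
}

final class PreferredNodeSketch {
  /** Picks a member at random, preferring hosts whose bucket is not initializing. */
  static String pick(List<ProfileSketch> profiles, Random rand) {
    if (profiles.isEmpty()) {
      return null;
    }
    List<ProfileSketch> ready = new ArrayList<>();
    for (ProfileSketch profile : profiles) {
      if (!profile.isInitializing) {
        ready.add(profile);
      }
    }
    // Fall back to all hosts if every one of them is still initializing.
    List<ProfileSketch> candidates = ready.isEmpty() ? profiles : ready;
    return candidates.get(rand.nextInt(candidates.size())).peerMemberId;
  }
}
```

The fallback keeps reads working during a full-cluster recovery, at the cost of the same slow path we see today.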



BR



From: Anthony Baker
Sent: 2 August 2019 19:43:18
To: dev@geode.apache.org
Subject: Re: Server recovery severely degrades client read traffic

Interesting find!  Can you share the code path you’re looking at?  I see one 
related to putAll but not for get.  Thanks!

Anthony


> On Aug 1, 2019, at 11:01 PM, Mario Ivanac  wrote:
>
> Hi,
>
> we are observing severe throttling from the cluster when getting data from a 
> partitioned region (no SH nor TX) while server hosting one of the redundant 
> buckets is recovering (see ticket 
> https://issues.apache.org/jira/browse/GEODE-7039).
>
>
> Currently, Get operations that have not landed on a server hosting the bucket 
> will be proxied to other members that host bucket, in random fashion.
>
> Question is, can we use bucketProfile.isInitializing flag for member 
> selection algorithm, or do you have some better idea?



Re: Reviewers for PR

2019-08-08 Thread Mario Ivanac
Hi,


Could someone review PR https://github.com/apache/geode/pull/3854?


Thanks


From: Mark Hanson
Sent: 6 August 2019 19:21:42
To: dev@geode.apache.org
Subject: Reviewers for PR

Hi All,

Here is another PR that could use reviewers.

GEODE-6748: First part of solution #3854 
https://github.com/apache/geode/pull/3854 


Thanks,
Mark


geode-native ipv6

2019-08-08 Thread Mario Ivanac
Hi,


Can you tell me whether the geode-native client supports IPv6?


BR,

Mario


Re: geode-native ipv6

2019-08-14 Thread Mario Ivanac
Hi,


I created https://issues.apache.org/jira/browse/GEODE-7086, and a PR with the 
code changes.

The proposed solution was tried in an IPv6 environment, and basic operations 
(PUT/GET) were successful.

Additional testing is needed.


BR,

Mario


From: Blake Bender
Sent: 9 August 2019 0:03:32
To: dev@geode.apache.org
Subject: Re: geode-native ipv6

This chunk of code in the client handshake code leads me to believe it is
still IPv4 only.  Won't say it's definitive, cause I'm not 100% certain
hostaddr is used on the server side, but still...

// writing first 4 bytes of the address. This will be same until
// IPV6 support is added in the client
uint32_t temp;
memcpy(&temp, hostAddr, 4);
m_memID.writeInt(static_cast<int32_t>(temp));
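
To illustrate why copying only 4 bytes cannot carry an IPv6 address, here is a small sketch. It is written in Java rather than the client's C++, purely for illustration, and the class and method names are hypothetical. It uses only the JDK's InetAddress, which parses IP literals without a DNS lookup.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

final class AddressLengthSketch {
  /** Returns the raw address length in bytes for an IP literal. */
  static int rawLength(String ipLiteral) {
    try {
      // For a literal address only the format is checked; no lookup is made.
      return InetAddress.getByName(ipLiteral).getAddress().length;
    } catch (UnknownHostException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

An IPv4 literal yields 4 bytes and an IPv6 literal yields 16, so a handshake field that always writes 4 bytes would drop the remaining 12 bytes of an IPv6 address.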


On Thu, Aug 8, 2019 at 1:18 PM Jacob Barrett  wrote:

> We are on the latest ACE.
>
> > On Aug 8, 2019, at 9:56 AM, Mark Hanson  wrote:
> >
> > The latest ACE framework seems to have support, but I don’t know how far
> off latest we are. I don’t think we test anything in an IPv6 context, so I
> would say no that we don’t officially support it in the client. Given some
> time, I could do some testing..
> >
> > Thanks,
> > Mark
> >
> >> On Aug 8, 2019, at 7:35 AM, Blake Bender  wrote:
> >>
> >> I'm sure someone will chime in with a more definitive answer, but I'm
> >> pretty certain the answer is no, sorry.
> >>
> >> Thanks,
> >>
> >> Blake
> >>
> >>
> >>> On Thu, Aug 8, 2019 at 4:28 AM Mario Ivanac 
> wrote:
> >>>
> >>> Hi,
> >>>
> >>>
> >>> can you tell me does geode-native client support ipv6?
> >>>
> >>>
> >>> BR,
> >>>
> >>> Mario
> >>>
> >
>


Re: geode-native ipv6

2019-08-21 Thread Mario Ivanac
Hi,


Can you help me with how to simulate IPv6 in the new integration test framework?


BR,

Mario


From: Jacob Barrett
Sent: 14 August 2019 21:00:35
To: dev@geode.apache.org
Subject: Re: geode-native ipv6

Can you build an integration test in the new framework?

> On Aug 14, 2019, at 11:25 AM, Mario Ivanac  wrote:
>
> Hi,
>
>
> created https://issues.apache.org/jira/browse/GEODE-7086, and PR with code 
> impacts.
>
> Proposed solution was tried on IPv6 environment, and basic operations 
> (PUT/GET) were successful.
>
> Additional test needed.
>
>
> BR,
>
> Mario
>
> 
> From: Blake Bender
> Sent: 9 August 2019 0:03:32
> To: dev@geode.apache.org
> Subject: Re: geode-native ipv6
>
> This chunk of code in the client handshake code leads me to believe it is
> still IPv4 only.  Won't say it's definitive, cause I'm not 100% certain
> hostaddr is used on the server side, but still...
>
> // writing first 4 bytes of the address. This will be same until
> // IPV6 support is added in the client
> uint32_t temp;
> memcpy(&temp, hostAddr, 4);
> m_memID.writeInt(static_cast<int32_t>(temp));
>
>
>> On Thu, Aug 8, 2019 at 1:18 PM Jacob Barrett  wrote:
>>
>> We are on the latest ACE.
>>
>>> On Aug 8, 2019, at 9:56 AM, Mark Hanson  wrote:
>>>
>>> The latest ACE framework seems to have support, but I don’t know how far
>> off latest we are. I don’t think we test anything in an IPv6 context, so I
>> would say no that we don’t officially support it in the client. Given some
>> time, I could do some testing..
>>>
>>> Thanks,
>>> Mark
>>>
>>>> On Aug 8, 2019, at 7:35 AM, Blake Bender  wrote:
>>>>
>>>> I'm sure someone will chime in with a more definitive answer, but I'm
>>>> pretty certain the answer is no, sorry.
>>>>
>>>> Thanks,
>>>>
>>>> Blake
>>>>
>>>>
>>>>> On Thu, Aug 8, 2019 at 4:28 AM Mario Ivanac 
>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> can you tell me does geode-native client support ipv6?
>>>>>
>>>>>
>>>>> BR,
>>>>>
>>>>> Mario
>>>>>
>>>
>>


Re: geode-native ipv6

2019-08-22 Thread Mario Ivanac
Thanks for the detailed answer.

From: Jacob Barrett
Sent: 22 August 2019 6:56
To: dev@geode.apache.org
Subject: Re: geode-native ipv6

You might start by updating the framework’s Cluster class to accept a hostname 
that would be used for both Locator and Server classes. Then in an IPv6 test 
you could set the hostname to '0:0:0:0:0:0:0:1', which is the loopback address. 
In theory this should work.

In Cluster.hpp:
Look for all the NamedType definitions and add:
using BindAddress = NamedType<std::string, struct BindAddressParameter>;
Plumb it through to Locator and Server.

In your test:
Cluster cluster(…, BindAddress("0:0:0:0:0:0:0:1"));


Even simpler may be to not allow the hostname but just a flag for IPv4 or 
IPv6 and internally set the hostname to the IPv4 or IPv6 localhost address.

In Cluster.hpp:
using UseIpv6 = NamedType<bool, struct UseIpv6Parameter>;
Then plumb that through to Locator and Server and select the appropriate 
loopback address.

In your test:
Cluster cluster(…, UseIpv6(true));


I can help you in more detail if you want.

-Jake


> On Aug 21, 2019, at 12:34 AM, Mario Ivanac  wrote:
>
> Hi,
>
>
> Can you help me, how to simulate ipv6 in new integration test framework?
>
>
> BR,
>
> Mario
>
> 
> From: Jacob Barrett
> Sent: 14 August 2019 21:00:35
> To: dev@geode.apache.org
> Subject: Re: geode-native ipv6
>
> Can you build an integration test in the new framework?
>
>> On Aug 14, 2019, at 11:25 AM, Mario Ivanac  wrote:
>>
>> Hi,
>>
>>
>> created https://issues.apache.org/jira/browse/GEODE-7086, and PR with code 
>> impacts.
>>
>> Proposed solution was tried on IPv6 environment, and basic operations 
>> (PUT/GET) were successful.
>>
>> Additional test needed.
>>
>>
>> BR,
>>
>> Mario
>>
>> 
>> From: Blake Bender
>> Sent: 9 August 2019 0:03:32
>> To: dev@geode.apache.org
>> Subject: Re: geode-native ipv6
>>
>> This chunk of code in the client handshake code leads me to believe it is
>> still IPv4 only.  Won't say it's definitive, cause I'm not 100% certain
>> hostaddr is used on the server side, but still...
>>
>> // writing first 4 bytes of the address. This will be same until
>> // IPV6 support is added in the client
>> uint32_t temp;
>> memcpy(&temp, hostAddr, 4);
>> m_memID.writeInt(static_cast<int32_t>(temp));
>>
>>
>>> On Thu, Aug 8, 2019 at 1:18 PM Jacob Barrett  wrote:
>>>
>>> We are on the latest ACE.
>>>
>>>> On Aug 8, 2019, at 9:56 AM, Mark Hanson  wrote:
>>>>
>>>> The latest ACE framework seems to have support, but I don’t know how far
>>> off latest we are. I don’t think we test anything in an IPv6 context, so I
>>> would say no that we don’t officially support it in the client. Given some
>>> time, I could do some testing..
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>> On Aug 8, 2019, at 7:35 AM, Blake Bender  wrote:
>>>>>
>>>>> I'm sure someone will chime in with a more definitive answer, but I'm
>>>>> pretty certain the answer is no, sorry.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Blake
>>>>>
>>>>>
>>>>>> On Thu, Aug 8, 2019 at 4:28 AM Mario Ivanac 
>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> can you tell me does geode-native client support ipv6?
>>>>>>
>>>>>>
>>>>>> BR,
>>>>>>
>>>>>> Mario
>>>>>>
>>>>
>>>



Need PR reviews

2019-08-26 Thread Mario Ivanac
Hi Geode dev,

we need reviews for the following PRs:

Jira ticket:

https://issues.apache.org/jira/browse/GEODE-7086
PR:
https://github.com/apache/geode-native/pull/510

Jira ticket: 
https://issues.apache.org/jira/browse/GEODE-7039
PR:
https://github.com/apache/geode/pull/3955

Thanks,
Mario


Re: Need PR reviews

2019-08-27 Thread Mario Ivanac
Hi,

just a reminder about the review request below.

Thanks.

From: Mario Ivanac
Sent: 26 August 2019 11:37
To: dev@geode.apache.org
Subject: Need PR reviews

Hi Geode dev,

we need review for following PRs:

Jira ticket:
https://issues.apache.org/jira/browse/GEODE-7086
PR:
https://github.com/apache/geode-native/pull/510

Jira ticket:
https://issues.apache.org/jira/browse/GEODE-7039
PR:
https://github.com/apache/geode/pull/3955

Thanks,
Mario


Please review PR #4188

2019-10-21 Thread Mario Ivanac
Hi,

could someone review PR

https://github.com/apache/geode/pull/4188

for flaky test.

Thanks,
Mario


Special certificates for multisite

2019-10-29 Thread Mario Ivanac
Hi,

We are trying to find a solution for a situation we have. Below is the 
explanation of the issue, as well as a proposed way forward. We would 
appreciate comments from the community regarding this approach. Also, 
suggestions for a different approach to solve this issue would be welcome.

The situation is the following: when SSL is enabled, we would like to use 
multiple certificates in locators and servers: one certificate for 
communication inside one geographical site, and another for the communication 
happening between geographical sites. The approach we took is to use the 
default SSL context and implement our own Java Security Provider.

The client side can, from the security provider, easily distinguish which 
certificate to use (just check if you are initiating a connection towards an IP 
listed in gemfire.remote-locators). The thing we are missing is a criterion 
based on which we can choose the certificate on the server side of the 
connection. Explanation: Once the handshake starts, a ClientHello message is 
sent from the peer acting as a client in that connection (be it a geode server 
or a geode locator). When the ClientHello is received on the server side, we 
can’t always read the real originating IP (keeping the originating IP is 
sometimes not possible in cloud native environments), so we can’t be sure 
whether the message originated from the same geographical site or from another. 
We are left only with the information inside the ClientHello message. The 
default ClientHello message doesn’t contain information based on which we can 
conclude which site the message came from and which certificate to use.

Our proposal is to add the server_name extension in the ClientHello message. 
The content of that extension would be the distributed system ID of the site 
where the ClientHello originated. This way we could compare the distributed 
system ID in the ClientHello message with the ID of the site where the message 
is received.

One issue with this approach is that the usage of the extension wouldn’t be 
completely in line with the RFC description: we would be inserting client 
information instead of the targeted server name. Still, since this extension 
is otherwise of no use in Geode, and this approach doesn’t present a security 
risk (we would be sharing the distributed system ID, which doesn’t look 
dangerous), we see this as a potential way to go.
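
A minimal sketch of the idea, using only standard JDK classes (SSLParameters, SNIHostName). The SiteSniSketch class, its methods, and the "ds-" naming scheme are assumptions made for illustration and are not part of Geode. The ID is wrapped in a host-name-shaped label because SNIHostName only accepts valid host names.

```java
import java.util.Collections;
import java.util.List;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.StandardConstants;

final class SiteSniSketch {
  // Hypothetical naming scheme: the distributed system ID behind a fixed prefix.
  private static final String PREFIX = "ds-";

  /** Connecting side: parameters whose server_name extension carries the local site ID. */
  static SSLParameters clientParameters(int distributedSystemId) {
    SSLParameters params = new SSLParameters();
    params.setServerNames(Collections.<SNIServerName>singletonList(
        new SNIHostName(PREFIX + distributedSystemId)));
    return params;
  }

  /** Accepting side: true when a requested name carries the receiving site's own ID. */
  static boolean isSameSite(List<SNIServerName> requested, int localDistributedSystemId) {
    for (SNIServerName name : requested) {
      if (name.getType() == StandardConstants.SNI_HOST_NAME
          && name instanceof SNIHostName
          && ((SNIHostName) name).getAsciiName().equals(PREFIX + localDistributedSystemId)) {
        return true;
      }
    }
    return false;
  }
}
```

On the connecting side the parameters would be applied with SSLSocket.setSSLParameters or SSLEngine.setSSLParameters before the handshake. On the accepting side the requested names could be read from ExtendedSSLSession.getRequestedServerNames(), and the result used to choose between the intra-site and inter-site certificate.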





gateway sender queue

2019-11-04 Thread Mario Ivanac
Hi geode dev,
we have some questions regarding gateway senders and the gateway sender queue.


  *   During testing we observed different behavior in parallel and serial 
gateway senders. When we manually stop and then start gateway senders, the 
queue is purged for parallel gateway senders, but not for serial gateway 
senders. Is this normal behavior or a bug?
  *   What happens to the queues when the whole cluster is stopped and later 
started? (In our tests with persistent queues, the events are kept.)
  *   Could we add an extra option to the gfsh command "start gateway sender" 
that controls whether the queues are reset (for instance --cleanQueues)?

Thanks,
Mario




Re: bug fix needed for release/1.11.0

2019-11-06 Thread Mario Ivanac
+1 for bringing this fix to release/1.11.0

From: Mark Hanson
Sent: 6 November 2019 18:28
To: dev@geode.apache.org
Subject: Re: bug fix needed for release/1.11.0

Any other votes? I have 2 people in favor.

Voting will close at noon.

Thanks,
Mark

> On Nov 6, 2019, at 8:00 AM, Bruce Schuchardt  wrote:
>
> The fix for this problem is in the CI pipeline today: 
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Build/builds/1341
>
> On 11/5/19 10:49 AM, Owen Nichols wrote:
>> +1 for bringing this fix to release/1.11.0 (after it has passed Benchmarks 
>> on develop)
>>
>>> On Nov 5, 2019, at 10:45 AM, Bruce Schuchardt  
>>> wrote:
>>>
>>> The PR for GEODE-6661 introduced a problem in SSL communications that needs 
>>> to be fixed.  It changed SSL handshakes to use a temporary buffer that's 
>>> discarded when the handshake completes, but sometimes this buffer contains 
>>> application data that must be retained.  This seems to be causing our 
>>> Benchmark SSL test failures in CI.
>>>
>>> I'm preparing a fix.  We can either revert the PR for GEODE-6661 on that 
>>> branch or cherry-pick the correction when it's ready.
>>>



Re: gateway sender queue

2019-11-07 Thread Mario Ivanac
Hi,

thanks for the answers.

Some more details regarding the 1st question.

Is this behavior the same (for serial and parallel gateway senders) when the 
queue is persistent?
That is, should a persistent queue be purged if we restart the gateway sender?


Thanks,
Mario


From: Dan Smith
Sent: 5 November 2019 18:52
To: dev@geode.apache.org
Subject: Re: gateway sender queue

Some replies, inline:

>   *   During testing we have observed, different behavior in parallel and
> serial gateway senders. In case we manually stop, than start gateway
> senders, for parallel gateway senders, queue is purged, but for serial
> gateway senders this is not the case. Is this normal behavior or bug?
>

Hmm, I also think stop is supposed to clear the queue. I think if you are
seeing that it doesn't clear the queue, that might be a bug.



>   *   What happens with the queues when whole cluster is stopped and later
> started (In our tests with persistent queues, the events are kept)?
>

Persistent queues will keep all of the events when you restart.


>   *   Could we add extra option in gfsh command  "start gateway sender"
> that allows to control queues reset (for instance --cleanQueues)?
>

If stop does clear the queue, would this be needed? It might still be
reasonable - I've heard folks request a way to clear running queues as well.

-Dan


Re: gateway sender queue

2019-11-08 Thread Mario Ivanac
Hi all,

one more clarification regarding 3rd question:

"*   Could we add extra option in gfsh command  "start gateway sender"
 that allows to control queues reset (for instance --cleanQueues)"

This option would indicate whether, when a gateway sender is started, we should 
discard/clean the existing queue or keep using it.
By default, the existing queue would be discarded/cleaned.
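
The proposed semantics can be sketched as follows. This is only an illustration of the intended behavior: SenderStartSketch is a hypothetical class, not Geode code, and --cleanQueues is the proposed flag, not an existing gfsh option.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SenderStartSketch {
  private final Deque<String> queue = new ArrayDeque<>();
  private boolean running;

  void enqueue(String event) {
    queue.addLast(event);
  }

  /** Starts the sender; queued events are discarded first when cleanQueues is true. */
  void start(boolean cleanQueues) {
    if (cleanQueues) {
      queue.clear();
    }
    running = true;
  }

  void stop() {
    running = false;
  }

  boolean isRunning() {
    return running;
  }

  int queueSize() {
    return queue.size();
  }
}
```

With start(false) the events queued before the stop are still delivered after the restart; with start(true) they are dropped.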

Best Regards,
Mario

From: Mario Ivanac
Sent: 7 November 2019 9:01
To: Dan Smith; dev@geode.apache.org
Subject: Re: gateway sender queue

Hi,

thanks for answers.

Some more details regarding 1st question.

Is this behavior same (for serial and parallel gateway sender) in case queue is 
persistent?
Meaning, should queue (persistent) be purged if we restart gateway sender?


Thanks,
Mario


From: Dan Smith
Sent: 5 November 2019 18:52
To: dev@geode.apache.org
Subject: Re: gateway sender queue

Some replies, inline:

>   *   During testing we have observed, different behavior in parallel and
> serial gateway senders. In case we manually stop, than start gateway
> senders, for parallel gateway senders, queue is purged, but for serial
> gateway senders this is not the case. Is this normal behavior or bug?
>

Hmm, I also think stop is supposed to clear the queue. I think if you are
seeing that it doesn't clear the queue, that might be a bug.



>   *   What happens with the queues when whole cluster is stopped and later
> started (In our tests with persistent queues, the events are kept)?
>

Persistent queues will keep all of the events when you restart.


>   *   Could we add extra option in gfsh command  "start gateway sender"
> that allows to control queues reset (for instance --cleanQueues)?
>

If stop does clear the queue, would this be needed? It might still be
reasonable - I've heard folks request a way to clear running queues as well.

-Dan


ParallelGatewaySenderQueue implementation

2019-11-11 Thread Mario Ivanac
Hi geode dev,

I am investigating SerialGatewaySenderQueue and ParallelGatewaySenderQueue 
implementation,

and I found that in the ParallelGatewaySenderQueue.close() function
the code has been deleted and only a comment is left:

// Because of bug 49060 do not close the regions of a parallel queue

My question is: where can I find more information about this bug?

BR,
Mario


Re: gateway sender queue

2019-11-13 Thread Mario Ivanac
Hi,

just a reminder about the last question:

what is your opinion on adding an option to the gfsh command "start 
gateway sender"
to control clearing of existing queues (--cleanQueues)?

This option would indicate whether, when a gateway sender is started, we should 
discard/clean the existing queue or keep using it.
By default, the existing queue would be discarded/cleaned.

Best Regards,
Mario

From: Mario Ivanac
Sent: 8 November 2019 13:00
To: dev@geode.apache.org
Subject: Re: gateway sender queue

Hi all,

one more clarification regarding 3rd question:

"*   Could we add extra option in gfsh command  "start gateway sender"
 that allows to control queues reset (for instance --cleanQueues)"

This option will indicate, when gateway sender is started, should we 
discard/clean existing queue, or should we use existing queue.
By default it will be to discard/clean existing queue.

Best Regards,
Mario

From: Mario Ivanac
Sent: 7 November 2019 9:01
To: Dan Smith; dev@geode.apache.org
Subject: Re: gateway sender queue

Hi,

thanks for answers.

Some more details regarding 1st question.

Is this behavior same (for serial and parallel gateway sender) in case queue is 
persistent?
Meaning, should queue (persistent) be purged if we restart gateway sender?


Thanks,
Mario


From: Dan Smith
Sent: 5 November 2019 18:52
To: dev@geode.apache.org
Subject: Re: gateway sender queue

Some replies, inline:

>   *   During testing we have observed, different behavior in parallel and
> serial gateway senders. In case we manually stop, than start gateway
> senders, for parallel gateway senders, queue is purged, but for serial
> gateway senders this is not the case. Is this normal behavior or bug?
>

Hmm, I also think stop is supposed to clear the queue. I think if you are
seeing that it doesn't clear the queue, that might be a bug.



>   *   What happens with the queues when whole cluster is stopped and later
> started (In our tests with persistent queues, the events are kept)?
>

Persistent queues will keep all of the events when you restart.


>   *   Could we add extra option in gfsh command  "start gateway sender"
> that allows to control queues reset (for instance --cleanQueues)?
>

If stop does clear the queue, would this be needed? It might still be
reasonable - I've heard folks request a way to clear running queues as well.

-Dan


Review for #4310

2019-11-15 Thread Mario Ivanac
Hi all,

please could someone review https://issues.apache.org/jira/browse/GEODE-7414

PR https://github.com/apache/geode/pull/4310


Thanks,
Mario


Proposal of new config property "ssl-server-name-extension"

2019-11-19 Thread Mario Ivanac
Hi geode dev,

as part of the solution for https://issues.apache.org/jira/browse/GEODE-7414
we would like to introduce a new config property, "ssl-server-name-extension".

This property will contain a generic string, which will be added as the Server 
Name Indication (SNI) parameter to the Client Hello message.

Do you agree with this proposal?

Thanks,
Mario


Re: Proposal of new config property "ssl-server-name-extension"

2019-11-19 Thread Mario Ivanac
Hi all,

this proposal and ticket are the result of the mail discussion "Special certificates 
for multisite":

https://lists.apache.org/thread.html/2418dd1b5f9ae812daa48a51a8d2eb252a3c861a890264f47da3a4d3@%3Cdev.geode.apache.org%3E


BR,
Mario

From: Charlie Black
Sent: 19 November 2019 17:24
To: dev@geode.apache.org
Subject: Re: Proposal of new config property "ssl-server-name-extension"

I have read the e-mail and the ticket I am not sure how this field is going
to be used.   Maybe you can expand on the intent of this field.

From the property "ssl-server-name-extension" it feels like we are
intending to correlate with something presented in the SSL certificate.
It would be great if that was explained plainly for the reader in more
detail.

For now I can only -1.

Charlie

On Tue, Nov 19, 2019 at 3:27 AM Mario Ivanac  wrote:

> Hi geode dev,
>
> as a part of solution for https://issues.apache.org/jira/browse/GEODE-7414
> we would like to introduce new config property "ssl-server-name-extension".
>
> This property will contain generic string, which will be added as Server
> Name Indication (SNI) parameter to Client Hello message.
>
> Do you agree with this proposal?
>
> Thanks,
> Mario
>


--
Charlie Black | cbl...@pivotal.io


Re: Re: Proposal of new config property "ssl-server-name-extension"

2019-11-19 Thread Mario Ivanac
Hi,

as described before:

This property will contain a generic string, which will be added as the Server 
Name Indication (SNI) parameter to the ClientHello message.
The ClientHello message is part of the SSL handshake.

Mario

From: Charlie Black
Sent: 19 November 2019 18:20
To: Mario Ivanac
Cc: dev@geode.apache.org
Subject: Re: Re: Proposal of new config property "ssl-server-name-extension"

The SSL handshake is done before the Geode handshake. So additions to the 
Geode handshake protocol will not affect SSL connections since the secure 
socket connection has already been negotiated and the Geode handshake is 
encrypted.

Charlie

On Tue, Nov 19, 2019 at 9:06 AM Mario Ivanac  wrote:
Hi all,

this proposal and ticket are result of mail discussion "Special certificates 
for multisite":

https://lists.apache.org/thread.html/2418dd1b5f9ae812daa48a51a8d2eb252a3c861a890264f47da3a4d3@%3Cdev.geode.apache.org%3E


BR,
Mario

From: Charlie Black <cbl...@pivotal.io>
Sent: 19 November 2019 17:24
To: dev@geode.apache.org
Subject: Re: Proposal of new config property "ssl-server-name-extension"

I have read the e-mail and the ticket I am not sure how this field is going
to be used.   Maybe you can expand on the intent of this field.

From the property "ssl-server-name-extension" it feels like we are
intending to correlate with something presented in the SSL certificate.
It would be great if that was explained plainly for the reader in more
detail.

For now I can only -1.

Charlie

On Tue, Nov 19, 2019 at 3:27 AM Mario Ivanac  wrote:

> Hi geode dev,
>
> as a part of solution for 
> https://issues.apache.org/jira/browse/GEODE-7414
> we would like to introduce new config property "ssl-server-name-extension".
>
> This property will contain generic string, which will be added as Server
> Name Indication (SNI) parameter to Client Hello message.
>
> Do you agree with this proposal?
>
> Thanks,
> Mario
>


--
Charlie Black | cbl...@pivotal.io


--
Charlie Black | cbl...@pivotal.io


Re: Proposal of new config property "ssl-server-name-extension"

2019-11-21 Thread Mario Ivanac
Hi,

All connections will use ssl-server-name-extension as part of the Client Hello.

BR,
Mario

From: Dan Smith
Sent: 19 November 2019 22:17
To: dev@geode.apache.org
Subject: Re: Proposal of new config property "ssl-server-name-extension"

Can you clarify which connections will use this ssl-server-name-extension
as part of the Client Hello? client to locator, client to server, server to
server, WAN site to WAN site, ... all of the above?

I'm fine with adding the new property.

At some point, I think we need to think about making it easier to plug in
custom logic to control the entire socket creation and TLS handshake. I
think technically you can take over the whole process if you set the
ssl-use-default-context property and then configure the default SSLContext
for your entire process, but that is not ideal.
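
For reference, installing a process-wide default SSLContext looks roughly like the sketch below. It uses only standard JDK calls (SSLContext.getInstance, init, setDefault); the DefaultContextSketch class is hypothetical. A real setup would pass its own key and trust managers instead of the JDK defaults, and Geode would only pick up the default context when ssl-use-default-context is set.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;

final class DefaultContextSketch {
  /** Builds a TLS context with the JDK default managers and installs it process-wide. */
  static SSLContext installDefault() throws GeneralSecurityException {
    SSLContext context = SSLContext.getInstance("TLS");
    // null arguments select the JDK's default key managers, trust managers
    // and SecureRandom; custom managers would be passed here instead.
    context.init(null, null, null);
    SSLContext.setDefault(context);
    return context;
  }
}
```

Because the default context is shared by the whole JVM, every other component that relies on SSLContext.getDefault() is affected as well, which is part of why this route is not ideal.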

-Dan

On Tue, Nov 19, 2019 at 8:24 AM Charlie Black  wrote:

> I have read the e-mail and the ticket I am not sure how this field is going
> to be used.   Maybe you can expand on the intent of this field.
>
> From the property "ssl-server-name-extension" it feels like we are
> intending to correlate with something presented in the SSL certificate.
> It would be great if that was explained plainly for the reader in more
> detail.
>
> For now I can only -1.
>
> Charlie
>
> On Tue, Nov 19, 2019 at 3:27 AM Mario Ivanac 
> wrote:
>
> > Hi geode dev,
> >
> > as a part of solution for
> https://issues.apache.org/jira/browse/GEODE-7414
> > we would like to introduce new config property
> "ssl-server-name-extension".
> >
> > This property will contain generic string, which will be added as Server
> > Name Indication (SNI) parameter to Client Hello message.
> >
> > Do you agree with this proposal?
> >
> > Thanks,
> > Mario
> >
>
>
> --
> Charlie Black | cbl...@pivotal.io
>


Re: Proposal of new config property "ssl-server-name-extension"

2019-11-21 Thread Mario Ivanac
Hi,

regarding your questions:

>>Mario, are there any limitations that should be understood about the types
of certificates used or how they're generated?<<

There are no limitations.


>>Do you have the freedom to use certificate chaining and have the root CA in 
>>each component's
truststore?<<

Yes

BR,
Mario


From: Ivan Godwin
Sent: 21 November 2019 4:29
To: dev@geode.apache.org
Subject: Re: Proposal of new config property "ssl-server-name-extension"

Thank you for the reference to the other thread, Jens. I hope my questions
aren't too late in the process.

Mario, are there any limitations that should be understood about the types
of certificates used or how they're generated? Do you have the freedom to
use certificate chaining and have the root CA in each component's
truststore? Dan's concern of a future, valid use case resonates quite a bit
with me.

Ivan


On Wed, Nov 20, 2019, 18:43 Jens Deppe  wrote:

> This thread contains more background on the reasons for this proposal:
>
> https://lists.apache.org/thread.html/2418dd1b5f9ae812daa48a51a8d2eb252a3c861a890264f47da3a4d3@%3Cdev.geode.apache.org%3E
>
> On Wed, Nov 20, 2019 at 10:46 AM Ivan Godwin  wrote:
>
> > I've reviewed the PR and I believe I understand the use case, but I feel
> a
> > bit uncomfortable with the misuse of SNI. As I understand it, and as it
> has
> > been already mentioned, SNI is used to determine which SSL certificate
> > should be presented to a client.
> >
> > I think that CLIENT_HELLO_EXTENSION should *not* be associated with SSL,
> > and that a new client property should be used instead. That is, a
> different
> > property, unrelated to SSL, should be used to set what will be verified
> > against CLIENT_HELLO_EXTENSION.
> >
> > On Tue, Nov 19, 2019 at 4:43 PM Jens Deppe  wrote:
> >
> > > I'd like to add my comment from the original PR here again:
> > >
> > >
> > > Although I support the particular use case, I would prefer the
> > > implementation being a bit more abstracted. Specifically, if we provided an
> > > extension point which would allow modification of SSLParameters then we
> > > wouldn't be coupling to a particular use case. So I'm thinking that the
> > > user would define (via say a ssl-parameter-extension parameter) a class
> > > that takes in a SSLParameter and perhaps SSLConfig and whatever else for
> > > this use-case and does what it needs. The user class would need to
> > > implement an interface something like this:
> > >
> > > public interface SSLParameterExtension {
> > >   SSLParameter modify(SSLParameter parameter, SSLConfig config);
> > > }
> > >
> > > I would imagine seeing the user implementation of this being attached to
> > > SSLConfig.
> > >
> > >
> > > (https://github.com/apache/geode/pull/4310)
> > >
> > > I don't mind (mis)using the Server Name field to convey some other
> > > information, but since it's possible to abstract the specific nature and
> > > application of that information, I think we should do so. Anyone else who
> > > looks at the code is going to wonder what the use is especially if the
> > > consumer of that particular piece of info is going to be provided via an
> > > external SSLEngine implementation.
> > >
> > > --Jens
> > >
> > > On Tue, Nov 19, 2019 at 1:18 PM Dan Smith  wrote:
> > >
> > > > Can you clarify which connections will use this ssl-server-name-extension
> > > > as part of the Client Hello? client to locator, client to server, server to
> > > > server, WAN site to WAN site, ... all of the above?
> > > >
> > > > I'm fine with adding the new property.
> > > >
> > > > At some point, I think we need to think about making it easier to plug in
> > > > custom logic to control the entire socket creation and TLS handshake. I
> > > > think technically you can take over the whole process if you set the
> > > > ssl-use-default-context property and then configure the default SSLContext
> > > > for your entire process, but that is not ideal.
> > > >
> > > > -Dan
> > > >
> > > > > On Tue, Nov 19, 2019 at 8:24 AM Charlie Black wrote:
> > > >
> > > > > I have read the e-mail and the ticket I am not sure 

Re: Proposal of new config property "ssl-server-name-extension"

2019-11-26 Thread Mario Ivanac
Hi Sai,


The security provider main class is configured through a java security file:

-Djava.security.properties=custom-security.file



Where we set:

security.provider.1=my.security.provider.class



The security provider is packaged as a .jar and added to the classpath. The 
security provider code is triggered once the geode default context is 
initialized, so there is no room to take over the context before that.



Also, the configuration of the TLS handshake message extensions is part of the 
SSLSocket configuration. I’m not aware of a way to configure this through the 
context.
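To make the last point concrete, here is a minimal sketch of where the SNI value lives in the JDK API. It uses an SSLEngine so that it runs without a network connection; SSLSocket exposes the same getSSLParameters/setSSLParameters pair. The class name and the host name "site2.example.com" are only illustrative assumptions, not anything taken from the Geode code.

```java
import java.security.GeneralSecurityException;
import java.util.Collections;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Illustrative only: shows that SNI is set per engine/socket through SSLParameters.
final class SniOnParametersSketch {

  /** Returns a client-mode engine configured to send the given name as SNI. */
  static SSLEngine clientEngineWithSni(String serverName) {
    try {
      SSLContext context = SSLContext.getInstance("TLS");
      context.init(null, null, null); // JDK default key and trust managers
      SSLEngine engine = context.createSSLEngine();
      engine.setUseClientMode(true);

      // The server name is part of the per-connection parameters,
      // not of the SSLContext itself.
      SSLParameters parameters = engine.getSSLParameters();
      parameters.setServerNames(
          Collections.<SNIServerName>singletonList(new SNIHostName(serverName)));
      engine.setSSLParameters(parameters);
      return engine;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("could not create TLS engine", e);
    }
  }

  public static void main(String[] args) {
    SSLEngine engine = clientEngineWithSni("site2.example.com");
    System.out.println(engine.getSSLParameters().getServerNames());
  }
}
```

Note that SNIHostName only accepts values that are valid host names, so a truly generic string may need a different SNIServerName type or some encoding.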


BR,

Mario


From: Sai Boorlagadda
Sent: November 24, 2019, 17:33
To: dev@geode.apache.org
Subject: Re: Proposal of new config property "ssl-server-name-extension"

Hello Mario,

I would like to see if having a custom security provider allows you to
configure the default SSL context to set the SNI?

From your proposal, I see that you have implemented a Java Security
Provider to provide a custom KeyManager implementation which selects the
certificate based on which WAN site the peer client is connecting to.
How are you configuring this security provider? I am assuming you have some
bootstrapping code that inserts your security provider before launching
Geode, and that also sets the gemfire property `ssl-use-default-context` to
true to let Geode use the default SSL context. Can this bootstrapping code
create and configure an SSL context with SNI and set it as the default context
before launching Geode?

This may appear to be a workaround, but the rationale behind
`ssl-use-default-context` is to let the external environment configure the
SSL context as required and have Geode just use it.
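A minimal sketch of the kind of bootstrapping code meant here, assuming it runs in the same JVM before Geode starts. The class and method names are hypothetical; only the javax.net.ssl calls are real JDK API, and a real bootstrap would pass its own KeyManager and TrustManager arrays instead of null.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;

// Illustrative bootstrap: install a JVM-wide default SSLContext before launching Geode.
final class DefaultContextBootstrapSketch {

  /** Builds a TLS context and installs it as the JVM default. */
  static SSLContext installDefaultContext() {
    try {
      SSLContext context = SSLContext.getInstance("TLS");
      // null arguments select the JDK's default key managers, trust managers
      // and source of randomness.
      context.init(null, null, null);
      SSLContext.setDefault(context);
      return context;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("could not install default SSLContext", e);
    }
  }

  /** Returns the current JVM default context, wrapping the checked exception. */
  static SSLContext currentDefault() {
    try {
      return SSLContext.getDefault();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    SSLContext installed = installDefaultContext();
    System.out.println("default installed: " + (currentDefault() == installed));
  }
}
```

With `ssl-use-default-context` set to true, Geode would then be expected to pick up whatever context was installed this way.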

Sai

On Tue, Nov 19, 2019 at 3:27 AM Mario Ivanac  wrote:

> Hi geode dev,
>
> as a part of solution for https://issues.apache.org/jira/browse/GEODE-7414
> we would like to introduce new config property "ssl-server-name-extension".
>
> This property will contain generic string, which will be added as Server
> Name Indication (SNI) parameter to Client Hello message.
>
> Do you agree with this proposal?
>
> Thanks,
> Mario
>


Re: Proposal of new config property "ssl-server-name-extension"

2019-12-09 Thread Mario Ivanac
Hi,

Since this proposal has been open for almost three weeks,
and we have two +1 votes,

we will continue with the proposed solution.

Regards,
Mario


From: Mario Ivanac
Sent: November 19, 2019, 12:26
To: dev@geode.apache.org
Subject: Proposal of new config property "ssl-server-name-extension"

Hi geode dev,

as a part of solution for https://issues.apache.org/jira/browse/GEODE-7414
we would like to introduce new config property "ssl-server-name-extension".

This property will contain generic string, which will be added as Server Name 
Indication (SNI) parameter to Client Hello message.

Do you agree with this proposal?

Thanks,
Mario


Re: Re: Proposal of new config property "ssl-server-name-extension"

2019-12-09 Thread Mario Ivanac
Hi,

I agree with your proposal,
and the implementation will follow it.

BR,
Mario

From: Jens Deppe
Sent: December 9, 2019, 15:55
To: dev@geode.apache.org
Subject: Re: Re: Proposal of new config property "ssl-server-name-extension"

Hi Mario,

I did have a question / suggestion about this proposal (possibly on a
different thread). Would you mind responding to that before proceeding
please. I'll just paste it in here too.


> Jens Deppe 
> Tue, Nov 19, 4:42 PM
> to dev
> I'd like to add my comment from the original PR here again:
>
>
> Although I support the particular use case, I would prefer the
> implementation being a bit more abstracted. Specifically, if we provided an
> extension point which would allow modification of SSLParameters then we
> wouldn't be coupling to a particular use case. So I'm thinking that the
> user would define (via say a ssl-parameter-extension parameter) a class
> that takes in a SSLParameter and perhaps SSLConfig and whatever else for
> this use-case and does what it needs. The user class would need to
> implement an interface something like this:
>
> public interface SSLParameterExtension {
>   SSLParameter modify(SSLParameter parameter, SSLConfig config);
> }
>
> I would imagine seeing the user implementation of this being attached to
> SSLConfig.
>
>
> (https://github.com/apache/geode/pull/4310)
>
> I don't mind (mis)using the Server Name field to convey some other
> information, but since it's possible to abstract the specific nature and
> application of that information, I think we should do so. Anyone else who
> looks at the code is going to wonder what the use is especially if the
> consumer of that particular piece of info is going to be provided via an
> external SSLEngine implementation.
>
>
Thanks!
--Jens

On Mon, Dec 9, 2019 at 2:37 AM Mario Ivanac  wrote:

> Hi,
>
> Since this proposal is open for almost three weeks,
> and we have 2 plus one,
>
> We will continue with proposed solution.
>
> Regards,
> Mario
>
> 
> From: Mario Ivanac
> Sent: November 19, 2019, 12:26
> To: dev@geode.apache.org
> Subject: Proposal of new config property "ssl-server-name-extension"
>
> Hi geode dev,
>
> as a part of solution for https://issues.apache.org/jira/browse/GEODE-7414
> we would like to introduce new config property "ssl-server-name-extension".
>
> This property will contain generic string, which will be added as Server
> Name Indication (SNI) parameter to Client Hello message.
>
> Do you agree with this proposal?
>
> Thanks,
> Mario
>


Review for #4387

2019-12-16 Thread Mario Ivanac
Hi all,

please could someone review https://issues.apache.org/jira/browse/GEODE-7458
[GEODE-7458] Adding additional option in gfsh command "start gateway sender" to
control clearing of existing queues
PR https://github.com/apache/geode/pull/4387


Thanks,
Mario


Wiki access

2020-01-02 Thread Mario Ivanac
Hi All,

Happy New Year!

Can I have access to edit the Geode Wiki?
My username is "mivanac".

Thanks,
Mario


Review for #4488

2020-01-06 Thread Mario Ivanac
Hi all,

please could someone review PR https://github.com/apache/geode/pull/4488
GEODE-7582: Update initlocator list
Thanks,
Mario


RFC Introduction of SSL Parameter Extension

2020-01-07 Thread Mario Ivanac
Hi,

we have published a new RFC, "Introduction of SSL Parameter Extension":

https://cwiki.apache.org/confluence/display/GEODE/Introduction+of+SSL+Parameter+Extension


The feedback period ends at the end of the day on 13.01.2020.
If you have any questions or concerns, please add them to the RFC.

Thank you.

Regards,
Mario


Re: RFC - Logging to Standard Out

2020-01-09 Thread Mario Ivanac
+1

From: Jacob Barrett
Sent: January 8, 2020, 21:39
To: dev@geode.apache.org
Subject: RFC - Logging to Standard Out

Please see RFC for Logging to Standard Out.

https://cwiki.apache.org/confluence/display/GEODE/Logging+to+Standard+Out 


Please comment by 1/21/2020.

Thanks,
Jake



Re: ParallelGatewaySenderQueue implementation

2020-01-24 Thread Mario Ivanac
Hi geode dev,

Do you have any more information regarding bug 49060?
I think it is the cause of issue
https://issues.apache.org/jira/browse/GEODE-7441.

When the closing of the regions is put back (at stopping of the parallel GW
sender), the persistent parallel GW sender queue is restored after restart.

BR,
Mario

From: Mario Ivanac
Sent: November 11, 2019, 13:29
To: dev@geode.apache.org
Subject: ParallelGatewaySenderQueue implementation

Hi geode dev,

I am investigating the SerialGatewaySenderQueue and ParallelGatewaySenderQueue
implementations,

and I found that in the ParallelGatewaySenderQueue.close() function
the code has been deleted and only a comment is left:

// Because of bug 49060 do not close the regions of a parallel queue

My question is: where can I find more information regarding this bug?

BR,
Mario


Review for PR #4629

2020-02-13 Thread Mario Ivanac
Hi all,

please could someone review PR https://github.com/apache/geode/pull/4629
GEODE-7727: modify sender thread to detect release of connection


Thanks,
Mario


Creation of buckets for partitioned region

2020-02-14 Thread Mario Ivanac
Hi geode dev,

we have observed the following behavior when creating partitioned regions.

After a partitioned region is created, initialization of buckets takes place:


  *   only at the point when the first data is inserted into the region (buckets
are created incrementally as entries are added, up to [max buckets]),
  *   or when a "select *" query is performed against that partitioned region
(in this case all [max buckets] buckets are created at once).

Can you confirm that this is expected behavior?
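
The observation can be sketched with a toy model (this is not Geode code). It assumes that a key is mapped to a bucket by its hash code modulo the configured bucket count, and that a bucket comes into existence the first time a key maps to it; the class name, the method names and the bucket count of 113 are assumptions made for illustration only.

```java
import java.util.BitSet;

// Toy model of lazy bucket creation in a partitioned region (illustrative only).
final class LazyBucketSketch {
  private final int totalNumBuckets;
  private final BitSet created;

  LazyBucketSketch(int totalNumBuckets) {
    this.totalNumBuckets = totalNumBuckets;
    this.created = new BitSet(totalNumBuckets);
  }

  /** Assumed mapping: hash code modulo the bucket count. */
  static int bucketIdFor(Object key, int totalNumBuckets) {
    return Math.abs(key.hashCode() % totalNumBuckets);
  }

  /** A put creates only the single bucket the key maps to, if it is missing. */
  int put(Object key) {
    int bucketId = bucketIdFor(key, totalNumBuckets);
    created.set(bucketId);
    return bucketId;
  }

  /** A query over the whole region has to touch, and so create, every bucket. */
  void queryAll() {
    created.set(0, totalNumBuckets);
  }

  int createdBuckets() {
    return created.cardinality();
  }

  public static void main(String[] args) {
    LazyBucketSketch region = new LazyBucketSketch(113);
    System.out.println("after create: " + region.createdBuckets());
    region.put(5);
    region.put(118); // maps to the same bucket as key 5 when there are 113 buckets
    System.out.println("after two puts: " + region.createdBuckets());
    region.queryAll();
    System.out.println("after query: " + region.createdBuckets());
  }
}
```

In this model the number of buckets grows with the number of distinct bucket ids hit by inserted keys, not strictly with the number of entries.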

Thanks,
Mario


Re: acceptance test task failing?

2020-02-17 Thread Mario Ivanac
Hi,

but this commit was merged to develop on Saturday morning, and the tests have
been failing since Friday.

BR,
Mario

From: Robert Houghton
Sent: February 17, 2020, 16:02
To: dev@geode.apache.org
Subject: Re: acceptance test task failing?

The test has been erroring consistently on develop since
https://github.com/apache/geode/commit/1a0d9769e482f49e0c725c0d6adc75d324f88958

On Mon, Feb 17, 2020, 06:39 Alberto Bustamante Reyes
 wrote:

> Hi,
>
> After just changing some typos on a PR, I got errors on
> geode-connectors:acceptanceTest task, but the tests were working fine
> before that change. I have seen that the acceptance test task in Concourse
> has failed since 14th February (15 executions since that day), and it
> seems all of them got the same error.
>
> Is there any problem with the task or the develop branch?
>
> Thanks,
>
> Alberto B.
>
>


Question about Hash Indexes

2020-02-19 Thread Mario Ivanac
Hi geode dev,

could you help us with some questions regarding Hash Indexes:

  1.  Why are Hash Indexes deprecated?
  2.  Are there any plans in the future for their removal?

Thanks,
Mario


Re: Question about Hash Indexes

2020-02-19 Thread Mario Ivanac
Thanks for the clarification.

From: Jason Huynh
Sent: February 19, 2020, 18:05
To: geode
Subject: Re: Question about Hash Indexes

Hi Mario,

From my understanding:
1.) The Hash Index was not implemented like a traditional hash index; instead,
it is more of a memory-saving index that uses hashing to avoid storing keys.
When the backing data structure needs to expand, it needs to rehash a lot
of data and this can be detrimental at runtime.
2.) It's been deprecated so I don't see a reason why it wouldn't be removed
in the future...
2b.) I think a true traditional hash index would be good to have in the
project and maybe we would be able to un-deprecate it instead...
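
To illustrate point 1, here is a rough sketch of a key-less, open-addressed index. It is not the Geode implementation; the class name, the initial capacity of 8 and the growth rule are assumptions chosen to keep the example small. Slots hold only the values, the indexed key is re-derived from a value whenever it is needed, and every expansion has to rehash all stored values.

```java
import java.util.function.Function;

// Illustrative key-less hash index: stores values only and re-derives keys on demand.
final class KeylessHashIndexSketch<V> {
  private Object[] slots = new Object[8];
  private int size;
  private long rehashedEntries; // how many values were moved by expansions
  private final Function<V, Object> keyOf;

  KeylessHashIndexSketch(Function<V, Object> keyOf) {
    this.keyOf = keyOf;
  }

  void add(V value) {
    // Keep the table at most half full so that probing always finds a free slot.
    if ((size + 1) * 2 > slots.length) {
      expand();
    }
    insert(slots, value);
    size++;
  }

  /** Returns the first stored value whose derived key equals the given key, or null. */
  @SuppressWarnings("unchecked")
  V findFirst(Object key) {
    int i = indexFor(key, slots.length);
    while (slots[i] != null) {
      V candidate = (V) slots[i];
      if (keyOf.apply(candidate).equals(key)) {
        return candidate;
      }
      i = (i + 1) % slots.length;
    }
    return null;
  }

  /** Doubling the table means every stored value is rehashed: this is the runtime cost. */
  @SuppressWarnings("unchecked")
  private void expand() {
    Object[] old = slots;
    slots = new Object[old.length * 2];
    for (Object o : old) {
      if (o != null) {
        insert(slots, (V) o);
        rehashedEntries++;
      }
    }
  }

  private void insert(Object[] table, V value) {
    int i = indexFor(keyOf.apply(value), table.length);
    while (table[i] != null) {
      i = (i + 1) % table.length; // linear probing
    }
    table[i] = value;
  }

  private static int indexFor(Object key, int length) {
    return (key.hashCode() & 0x7fffffff) % length;
  }

  long rehashedEntries() {
    return rehashedEntries;
  }

  int capacity() {
    return slots.length;
  }

  public static void main(String[] args) {
    KeylessHashIndexSketch<String> index = new KeylessHashIndexSketch<>(s -> s);
    for (int i = 0; i < 100; i++) {
      index.add("v" + i);
    }
    System.out.println("capacity=" + index.capacity() + " rehashed=" + index.rehashedEntries());
  }
}
```

The memory saving comes from not keeping a separate key per entry; the price is that an expansion touches every value and has to recompute its key.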

Thanks,
-Jason

On Wed, Feb 19, 2020 at 1:36 AM Mario Ivanac  wrote:

> Hi geode dev,
>
> could you help us with some questions regarding Hash Indexes:
>
>   1.  Why are Hash Indexes deprecated?
>   2.  Are there any plans in the future for their removal?
>
> Thanks,
> Mario
>


PR reviews

2020-02-25 Thread Mario Ivanac
Hi geode dev,

please could someone review PRs:


  *   https://github.com/apache/geode/pull/4711

  *   https://github.com/apache/geode/pull/4719

Thanks,
Mario


Update of SSLParameterExtension interface

2020-05-04 Thread Mario Ivanac
Hi all,

after comments that the SSLParameterExtension interface has an init() method
that takes a DistributionConfig as an argument (which is an internal class),
a new solution is proposed that replaces DistributionConfig with Properties.
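
As a rough illustration of the idea (not the actual Geode interface, whose method names and signatures may differ), an extension initialised from plain java.util.Properties could look like the sketch below. The interface name, the property key "example.site-name" and the implementing class are all hypothetical.

```java
import java.util.Collections;
import java.util.Properties;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLParameters;

// Hypothetical shape of the extension point; it depends only on public JDK types.
interface ParameterExtensionSketch {
  void init(Properties properties);

  SSLParameters modifyClientParameters(SSLParameters parameters);
}

// Example implementation that reads a site name from the properties and sends it as SNI.
final class SniFromPropertiesExtension implements ParameterExtensionSketch {
  static final String SITE_PROPERTY = "example.site-name"; // hypothetical property key

  private String siteName;

  @Override
  public void init(Properties properties) {
    siteName = properties.getProperty(SITE_PROPERTY);
  }

  @Override
  public SSLParameters modifyClientParameters(SSLParameters parameters) {
    if (siteName != null) {
      parameters.setServerNames(
          Collections.<SNIServerName>singletonList(new SNIHostName(siteName)));
    }
    return parameters;
  }

  public static void main(String[] args) {
    Properties properties = new Properties();
    properties.setProperty(SITE_PROPERTY, "site2.example.com");
    SniFromPropertiesExtension extension = new SniFromPropertiesExtension();
    extension.init(properties);
    System.out.println(extension.modifyClientParameters(new SSLParameters()).getServerNames());
  }
}
```

Because the implementation sees only Properties and SSLParameters, user code would not need to compile against any internal class.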


A new PR with the new proposal has been created: https://github.com/apache/geode/pull/5040,
and the RFC has also been updated with the new proposal:
https://cwiki.apache.org/confluence/display/GEODE/Introduction+of+SSL+Parameter+Extension

Should we cherry-pick this PR into the support/1.12 branch?

Regards,
Mario


Re: About Geode rolling downgrade

2020-06-05 Thread Mario Ivanac
Hi all,

just a reminder that Alberto is still waiting for feedback,
regarding his question.

BR,
Mario

From: Alberto Gomez
Sent: May 14, 2020, 14:45
To: geode
Subject: Re: About Geode rolling downgrade

Hi,

A friendly reminder to the community about this request for feedback.

Thanks,

-Alberto G.

From: Alberto Gomez 
Sent: Thursday, May 7, 2020 10:44 AM
To: geode 
Subject: Re: About Geode rolling downgrade

Hi again,


Considering Geode does not support online rollback for the time being, and since
we need to be able to roll back even a standalone system, we were thinking of a
procedure to downgrade a Geode cluster that tolerates downtime, but without the
need to:

  *   spin another cluster to sync from,
  *   do a restore or
  *   import data snapshot.



The procedure we came up with is:

  1.  First step - downgrade locators:

     a.  While still on the newer version, export cluster configuration.
     b.  Shutdown all locators. Existing clients will continue using their
         server connections. New clients/connections are not possible.
     c.  Start new locators using the old SW version and import cluster
         configuration. They will form a new cluster. Existing client
         connections should still work, but new client connections are not yet
         possible (no servers connected to locators).

  2.  Second step - downgrade servers:

     a.  First shutdown all servers in parallel. This marks the beginning of
         total downtime.
     b.  Now start all servers in parallel but still on the new software
         version. Servers connect to the cluster formed by the downgraded
         locators. When servers are up, downtime ends. New client connections
         are possible. The rest of the rollback should be fully online.
     c.  Now per server:

           i.  Shutdown it, revoke its disk-stores and delete its file system.
          ii.  Start server using old SW version. When up, server will take
               over cluster configuration and pick up replicated data and
               partitioned regions buckets satisfying region redundancy
               (essentially will hold exactly the same data previous server
               had).



The above has some important prerequisites:

  1.  Partitioned regions have redundancy and region configuration allows 
recovery as described above.
  2.  Clients version allows connection to new and old clusters - i.e. clients 
must not use newer version at the moment the procedure starts.
  3.  Geode guarantees cluster configuration exported from newer system can be 
imported into older system. In case of incompatibility I expect we could even 
manually edit the configuration to adapt it to the older system but it is a 
question how new servers will react when they connect (in step 2b).
  4.  Geode guarantees communication between peers with different SW version 
works and recovery of region data works.



Could we have opinions on this offline procedure? It seems to work well but 
probably has caveats we do not see at the moment.



What about prerequisites 3 and 4? They are valid in the upgrade case, but we
are not sure whether they hold in this rollback case.


Best regards,


-Alberto G.


From: Anilkumar Gingade 
Sent: Thursday, April 23, 2020 12:59 AM
To: geode 
Subject: Re: About Geode rolling downgrade

That's right, most/always no down-time requirement is managed by having
replicated cluster setups (Disaster-recovery/backup site). The data is
either pushed to both systems through the data ingesters or by using WAN
setup.
The clusters are upgraded one at a time. If there is a failure during
upgrade or needs to be rolled back; one system will be always up
and running.

-Anil.





On Wed, Apr 22, 2020 at 1:51 PM Anthony Baker  wrote:

> Anil, let me see if I understand your perspective by stating it this way:
>
> In cases where 100% uptime is a requirement, users are almost always
> running a disaster recovery site.  It could be active/active or
> active/standby but there are already at least 2 clusters with current
> copies of the data.  If an upgrade goes badly, the clusters can be
> downgraded one at a time without loss of availability.  This is because we
> ensure compatibility across the wan protocol.
>
> Is that correct?
>
>
> Anthony
>
>
>
> > On Apr 22, 2020, at 10:43 AM, Anilkumar Gingade wrote:
> >
> >>> Rolling downgrade is a pretty important requirement for our customers
> >>> I'd love to hear what others think about whether this feature is worth
> > the overhead of making sure downgrades can always work.
> >
> > I/We haven't seen users/customers requesting rolling downgrade as a
> > critical requirement for them; most of the time they had both an old and
> > new setup to upgrade or switch back to an older setup.
> > Considering the amount of work involved, and code complexity it brings in;
> > while there are ways to downgrade, it is hard to j

Re: Problem in rolling upgrade since 1.12

2020-06-08 Thread Mario Ivanac
Hi,

as far as I can see, in the 1.12 release there were major changes in the
membership part of the code with:

"Move the membership code into a separate gradle sub-project"

https://issues.apache.org/jira/browse/GEODE-6883

It may be worth checking whether this is connected to the reported bug.

BR,
Mario

From: Alberto Gomez
Sent: June 8, 2020, 9:29
To: dev@geode.apache.org
Subject: Re: Problem in rolling upgrade since 1.12

Hi,

I attach a diff for the modified test case in case you would like to use it to 
check the problem I mentioned.

BR,

Alberto

From: Alberto Gomez 
Sent: Saturday, June 6, 2020 4:06 PM
To: dev@geode.apache.org 
Subject: Problem in rolling upgrade since 1.12

Hi,

I have observed that since version 1.12 rolling upgrades to future versions 
leave the first upgraded locator "as if" it was still on version 1.12.

This is the output from "list members" before starting the upgrade from version 
1.12:

Name | Id
---- | ---------------------------------------------------
vm2  | 192.168.0.37(vm2:29367:locator):41001 [Coordinator]
vm0  | 192.168.0.37(vm0:29260):41002
vm1  | 192.168.0.37(vm1:29296):41003


And this is the output from "list members" after upgrading the first locator 
from 1.12 to 1.13/1.14:

Name | Id
---- | -------------------------------------------------------------------------
vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:810):41002(version:GEODE 1.12.0)
vm1  | 192.168.0.37(vm1:849):41003(version:GEODE 1.12.0)


Finally this is the output in gfsh once the rolling upgrade has been completed 
(locators and servers upgraded):

Name | Id
---- | -------------------------------------------------------------------------
vm2  | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:2457):41002
vm1  | 192.168.0.37(vm1:2576):41003

I verified this by running manual tests and also by running the following 
upgrade test (had to stop it in the middle to connect via gfsh and get the gfsh 
outputs):
RollingUpgradeRollServersOnPartitionedRegion_dataserializable.testRollServersOnPartitionedRegion_dataserializable

After the rolling upgrade, the shutdown command fails with the following error:
Member 192.168.0.37(vm2:1453:locator):41001 could not be found.  Please 
verify the member name or ID and try again.

The only way I have found to come out of the situation is by restarting the 
locator.
Once restarted again, the output of gfsh shows that all members are upgraded to 
the new version, i.e. the locator does not show anymore that it is on version 
GEODE 1.12.0.

Does anybody have any clue why this is happening?

Thanks in advance,

/Alberto G.


Re: [INFO] Latest test run of 200 DistributedTestOpenJDK8 passes

2020-07-14 Thread Mario Ivanac
Hi,

after adding additional checks to the failing test, I can now see that the
tests are failing because some batches are distributed while the GW sender is
being stopped.
Because of that, I suspect that this problem existed prior to this PR, but this
PR is the first to introduce a test that checks for it.

I will continue to investigate this fault, but I cannot reproduce it locally,
which is slowing down troubleshooting.

BR,
Mario

From: Alexander Murmann
Sent: July 14, 2020, 1:11
To: Alexander Murmann
Cc: dev@geode.apache.org; Mario Ivanac
Subject: Re: [INFO] Latest test run of 200 DistributedTestOpenJDK8 passes

We continue to see these WAN tests adding a fail rate of just below 30% in
our mass test runs
<https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/create-mass-test-run-report/builds/10>
.

That's a very significant fail rate that impacts our ability to get our
code committed with confidence.

Can we resolve this issue? Otherwise, I think we need to consider reverting
GEODE-7458.

On Fri, Jun 19, 2020 at 3:28 PM Alexander Murmann 
wrote:

> Looking more into this, it looks like this was introduced by the changes
> for GEODE-7458 - "Adding additional option in gfsh command "start gateway
> sender" to control clearing of existing queues".
>
> That happened about a month ago, but it's inherent to those flaky tests
> that we discover them only after a while. Nonetheless, they become paper
> cuts that ultimately slow us down substantially if they persist.
>
> @Mario Ivanac If I am correct and GEODE-7458 introduced this you were the
> one making that change. Might you be able to take a look at making that
> test more reliable or reverting the change?
>
> Thank you!
>
> On Fri, Jun 19, 2020 at 7:57 AM Alexander Murmann 
> wrote:
>
>> Thank you so much for sharing this, Mark!
>>
>> It looks like there is a big cluster around WAN Gateway. Is anyone
>> already looking into the WAN issues?
>>
>> On Thu, Jun 18, 2020 at 10:06 PM Mark Hanson  wrote:
>>
>>> FYI, the build success rate was around 90% or so about two months ago.
>>>
>>> Here are the DUnit tests that are currently failing in our tests, most
>>> likely in CI, and PR pipelines.
>>>
>>> Please let me know if you have any questions.
>>>
>>> Thanks,
>>> Mark
>>>
>>>
>>>
>>> ***
>>>
>>>  Overall build success rate: 78.0% (156 of 200)
>>>
>>>
>>> ***
>>>
>>>
>>>
>>> The following test methods see failures in more than one class.  There
>>> may be a failing *TestBase class
>>>
>>>
>>>
>>> *.testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived:
>>> 12 failures :
>>>
>>>   SerialWANPersistenceEnabledGatewaySenderDUnitTest:  8 failures
>>> (96.000% success rate)
>>>
>>>   SerialWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:  4 failures
>>> (98.000% success rate)
>>>
>>>
>>>
>>> *.testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived:
>>> 12 failures :
>>>
>>>   ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:  5
>>> failures (97.500% success rate)
>>>
>>>   ParallelWANPersistenceEnabledGatewaySenderDUnitTest:  7 failures
>>> (96.500% success rate)
>>>
>>>
>>>
>>> *.testPingWrongServer:  4 failures :
>>>
>>>   ClientServerMiscSelectorDUnitTest:  3 failures (98.500% success rate)
>>>
>>>   ClientServerMiscDUnitTest:  1 failures (99.500% success rate)
>>>
>>>
>>>
>>>
>>> ***
>>>
>>>
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.wan.serial.SerialWANPersistenceEnabledGatewaySenderDUnitTest:
>>> 8 failures (96.000% success rate)
>>>
>>>
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3539
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithClea

Re: Re: [INFO] Latest test run of 200 DistributedTestOpenJDK8 passes

2020-10-23 Thread Mario Ivanac
Hi,

I am not aware that the tests (GEODE-8172
<https://issues.apache.org/jira/browse/GEODE-8172>, GEODE-8216
<https://issues.apache.org/jira/browse/GEODE-8216>) are still failing after
the last modification of the tests (October 7).

BR,
Mario

From: Alexander Murmann
Sent: October 23, 2020, 22:25
To: dev@geode.apache.org
Cc: Mario Ivanac
Subject: Re: Re: [INFO] Latest test run of 200 DistributedTestOpenJDK8 passes

The failing WAN-related tests are still making up the majority of flaky 
tests<https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/create-mass-test-run-report/builds/40>
 we are observing. It's a real issue, since the noise from flaky tests can 
easily hide very real bugs. That's why earlier this year several people spent a 
few months improving the reliability of our tests.

Mario, in July you said you were investigating this. Is that still ongoing? Do 
you need any help?

From: Alexander Murmann 
Sent: Tuesday, July 14, 2020 09:00
To: dev@geode.apache.org 
Cc: Alexander Murmann 
Subject: Re: Odg: [INFO] Latest test run of 200 DistributedTestOpenJDK8 passes

Thanks for looking into this, Mario!

You are probably right, that the underlying issue might have been
pre-existing and that the test is surfacing it. I am glad though that you
are investigating, because a close to 30% fail rate is a problem. Something
like this happens every once in a while and then someone has to do some
work to resolve historical problems that they hadn't planned to address.

Thanks!

On Tue, Jul 14, 2020 at 1:06 AM Mario Ivanac  wrote:

> Hi,
>
> after adding additional checks in failing test, now I can see that test
> are failing due to fault that some batch are distributed at stopping of GW
> sender.
> Cause of that, I suspect that this problem existed prior to this PR, but
> this PR is first to introduce test to check this.
>
> I will continue to investigate this fault, but I can not locally reproduce
> this fault, so this is slowing troubleshooting.
>
> BR,
> Mario
> 
> From: Alexander Murmann
> Sent: July 14, 2020, 1:11
> To: Alexander Murmann
> Cc: dev@geode.apache.org; Mario Ivanac
> Subject: Re: [INFO] Latest test run of 200 DistributedTestOpenJDK8 passes
>
> We continue to see these WAN tests adding a fail rate of just below 30% in
> our mass test runs
> <https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/create-mass-test-run-report/builds/10>
> .
>
> That's a very significant fail rate that impacts our ability to get our
> code committed with confidence.
>
> Can we resolve this issue? Otherwise, I think we need to consider reverting
> GEODE-7458.
>
> On Fri, Jun 19, 2020 at 3:28 PM Alexander Murmann 
> wrote:
>
> > Looking more into this, it looks like this was introduced by the changes
> > for GEODE-7458 - "Adding additional option in gfsh command "start gateway
> > sender" to control clearing of existing queues".
> >
> > That happened about a month ago, but it's inherent to those flaky tests
> > that we discover them only after a while. Nonetheless, they become paper
> > cuts that ultimately slow us down substantially if they persist.
> >
> > @Mario Ivanac If I am correct and GEODE-7458 introduced this you were the
> > one making that change. Might you be able to take a look at making that
> > test more reliable or reverting the change?
> >
> > Thank you!
> >
> > On Fri, Jun 19, 2020 at 7:57 AM Alexander Murmann 
> > wrote:
> >
> >> Thank you so much for sharing this, Mark!
> >>
> >> It looks like there is a big cluster around WAN Gateway. Is anyone
> >> already looking into the WAN issues?
> >>
> >> On Thu, Jun 18, 2020 at 10:06 PM Mark Hanson wrote:
> >>
> >>> FYI, the build success rate was around 90% or so about two months ago.
> >>>
> >>> Here are the DUnit tests that are currently failing in our tests, most
> >>> likely in CI, and PR pipelines.
> >>>
> >>> Please let me know if you have any questions.
> >>>
> >>> Thanks,
> >>> Mark
> >>>
> >>>
> >>>
> >>>
> **

[Discussion] RFC for Introduction of new command "alter gateway-sender"

2020-11-03 Thread Mario Ivanac
Hi all,

An RFC for the introduction of the new command "alter gateway-sender" has been
submitted: Introduction of new command "alter gateway-sender".

Please review it and comment by Nov 16.

Regards,
Mario


Re: [DISCUSS] CODEOWNERS mechanism feedback

2021-03-25 Thread Mario Ivanac
Hi all,

just a reminder on this topic.

Could we expand the list of CODEOWNERS in some areas that are bottlenecks?
For example, I have 2 PRs that have been waiting for review from the WAN area
for several weeks:

https://github.com/apache/geode/pull/6013
https://github.com/apache/geode/pull/6048

BR,
Mario

From: Anthony Baker
Sent: March 17, 2021, 18:08
To: dev@geode.apache.org
Subject: Re: [DISCUSS] CODEOWNERS mechanism feedback

I think for an active, healthy project community we need to balance two things 
that are somewhat oppositional:  make it easy to contribute and ensure the 
project meets the needs of our users for stability and robust behaviors in the 
face of failures.

> On Mar 17, 2021, at 9:45 AM, Jacob Barrett  wrote:
>
>



> My opinion, I think we need some metric by which to quantify the positive 
> impact against the negative impact otherwise the drag this process has on 
> progress may be discouraging to people joining or staying with this community.
>
> -Jake
>



Geode retry/acknowledge improvement

2021-04-19 Thread Mario Ivanac
Hi all,

we have deployed a Geode cluster in a Kubernetes environment, with Istio sidecars 
injected between cluster members.
While traffic is running, if any Istio sidecar is restarted, a thread can get stuck 
indefinitely while waiting for a reply to a message it has sent.
It seems that when the proxy restarts, messages are lost in some cases, and 
the sending side waits indefinitely for the reply.

https://issues.apache.org/jira/browse/GEODE-9075

My question is: what is your estimate of how much effort would be needed to 
implement message retry/acknowledge logic in Geode 
to solve this problem?

BR,
Mario


Re: Geode retry/acknowledge improvement

2021-04-20 Thread Mario Ivanac
Hi,

after analysis, we assume that the proxy sends an ACK at the TCP level when it 
receives the packets, and is restarted after that moment.
This is why we don't see TCP retries.

A similar problem (though not packet loss) can be reproduced on Geode 
if a TCP reset is received on an existing connection after a request has been sent. In 
that case, on reception of the reset, 
the connection is closed and the thread gets stuck waiting for the reply.
I will add reproduction steps to the ticket.


From: Anthony Baker 
Sent: April 19, 2021 22:54
To: dev@geode.apache.org 
Subject: Re: Geode retry/acknowledge improvement

Do you have a tcpdump that demonstrates the packet loss? How long did you wait 
for TCP to retry the failed packet delivery (sometimes this can be tweaked with 
tcp_retries2).  Does this manifest as a failed socket connection in geode?  
That ought to trigger some error handling IIRC.

Anthony
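
For context on the tcp_retries2 question above, the following is a minimal 
Java sketch of how the Linux retransmission back-off adds up. It assumes the 
usual kernel constants (200 ms minimum and 120 s maximum retransmission 
timeout) and ignores the measured round-trip time, so the result is only a 
lower bound. The class and method names are made up for illustration.

```java
import java.time.Duration;

/** Rough estimate of how long Linux keeps retransmitting before giving up. */
final class TcpRetryEstimate {
    // Assumed Linux defaults, not values read from a running system.
    static final long RTO_MIN_MS = 200;      // TCP_RTO_MIN
    static final long RTO_MAX_MS = 120_000;  // TCP_RTO_MAX

    /** Sums the exponential back-off intervals for a given tcp_retries2 value. */
    static Duration hypotheticalTimeout(int tcpRetries2) {
        long totalMs = 0;
        long rto = RTO_MIN_MS;
        // The first transmission and each of the tcpRetries2 retransmissions
        // wait one RTO; the RTO doubles each time until it reaches the cap.
        for (int i = 0; i <= tcpRetries2; i++) {
            totalMs += rto;
            rto = Math.min(rto * 2, RTO_MAX_MS);
        }
        return Duration.ofMillis(totalMs);
    }

    public static void main(String[] args) {
        int retries = args.length > 0 ? Integer.parseInt(args[0]) : 15;
        System.out.println(hypotheticalTimeout(retries).toMillis() + " ms");
    }
}
```

With the Linux default of tcp_retries2=15 the sum comes to 924.6 seconds, so a 
silently dropped peer can take about 15 minutes to surface as a socket error. 
This would not apply when a proxy has already ACKed the data at the TCP level.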





Re: Geode retry/acknowledge improvement

2021-04-30 Thread Mario Ivanac
Hi,

just a reminder on this topic.

BR,
Mario




Re: Re: Geode retry/acknowledge improvement

2021-05-05 Thread Mario Ivanac
Hi,

I think we are hitting the problem that Darrel suspected, and that some kind of 
notification could be sent peer-to-peer to acknowledge that a message has been 
received on the receiving side.

Regarding the test with iptables, execution gets stuck with conserve-sockets set 
to either false or true.

BR,
Mario

From: Darrel Schneider 
Sent: April 30, 2021 18:38
To: dev@geode.apache.org 
Subject: Re: Re: Geode retry/acknowledge improvement

In the geode hang you describe would the forced tcp-reset using iptables have 
caused the put send message to fail with an exception writing it to the socket? 
If so then I'd expect the geode Connection class to keep trying to send that 
message by creating a new connection to the member. It will keep doing this 
until the send is successful or the member leaves the cluster.

But if the tcp-reset allows the send to complete, without actually sending the 
request to the other member, then geode will be in trouble and will wait 
forever for a reply. Once geode successfully writes a p2p message on a socket, 
it expects it to be processed on the other side OR it expects the other side to 
leave the geode cluster. If neither of these happens then it will wait forever 
for a response. I've wondered in the past if this was a safe expectation. If 
not then do we need to send some type of msg id and after waiting for a reply 
for too long be able to check with the member to see if it has received the 
message we think we already sent?
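
The msg id idea in the paragraph above could look roughly like the following 
Java sketch. It is not Geode code: ReplyTracker, Outcome and the 
peerHasMessage probe are hypothetical names, and the probe stands in for 
whatever request would ask the other member whether it has seen a given 
message id.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongPredicate;

/** Hypothetical sketch: track replies by message id and probe the peer on timeout. */
final class ReplyTracker {
    enum Outcome { REPLIED, STILL_PROCESSING, LOST }

    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    /** Registers a message that is about to be written and returns its id. */
    long register() {
        long id = nextId.incrementAndGet();
        pending.put(id, new CompletableFuture<>());
        return id;
    }

    /** Called by the reader thread when the reply for the given id arrives. */
    void onReply(long id) {
        CompletableFuture<Void> reply = pending.remove(id);
        if (reply != null) {
            reply.complete(null);
        }
    }

    /**
     * Waits up to timeoutMs for the reply. On timeout, asks the peer (through the
     * hypothetical probe) whether it ever received the message, instead of
     * waiting forever.
     */
    Outcome await(long id, long timeoutMs, LongPredicate peerHasMessage) {
        CompletableFuture<Void> reply = pending.get(id);
        if (reply == null) {
            return Outcome.REPLIED; // no longer pending: the reply was already handled
        }
        try {
            reply.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.REPLIED;
        } catch (TimeoutException e) {
            return peerHasMessage.test(id) ? Outcome.STILL_PROCESSING : Outcome.LOST;
        } catch (ExecutionException e) {
            throw new IllegalStateException("reply for message " + id + " failed", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for message " + id, e);
        }
    }
}
```

A caller would call register() before writing to the socket and then 
await(id, timeout, probe). On LOST it could resend over a new connection 
rather than block indefinitely.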

You might see different behavior with your iptables test if you use 
conserve-sockets=false. In that case the socket used to write the p2p message 
is also used to read the response. But in the default conserve-sockets=true 
case, the reply comes on a different socket than the one used to send the 
message. It might be hard to get the thread doing the put for gfsh to use 
conserve-sockets=false. You could try just setting that on your server and the 
stuck thread stack should look different from what you are currently seeing.
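
As a small illustration of the setting discussed above, this Java sketch 
builds member properties with conserve-sockets turned off. Only the property 
name and its values come from the thread. The class name is hypothetical, and 
how the properties reach the server (gemfire.properties, start-up options or 
similar) is left as an assumption.

```java
import java.util.Properties;

/** Hypothetical helper: member properties with conserve-sockets disabled. */
final class ConserveSocketsConfig {
    static Properties serverProperties() {
        Properties props = new Properties();
        // Per the thread, the default is true (replies arrive on a different
        // socket); false makes the sending socket also read the response.
        props.setProperty("conserve-sockets", "false");
        return props;
    }

    public static void main(String[] args) {
        // Assumption: these properties are handed to the server at start-up.
        serverProperties().list(System.out);
    }
}
```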

From: Anthony Baker 
Sent: Friday, April 30, 2021 8:43 AM
To: dev@geode.apache.org 
Subject: Re: Re: Geode retry/acknowledge improvement

Can you explain the scenario further?  Does the sidecar proxy both the sending 
and receiving socket (geode creates 2 sockets for each p2p member)?  In normal 
cases, closing these sockets should clear up any unacknowledged messages, 
freeing up the thread.

Anthony





Re: Re: Geode retry/acknowledge improvement

2021-05-05 Thread Mario Ivanac
I think that this is enough.

From: Alberto Gomez 
Sent: May 5, 2021 11:29
To: dev@geode.apache.org 
Subject: Re: Re: Geode retry/acknowledge improvement

You could reply to their latest e-mail to confirm that what Darrel suspected 
can happen. Let's see whether, in that case, they are willing to collaborate.

Alberto

RFC Improve data inconsistency in replicated regions

2022-02-02 Thread Mario Ivanac
Hi geode dev,

we have published a new RFC, "Improve data inconsistency in replicated regions":

https://cwiki.apache.org/confluence/display/GEODE/Improve+data+inconsistency+in+replicated+regions

If you have any questions or concerns, please add them to the RFC.

Thank you.

Regards,
Mario


RFC Improve possible duplicate logic for secondary buckets in WAN

2022-02-25 Thread Mario Ivanac
Hi geode dev,

we have published a new RFC, "Improve possible duplicate logic for secondary 
buckets in WAN":

https://cwiki.apache.org/confluence/display/GEODE/Improve+possible+duplicate+logic+for+secondary+buckets+in+WAN

If you have any questions or concerns, please add them to the RFC.

Thank you.

Regards,
Mario



Pending review from some code owners for PR linked to GEODE-9642 and GEODE-9809

2022-03-14 Thread Mario Ivanac
Hi,

The following PRs:

https://github.com/apache/geode/pull/6909
GEODE-9642: Wait for colocation completed at partitioned region initialization 
by mivanac · Pull Request #6909 · 
apache/geode
We have observed, that adding parallel GW sender to existing (already 
initialized) partitioned regions is hanging. In case command alter-region is 
executed (attaching GW sender to initialized regio...

https://github.com/apache/geode/pull/7113
GEODE-9809: stop monitoring of destroyed regions by mivanac · Pull Request 
#7113 · apache/geode



have received approval from several code owners, but some code owners' reviews 
are still pending.

Could those who have not yet reviewed them please have a look?

Thanks in advance,
Mario


Review for #7381 and #7347

2022-03-28 Thread Mario Ivanac
Hi all,

could someone please review:

https://issues.apache.org/jira/browse/GEODE-9484

PR: https://github.com/apache/geode/pull/7381

and

https://issues.apache.org/jira/browse/GEODE-10020

PR: https://github.com/apache/geode/pull/7347



Thanks,
Mario


Re: Review for #7381 and #7347

2022-04-06 Thread Mario Ivanac
Hi all,

just a reminder: could someone please review:

https://issues.apache.org/jira/browse/GEODE-9484

PR: https://github.com/apache/geode/pull/7381

and

https://issues.apache.org/jira/browse/GEODE-10020

PRs: https://github.com/apache/geode/pull/7515
https://github.com/apache/geode/pull/7517



Thanks,
Mario



Re: Review for #7381 and #7347

2022-04-24 Thread Mario Ivanac
Hi all,

just a reminder: could someone please review:

https://issues.apache.org/jira/browse/GEODE-9484

PR: https://github.com/apache/geode/pull/7381

and

https://issues.apache.org/jira/browse/GEODE-10020

PRs: https://github.com/apache/geode/pull/7515
https://github.com/apache/geode/pull/7517



Thanks,
Mario



[PROPOSAL] RFC Introduce monitoring of async writer thread

2022-05-05 Thread Mario Ivanac
Hi,

I'd appreciate some more feedback on the published RFC "Introduce 
monitoring of async writer thread":

https://cwiki.apache.org/confluence/display/GEODE/Introduce+monitoring+of+async+writer+thread

Thanks in advance.
Mario


Pending review from some code owners for PRs

2022-05-20 Thread Mario Ivanac
Hi,

The following PRs:

https://github.com/apache/geode/pull/7323

https://github.com/apache/geode/pull/7667

https://github.com/apache/geode/pull/7515

https://github.com/apache/geode/pull/7664

have received approvals from several code owners, but some code owners' reviews 
are still pending, or the requested changes have not yet been re-reviewed.

Could those who have not yet reviewed them please have a look?

Thanks in advance,
Mario


Re: Pending review from some code owners for PRs

2022-05-24 Thread Mario Ivanac
Hi,

A friendly reminder about the pending code owner reviews for these PRs.

Thanks,
Mario



Pending PR reviews

2022-06-28 Thread Mario Ivanac
Hi,

The following PRs:

https://github.com/apache/geode/pull/7323

https://github.com/apache/geode/pull/7749

https://github.com/apache/geode/pull/7664

have been waiting for review for some time.


Could code owners review these PRs?

Thanks,
Mario