Re: [DISCUSS] One more 1.13 change

2020-09-29 Thread Raymond Ingles
+1

On 9/28/20, 3:21 PM, "Dan Smith"  wrote:

Hi,

I'd like to backport this change to support/1.13 as well

GEODE-8522: Switching exception log back to debug - 
https://github.com/apache/geode/pull/5566

This cleans up some noise in our logs that customers might see.

GEODE-8522: Switching exception log back to debug (merge to 1.13) by 
upthewaterspout · Pull Request #5566 · 
apache/geode
This log message happens during the course of normal startup of multiple 
locators. We should not be logging a full stack trace during normal startup. 
(cherry picked from commit 3df057c) Thank you f...
github.com




Re: [DISCUSS] One more 1.13 change

2020-09-29 Thread Dan Smith
Thanks all, I merged the change.

-Dan

From: Raymond Ingles 
Sent: Tuesday, September 29, 2020 7:56 AM
To: dev@geode.apache.org 
Subject: Re: [DISCUSS] One more 1.13 change

+1

On 9/28/20, 3:21 PM, "Dan Smith"  wrote:

Hi,

I'd like to backport this change to support/1.13 as well

GEODE-8522: Switching exception log back to debug - 
https://github.com/apache/geode/pull/5566

This cleans up some noise in our logs that customers might see.

GEODE-8522: Switching exception log back to debug (merge to 1.13) by 
upthewaterspout · Pull Request #5566 · 
apache/geode
This log message happens during the course of normal startup of multiple 
locators. We should not be logging a full stack trace during normal startup. 
(cherry picked from commit 3df057c) Thank you f...
github.com




[PROPOSAL] Backport GEODE-6008 to support 1.12

2020-09-29 Thread Xiaojian Zhou
Hi,

GEODE-6008 changed “java.lang.IllegalStateException: NioSslEngine has been 
closed” to IOException, which enabled DirectChannel to handle it and retry the 
connection in the case that the connection is closed.

This fix is important and carries no risk to backport to support/1.12. Please 
vote for it.
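
For readers unfamiliar with why the exception type matters, here is a minimal sketch of the pattern. It is not Geode's DirectChannel code; `Connection`, `ConnectionFactory`, and `sendWithRetry` are hypothetical names invented for illustration. The point is only that a retry loop written to catch IOException will retry a closed-connection failure reported as IOException, while an IllegalStateException passes straight through it.

```java
import java.io.IOException;

public class ClosedEngineRetrySketch {

  /** Hypothetical stand-in for a connection whose TLS engine may already be closed. */
  interface Connection {
    void write(byte[] payload) throws IOException;
  }

  /** Hypothetical factory that opens a fresh connection. */
  interface ConnectionFactory {
    Connection open() throws IOException;
  }

  /**
   * Sends the payload, reconnecting whenever an attempt fails with IOException.
   * Returns the number of attempts used. Unchecked exceptions such as
   * IllegalStateException are not caught here and abort the send immediately.
   */
  static int sendWithRetry(ConnectionFactory factory, byte[] payload, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        factory.open().write(payload);
        return attempt;
      } catch (IOException e) {
        last = e; // retryable: try again on a new connection
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    int[] opened = {0};
    // The first connection reports a closed engine as IOException; the second works.
    ConnectionFactory factory = () -> {
      int n = ++opened[0];
      return payload -> {
        if (n == 1) {
          throw new IOException("NioSslEngine has been closed");
        }
      };
    };
    System.out.println("attempts: " + sendWithRetry(factory, new byte[] {1}, 3));
  }
}
```

If the first connection threw IllegalStateException instead, the loop above would never reach its second attempt, which is the behaviour the fix is described as removing.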

Regards
Xiaojian Zhou




Re: [PROPOSAL] Backport GEODE-6008 to support 1.12

2020-09-29 Thread Barrett Oglesby
+1

From: Xiaojian Zhou 
Sent: Tuesday, September 29, 2020 3:09 PM
To: dev@geode.apache.org 
Subject: [PROPOSAL] Backport GEODE-6008 to support 1.12

Hi,

GEODE-6008 changed “java.lang.IllegalStateException: NioSslEngine has been 
closed” to IOException, which enabled DirectChannel to handle it and retry the 
connection in the case that the connection is closed.

This fix is important and carries no risk to backport to support/1.12. Please 
vote for it.

Regards
Xiaojian Zhou




RE: [PROPOSAL] Backport GEODE-6008 to support 1.12

2020-09-29 Thread Dick Cavender
+1

-----Original Message-----
From: Xiaojian Zhou  
Sent: Tuesday, September 29, 2020 3:09 PM
To: dev@geode.apache.org
Subject: [PROPOSAL] Backport GEODE-6008 to support 1.12

Hi,

GEODE-6008 changed “java.lang.IllegalStateException: NioSslEngine has been 
closed” to IOException, which enabled DirectChannel to handle it and retry the 
connection in the case that the connection is closed.

This fix is important and carries no risk to backport to support/1.12. Please 
vote for it.

Regards
Xiaojian Zhou




Re: [PROPOSAL] Backport GEODE-6008 to support 1.12

2020-09-29 Thread Bruce Schuchardt
+1

On 9/29/20, 3:10 PM, "Xiaojian Zhou"  wrote:

Hi,

GEODE-6008 changed “java.lang.IllegalStateException: NioSslEngine has been 
closed” to IOException, which enabled DirectChannel to handle it and retry the 
connection in the case that the connection is closed.

This fix is important and carries no risk to backport to support/1.12. Please 
vote for it.

Regards
Xiaojian Zhou





Re: Colocated regions missing some buckets after restart

2020-09-29 Thread Donal Evans
Hi Mario,

I've tried using 12 colocated regions, starting the servers within 0.2 seconds 
of each other (according to the locator logs) and ensuring that the order 
they're started in is the same as the order they were shut down in, but I'm 
still unable to reproduce this issue. Is there anything else that I might be 
missing or doing differently from you?

Donal

From: Mario Kevo 
Sent: Monday, September 28, 2020 10:49 PM
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Hi Donal,

Sometimes you need to restart two or three times, but mostly the issue is 
reproduced on the first restart.
start locator --name=locator1 --port=10334
start locator --name=locator2 --port=10335 --locators=localhost[10334]
start server --name=server1 --locators=127.0.0.1[10334],127.0.0.1[10335] --server-port=40404
start server --name=server2 --locators=127.0.0.1[10334],127.0.0.1[10335] --server-port=40405
I'm putting 1 entries, but you can use a lower value.

You need to be really quick with the commands. Here is an example from my 
locator log:
[info 2020/09/29 07:41:52.060 CEST  
tid=0x1d] Received a join request from 192.168.0.145(server4:22852):41002
[info 2020/09/29 07:41:52.406 CEST  
tid=0x1d] Received a join request from 192.168.0.145(server3:22879):41003

I prepare the start server commands in two terminals, so I can start them at 
almost the same time.
Sorry, I forgot to mention that you need to note which server was stopped first 
and start that one first (the issue was first reproduced on Kubernetes, and that 
is how the pods restart servers).
Also, if you are not able to reproduce the issue, try using 10 or more 
colocated regions.
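
If switching between terminals by hand is not fast enough, one option is to release both start commands from a shared latch. The sketch below only shows that timing technique and is not taken from anyone's actual setup. `startTogether` is a hypothetical helper, and the tasks are placeholders that print a line; in a real run each would launch a gfsh "start server" command (for example through ProcessBuilder, which is an assumption and not shown).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SimultaneousStartSketch {

  /**
   * Runs every task on its own thread and releases them all together.
   * Returns the System.nanoTime() value at which each task began, in task order.
   */
  static List<Long> startTogether(List<Runnable> tasks) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
    CountDownLatch ready = new CountDownLatch(tasks.size());
    CountDownLatch go = new CountDownLatch(1);
    try {
      List<Future<Long>> pending = new ArrayList<>();
      for (Runnable task : tasks) {
        pending.add(pool.submit(() -> {
          ready.countDown(); // this worker thread is up
          go.await();        // block until every worker is up
          long startedAt = System.nanoTime();
          task.run();
          return startedAt;
        }));
      }
      ready.await();
      go.countDown();        // release all workers at once
      List<Long> startTimes = new ArrayList<>();
      for (Future<Long> f : pending) {
        startTimes.add(f.get());
      }
      return startTimes;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Placeholder tasks standing in for the two "start server" launches.
    List<Runnable> tasks = new ArrayList<>();
    tasks.add(() -> System.out.println("starting server1"));
    tasks.add(() -> System.out.println("starting server2"));
    List<Long> times = startTogether(tasks);
    System.out.println("gap (ms): " + Math.abs(times.get(0) - times.get(1)) / 1_000_000.0);
  }
}
```

The gap printed at the end is only the gap between the two threads being released; process launch time comes on top of that, so the join requests in the locator log remain the real measure.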

BR,
Mario


From: Donal Evans 
Sent: September 28, 2020, 23:48
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Hi Mario,

I tried to reproduce the issue using the steps you describe, but I wasn't able 
to. After restarting the servers, all regions have the expected 113 buckets, 
and the server startup process is not noticeably slower. I have a few questions 
that might help understand why I'm unable to reproduce this:

  *   Do you see this behaviour 100% of the time with these steps, or is it 
still only on some restarts that it shows up?
  *   Could you describe in more detail how exactly you're starting the 
locators/servers? I'm just using the gfsh "start locator" and "start server" 
commands, only specifying ports, with no other settings, so if you're doing 
anything different that may be a factor.
  *   How many entries are you putting into the region, and does the issue 
still reproduce if you use fewer entries? I'm using 1 entries as described 
in your earlier email.
  *   How quick do you have to be when restarting the servers in the two 
terminals at the same time? I'm currently just manually clicking between them 
and executing the two start server commands within a second of each other, but 
if that's not fast enough then I should probably be using a script or something.

Hopefully if we can understand what's different between what I'm doing and what 
you're doing then it will help us understand exactly what's going wrong.

- Donal

From: Mario Kevo 
Sent: Monday, September 28, 2020 6:23 AM
To: dev@geode.apache.org 
Subject: Re: Colocated regions missing some buckets after restart

Hi all,

After more investigation I found that for some buckets there is a problem 
determining which server is primary.
While doing getPrimary, if the existing primary is null, it waits for a new 
primary and after some time returns null.

From what I found, while doing setHosting 
(grabBucket[PartitionedRegionDataStore.java]->grabFreeBucket[PartitionedRegionDataStore.java]->setHosting[ProxyBucketRegion.java]->setHosting[BucketAdvisor.java]) 
it volunteers for primary and calls sendProfileUpdate to notify all other servers.
There it calls BucketProfileUpdateMessage.send and gets stuck, as it 
cannot get a response from the other members.
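
To make the symptom concrete, here is a small sketch of a bounded wait that yields null when no primary is ever announced. It is an illustration of the behaviour described above, not Geode's BucketAdvisor code; `PrimaryHolder`, `primaryElected`, and the timeout values are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PrimaryWaitSketch {

  /** Hypothetical holder for one bucket's primary; not Geode's BucketAdvisor. */
  static final class PrimaryHolder {
    private final CompletableFuture<String> primary = new CompletableFuture<>();

    /** Called once some member has been confirmed as primary. */
    void primaryElected(String memberId) {
      primary.complete(memberId);
    }

    /** Waits up to the timeout for a primary; returns null if none was elected. */
    String getPrimary(long timeout, TimeUnit unit) throws InterruptedException {
      try {
        return primary.get(timeout, unit);
      } catch (TimeoutException e) {
        return null; // nobody became primary in time
      } catch (ExecutionException e) {
        throw new IllegalStateException(e.getCause());
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // The election never completes, as when the profile update gets no replies.
    PrimaryHolder stuck = new PrimaryHolder();
    System.out.println("stuck bucket primary: " + stuck.getPrimary(100, TimeUnit.MILLISECONDS));

    PrimaryHolder healthy = new PrimaryHolder();
    healthy.primaryElected("server1");
    System.out.println("healthy bucket primary: " + healthy.getPrimary(100, TimeUnit.MILLISECONDS));
  }
}
```

In this picture the null is a consequence rather than the cause: the caller times out because the member volunteering for primary is itself blocked waiting on replies.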

A ticket has been opened in the GEODE JIRA: 
https://issues.apache.org/jira/browse/GEODE-8546
How to reproduce the issue:

  1.   Start two locators and two servers.
  2.   Create a PARTITION_REDUNDANT_PERSISTENT region with redundant-copies=1.
  3.   Create a few PARTITION_REDUNDANT regions (I used six) colocated 
with the persistent region, also with redundant-copies=1.
  4.   Put some entries.
  5.   Restart the servers (you can simply run "kill -15 " and then 
start both of them at the same time from two terminals).
  6.   Server startup will take some time to finish, and for the last 
region bucketCount will be zero on one member.

If someone with more experience with