[ https://issues.apache.org/jira/browse/GEODE-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422528#comment-17422528 ]

Mario Ivanac edited comment on GEODE-9642 at 9/30/21, 5:14 AM:
---------------------------------------------------------------

Steps to reproduce the fault in a smaller system (four servers):

start locator --name=locator-ln --port=10332 --locators=localhost[10332] --mcast-port=0 --J=-Dgemfire.remote-locators=localhost[10331] --J=-Dgemfire.distributed-system-id=1 --J=-Dgemfire.jmx-manager-start=true --J=-Dgemfire.jmx-manager-http-port=8082 --J=-Dgemfire.jmx-manager-port=1092

configure pdx --read-serialized=true --disk-store=data

start server --name=server11 --locators=localhost[10332] --mcast-port=0 --J=-XX:+UseG1GC --J=-Xms500m --J=-Xmx500m --server-port=40011 --J=-Dgemfire.ALLOW_PERSISTENT_TRANSACTIONS=true --J=-Dgemfire.disk.recoverValuesSync=true --off-heap-memory-size=512m --J=-Dgemfire.DEFAULT_MAX_OPLOG_SIZE=10 --J=-Dgemfire.EXPIRY_THREADS=1 --J=-Dgemfire.distributed-system-id=1 --J=-Dgemfire.conserve-sockets=false

start server --name=server12 --locators=localhost[10332] --mcast-port=0 --J=-XX:+UseG1GC --J=-Xms500m --J=-Xmx500m --server-port=40012 --J=-Dgemfire.ALLOW_PERSISTENT_TRANSACTIONS=true --J=-Dgemfire.disk.recoverValuesSync=true --off-heap-memory-size=512m --J=-Dgemfire.DEFAULT_MAX_OPLOG_SIZE=10 --J=-Dgemfire.EXPIRY_THREADS=1 --J=-Dgemfire.distributed-system-id=1 --J=-Dgemfire.conserve-sockets=false

start server --name=server13 --locators=localhost[10332] --mcast-port=0 --J=-XX:+UseG1GC --J=-Xms500m --J=-Xmx500m --server-port=40013 --J=-Dgemfire.ALLOW_PERSISTENT_TRANSACTIONS=true --J=-Dgemfire.disk.recoverValuesSync=true --off-heap-memory-size=512m --J=-Dgemfire.DEFAULT_MAX_OPLOG_SIZE=10 --J=-Dgemfire.EXPIRY_THREADS=1 --J=-Dgemfire.distributed-system-id=1 --J=-Dgemfire.conserve-sockets=false

start server --name=server14 --locators=localhost[10332] --mcast-port=0 --J=-XX:+UseG1GC --J=-Xms500m --J=-Xmx500m --server-port=40014 --J=-Dgemfire.ALLOW_PERSISTENT_TRANSACTIONS=true --J=-Dgemfire.disk.recoverValuesSync=true --off-heap-memory-size=512m --J=-Dgemfire.DEFAULT_MAX_OPLOG_SIZE=10 --J=-Dgemfire.EXPIRY_THREADS=1 --J=-Dgemfire.distributed-system-id=1 --J=-Dgemfire.conserve-sockets=false

create disk-store --name=data --max-oplog-size=10 --dir=.

create region --name=/testregion --type=PARTITION_REDUNDANT_PERSISTENT --disk-store=data --total-num-buckets=13

query --query="select key,value from /testregion.entries"

create gateway-sender --id=ln --remote-distributed-system-id=2 --enable-persistence=true --disk-store-name=data --parallel=true

# After all members are up, execute:

alter region --name=/testregion --gateway-sender-id=ln

As a result, the command hangs for a few minutes. This is the fault.
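To make the pass/fail criterion of this reproduction explicit, a minimal Python sketch can run the final step through gfsh's `-e` option with a deadline and report whether it completed. The 60-second deadline and the helper names `run_with_deadline` and `alter_region_cmd` are hypothetical; the comment only says that the command hangs for a few minutes.

```python
import shutil
import subprocess
import time

# Hypothetical threshold: the report only says the command "hangs for a few
# minutes", so any deadline chosen here is an assumption.
DEFAULT_DEADLINE_S = 60.0


def run_with_deadline(cmd, timeout_s):
    """Run cmd; return ("completed" | "timed_out", elapsed seconds, exit code)."""
    start = time.monotonic()
    try:
        # On timeout, subprocess.run kills the child it started and re-raises.
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timed_out", time.monotonic() - start, None
    return "completed", time.monotonic() - start, result.returncode


def alter_region_cmd(locator="localhost[10332]", region="/testregion", sender_id="ln"):
    """Build a one-shot gfsh invocation that connects, then runs alter region."""
    return [
        "gfsh",
        "-e", f"connect --locator={locator}",
        "-e", f"alter region --name={region} --gateway-sender-id={sender_id}",
    ]


if __name__ == "__main__":
    cmd = alter_region_cmd()
    if shutil.which(cmd[0]) is None:
        print("gfsh not found on PATH; skipping")
    else:
        status, elapsed, rc = run_with_deadline(cmd, DEFAULT_DEADLINE_S)
        print(f"{status} after {elapsed:.1f}s (exit code {rc})")
```

On timeout, `subprocess.run` kills only the process it launched; if the `gfsh` launcher leaves a separate JVM behind, that process may need to be stopped by hand.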

> Adding GW sender to already initialized partitioned region is hanging in large cluster
> ---------------------------------------------------------------------------------------
>
>                 Key: GEODE-9642
>                 URL: https://issues.apache.org/jira/browse/GEODE-9642
>             Project: Geode
>          Issue Type: Bug
>          Components: regions, wan
>    Affects Versions: 1.13.0, 1.14.0
>            Reporter: Mario Ivanac
>            Assignee: Mario Ivanac
>            Priority: Major
>              Labels: needsTriage, pull-request-available
>
> We have observed that adding a parallel GW sender to existing (already
> initialized) partitioned regions hangs.
> When the alter region command is executed (attaching a GW sender to an
> initialized region), it hangs in clusters with 20 or more servers.
> Execution of the command in a cluster with 16 or fewer servers was
> successful, but if the cluster is expanded to 20 or more servers, the
> command hangs.
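The reproduction steps start four servers, while the description reports the hang only at 20 or more. A minimal Python sketch, assuming the same per-server options and continuing the naming and port pattern (`server11`/40011, `server12`/40012, ...), that prints the `start server` lines for a larger cluster; `start_server_commands` is a hypothetical helper, not part of Geode.

```python
# Options copied from the "start server" commands in the reproduction steps;
# only --name and --server-port differ between servers.
COMMON_SERVER_OPTS = (
    "--locators=localhost[10332] --mcast-port=0 "
    "--J=-XX:+UseG1GC --J=-Xms500m --J=-Xmx500m "
    "--J=-Dgemfire.ALLOW_PERSISTENT_TRANSACTIONS=true "
    "--J=-Dgemfire.disk.recoverValuesSync=true --off-heap-memory-size=512m "
    "--J=-Dgemfire.DEFAULT_MAX_OPLOG_SIZE=10 --J=-Dgemfire.EXPIRY_THREADS=1 "
    "--J=-Dgemfire.distributed-system-id=1 --J=-Dgemfire.conserve-sockets=false"
)


def start_server_commands(count, first_index=11, first_port=40011):
    """Return `count` gfsh "start server" lines (hypothetical helper)."""
    return [
        f"start server --name=server{first_index + i} "
        f"--server-port={first_port + i} {COMMON_SERVER_OPTS}"
        for i in range(count)
    ]


if __name__ == "__main__":
    # 20 servers: the cluster size at which the description reports the hang.
    for line in start_server_commands(20):
        print(line)
```

With these settings each server uses about 500 MB of heap plus 512 MB off-heap, so 20 servers need roughly 20 GB if run on one host. The printed lines could be pasted into a gfsh session or collected in a file for `run --file`.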



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
