Re: Requests taking too long if one member of the cluster fails

2020-11-23 Thread Mario Salazar de Torres
Hi @John Blum,

I am grateful for your explanation. Really, thanks! It has been an instructive 
read.
Finally, I understood why allowing writes on all the replicas ends up risking
consistency. Consequently, I understood that, as in life itself, with
distributed databases you can't have everything (C, A and P).

So yes, we'll have to tune the parametrization in our cluster setup so that
the time during which requests fail (the ones falling into the buckets for
which the sick server is the primary owner) is reduced.
And we are tuning the client parameters so that requests that are going to
fail do so quickly, allowing the ones that are not supposed to fail to enter
the processing queue straight away.

I'll follow up on how it goes once all the tests are executed 🙂

BR,
Mario.

From: John Blum 
Sent: Monday, November 23, 2020 3:42 AM
To: dev@geode.apache.org 
Cc: miguel.g.gar...@ericsson.com 
Subject: Re: Requests taking too long if one member of the cluster fails

Hi Mario-


1) Regarding why only write to the primary (bucket) of a PR (?)... again, it 
has to do with consistency.

Fundamentally, a distributed system is constrained by CAP.  The system can 
either be consistent or available in the face of network partitions.  You 
cannot have your cake and eat it too, 😉.

By design, Apache Geode favors consistency over availability.  However, it 
doesn't mean Geode becomes unavailable when a node or nodes, or the network, 
fails. With the ability to configure redundant copies, it is more like "limited 
availability" when a member or portion of the cluster is severed from the rest 
of the system, until the member(s) or network recovers.
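
That redundancy is declared when the region is created. For example, a minimal
gfsh sketch (the region name and copy count here are illustrative only):

    gfsh> create region --name=ExampleRegion --type=PARTITION_REDUNDANT --redundant-copies=1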

But, to guarantee consistency, a single node (i.e. the "primary") hosting the 
PR must be "the source of truth".  If writes are allowed to go to secondaries, 
then you need a sophisticated consensus algorithm (e.g. Paxos, Raft) to resolve 
conflicts when 2 or more writes in different copies change the same logical 
object but differ in value.  Therefore, writes go to the primary and are then 
distributed to the secondaries (which require an ACK) while holding a lock.

If you think about this in object-oriented terms, the safest object in a highly 
concurrent system is an immutable one.  However, if an object can be modified 
by multiple Threads, then it is safer if all Threads access the object through
the same control plane to uphold the invariants of the object.

NOTE: For an object, serializing access through synchronization does increase 
contention.  However, keep in mind that a PR does not just have 1 primary.  
Each bucket of the PR (defaults to 113; this is tunable) has a primary, thereby
reducing contention on writes.
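
The bucket count is fixed when the region is created, e.g. a gfsh sketch (the
region name and count here are illustrative only):

    gfsh> create region --name=ExampleRegion --type=PARTITION --total-num-buckets=227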

Finally, Geode's consistency guarantees are much more sophisticated than what I
described above. You can read more about Geode's consistency here [1] (an
entire chapter has been dedicated to this very important topic).



2) Regarding member-timeout...

Can this setting be too low?  Yes, of course; you must be careful.

Setting too low of a member-timeout could result in the system thrashing 
between the member being kicked out and the member rejoining the system.

This is costly because, after a member is kicked out, the system must "restore 
redundancy".  When the member rejoins, a "fence & merge" process occurs, then 
the system may need to "rebalance" the data.

Why would a node bounce between being a member (part of the system) and
getting kicked out?

Well, it depends on your infrastructure, for one.  If you have an unreliable
network (more applicable in cloud environments in certain cases), then minor
but frequent network blips that sever 1 or more members could cause the
member(s) to bounce between being kicked out and rejoining.  If enough members
are severed from the system, then the system might need to decide on a quorum.

If a member is sick (e.g. running low on memory), thereby seeming unresponsive
when in fact it is just overloaded, this can also cause issues.

There are many factors to consider when configuring Geode.  Don't simply set a
property thinking it just solved your immediate problem when in fact it might
have shifted the problem somewhere else.

The setting for member-timeout may very well be what you need, or you may need
to consider other factors (e.g. the size of your system, both the number of
nodes and the size of the data, the level of redundancy, the collocated data
you mention (also a factor), the environment, etc.).

This is the trickiest part of using any system like Geode.  You typically must
tune it properly to your use case and requirements over several iterations to
meet your SLAs.
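
For reference, member-timeout is a standard cluster property set alongside the
rest of your configuration; a minimal sketch in gemfire.properties (the value
shown is simply the documented default, in milliseconds):

    # gemfire.properties
    member-timeout=5000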

This chapter [2] in the User Guide will be y

Re: Requests taking too long if one member of the cluster fails

2020-11-23 Thread Jacob Barrett
On the native side of things, I would suggest trying the same test with a Java
client and comparing. It is very possible the C++ client is lacking in its
ability to respond to failures as quickly as the more heavily used Java client.

-Jake

On Nov 23, 2020, at 3:54 AM, Mario Salazar de Torres
<mario.salazar.de.tor...@est.tech> wrote:

Hi @John Blum,

I am grateful for your explanation. Really, thanks! It has been an instructive
read.

Re: Geode - store and query JSON documents

2020-11-23 Thread Xiaojian Zhou
Ankit:
 
Geode provides Lucene queries on JSON fields. Your query can be supported.
https://gemfire.docs.pivotal.io/910/geode/tools_modules/lucene_integration.html

However, the above document does not provide a query example on a JSON object.

I can give you some sample code to query on JSON.

Regards
Xiaojian Zhou

On 11/22/20, 11:53 AM, "ankit Soni"  wrote:

Hello geode-devs, please provide guidance on this.

Ankit.

On Sat, 21 Nov 2020 at 10:23, ankit Soni  wrote:

> Hello team,
>
> I am *evaluating usage of Geode (1.12) with storing JSON documents and
> querying the same*. I am able to store the JSON records successfully in
> Geode but seeking guidance on how to query them.



Re: Geode - store and query JSON documents

2020-11-23 Thread Anilkumar Gingade
Gester, looking at the sample query, I believe Ankit is asking about an OQL
query, not Lucene...

-Anil.


On 11/23/20, 9:02 AM, "Xiaojian Zhou"  wrote:

Ankit:

Geode provides Lucene queries on JSON fields. Your query can be supported.
https://gemfire.docs.pivotal.io/910/geode/tools_modules/lucene_integration.html

However, the above document does not provide a query example on a JSON object.

I can give you some sample code to query on JSON.

Regards
Xiaojian Zhou




Re: Geode - store and query JSON documents

2020-11-23 Thread Xiaojian Zhou
Anil:

The syntax is OQL. But I understand they want to query the JSON object based on
the criteria.

On 11/23/20, 9:08 AM, "Anilkumar Gingade"  wrote:

Gester, looking at the sample query, I believe Ankit is asking about an OQL
query, not Lucene...

-Anil.





Re: [PROPOSAL] Change the default value of conserve-sockets to false

2020-11-23 Thread Xiaojian Zhou
Passing dunit tests is not enough. It might only mean we don't have enough test
coverage.

We need to inspect the code to see what the behavior will be when 2 servers are
configured with different conserve-sockets values.

On 11/20/20, 3:30 PM, "Donal Evans"  wrote:

Regarding behaviour during rolling upgrade: I created a draft PR with this
change to test the feasibility and see what problems, if any, would be caused
by tests assuming the default setting to be true. After fixing two DUnit tests 
that were not explicitly setting the value of conserve-sockets to true, no test 
failures were observed. I also ran a large suite of proprietary tests that 
include rolling upgrade and observed no problems there. This doesn't mean that 
there would definitely be no problems caused by this change, but I can at least 
say that none of the testing we currently have showed any problems.

From: Anthony Baker 
Sent: Friday, November 20, 2020 8:52 AM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

Question:  how would this work with a rolling upgrade?  If the user did not 
set this property and we changed the default I believe that we would prevent 
the upgraded member from rejoining the cluster.

Of course the user could explicitly set this property as you point out.


Anthony


> On Nov 20, 2020, at 8:49 AM, Donal Evans  wrote:
>
> While I agree that the potential impact of having the setting changed out
from under a user may be high, the cost of addressing that change is very
small. All users have to do is explicitly set the conserve-sockets value to
true if they were previously using the default, and they will be back to where
they started with no change in behaviour or resource requirements. This could
be as simple as adding a single line to a properties file (see the example
below), which seems like a pretty small inconvenience.
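
For example (assuming the standard gemfire.properties mechanism; the file
location varies by deployment):

    # gemfire.properties
    conserve-sockets=true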
>
>
> 
> From: Anthony Baker 
> Sent: Thursday, November 19, 2020 5:57:33 PM
> To: dev@geode.apache.org 
> Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false
>
> I think there are many good reasons to flip the default value for this 
property. I do question whether requiring a user to allocate new hardware to 
support the changed resource requirements is appropriate for a minor version 
bump. In most cases I think that would come as an unwelcome surprise during the 
upgrade.
>
> Anthony
>
>> On Nov 19, 2020, at 10:42 AM, Dan Smith  wrote:
>>
>> Personally, this has caused enough grief in the past (both ways,
actually!) that I'd say this is a major version change.
>> I agree with John. Either value of conserve-sockets can crash or hang 
your system depending on your use case.
>>
>> If this was just a matter of slowing down or speeding up performance, I 
think we could change it. But users that are impacted won't just see their 
system slow down. It will crash or hang. Potentially only with production sized 
workloads.
>>
>> With conserve-sockets=false every thread on the server creates its own 
sockets to other servers. With N servers that's N sockets per thread. With our 
default of a max of 800 threads for client connections and a 20 server cluster 
you are looking at a worst case of 800 * 20 = 16K sending sockets per server, 
with another 16K receiving sockets and 16K receiving threads. That's before 
considering function execution threads, WAN receivers, and various other 
executors we have on the server. Users with too many threads will hit their 
file descriptor or thread limits. Or they will run out of memory for thread 
stacks, socket buffers, etc.
>>
>> -Dan
>>
>




Re: Requests taking too long if one member of the cluster fails

2020-11-23 Thread Anthony Baker
Yes, lowering the member timeout is one approach I’ve seen taken for 
applications that demand ultra low latency.  These workloads need to provide 
not just low “average” or even p99 latency, but put a hard limit on the max 
value.

When you do this you need to ensure coherency across all aspects of timeouts
(e.g. client read timeouts and retries).  You need to ensure that GC pauses
don't cause instability in the cluster.  For example, if a GC pause is greater
than the member timeout, you should go back and re-tune your heap settings to
drive down GC.  If you are running in a container or VM you need to ensure
sufficient resources so that the GemFire process is never paused.

All this presupposes a stable and performant network infrastructure.
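
On the Java client, for instance, those read timeouts and retries are pool
settings, set when the client cache is created. A minimal sketch (the locator
address and values are illustrative only):

    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;

    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator("localhost", 10334)  // illustrative locator address
        .setPoolReadTimeout(2000)            // ms before a client operation times out
        .setPoolRetryAttempts(1)             // bound retries, and thus worst-case latency
        .create();

Whatever values you pick, keep them coherent with the server-side
member-timeout, so a client does not give up before (or long after) the
cluster has resolved the failure.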

Anthony


On Nov 21, 2020, at 1:40 PM, Mario Salazar de Torres
<mario.salazar.de.tor...@est.tech> wrote:

So, what I've tried here is to set a really low member-timeout, which results
in the server holding the secondary copy becoming the primary owner in around
<600ms. That's quite a huge improvement, but I wanted to ask you if setting
this member-timeout too low might carry unforeseen consequences.



Re: Geode - store and query JSON documents

2020-11-23 Thread ankit Soni
Hi,
I am looking for any means of querying (OQL/Lucene/API, etc.) this stored
data: first to achieve the functionality, and second, to do so in a
performant way.

I shared the OQL-like syntax to convey my use case easily, based on some
references found in the docs. I am OK if a Lucene query or some other way can
fetch the results.

It will be of great help if you share the sample query/code for fetching this
data.

Thanks
Ankit.


On Mon, 23 Nov 2020 at 22:43, Xiaojian Zhou  wrote:

> Anil:
>
> The syntax is OQL. But I understand they want to query the JSON object based
> on the criteria.

Re: Geode - store and query JSON documents

2020-11-23 Thread Xiaojian Zhou
Ankit:

Anil can provide you some sample code for an OQL query on JSON.

I will find some Lucene sample code on JSON for you.
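
In the meantime, this is the rough shape it will take (a sketch only: the
index must be created on the servers before the region, and the field names
for this nested JSON structure are assumptions; deeply nested fields may need
a custom LuceneSerializer):

    import java.util.Collection;

    import org.apache.geode.cache.lucene.LuceneQuery;
    import org.apache.geode.cache.lucene.LuceneService;
    import org.apache.geode.cache.lucene.LuceneServiceProvider;
    import org.apache.geode.pdx.PdxInstance;

    // On each server, before creating REGION_NAME:
    LuceneService luceneService = LuceneServiceProvider.get(cache);
    luceneService.createIndexFactory()
        .addField("k11")                      // assumed field name
        .create("jsonIndex", "REGION_NAME");

    // Running the query (executes on the servers):
    LuceneQuery<String, PdxInstance> query = luceneService.createLuceneQueryFactory()
        .create("jsonIndex", "REGION_NAME", "k11:aaa", "k11");
    Collection<PdxInstance> results = query.findValues();  // throws LuceneQueryException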

Regards
Xiaojian

On 11/23/20, 9:27 AM, "ankit Soni"  wrote:

    Hi,
    I am looking for any means of querying (OQL/Lucene/API, etc.) this stored
    data: first to achieve the functionality, and second, to do so in a
    performant way.

Geode - store and query JSON documents

2020-11-23 Thread ankit Soni
 Hello geode-dev,

I am *evaluating usage of Geode (1.12) for storing JSON documents and
querying the same*. I am able to store the JSON records successfully in
Geode but am seeking guidance on how to query them.
More details on the code and sample JSON follow.


*Sample client-code*

import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.geode.pdx.JSONFormatter;
import org.apache.geode.pdx.PdxInstance;

public class MyTest {

    // NOTE: Below is a truncated JSON; a single JSON document can contain at
    // most an array of col1...col30 (30 different attributes) within data.
    // The value of "k13" was lost in the mail archive; a numeric placeholder
    // is assumed here so the document parses.
    public final static String jsonDoc_2 = "{" +
            "\"data\":[{" +
                "\"col1\": {" +
                    "\"k11\": \"aaa\"," +
                    "\"k12\": true," +
                    "\"k13\": 123," +
                    "\"k14\": \"2020-12-31:00:00:00\"" +
                "}," +
                "\"col2\":[{" +
                    "\"k21\": \"22\"," +
                    "\"k22\": true" +
                "}]" +
            "}]" +
        "}";

    // NOTE: col1...col30 are a mix of JSONObject ({}) and JSONArray ([]) as
    // shown above in jsonDoc_2.

    // Placeholders; substitute your own locator address and region name.
    static final String LOCATOR_HOST = "localhost";
    static final int PORT = 10334;
    static final String REGION_NAME = "REGION_NAME";

    public static void main(String[] args) {

        // create client-cache
        ClientCache cache = new
                ClientCacheFactory().addPoolLocator(LOCATOR_HOST, PORT).create();
        Region<String, PdxInstance> region = cache
                .<String, PdxInstance>createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
                .create(REGION_NAME);

        // store json document
        region.put("key", JSONFormatter.fromJSON(jsonDoc_2));

        // How to query the json document, e.g.,

        // 1. select col2.k21, col1, col20 from /REGION_NAME where
        //    data.col2.k21 = '22' OR data.col2.k21 = '33'

        // 2. select col2.k21, col1.k11, col1 from /REGION_NAME where
        //    data.col1.k11 in ('aaa', 'xxx', 'yyy')
    }
}

*Server: Region-creation*

gfsh> create region --name=REGION_NAME --type=PARTITION
--redundant-copies=1 --total-num-buckets=61


*Setup: Distributed cluster of 3 nodes*

*My Observations/Problems*
- Put operation takes excessive time: region.put("key",
JSONFormatter.fromJSON(jsonDoc_2)); fetching a single record from a file and
storing it in Geode takes approx. 3 secs.
   Are there any suggestions/configurations related to the JSONFormatter API
or otherwise to optimize this...?

*Looking forward to guidance on querying this JSON for the above sample
queries.*

*Thanks*
*Ankit.*



Re: Geode - store and query JSON documents

2020-11-23 Thread Anilkumar Gingade
Ankit,

Here is how you can query your JSON object.

String queryStr = "SELECT d.col1 FROM /JsonRegion v, v.data d where d.col1.k11 
= 'aaa'";

As replied earlier, the data is stored as a PdxInstance type in the cache. In
the PdxInstance, the data is stored as a top-level or nested collection of
objects/values based on the input JSON object structure.
The query engine queries on the PdxInstance type and returns the value.

To see how the PdxInstance data looks in the cache, you can print the
returned value from querying the region values:
E.g.:
 String queryStr = "SELECT v FROM /JsonRegion v";
 SelectResults results = (SelectResults)
cache.getQueryService().newQuery(queryStr).execute();
 Object[] value = results.asList().toArray();
 System.out.println(" Projected value: " + value[0]);

You can find sample queries on different types of objects (collections, etc.) at:
https://geode.apache.org/docs/guide/18/getting_started/querying_quick_reference.html

Also, in order to determine where the time is getting spent, can you separate
out object creation through JSONFormatter from the put operation?
E.g.:
// Time taken to format:
PdxInstance pdxInstance = JSONFormatter.fromJSON(jsonDoc_2);
// Time taken to add to cache:
region.put("1", pdxInstance);

And measure the time separately. It will help to see whether the time is spent
in getting the PdxInstance or in doing the puts. Also, can you measure the
time as an average?
E.g. say, time measured for puts 1000 to 2000, and the average time for those
puts.
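
A minimal way to take both measurements (a sketch; it assumes the client
cache, region, and jsonDoc_2 from your earlier sample):

    long t0 = System.nanoTime();
    PdxInstance pdxInstance = JSONFormatter.fromJSON(jsonDoc_2);
    long t1 = System.nanoTime();
    region.put("key", pdxInstance);
    long t2 = System.nanoTime();
    System.out.println("fromJSON: " + (t1 - t0) / 1_000_000 + " ms,"
        + " put: " + (t2 - t1) / 1_000_000 + " ms");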

-Anil.


On 11/23/20, 11:27 AM, "ankit Soni"  wrote:

    Hello geode-dev,

    I am *evaluating usage of Geode (1.12) for storing JSON documents and
    querying the same*. I am able to store the JSON records successfully in
    Geode but am seeking guidance on how to query them.



Re: [PROPOSAL] Change the default value of conserve-sockets to false

2020-11-23 Thread Anthony Baker
Udo, you're correct that individual servers can set the property independently.
I was assuming this is more like the `security-manager` property and others
that require all cluster members to be in agreement.

I'm not sure I understand the use case for allowing this setting to be
per-member. That makes it pretty challenging to reason about what is happening
in a cluster when doing root cause analysis. There is even an API to change
this value dynamically:
https://geode.apache.org/docs/guide/12/managing/monitor_tune/performance_controls_controlling_socket_use.html

...but I've only seen that used to make function threads/sockets follow the
correct setting.

Anthony


On Nov 20, 2020, at 11:23 AM, Udo Kohlmeyer <u...@vmware.com> wrote:

@Anthony I cannot think of a single reason why the server should not start up,
even in a rolling upgrade. This setting should not have an effect on the
cluster (other than a potentially positive one). Also, if Geode were to enforce
this setting across the cluster, then we would have seriously broken our
"shared nothing" value here...



Re: Geode - store and query JSON documents

2020-11-23 Thread ankit Soni
Hi Anil,

Thanks a lot for your reply. This really helps me proceed. The query you
shared worked, but I need a slight variation of it, i.e., a where clause
containing col2 (data.col2.k21 = '22'), which is an array, unlike col1 (an
object).

FYI: the value as stored in the cache:
PDX[28847624, __GEMFIRE_JSON]{
data=[PDX[28847624, __GEMFIRE_JSON] {
col1=PDX[28626794, __GEMFIRE_JSON] {k11=aaa, k12=true, k13=,
k14=2020-12-31T00..}
Col2=[PDX[25385544, __GEMFIRE_JSON]{k21=, k22=true}]}]}

Based on the OQL querying doc shared, I tried a few ways but had no luck
querying based on col2.
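
For reference, this is the kind of join I have been attempting, adapting your
FROM-clause pattern to the nested array (illustrative only):

    String queryStr = "SELECT c.k21 FROM /JsonRegion v, v.data d, d.col2 c"
            + " WHERE c.k21 = '22' OR c.k21 = '33'";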

It will be really helpful if you share an updated query.

Thanks
Ankit.

On Tue, Nov 24, 2020, 2:42 AM Anilkumar Gingade  wrote:

> Ankit,
>
> Here is how you can query your JSON object.
>
> String queryStr = "SELECT d.col1 FROM /JsonRegion v, v.data d where
> d.col1.k11 = 'aaa'";