Ankit,
Here is how you can query your JSON object.
String queryStr = "SELECT d.col1 FROM /JsonRegion v, v.data d WHERE d.col1.k11 = 'aaa'";
As replied earlier, the data is stored in the cache as a PdxInstance. Within the
PdxInstance, the data is held as top-level or nested collections of
objects/values, depending on the structure of the input JSON object.
The query engine queries the PdxInstance and returns the values.
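The same pattern should extend to the two queries asked about further down. The sketch below only builds the OQL strings; it is plain Java and has not been run against a cluster. The class and method names are hypothetical. It assumes that, as in the query above, each JSON array ("data", and also "col2") needs its own iterator in the FROM clause, it uses OQL's IN SET(...) in place of chained ORs, and it escapes literals by doubling embedded single quotes.

```java
import java.util.List;
import java.util.stream.Collectors;

public class OqlExamples {

  // OQL string literal: wrap in single quotes and double any embedded quote.
  public static String literal(String s) {
    return "'" + s.replace("'", "''") + "'";
  }

  // Renders a list of values as "IN SET('a', 'b', ...)".
  public static String inSet(List<String> values) {
    return values.stream()
        .map(OqlExamples::literal)
        .collect(Collectors.joining(", ", "IN SET(", ")"));
  }

  // "data" is a JSON array, so it gets its own iterator (d); "col2" is also
  // an array, so a second iterator (c) walks its elements. "col1" is a JSON
  // object, so its fields are reached with plain dot notation.
  static String from(String region) {
    return " FROM /" + region + " v, v.data d, d.col2 c";
  }

  // Query 2 from the question: filter on col1.k11.
  public static String queryByK11(String region, List<String> k11Values) {
    return "SELECT c.k21, d.col1.k11, d.col1" + from(region)
        + " WHERE d.col1.k11 " + inSet(k11Values);
  }

  // Query 1 from the question: filter on k21 inside any col2 element.
  public static String queryByK21(String region, List<String> k21Values) {
    return "SELECT c.k21, d.col1, d.col20" + from(region)
        + " WHERE c.k21 " + inSet(k21Values);
  }

  public static void main(String[] args) {
    System.out.println(queryByK21("JsonRegion", List.of("222222", "333333")));
    System.out.println(queryByK11("JsonRegion", List.of("aaa", "xxx", "yyy")));
  }
}
```

Each string would then go to newQuery(...).execute(). Because more than one field is projected, each result row comes back as a Struct; and because col2 is iterated, a "data" element with an empty col2 array would produce no rows.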
To see what the PdxInstance data looks like in the cache, you can print the
value returned by a query on the region values:
E.g.:
String queryStr = "SELECT v FROM /JsonRegion v";
SelectResults results = (SelectResults)
cache.getQueryService().newQuery(queryStr).execute();
Object[] value = results.asList().toArray();
System.out.println("#### Projected value: " + value[0]);
You can find sample queries on different types of objects (collections, etc.) at:
https://geode.apache.org/docs/guide/18/getting_started/querying_quick_reference.html
Also, in order to determine where the time is being spent, can you separate
object creation through JSONFormatter from the put operation?
E.g.:
PdxInstance pdxInstance = JSONFormatter.fromJSON(jsonDoc_2);
// Time taken to format:
region.put("1", pdxInstance);
// Time taken to add to cache:
Measure the two times separately. This will show whether the time is spent
creating the PdxInstance or doing the puts. Also, can you measure the average
time?
E.g., measure the puts from 1000 to 2000 and compute the average time for those puts.
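A minimal sketch of such a measurement, in plain Java with no Geode dependency: the format and put steps are passed in as functions, each step is timed separately with System.nanoTime, and the average is taken over runs 1000 to 1999 so that early warm-up runs are excluded. PutTiming, time and avgMicros are hypothetical names; the HashMap and String::trim in main are stand-ins (assumptions) for the region and JSONFormatter.fromJSON.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.IntFunction;

public class PutTiming {

  // Runs n iterations; for each, times the format step and the put step
  // separately. Returns {formatNanos, putNanos}, one entry per iteration.
  public static <V> long[][] time(int n, IntFunction<String> docFor,
      Function<String, V> format, BiConsumer<Integer, V> put) {
    long[] formatNanos = new long[n];
    long[] putNanos = new long[n];
    for (int i = 0; i < n; i++) {
      String json = docFor.apply(i);
      long t0 = System.nanoTime();
      V value = format.apply(json);   // e.g. JSONFormatter.fromJSON
      long t1 = System.nanoTime();
      put.accept(i, value);           // e.g. region.put(key, value)
      long t2 = System.nanoTime();
      formatNanos[i] = t1 - t0;
      putNanos[i] = t2 - t1;
    }
    return new long[][] {formatNanos, putNanos};
  }

  // Average of nanos[from..to) in microseconds; starting at "from" skips warm-up.
  public static double avgMicros(long[] nanos, int from, int to) {
    long sum = 0;
    for (int i = from; i < to; i++) {
      sum += nanos[i];
    }
    return sum / 1000.0 / (to - from);
  }

  public static void main(String[] args) {
    // Stand-ins (assumptions): a HashMap instead of a Geode region, and
    // String::trim instead of JSONFormatter.fromJSON.
    Map<String, String> store = new HashMap<>();
    long[][] t = time(3000, i -> "{\"data\":[{\"col1\":{\"k11\":\"aaa\"}}]}",
        String::trim, (i, v) -> store.put(String.valueOf(i), v));
    System.out.printf("avg format: %.2f us, avg put: %.2f us%n",
        avgMicros(t[0], 1000, 2000), avgMicros(t[1], 1000, 2000));
  }
}
```

Against a real client region, the call would presumably look like time(3000, i -> jsonDoc_2, JSONFormatter::fromJSON, (i, pdx) -> region.put(String.valueOf(i), pdx)), giving one key per put.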
-Anil.
On 11/23/20, 11:27 AM, "ankit Soni" wrote:
Hello geode-dev,
I am *evaluating Geode (1.12) for storing JSON documents and querying
them*. I am able to store the JSON records in Geode successfully, but I am
seeking guidance on how to query them.
More details on the code and a sample JSON document follow.
*Sample client-code*
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.geode.pdx.JSONFormatter;
import org.apache.geode.pdx.PdxInstance;

public class MyTest {
  // Placeholder connection settings (assumed values; adjust to your cluster).
  private static final String LOCATOR_HOST = "localhost";
  private static final int PORT = 10334;
  private static final String REGION_NAME = "REGION_NAME";

  // NOTE: Below is truncated JSON; a single JSON document can contain at most
  // an array of col1...col30 (30 different attributes) within "data".
  public final static String jsonDoc_2 = "{" +
      "\"data\":[{" +
        "\"col1\": {" +
          "\"k11\": \"aaa\"," +
          "\"k12\":true," +
          "\"k13\": 1111," +
          "\"k14\": \"2020-12-31:00:00:00\"" +
        "}," +
        "\"col2\":[{" +
          "\"k21\": \"222222\"," +
          "\"k22\": true" +
        "}]" +
      "}]" +
      "}";
  // NOTE: col1...col30 are a mix of JSON objects ({}) and JSON arrays ([]),
  // as shown above in jsonDoc_2.

  public static void main(String[] args) {
    // Create the client cache.
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator(LOCATOR_HOST, PORT)
        .create();
    Region<String, PdxInstance> region = cache
        .<String, PdxInstance>createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
        .create(REGION_NAME);

    // Store the JSON document.
    region.put("key", JSONFormatter.fromJSON(jsonDoc_2));

    // How to query the JSON document, e.g.:
    // 1. select col2.k21, col1, col20 from /REGION_NAME where
    //    data.col2.k21 = '222222' OR data.col2.k21 = '333333'
    // 2. select col2.k21, col1.k11, col1 from /REGION_NAME where
    //    data.col1.k11 in ('aaa', 'xxx', 'yyy')

    cache.close();
  }
}
*Server: Region-creation*
gfsh> create region --name=REGION_NAME --type=PARTITION --redundant-copies=1 --total-num-buckets=61
*Setup: Distributed cluster of 3 nodes*
*My Observations/Problems*
- The put operation takes excessive time: region.put("key",
JSONFormatter.fromJSON(jsonDoc_2));
- Fetching a single record from () a file and storing it in Geode takes
approx. . 3 secs.
Are there any suggestions or configuration settings, related to the
JSONFormatter API or otherwise, to optimize this?
*Looking forward to guidance on querying this JSON for the above sample
queries.*
*Thanks*
*Ankit.*