For the best performance, you should store column2 as a java Map instead of a 
String which contains a json document. If column2 was Map<String, String>, you 
could do a query like this:


SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10', 'v15', 'v7')"

You can create an index on the map to optimize this sort of query

gfsh>create index --name="IndexName" --expression="r.column2[*]" 
--region="/exampleRegion r"

This page might be helpful

https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html

In addition, I noticed that your value implements Serializable. You will get 
better performance out of the query engine if you configure PDX serialization 
for your object, either by configuring the auto serializer or implementing 
PdxSerializable. That avoids the need to deserialize your entire value on the 
server to query/index it.

-Dan


________________________________
From: ankit Soni <ankit.soni.ge...@gmail.com>
Sent: Friday, January 29, 2021 9:32 AM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Inputs for efficient querying

Hello Team,

I am loading data into Geode (V 1.12) with the following *Key (of type
String)* and *value (custom java object - ValueObject)*.

*public class ValueObject implements Serializable {*
*     private int id;*

*     private String keyColumn;   <--------- Region.Key *

*     private String column_2;     <---------- Json document*
*     private String column_3;*

*     private String column_4*



*     //few more string type members*
*}*

*Keycolum* is a normal string of around 8 chars, like "12345678",
"23456789" etc...

*In ValueObject, column_2 is of type string and having a values of type
valid JSON doc as bellow; *
{"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" : "v7"*, "k12"
: "v12", "k13" : "v13"}
{"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" : "v10"*, "k13"
: "v13", "k14" : "v14"}
{"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7", "k12" :
"v12", "k13" : "v13", "k14" : "v14"}
.....

after storing the data in Geode i need to run following two queries.
*Query to be supported.*


*Q1. //query with filter on keyColumn*
"select d.keyColumn, d.column_2, d.column_3, d.column_4
from /DATA_REGION.keyset key
where (key IN SET('12345678', '23456789', '34567890'))"


*Q2. //query with filter on column_2 attribute, something like "where
d.column_2.k10 IN SET('v10', 'v15', 'v7'); *
"select d.keyColumn, d.column_2, d.column_3, d.column_4
from /DATA_REGION v
where v.column_2.k10 INSET('v10', 'v15', 'v7')"

I am able to run the Q1 but not sure *how to achieve Q2 (form a OQL for
this case)*...?

Request team to help, how can i efficiently form and execute above kind of
queries with geode OQL...?

Also advise, what kind of index are recommended to get higher query
performance for above queries...?

Thanks
Ankit.

Reply via email to