Re: Inputs for efficient querying

ankit Soni Tue, 09 Feb 2021 21:55:43 -0800

Can some one pls guide how  functionality like "BETWEEN" operator can be
achieved using geode OQL (for Date fields).


Thanks
Ankit

On Tue, Feb 9, 2021, 11:53 PM ankit Soni <[email protected]> wrote:

> Thanks Dan for your input. I am able to try this at my end and it's
> working as expected.
>
> As a next steps I need to support somewhat complex queries, so updated a
> ValueObject, like
>
> public class ValueObject implements PdxSerializable {
>     private static final long serialVersionUID = -754645299372860596L;
>     private int versionId;   //1 for latest record; 0 for previous latest
>     private String date;
>     private String col_1;
>     private String col_2;
>     private String col_3;
>     private String type;
>     private Map<String, String> map;
>
>     public ValueObject() {
>     }
>
> *Region*<String, ValueObject> : *Key:* random-string and *Value:*
> ValueObject
>
> Need to support queries that fetches columns for *latest record (whose
> versionId is Max)* with filters and aggregation like
> "SELECT date, col_1, col_2, col_3    <-----------Must be fetched from a
> row where MAX(versionId)
>      FROM /data-region d
>      WHERE d.type='t1'
>      AND d.date BETWEEN '2021-01-11' AND '2021-01-12'
>     AND d.vesionId BETWEEN 0 AND 1
>     AND d.col_1 IN SET ('111111', '222222')
>     GROUP BY d.col_1"
>
> *Team, Kindly guide on following,*
> 1. How can i form an OQL query (syntax) to fetch the latest row based on
> MAX(versionId).
> 2. It seems *BETWEEN* support is not available, how can this be achieved.
> 3. What should be the* recommended index creation here*, for this query
> to gain fast performance.
> 4. Any recommendation for Key, currently it's a random string.
>
> Any suggestions on above will be really helpful.
>
> Thank you
> Ankit.
>
> On Fri, 29 Jan 2021 at 23:23, Dan Smith <[email protected]> wrote:
>
>> For the best performance, you should store column2 as a java Map instead
>> of a String which contains a json document. If column2 was Map<String,
>> String>, you could do a query like this:
>>
>>
>> SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10', 'v15',
>> 'v7')"
>>
>> You can create an index on the map to optimize this sort of query
>>
>> gfsh>create index --name="IndexName" --expression="r.column2[*]"
>> --region="/exampleRegion r"
>>
>> This page might be helpful
>>
>>
>> https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html
>>
>> In addition, I noticed that your value implements Serializable. You will
>> get better performance out of the query engine if you configure PDX
>> serialization for your object, either by configuring the auto serializer or
>> implementing PdxSerializable. That avoids the need to deserialize your
>> entire value on the server to query/index it.
>>
>> -Dan
>>
>>
>> ________________________________
>> From: ankit Soni <[email protected]>
>> Sent: Friday, January 29, 2021 9:32 AM
>> To: [email protected] <[email protected]>
>> Subject: Inputs for efficient querying
>>
>> Hello Team,
>>
>> I am loading data into Geode (V 1.12) with the following *Key (of type
>> String)* and *value (custom java object - ValueObject)*.
>>
>> *public class ValueObject implements Serializable {*
>> *     private int id;*
>>
>> *     private String keyColumn;   <--------- Region.Key *
>>
>> *     private String column_2;     <---------- Json document*
>> *     private String column_3;*
>>
>> *     private String column_4*
>>
>>
>>
>> *     //few more string type members*
>> *}*
>>
>> *Keycolum* is a normal string of around 8 chars, like "12345678",
>> "23456789" etc...
>>
>> *In ValueObject, column_2 is of type string and having a values of type
>> valid JSON doc as bellow; *
>> {"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" : "v7"*, "k12"
>> : "v12", "k13" : "v13"}
>> {"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" : "v10"*,
>> "k13"
>> : "v13", "k14" : "v14"}
>> {"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7", "k12" :
>> "v12", "k13" : "v13", "k14" : "v14"}
>> .....
>>
>> after storing the data in Geode i need to run following two queries.
>> *Query to be supported.*
>>
>>
>> *Q1. //query with filter on keyColumn*
>> "select d.keyColumn, d.column_2, d.column_3, d.column_4
>> from /DATA_REGION.keyset key
>> where (key IN SET('12345678', '23456789', '34567890'))"
>>
>>
>> *Q2. //query with filter on column_2 attribute, something like "where
>> d.column_2.k10 IN SET('v10', 'v15', 'v7'); *
>> "select d.keyColumn, d.column_2, d.column_3, d.column_4
>> from /DATA_REGION v
>> where v.column_2.k10 INSET('v10', 'v15', 'v7')"
>>
>> I am able to run the Q1 but not sure *how to achieve Q2 (form a OQL for
>> this case)*...?
>>
>> Request team to help, how can i efficiently form and execute above kind of
>> queries with geode OQL...?
>>
>> Also advise, what kind of index are recommended to get higher query
>> performance for above queries...?
>>
>> Thanks
>> Ankit.
>>
>

Re: Inputs for efficient querying

Reply via email to