Re: Inputs for efficient querying

ankit Soni Fri, 12 Feb 2021 07:47:18 -0800

Thanks Json for your guidance, this is certainly helpful in proceeding
further ...


Ankit.

On Thu, Feb 11, 2021, 6:31 AM Jason Huynh <[email protected]> wrote:

> Hi Ankit,
>
> I haven't had time to try this out but hopefully the answers get you on
> the correct path...
>
>     > 1. How can i form an OQL query (syntax) to fetch the latest row
> based on
>     > MAX(versionId).
> 1.) maybe a nested query or use order by?
> “Select x,y,z from /data-region d where d.versionId > 0 order by
> d.versionId”
> “Select x,y,z from /data-region d where d.versionId in (select
> max(versionId) from /data-region d)”
>
>     > 2. It seems *BETWEEN* support is not available, how can this be
> achieved.
> 2.) to_date might be of use or maybe you can call a method to convert the
> date to millis and do a >, <
> “Select x,y,z from /data-region d where d.date > to_date('01/11/2021’,
> 'MM/dd/yyyy’) and d.date < to_date('01/12/2021’, 'MM/dd/yyyy’)”
>
>
>     > 3. What should be the* recommended index creation here*, for this
> query
>     > to gain fast performance.
> 3.) optimal indexes will probably require knowing more about the entire
> data set.
> Whichever column lookup that can reduce the data down to the smallest
> size, will probably be the field to create the index on
>
> Regards,
> -Jason
>
> On 2/9/21, 9:55 PM, "ankit Soni" <[email protected]> wrote:
>
>     Can some one pls guide how  functionality like "BETWEEN" operator can
> be
>     achieved using geode OQL (for Date fields).
>
>     Thanks
>     Ankit
>
>     On Tue, Feb 9, 2021, 11:53 PM ankit Soni <[email protected]>
> wrote:
>
>     > Thanks Dan for your input. I am able to try this at my end and it's
>     > working as expected.
>     >
>     > As a next steps I need to support somewhat complex queries, so
> updated a
>     > ValueObject, like
>     >
>     > public class ValueObject implements PdxSerializable {
>     >     private static final long serialVersionUID =
> -754645299372860596L;
>     >     private int versionId;   //1 for latest record; 0 for previous
> latest
>     >     private String date;
>     >     private String col_1;
>     >     private String col_2;
>     >     private String col_3;
>     >     private String type;
>     >     private Map<String, String> map;
>     >
>     >     public ValueObject() {
>     >     }
>     >
>     > *Region*<String, ValueObject> : *Key:* random-string and *Value:*
>     > ValueObject
>     >
>     > Need to support queries that fetches columns for *latest record
> (whose
>     > versionId is Max)* with filters and aggregation like
>     > "SELECT date, col_1, col_2, col_3    <-----------Must be fetched
> from a
>     > row where MAX(versionId)
>     >      FROM /data-region d
>     >      WHERE d.type='t1'
>     >      AND d.date BETWEEN '2021-01-11' AND '2021-01-12'
>     >     AND d.vesionId BETWEEN 0 AND 1
>     >     AND d.col_1 IN SET ('111111', '222222')
>     >     GROUP BY d.col_1"
>     >
>     > *Team, Kindly guide on following,*
>     > 1. How can i form an OQL query (syntax) to fetch the latest row
> based on
>     > MAX(versionId).
>     > 2. It seems *BETWEEN* support is not available, how can this be
> achieved.
>     > 3. What should be the* recommended index creation here*, for this
> query
>     > to gain fast performance.
>     > 4. Any recommendation for Key, currently it's a random string.
>     >
>     > Any suggestions on above will be really helpful.
>     >
>     > Thank you
>     > Ankit.
>     >
>     > On Fri, 29 Jan 2021 at 23:23, Dan Smith <[email protected]> wrote:
>     >
>     >> For the best performance, you should store column2 as a java Map
> instead
>     >> of a String which contains a json document. If column2 was
> Map<String,
>     >> String>, you could do a query like this:
>     >>
>     >>
>     >> SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10',
> 'v15',
>     >> 'v7')"
>     >>
>     >> You can create an index on the map to optimize this sort of query
>     >>
>     >> gfsh>create index --name="IndexName" --expression="r.column2[*]"
>     >> --region="/exampleRegion r"
>     >>
>     >> This page might be helpful
>     >>
>     >>
>     >>
> https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html
>     >>
>     >> In addition, I noticed that your value implements Serializable. You
> will
>     >> get better performance out of the query engine if you configure PDX
>     >> serialization for your object, either by configuring the auto
> serializer or
>     >> implementing PdxSerializable. That avoids the need to deserialize
> your
>     >> entire value on the server to query/index it.
>     >>
>     >> -Dan
>     >>
>     >>
>     >> ________________________________
>     >> From: ankit Soni <[email protected]>
>     >> Sent: Friday, January 29, 2021 9:32 AM
>     >> To: [email protected] <[email protected]>
>     >> Subject: Inputs for efficient querying
>     >>
>     >> Hello Team,
>     >>
>     >> I am loading data into Geode (V 1.12) with the following *Key (of
> type
>     >> String)* and *value (custom java object - ValueObject)*.
>     >>
>     >> *public class ValueObject implements Serializable {*
>     >> *     private int id;*
>     >>
>     >> *     private String keyColumn;   <--------- Region.Key *
>     >>
>     >> *     private String column_2;     <---------- Json document*
>     >> *     private String column_3;*
>     >>
>     >> *     private String column_4*
>     >>
>     >>
>     >>
>     >> *     //few more string type members*
>     >> *}*
>     >>
>     >> *Keycolum* is a normal string of around 8 chars, like "12345678",
>     >> "23456789" etc...
>     >>
>     >> *In ValueObject, column_2 is of type string and having a values of
> type
>     >> valid JSON doc as bellow; *
>     >> {"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" :
> "v7"*, "k12"
>     >> : "v12", "k13" : "v13"}
>     >> {"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" :
> "v10"*,
>     >> "k13"
>     >> : "v13", "k14" : "v14"}
>     >> {"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7",
> "k12" :
>     >> "v12", "k13" : "v13", "k14" : "v14"}
>     >> .....
>     >>
>     >> after storing the data in Geode i need to run following two queries.
>     >> *Query to be supported.*
>     >>
>     >>
>     >> *Q1. //query with filter on keyColumn*
>     >> "select d.keyColumn, d.column_2, d.column_3, d.column_4
>     >> from /DATA_REGION.keyset key
>     >> where (key IN SET('12345678', '23456789', '34567890'))"
>     >>
>     >>
>     >> *Q2. //query with filter on column_2 attribute, something like
> "where
>     >> d.column_2.k10 IN SET('v10', 'v15', 'v7'); *
>     >> "select d.keyColumn, d.column_2, d.column_3, d.column_4
>     >> from /DATA_REGION v
>     >> where v.column_2.k10 INSET('v10', 'v15', 'v7')"
>     >>
>     >> I am able to run the Q1 but not sure *how to achieve Q2 (form a OQL
> for
>     >> this case)*...?
>     >>
>     >> Request team to help, how can i efficiently form and execute above
> kind of
>     >> queries with geode OQL...?
>     >>
>     >> Also advise, what kind of index are recommended to get higher query
>     >> performance for above queries...?
>     >>
>     >> Thanks
>     >> Ankit.
>     >>
>     >
>
>

Re: Inputs for efficient querying

Reply via email to