Re: Inputs for efficient querying

ankit Soni Tue, 09 Feb 2021 10:23:44 -0800

Thanks Dan for your input. I am able to try this at my end and it's working
as expected.


As a next steps I need to support somewhat complex queries, so updated a
ValueObject, like

public class ValueObject implements PdxSerializable {
    private static final long serialVersionUID = -754645299372860596L;
    private int versionId;   //1 for latest record; 0 for previous latest
    private String date;
    private String col_1;
    private String col_2;
    private String col_3;
    private String type;
    private Map<String, String> map;

    public ValueObject() {
    }

*Region*<String, ValueObject> : *Key:* random-string and *Value:*
ValueObject

Need to support queries that fetches columns for *latest record (whose
versionId is Max)* with filters and aggregation like
"SELECT date, col_1, col_2, col_3    <-----------Must be fetched from a row
where MAX(versionId)
     FROM /data-region d
     WHERE d.type='t1'
     AND d.date BETWEEN '2021-01-11' AND '2021-01-12'
    AND d.vesionId BETWEEN 0 AND 1
    AND d.col_1 IN SET ('111111', '222222')
    GROUP BY d.col_1"

*Team, Kindly guide on following,*
1. How can i form an OQL query (syntax) to fetch the latest row based on
MAX(versionId).
2. It seems *BETWEEN* support is not available, how can this be achieved.
3. What should be the* recommended index creation here*, for this query to
gain fast performance.
4. Any recommendation for Key, currently it's a random string.

Any suggestions on above will be really helpful.

Thank you
Ankit.

On Fri, 29 Jan 2021 at 23:23, Dan Smith <dasm...@vmware.com> wrote:

> For the best performance, you should store column2 as a java Map instead
> of a String which contains a json document. If column2 was Map<String,
> String>, you could do a query like this:
>
>
> SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10', 'v15',
> 'v7')"
>
> You can create an index on the map to optimize this sort of query
>
> gfsh>create index --name="IndexName" --expression="r.column2[*]"
> --region="/exampleRegion r"
>
> This page might be helpful
>
>
> https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html
>
> In addition, I noticed that your value implements Serializable. You will
> get better performance out of the query engine if you configure PDX
> serialization for your object, either by configuring the auto serializer or
> implementing PdxSerializable. That avoids the need to deserialize your
> entire value on the server to query/index it.
>
> -Dan
>
>
> ________________________________
> From: ankit Soni <ankit.soni.ge...@gmail.com>
> Sent: Friday, January 29, 2021 9:32 AM
> To: dev@geode.apache.org <dev@geode.apache.org>
> Subject: Inputs for efficient querying
>
> Hello Team,
>
> I am loading data into Geode (V 1.12) with the following *Key (of type
> String)* and *value (custom java object - ValueObject)*.
>
> *public class ValueObject implements Serializable {*
> *     private int id;*
>
> *     private String keyColumn;   <--------- Region.Key *
>
> *     private String column_2;     <---------- Json document*
> *     private String column_3;*
>
> *     private String column_4*
>
>
>
> *     //few more string type members*
> *}*
>
> *Keycolum* is a normal string of around 8 chars, like "12345678",
> "23456789" etc...
>
> *In ValueObject, column_2 is of type string and having a values of type
> valid JSON doc as bellow; *
> {"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" : "v7"*, "k12"
> : "v12", "k13" : "v13"}
> {"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" : "v10"*, "k13"
> : "v13", "k14" : "v14"}
> {"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7", "k12" :
> "v12", "k13" : "v13", "k14" : "v14"}
> .....
>
> after storing the data in Geode i need to run following two queries.
> *Query to be supported.*
>
>
> *Q1. //query with filter on keyColumn*
> "select d.keyColumn, d.column_2, d.column_3, d.column_4
> from /DATA_REGION.keyset key
> where (key IN SET('12345678', '23456789', '34567890'))"
>
>
> *Q2. //query with filter on column_2 attribute, something like "where
> d.column_2.k10 IN SET('v10', 'v15', 'v7'); *
> "select d.keyColumn, d.column_2, d.column_3, d.column_4
> from /DATA_REGION v
> where v.column_2.k10 INSET('v10', 'v15', 'v7')"
>
> I am able to run the Q1 but not sure *how to achieve Q2 (form a OQL for
> this case)*...?
>
> Request team to help, how can i efficiently form and execute above kind of
> queries with geode OQL...?
>
> Also advise, what kind of index are recommended to get higher query
> performance for above queries...?
>
> Thanks
> Ankit.
>

Re: Inputs for efficient querying

Reply via email to