Re: Question about INDEX_THRESHOLD_SIZE

Anilkumar Gingade Fri, 11 Mar 2022 12:11:15 -0800

Mario,

There is similar test/example added by you in QueryWithRangeIndexDUnitTest. 
testQueryWithWildcardAndIndexOnAttributeFromHashMap()
When I run that test (on develop); I see the results as expected:
*************
Command result for <query --query="<trace> select e.key, e.value from 
/exampleRegion.entrySet e where e.value.positions['SUN'] like 'somethin%'">: 
Result      : true
Limit       : 100
Rows        : 1
Query Trace : Query Executed in 85.1964 ms; indexesUsed(1):IdIndex(Results: 
10000)
************
Are you running your test with any additional change like as you are saying :
>> I was working on allowing INDEX_THRESHOLD_SIZE System property to override 
>> CompiledValue.RESULT_LIMIT.


If so , you need to look at the change and see why its impacting that way. 
If not, please let me know what change can be made in that test to reproduce 
the issue you are seeing; that will help to debug/analyze the issue.

-Anil.




On 3/11/22, 12:18 AM, "Mario Kevo" <[email protected]> wrote:

    Hi,

    It works without an index but it doesn't work with an index.
    When I revert changes, it takes INDEX_THRESHOLD_SIZE default value(100). 
And if the entry that matches the condition is not in that resultset it will 
not be printed.
    Without index:
    gfsh>query --query="<trace>SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
    Result      : true
    Limit       : 100
    Rows        : 1
    Query Trace : Query Executed in 11.502283 ms; indexesUsed(0)

    key | value
    --- | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    300 | 
{"ID":300,"indexKey":0,"pkid":"300","shortID":null,"position1":{"mktValue":1945.0,"secId":"ORCL","secIdIndexed":"ORCL","secType":null,"sharesOutstanding":1944000.0,"underlyer":null,"pid":1944,"portfolioId":300,..
    With index:
    gfsh>query --query="<trace>SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
    Result      : true
    Limit       : 100
    Rows        : 0
    Query Trace : Query Executed in 8.784831 ms; indexesUsed(1):index1(Results: 
100)
    BR,
    Mario
    ________________________________
    Šalje: Anilkumar Gingade <[email protected]>
    Poslano: 10. ožujka 2022. 23:16
    Prima: [email protected] <[email protected]>
    Predmet: Re: Question about INDEX_THRESHOLD_SIZE

    Mario,

    There are few changes happened around this area as part of GEODE-9632 fix; 
can you please revert that change and see if the query works both with and 
without index.
    Looking at the code; it seems to restrict the number index look up that 
needs to be performed; certain latency/throughput sensitive queries that or not 
expecting exact result may use this (my guess) but by default it should not be 
resulting in unexpected results.

    -Anil.


    On 3/10/22, 6:50 AM, "Mario Kevo" <[email protected]> wrote:

        Hi geode-dev,

        Some time ago I was working on allowing INDEX_THRESHOLD_SIZE System 
property to override CompiledValue.RESULT_LIMIT.
        After this change, adding this attribute will take into a count if you 
set it.
        But I need some clarification of this INDEX_THRESHOLD_SIZE attribute. 
Why is this set by default to 100?
        The main problem with this attribute is that if you want to get the 
correct result, you need to know how many entries will be in the region while 
starting servers and set it on that value or higher. Sometimes it is too hard 
to know how many entries will be in the region, so maybe better will be to set 
it by default to some higher number, something like Integer.MAX_VALUE.

        Where this attribute is used?
        It is used to get index results while doing queries.

        What is the problem?
        If we have INDEX_THRESHOLD_SIZE set to 500, and we have 1k entries it 
can happen that while doing a query it will get only 500 entries and where 
clause cannot be fulfilled and we got no results.
        Let's see it by an example!

        We have only one entry that matches the condition from the query, 
INDEX_THRESHOLD_SIZE set to 500, and 1k entries in the region.
        If we run the query without an index we got the result.
        gfsh>query --query="<trace>SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
        Result      : true
        Limit       : 100
        Rows        : 1
        Query Trace : Query Executed in 10.750238 ms; indexesUsed(0)

        key | value
        --- | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        700 | 
{"ID":700,"indexKey":0,"pkid":"700","shortID":null,"position1":{"mktValue":1945.0,"secId":"ORCL","secIdIndexed":"ORCL","secType":null,"sharesOutstanding":1944000.0,"underlyer":null,"pid":1944,"portfolioId":700,..
        If we create an index and then run again this query there is no result.
        gfsh>query --query="<trace>SELECT e.key, e.value from 
/example-region.entrySet e where e.value.positions['SUN'] like 'someth%'"
        Result      : true
        Limit       : 100
        Rows        : 0
        Query Trace : Query Executed in 22.079016 ms; 
indexesUsed(1):index1(Results: 500)
        This happened because we have no luck getting that entry that matches 
the condition in the intermediate results for the index.
        So the questions are:
        What if more entries enter the region that will make the index return 
more entries than this threshold sets? Then we're again in jeopardy that the 
query condition will not match.
        Why is this attribute set by default to 100?
        Can we change it to the Integer.MAX_VALUE by default to be sure that we 
have the correct result? What are the consequences?

        BR,
        Mario

Re: Question about INDEX_THRESHOLD_SIZE

Reply via email to