richardstartin commented on PR #10043:
URL: https://github.com/apache/pinot/pull/10043#issuecomment-1371979974

   The compatibility verifier is failing because `numEntriesScannedInFilter` 
increases, even though `timeUsedMs` decreases by a large factor in each case: e.g. 
`numEntriesScannedInFilter` increased from 12 to 69, but `timeUsedMs` decreased 
from 20ms to 3ms.
   
   ```
   2023/01/05 04:11:35.051 ERROR [QueryOp] [main] Comparison FAILED: Line: 23, 
query: 'SELECT longDimSV1, doubleDimSV1 from FeatureTest3 WHERE doubleDimSV1 > 
99 AND generationNumber = 1 LIMIT 1000', actual response: 
{"resultTable":{"dataSchema":{"columnNames":["longDimSV1","doubleDimSV1"],"columnDataTypes":["LONG","DOUBLE"]},"rows":[[270,99.62],[268,99.08],[183,99.11],[286,99.1]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":60,"numSegmentsProcessed":4,"numSegmentsMatched":3,"numConsumingSegmentsQueried":3,"numConsumingSegmentsProcessed":0,"numConsumingSegmentsMatched":0,"numDocsScanned":4,"numEntriesScannedInFilter":69,"numEntriesScannedPostFilter":8,"numGroupsLimitReached":false,"totalDocs":1200,"timeUsedMs":3,"offlineThreadCpuTimeNs":0,"realtimeThreadCpuTimeNs":0,"offlineSystemActivitiesCpuTimeNs":0,"realtimeSystemActivitiesCpuTimeNs":0,"offlineResponseSerializationCpuTimeNs":0,"realtimeResponseSerializationCpuTimeNs":0,"offlineTotalCpuTimeNs":0,"realtimeTotalCpuTimeNs":0,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":1672891889033,"explainPlanNumEmptyFilterSegments":0,"numSegmentsPrunedByBroker":0,"numRowsResultSet":4,"numSegmentsPrunedByLimit":0,"numSegmentsPrunedByValue":54,"explainPlanNumMatchAllFilterSegments":0,"numSegmentsPrunedByServer":56,"numSegmentsPrunedInvalid":0},
 expected response: 
{"resultTable":{"dataSchema":{"columnNames":["longDimSV1","doubleDimSV1"],"columnDataTypes":["LONG","DOUBLE"]},"rows":[[286,99.1],[183,99.11],[270,99.62],[268,99.08]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":17,"numSegmentsProcessed":3,"numSegmentsMatched":3,"numConsumingSegmentsQueried":3,"numDocsScanned":4,"numEntriesScannedInFilter":12,"numEntriesScannedPostFilter":8,"numGroupsLimitReached":false,"totalDocs":300,"timeUsedMs":20,"offlineThreadCpuTimeNs":0,"realtimeThreadCpuTimeNs":0,"offlineSystemActivitiesCpuTimeNs":0,"realtimeSystemActivitiesCpuTimeNs":0,"offlineResponseSerializationCpuTimeNs":0,"realtimeResponseSerializationCpuTimeNs":0,"offlineTotalCpuTimeNs":0,"realtimeTotalCpuTimeNs":0,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":1641415527398,"numRowsResultSet":4}
   ```
   
   I think this is because filter bitmaps are pushed down to 
`SVScanDocIdIterator.applyAnd`, so only the doc ids that have already passed 
the earlier filter are counted towards the "cost" of the scan. They are not 
pushed down to the range filter, so even though the range filter is faster 
than scanning, it has to consider more rows before filtering them out, and 
therefore reports a higher `numEntriesScannedInFilter`.
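
   To illustrate the accounting difference described above, here is a toy 
model in Java. It is only a sketch and not Pinot's actual code: the class 
`FilterCostSketch`, both method names, and the use of `java.util.BitSet` in 
place of Pinot's bitmaps are assumptions made for illustration. It also 
assumes, as a simplification, that the filter without pushdown considers 
every doc in the segment.

```java
import java.util.BitSet;

// Toy model only: hypothetical names, not Pinot's implementation.
public class FilterCostSketch {

    // Models a scan that receives the earlier filter's bitmap:
    // only docs already accepted by that filter are visited and counted.
    static int countWithPushdown(double[] values, BitSet candidates,
                                 double threshold, BitSet matches) {
        int entriesScanned = 0;
        for (int docId = candidates.nextSetBit(0); docId >= 0;
             docId = candidates.nextSetBit(docId + 1)) {
            entriesScanned++;
            if (values[docId] > threshold) {
                matches.set(docId);
            }
        }
        return entriesScanned;
    }

    // Models a filter that does NOT receive the bitmap: it considers every
    // doc (a simplifying assumption), and the intersection happens afterwards.
    static int countWithoutPushdown(double[] values, BitSet candidates,
                                    double threshold, BitSet matches) {
        int entriesScanned = 0;
        for (int docId = 0; docId < values.length; docId++) {
            entriesScanned++;
            if (values[docId] > threshold) {
                matches.set(docId);
            }
        }
        matches.and(candidates); // AND with the earlier filter after the fact
        return entriesScanned;
    }

    public static void main(String[] args) {
        double[] doubleDimSV1 = {99.62, 10.0, 99.08, 50.0, 99.11, 1.0};
        BitSet generationMatches = new BitSet();
        generationMatches.set(0);
        generationMatches.set(2);
        generationMatches.set(3);

        BitSet a = new BitSet();
        BitSet b = new BitSet();
        int withPushdown = countWithPushdown(doubleDimSV1, generationMatches, 99, a);
        int withoutPushdown = countWithoutPushdown(doubleDimSV1, generationMatches, 99, b);

        System.out.println("same result: " + a.equals(b));
        System.out.println("entries counted with pushdown: " + withPushdown);
        System.out.println("entries counted without pushdown: " + withoutPushdown);
    }
}
```

   Both paths return the same matching doc ids, but the second one reports a 
larger count. In this model the count measures how many rows a filter looked 
at, not how long it took, which would be consistent with the counter rising 
while `timeUsedMs` falls.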


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org