amitchopraait opened a new issue #6437: URL: https://github.com/apache/incubator-pinot/issues/6437
I have a column with 10000 distinct values. select count(distinct(device)) from metrics { "resultTable": { "dataSchema": { "columnNames": [ "distinctcount(device)" ], "columnDataTypes": [ "INT" ] }, "rows": [ [ 10000 ] ] }, "exceptions": [], "numServersQueried": 1, "numServersResponded": 1, "numSegmentsQueried": 1, "numSegmentsProcessed": 1, "numSegmentsMatched": 1, "numConsumingSegmentsQueried": 0, "numDocsScanned": 3999999, "numEntriesScannedInFilter": 0, "numEntriesScannedPostFilter": 0, "numGroupsLimitReached": false, "totalDocs": 3999999, "timeUsedMs": 21, "segmentStatistics": [], "traceInfo": {}, "minConsumingFreshnessTimeMs": 0 } When i do a group by for this column, i get numGroupsLimitReached = true in the response. Even though the documentation states the default limit is set to 100k select device, count(device) as aggreg from metrics group by device order by aggreg desc limit 10 { "resultTable": { "dataSchema": { "columnNames": [ "device", "aggreg" ], "columnDataTypes": [ "STRING", "LONG" ] }, "rows": [ [ "device-6230", 475 ], [ "device-3277", 470 ], [ "device-2311", 469 ], [ "device-3933", 469 ], [ "device-4059", 468 ], [ "device-6002", 468 ], [ "device-621", 466 ], [ "device-2903", 465 ], [ "device-3900", 463 ], [ "device-9324", 463 ] ] }, "exceptions": [], "numServersQueried": 1, "numServersResponded": 1, "numSegmentsQueried": 1, "numSegmentsProcessed": 1, "numSegmentsMatched": 1, "numConsumingSegmentsQueried": 0, "numDocsScanned": 3999999, "numEntriesScannedInFilter": 0, "numEntriesScannedPostFilter": 3999999, "numGroupsLimitReached": true, "totalDocs": 3999999, "timeUsedMs": 87, "segmentStatistics": [], "traceInfo": {}, "minConsumingFreshnessTimeMs": 0 } As per conversation in slack: Do you have it configured explicitly? The config key is pinot.server.query.executor.num.groups.limit 10 replies Amit Chopra 23 minutes ago @Jackie - I believe you mean in pinot-server.conf? no i haven’t set it. Jackie 21 minutes ago Hmm.. That is unexpected Jackie 20 minutes ago Do you run the query in PQL mode or SQL mode? Amit Chopra 20 minutes ago SQL mode Jackie 16 minutes ago I just checked the code and we don't set it in SQL mode.. Jackie 15 minutes ago Could you please file a github issue and put the details? Amit Chopra 14 minutes ago sure. Let me file an issue. Just so that i understand, you mean the 100k limit is not set? But what is the default limit in SQL mode today then? Jackie 13 minutes ago We don't put the numGroupsLimitReached in SQL mode. I don't know how it shows up in the response Amit Chopra 12 minutes ago got it Amit Chopra 11 minutes ago As per the second part of the question, given there was a limit of 10 on the query, shouldn’t this be handled by the engine (even if it was a column with more than 100k distinct values) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org