richardstartin commented on pull request #7820: URL: https://github.com/apache/pinot/pull/7820#issuecomment-984548475
@Jackie-Jiang to demonstrate an open mind, I've implemented your suggestion here: https://github.com/apache/pinot/pull/7820/commits/f75135887001394abb4097a9ddae486e5cea1c48 I'm not convinced this is necessarily cleaner (`ProjectionBlock` is though) and I think it was better to keep the "tunnel" for evaluators separate from the regular path with caching. Writing the array copies from the enormous array maintained by `DataBlockCache` (because it doesn't have enough information to size the result properly) to the correctly sized arrays in `JsonExtractScalarFunction` made me want to cry ever so slightly. Please pay attention to the changes to the cache key in `DataBlockCache` - JSONPath evaluations need to be discriminated by both the JSON Path itself and the default value, on top of the column name and type of the result. This is probably better than having large JSON documents lingering in the cache, but I'm struggling to see this cache as an unequivocal benefit for every possible transformation. Could we avoid caching JSON path evaluations at the block level and revert the commit? If we find that we spend a lot of time on duplicate JSON path evaluations, could we think about avoiding the planning of duplicate evaluations instead? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org