davecromberge commented on issue #12111: URL: https://github.com/apache/pinot/issues/12111#issuecomment-1854123973
I have explored the idea of maintaining a mapping from `AggregationFunctionType` to `AggregationType` and have a [diff](https://github.com/apache/pinot/compare/master...davecromberge:attempt-2/shared-aggregations?expand=1) available of the initial attempt. In summary, this does seem simpler than the "virtual" column approach but there are some comments about the implementation that I like to clarify / discuss. 1. Re-using the `ValueAggregatorFactory` mapping was not accessible to some spi packages, and introducing it might introduce circular dependencies between packages. 2. The segment index reader was not modified to exclude buffer views for duplicated indexes, but this was done in the StarTree loader instead where ForwardIndexReaders are not duplicated. The reason for doing so is to improve memory performance where an existing index is loaded where the aggregation might already be duplicated. 3. In the use case where adding a "duplicate" AggregationColumnPair to existing segments; the StarTree might need to be re-created, even though the new column pair is covered by the existing aggregations. 4. The `StarTreeV2BuilderConfig` removes duplicates column pairs before the StarTree is constructed. A consequence is that the resulting segment metadata would not match all the AggregationSpecs from the table - which might cause subsequent reloads / cycles of reloading. This may require the de-duplication to be done in the StarTree builder itself. One of the more difficult aspects of these changes is understanding the role of the SegmentMetadataV2 in the wider codebase and whether to either: - reflect all aggregation column pairs that are supported or covered by the startree index, or; - reflect only those aggregation column pairs that are actually in the startree index In the meantime, I'm happy to take the conversation into a PR if you are happy with the general direction that this is taking. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org