davecromberge commented on issue #12111:
URL: https://github.com/apache/pinot/issues/12111#issuecomment-1854123973

   I have explored the idea of maintaining a mapping from 
`AggregationFunctionType` to `AggregationType` and have a 
[diff](https://github.com/apache/pinot/compare/master...davecromberge:attempt-2/shared-aggregations?expand=1)
 available of the initial attempt.  
   
   In summary, this does seem simpler than the "virtual" column approach but 
there are some comments about the implementation that I like to clarify / 
discuss.
   
   1. Re-using the `ValueAggregatorFactory` mapping was not accessible to some 
spi packages, and introducing it might introduce circular dependencies between 
packages.
   2. The segment index reader was not modified to exclude buffer views for 
duplicated indexes, but this was done in the StarTree loader instead where 
ForwardIndexReaders are not duplicated.  The reason for doing so is to improve 
memory performance where an existing index is loaded where the aggregation 
might already be duplicated.
   3. In the use case where adding a "duplicate" AggregationColumnPair to 
existing segments;  the StarTree might need to be re-created, even though the 
new column pair is covered by the existing aggregations.  
   4. The `StarTreeV2BuilderConfig` removes duplicates column pairs before the 
StarTree is constructed.  A consequence is that the resulting segment metadata 
would not match all the AggregationSpecs from the table - which might cause 
subsequent reloads / cycles of reloading.  This may require the de-duplication 
to be done in the StarTree builder itself.  
   
   One of the more difficult aspects of these changes is understanding the role 
of the SegmentMetadataV2 in the wider codebase and whether to either:
   - reflect all aggregation column pairs that are supported or covered by the 
startree index, or;
   - reflect only those aggregation column pairs that are actually in the 
startree index
   
   In the meantime, I'm happy to take the conversation into a PR if you are 
happy with the general direction that this is taking.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to