dario-liberman commented on PR #10867:
URL: https://github.com/apache/pinot/pull/10867#issuecomment-1605753936

   > > > In order for this aggregation to work, does it require all the data to 
be partitioned by segments (i.e. all users show up in the same segment, and no 
user can be shared across segments)? That is the pre-requisite for 
`SEGMENT_PARTITIONED_DISTINCT_COUNT`
   > > 
   > > Yes. That is the pre-requisite to use the aggregation function. For a 
realtime table, it needs the Kafka topic to be partitioned (e.g., by user ids).
   > 
   > This is probably not practical and we should consider fixing this. Even if 
the Kafka topic is partitioned by the same user_id, there is no guarantee that 
all users will be part of the same segment.
   
   I shared above a work-in-progress branch with more funnel count aggregation 
strategies, effectively equivalent to DISTINCTCOUNT, DISTINCTCOUNTBITMAP and 
DISTINCTCOUNTTHETASKETCH. These do not depend on partitioning.
   
   The strategy we have here, equivalent to SEGMENTPARTITIONEDDISTINCTCOUNT, is 
just a first version. When the column is configured as the partition column, the 
same user can appear in more than one segment only across the time boundaries 
between segments. When grouping over time (e.g. per hour) to see funnel trends, 
this gives good enough approximations. In the future it should be possible to 
add a partition-level (or server-level?) phase, so that we aggregate differently 
between segments within the same partition and segments across partitions. I 
will need more time for that, though. For now I am adding different strategies 
so we can use the right one for each use case, as the choice also depends on the 
desired sessionization window.
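
   To make the trade-off concrete, here is a minimal Python sketch. It is not 
Pinot code: the event data, segment names and function names are all 
hypothetical, and it models only plain distinct counting rather than the funnel 
logic. It compares summing per-segment distinct counts (the segment-partitioned 
approach) against merging user sets across segments (the set-based approach), 
both overall and grouped per hour.

```python
from collections import defaultdict

# Hypothetical events as (segment_id, hour, user_id). All names are made up.
# "alice" appears on both sides of the seg_0 / seg_1 time boundary.
EVENTS = [
    ("seg_0", 10, "alice"),
    ("seg_0", 10, "bob"),
    ("seg_1", 10, "alice"),
    ("seg_1", 11, "carol"),
    ("seg_1", 11, "alice"),
]


def segment_partitioned_count(events):
    """Count distinct users inside each segment, then sum the counts.

    This is only exact if no user appears in more than one segment.
    """
    users_by_segment = defaultdict(set)
    for segment, _hour, user in events:
        users_by_segment[segment].add(user)
    return sum(len(users) for users in users_by_segment.values())


def merged_set_count(events):
    """Merge the user sets across segments before counting (exact)."""
    return len({user for _segment, _hour, user in events})


def per_hour(events, count_fn):
    """Apply a counting strategy separately to each hour bucket."""
    events_by_hour = defaultdict(list)
    for event in events:
        events_by_hour[event[1]].append(event)
    return {hour: count_fn(rows) for hour, rows in sorted(events_by_hour.items())}


if __name__ == "__main__":
    print("partitioned:", per_hour(EVENTS, segment_partitioned_count))
    print("merged sets:", per_hour(EVENTS, merged_set_count))
```

   In this toy data the two strategies disagree only for hour 10, the bucket 
that straddles the segment boundary, which is the sense in which the per-hour 
numbers are approximate. Counted over the whole data set without time grouping, 
the partitioned sum counts alice twice.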


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

