BiteTheDDDDt commented on issue #9346:
URL: https://github.com/apache/doris/issues/9346#issuecomment-1559088510

   > `count(*)` 会扫描每个聚合 key 导致性能会差一点认了, `count(distinct varchar_column)` 
为什么也不支持(居然只能支持数字字段)
   > 
   > 数据如果会重复导入, 那就三种模型来说: `unique` 模型不支持函数; `aggregate` 模型没有提到像 `unique` 
一样那么多限制, 但是在实际的过程中感觉跟 `unique` 是一样的, 各种不支持(如果只能改变顺序, 叫啥物化视图呢, 叫索引不更合适一点?); 
`duplicate` 模型要怎么基于重复数据做 sum 函数呢?
   > 
   > 2.0 也依然没有解决这些很基础的 sql 问题
   > 
   > ```sql
   > /* doris-version: 1.2.4.1 */
   > drop table if exists temp;
   > create table temp (
   >     id varchar(8),
   >     `date` date,
   >     user_id varchar(8),
   >     channel_id varchar(8) replace, /* 如果改成 unique 或 duplicate 模型则去掉这里的 
replace */
   >     `amount` int replace
   > )
   > aggregate key (id, `date`, user_id) /* unique = aggregate + 上面的 replace  */
   > partition by range(`date`) (
   >     partition p202305 values less than ("2023-06-01")
   > )
   > distributed by hash(id) buckets 1
   > properties (
   >     /* "enable_unique_key_merge_on_write" = "true", -- 如果用 unique key 可以使用 
*/
   >     "replication_allocation" = "tag.location.default: 1"
   > );
   > 
   > insert into temp(id, `date`, user_id, channel_id, amount) values
   > ('a', '2023-05-01', 'aa', '123a', '50'),
   > ('a', '2023-05-01', 'aa', '123a', '50'),  /* 重复数据 */
   > ('b', '2023-05-01', 'aa', '123b', '10'),
   > ('c', '2023-05-02', 'cc', '123c', '20'),
   > ('d', '2023-05-03', 'dd', '123d', '30');
   > 
   > create materialized view mv1 as /* 如果是 aggregate 或 unique 模型无法建立 */
   > select `date`, user_id, count(distinct channel_id)
   > from temp /* 只有 duplicate 模型 + bitmap_union(to_bitmap(num_col)) 可以, 然而 
num_col 不能是 varchar 类型 */
   > group by `date`, user_id
   > order by `date` desc, user_id;
   > /* SQL error(1105): errCode = 2, detailMessage = The function count must 
match pattern:count(column) */
   > 
   > 
   > create materialized view mv2 as /* 如果是 aggregate 或 unique 模型无法建立 */
   > select `date`, user_id, sum(amount) /* 如果是 duplicate 模型无法去重 */
   > from temp
   > group by `date`, user_id
   > order by `date` desc, user_id;
   > /* SQL error(1105): errCode = 2, detailMessage = The aggregation type of 
   > column[amount] must be same as the aggregate type of base column in 
aggregate table */
   > ```
   > 
   > 限制实在是太多了, 我有理由相信, 支持的只有文档中给出来的示例
   
   Hello, to_bitmap(num_col) can be changed to bitmap_hash(num_col) in 2.0.
   
   As for why the case you listed is restricted, it is because the aggregation 
type of a certain column on the materialized view/rollup needs to be consistent 
with the base table. The mv with group by actually creates a corresponding agg 
table for the base table.
   
   For example, consider the following situations:
   The base table type is dup, with a value column k1 on it, and a materialized 
view with a sum column k1 can be created (the base table has no aggregation 
type).
   The base table type is agg, with sum column k1 on it, and a materialized 
view with sum column k1 can be created (the aggregation type is the same).
   The base table type is uniq, and there is a value column k1 on it. At this 
time, the aggregation type equivalent to k1 is replace, so the materialized 
view of k1 with the sum aggregation type cannot be created. (Different 
aggregation types)
   
   For specific reasons, you can understand the principle of the agg table, and 
think about what problems will exist if this restriction is not imposed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org

Reply via email to