johnpyp opened a new issue, #37755: URL: https://github.com/apache/doris/issues/37755
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

doris-2.1.4-rc03-e93678fd1e

### What's Wrong?

I have one table defined with `DUPLICATE KEY(...)` and another with `UNIQUE KEY(...)`. The two table definitions are *identical* apart from that one difference, and both use the same keys. They also have nearly identical row counts (~800M rows each):

(using `SHOW DATA`: first table uses DUPLICATE KEY, second table uses UNIQUE KEY)

### What You Expected?

The Unique table should be approximately the same size as the Duplicate table, maybe slightly larger due to hidden column overhead, and definitely not more than 2x as large.

### How to Reproduce?

1. Create any two tables, one using `DUPLICATE KEY` and one using `UNIQUE KEY`.
2. Ingest the same data into each.
3. Run `ANALYZE TABLE` on each table to make sure the storage numbers are up to date.
4. Compare data sizes with `SHOW DATA`.

### Anything Else?

If this is for some reason an intended feature of the UNIQUE data model, it would be great to warn about it in the documentation (I couldn't find anything about it).

Additionally, it would be nice to have an "Offline Deduplication" that I can run on demand for `DUPLICATE` tables (maybe by using temporary segment swaps or something), similar to ClickHouse's `OPTIMIZE TABLE ... DEDUPLICATE`.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
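
For reference, the four steps under "How to Reproduce?" could look roughly like the following in Doris SQL. This is only a sketch: the table names (`events_dup`, `events_uniq`), the columns, the bucket count, the `replication_num` setting, and the `staging_events` source table are all hypothetical and are not taken from the original report.

```sql
-- Step 1: two tables with identical definitions except for the key model.
-- All names and column types here are illustrative assumptions.
CREATE TABLE events_dup (
    event_id   BIGINT   NOT NULL,
    event_time DATETIME NOT NULL,
    payload    VARCHAR(1024)
)
DUPLICATE KEY(event_id, event_time)
DISTRIBUTED BY HASH(event_id) BUCKETS 16
PROPERTIES ("replication_num" = "1");

CREATE TABLE events_uniq (
    event_id   BIGINT   NOT NULL,
    event_time DATETIME NOT NULL,
    payload    VARCHAR(1024)
)
UNIQUE KEY(event_id, event_time)
DISTRIBUTED BY HASH(event_id) BUCKETS 16
PROPERTIES ("replication_num" = "1");

-- Step 2: ingest the same data into each table.
-- staging_events is a hypothetical source table with the same columns.
INSERT INTO events_dup  SELECT event_id, event_time, payload FROM staging_events;
INSERT INTO events_uniq SELECT event_id, event_time, payload FROM staging_events;

-- Step 3: refresh statistics on both tables.
ANALYZE TABLE events_dup  WITH SYNC;
ANALYZE TABLE events_uniq WITH SYNC;

-- Step 4: compare the reported sizes.
SHOW DATA FROM events_dup;
SHOW DATA FROM events_uniq;
```

If the source data contains no duplicate keys, both tables should hold the same number of rows, so any large gap in the sizes reported by `SHOW DATA` comes from the storage model rather than from the data itself.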