yegangy0718 commented on issue #7568: URL: https://github.com/apache/iceberg/issues/7568#issuecomment-1545070317
I'm not familiar with using CDC as a source, but @stevenzwu and I are working on tamping down small files via shuffling: https://github.com/apache/iceberg/issues/6303. The basic idea is to collect data distribution information and then use it to improve data clustering, so that each Iceberg writer receives a specific set of data keys. Sharing it with you in case it helps. cc @dramaticlly
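
To make the idea concrete, here is a minimal sketch in plain Python. It is not the Flink/Iceberg implementation discussed in issue #6303. The function name `assign_keys_to_writers`, the greedy balancing strategy, and the sample data are all assumptions chosen for illustration. It counts how often each key occurs, then pins every key to exactly one writer so that rows with the same key end up in the same files.

```python
from collections import Counter


def assign_keys_to_writers(key_counts, num_writers):
    """Hypothetical helper: map each key to one writer, balancing row counts.

    Greedy heuristic (an assumption, not the approach from the issue):
    take keys from most to least frequent and give each one to the
    writer that currently has the smallest load.
    """
    loads = [0] * num_writers
    assignment = {}
    # Sort by descending count; break ties by key so the result is deterministic.
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for key, count in ordered:
        writer = min(range(num_writers), key=lambda w: loads[w])
        assignment[key] = writer
        loads[writer] += count
    return assignment, loads


# Step 1: collect data distribution information (here, a simple key histogram).
records = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"] * 2
key_counts = Counter(records)

# Step 2: use the statistics to decide which writer receives which keys.
assignment, loads = assign_keys_to_writers(key_counts, num_writers=2)

# Step 3: route each record to its assigned writer.
routed = {w: [r for r in records if assignment[r] == w] for w in range(2)}
```

Because every key maps to a single writer, each writer only opens files for the keys it owns, which is what keeps the number of small files down. A real job would need to gather the statistics incrementally and handle keys it has not seen yet.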
