jeesou commented on PR #10659: URL: https://github.com/apache/iceberg/pull/10659#issuecomment-2229015200
> @jeesou I will be creating a PR for the procedure for the Analyze action when #10288 is merged. Currently the spec supports only NDV. For more stats we will need to make spec changes and other corresponding changes.
>
> This change helps report stats to Spark using the [DSv2 APIs](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsReportStatistics.java), which are subsequently used in [Join estimation](https://github.com/karuppayya/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala#L220), [Dynamic Partition Pruning](https://github.com/karuppayya/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala#L138), etc.

Hi @karuppayya, could you please let us know where we can find the spec that will be changed to accommodate the other statistics? Also, could you share some guidance on how to exercise the DSv2 APIs from PySpark? The documentation is not very clear on this.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org