Re: [PR] Support Spark Column Stats [iceberg]

via GitHub Wed, 10 Jul 2024 08:46:52 -0700


jeesou commented on PR #10659:
URL: https://github.com/apache/iceberg/pull/10659#issuecomment-2220881056


   Hi @huaxingao, @karuppayya, just to clear up a few points:
   This PR would be a continuation of https://github.com/apache/iceberg/pull/10288.
   
   For the above-mentioned PR, we would need to create a procedure to trigger the Analyze action.
   
   But would this PR's changes mean that we would be able to see other column statistics, like min, max, etc., along with NDV in the .stat file?
   
   Could you please help me understand how "Iceberg can report column stats to Spark engine for CBO"?
   
   Even with this PR's changes along with https://github.com/apache/iceberg/pull/10288, the statistics section looks like this:
   "blob-metadata" : [ {
     "type" : "apache-datasketches-theta-v1",
     "snapshot-id" : 6563878496533098178,
     "sequence-number" : 1,
     "fields" : [ 3 ],
     "properties" : {
       "ndv" : "3"
     }
   } ]

   Other statistics are not being generated right now, which is expected given the current code.
   Please help me understand how the other statistics will be generated and how Spark will access them.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

