kevinjqliu commented on issue #1053:
URL: https://github.com/apache/iceberg-python/issues/1053#issuecomment-2347434164

   What is the difference between your implementation's output and Spark's?
   
   From the [Spark docs](https://iceberg.apache.org/docs/latest/spark-queries/#files):
   "To show all files, data files and delete files across all tracked snapshots,
   query `prod.db.table.all_files`"
   
   > I initially thought that all_files is returning files from all
   > snapshots referenced in current table metadata and hence the repetitions
   > in the output.
   
   This sounds right to me. Maybe Spark gets rid of duplicate rows?
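
   To make the question concrete, here is a minimal sketch of how repeats could arise. It uses a made-up mapping of snapshot IDs to file paths, not the pyiceberg or Spark API; the names `snapshots`, `all_files_with_repeats`, and `all_files_deduplicated` are hypothetical. It only illustrates the guess above and does not claim to show what Spark actually does.

```python
# Hypothetical snapshot -> data file listing; this is illustrative data,
# not real pyiceberg or Spark output.
snapshots = {
    1: ["data/a.parquet"],
    2: ["data/a.parquet", "data/b.parquet"],
    3: ["data/a.parquet", "data/b.parquet", "data/c.parquet"],
}


def all_files_with_repeats(snapshots):
    """List every file once per snapshot that references it."""
    return [path for files in snapshots.values() for path in files]


def all_files_deduplicated(snapshots):
    """List each file once, keeping first-seen order."""
    return list(dict.fromkeys(all_files_with_repeats(snapshots)))


repeated = all_files_with_repeats(snapshots)
unique = all_files_deduplicated(snapshots)
print(len(repeated), len(unique))
```

   If the two outputs differ only in the way `repeated` and `unique` differ here, deduplication would explain the gap.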


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

