puchengy opened a new issue, #46: URL: https://github.com/apache/iceberg-python/issues/46
### Apache Iceberg version

None

### Please describe the bug 🐞

In v1, the data file `spec_id` is optional. Spark is still able to recognize the `spec_id`, but pyiceberg can't. Any idea why?

spark

```
spark-sql> select * from pyang.test_ray_iceberg_read.files;
content file_path file_format spec_id partition record_count file_size_in_bytes column_sizes value_counts null_value_counts nan_value_counts lower_bounds upper_bounds key_metadata split_offsets equality_ids sort_order_id readable_metrics
0 s3n://qubole-pinterest/warehouse/pyang.db/test_ray_iceberg_read/dt=2022-01-02/userid_bucket_16=4/00000-2-72876d76-7f6a-4b82-812e-5390351917ef-00001.parquet PARQUET 1 {"dt":"2022-01-02","userid_bucket_16":4} 1 871 {1:36,2:37,3:46} {1:1,2:1,3:1} {1:0,2:0,3:0} {} {1:,2:2,3:2022-01-02} {1:,2:2,3:2022-01-02} NULL [4] NULL 0 {"col":{"column_size":37,"value_count":1,"null_value_count":0,"nan_value_count":null,"lower_bound":"2","upper_bound":"2"},"dt":{"column_size":46,"value_count":1,"null_value_count":0,"nan_value_count":null,"lower_bound":"2022-01-02","upper_bound":"2022-01-02"},"userid":{"column_size":36,"value_count":1,"null_value_count":0,"nan_value_count":null,"lower_bound":2,"upper_bound":2}}
0 s3n://qubole-pinterest/warehouse/pyang.db/test_ray_iceberg_read/dt=2022-01-01/00000-1-f2b3a0c1-a3e3-482a-bf24-9831626c5be7-00001.parquet PARQUET 0 {"dt":"2022-01-01","userid_bucket_16":null} 1 870 {1:36,2:36,3:46} {1:1,2:1,3:1} {1:0,2:0,3:0} {} {1:,2:1,3:2022-01-01} {1:,2:1,3:2022-01-01} NULL [4] NULL 0 {"col":{"column_size":36,"value_count":1,"null_value_count":0,"nan_value_count":null,"lower_bound":"1","upper_bound":"1"},"dt":{"column_size":46,"value_count":1,"null_value_count":0,"nan_value_count":null,"lower_bound":"2022-01-01","upper_bound":"2022-01-01"},"userid":{"column_size":36,"value_count":1,"null_value_count":0,"nan_value_count":null,"lower_bound":1,"upper_bound":1}}
Time taken: 0.494 seconds, Fetched 2 row(s)
```

pyiceberg (0.4.0)

```
>>> tasks2[0]
FileScanTask(file=DataFile[file_path='s3n://qubole-pinterest/warehouse/pyang.db/test_ray_iceberg_read/dt=2022-01-02/userid_bucket_16=4/00000-2-72876d76-7f6a-4b82-812e-5390351917ef-00001.parquet', file_format=FileFormat.PARQUET, partition=Record[dt='2022-01-02', userid_bucket_16=4], record_count=1, file_size_in_bytes=871, column_sizes={1: 36, 2: 37, 3: 46}, value_counts={1: 1, 2: 1, 3: 1}, null_value_counts={1: 0, 2: 0, 3: 0}, nan_value_counts={}, lower_bounds={1: b'\x02\x00\x00\x00', 2: b'2', 3: b'2022-01-02'}, upper_bounds={1: b'\x02\x00\x00\x00', 2: b'2', 3: b'2022-01-02'}, key_metadata=None, split_offsets=[4], sort_order_id=0, content=DataFileContent.DATA, equality_ids=None, spec_id=None], delete_files=set(), start=0, length=871)
>>> tasks2[1]
FileScanTask(file=DataFile[file_path='s3n://qubole-pinterest/warehouse/pyang.db/test_ray_iceberg_read/dt=2022-01-01/00000-1-f2b3a0c1-a3e3-482a-bf24-9831626c5be7-00001.parquet', file_format=FileFormat.PARQUET, partition=Record[dt='2022-01-01'], record_count=1, file_size_in_bytes=870, column_sizes={1: 36, 2: 36, 3: 46}, value_counts={1: 1, 2: 1, 3: 1}, null_value_counts={1: 0, 2: 0, 3: 0}, nan_value_counts={}, lower_bounds={1: b'\x01\x00\x00\x00', 2: b'1', 3: b'2022-01-01'}, upper_bounds={1: b'\x01\x00\x00\x00', 2: b'1', 3: b'2022-01-01'}, key_metadata=None, split_offsets=[4], sort_order_id=0, content=DataFileContent.DATA, equality_ids=None, spec_id=None], delete_files=set(), start=0, length=870)
```
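
For anyone trying to reproduce this, here is a minimal sketch of a check that lists the scan tasks whose data file has no `spec_id`. It assumes the tasks expose `file.file_path` and `file.spec_id`, as the `FileScanTask` output above suggests. The helper name `files_missing_spec_id` and the two stand-in tasks (with made-up paths) are hypothetical and only there so the snippet runs without a catalog. The report does not show how `tasks2` was built; against a real table it would presumably come from something like `table.scan().plan_files()`.

```python
from types import SimpleNamespace


def files_missing_spec_id(tasks):
    """Return the file paths of scan tasks whose data file has spec_id unset (None)."""
    return [
        task.file.file_path
        for task in tasks
        if getattr(task.file, "spec_id", None) is None
    ]


# Hypothetical stand-ins for FileScanTask objects; the paths are illustrative only.
example_tasks = [
    SimpleNamespace(file=SimpleNamespace(file_path="example-a.parquet", spec_id=None)),
    SimpleNamespace(file=SimpleNamespace(file_path="example-b.parquet", spec_id=1)),
]

missing = files_missing_spec_id(example_tasks)
print(missing)
```

Run against the two tasks in this report, a check like this would flag both files, since both show `spec_id=None`, while Spark reports `spec_id` 1 and 0 for the same files.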