Re: [I] [Feature] Add Support for Distributed Write [iceberg-python]

via GitHub Tue, 04 Mar 2025 05:00:51 -0800


Fokko commented on issue #1751:
URL: https://github.com/apache/iceberg-python/issues/1751#issuecomment-2697492762


   @andormarkus Sure thing. Does the following help?
   
   ```
   from io import BytesIO
   
   from pyiceberg.avro.decoder_fast import CythonBinaryDecoder
   from pyiceberg.avro.encoder import BinaryEncoder
   from pyiceberg.avro.resolver import construct_writer, resolve_reader
   from pyiceberg.manifest import DATA_FILE_TYPE, DEFAULT_READ_VERSION, DataFile, DataFileContent, FileFormat
   from pyiceberg.typedef import Record
   
   
   def test_serialize():
       data_file = DataFile(
           content=DataFileContent.DATA,
           file_path="s3://some-path/some-file.parquet",
           file_format=FileFormat.PARQUET,
           partition=Record(),
           record_count=131327,
           file_size_in_bytes=220669226,
           column_sizes={1: 220661854},
           value_counts={1: 131327},
           null_value_counts={1: 0},
           nan_value_counts={},
           lower_bounds={1: b"aaaaaaaaaaaaaaaa"},
           upper_bounds={1: b"zzzzzzzzzzzzzzzz"},
           key_metadata=b"\xde\xad\xbe\xef",
           split_offsets=[4, 133697593],
           equality_ids=[],
           sort_order_id=4,
       )
   
       # Encode: write the DataFile as Avro binary into an in-memory buffer
       output = BytesIO()
       encoder = BinaryEncoder(output)
       schema = DATA_FILE_TYPE[DEFAULT_READ_VERSION]
       construct_writer(file_schema=schema).write(encoder, data_file)
   
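       # Decode: read the bytes back with the same schema as both file and read
       # schema; read_types asks the reader to build a DataFile for the struct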
       decoder = CythonBinaryDecoder(output.getvalue())
       result = resolve_reader(
           schema,
           schema,
           read_types={-1: DataFile},
       ).read(decoder)
   
       assert result.file_path == "s3://some-path/some-file.parquet"
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


