bitsondatadev commented on code in PR #117: URL: https://github.com/apache/iceberg-python/pull/117#discussion_r1380591348
##########
tests/io/test_pyarrow.py:
##########
@@ -708,15 +709,17 @@ def _write_table_to_file(filepath: str, schema: pa.Schema, table: pa.Table) -> s
 
 @pytest.fixture
 def file_int(schema_int: Schema, tmpdir: str) -> str:
-    pyarrow_schema = pa.schema(schema_to_pyarrow(schema_int), metadata={"iceberg.schema": schema_int.model_dump_json()})
+    pyarrow_schema = schema_to_pyarrow(schema_int, metadata={ICEBERG_SCHEMA: bytes(schema_int.model_dump_json(), 'utf-8')})

Review Comment:
   Should this string be using a [constant in a lib somewhere](https://stackoverflow.com/a/44109455)? Or we could at least create an encodings class that centralizes all the schema stuff (e.g. one that creates a constant for `'utf-8'`, hides `ICEBERG_SCHEMA`, and exposes some cleaner methods that hide the bytes conversion, etc.). WDYT?

##########
pyiceberg/io/pyarrow.py:
##########
@@ -435,13 +435,18 @@ def delete(self, location: Union[str, InputFile, OutputFile]) -> None:
             raise  # pragma: no cover - If some other kind of OSError, raise the raw error
 
 
-def schema_to_pyarrow(schema: Union[Schema, IcebergType]) -> pa.schema:
-    return visit(schema, _ConvertToArrowSchema())
+def schema_to_pyarrow(schema: Union[Schema, IcebergType], metadata: Dict[bytes, bytes] = EMPTY_DICT) -> pa.schema:
+    return visit(schema, _ConvertToArrowSchema(metadata))

Review Comment:
   What is the `visit()` behavior with an empty dict?
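
To make the first comment above (centralizing the encoding details) more concrete, here is a rough sketch of what such helpers might look like. It is only an illustration, not code from this PR: `UTF8`, `schema_metadata`, and `schema_json_from_metadata` are hypothetical names, and the value `b"iceberg.schema"` for `ICEBERG_SCHEMA` is an assumption based on the string key in the removed line of the test diff. It uses only the standard library, with a plain JSON string standing in for `schema_int.model_dump_json()`.

```python
import json
from typing import Dict

# Assumed values, not taken from the PR: the key mirrors the "iceberg.schema"
# string in the removed test line, stored as bytes to match Dict[bytes, bytes].
UTF8 = "utf-8"
ICEBERG_SCHEMA = b"iceberg.schema"


def schema_metadata(schema_json: str) -> Dict[bytes, bytes]:
    """Build bytes-keyed, bytes-valued metadata from a schema JSON string (hypothetical helper)."""
    return {ICEBERG_SCHEMA: schema_json.encode(UTF8)}


def schema_json_from_metadata(metadata: Dict[bytes, bytes]) -> str:
    """Recover the schema JSON string; raises KeyError if the key is absent (hypothetical helper)."""
    return metadata[ICEBERG_SCHEMA].decode(UTF8)


# Stand-in for schema_int.model_dump_json(); a real Iceberg schema payload is richer.
example_json = json.dumps({"type": "struct", "fields": []})
metadata = schema_metadata(example_json)
round_tripped = schema_json_from_metadata(metadata)
```

With helpers along these lines, the fixture could pass `metadata=schema_metadata(schema_int.model_dump_json())` and the test would no longer need to spell out either the key or the `'utf-8'` literal. Whether this belongs in a class or as module-level functions is a separate design choice.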