douglas-raillard-arm opened a new issue, #41863:
URL: https://github.com/apache/arrow/issues/41863
### Describe the enhancement requested
`pyarrow.dataset.write_dataset(compression='lz4_raw')` currently fails with:
```
Traceback (most recent call last):
  File "/work/projects/lisa/testpyarrow.py", line 3, in <module>
    _reencode_parquet('sched_switch.lz4.parquet', 'updated.parquet', compression='lz4_raw')#, row_group_size=128*1024*1024, compression='LZ4')
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "x.py", line 1, in my_write_parquet
    options = pyarrow.dataset.ParquetFileFormat().make_write_options(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset_parquet.pyx", line 206, in pyarrow._dataset_parquet.ParquetFileFormat.make_write_options
  File "pyarrow/_dataset_parquet.pyx", line 594, in pyarrow._dataset_parquet.ParquetFileWriteOptions.update
  File "pyarrow/_dataset_parquet.pyx", line 599, in pyarrow._dataset_parquet.ParquetFileWriteOptions._set_properties
  File "pyarrow/_parquet.pyx", line 1855, in pyarrow._parquet._create_writer_properties
  File "pyarrow/_parquet.pyx", line 1369, in pyarrow._parquet.check_compression_name
pyarrow.lib.ArrowException: Unsupported compression: lz4_raw
```
Indeed, `lz4_raw` is not mentioned anywhere in
`python/pyarrow/_parquet.pyx`.
Would it be possible to add support for the LZ4_RAW codec when writing Parquet
files, particularly through the dataset API?
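
For reference, here is a minimal sketch for checking which compression names the dataset write options accept. `probe_compression_names` is a hypothetical helper written for this report, not part of pyarrow; the only pyarrow call it relies on is `ParquetFileFormat().make_write_options(compression=...)`, the same one that appears in the traceback above. The list of names tried is arbitrary, and the result may differ between pyarrow versions and builds.

```python
from typing import Callable, Dict, Iterable, Optional


def probe_compression_names(
    make_options: Callable[..., object],
    names: Iterable[str],
) -> Dict[str, Optional[str]]:
    """Map each name to None if accepted, else to the error text raised."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            # Hypothetical usage: make_options is expected to be
            # ParquetFileFormat().make_write_options or a compatible callable.
            make_options(compression=name)
        except Exception as exc:
            # The traceback above shows pyarrow.lib.ArrowException here.
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    try:
        import pyarrow.dataset as ds
    except ImportError:
        print("pyarrow is not installed; nothing to probe")
    else:
        fmt = ds.ParquetFileFormat()
        probed = probe_compression_names(
            fmt.make_write_options, ["snappy", "lz4", "lz4_raw"]
        )
        for name, error in probed.items():
            print(f"{name}: {error or 'accepted'}")
```

On the installation that produced the traceback above, the `lz4_raw` entry would be expected to carry the `Unsupported compression: lz4_raw` message.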
### Component(s)
Python