Re: [PR] docs/configuration.md: Documented table properties (#1231) [iceberg-python]

via GitHub Mon, 28 Oct 2024 07:52:26 -0700


Fokko commented on code in PR #1232:
URL: https://github.com/apache/iceberg-python/pull/1232#discussion_r1819195993



##########
mkdocs/docs/configuration.md:
##########
@@ -30,15 +30,23 @@ Iceberg tables support table properties to configure table behavior.
 
 ### Write options
 
-| Key                                    | Options                           | Default | Description                                                                                 |
-| -------------------------------------- | --------------------------------- | ------- | ------------------------------------------------------------------------------------------- |
-| `write.parquet.compression-codec`      | `{uncompressed,zstd,gzip,snappy}` | zstd    | Sets the Parquet compression coddec.                                                        |
-| `write.parquet.compression-level`      | Integer                           | null    | Parquet compression level for the codec. If not set, it is up to PyIceberg                  |
-| `write.parquet.row-group-limit`        | Number of rows                    | 1048576 | The upper bound of the number of entries within a single row group                          |
-| `write.parquet.page-size-bytes`        | Size in bytes                     | 1MB     | Set a target threshold for the approximate encoded size of data pages within a column chunk |
-| `write.parquet.page-row-limit`         | Number of rows                    | 20000   | Set a target threshold for the approximate encoded size of data pages within a column chunk |
-| `write.parquet.dict-size-bytes`        | Size in bytes                     | 2MB     | Set the dictionary page size limit per row group                                            |
-| `write.metadata.previous-versions-max` | Integer                           | 100     | The max number of previous version metadata files to keep before deleting after commit.     |
+| Key                                    | Options                                | Default         | Description                                                                                 |
+| -------------------------------------- | -------------------------------------- | --------------- | ------------------------------------------------------------------------------------------- |
+| `write.parquet.compression-codec`      | `{uncompressed,zstd,gzip,snappy}`     | zstd            | Sets the Parquet compression codec.                                                         |
+| `write.parquet.compression-level`      | Integer                                | null            | Parquet compression level for the codec. If not set, it is up to PyIceberg.                 |
+| `write.parquet.row-group-limit`        | Number of rows                         | 1,048,576       | The upper bound of the number of entries within a single row group.                         |
+| `write.parquet.row-group-size-bytes`   | Size in bytes                          | 128 MB          | The maximum size (in bytes) of each Parquet row group.                                      |

Review Comment:
   This one is not supported: 
https://github.com/apache/iceberg-python/blob/583a7e97db28afa4259cdf504611845222338893/pyiceberg/io/pyarrow.py#L2550-L2556
   
   We can also make that explicit in the docs.
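
To illustrate what "not supported" could mean in practice, here is a minimal sketch of a writer-side guard that rejects table properties it does not implement. The names `UNSUPPORTED_PATTERNS`, `find_unsupported`, and `check_properties` are hypothetical and are not PyIceberg's API; the pattern list only contains the property named in this comment. The linked lines in `pyiceberg/io/pyarrow.py` remain the authoritative source.

```python
import fnmatch
from typing import Dict, Iterable, List

# Hypothetical list for illustration only; the real list lives in pyiceberg/io/pyarrow.py.
UNSUPPORTED_PATTERNS = ["write.parquet.row-group-size-bytes"]


def find_unsupported(
    properties: Dict[str, str],
    patterns: Iterable[str] = UNSUPPORTED_PATTERNS,
) -> List[str]:
    """Return the property keys that match any unsupported glob pattern."""
    found = set()
    for pattern in patterns:
        # Iterating a dict yields its keys, so this filters the property names.
        found.update(fnmatch.filter(properties, pattern))
    return sorted(found)


def check_properties(properties: Dict[str, str]) -> None:
    """Raise if the caller set a property the writer does not implement."""
    unsupported = find_unsupported(properties)
    if unsupported:
        raise NotImplementedError(
            f"Parquet writer option(s) {unsupported} not implemented"
        )
```

Failing loudly on an unimplemented key, rather than silently ignoring it, is one reason to flag such properties in the docs as well.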



##########
mkdocs/docs/configuration.md:
##########
@@ -47,6 +55,8 @@ Iceberg tables support table properties to configure table behavior.
 | `commit.manifest.target-size-bytes`  | Size in bytes       | 8388608 (8MB) | Target size when merging manifest files                     |
 | `commit.manifest.min-count-to-merge` | Number of manifests | 100           | Target size when merging manifest files                     |
 | `commit.manifest-merge.enabled`      | Boolean             | False         | Controls whether to automatically merge manifests on writes |
+| `schema.name-mapping.default`          | Name mapping strategy                  | N/A             | Default name mapping for schema evolution.                                                  |
+| `format-version`                       | `{1, 2}`                               | 2               | The version of the Iceberg table format to use.                                             |

Review Comment:
   This is interesting. Before, not aligning the Markdown table would result in 
a lint error.
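
For context, a minimal sketch of the kind of alignment check being referred to. This is not the project's actual linter, and the helper names `pipe_columns` and `is_aligned` are hypothetical. It treats a table as aligned when every row has its `|` characters in the same character positions, and it does not handle pipes inside code spans.

```python
from typing import List


def pipe_columns(row: str) -> List[int]:
    """Return the character offsets of every '|' in a table row."""
    return [i for i, ch in enumerate(row) if ch == "|"]


def is_aligned(table_lines: List[str]) -> bool:
    """True when all non-empty rows place their pipes at identical offsets."""
    positions = [pipe_columns(line.rstrip()) for line in table_lines if line.strip()]
    return all(p == positions[0] for p in positions)
```

With this check, the two added rows in the hunk above would be flagged, because they use the wider column layout of the Write options table while the surrounding rows use the narrower one.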



##########
mkdocs/docs/configuration.md:
##########
@@ -30,15 +30,23 @@ Iceberg tables support table properties to configure table behavior.
 
 ### Write options
 
-| Key                                    | Options                           | Default | Description                                                                                 |
-| -------------------------------------- | --------------------------------- | ------- | ------------------------------------------------------------------------------------------- |
-| `write.parquet.compression-codec`      | `{uncompressed,zstd,gzip,snappy}` | zstd    | Sets the Parquet compression coddec.                                                        |
-| `write.parquet.compression-level`      | Integer                           | null    | Parquet compression level for the codec. If not set, it is up to PyIceberg                  |
-| `write.parquet.row-group-limit`        | Number of rows                    | 1048576 | The upper bound of the number of entries within a single row group                          |
-| `write.parquet.page-size-bytes`        | Size in bytes                     | 1MB     | Set a target threshold for the approximate encoded size of data pages within a column chunk |
-| `write.parquet.page-row-limit`         | Number of rows                    | 20000   | Set a target threshold for the approximate encoded size of data pages within a column chunk |
-| `write.parquet.dict-size-bytes`        | Size in bytes                     | 2MB     | Set the dictionary page size limit per row group                                            |
-| `write.metadata.previous-versions-max` | Integer                           | 100     | The max number of previous version metadata files to keep before deleting after commit.     |
+| Key                                    | Options                                | Default         | Description                                                                                 |
+| -------------------------------------- | -------------------------------------- | --------------- | ------------------------------------------------------------------------------------------- |
+| `write.parquet.compression-codec`      | `{uncompressed,zstd,gzip,snappy}`     | zstd            | Sets the Parquet compression codec.                                                         |
+| `write.parquet.compression-level`      | Integer                                | null            | Parquet compression level for the codec. If not set, it is up to PyIceberg.                 |
+| `write.parquet.row-group-limit`        | Number of rows                         | 1,048,576       | The upper bound of the number of entries within a single row group.                         |
+| `write.parquet.row-group-size-bytes`   | Size in bytes                          | 128 MB          | The maximum size (in bytes) of each Parquet row group.                                      |
+| `write.parquet.page-size-bytes`        | Size in bytes                          | 1 MB            | Target threshold for the approximate encoded size of data pages within a column chunk.      |
+| `write.parquet.page-row-limit`         | Number of rows                         | 20,000          | Target threshold for the number of rows within a data page inside a column chunk.           |
+| `write.parquet.dict-size-bytes`        | Size in bytes                          | 2 MB            | The dictionary page size limit per row group.                                               |
+| `write.parquet.bloom-filter-max-bytes` | Size in bytes                          | 1 MB            | The maximum size (in bytes) of the Bloom filter for Parquet files.                          |
+| `write.parquet.bloom-filter-enabled.column` | Column names                        | N/A             | Enable Bloom filters for specific columns by prefixing the column name.                     |

Review Comment:
   These ones are supported: 
https://github.com/apache/iceberg-python/blob/583a7e97db28afa4259cdf504611845222338893/pyiceberg/io/pyarrow.py#L2550-L2556
   
   We can also make that explicit in the docs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

