[GitHub] [iceberg] rdblue commented on a diff in pull request #8374: Python: Add updates, moves and deletes

via GitHub Fri, 01 Sep 2023 15:56:29 -0700


rdblue commented on code in PR #8374:
URL: https://github.com/apache/iceberg/pull/8374#discussion_r1313615392



##########
python/mkdocs/docs/api.md:
##########
@@ -133,52 +118,140 @@ partition_spec = PartitionSpec(
 from pyiceberg.table.sorting import SortOrder, SortField
 from pyiceberg.transforms import IdentityTransform
 
-sort_order = SortOrder(SortField(source_id=4, transform=IdentityTransform()))
-
-catalog = load_catalog("prod")
+# Sort on the symbol
+sort_order = SortOrder(SortField(source_id=2, transform=IdentityTransform()))
 
 catalog.create_table(
-    identifier="default.bids",
-    location="/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/bids/",
+    identifier="docs_example.bids",
     schema=schema,
     partition_spec=partition_spec,
     sort_order=sort_order,
 )
 ```
 
-### Update table schema
+## Load a table
+
+### Catalog table
+
+Loading the `bids` table:
+
+```python
+table = catalog.load_table("docs_example.bids")
+# Equivalent to:
+table = catalog.load_table(("docs_example", "bids"))
+# The tuple syntax can be used if the namespace or table contains a dot.
+```
+
+This returns a `Table` that represents an Iceberg table that can be queried and altered.
+
+### Static table
+
+To load a table directly from a metadata file (i.e., **without** using a catalog), you can use a `StaticTable` as follows:
+
+```python
+from pyiceberg.table import StaticTable
+
+static_table = StaticTable.from_metadata(
+    # For example:
+    # "s3a://warehouse/wh/nyc.db/taxis/metadata/00002-6ea51ce3-62aa-4197-9cf8-43d07c3440ca.metadata.json",
+    tbl.metadata_location,
+    properties={
+        "s3.endpoint": "http://127.0.0.1:9000",
+        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
+        "s3.access-key-id": "admin",
+        "s3.secret-access-key": "password",
+    },
+)
+```
+
+The static-table is considered read-only.
 
-Add new columns through the `Transaction` or `UpdateSchema` API:
+## Schema evolution
 
-Use the Transaction API:
+PyIceberg supports full schema evolution through the Python API. It takes care of setting the field-IDs and makes sure that only non-breaking changes are done (can be overridden).
+
+In the examples below, the `.update_schema()` is called from the table itself.
+
+```python
+with table.update_schema() as update:
+    update.add_column("some_field", IntegerType(), "doc")
+```
+
+You can also initiate a transaction if you want to make more changes than just evolving the schema:
 
 ```python
 with table.transaction() as transaction:
-    transaction.update_schema().add_column("x", IntegerType(), "doc").commit()
+    with transaction.update_schema() as update_schema:
+        update_schema.add_column("some_other_field", IntegerType(), "doc")
+    # ... Update properties etc
 ```
 
-Or, without a context manager:
+### Add column
+
+Using `add_column` you can add a column, without having to worry about the field-id:
 
 ```python
-transaction = table.transaction()
-transaction.update_schema().add_column("x", IntegerType(), "doc").commit()
-transaction.commit_transaction()
+with table.update_schema() as update:
+    update.add_column("retries", IntegerType(), "Number of retries to place the bid")
+    # In a struct
+    update.add_column("details.confirmed_by", StringType(), "Name of the exchange")
 ```
 
-Or, use the UpdateSchema API directly:
+### Rename column
+
+Renaming a field in an Iceberg table is simple:
 
 ```python
 with table.update_schema() as update:
-    update.add_column("x", IntegerType(), "doc")
+    update.rename("retries", "num_retries")
+    # In a struct, only the new name field
+    update.rename("properties.confirmed_by", "exchange")
 ```
 
-Or, without a context manager:
+### Move column
+
+Move a field inside of a struct:
 
 ```python
-table.update_schema().add_column("x", IntegerType(), "doc").commit()
+with table.update_schema() as update:
+    update.move_first("symbol")
+    update.move_after("bid", "ask")
+    # In a struct, only the new name field

Review Comment:
   What do you mean here?


