Fokko commented on code in PR #931:
URL: https://github.com/apache/iceberg-python/pull/931#discussion_r1829210096
##########
mkdocs/docs/api.md:
##########
@@ -353,6 +353,127 @@ lat: [[52.371807,37.773972,53.11254],[53.21917]]
 long: [[4.896029,-122.431297,6.0989],[6.56667]]
 ```
 
+### Partial overwrites
+
+When using the `overwrite` API, you can use an `overwrite_filter` to delete data that matches the filter before appending new data into the table.
+
+For example, with an iceberg table created as:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog("default")
+
+from pyiceberg.schema import Schema
+from pyiceberg.types import NestedField, StringType, DoubleType
+
+schema = Schema(
+    NestedField(1, "city", StringType(), required=False),
+    NestedField(2, "lat", DoubleType(), required=False),
+    NestedField(3, "long", DoubleType(), required=False),
+)
+
+tbl = catalog.create_table("default.cities", schema=schema)
+```
+
+And with initial data populating the table:
+
+```python
+import pyarrow as pa
+df = pa.Table.from_pylist(
+    [
+        {"city": "Amsterdam", "lat": 52.371807, "long": 4.896029},
+        {"city": "San Francisco", "lat": 37.773972, "long": -122.431297},
+        {"city": "Drachten", "lat": 53.11254, "long": 6.0989},
+        {"city": "Paris", "lat": 48.864716, "long": 2.349014},
+    ],
+)
+tbl.append(df)
+```
+
+You can overwrite the record of `Paris` with a record of `New York`:
+
+```python
+from pyiceberg.expressions import EqualTo
+df = pa.Table.from_pylist(
+    [
+        {"city": "New York", "lat": 40.7128, "long": 74.0060},
+    ]
+)
+tbl.overwrite(df, overwrite_filter=EqualTo('city', "Paris"))
+```
+
+This produces the following result with `tbl.scan().to_arrow()`:
+
+```python
+pyarrow.Table
+city: large_string
+lat: double
+long: double
+----
+city: [["New York"],["Amsterdam","Drachten","Paris"]]

Review Comment:
   I don't think this example is correct. Paris should have been overwritten, right? It looks like we lost San Fran'.
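To make the expectation in this comment concrete, here is a minimal sketch of the semantics the docs describe (delete the rows matching the filter, then append the new rows). It uses plain Python lists instead of an Iceberg table, and `emulate_overwrite` is a hypothetical helper written only for illustration; it is not part of PyIceberg and says nothing about how `tbl.overwrite` is implemented. The `New York` row is copied as-is from the diff.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def emulate_overwrite(
    existing: List[Row],
    new_rows: List[Row],
    matches: Callable[[Row], bool],
) -> List[Row]:
    """Hypothetical stand-in for delete-then-append overwrite semantics."""
    # Step 1: drop every existing row that matches the overwrite filter.
    kept = [row for row in existing if not matches(row)]
    # Step 2: append the new rows to whatever survived the delete.
    return kept + list(new_rows)


# Same four cities as the initial data in the docs example.
cities = [
    {"city": "Amsterdam", "lat": 52.371807, "long": 4.896029},
    {"city": "San Francisco", "lat": 37.773972, "long": -122.431297},
    {"city": "Drachten", "lat": 53.11254, "long": 6.0989},
    {"city": "Paris", "lat": 48.864716, "long": 2.349014},
]
new_york = [{"city": "New York", "lat": 40.7128, "long": 74.0060}]

# Plays the role of EqualTo('city', "Paris") in the docs example.
result = emulate_overwrite(cities, new_york, lambda row: row["city"] == "Paris")

# Sort the names so the comparison does not depend on row order.
remaining = sorted(str(row["city"]) for row in result)
print(remaining)
# ['Amsterdam', 'Drachten', 'New York', 'San Francisco']
```

Under these assumptions `Paris` is gone and `San Francisco` is still present, which is the opposite of the `city` column shown in the docs output above. The row order of a real `tbl.scan().to_arrow()` may differ, so only the set of cities is compared here.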