nickdelnano commented on code in PR #11923:
URL: https://github.com/apache/iceberg/pull/11923#discussion_r1911517154
##########
docs/docs/spark-getting-started.md:
##########

````diff
@@ -77,21 +77,24 @@ Once your table is created, insert data using [`INSERT INTO`](spark-writes.md#in
 ```sql
 INSERT INTO local.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c');
-INSERT INTO local.db.table SELECT id, data FROM source WHERE length(data) = 1;
 ```
 
 Iceberg also adds row-level SQL updates to Spark, [`MERGE INTO`](spark-writes.md#merge-into) and [`DELETE FROM`](spark-writes.md#delete-from):
 
 ```sql
-MERGE INTO local.db.target t USING (SELECT * FROM updates) u ON t.id = u.id
-WHEN MATCHED THEN UPDATE SET t.count = t.count + u.count
+CREATE TABLE local.db.updates (id bigint, data string) USING iceberg;
+INSERT INTO local.db.updates VALUES (1, 'x'), (2, 'y'), (4, 'z');
````

Review Comment:
About merge branches, commented here: https://github.com/apache/iceberg/pull/11923/files?diff=unified&w=0#r1906122418

The table states are straightforward until after the MERGE query (one insert per table), so I have added the table state as a comment after the MERGE only. Otherwise there is a lot of duplication. Let me know your thoughts.

##########
docs/docs/spark-getting-started.md:
##########

```diff
@@ -160,7 +163,7 @@ This type conversion table describes how Spark types are converted to the Iceber
 | map | map |
 
 !!! info
-    The table is based on representing conversion during creating table. In fact, broader supports are applied on write. Here're some points on write:
+    The table is based on type conversions during table creation. Broader type conversions are applied on write:
```

Review Comment:
Thanks, updated.

##########
spark/v3.3/spark-runtime/src/integration/java/org/apache/iceberg/spark/SmokeTest.java:
##########

```diff
@@ -66,25 +64,25 @@ public void testGettingStarted() throws IOException {
     sql(
         "CREATE TABLE updates (id bigint, data string) USING parquet LOCATION '%s'",
         temp.newFolder());
-    sql("INSERT INTO updates VALUES (1, 'x'), (2, 'x'), (4, 'z')");
+    sql("INSERT INTO updates VALUES (1, 'x'), (2, 'y'), (4, 'z')");
```

Review Comment:
The example still hits all branches of the merge:

- ids 1 and 2 are updated
- ids 3, 10, and 11 are unchanged
- id 4 does not match and is inserted

The change here is to provide a unique `data` value per row, as that helps explain the example in the docs better. A sketch of how the full example plays out follows at the end of this message.

##########
spark/v3.3/spark-runtime/src/integration/java/org/apache/iceberg/spark/SmokeTest.java:
##########

```diff
@@ -66,25 +64,25 @@ public void testGettingStarted() throws IOException {
     sql(
         "CREATE TABLE updates (id bigint, data string) USING parquet LOCATION '%s'",
```

Review Comment:
I tried this, and the change breaks tests that run SmokeTest with the Hadoop and Hive catalogs.
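For readers following along, here is a rough sketch of how the getting-started MERGE example discussed above plays out. The PR's exact `WHEN` clauses are not shown in this excerpt, so the statement below is an assumed form; it also assumes `local.db.table` still holds the three rows inserted earlier on the getting-started page.

```sql
-- Sketch only: the PR's exact WHEN clauses are not shown in this excerpt.
-- Assumes local.db.table currently holds (1, 'a'), (2, 'b'), (3, 'c').
MERGE INTO local.db.table t
USING (SELECT * FROM local.db.updates) u
ON t.id = u.id
WHEN MATCHED THEN UPDATE SET t.data = u.data  -- ids 1 and 2 match and are updated
WHEN NOT MATCHED THEN INSERT *;               -- id 4 has no match and is inserted

-- Table state after the MERGE (id 3 is untouched):
-- (1, 'x'), (2, 'y'), (3, 'c'), (4, 'z')
```

This matches the reviewer's point that one MERGE against a three-row source exercises every branch: matched rows updated, unmatched source rows inserted, and untouched target rows left as-is.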