kaushikranjan opened a new issue, #12588:
URL: https://github.com/apache/iceberg/issues/12588

   ### Query engine
   
   Spark
   
   ### Question
   
   I have a Spark streaming job that writes data from a source table to a 
destination table using MERGE INTO:
   
   MERGE INTO nessie.local.dst dst
   USING nessie.local.src src
   ON dst.id = src.id AND
      dst.employer = src.employer
   WHEN MATCHED THEN
       UPDATE SET dst.year = src.year
   WHEN NOT MATCHED THEN
       INSERT (id, year, employer, created_on, updated_on)
       VALUES (src.id, src.year, src.employer, src.created_on, src.updated_on)
   
   When the number of tasks required to read the destination table grows beyond a 
threshold, I stop the streaming job and run compaction [to prevent any data 
corruption].
   
   The destination table uses "merge-on-read" for all write modes. 
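   To make the setup concrete, here is a sketch of how merge-on-read is typically 
enabled for all three write paths via Iceberg table properties (the table name is 
taken from the MERGE statement above; the property names are the standard Iceberg 
write properties, stated here as my understanding of the configuration being 
described, not a quote of the actual DDL used):

   ```sql
   -- Route UPDATE, DELETE, and MERGE through merge-on-read
   -- (delete files are written instead of rewriting data files).
   ALTER TABLE nessie.local.dst SET TBLPROPERTIES (
       'write.update.mode' = 'merge-on-read',
       'write.delete.mode' = 'merge-on-read',
       'write.merge.mode'  = 'merge-on-read'
   );
   ```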
   
   Post compaction, reading the destination table causes Spark to error out with 
executors returning exit code 134 (which corresponds to SIGABRT).
   FYI: the total record count is under 30,000 and the total file size is ~4 MB.
   
   I am running 3 executors with 4 cores and 8g memory each, and a driver with 
4 cores and 8g memory.
   Is this a data-corruption issue?
   
   ** The destination table is partitioned on the employer column and sorted on id.
   Do we need to specify strategy => 'sort' when running rewrite_data_files, or 
will the procedure pick up the sort strategy automatically, given that the table 
already has a sort order defined?
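   For reference, a sketch of the compaction call in question (Iceberg's 
rewrite_data_files procedure; the catalog/table names follow the MERGE statement 
above). My understanding from the Iceberg docs is that the default strategy is 
binpack, which does not re-sort data; passing strategy => 'sort' without an 
explicit sort_order is documented to fall back to the table's own sort order:

   ```sql
   -- Default: binpack (coalesces small files, does not sort).
   CALL nessie.system.rewrite_data_files(table => 'local.dst');

   -- Sort strategy with no sort_order argument:
   -- the table's defined sort order (id) should be used.
   CALL nessie.system.rewrite_data_files(
       table    => 'local.dst',
       strategy => 'sort'
   );
   ```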


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
