[I] How to run streaming upserts and maintenance simultaneously? [iceberg]

via GitHub Tue, 12 Nov 2024 12:49:04 -0800


meatheadmike opened a new issue, #11530:
URL: https://github.com/apache/iceberg/issues/11530


   ### Query engine
   
   Spark
   
   ### Question
   
   I'm trying to build a streaming upsert process using the latest versions of
Spark (3.5.3) and Iceberg (1.7.0). So far I've managed to get the streaming
upsert process working using the MERGE INTO SQL syntax. But of course any
streaming job is going to generate a lot of small files, so I've set up a
maintenance job that kicks off every 10 minutes. The maintenance process runs
successfully, but then the data ingest process crashes:
   
   ```
   24/11/12 20:28:13 INFO DirectoryPolicyImpl: Directory markers will be kept
   24/11/12 20:28:14 INFO SparkCleanupUtil: Deleted 40 file(s) using bulk deletes (job abort)
   24/11/12 20:28:14 ERROR WriteDeltaExec: Data source write support org.apache.iceberg.spark.source.SparkPositionDeltaWrite$PositionDeltaBatchWrite@39aea8ff aborted.
   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
   ...
   ...
   ...
   at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:211)
   24/11/12 20:28:14 INFO MicroBatchExecution: Async log purge executor pool for query [id = 3d278b6e-539b-454d-a89c-d16c8515e156, runId = 6d324708-0ce9-4262-bf05-56047c129559] has been shutdown
   return f(*a, **kw)
   File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
   raise Py4JJavaError(
   py4j.protocol.Py4JJavaError: An error occurred while calling o219.sql.
   : org.apache.iceberg.exceptions.ValidationException: Cannot commit, missing data files:
   ```
   
   Is there any solution to this issue? 
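
For context, the setup described above might look roughly like the following. This is only a minimal sketch, not the code from the actual job: the table name `demo.db.events`, the key column `id`, the view name `updates_batch`, and the choice of `rewrite_data_files` as the maintenance step are all assumptions made for illustration. The helpers only build SQL strings, so they can be inspected without a running Spark session.

```python
# Sketch of a streaming upsert plus a periodic compaction call.
# All names (catalog, table, key column, view) are hypothetical placeholders.


def build_merge_sql(target, source_view, key_cols):
    """Return a MERGE INTO statement that upserts source_view into target."""
    if not key_cols:
        raise ValueError("at least one key column is required")
    # Match target rows (t) to incoming rows (s) on every key column.
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    return (
        f"MERGE INTO {target} t USING {source_view} s ON {on_clause} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def upsert_batch(batch_df, batch_id, target="demo.db.events", key_cols=("id",)):
    """foreachBatch callback: expose the micro-batch as a view and merge it."""
    view = "updates_batch"  # assumed view name, replaced on every micro-batch
    batch_df.createOrReplaceTempView(view)
    # Use the session attached to the batch so the temp view is visible.
    batch_df.sparkSession.sql(build_merge_sql(target, view, list(key_cols)))


def build_compaction_sql(catalog, table):
    """Return a call to Iceberg's rewrite_data_files procedure (assumed maintenance step)."""
    return f"CALL {catalog}.system.rewrite_data_files(table => '{table}')"
```

In a real job the callback would be attached with something like `stream_df.writeStream.foreachBatch(upsert_batch).start()`, while a separate scheduled job runs the statement from `build_compaction_sql("demo", "db.events")` every 10 minutes.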


