goktugkose commented on issue #13763: URL: https://github.com/apache/iceberg/issues/13763#issuecomment-3190729865
> > We have encountered the same issue when rewriting data files. Similar to [@hguercan](https://github.com/hguercan)'s approach, we wrote a Spark SQL query to check whether different snapshots use the same data file. Also, we have noticed that pausing the Sink Connector does not stop all processes that belong to the sink tasks since INFO-level logs are still produced by those tasks. It seems that [@kumarpritam863](https://github.com/kumarpritam863) addressed these two items in [#13756](https://github.com/apache/iceberg/pull/13756).
> >
> > Thanks for your effort 🚀
> >
> > ```
> > df = spark.sql(
> >     f"""
> >     SELECT
> >       data_file.file_path AS file_path,
> >       COUNT(DISTINCT snapshot_id) AS distinct_snapshots
> >     FROM `{CATALOG}`.`{DATASET}`.`{TABLE_NAME}`.entries e
> >     GROUP BY data_file.file_path
> >     HAVING COUNT(DISTINCT snapshot_id) > 1;
> >     """
> > )
> > ```
>
> By any chance are you using CFK for controlling the connectors? I could see the same behaviour today on our dev cluster where i had a lot of logs from connectors that i already had "deleted". Only restarting the connect cluster lead to an expected state.
>
> EDIT: The reason i am pointing this out is probably the issues we are seeing are based on a connector still is "working" in parallel although it should be deleted and the Connect Cluster is not recognizing?!

We are using [this](https://hub.docker.com/layers/confluentinc/cp-kafka-connect-base/6.2.2/images/sha256-772eba973eafeae3b4eee9bae536e726f1a951efc45cc33a3be87344c759fe57) image in our stage environments. Even though it was not the expected case, you're also pointing out the same behavior that I've mentioned before. Somehow, the connector does not comply with the Connect cluster.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
