goktugkose commented on issue #13763:
URL: https://github.com/apache/iceberg/issues/13763#issuecomment-3190729865

   > > We have encountered the same issue when rewriting data files. Similar to 
[@hguercan](https://github.com/hguercan)'s approach, we wrote a Spark SQL query 
to check whether different snapshots use the same data file. Also, we have 
noticed that pausing the Sink Connector does not stop all processes that belong 
to the sink tasks since INFO-level logs are still produced by those tasks. It 
seems that [@kumarpritam863](https://github.com/kumarpritam863) addressed these 
two items in [#13756](https://github.com/apache/iceberg/pull/13756).
   > > Thanks for your effort 🚀
   > > ```python
   > > # CATALOG, DATASET and TABLE_NAME are assumed to be defined earlier.
   > > df = spark.sql(
   > >     f"""
   > > SELECT
   > >     data_file.file_path AS file_path,
   > >     COUNT(DISTINCT snapshot_id) AS distinct_snapshots
   > > FROM `{CATALOG}`.`{DATASET}`.`{TABLE_NAME}`.entries
   > > GROUP BY data_file.file_path
   > > HAVING COUNT(DISTINCT snapshot_id) > 1
   > >     """
   > > )
   > > ```
   > 
   > By any chance, are you using CFK to control the connectors? I saw the 
same behaviour today on our dev cluster, where I had a lot of logs from 
connectors that I had already "deleted". Only restarting the Connect cluster 
led to the expected state.
   > 
   > EDIT: The reason I am pointing this out is that the issues we are seeing 
are probably caused by a connector that is still "working" in parallel although 
it should be deleted, and the Connect cluster is not recognizing this?!
   
   We are using the 
[confluentinc/cp-kafka-connect-base:6.2.2](https://hub.docker.com/layers/confluentinc/cp-kafka-connect-base/6.2.2/images/sha256-772eba973eafeae3b4eee9bae536e726f1a951efc45cc33a3be87344c759fe57)
 image in our staging environments. Even though this is not the expected 
behavior, you are describing the same thing that I mentioned before. Somehow, 
the connector does not honor the state set by the Connect cluster.
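
   For anyone who wants to compare what the Connect cluster reports with what 
the task logs show, here is a minimal sketch that reads the standard Kafka 
Connect REST endpoint `GET /connectors/{name}/status`. The URL and connector 
name are hypothetical placeholders, and the helper names are mine, not part of 
the Iceberg connector. It only shows the state the cluster believes the tasks 
are in, so a task that keeps logging while reported as `PAUSED` would match 
the mismatch described above.

```python
import json
from urllib.request import urlopen

# Hypothetical values: adjust to your own Connect worker and connector name.
CONNECT_URL = "http://localhost:8083"
CONNECTOR_NAME = "iceberg-sink"


def fetch_status(connect_url: str, connector: str) -> dict:
    """Fetch GET /connectors/{name}/status from the Kafka Connect REST API."""
    url = f"{connect_url}/connectors/{connector}/status"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def tasks_not_in_state(status: dict, expected: str) -> list:
    """Return the IDs of tasks whose reported state differs from `expected`."""
    return [
        task["id"]
        for task in status.get("tasks", [])
        if task.get("state") != expected
    ]


# Example usage after pausing the connector (requires a reachable worker):
#   status = fetch_status(CONNECT_URL, CONNECTOR_NAME)
#   print("connector state:", status["connector"]["state"])
#   print("tasks not PAUSED:", tasks_not_in_state(status, "PAUSED"))
```

If the cluster no longer knows the connector (for example after it was 
deleted), the status request returns a 404 and `urlopen` raises an 
`HTTPError`, which is itself a useful signal when logs from that connector 
still appear.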


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

