dejii commented on PR #49768: URL: https://github.com/apache/airflow/pull/49768#issuecomment-3168176325
> Your suggested solution does not cover my use case where files are always copied to the same prefix. This implies that I also need to check the creation time of the files.

I don't think checking the creation time is necessary. The list of objects to be copied is determined by `S3ListOperator.execute`, so as long as the same prefix is used in the `S3ToGCSOperator`, you should get the same result across subsequent tasks:
https://github.com/apache/airflow/blob/aa6615352d98bc0f4b42a8e3accbe1f455e54ba8/providers/google/src/airflow/providers/google/cloud/transfers/s3_to_gcs.py#L185-L186

> Actually I don't know how the deferred operators behave if the triggerer is restarted during the deferring.

The triggerer isn't restarted during deferral, but it's designed to be stateless and resilient to restarts. To preserve that statelessness with your proposed solution, you'd need to serialize the list of objects, which might not be ideal, as it could consume significant space in the metadata database.
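
To give a rough sense of the size concern, here is a minimal sketch that compares the JSON-encoded size of a payload carrying only the prefix with one that also carries the full object list. This is not Airflow code: the bucket name, prefix, key layout, and the `serialized_size` helper are all made up for illustration, and the real serialization format may differ.

```python
import json


def serialized_size(kwargs: dict) -> int:
    """Return the number of bytes the kwargs occupy once JSON-encoded."""
    return len(json.dumps(kwargs).encode("utf-8"))


# Hypothetical payload that only records where to look.
prefix_kwargs = {"bucket": "example-bucket", "prefix": "incoming/"}

# Hypothetical payload that also records every object found under the prefix.
keys = [f"incoming/part-{i:06d}.parquet" for i in range(10_000)]
list_kwargs = {**prefix_kwargs, "keys": keys}

prefix_size = serialized_size(prefix_kwargs)
list_size = serialized_size(list_kwargs)

print(f"prefix only: {prefix_size} bytes")
print(f"with object list: {list_size} bytes")
```

The prefix-only payload stays the same size regardless of how many objects exist, while the payload with the object list grows linearly with the number of keys.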
