nagasrisai opened a new pull request, #64574:
URL: https://github.com/apache/airflow/pull/64574
The Salesforce Bulk API reports errors at the record level, inside a response
that otherwise looks successful. Some of those errors are transient, such as
`UNABLE_TO_LOCK_ROW` (Salesforce row-level locking under concurrent load) or
`API_TEMPORARILY_UNAVAILABLE` (a momentary service hiccup). Right now the only
way to handle them is to wrap the operator in custom retry logic in the DAG
itself.
This PR adds three optional parameters to `SalesforceBulkOperator`:
- `max_retries` (default `0`): how many times to re-submit records that
hit a transient error. Defaults to zero so existing behaviour is completely
unchanged.
- `retry_delay` (default `5.0`): seconds to wait before each retry attempt.
- `transient_error_codes` (default `{"UNABLE_TO_LOCK_ROW",
"API_TEMPORARILY_UNAVAILABLE"}`): the set of Salesforce `statusCode` values
that are eligible for retry.
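As a rough illustration of how `transient_error_codes` might be applied to a single record result, here is a minimal sketch. It assumes each result is a dict with a `success` flag and an `errors` list whose entries carry a `statusCode`. The helper names `is_transient_failure` and `transient_indices` are hypothetical and not taken from the PR, and treating a record as retryable when any of its errors is transient is also an assumption.

```python
from __future__ import annotations

from typing import Any, Collection

# Default set as described in this PR.
DEFAULT_TRANSIENT_ERROR_CODES = frozenset(
    {"UNABLE_TO_LOCK_ROW", "API_TEMPORARILY_UNAVAILABLE"}
)


def is_transient_failure(
    result: dict[str, Any],
    transient_error_codes: Collection[str] = DEFAULT_TRANSIENT_ERROR_CODES,
) -> bool:
    """Return True if a record result failed with a retry-eligible statusCode.

    Hypothetical helper: assumes the result dict has "success" and "errors" keys.
    """
    if result.get("success"):
        return False
    errors = result.get("errors") or []
    # Assumption: one transient error is enough to make the record retryable.
    return any(err.get("statusCode") in transient_error_codes for err in errors)


def transient_indices(
    results: list[dict[str, Any]],
    transient_error_codes: Collection[str] = DEFAULT_TRANSIENT_ERROR_CODES,
) -> list[int]:
    """Return the positions of records that are eligible for retry."""
    return [
        index
        for index, result in enumerate(results)
        if is_transient_failure(result, transient_error_codes)
    ]
```

Passing a custom set, for example `{"INVALID_FIELD"}`, changes which failures count as retryable without touching the default.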
Only the records that failed with a transient error are re-submitted on each
attempt, not the whole payload. Because the Bulk API returns results in the
same order as the input, the retry results are written back into the correct
positions in the final result list.
Errors like `INVALID_FIELD` or `REQUIRED_FIELD_MISSING` are not in the
default transient set and are never retried, so genuine data errors still
surface in the final results and are not silently swallowed.
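The retry-and-write-back behaviour described above could be sketched roughly as follows. This is not the operator's actual code: `submit_with_retries`, the `submit` callable, and the injectable `sleep` parameter are hypothetical names used for illustration, and the result format (dicts with `success` and `errors[].statusCode`) is assumed.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Collection


def _is_transient(result: dict[str, Any], codes: Collection[str]) -> bool:
    """True if the record failed with at least one retry-eligible statusCode."""
    if result.get("success"):
        return False
    return any(e.get("statusCode") in codes for e in result.get("errors") or [])


def submit_with_retries(
    submit: Callable[[list[dict[str, Any]]], list[dict[str, Any]]],
    payload: list[dict[str, Any]],
    max_retries: int = 0,
    retry_delay: float = 5.0,
    transient_error_codes: Collection[str] = frozenset(
        {"UNABLE_TO_LOCK_ROW", "API_TEMPORARILY_UNAVAILABLE"}
    ),
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict[str, Any]]:
    """Submit payload, then re-submit only the transiently failed records.

    Relies on `submit` returning results in the same order as its input.
    """
    results = list(submit(payload))
    # Positions in the ORIGINAL payload that are still eligible for retry.
    pending = [i for i, r in enumerate(results) if _is_transient(r, transient_error_codes)]

    attempt = 0
    while pending and attempt < max_retries:
        attempt += 1
        sleep(retry_delay)  # wait before each retry attempt
        retry_results = submit([payload[i] for i in pending])

        still_pending = []
        for original_index, retry_result in zip(pending, retry_results):
            # Write the new result back into its original position.
            results[original_index] = retry_result
            if _is_transient(retry_result, transient_error_codes):
                still_pending.append(original_index)
        pending = still_pending

    return results
```

With `max_retries=0` the loop body never runs, so the function reduces to a single `submit` call, which matches the "existing behaviour unchanged" default. Records that fail with a non-transient code are never placed in `pending`, so their original error stays in the returned list.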
The implementation follows the spirit of what eladkal suggested in #64519,
using a similar retry-on-transient-error pattern to
`BigQueryInsertJobOperator`, adapted for the record-level result format that
Salesforce Bulk uses.
Eight tests cover the main scenarios: no retry when disabled, transient
errors retried, permanent errors not retried, retry count capped at
`max_retries`, delay enforced, custom error codes, both default transient
codes, and mixed failure types.
Closes apache/airflow#64519 (follow-up as agreed with @eladkal)