Copilot commented on code in PR #64766:
URL: https://github.com/apache/airflow/pull/64766#discussion_r3066486590


##########
providers/databricks/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -410,6 +410,51 @@ def execute(self, context: Context) -> int:
         return job_id
 
 
+class DatabricksDeleteJobsOperator(BaseOperator):
+    """
+    Deletes a Databricks job.
+
+    :param job_id: The ID of the Databricks job to delete. (templated)
+    :param databricks_conn_id: Reference to the Databricks connection.
+    """
+
+    template_fields: Sequence[str] = ("job_id",)
+
+    def __init__(
+        self,
+        *,
+        job_id: int,
+        databricks_conn_id: str = "databricks_default",

Review Comment:
   `job_id` is marked as a templated field, but the constructor type is `int`. 
In Airflow, Jinja rendering typically produces `str` values, so this signature 
is inconsistent with templating and with other Databricks operators (e.g. 
`DatabricksRunNowOperator` uses `job_id: str | None`). Consider widening the 
type (e.g. `int | str`) and/or coercing to `int` in `execute` before calling 
the hook.
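A minimal sketch of the suggested change, assuming the operator shape shown in the diff (the class below is a hypothetical stand-in, not the real operator; only the `job_id` handling is modeled):

```python
from __future__ import annotations


class DeleteJobsOperatorSketch:
    """Hypothetical sketch of the suggested job_id widening (not the real operator)."""

    template_fields = ("job_id",)

    def __init__(self, *, job_id: int | str) -> None:
        # Widened type: templated fields usually arrive as str after Jinja rendering.
        self.job_id = job_id

    def coerced_job_id(self) -> int:
        # Coerce to int before the hook call so the API payload type is correct;
        # raises ValueError early if the rendered value is not numeric.
        return int(self.job_id)
```

With this shape, `execute` would call `self.coerced_job_id()` (or `int(self.job_id)` inline) before invoking the hook.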



##########
providers/databricks/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -410,6 +410,51 @@ def execute(self, context: Context) -> int:
         return job_id
 
 
+class DatabricksDeleteJobsOperator(BaseOperator):
+    """
+    Deletes a Databricks job.
+
+    :param job_id: The ID of the Databricks job to delete. (templated)
+    :param databricks_conn_id: Reference to the Databricks connection.

Review Comment:
   The class docstring documents only `job_id` and `databricks_conn_id`, but 
`__init__` also exposes retry-related parameters (and 
`polling_period_seconds`). Please document these parameters here for 
consistency with other operators in this module (e.g. 
`DatabricksCreateJobsOperator`).
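A possible docstring covering the missing parameters, with wording modeled on the sibling Databricks operators (the exact phrasing here is an assumption, not the merged text):

```python
class DatabricksDeleteJobsOperator:
    """
    Deletes a Databricks job.

    :param job_id: The ID of the Databricks job to delete. (templated)
    :param databricks_conn_id: Reference to the Databricks connection.
    :param polling_period_seconds: Seconds to wait between polls, if polling is used.
    :param databricks_retry_limit: Amount of times to retry if the Databricks backend
        is unreachable. Its value must be greater than or equal to 1.
    :param databricks_retry_delay: Number of seconds to wait between retries.
    :param databricks_retry_args: An optional dictionary with arguments passed to
        ``tenacity.Retrying`` class.
    """
```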



##########
providers/databricks/src/airflow/providers/databricks/hooks/databricks.py:
##########
@@ -302,6 +303,14 @@ def create_job(self, json: dict) -> int:
         response = self._do_api_call(CREATE_ENDPOINT, json)
         return response["job_id"]
 
+    def delete_job(self, job_id: int) -> None:
+        """
+        Call the ``api/2.2/jobs/delete`` endpoint.
+
+        :param job_id: The unique identifier of the job to be deleted.
+        """
+        self._do_api_call(DELETE_ENDPOINT, {"job_id": job_id})

Review Comment:
   `delete_job` takes `job_id: int`, but operator/template usage commonly 
yields a `str` at runtime. Consider accepting `str | int` (consistent with 
`reset_job`/`update_job` signatures) and/or coercing/validating to an `int` 
before making the API call to ensure the payload type matches the Databricks 
API expectation.
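A sketch of the widened hook signature with coercion before the API call. The class below is a hypothetical stand-in that records calls instead of hitting the API; only the `delete_job` shape is being illustrated:

```python
from __future__ import annotations

DELETE_ENDPOINT = ("POST", "2.2/jobs/delete")


class HookSketch:
    """Hypothetical stand-in for DatabricksHook; _do_api_call just records the payload."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def _do_api_call(self, endpoint: tuple, payload: dict) -> None:
        self.calls.append((endpoint, payload))

    def delete_job(self, job_id: int | str) -> None:
        # Accept str | int (consistent with reset_job/update_job) and coerce,
        # so the JSON payload carries an integer as the API expects.
        self._do_api_call(DELETE_ENDPOINT, {"job_id": int(job_id)})
```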



##########
providers/databricks/tests/unit/databricks/operators/test_databricks.py:
##########
@@ -603,6 +604,44 @@ def test_exec_update_job_permission_with_empty_acl(self, db_mock_class):
         db_mock.update_job_permission.assert_not_called()
 
 
+class TestDatabricksDeleteJobsOperator:
+    @mock.patch("airflow.providers.databricks.operators.databricks.DatabricksHook")
+    def test_execute(self, mock_hook_class):
+        # Set up the mock hook instance
+        mock_hook = mock_hook_class.return_value
+
+        # Instantiate the DatabricksDeleteJobsOperator
+        op = DatabricksDeleteJobsOperator(
+            task_id="delete_job_task",
+            job_id=12345,
+            databricks_conn_id="databricks_default",
+        )
+
+        # Execute the operator
+        op.execute({})
+
+        # Verify that the DatabricksHook was initialized with the correct connection ID and default retry arguments
+        mock_hook_class.assert_called_once_with(
+            "databricks_default",
+            retry_limit=3,
+            retry_delay=1,
+            retry_args=None,

Review Comment:
   This assertion hard-codes default retry values (`retry_limit=3`, 
`retry_delay=1`). For consistency with the other operator tests in this file 
and to avoid brittleness if defaults change (or if the operator is instantiated 
with non-default values), assert against `op.databricks_retry_limit` / 
`op.databricks_retry_delay` instead of literals.
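A sketch of the suggested assertion style, using a hypothetical stand-in for the operator so the example is self-contained:

```python
from unittest import mock


class OperatorSketch:
    """Stand-in for DatabricksDeleteJobsOperator; only the retry attributes matter here."""

    def __init__(self, databricks_retry_limit: int = 3, databricks_retry_delay: int = 1):
        self.databricks_retry_limit = databricks_retry_limit
        self.databricks_retry_delay = databricks_retry_delay


def assert_hook_init(mock_hook_class: mock.Mock, op: OperatorSketch) -> None:
    # Assert against the operator's own attributes, not hard-coded defaults,
    # so the test keeps passing if defaults change or non-defaults are passed.
    mock_hook_class.assert_called_once_with(
        "databricks_default",
        retry_limit=op.databricks_retry_limit,
        retry_delay=op.databricks_retry_delay,
        retry_args=None,
    )
```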



##########
providers/databricks/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -410,6 +410,51 @@ def execute(self, context: Context) -> int:
         return job_id
 
 
+class DatabricksDeleteJobsOperator(BaseOperator):
+    """
+    Deletes a Databricks job.
+
+    :param job_id: The ID of the Databricks job to delete. (templated)
+    :param databricks_conn_id: Reference to the Databricks connection.
+    """
+
+    template_fields: Sequence[str] = ("job_id",)
+
+    def __init__(
+        self,
+        *,
+        job_id: int,
+        databricks_conn_id: str = "databricks_default",
+        polling_period_seconds: int = 30,
+        databricks_retry_limit: int = 3,
+        databricks_retry_delay: int = 1,
+        databricks_retry_args: dict[Any, Any] | None = None,
+        **kwargs,
+    ) -> None:
+        super().__init__(**kwargs)
+        self.job_id = job_id
+        self.databricks_conn_id = databricks_conn_id
+        self.polling_period_seconds = polling_period_seconds
+        self.databricks_retry_limit = databricks_retry_limit
+        self.databricks_retry_delay = databricks_retry_delay
+        self.databricks_retry_args = databricks_retry_args

Review Comment:
   `polling_period_seconds` is accepted and stored on the operator, but it is 
never used by `DatabricksDeleteJobsOperator`. This makes the public API 
misleading; consider removing the parameter (and attribute) or using it if 
there is intended polling behavior.
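If the parameter is dropped, the constructor could be trimmed as sketched below (a hypothetical shape, not the merged code):

```python
from __future__ import annotations


class DeleteJobsOperatorTrimmed:
    """Hypothetical constructor with the unused polling_period_seconds removed."""

    def __init__(
        self,
        *,
        job_id: int | str,
        databricks_conn_id: str = "databricks_default",
        databricks_retry_limit: int = 3,
        databricks_retry_delay: int = 1,
        databricks_retry_args: dict | None = None,
    ) -> None:
        # No polling attribute: the delete call returns immediately, nothing polls.
        self.job_id = job_id
        self.databricks_conn_id = databricks_conn_id
        self.databricks_retry_limit = databricks_retry_limit
        self.databricks_retry_delay = databricks_retry_delay
        self.databricks_retry_args = databricks_retry_args
```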



##########
providers/databricks/src/airflow/providers/databricks/hooks/databricks.py:
##########
@@ -54,6 +54,7 @@
 REPAIR_RUN_ENDPOINT = ("POST", "2.2/jobs/runs/repair")
 OUTPUT_RUNS_JOB_ENDPOINT = ("GET", "2.2/jobs/runs/get-output")
 CANCEL_ALL_RUNS_ENDPOINT = ("POST", "2.2/jobs/runs/cancel-all")
+DELETE_ENDPOINT = ("POST", "2.2/jobs/delete")

Review Comment:
   Endpoint constant name `DELETE_ENDPOINT` is overly generic in this module, 
which already defines other delete-related endpoints like `DELETE_RUN_ENDPOINT` 
and `DELETE_REPO_ENDPOINT`. Renaming this to something specific like 
`DELETE_JOB_ENDPOINT` would better match existing naming patterns and avoid 
confusion.
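The rename is a one-line change; the constant value stays the same, only the name follows the existing `DELETE_RUN_ENDPOINT` / `DELETE_REPO_ENDPOINT` pattern:

```python
# Suggested rename for clarity alongside the module's other delete endpoints:
DELETE_JOB_ENDPOINT = ("POST", "2.2/jobs/delete")

# The hook call site then reads unambiguously, e.g.:
# self._do_api_call(DELETE_JOB_ENDPOINT, {"job_id": job_id})
```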



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
