This is an automated email from the ASF dual-hosted git repository.

jscheffl pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new 02107a49b0b Enhance Edge3 Provider docs (#49859)
02107a49b0b is described below

commit 02107a49b0b08494aa1d3c646a9225d3a319b2cd
Author: Jens Scheffler <[email protected]>
AuthorDate: Thu May 8 14:50:47 2025 +0200

    Enhance Edge3 Provider docs (#49859)
    
    * Enhance Edge3 Provider docs
    
    * Complete Edge docs revision
    
    * Adjust and extend documentation after PR 49915
    
    * Review comments
    
    * Extend docs after PR 50278
---
 .pre-commit-config.yaml                            |   4 +-
 providers/edge3/docs/architecture.rst              | 189 ++++++++++++++++
 providers/edge3/docs/deployment.rst                | 173 ++++++++++++++
 providers/edge3/docs/edge_executor.rst             | 252 ++-------------------
 .../edge3/docs/img/distributed_architecture.svg    |   4 +
 providers/edge3/docs/img/edge_package.svg          |   4 +
 providers/edge3/docs/index.rst                     |  13 +-
 providers/edge3/docs/install_on_windows.rst        |  24 +-
 providers/edge3/docs/ui_plugin.rst                 |  66 ++++++
 providers/edge3/docs/why_edge.rst                  |  53 +++++
 providers/edge3/provider.yaml                      |  14 +-
 .../airflow/providers/edge3/get_provider_info.py   |   2 +-
 12 files changed, 540 insertions(+), 258 deletions(-)

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 8e5c8cfdab0..3f3b10f1dcb 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -314,7 +314,7 @@ repos:
         exclude: 
material-icons\.css$|^images/.*$|^RELEASE_NOTES\.txt$|^.*package-lock\.json$|^.*/kinglear\.txt$|^.*pnpm-lock\.yaml$|.*/dist/.*
         args:
           - --ignore-words=docs/spelling_wordlist.txt
-          - 
--skip=providers/.*/src/airflow/providers/*/*.rst,providers/*/docs/changelog.rst,docs/*/commits.rst,providers/*/docs/commits.rst,providers/*/*/docs/commits.rst,docs/apache-airflow/tutorial/pipeline_example.csv,*.min.js,*.lock,INTHEWILD.md
+          - 
--skip=providers/.*/src/airflow/providers/*/*.rst,providers/*/docs/changelog.rst,docs/*/commits.rst,providers/*/docs/commits.rst,providers/*/*/docs/commits.rst,docs/apache-airflow/tutorial/pipeline_example.csv,*.min.js,*.lock,INTHEWILD.md,*.svg
           - --exclude-file=.codespellignorelines
   - repo: https://github.com/woodruffw/zizmor-pre-commit
     rev: v1.5.2
@@ -648,7 +648,7 @@ repos:
           ^.*commits\.(rst|txt)$|
           ^.*RELEASE_NOTES\.rst$|
           ^contributing-docs/03_contributors_quick_start\.rst$|
-          ^.*\.(png|gif|jp[e]?g|tgz|lock)$|
+          ^.*\.(png|gif|jp[e]?g|svg|tgz|lock)$|
           git|
           ^airflow-core/newsfragments/43349\.significant\.rst$|
           ^airflow-core/newsfragments/41368\.significant\.rst$|
diff --git a/providers/edge3/docs/architecture.rst 
b/providers/edge3/docs/architecture.rst
new file mode 100644
index 00000000000..6eb12742469
--- /dev/null
+++ b/providers/edge3/docs/architecture.rst
@@ -0,0 +1,189 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Edge Provider Architecture
+==========================
+
+Airflow consists of several components which are connected as in the
following diagram. The Edge Worker, which is
+deployed outside of the central Airflow cluster, is connected via HTTP(s) to
the API server of the Airflow cluster:
+
+.. graphviz::
+
+    digraph A{
+        rankdir="TB"
+        node[shape="rectangle", style="rounded"]
+
+
+        subgraph cluster {
+            label="Cluster";
+            {rank = same; dag; database}
+            {rank = same; workers; scheduler; api}
+
+            workers[label="(Central) Workers"]
+            scheduler[label="Scheduler"]
+            api[label="API server"]
+            database[label="Database"]
+            dag[label="DAG files"]
+
+            api->workers
+            api->database
+
+            workers->dag
+            workers->database
+
+            scheduler->database
+        }
+
+        subgraph edge_worker_subgraph {
+            label="Edge site";
+            {rank = same; edge_worker; edge_dag}
+            edge_worker[label="Edge Worker"]
+            edge_dag[label="DAG files (Remote copy)"]
+
+            edge_worker->edge_dag
+        }
+
+        edge_worker->api[label="HTTP(s)"]
+    }
+
+* **Workers** - Execute the assigned tasks - most standard setups have local or
centralized workers, e.g. via Celery
+* **Edge Workers** - Special workers which pull tasks via HTTP(s), a feature
provided by this provider package
+* **Scheduler** - Responsible for adding the necessary tasks to the queue. The 
EdgeExecutor is running as a module inside the scheduler.
+* **API server** - The HTTP REST API server provides access to DAG/task status
information. The required endpoints are
+  provided by the Edge provider plugin. The Edge Worker uses this API to pull
tasks and send back the results.
+* **Database** - Contains information about the status of tasks, Dags, 
Variables, connections, etc.
+
+In detail, the parts of the Edge provider are deployed as follows:
+
+.. image:: img/edge_package.svg
+   :alt: Overview and communication of Edge Provider modules
+
+* **EdgeExecutor** - The EdgeExecutor is running inside the core Airflow 
scheduler. It is responsible for
+  scheduling tasks and sending them to the Edge job queue in the database. The 
EdgeExecutor is a subclass of the
+  ``airflow.executors.base_executor.BaseExecutor`` class. To activate the 
EdgeExecutor, you need to set the
+  ``executor`` configuration option in the ``airflow.cfg`` file to
+  ``airflow.providers.edge3.executors.EdgeExecutor``. For more details see
:doc:`edge_executor`. Note that
+  multiple executors can also be used in parallel with the EdgeExecutor.
+* **API server** - The API server provides REST endpoints to the web UI and
serves static files. The
+  Edge provider adds a plugin that provides an additional REST API for the
Edge Worker as well as UI elements to
+  manage workers (currently Airflow 2.10 only).
+  The API server is responsible for handling requests from the Edge Worker and
sending back the results. To
+  activate the API server, you need to set the ``api_enabled`` configuration
option in the ``edge`` section of the
+  ``airflow.cfg`` file to ``True``. The API endpoints for Edge are not started
by default.
+  For more details see :doc:`ui_plugin`.
+* **Database** - The Airflow meta database is used to store the status of 
tasks, Dags, Variables, connections
+  etc. The Edge provider uses the database to store the status of the Edge 
Worker instances and the tasks that
+  are assigned to it. The database is also used to store the results of the 
tasks that are executed by the
+  Edge Worker. Setup of needed tables and migration is done automatically when 
the provider package is deployed.
+* **Edge Worker** - The Edge Worker is a lightweight process that runs on the
edge device. It is responsible for
+  pulling tasks from the API server and executing them. It is a standalone
process that can be
+  deployed on any machine that has access to the API server and is designed
to be easy to
+  deploy. The Edge Worker is implemented as a command line tool that can be
started with the ``airflow edge worker``
+  command. For more details see :doc:`deployment`.
+
+Edge Worker State Model
+-----------------------
+
+Each Edge Worker is tracked by the API server so that it is known which
worker is currently active. This supports
+monitoring and administration, since distributed monitoring and tracking
would otherwise be hard to
+achieve. It also allows central management for administrative maintenance.
+
+Workers send regular heartbeats to the API server to indicate that they are 
still alive. The heartbeats are used to
+determine the state of the worker.
+
+The following states are used to track the worker:
+
+.. graphviz::
+
+   digraph edge_worker_state {
+      node [shape=circle];
+
+      STARTING[label="starting"];
+      IDLE[label="idle"];
+      RUNNING[label="running"];
+      TERMINATING[label="terminating"];
+      OFFLINE[label="offline"];
+      UNKNOWN[label="unknown"];
+      MAINTENANCE_REQUEST[label="maintenance request"];
+      MAINTENANCE_PENDING[label="maintenance pending"];
+      MAINTENANCE_MODE[label="maintenance mode"];
+      MAINTENANCE_EXIT[label="maintenance exit"];
+      OFFLINE_MAINTENANCE[label="offline maintenance"];
+
+      STARTING->IDLE[label="initialization"];
+      IDLE->RUNNING[label="new task"];
+      RUNNING->IDLE[label="all tasks completed"];
+      IDLE->MAINTENANCE_REQUEST[label="triggered by admin"];
+      RUNNING->MAINTENANCE_REQUEST[label="triggered by admin"];
+      MAINTENANCE_REQUEST->MAINTENANCE_PENDING[label="if running tasks > 0"];
+      MAINTENANCE_REQUEST->MAINTENANCE_MODE[label="if running tasks = 0"];
+      MAINTENANCE_PENDING->MAINTENANCE_MODE[label="running tasks = 0"];
+      MAINTENANCE_PENDING->MAINTENANCE_EXIT[label="triggered by admin"];
+      MAINTENANCE_MODE->MAINTENANCE_EXIT[label="triggered by admin"];
+      MAINTENANCE_EXIT->RUNNING[label="if running tasks > 0"];
+      MAINTENANCE_EXIT->IDLE[label="if running tasks = 0"];
+      IDLE->OFFLINE[label="on clean shutdown"];
+      RUNNING->TERMINATING[label="on clean shutdown if running tasks > 0"];
+      TERMINATING->OFFLINE[label="on clean shutdown if running tasks = 0"];
+   }
+
+See also 
https://github.com/apache/airflow/blob/main/providers/edge3/src/airflow/providers/edge3/models/edge_worker.py#L45
+for documentation of all states of the Edge Worker.
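
The transitions in the diagram above can be sketched as a plain transition
table. This is an illustrative model only, not the provider's actual
implementation; the state and event names simply mirror the diagram:

```python
# Illustrative sketch of the Edge Worker state model shown above.
# This is NOT the provider's implementation; it only mirrors the diagram.
TRANSITIONS = {
    ("starting", "initialization"): "idle",
    ("idle", "new task"): "running",
    ("running", "all tasks completed"): "idle",
    ("idle", "maintenance request"): "maintenance request",
    ("running", "maintenance request"): "maintenance request",
    ("maintenance request", "running tasks > 0"): "maintenance pending",
    ("maintenance request", "running tasks = 0"): "maintenance mode",
    ("maintenance pending", "running tasks = 0"): "maintenance mode",
    ("maintenance pending", "maintenance exit"): "maintenance exit",
    ("maintenance mode", "maintenance exit"): "maintenance exit",
    ("maintenance exit", "running tasks > 0"): "running",
    ("maintenance exit", "running tasks = 0"): "idle",
    ("idle", "clean shutdown"): "offline",
    ("running", "clean shutdown"): "terminating",
    ("terminating", "running tasks = 0"): "offline",
}


def next_state(state: str, event: str) -> str:
    """Return the follow-up state for a given state/event pair."""
    return TRANSITIONS[(state, event)]
```

For example, a worker in ``running`` that receives a maintenance request first
moves to ``maintenance request`` and, while tasks are still active, on to
``maintenance pending``.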
+
+Feature Backlog Edge Provider
+-----------------------------
+
+The current version of the EdgeExecutor is released with known limitations. It 
will mature over time.
+
+The following features are known missing and will be implemented in increments:
+
+- API token per worker: Today there is a global API token available only
+- Edge Worker Plugin
+
+  - Make plugin working on Airflow 3.0, depending on AIP-68
+  - Overview about queues / jobs per queue
+  - Allow starting Edge Worker REST API separate to api-server
+  - Add some hints how to setup an additional worker
+
+- Edge Worker CLI
+
+  - Use WebSockets instead of HTTP calls for communication
+  - Send logs also to TaskFileHandler if external logging services are used
+  - Integration into telemetry to send metrics from remote site
+  - Publish system metrics with heartbeats (CPU, Disk space, RAM, Load)
+  - Be more liberal e.g. on the patch version. Currently an exact version
match is required
+    (in the current state, if versions do not match, the worker will
gracefully shut
+    down when jobs are completed and no new jobs will be started)
+
+- Tests
+
+  - System tests in GitHub that test the deployment of the worker with a Dag
execution
+  - Test/Support on Windows for Edge Worker
+
+- Scaling test - Check and define boundaries of workers/jobs. Today it is
known to
+  scale into a range of 50 workers. This is not a hard limit but just
reported experience.
+- Load tests - impact of scaled execution and code optimization
+- Incremental logs during task execution can be served w/o shared log disk on 
api-server
+- Reduce dependencies during execution: Today the worker depends on the
airflow core with a lot
+  of transitive dependencies. The target is to reduce the dependencies to a
minimum, e.g. the Task SDK
+  and providers only.
+
+- Documentation
+
+  - Provide scripts and guides to install edge components as service (systemd)
+  - Extend Helm-Chart for needed support
+  - Provide an example docker compose for worker setup
diff --git a/providers/edge3/docs/deployment.rst 
b/providers/edge3/docs/deployment.rst
new file mode 100644
index 00000000000..f608cb7f101
--- /dev/null
+++ b/providers/edge3/docs/deployment.rst
@@ -0,0 +1,173 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Edge Worker Deployment
+======================
+
+Edge Workers can be deployed outside of the central Airflow infrastructure.
They
+are connected to the Airflow API server via HTTP(s). The Edge Worker is a
+lightweight component that can be deployed on any machine with outbound
+HTTP(s) access to the Airflow API server, and it is designed to be easy to
+deploy. It allows you to run Airflow tasks on machines
+that are not part of your main data center, e.g. edge servers. It also allows
+deploying only a reduced set of dependencies on the edge worker.
+
+Here are a few imperative requirements for your workers:
+
+- ``airflow`` needs to be installed, and the Airflow CLI needs to be in the 
path. This includes
+  the Task SDK as well as the edge3 provider package.
+- Airflow configuration settings should be homogeneous across the cluster and 
on the edge site
+- Operators that are executed on the Edge Worker need to have their
dependencies
+  met in that context. Please take a look at the respective provider package
+  documentation
+- The worker needs to have access to the ``DAGS_FOLDER``, and you need to
+  synchronize the filesystems by your own means. A common setup would be to
+  store your ``DAGS_FOLDER`` in a Git repository and sync it across machines 
using
+  Chef, Puppet, Ansible, or whatever you use to configure machines in your
+  environment. If all your boxes have a common mount point, having your
+  pipeline files shared there should work as well
+
+
+The minimum Airflow configuration settings to make the Edge Worker run are:
+
+- Section ``[core]``
+
+  - ``executor``: Executor must be set or added to be 
``airflow.providers.edge3.executors.EdgeExecutor``
+  - ``internal_api_secret_key``: An encryption key must be set on the
api-server and the Edge Worker component as
+    a shared secret to authenticate traffic. It should be a random string like
the fernet key
+    (but preferably not the same).
+
+- Section ``[edge]``
+
+  - ``api_enabled``: Must be set to true. It is disabled intentionally so as
not to expose
+    the API endpoint by default. This is the endpoint the worker connects to.
+    In a future release a dedicated API server can be started.
+  - ``api_url``: Must be set to the URL under which the API endpoint is
reachable from the
+    worker. Typically this looks like
``https://your-hostname-and-port/edge_worker/v1/rpcapi``.
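
The settings above can be combined into a minimal ``airflow.cfg`` sketch.
Treat this as an illustrative fragment under the assumptions of this page, not
a complete configuration; the hostname and secret are placeholders to adapt to
your deployment:

```ini
[core]
# EdgeExecutor can also be listed alongside other executors.
executor = airflow.providers.edge3.executors.EdgeExecutor
# Shared secret between the api-server and the Edge Worker (random string).
internal_api_secret_key = <your-random-secret>

[edge]
# Disabled by default; must be enabled so workers can connect.
api_enabled = True
# URL of the Edge API endpoint as reachable from the worker.
api_url = https://your-hostname-and-port/edge_worker/v1/rpcapi
```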
+
+To kick off a worker, you need to set up Airflow and start the worker
+subcommand
+
+.. code-block:: bash
+
+    airflow edge worker
+
+Your worker should start picking up tasks as soon as they get fired in
+its direction. To stop a worker running on a machine you can use:
+
+.. code-block:: bash
+
+    airflow edge stop
+
+It will try to stop the worker gracefully by sending a ``SIGINT`` signal to
the main
+process and waiting until all running tasks are completed. In a console you
can also use
+``Ctrl-C`` to stop the worker.
+
+If you want to monitor the remote activity and workers, use the UI plugin
which
+is included in the provider package: install it on the webserver and use the
+"Admin" - "Edge Worker Hosts" and "Edge Worker Jobs" pages.
+(Note: The plugin is not ported to the Airflow 3.0 web UI at time of writing)
+
+If you want to check the status of the worker via the CLI you can use the
command
+
+.. code-block:: bash
+
+    airflow edge status
+
+Some caveats:
+
+- Tasks can consume resources. Make sure your worker has enough resources to 
run ``worker_concurrency`` tasks
+- Make sure that the ``pool_slots`` of a task matches the
``worker_concurrency`` of the worker.
+  See also :ref:`edge_executor:concurrency_slots`.
+- Queue names are limited to 256 characters
+
+See :doc:`apache-airflow:administration-and-deployment/modules_management` for 
details on how Python and Airflow manage modules.
+
+.. _deployment:maintenance:
+
+Worker Maintenance Mode
+-----------------------
+
+Sometimes infrastructure needs to be maintained. The Edge Worker provides a
+maintenance mode to:
+
+- Stop accepting new tasks
+- Drain all ongoing work gracefully
+
+Also please note that if the worker detects that the Airflow or Edge provider
package version
+differs from the one running on the API server, it will stop accepting new
tasks and shut down gracefully.
+This is to prevent running tasks with different versions of the code.
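
As a rough illustration of that guard (this is not the provider's actual
code, and the backlog notes the exact-match rule may be relaxed in future
releases), the behavior amounts to a simple comparison of the package
versions:

```python
# Illustrative sketch only -- not the edge3 provider's implementation.
# Today the worker requires an exact version match with the API server;
# on mismatch it drains running tasks and shuts down gracefully.
def should_drain_and_shutdown(server_version: str, worker_version: str) -> bool:
    """Return True if the worker should stop accepting tasks and exit."""
    return server_version != worker_version
```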
+
+Worker status can be checked via the web UI in the "Admin" - "Edge Worker 
Hosts" page.
+
+.. image:: img/worker_hosts.png
+
+.. note::
+
+    As of time of writing the web UI to see edge jobs and manage workers is 
not ported to Airflow 3.0.
+    Until this is available you can use the CLI commands as described in 
:ref:`deployment:maintenance-mgmt-cli`.
+
+
+Worker maintenance can also be triggered via the CLI command on the machine 
that runs the worker.
+
+.. code-block:: bash
+
+    airflow edge maintenance --comments "Some comments for the maintenance" on
+
+This will stop the local worker instance from accepting new tasks and will 
complete running tasks.
+If you add the command argument ``--wait`` the CLI will wait until all
+running tasks are completed before returning.
+
+If you want to know the status of your local worker while waiting on
maintenance you can
+use the command
+
+.. code-block:: bash
+
+    airflow edge status
+
+This will show the status of the local worker instance as JSON and the tasks 
running on it.
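
Since the status is printed as JSON, it can be inspected with standard
tooling. The payload and field names below are purely hypothetical, to
illustrate the approach; check the actual output of ``airflow edge status``
for the real schema:

```python
import json

# Hypothetical example payload -- the real field names may differ.
raw = '{"state": "maintenance pending", "jobs_active": 2}'
status = json.loads(raw)

# A worker has fully drained once it is in maintenance mode with no
# active jobs remaining.
drained = status["state"] == "maintenance mode" and status["jobs_active"] == 0
print(drained)
```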
+
+The status and maintenance comments will also be shown in the web UI
+in the "Admin" - "Edge Worker Hosts" page.
+
+.. image:: img/worker_maintenance.png
+
+The local worker instance can be started to fetch new tasks via the command
+
+.. code-block:: bash
+
+    airflow edge maintenance off
+
+This will bring the worker back into operation and it will start accepting
tasks again.
+
+.. _deployment:maintenance-mgmt-cli:
+
+Worker Maintenance Management CLI
+---------------------------------
+
+Besides the CLI command to trigger maintenance on the local worker instance,
there are additional commands to
+manage the maintenance of all workers in the cluster. These commands can be
used to trigger maintenance
+on any worker or to check the status of all workers in the cluster.
+
+This set of commands needs database access and can only be called on the
central Airflow
+instance. The commands are:
+
+- ``airflow edge list-workers``: List all workers in the cluster
+- ``airflow edge remote-edge-worker-request-maintenance``: Request a remote 
edge worker to enter maintenance mode
+- ``airflow edge remote-edge-worker-update-maintenance-comment``: Updates the 
maintenance comment for a remote edge worker
+- ``airflow edge remote-edge-worker-exit-maintenance``: Request a remote edge 
worker to exit maintenance mode
+- ``airflow edge shutdown-remote-edge-worker``: Shuts down a remote edge 
worker gracefully
+- ``airflow edge remove-remote-edge-worker``: Remove a worker instance from 
the cluster
diff --git a/providers/edge3/docs/edge_executor.rst 
b/providers/edge3/docs/edge_executor.rst
index 5b1df582ab0..e3af6aae107 100644
--- a/providers/edge3/docs/edge_executor.rst
+++ b/providers/edge3/docs/edge_executor.rst
@@ -20,159 +20,12 @@ Edge Executor
 
 ``EdgeExecutor`` is an option if you want to distribute tasks to workers 
distributed in different locations.
 You can use it also in parallel with other executors if needed. Change your 
``airflow.cfg`` to point
-the executor parameter to ``EdgeExecutor`` and provide the related settings.
+the executor parameter to ``EdgeExecutor`` and provide the related settings. 
The ``EdgeExecutor`` is the component
+to schedule tasks to the edge workers. The edge workers need to be set up
separately as described in :doc:`deployment`.
 
 The configuration parameters of the Edge Executor can be found in the Edge 
provider's :doc:`configurations-ref`.
 
-Here are a few imperative requirements for your workers:
-
-- ``airflow`` needs to be installed, and the Airflow CLI needs to be in the 
path
-- Airflow configuration settings should be homogeneous across the cluster and 
on the edge site
-- Operators that are executed on the Edge Worker need to have their 
dependencies
-  met in that context. Please take a look to the respective provider package
-  documentations
-- The worker needs to have access to its ``DAGS_FOLDER``, and you need to
-  synchronize the filesystems by your own means. A common setup would be to
-  store your ``DAGS_FOLDER`` in a Git repository and sync it across machines 
using
-  Chef, Puppet, Ansible, or whatever you use to configure machines in your
-  environment. If all your boxes have a common mount point, having your
-  pipelines files shared there should work as well
-
-
-Minimum configuration for the Edge Worker to make it running is:
-
-- Section ``[core]``
-
-  - ``executor``: Executor must be set or added to be 
``airflow.providers.edge3.executors.EdgeExecutor``
-  - ``internal_api_secret_key``: An encryption key must be set on webserver 
and Edge Worker component as
-    shared secret to authenticate traffic. It should be a random string like 
the fernet key
-    (but preferably not the same).
-
-- Section ``[edge]``
-
-  - ``api_enabled``: Must be set to true. It is disabled intentionally to not 
expose
-    the endpoint by default. This is the endpoint the worker connects to.
-    In a future release a dedicated API server can be started.
-  - ``api_url``: Must be set to the URL which exposes the web endpoint
-
-To kick off a worker, you need to setup Airflow and kick off the worker
-subcommand
-
-.. code-block:: bash
-
-    airflow edge worker
-
-Your worker should start picking up tasks as soon as they get fired in
-its direction. To stop a worker running on a machine you can use:
-
-.. code-block:: bash
-
-    airflow edge stop
-
-It will try to stop the worker gracefully by sending ``SIGINT`` signal to main
-process as and wait until all running tasks are completed.
-
-If you want to monitor the remote activity and worker, use the UI plugin which
-is included in the provider package and install it on the webserver and use the
-"Admin" - "Edge Worker Hosts" and "Edge Worker Jobs" pages.
-(Note: The plugin is not ported to Airflow 3.0 web UI at time of writing)
-
-If you want to check status of the worker via CLI you can use the command
-
-.. code-block:: bash
-
-    airflow edge status
-
-Some caveats:
-
-- Tasks can consume resources. Make sure your worker has enough resources to 
run ``worker_concurrency`` tasks
-- Make sure that the ``pool_slots`` of a Tasks matches with the 
``worker_concurrency`` of the worker
-- Queue names are limited to 256 characters
-
-See :doc:`apache-airflow:administration-and-deployment/modules_management` for 
details on how Python and Airflow manage modules.
-
-Current Limitations Edge Executor
----------------------------------
-
-If you plan to use the Edge Executor / Worker in the current stage you need to 
ensure you test properly
-before use. The following features have been initially tested and are working:
-
-- Some core operators
-
-  - ``BashOperator``
-  - ``PythonOperator``
-  - ``@task`` decorator
-  - ``@task.branch`` decorator
-  - ``@task.virtualenv`` decorator
-  - ``@task.bash`` decorator
-  - Dynamic Mapped Tasks
-  - XCom read/write
-  - Variable and Connection access
-  - Setup and Teardown tasks
-
-- Some known limitations
-
-  - Tasks that require DB access will fail - no DB connection from remote site 
is possible
-    (which is the default in Airflow 3.0)
-  - This also means that some direct Airflow API via Python is not possible 
(e.g. airflow.models.*)
-  - Log upload will only work if you use a single web server instance or they 
need to share one log file volume.
-    Logs are uploaded in chunks and are transferred via API. If you use 
multiple webservers w/o a shared log volume
-    the logs will be scattered across the webserver instances.
-  - Performance: No extensive performance assessment and scaling tests have 
been made. The edge executor package is
-    optimized for stability. This will be incrementally improved in future 
releases. Setups have reported stable
-    operation with ~50 workers until now. Note that executed tasks require 
more webserver API capacity.
-
-
-Architecture
-------------
-
-.. graphviz::
-
-    digraph A{
-        rankdir="TB"
-        node[shape="rectangle", style="rounded"]
-
-
-        subgraph cluster {
-            label="Cluster";
-            {rank = same; dag; database}
-            {rank = same; workers; scheduler; web}
-
-            workers[label="(Central) Workers"]
-            scheduler[label="Scheduler"]
-            web[label="Web server"]
-            database[label="Database"]
-            dag[label="DAG files"]
-
-            web->workers
-            web->database
-
-            workers->dag
-            workers->database
-
-            scheduler->dag
-            scheduler->database
-        }
-
-        subgraph edge_worker_subgraph {
-            label="Edge site";
-            edge_worker[label="Edge Worker"]
-            edge_dag[label="DAG files (Remote)"]
-
-            edge_worker->edge_dag
-        }
-
-        edge_worker->web[label="HTTP(s)"]
-    }
-
-Airflow consist of several components:
-
-* **Workers** - Execute the assigned tasks - most standard setup has local or 
centralized workers, e.g. via Celery
-* **Edge Workers** - Special workers which pull tasks via HTTP as provided as 
feature via this provider package
-* **Scheduler** - Responsible for adding the necessary tasks to the queue
-* **Web server** - HTTP Server provides access to DAG/task status information
-* **Database** - Contains information about the status of tasks, DAGs, 
Variables, connections, etc.
-
+To understand the setup of the Edge Executor, please also take a look at
:doc:`architecture`.
 
 .. _edge_executor:queue:
 
@@ -197,6 +50,8 @@ could take thousands of tasks without a problem), or from an 
environment
 perspective (you want a worker running from a specific location where required
 infrastructure is available).
 
+.. _edge_executor:concurrency_slots:
+
 Concurrency slot handling
 -------------------------
 
@@ -239,93 +94,14 @@ Here is an example setting pool_slots for a task:
 
         task_with_template()
 
-Worker maintenance
-------------------
-
-Sometimes infrastructure needs to be maintained. The Edge Worker provides a
-maintenance mode to
-- Stop accepting new tasks
-- Drain all ongoing work gracefully
-
-Worker status can be checked via the web UI in the "Admin" - "Edge Worker 
Hosts" page.
-
-.. image:: img/worker_hosts.png
-
-.. note::
-
-    As of time of writing the web UI to see edge jobs and manage workers is 
not ported to Airflow 3.0
-
-
-Worker maintenance can also be triggered via the CLI command
-
-.. code-block:: bash
-
-    airflow edge maintenance --comments "Some comments for the maintenance" on
-
-This will stop the worker from accepting new tasks and will complete running 
tasks.
-If you add the command argument ``--wait`` the CLI will wait until all
-running tasks are completed before return.
-
-If you want to know the status of a worker while waiting on maintenance you can
-use the command
-
-.. code-block:: bash
-
-    airflow edge status
-
-This will show the status of the worker as JSON and the tasks running on it.
-
-The status and maintenance comments will also be shown in the web UI
-in the "Admin" - "Edge Worker Hosts" page.
-
-.. image:: img/worker_maintenance.png
-
-The worker can be started to fetch new tasks via the command
-
-.. code-block:: bash
-
-    airflow edge maintenance off
-
-This will start the worker again and it will start accepting tasks again.
-
-
-Feature Backlog of MVP to Release Readiness
--------------------------------------------
-
-The current version of the EdgeExecutor is a MVP (Minimum Viable Product). It 
will mature over time.
-
-The following features are known missing and will be implemented in increments:
-
-- API token per worker: Today there is a global API token available only
-- Edge Worker Plugin
-
-  - Overview about queues / jobs per queue
-  - Allow starting Edge Worker REST API separate to webserver
-  - Add some hints how to setup an additional worker
-
-- Edge Worker CLI
-
-  - Use WebSockets instead of HTTP calls for communication
-  - Send logs also to TaskFileHandler if external logging services are used
-  - Integration into telemetry to send metrics from remote site
-  - Publish system metrics with heartbeats (CPU, Disk space, RAM, Load)
-  - Be more liberal e.g. on patch version. Currently requires exact version 
match
-    (In current state if versions do not match, the worker will gracefully shut
-    down when jobs are completed, no new jobs will be started)
-
-- Tests
-
-  - Integration tests in Github
-  - Test/Support on Windows for Edge Worker
-
-- Scaling test - Check and define boundaries of workers/jobs. Today it is 
known to
-  scale into a range of 50 workers. This is not a hard limit but just an 
experience reported.
-- Load tests - impact of scaled execution and code optimization
-- Incremental logs during task execution can be served w/o shared log disk on 
webserver
+Current Limitations Edge Executor
+---------------------------------
 
-- Documentation
+- Some known limitations
 
-  - Describe more details on deployment options and tuning
-  - Provide scripts and guides to install edge components as service (systemd)
-  - Extend Helm-Chart for needed support
-  - Provide an example docker compose for worker setup
+  - Log upload will only work if you use a single web server instance, or
multiple instances need to share one log file volume.
+    Logs are uploaded in chunks and are transferred via API. If you use
multiple webservers without a shared log volume,
+    the logs will be scattered across the webserver instances.
+  - Performance: No extensive performance assessment and scaling tests have 
been made. The edge executor package is
+    optimized for stability. This will be incrementally improved in future 
releases. Setups have reported stable
+    operation with ~50 workers until now. Note that executed tasks require 
more webserver API capacity.
diff --git a/providers/edge3/docs/img/distributed_architecture.svg b/providers/edge3/docs/img/distributed_architecture.svg
new file mode 100644
index 00000000000..1bbb56c6fce
--- /dev/null
+++ b/providers/edge3/docs/img/distributed_architecture.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than draw.io -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg" style="background-color: rgb(255, 255, 255);" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="1313px" height="650px" viewBox="-0.5 -0.5 1313 650" content="&lt;mxfile host=&quot;cwiki.apache.org&quot; modified=&quot;2025-04-27T20:09:54.050Z&quot; agent=&quot;Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:136.0) Gecko/20100101 Firefox/136.0&quot; etag=&quot;FpFJgG-N56l8xpwPFAtq&quot; version=&quot;24.4.0&quot; type=&quot;atlas&quot; scale [...]
diff --git a/providers/edge3/docs/img/edge_package.svg b/providers/edge3/docs/img/edge_package.svg
new file mode 100644
index 00000000000..24408170c9c
--- /dev/null
+++ b/providers/edge3/docs/img/edge_package.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than draw.io -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg" style="background-color: rgb(255, 255, 255);" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="1091px" height="561px" viewBox="-0.5 -0.5 1091 561" content="&lt;mxfile host=&quot;cwiki.apache.org&quot; modified=&quot;2025-04-27T20:17:45.388Z&quot; agent=&quot;Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:136.0) Gecko/20100101 Firefox/136.0&quot; etag=&quot;gPkWIMjSOMdvw7VuzY7t&quot; version=&quot;24.4.0&quot; type=&quot;atlas&quot; scale [...]
diff --git a/providers/edge3/docs/index.rst b/providers/edge3/docs/index.rst
index 93904372e55..35328e9c98d 100644
--- a/providers/edge3/docs/index.rst
+++ b/providers/edge3/docs/index.rst
@@ -28,7 +28,11 @@
     Home <self>
     Changelog <changelog>
     Security <security>
-    Installation on Windows <install_on_windows>
+    Why use Edge <why_edge>
+    Architecture <architecture>
+    Edge Worker Deployment <deployment>
+    Edge UI Plugin <ui_plugin>
+    Worker on Windows <install_on_windows>
 
 
 .. toctree::
@@ -47,6 +51,13 @@
     Configuration <configurations-ref>
     CLI <cli-ref>
     Python API <_api/airflow/providers/edge3/index>
+
+.. toctree::
+    :hidden:
+    :maxdepth: 1
+    :caption: Resources
+
+    Example DAGs <https://github.com/apache/airflow/tree/providers-edge3/|version|/providers/edge3/src/airflow/providers/edge3/example_dags>
     PyPI Repository <https://pypi.org/project/apache-airflow-providers-edge3/>
     Installing from sources <installing-providers-from-sources>
 
diff --git a/providers/edge3/docs/install_on_windows.rst b/providers/edge3/docs/install_on_windows.rst
index 04841c5486b..5ac0cf61784 100644
--- a/providers/edge3/docs/install_on_windows.rst
+++ b/providers/edge3/docs/install_on_windows.rst
@@ -26,31 +26,25 @@ Install Edge Worker on Windows
     due to Python OS restrictions and is currently of Proof-of-Concept quality.
 
 
-The setup was tested on Windows 10 with Python 3.12.8, 64-bit.
+The setup was tested on Windows 10 with Python 3.12.8, 64-bit. The backend used for testing was Airflow 2.10.5.
 To set up an instance of Edge Worker on Windows, you need to follow the steps below:
 
 1. Install Python 3.9 or higher.
-2. Create an empty folder as base to start with. In our example it is ``C:\\Airflow``.
-3. Start Shell/Command Line in ``C:\\Airflow`` and create a new virtual environment via: ``python -m venv venv``
-4. Activate the virtual environment via: ``venv\\Scripts\\activate.bat``
+2. Create an empty folder as base to start with. In our example it is ``C:\Airflow``.
+3. Start Shell/Command Line in ``C:\Airflow`` and create a new virtual environment via: ``python -m venv venv``
+4. Activate the virtual environment via: ``venv\Scripts\activate.bat``
 5. Install the Edge provider using the Airflow constraints matching your Airflow version via
    ``pip install apache-airflow-providers-edge3 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.10.5/constraints-3.12.txt``.
-   (or alternative build and copy the wheel of the edge provider to the folder ``C:\\Airflow``.
-   This document used ``apache_airflow_providers_edge-0.9.7rc0-py3-none-any.whl``, install the wheel file with the
-   Airflow constraints matching your Airflow and Python version:
-   ``pip install apache_airflow_providers_edge-0.9.7rc0-py3-none-any.whl apache-airflow==2.10.5 virtualenv --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.10.5/constraints-3.12.txt``)
-6. Create a new folder ``dags`` in ``C:\\Airflow`` and copy the relevant DAG files in it.
-   (At least the DAG files which should be executed on the edge alongside the dependencies. For testing purposes
-   the DAGs from the ``apache-airflow`` repository can be used located in
-   <https://github.com/apache/airflow/tree/main/providers/edge3/src/airflow/providers/edge3/example_dags>.)
+6. Create a new folder ``dags`` in ``C:\Airflow`` and copy the relevant DAG files into it.
+   (At least the DAG files which should be executed on the edge, alongside their dependencies.)
 7. Collect the needed parameters from your running Airflow backend, at least the following:
 
  - ``edge`` / ``api_url``: The HTTP(s) endpoint the Edge Worker connects to
-  - ``core`` / ``internal_api_secret_key``: The shared secret key between the webserver and the Edge Worker
+  - ``core`` / ``internal_api_secret_key``: The shared secret key between the api-server and the Edge Worker
  - Any proxy details if applicable for your environment.
 
 8. Create a worker start script to prevent repeated typing. Create a new file ``start_worker.bat`` in
-   ``C:\\Airflow`` with the following content - replace with your settings:
+   ``C:\Airflow`` with the following content - replace the placeholders with your settings:
 
 .. code-block:: bat
 
@@ -59,7 +53,7 @@ To setup a instance of Edge Worker on Windows, you need to follow the steps belo
 
     set AIRFLOW__LOGGING__BASE_LOG_FOLDER=edge_logs
     set AIRFLOW__EDGE__API_URL=https://your-hostname-and-port/edge_worker/v1/rpcapi
     set AIRFLOW__CORE__EXECUTOR=airflow.providers.edge3.executors.edge_executor.EdgeExecutor
-    set AIRFLOW__CORE__INTERNAL_API_SECRET_KEY=<steal this from your deployment...>
+    set AIRFLOW__CORE__INTERNAL_API_SECRET_KEY=<use the value configured centrally in the api-server...>
     set AIRFLOW__CORE__LOAD_EXAMPLES=False
     set AIRFLOW_ENABLE_AIP_44=true
     @REM Add if needed: set http_proxy=http://my-company-proxy.com:3128
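Before launching the worker, a quick sanity check that the variables the start script is expected to set are actually present can save debugging time. This snippet is purely illustrative and not part of the provider:

```python
import os

# Variables the start script above is expected to set (illustrative check;
# adjust the list to your own deployment).
REQUIRED = [
    "AIRFLOW__EDGE__API_URL",
    "AIRFLOW__CORE__EXECUTOR",
    "AIRFLOW__CORE__INTERNAL_API_SECRET_KEY",
]

missing = [name for name in REQUIRED if not os.environ.get(name)]
print("missing:", missing)  # an empty list means the environment looks complete
```

Run it in the same shell session after executing ``start_worker.bat`` so the ``set`` statements are visible to the Python process.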
diff --git a/providers/edge3/docs/ui_plugin.rst b/providers/edge3/docs/ui_plugin.rst
new file mode 100644
index 00000000000..78a3ee8e3a3
--- /dev/null
+++ b/providers/edge3/docs/ui_plugin.rst
@@ -0,0 +1,66 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Edge UI Plugin and REST API
+===========================
+
+The Edge provider uses a plugin to
+
+- Extend the REST API endpoints for connecting workers to the Airflow cluster
+- Provide a web UI for managing the workers and monitoring their status and tasks
+  (Note: the UI is currently only available in Airflow 2.10; the implementation for
+  Airflow 3.0 depends on completion of AIP-68)
+
+REST API endpoints
+------------------
+
+The Edge provider adds the following REST API endpoints to the Airflow API:
+
+- ``/api/v1/edge/health``: Check that the API endpoint is deployed and active
+- ``/api/v1/edge/jobs``: Endpoints to fetch jobs for workers and report state
+- ``/api/v1/edge/logs``: Endpoint to push log chunks from workers to the Airflow cluster
+- ``/api/v1/edge/workers``: Endpoints to register and manage workers and report heartbeats
+
+To see the full documentation of the API endpoints, open the Airflow web UI and navigate to
+the sub-path ``/edge_worker/v1/docs`` (Airflow 3.0) or ``/edge_worker/v1/ui`` (Airflow 2.10).
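As an illustration of the endpoint layout above, a small helper can derive the health endpoint URL from a deployment base URL. The hostname is a placeholder, and the helper itself is an assumption for illustration, not provider code:

```python
from urllib.parse import urljoin


def health_url(base: str) -> str:
    """Build the Edge health endpoint URL from the deployment base URL."""
    # Normalize the base so joining works with or without a trailing slash.
    return urljoin(base.rstrip("/") + "/", "api/v1/edge/health")


# Placeholder host; against a live deployment this URL can be probed with any
# HTTP client (e.g. curl). Depending on configuration, requests to the other
# endpoints may require authentication.
print(health_url("https://your-airflow-host"))
```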
+
+Web UI Plugin (Airflow 2.10 only)
+---------------------------------
+
+.. note::
+
+    As of the time of writing, the web UI to see edge jobs and manage workers has not been ported to Airflow 3.0.
+    Until it is available you can use the CLI commands as described in :ref:`deployment:maintenance-mgmt-cli`.
+
+The Edge provider adds a web UI plugin to the Airflow web UI. The plugin
+lets you inspect the job queue and the status of Edge Workers.
+
+Pending and processed tasks can be checked on the "Admin" - "Edge Worker Jobs" page.
+
+Worker status can be checked via the web UI on the "Admin" - "Edge Worker Hosts" page.
+
+.. image:: img/worker_hosts.png
+
+Via the UI you can also set the status of the worker to "Maintenance" or "Active".
+
+The status and maintenance comments will also be shown in the web UI
+in the "Admin" - "Edge Worker Hosts" page.
+
+.. image:: img/worker_maintenance.png
+
+Note that maintenance mode can also be adjusted via CLI.
+See :ref:`deployment:maintenance` for more details.
diff --git a/providers/edge3/docs/why_edge.rst b/providers/edge3/docs/why_edge.rst
new file mode 100644
index 00000000000..c9b67e7c36c
--- /dev/null
+++ b/providers/edge3/docs/why_edge.rst
@@ -0,0 +1,53 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Why use the Edge Worker?
+========================
+
+Apache Airflow implements a distributed execution architecture. The Airflow scheduler
+is responsible for scheduling tasks and sending them to the workers, and the workers are
+responsible for executing the tasks. The Airflow scheduler and workers are typically
+deployed in the same data center.
+
+The most popular execution options for distributed setups are based on the CeleryExecutor or
+KubernetesExecutor. The CeleryExecutor is a distributed task queue that allows you to run
+tasks in parallel across multiple workers. These workers are connected via a task queue,
+typically using Redis or RabbitMQ.
+The KubernetesExecutor is a cloud-native execution option that allows you to run tasks in
+Kubernetes Pods. The KubernetesExecutor is a great option for organizations that are already
+using Kubernetes for their infrastructure. It allows you to take advantage of the scalability
+and flexibility of Kubernetes to run your Airflow tasks. However, it requires a Kubernetes
+cluster and a Kubernetes service account with the necessary permissions to create and manage
+Pods.
+
+The Edge Worker is an execution option that allows you to run Airflow tasks on edge devices.
+The Edge Worker is designed to be lightweight and easy to deploy. It allows you to run Airflow
+tasks on machines that are not part of your main data center, e.g. edge servers. This is
+especially useful when deployments need to cross multiple data centers or security perimeters
+like firewalls. For Celery, for example, a stable TCP connection is required between the task
+queue (e.g. Redis) and the workers, which can be hard to operate on wide-area networks.
+To run Kubernetes Pods the scheduler needs access to API endpoints of the Kubernetes cluster,
+which is not always possible in edge deployments. Alternatively it is sometimes possible to
+execute work on the edge devices via the SSHOperator, but this also requires a direct and stable
+TCP connection to the edge devices.
+
+The goal of the Edge Worker is a lean setup that allows task execution on edge devices
+with only (outbound) HTTPS access. Edge Workers are able to connect, pull and execute tasks
+with a simple deployment.
+
+.. image:: img/distributed_architecture.svg
+   :alt: Distributed architecture
diff --git a/providers/edge3/provider.yaml b/providers/edge3/provider.yaml
index c75a9fa076d..867346ebda5 100644
--- a/providers/edge3/provider.yaml
+++ b/providers/edge3/provider.yaml
@@ -18,7 +18,19 @@
 package-name: apache-airflow-providers-edge3
 name: Edge Executor
 description: |
-  Handle edge workers on remote sites via HTTP(s) connection and orchestrates work over distributed sites
+  Handle edge workers on remote sites via HTTP(s) connection and orchestrate work over distributed sites.
+
+  When tasks need to be executed on remote sites where the connection needs to pass through
+  firewalls or other network restrictions, the Edge Worker can be deployed. The Edge Worker
+  is a lightweight process with reduced dependencies. The worker only needs to be able to
+  communicate with the central Airflow site via HTTPS.
+
+  In the central Airflow site the EdgeExecutor is used to orchestrate the work. The EdgeExecutor
+  is a custom executor which schedules tasks on the edge workers. The EdgeExecutor can co-exist
+  with other executors (for example CeleryExecutor or KubernetesExecutor) in the same Airflow site.
+
+  Additional REST API endpoints, served by the API server, are provided to distribute tasks and
+  manage the edge workers.
 
 state: ready
 source-date-epoch: 1741121867
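The co-existence with other executors mentioned in the description can be sketched as a hybrid-executor configuration. This assumes Airflow's multi-executor support (Airflow 2.10+); the exact layout of your ``airflow.cfg`` may differ, and the executor module path is taken from the Windows guide above:

```ini
[core]
# Comma-separated list: the first executor is the default, additional ones
# (here the EdgeExecutor) can be selected per task or queue.
executor = CeleryExecutor,airflow.providers.edge3.executors.edge_executor.EdgeExecutor
```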
diff --git a/providers/edge3/src/airflow/providers/edge3/get_provider_info.py b/providers/edge3/src/airflow/providers/edge3/get_provider_info.py
index af5c8f9696e..63479e7e482 100644
--- a/providers/edge3/src/airflow/providers/edge3/get_provider_info.py
+++ b/providers/edge3/src/airflow/providers/edge3/get_provider_info.py
@@ -25,7 +25,7 @@ def get_provider_info():
     return {
         "package-name": "apache-airflow-providers-edge3",
         "name": "Edge Executor",
-        "description": "Handle edge workers on remote sites via HTTP(s) connection and orchestrates work over distributed sites\n",
+        "description": "Handle edge workers on remote sites via HTTP(s) connection and orchestrate work over distributed sites.\n\nWhen tasks need to be executed on remote sites where the connection needs to pass through\nfirewalls or other network restrictions, the Edge Worker can be deployed. The Edge Worker\nis a lightweight process with reduced dependencies. The worker only needs to be able to\ncommunicate with the central Airflow site via HTTPS.\n\nIn the central Airflow site the Ed [...]
         "plugins": [
             {
                 "name": "edge_executor",

