[ 
https://issues.apache.org/jira/browse/NIFI-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18068621#comment-18068621
 ] 

Prabhjyot Singh commented on NIFI-8843:
---------------------------------------

Hi all,

I took a stab at implementing HA support for NiFi Registry and wanted to share 
my attempt here for feedback from the community.

The implementation adds full multi-node HA to NiFi Registry for the first time, 
with two selectable coordination backends (zookeeper and database) and zero 
behavioural change for existing single-node deployments.

What's covered:
- ZooKeeper Curator LeaderSelector (and a DB TTL fallback) for leader election
- Write replication via WriteReplicationFilter — followers return 307 Temporary 
Redirect to the leader (preserving mTLS client identity end-to-end), the leader 
fans out asynchronously to all followers
- Push-based cache coherency via ZK ZNode watchers (or CACHE_VERSION table 
polling in DB mode)
- Durable event delivery via ClusterAwareEventService with at-least-once 
semantics
- Bootstrap DB sync using H2's native SCRIPT/RUNSCRIPT — no external tooling 
needed
- Maintenance mode endpoint and cluster health indicator via Spring Boot 
Actuator
- ZooKeeper TLS support
- Flyway V9 migration adding CACHE_VERSION, CLUSTER_LEADER, and REGISTRY_EVENT 
tables
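To make the write-redirection idea concrete, here is a minimal standalone sketch of the follower-side behaviour. It uses only the plain JDK (no NiFi Registry classes); LEADER_URL, the port, and the path names are illustrative assumptions for this sketch, not the names used in the PR:

```java
// Sketch of a follower redirecting writes to the leader, as described above.
// LEADER_URL is a hypothetical value; in the real design it would come from
// the leader-election backend (ZooKeeper or the CLUSTER_LEADER table).
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.Set;

public class FollowerRedirectSketch {
    static final String LEADER_URL = "https://leader.example:18443"; // assumption
    static final Set<String> WRITE_METHODS = Set.of("POST", "PUT", "PATCH", "DELETE");

    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            if (WRITE_METHODS.contains(exchange.getRequestMethod())) {
                // 307 preserves the method and body, so the client replays the
                // write against the leader itself, presenting its own mTLS
                // identity directly rather than having the follower proxy it.
                exchange.getResponseHeaders().set("Location",
                        LEADER_URL + exchange.getRequestURI());
                exchange.sendResponseHeaders(307, -1); // -1: no response body
            } else {
                // Reads are always served locally from the follower's replica.
                byte[] body = "read served locally".getBytes();
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            }
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

The choice of 307 over 302 is the point: 302 allows clients to downgrade the replayed request to GET, whereas 307 guarantees the original method and body reach the leader unchanged.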

PR with full details, flow diagrams, and design notes: 
https://github.com/apache/nifi/pull/11046

I'm sure there are areas that need improvement — happy to iterate based on any 
feedback. Appreciate the community's time in reviewing this.

Thanks


> Maintenance mode switch via REST API for data backup
> ----------------------------------------------------
>
>                 Key: NIFI-8843
>                 URL: https://issues.apache.org/jira/browse/NIFI-8843
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: NiFi Registry
>            Reporter: Kevin Doran
>            Priority: Minor
>              Labels: HA
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, NiFi Registry does not offer High Availability (HA) out of the 
> box. One has to configure an environment around one or more NiFi Registry 
> instances to achieve the required level of recoverability and availability.
> This is not a requirement in many deployment scenarios, as NiFi Registry is 
> not on the critical path of most system architectures. That is, it is a place to 
> save and retrieve versions of flows and extensions, but if NiFi Registry is 
> temporarily offline, NiFi data flows deployed to NiFi and MiNiFi instances 
> continue to function just fine.
> However, a bigger concern is data availability and backup; that is, the 
> guarantee that data persisted to NiFi Registry is not lost due to an instance 
> failure. Eventually, it will be nice to offer a NiFi Registry HA solution 
> that allows for replicated data or external persistence providers (that 
> themselves can be HA).
> In the meantime, folks are looking for the best way to build their own data 
> backup and recovery solutions for NiFi Registry. A lot of possible solutions 
> and recommendations for backup and recovery or [cold-slave 
> failover|http://www.sonatype.org/nexus/2015/07/10/high-availability-ha-and-continuous-integration-ci-with-nexus-oss/]
> require copying the data in the NiFi Registry's home directory on host storage 
> to another location, where it could be used to create another NiFi Registry 
> with the same data on demand, e.g., in a cloud migration or disaster recovery 
> scenario.
> If the NiFi Registry service is running when this copy operation is 
> performed, one risks copying partially-written data/records/files that could 
> be corrupted when later loaded/read from disk. One solution for this today is 
> to stop the NiFi Registry, but this leaves it unavailable for users and 
> scripts, which is not ideal. For example, continuous deployment scripts for 
> NiFi data flows that read flows from NiFi registry would not be able to 
> access a required service.
> In the long term, it would be nice to offer a proper HA NiFi Registry solution 
> out of the box. However, in the short term, to avoid having to shut down 
> NiFi Registry to initiate a backup, it would be nice for 
> admins to be able to put a NiFi Registry instance into "read only maintenance 
> mode", during which the contents of the NiFi Registry home directory could be 
> more safely copied to a backup location or cold spare. (I say "more safely" 
> because some files in the home directory, such as the default location for 
> logs, would continue to be written to, but the most important files, such as 
> those used by the file-based database and persistence providers, would 
> stabilize after existing write operations are flushed to disk.)
> Implementation thoughts:
>  - endpoints for turning maintenance mode on/off would fit in nicely as 
> custom endpoints under Actuator (NIFIREG-134), and therefore could be 
> access-controlled by Actuator authorization rules
>  - when maintenance mode is enabled, a custom Spring filter could intercept 
> any requests that modify persisted state (eg, by resource path and HTTP 
> method pattern matching) and return a "503 Service Unavailable" status code 
> indicating that the resource is temporarily unavailable. A Spring filter 
> checking HTTP methods against resources is an approach already used to 
> authorize access to certain resources, so there might be an opportunity for 
> code-reuse there (the maintenance mode filter would need to be dynamically, 
> programmatically enabled/disabled, and instead of returning a 403, we would 
> return a 503)
>  - when maintenance mode is enabled, the /actuator/health endpoint could also 
> indicate this, giving clients a way to check if a server is in maintenance 
> mode or not.
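The filter behaviour described in the implementation thoughts above can be sketched with the plain JDK as follows. The /actuator/maintenance toggle path is an assumption for illustration only; a real endpoint would be a custom Actuator endpoint inheriting Actuator's access control, and the method/path matching would live in a Spring filter rather than a raw HTTP server:

```java
// Sketch of "read-only maintenance mode": mutating requests get 503 while
// the flag is set; reads continue to work. The toggle path and port are
// hypothetical, chosen only to make this sketch self-contained.
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

public class MaintenanceModeSketch {
    static final Set<String> WRITE_METHODS = Set.of("POST", "PUT", "PATCH", "DELETE");
    public static final AtomicBoolean MAINTENANCE = new AtomicBoolean(false);

    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // Hypothetical admin toggle; POST flips the flag, GET reports it,
        // which is also how /actuator/health could surface the mode.
        server.createContext("/actuator/maintenance", exchange -> {
            if ("POST".equals(exchange.getRequestMethod())) {
                MAINTENANCE.set(!MAINTENANCE.get());
            }
            respond(exchange, 200, "maintenance=" + MAINTENANCE.get());
        });
        server.createContext("/", exchange -> {
            if (MAINTENANCE.get() && WRITE_METHODS.contains(exchange.getRequestMethod())) {
                // Reject mutations while the home directory is being copied,
                // so persisted files can stabilize on disk.
                respond(exchange, 503, "read-only maintenance mode");
            } else {
                respond(exchange, 200, "ok");
            }
        });
        server.start();
        return server;
    }

    static void respond(HttpExchange ex, int status, String body) throws IOException {
        byte[] b = body.getBytes();
        ex.sendResponseHeaders(status, b.length);
        try (OutputStream os = ex.getResponseBody()) { os.write(b); }
        ex.close();
    }
}
```

As the issue notes, this mirrors the existing method-against-resource authorization filter, except the decision is a runtime flag and the rejection is 503 (temporary) rather than 403 (forbidden).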



--
This message was sent by Atlassian Jira
(v8.20.10#820010)