> *Automated Cluster Load Balancing*

I put your excellent response up as NIFI-337.
> * Multiple Masters *
> that single master is solely for command/control of changes to the
> dataflow configuration and is a very lightweight process that does
> nothing more than that.  If the master dies then all nodes continue
> to do what they were doing and even site-to-site continues to
> distribute data.  It just does so without updates on current loading
> across the cluster.

Yes, but imagine a NiFi installation (perhaps a hosted service built on
top of it) where DataFlow Managers expect the command-and-control
aspect of the system to be as robust and available as flow processing
itself. If one or more standby masters are waiting in the wings to take
over for a failed active master, then automated and unattended failover
becomes possible, which should narrow the interval during which
administrative changes can fail.

On Sun, Feb 8, 2015 at 2:24 PM, Joe Witt <[email protected]> wrote:
> Andrew
>
> *Automated Cluster Load Balancing*
>
> The processors themselves are available and ready to run on all nodes
> at all times.  It's really just a question of whether they have data
> to run on.  We have always taken the view that 'if you want scalable
> dataflow' use scalable interfaces.  And I think that is the way to go
> in every case you can pull it off.  That generally meant one should
> use data sources which offer queueing semantics, where multiple
> independent nodes can pull from the queue with 'at-least-once'
> guarantees.  In addition, each node has back pressure, so if it falls
> behind it slows its rate of pickup, which means other nodes in the
> cluster can pick up the slack.  This has worked extremely well.
>
> That said, I recognize that it simply isn't always possible to use
> scalable interfaces, and given enough non-scalable data sources the
> cluster could become out of balance.  So this certainly seems like a
> good [valuable, fun, non-trivial] problem to tackle.
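The queue-pickup pattern with back pressure that Joe describes can be
sketched roughly as below. This is a toy simulation, not NiFi code:
`BACKLOG_LIMIT`, `Node`, and the rest are invented names. The idea is
just that a node whose unprocessed backlog hits a threshold stops
pulling, so a faster peer naturally drains the shared queue.

```python
import queue

# Hypothetical sketch: independent nodes pull from a shared queue with
# at-least-once pickup; a node whose backlog reaches a threshold skips
# its turn, letting peers pick up the slack.

BACKLOG_LIMIT = 10  # back-pressure threshold (assumed value)

class Node:
    def __init__(self, name, source):
        self.name = name
        self.source = source   # shared queue the whole cluster pulls from
        self.backlog = []      # items pulled but not yet processed

    def pull(self):
        """Pull one item unless back pressure is engaged."""
        if len(self.backlog) >= BACKLOG_LIMIT:
            return False       # back pressure: skip this pickup round
        try:
            self.backlog.append(self.source.get_nowait())
            return True
        except queue.Empty:
            return False

source = queue.Queue()
for i in range(30):
    source.put(i)

fast = Node("fast", source)    # processes its backlog immediately
slow = Node("slow", source)    # never drains its backlog

while True:
    got_fast = fast.pull()
    got_slow = slow.pull()
    fast.backlog.clear()       # "processing": the fast node keeps up
    if not (got_fast or got_slow):
        break

# The slow node stalls at the back-pressure limit while the fast node
# drains the rest of the queue.
print(len(slow.backlog))       # 10
print(source.empty())          # True
```

No coordinator is involved: the rebalancing falls out of each node's
local back-pressure check, which is what makes the pattern attractive
when the data source itself offers queueing semantics.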
> If we allow connections between processors to be auto-balanced then
> it will make for a pretty smooth experience, as users won't really
> have to think too much about it.
>
> So the key will be how to devise and implement an algorithm or
> approach to spreading that load intelligently, so data doesn't just
> bounce back and forth.  If anyone knows of good papers, similar
> systems, or approaches they can describe for how to think through
> this, that would be great.  Things we'll have to think about here
> that come to mind:
>
> - When to start spreading the load (at what factor should we start
>   spreading work across the cluster)
>
> - Whether it should auto-spread by default and the user can tell it
>   not to in certain cases, or whether it should not spread by default
>   and the user can activate it
>
> - What the criteria are by which we should let a user control how
>   data is partitioned (some key, round robin, etc.), and how to
>   rebalance/reassign partitions if a node dies or comes online
>
> There are 'counter cases' too that we must keep in mind, such as
> aggregation or bin packing grouped by some key.  In those cases all
> data would need to be merged together at some point, and thus all
> data needs to be accessible at some point.  Whether that means we
> direct all data to a single node or whether we enable cross-cluster
> data addressing is also a topic there.
>
> * Multiple Masters *
>
> So it is true that we have a single-master model today.  But that
> single master is solely for command/control of changes to the
> dataflow configuration and is a very lightweight process that does
> nothing more than that.  If the master dies then all nodes continue
> to do what they were doing, and even site-to-site continues to
> distribute data.  It just does so without updates on current loading
> across the cluster.  Once the master is brought back online, the
> real-time command and control functions return.
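For the key-based partitioning and rebalance question in the bullets
above (partition by some key; reassign partitions when a node dies or
joins), one classic approach is a consistent-hash ring, where losing a
node moves only the keys that node owned. The sketch below is a toy
illustration under that assumption; it is not how NiFi partitions data,
and all names in it are made up.

```python
import hashlib
from bisect import bisect

class Ring:
    """Toy consistent-hash ring (illustrative only): each node claims
    many points on a hash circle, and a key belongs to the first node
    point at or after the key's own hash."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes       # points per node; more = smoother spread
        self.ring = []             # sorted list of (hash, node)
        for n in nodes:
            self.add(n)

    def _hash(self, s):
        return int(hashlib.sha256(s.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def remove(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def owner(self, key):
        hashes = [h for h, _ in self.ring]   # O(n) lookup; fine for a toy
        i = bisect(hashes, self._hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
keys = [f"flowfile-{i}" for i in range(100)]
before = {k: ring.owner(k) for k in keys}

ring.remove("node-b")                        # a node dies
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that moved was owned by the dead node; nothing else shuffled.
print(all(before[k] == "node-b" for k in moved))   # True
```

The appeal for the "data shouldn't bounce back and forth" concern is
that ownership is a pure function of the key and the current node set,
so a rebalance touches only the partitions of the node that changed.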
> Building support for a back-up master to offer HA of even the
> command/control side would probably also be a considerable effort.
> On this one I'd be curious to hear of cases where it was critical to
> make this part HA.
>
> Excited to be talking about this level of technical detail.
>
> Thanks
> Joe

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back.
- Piet Hein (via Tom White)
