Andrew

*Automated Cluster Load Balancing*

The processors themselves are available and ready to run on all nodes
at all times.  It's really just a question of whether they have data
to run on.  We have always taken the view that if you want scalable
dataflow, use scalable interfaces.  And I think that is the way to go
in every case where you can pull it off.  That has generally meant
using data sources that offer queueing semantics, where multiple
independent nodes can pull from the queue with 'at-least-once'
guarantees.  In addition, each node applies back pressure: if it falls
behind, it slows its rate of pickup, which means other nodes in the
cluster can pick up the slack.  This has worked extremely well.
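
To make the pull model concrete, here is a rough sketch in Java.  The
QueueClient interface and everything named here is hypothetical, just
standing in for any source with lease/ack semantics:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    // Hypothetical client for a shared queue: many independent nodes
    // pull from the same queue, and a message is only removed once the
    // puller acks it.
    interface QueueClient {
        List<Message> pull(int maxBatch);  // lease up to maxBatch messages
        void ack(Message m);               // confirm successful processing
    }

    record Message(String key, byte[] body) {}

    class PullingNode {
        private final QueueClient queue;
        private final Deque<Message> local = new ArrayDeque<>();
        private final int backlogLimit;    // back-pressure threshold

        PullingNode(QueueClient queue, int backlogLimit) {
            this.queue = queue;
            this.backlogLimit = backlogLimit;
        }

        void pullBatch() {
            // Back pressure: a node that has fallen behind asks for
            // less (or nothing), so other nodes pick up the slack.
            int headroom = backlogLimit - local.size();
            if (headroom > 0) local.addAll(queue.pull(headroom));
        }

        void work() {
            Message m = local.poll();
            if (m == null) return;
            process(m);
            queue.ack(m);  // ack only after processing => at-least-once
        }

        private void process(Message m) { /* run the processor against m */ }
    }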

That said, I recognize that it simply isn't always possible to use
scalable interfaces, and given enough non-scalable data sources the
cluster could fall out of balance.  So this certainly seems like a
good [valuable, fun, non-trivial] problem to tackle.  If we allow
connections between processors to be auto-balanced, it will make for a
pretty smooth experience, as users won't really have to think much
about it.

So the key will be to devise and implement an algorithm or approach
that spreads the load intelligently, so data doesn't just bounce back
and forth.  If anyone knows of good papers, similar systems, or
approaches they can describe for how to think through this, that would
be great.  Things we'll have to think about here that come to mind:

- When to start spreading the load (at what load factor should we
start spreading work across the cluster)

- Whether it should auto-spread by default, with the user able to opt
out in certain cases, or not spread by default, with the user able to
activate it

- What the criteria are by which we should let a user control how data
is partitioned (some key, round robin, etc.), and how to
rebalance/re-assign partitions if a node dies or comes on-line (a
sketch of one option follows this list)
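
For the partitioning and re-assignment piece, one option is a
consistent-hash ring.  This is purely an illustration of the idea, not
a proposal for the exact implementation, and the class and names are
made up:

    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import java.util.TreeMap;
    import java.util.zip.CRC32;

    // Made-up sketch of a consistent-hash ring for partition assignment.
    class HashRing {
        private final TreeMap<Long, String> ring = new TreeMap<>();
        private static final int VNODES = 64;  // virtual nodes smooth the spread

        void addNode(String nodeId) {
            for (int i = 0; i < VNODES; i++)
                ring.put(hash(nodeId + "#" + i), nodeId);
        }

        void removeNode(String nodeId) {
            ring.values().removeIf(nodeId::equals);  // drop this node's arcs
        }

        // Key-based partitioning: a key maps to the first node clockwise
        // on the ring, so the same key always lands on the same node
        // until membership changes.
        String nodeFor(String key) {
            Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
            return (e != null ? e : ring.firstEntry()).getValue();
        }

        private static long hash(String s) {
            CRC32 crc = new CRC32();
            crc.update(s.getBytes(StandardCharsets.UTF_8));
            return crc.getValue();
        }
    }

The appealing property is that when a node dies or comes on-line, only
the keys in that node's arcs get re-assigned; everything else stays
where it was, which helps avoid data bouncing back and forth.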

There are 'counter cases' too that we must keep in mind, such as
aggregation or bin packing grouped by some key.  In those cases all
the data for a given key needs to be merged together, and thus needs
to be accessible in one place, at some point.  Whether that means we
direct all such data to a single node or we enable cross-cluster data
addressing is also a topic there.
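
If we went the direct-to-a-single-node route, the merge step itself
stays simple, because key-based partitioning guarantees one node sees
every record for a given key.  A toy sketch (hypothetical class; a
real version would need to persist state and cope with repartitioning):

    import java.util.HashMap;
    import java.util.Map;

    // Toy per-key merge: safe to run on a single node only because the
    // partitioner routes every record for a key to that node.
    class KeyedAggregator {
        private final Map<String, Long> totals = new HashMap<>();

        void accept(String key, long value) {
            totals.merge(key, value, Long::sum);  // merge as records arrive
        }

        Map<String, Long> snapshot() {
            return Map.copyOf(totals);  // current merged view
        }
    }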

*Multiple Masters*

So it is true that we have a single-master model today.  But that
single master exists solely for command/control of changes to the
dataflow configuration and is a very lightweight process that does
nothing more than that.  If the master dies, all nodes continue to do
what they were doing, and even site-to-site continues to distribute
data.  It just does so without updates on current loading across the
cluster.  Once the master is brought back on-line, the real-time
command and control functions return.  Building support for a back-up
master to offer HA of even the command/control side would probably
also be a considerable effort.  On this one I'd be curious to hear of
cases where it was critical to make this part HA.

Excited to be talking about this level of technical detail.

Thanks
Joe
