Yeah, state recovery is a bit more difficult with Heron's architecture. In
Storm, the task IDs are not just values used for routing they actually
equate to a task instance within the executor. An executor which currently
processes the keys 4-8 actually contains 5 task instances of the same
component. So for each task, they just save its state attached to the
single task ID and reassemble executors with the new task instances.
We don't want or have to do that with Heron instances but we would need to
have some way to have a state change tied to the task (or routing key if we
go to the key range idea). For something like a word count you might save
counts using a nested map like: { routing key : {word : count }}. The
routing key could be included in the Tuple instance. However, whether this
pattern would work for more generic state cases I don't know?
Tom Cooper
W: www.tomcooper.org.uk | Twitter: @tomncooper
<https://twitter.com/tomncooper>
On Fri, 4 May 2018 at 15:54, Neng Lu <[email protected]> wrote:
> +1 for this idea. As long as the predefined key space is large enough, it
> should work for most of the cases.
>
> Based on my experience with topologies, I never saw one component has more
> than 1000 instances in a topology.
>
> For recovering states from an update, there will be some problems though.
> Since the states stored in heron are strongly connected with each instance,
> we either need to have
> some resolver does the state repartitioning or stores states with the key
> instead of with each instance.
>
>
>
> On Fri, May 4, 2018 at 3:01 PM, Karthik Ramasamy <[email protected]>
> wrote:
>
> > Thanks for sharing. I like the Storm approach
> >
> > - keeps the implementation simpler
> > - state is deterministic across restarts
> > - makes it easy to reason and debug
> >
> > The hard limit is not a problem at all since most of the topologies will
> > be never that big.
> > If you can handle Twitter topologies cleanly, it is more that sufficient
> I
> > believe.
> >
> > cheers
> > /karthik
> >
> > > On May 4, 2018, at 2:31 PM, Thomas Cooper <[email protected]>
> > wrote:
> > >
> > > Hi all,
> > >
> > > A while ago I emailed about the issue of how fields (key) grouped
> routing
> > > in Heron was not consistent across an update and how this makes
> > preserving
> > > state across an update very difficult and also makes it
> > > difficult/impossible to analyse or predict tuple flows through a
> > > current/proposed topology physical plan.
> > >
> > > I suggested adopting Storms approach of pre-defining a routing key
> > > space for each component (eg 0-999), so that instead of an instance
> > having
> > > a single task id that gets reset at every update (eg 10) it has a range
> > of
> > > id's (eg 10-16) that changes depending on the parallelism of the
> > component.
> > > This has the advantage that a key will always hash to the same task ID
> > for
> > > the lifetime of the topology. Meaning recovering state for an instance
> > > after a crash or update is just a case of pulling the state linked to
> the
> > > keys in its task ID range.
> > >
> > > I know the above proposal has issues, not least of all placing a hard
> > upper
> > > limit on the scale out of a component, and that some alternative ideas
> > are
> > > being floated for solving the stateful update issue. However, I just
> > wanted
> > > to throw some more weight behind the Storm approach. There was a recent
> > > paper about high-performance network load balancing
> > > <https://blog.acolyer.org/2018/05/03/stateless-
> > datacenter-load-balancing-with-beamer/>that
> > > describes an approach using a fixed key space similar to Storm's (see
> the
> > > section called Stable Hashing - they assign a range 100x the expected
> > > connection pool size - which we could do with heron to prevent ever
> > hitting
> > > the upper scaling limit). Also, this new load balancer, Beamer, claims
> to
> > > be twice as fast as Google's Maglev
> > > <https://blog.acolyer.org/2016/03/21/maglev-a-fast-and-
> > reliable-software-network-load-balancer/>
> > > which again uses a pre-defined keyspace and ID ranges to create look-up
> > > tables deterministically.
> > >
> > > I know a load balancer is a different beast to a stream grouping but
> > there
> > > are some interesting ideas in those papers (The links point to summary
> > blog
> > > posts so you don't have to read the whole paper).
> > >
> > > Anyway, I just thought I would those papers out there and see what
> people
> > > think.
> > >
> > > Tom Cooper
> > > W: www.tomcooper.org.uk | Twitter: @tomncooper
> > > <https://twitter.com/tomncooper>
> >
> >
>