Re: [heka] State and future of Heka

Rob Miller Fri, 03 Jun 2016 11:03:32 -0700

As I mentioned in the first message, Hindsight carries forward a lot of the 
ideas that made Heka so useful, and pretty much any Lua code that was written 
for Heka will work in Hindsight with little or no change.


Another idea that I might explore is generating a stripped-down subset of Heka 
that only includes inputs, along with a Hindsight input that works with this 
stripped-down subset. It's not ideal in that it would be two processes instead 
of one, but it could provide a bridge so that any input code that hasn't yet 
been ported to Hindsight could still be used to feed into a Hindsight-centered 
pipeline.

-r


On 06/03/2016 04:19 AM, Mac Stork wrote:

Hi all,

Ali, I share your opinion concerning Heka's strengths. I also think that
Heka stands out because of the flexibility of its filters. There are few,
if any, lightweight data collectors/shippers that allow events to be
processed by that many decoders/filters/encoders, with the possibility of
chaining them. The numerous filtering possibilities were what made us use
Heka.

Concerning the alternative to Heka, i.e. Elastic's Beats: there is
obviously a lack of outputs. However, things might take a turn, and you
should look at (and might even participate in) this recent ticket about
having community-maintained outputs:
https://github.com/elastic/beats/pull/1681

Vincent

On 2 June 2016 at 22:22, Ali <[email protected]> wrote:

    Thanks, Rob!

    I have to say, I'm EXTREMELY DISAPPOINTED to hear this.

    I have been away from Heka for a while (working on other projects at
    work) and am now able to refocus on designing our new data
    collection/analysis/reporting system.  Once I read this e-mail, I
    started looking around to see what else was out there and what has
    changed over the last several months.  Elastic's Beats
    <https://www.elastic.co/products/beats> project, particularly
    Filebeat
    <https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html>,
    seemed like a really interesting and welcome development.  However,
    compared to the flexibility of Heka's inputs and outputs, Filebeat
    seems to be badly wanting.

    Suffice it to say, Heka still seems to stand alone in this space.
    Its flexibility is amazing.  (Again, mostly talking about inputs and
    outputs here.)  The closest I can come to it is nxlog
    <http://nxlog-ce.sourceforge.net/about>, and I just really dislike
    that it's not more transparent and open-source.

    Anyway, I understand the rationale behind this decision and am
    hopeful that another org will continue work on this project.  Thanks
    for all of your efforts, Rob et al!

    -Ali

    P.S.  If anyone's interested, here's my situation right now:
    https://www.reddit.com/r/bigdata/comments/4m81vo/which_log_collectors_to_use_for_robust_handling/
    and
    https://discuss.elastic.co/t/how-can-i-get-data-from-filebeat-to-flume/51734


    On Fri, May 6, 2016 at 12:51 PM Rob Miller <[email protected]> wrote:

        Hi everyone,

        I'm loooong overdue in sending out an update about the current state
        of and plans for Heka. Unfortunately, what I have to share here will
        probably be disappointing for many of you, and it might impact
        whether or not you want to continue using it, as all signs point to
        Heka getting less support and fewer updates moving forward.

        The short version is that Heka has some design flaws that make it
        hard to incrementally improve it enough to meet the high-throughput
        and reliability goals that we were hoping to achieve. While it would
        be possible to do a major overhaul of the code to resolve most of
        these issues, I don't have the personal bandwidth to do that work,
        since most of my time is consumed working on Mozilla's immediate
        data processing needs rather than general-purpose tools these days.
        Hindsight (https://github.com/trink/hindsight), built around the
        same Lua sandbox technology as Heka, doesn't have these issues, and
        internally we're using it more and more instead of Heka, so there's
        no organizational imperative for me (or anyone else) to spend the
        time required to overhaul the Go code base.

        Heka is still in use here, though, especially on our edge nodes, so
        it will see a bit more improvement and at least a couple more
        releases. Most notably, it's on my list to switch to using the most
        recent Lua sandbox code, which will move most of the protobuf
        processing to custom C code, and will likely improve performance as
        well as remove a lot of the problematic cgo code, which is what's
        currently keeping us from being able to upgrade to a recent Go
        version.

        Beyond that, however, Heka's future is uncertain. The code that's
        there will still work, of course, but I may not be doing any further
        improvements, and my ability to keep up with support requests and
        PRs, already on the decline, will likely continue to wane.

        So what are the options? If you're using a significant amount of
        Lua-based functionality, you might consider transitioning to
        Hindsight. Any Lua code that works in Heka will work in Hindsight.
        Hindsight is a much leaner and more solid foundation. Hindsight has
        far fewer I/O plugins than Heka, though, so for many it won't be a
        simple transition.

        Also, if there's someone out there (an organization, most likely)
        that has a strong interest in keeping Heka's codebase alive, through
        funding or coding contributions, I'd be happy to support that
        endeavor. Some restrictions apply, however; the work that needs to
        be done to improve Heka's foundation is not beginner level work, and
        my time to help is very limited, so I'm only willing to support
        folks who demonstrate that they are up to the task. Please contact
        me off-list if you or your organization is interested.

        Anyone casually following along can probably stop reading here.
        Those of you interested in the gory details can read on to hear more
        about what the issues are and how they might be resolved.

        First, I'll say that I think there's a lot that Heka got right. The
        basic composition of the pipeline (input -> split -> decode ->
        route -> process -> encode -> output) seems to hit a sweet spot for
        composability and reuse. The Lua sandbox, and especially the use of
        LPEG for text parsing and transformation, has proven to be extremely
        efficient and powerful; it's the most important and valuable part of
        the Heka stack. The routing infrastructure is efficient and solid.
        And, perhaps most importantly, Heka is useful; there are a lot of
        you out there using it to get work done.

        There was one fundamental mistake made, however, which is that we
        shouldn't have used channels. There are many competing opinions
        about Go channels. I'm not going to get into whether or not they're
        *ever* a good idea, but I will say unequivocally that their use as
        the means of pushing messages through the Heka pipeline was a
        mistake, for a number of reasons.

        First, they don't perform well enough. While Heka performs many
        tasks faster than some other popular tools, we've consistently hit a
        throughput ceiling thanks to all of the synchronization that
        channels require. And this ceiling, sadly, is generally lower than
        is acceptable for the amount of data that we at Mozilla want to push
        through a single aggregator system.

        Second, they make it very hard to prevent message loss. If
        unbuffered channels are used everywhere, performance plummets
        unacceptably due to context-switching costs. But using buffered
        channels means that many messages are in flight at a time, most of
        which are sitting in channels waiting to be processed. Keeping track
        of which messages have made it all the way through the pipeline
        requires complicated coordination between chunks of code that are
        conceptually quite far away from each other.

        Third, the buffered channels mean that Heka consumes much more RAM
        than would be otherwise needed, since we have to pre-allocate a pool
        of messages. If the pool size is too small, then Heka becomes
        susceptible to deadlocks, with all of the available packs sitting in
        channel queues, unable to be processed because some plugin is
        blocked waiting for an available pack. But cranking up the pool size
        causes Heka to use more memory, even when it's idle.
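
        The pool-exhaustion deadlock described above can be sketched as a
        small, single-threaded simulation. This is not Heka code: the
        function name, the two-stage layout, and the assumption that the
        filter needs a second pack to emit its own message are all
        hypothetical, chosen only to show how every pack can end up queued
        behind the one plugin that needs a free pack.

```python
import queue


def run_pipeline(pool_size, n_inputs):
    """Simulate one pass of a two-stage pipeline that shares a pack pool.

    Returns "ok" if the filter can make progress, "deadlock" if it cannot.
    Assumes n_inputs >= 1. All names here are illustrative, not Heka's API.
    """
    pool = queue.Queue()  # free, pre-allocated packs
    for i in range(pool_size):
        pool.put(f"pack-{i}")
    channel = queue.Queue(maxsize=pool_size)  # buffered channel: input -> filter

    # The input takes packs until it has sent everything or the pool is empty.
    for _ in range(n_inputs):
        try:
            pack = pool.get_nowait()
        except queue.Empty:
            break  # a real input would block here, waiting for a free pack
        channel.put_nowait(pack)

    # The filter takes one pack off the channel and (by assumption) needs a
    # second pack from the pool to emit its own output message.
    in_pack = channel.get_nowait()
    try:
        out_pack = pool.get_nowait()
    except queue.Empty:
        # Every other pack is sitting in the channel behind this filter,
        # and the filter cannot release its pack until it gets another one.
        return "deadlock"

    # Progress is possible: recycle both packs back into the pool.
    pool.put(out_pack)
    pool.put(in_pack)
    return "ok"
```

        With a pool of 2 packs and 5 pending inputs the simulated filter
        cannot proceed, while a pool of 8 leaves it room to work, at the
        cost of holding 8 packs in memory even when nothing is flowing.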

        Hindsight avoids all of these problems by using disk queues instead
        of RAM buffers between all of the processing stages. It's a bit
        counterintuitive, but at high throughput performance is actually
        better than with RAM buffers, because a) there's no need for
        synchronization locks and b) the data is typically read quickly
        enough after it's written that it stays in the disk cache.

        There's much less chance of message loss, because every plugin is
        holding on to only one message in memory at a time, while using a
        written-to-disk cursor file to track the current position in the
        disk buffer. If the plug is pulled mid-process, some messages that
        were already processed might be processed again, but nothing will be
        lost, and there's no need for complex coordination between different
        stages of the pipeline.
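
        As a rough illustration of the cursor-file idea, here is a minimal
        sketch of a consumer that reads an append-only queue file and
        checkpoints its byte offset after each message. It is not
        Hindsight's implementation: the newline-delimited format, the file
        names, and the crash_after parameter (which simulates the plug being
        pulled) are assumptions made for the example.

```python
import os
import tempfile


def append(queue_path, message):
    """Append one newline-terminated message to the on-disk queue."""
    with open(queue_path, "ab") as f:
        f.write(message.encode("utf-8") + b"\n")


def read_cursor(cursor_path):
    """Return the saved byte offset, or 0 if no cursor has been written yet."""
    try:
        with open(cursor_path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def write_cursor(cursor_path, offset):
    """Persist the offset by writing a temp file and renaming it into place."""
    tmp = cursor_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, cursor_path)


def consume(queue_path, cursor_path, handle, crash_after=None):
    """Process messages from the saved cursor onward, one at a time.

    crash_after=N simulates a crash after the Nth message was handled but
    before its cursor update reached disk.
    """
    handled = 0
    with open(queue_path, "rb") as f:
        f.seek(read_cursor(cursor_path))
        while True:
            line = f.readline()
            if not line:
                return  # reached the end of the queue
            handle(line.rstrip(b"\n").decode("utf-8"))
            handled += 1
            if crash_after is not None and handled == crash_after:
                return  # simulated crash: this message is not checkpointed
            write_cursor(cursor_path, f.tell())


def demo():
    workdir = tempfile.mkdtemp()
    q = os.path.join(workdir, "input.queue")
    c = os.path.join(workdir, "plugin.cursor")
    for m in ("a", "b", "c"):
        append(q, m)
    seen = []
    consume(q, c, seen.append, crash_after=2)  # handles a and b, checkpoints a
    consume(q, c, seen.append)                 # restart: replays b, then c
    return seen
```

        In this sketch the restart reprocesses the one message whose cursor
        update was lost, which matches the "processed again, but nothing
        will be lost" behavior described above. A real queue would also need
        to handle a partially written record at the end of the file, which
        is left out here.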

        Finally, there's no need for a pool of messages. Each plugin is
        holding some small number of packs (possibly as few as one) in its
        own memory space, and those packs never escape that plugin's
        ownership. RAM usage doesn't grow, and pool-exhaustion-related
        deadlocks are a thing of the past.

        For Heka to have a viable future, it would basically need to be
        updated to work almost exactly like Hindsight. First, all of the
        APIs would need to be changed to no longer refer to channels. (The
        fact that we exposed channels to the APIs is another mistake we
        made... it's now generally frowned upon in Go land to expose
        channels as part of your public APIs.) There's already a non-channel
        based API for filters and outputs, but most of the plugins haven't
        yet been updated to use the new API, which would need to happen.

        Then the hard work would start; a major overhaul of Heka's
        internals, to switch from channel-based message passing to
        disk-queue-based message passing. The work that's been done to
        support disk buffering for filters and outputs is useful, but not
        quite enough, because it's not scalable for each plugin to have its
        own queue; the number of open file descriptors would grow very
        quickly. Instead it would need to work like Hindsight, where there's
        one queue that all of the inputs write to, and another that filters
        write to. Each plugin reads through its specified input queue,
        looking for messages that match its message matcher, writing its
        location in the queue back to the shared cursors file.
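
        The shared-queue layout described above can be sketched as follows.
        Again, this is an illustration under stated assumptions, not
        Hindsight's code: the JSON-lines queue format, the cursors.json
        file, the plugin names, and the use of Python callables as stand-ins
        for message matchers are all hypothetical, and the plugins are run
        one after another in a single process rather than concurrently.

```python
import json
import os
import tempfile


def load_cursors(path):
    """Load the shared {plugin_name: byte_offset} map, or {} if absent."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_cursor(path, name, offset):
    """Update only this plugin's entry in the shared cursors file."""
    cursors = load_cursors(path)
    cursors[name] = offset
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(cursors, f)
    os.replace(tmp, path)


def run_plugin(name, matcher, queue_path, cursors_path):
    """Scan the shared queue from this plugin's cursor; return its matches."""
    offset = load_cursors(cursors_path).get(name, 0)
    matched = []
    with open(queue_path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line:
                break
            msg = json.loads(line)
            if matcher(msg):  # stand-in for a message matcher expression
                matched.append(msg)
            offset = f.tell()
    save_cursor(cursors_path, name, offset)
    return matched


def demo():
    workdir = tempfile.mkdtemp()
    q = os.path.join(workdir, "input.queue")
    c = os.path.join(workdir, "cursors.json")
    messages = [
        {"Type": "nginx.access", "Payload": "GET /"},
        {"Type": "syslog", "Payload": "cron started"},
        {"Type": "nginx.access", "Payload": "GET /health"},
    ]
    with open(q, "wb") as f:
        for m in messages:
            f.write(json.dumps(m).encode("utf-8") + b"\n")
    web = run_plugin("web_filter", lambda m: m["Type"] == "nginx.access", q, c)
    sys_msgs = run_plugin("syslog_filter", lambda m: m["Type"] == "syslog", q, c)
    again = run_plugin("web_filter", lambda m: True, q, c)  # nothing new
    return len(web), len(sys_msgs), len(again), load_cursors(c)
```

        Both hypothetical plugins read the same queue file, so the number of
        open queue files stays fixed no matter how many plugins are
        configured; only the per-plugin offsets in the cursors file grow.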

        There would also be some complexity in reconciling Heka's breakdown
        of the input stage into input/splitter/decoder with Hindsight's
        encapsulation of all of these stages into a single sandbox.

        Ultimately I think this would be at least 2-3 months of full-time
        work for me. I'm not the fastest coder around, but I know where the
        bodies are buried, so I'd guess it would take anyone else at least
        as long, possibly longer if they're not already familiar with how
        everything is put together.

        And that's about it. If you've gotten this far, thanks for reading.
        Also, thanks to everyone who's contributed to Heka in any way, be it
        by code, doc fixes, bug reports, or even just appreciation. I'm
        sorry for those of you using it regularly that there's not a more
        stable future.

        Regards,

        -r
        _______________________________________________
        Heka mailing list
        [email protected] <mailto:[email protected]>
        https://mail.mozilla.org/listinfo/heka




