Re: Bringing e-divisive into Otava repo, take 2

Alexander Sorokoumov Tue, 23 Sep 2025 21:38:49 -0700

Thanks for kick-starting this thread, Henrik!

I do agree that reimplementing missing parts within the project gives us a
clear way to upgrade supported Python versions and overall evolve the
project without complex cross-project dependencies.


Best,
Alex

On Tue, Sep 23, 2025 at 5:58 PM Austin Bennett <[email protected]> wrote:

> I follow the reasoning, and it seems sound. The external dependency
> limits our options, and if a donation is unlikely, writing the remaining
> components ourselves seems reasonable.
>
> Maybe related: ultimately, I am looking forward to Otava running on a
> current version of Python, that is, one that is still supported and
> receiving updates.
>
> On Mon, Sep 22, 2025 at 11:37 AM Henrik Ingo <[email protected]> wrote:
>
> > Ok so let me restart this discussion...
> >
> > After a successful first release as an ASF incubating project, we started
> > discussing what to do with the main dependency, the
> > signal_processing_algorithms repo. Driving motivation here is that it is
> > rather central to what Otava is doing, and long term it will be better
> > for development if we can easily make changes in both halves.
> >
> > The guidance from our mentors was that the repo is too large to just
> > copy-paste into Otava (2600 lines). For such additions, ASF usually
> > prefers to receive a copyright transfer/donation in writing from the
> > original author / copyright holder.
> >
> > So the guidance was that someone from the Otava IPMC (me) should contact
> > MongoDB to find out whether they would be open to such a transfer. For
> > context, we were already in contact with MongoDB when we drafted the
> > project proposal a year ago. While they were mostly enthusiastic, in
> > hindsight it seems formally joining Apache Otava (incubating) wasn't
> > enough of a priority for it to actually happen. So chances are the same
> > dialogue would play out again: general excitement, but a high risk that
> > the legal department has other priorities, and in the end we would just
> > have wasted time on talking instead of programming.
> >
> > So, I looked deeper into what we have in front of us, and have discussed
> > this off-list with Alex.
> >
> > While the whole signal_processing_algorithms codebase is indeed over 2k
> > lines, most of that is code we don't use in Otava, or don't need. Also,
> > the number is inflated, because the repo contains multiple different
> > implementations, all doing exactly the same thing. In particular:
> >
> > * Piotr already replaced the significance test (which is roughly the
> > latter half of what e-divisive does) with a Student's t-test.
> > * Also, for the main part of the algorithm, Piotr introduced the windowing
> > approach, which is novel and not in the MongoDB code.
> >
> > While Piotr's implementation kind of wraps around the original MongoDB
> > e-divisive implementation, it could have been done more elegantly and
> > efficiently by modifying the e-divisive code directly. So that's where
> > we get into the discussion of why we don't just do that.
> >
> >
> > * Finally, in Otava we have my optimization from last year, the
> > incremental e-divisive implementation, which is also novel; the MongoDB
> > code has nothing like it. However, it still uses the very core part of
> > the e-divisive algorithm, so from a "code coverage" perspective, it does
> > not further reduce the number of lines that we depend on in the signal
> > processing repo.
> >
> > When all the above is accounted for, there are about 100 lines of code
> > that execute the very heart of e-divisive: pairwise comparison of the
> > data points in a time series. This could be rewritten by someone just by
> > implementing, line by line, the math from the Matteson & James (2013)
> > paper (formulas 5 and 6). Given that Piotr's and my work also greatly
> > reduces the amount of computation needed, for a first version we don't
> > need to implement this in C, nor use fancy numpy functions; it could just
> > be the double for loop that you get when implementing the
> > \sum ... \sum (xi-xj)^2 from the paper.
> >
> >
> > There aren't a lot of drawbacks with this idea. Realistically, we drop
> > support for the --orig-edivisive mode, as that by definition depends on
> > the original signal_processing code.
> >
> >
> >
> > Let me know what you think
> > henrik
> >
> >
> >
> >
> >
> > --
> > *nyrkio.com <http://nyrkio.com/>* ~ *git blame for performance*
> >
> > Henrik Ingo, CEO
> > [email protected]
> > +358 40 569 7354
> > LinkedIn: www.linkedin.com/in/heingo
> > Twitter: twitter.com/h_ingo
> >
>
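
For readers of the archive, the "double for loop" Henrik describes could
look roughly like the sketch below. It is not code from Otava or from
signal_processing_algorithms. It assumes that formulas 5 and 6 refer to the
empirical divergence between two segments and its scaled version, and it
takes the exponent as a parameter alpha (the email writes 2; the default of
1.0 here is an assumption). The names q_hat and best_split are hypothetical.

```python
def q_hat(series, tau, alpha=1.0):
    """Scaled divergence between series[:tau] and series[tau:].

    Assumed reading of formulas 5 and 6: twice the mean between-segment
    distance, minus the mean within-segment distance of each side, scaled
    by n*m/(n+m).
    """
    left, right = series[:tau], series[tau:]
    n, m = len(left), len(right)
    if n < 2 or m < 2:
        raise ValueError("each segment needs at least two points")

    # Between-segment term: every left point against every right point.
    between = 0.0
    for x in left:
        for y in right:
            between += abs(x - y) ** alpha

    # Within-segment terms: every unordered pair inside one segment.
    within_left = 0.0
    for i in range(n):
        for k in range(i + 1, n):
            within_left += abs(left[i] - left[k]) ** alpha

    within_right = 0.0
    for j in range(m):
        for k in range(j + 1, m):
            within_right += abs(right[j] - right[k]) ** alpha

    e_hat = (2.0 * between / (n * m)
             - within_left / (n * (n - 1) / 2)
             - within_right / (m * (m - 1) / 2))
    return (n * m / (n + m)) * e_hat


def best_split(series, alpha=1.0):
    """Return (tau, q) for the split point with the largest divergence."""
    best_tau, best_q = None, float("-inf")
    # Keep at least two points on each side of the split.
    for tau in range(2, len(series) - 1):
        q = q_hat(series, tau, alpha)
        if q > best_q:
            best_tau, best_q = tau, q
    return best_tau, best_q
```

For example, best_split([0, 0, 0, 0, 10, 10, 10, 10]) returns (4, 40.0),
the step in the middle of the series. The sketch recomputes every pairwise
distance for every candidate split, which is slow on long series but may be
acceptable for a first version if windowing keeps the inputs short, as the
email suggests. The significance test that decides whether a candidate is a
real change point is deliberately left out.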
