I follow the reasoning, and it seems sound.  Having the external
dependency limits our options, and if a donation is unlikely, creating
the rest of the components ourselves seems reasonable.




Possibly related: ultimately, I am looking forward to Otava running on a
usable, current version of Python (one that is at least still supported
and receiving updates).

On Mon, Sep 22, 2025 at 11:37 AM Henrik Ingo <[email protected]> wrote:

> Ok so let me restart this discussion...
>
> After a successful first release as an ASF incubating project, we started
> discussing what to do with the main dependency, the
> signal_processing_algorithms repo. Driving motivation here is that it is
> rather central to what Otava is doing, and long term it will be better for
> development if we can easily make changes in both halves.
>
> The guidance from our mentors was that the repo is too large to just copy
> paste into Otava (2600 lines). For such additions, ASF usually prefers to
> receive a copyright transfer/donation in writing from the original author /
> copyright holder.
>
> So the guidance was that someone from the Otava IPMC (me) should contact
> MongoDB to find out whether they would be open to such a transfer. For
> context, we were already in contact with MongoDB when we drafted the
> project proposal a year ago. While they were mostly enthusiastic, in
> hindsight it seems that formally joining Apache Otava (incubating) wasn't
> enough of a priority for it to actually happen. So chances are, the same
> dialogue would play out again: general excitement, but a high risk that the
> legal department has other priorities, and in the end we just wasted time
> on talking instead of programming.
>
>
>
> So, I looked deeper into what we have in front of us, and have discussed
> this off-list with Alex.
>
> While the whole codebase in the signal_processing_algorithms repo is indeed
> over 2k lines, most of that is code we don't use in Otava, or don't need.
> Also, the number is inflated, because the repo contains multiple different
> implementations, all doing exactly the same thing. In particular:
>
>
> * Piotr already replaced the significance test (which is roughly the latter
> half of what e-divisive does) with a Student's t-test.
> * Also, for the main part of the algorithm, Piotr introduced the windowing
> approach, which is novel and not in the MongoDB code.
>
> While Piotr's implementation kind of wraps around the original MongoDB
> e-divisive implementation, it could have been done more elegantly and
> efficiently if it had modified the e-divisive code directly. That is
> where we get into the discussion of why we don't just do that.
>
>
> * Finally, in Otava we have my optimization from last year, the incremental
> e-divisive implementation, which is also novel; the MongoDB code has nothing
> like it. However, it still uses the very core part of the e-divisive
> algorithm, so from a "code coverage" perspective, it does not further reduce
> the number of lines that we depend on in the signal processing repo.
>
> When all the above is accounted for, there are about 100 lines of code that
> execute the very heart of e-divisive: pairwise comparison of the data
> points in a time series. This could be rewritten by someone just by
> implementing line by line the math from the Matteson & James (2013) paper
> (formulas 5 and 6). Given that Piotr's and my work also greatly reduces the
> amount of computation needed, for a first version we don't need to
> implement this in C, nor use fancy numpy functions. It could just be the
> double for loop that you get when implementing the \sum ... \sum (xi-xj)^2
> from the paper.
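
To make the "double for loop" idea concrete, here is a minimal sketch in
plain Python. It assumes the formulas referred to are the sample
divergence between two segments and its scaled version from the paper.
All function names, the default alpha=1.0 and the min_size parameter are
illustrative assumptions, not Otava or MongoDB code, and it only looks
for a single split point rather than running the full algorithm.

```python
def mean_dist_between(xs, ys, alpha=1.0):
    """Mean of |x - y|**alpha over every pair taken across the two segments."""
    total = 0.0
    for x in xs:
        for y in ys:
            total += abs(x - y) ** alpha
    return total / (len(xs) * len(ys))


def mean_dist_within(xs, alpha=1.0):
    """Mean of |x_i - x_k|**alpha over all unordered pairs i < k in one segment."""
    n = len(xs)
    if n < 2:
        return 0.0  # no pairs to compare
    total = 0.0
    for i in range(n):
        for k in range(i + 1, n):
            total += abs(xs[i] - xs[k]) ** alpha
    return total / (n * (n - 1) / 2)


def q_hat(xs, ys, alpha=1.0):
    """Scaled sample divergence between two segments (larger = more different)."""
    n, m = len(xs), len(ys)
    e_hat = (2.0 * mean_dist_between(xs, ys, alpha)
             - mean_dist_within(xs, alpha)
             - mean_dist_within(ys, alpha))
    return (m * n) / (m + n) * e_hat


def best_split(series, min_size=2, alpha=1.0):
    """Try every split point and return (index, statistic) of the best one.

    Simplification: the right-hand segment always runs to the end of the
    series, and no significance test is applied to the result.
    """
    best_tau, best_q = None, float("-inf")
    for tau in range(min_size, len(series) - min_size + 1):
        q = q_hat(series[:tau], series[tau:], alpha)
        if q > best_q:
            best_tau, best_q = tau, q
    return best_tau, best_q
```

For example, best_split([1.0] * 5 + [10.0] * 5) picks index 5, the point
where the level shifts. A real implementation would still need the
significance test and the recursion over the resulting segments.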
>
>
> There aren't a lot of drawbacks with this idea. Realistically we drop
> support for the --orig-edivisive mode, as that by definition depends on the
> original signal_processing_algorithms code.
>
>
>
> Let me know what you think
> henrik
>
>
>
>
>
> --
> *nyrkio.com <http://nyrkio.com/>* ~ *git blame for performance*
>
> Henrik Ingo, CEO
> [email protected]                               LinkedIn:
> www.linkedin.com/in/heingo
> +358 40 569 7354                                 Twitter:
> twitter.com/h_ingo
>
