Thanks for kick-starting this thread, Henrik! I do agree that reimplementing missing parts within the project gives us a clear way to upgrade supported Python versions and overall evolve the project without complex cross-project dependencies.
Best,
Alex

On Tue, Sep 23, 2025 at 5:58 PM Austin Bennett <[email protected]> wrote:

> I follow the thought, and it seems reasonable. Having the external
> dependency seems to limit options, and if we are unlikely to get a donation,
> creating the rest of the components seems reasonable.
>
> Maybe related: Ultimately, looking forward to Otava being deployed to a
> usable/current version of python [ to one that is at least currently
> supported/getting-updates ].
>
> On Mon, Sep 22, 2025 at 11:37 AM Henrik Ingo <[email protected]> wrote:
>
> > Ok so let me restart this discussion...
> >
> > After a successful first release as an ASF incubating project, we started
> > discussing what to do with the main dependency, the
> > signal_processing_algorithms repo. Driving motivation here is that it is
> > rather central to what Otava is doing, and long term it will be better for
> > development if we can easily make changes in both halves.
> >
> > The guidance from our mentors was that the repo is too large to just copy
> > paste into Otava (2600 lines). For such additions, ASF usually prefers to
> > receive a copyright transfer/donation in writing from the original author /
> > copyright holder.
> >
> > So the guidance was that someone from the Otava IPMC (me) should contact
> > MongoDB to find out whether they would be open to such a transfer. For
> > context, we were already in contact with MongoDB when we drafted the
> > project proposal a year ago. While they were mostly enthusiastic, in
> > hindsight it seems formally joining Apache Otava (incubating) wasn't enough
> > of a priority for it to actually happen. So chances are, the same
> > dialogue would play out again: general excitement, but a high risk that the
> > legal department has other priorities, and in the end we just wasted time
> > on talking instead of programming.
> >
> > So, I looked deeper into what we have in front of us, and have discussed
> > this off-list with Alex.
> >
> > While all of the codebase in the signal_processing_algorithms repo is indeed
> > over 2k lines, most of that is code we don't use in Otava, or don't need.
> > Also, the number is inflated, because the repo contains multiple different
> > implementations, all doing exactly the same thing,
> >
> > in particular:
> >
> > * Piotr already replaced the significance test (which is like the latter
> >   half of what e-divisive does) with a Student's t-test.
> > * Also for the main part of the algorithm, Piotr introduced the windowing
> >   approach, which is novel and not in the MongoDB code.
> >
> > While Piotr's implementation kind of wraps around the original MongoDB
> > e-divisive implementation, it could have been done more elegantly and
> > efficiently if it was modifying the e-divisive code directly. So that's
> > where we get into the discussion about why don't we just do that then.
> >
> > * Finally, in Otava we have my optimization from last year, the incremental
> >   e-divisive implementation, which is also novel and MongoDB code has nothing
> >   like that. However, it still uses the very core part of the e-divisive
> >   algorithm, so from a "code coverage" perspective, it no longer reduces the
> >   number of lines that we depend on in the signal processing repo.
> >
> > When all the above is accounted for, there are about 100 lines of code that
> > execute the very heart of e-divisive: pairwise comparison of the data
> > points in a time series. This could be rewritten by someone just by
> > implementing line by line the math from the Matteson & James (2013) paper
> > (formulas 5 and 6). Given that Piotr's and my work also optimizes the
> > amount of needed computation a lot, for a first version we don't need to
> > implement this in C, nor use fancy numpy functions, it could just be the
> > double for loop that you get when implementing the \sum ... \sum (xi-xj)^2
> > from the paper.
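
For anyone following along, here is a rough sketch of what that double for loop could look like in plain Python. It is not the Otava or MongoDB code. It assumes the divergence is built from mean pairwise distances |xi - xj|^alpha between and within two segments, scaled by nm/(n+m), which is my reading of the paper and should be checked against it. The names `q_statistic`, `best_split`, `alpha` and `min_size` are made up for illustration. As a simplification, the right-hand segment is always the whole remainder of the series, and there is no significance test.

```python
def mean_between(xs, ys, alpha=1.0):
    """Mean of |x - y|**alpha over all cross-segment pairs (the double loop)."""
    total = 0.0
    for x in xs:
        for y in ys:
            total += abs(x - y) ** alpha
    return total / (len(xs) * len(ys))


def mean_within(xs, alpha=1.0):
    """Mean of |xi - xk|**alpha over all unordered pairs inside one segment."""
    n = len(xs)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for k in range(i + 1, n):
            total += abs(xs[i] - xs[k]) ** alpha
    return total / (n * (n - 1) / 2)


def q_statistic(xs, ys, alpha=1.0):
    """Scaled divergence between two segments (assumed form, see lead-in)."""
    n, m = len(xs), len(ys)
    divergence = (2.0 * mean_between(xs, ys, alpha)
                  - mean_within(xs, alpha)
                  - mean_within(ys, alpha))
    return (n * m / (n + m)) * divergence


def best_split(series, min_size=2, alpha=1.0):
    """Return (index, statistic) of the split that maximizes q_statistic."""
    best_tau, best_q = None, float("-inf")
    for tau in range(min_size, len(series) - min_size + 1):
        q = q_statistic(series[:tau], series[tau:], alpha)
        if q > best_q:
            best_tau, best_q = tau, q
    return best_tau, best_q


if __name__ == "__main__":
    # A series with an obvious step change between index 3 and index 4.
    print(best_split([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]))
```

This is quadratic per candidate split, which is presumably acceptable for a first version if the windowing and incremental approaches keep the inputs small.
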
> >
> > There aren't a lot of drawbacks with this idea. Realistically we drop
> > support for the --orig-edivisive mode, as that by definition depends on the
> > original signal_processing code.
> >
> > Let me know what you think
> > henrik
> >
> > --
> > *nyrkio.com <http://nyrkio.com/>* ~ *git blame for performance*
> >
> > Henrik Ingo, CEO
> > [email protected]      LinkedIn: www.linkedin.com/in/heingo
> > +358 40 569 7354      Twitter: twitter.com/h_ingo
