Re: Otava e-divisive algorithm documentation

Henrik Ingo Wed, 11 Feb 2026 11:08:37 -0800

Hi Dimitar!

At the Otava user level, it is indeed the case that given a series of n
commits, indexed as [0, n-1], it is possible to find/observe a change point
at commits [1, n-1] but not at 0. This follows from the use case: if there is
a change, that is a regression, between k and k+1, then it is the patch at
k+1 that has caused the regression/change, and it is that patch that should
be analyzed and fixed.


More philosophically, the first point at index 0, or a single point, cannot
be a change point, because there is no history or larger population
that a change could be relative to. In practice, two points cannot
produce a change point either, since there isn't enough information to draw
such a conclusion, but if there are n>2 points, then the point at index 1
can be a change point.
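
As a minimal sketch of the rule above (the function name is hypothetical
and is not part of Otava's API), the indexes at which a change point may be
reported could be enumerated like this:

```python
def candidate_change_point_indexes(n: int) -> list[int]:
    """Indexes at which a change point may be reported for n points.

    Hypothetical helper, not Otava code. Index 0 is never a candidate
    because it has no history, and with fewer than three points there
    is too little information to conclude anything.
    """
    if n <= 2:
        return []
    return list(range(1, n))
```

For n=5 this gives the indexes 1 through 4, that is [1, n-1].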

I realize now, when writing this, that all of the above is based on an
assumption that a change happens in relation to each point's history, from
left to right so to speak.

...now, back to our particular implementation... When Denis wrote and I
reviewed the new implementation, we agreed that keeping the indexes fixed
would in some respects be clearer, but otoh in these lower-level math
operations, the first row or column would always be all zeros.
So we agreed to drop the first element / column and work with indexes that
are shifted by -1. But by the time the algorithm returns a list of change
points to the user, they should be aligned with the above thinking: change
points are possible at indexes [1, n-1].
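
A minimal sketch of that final re-alignment step, assuming the internal
indexes are simply offset by one (the function name is hypothetical and
this is not the actual Otava code):

```python
def to_user_indexes(internal_indexes: list[int]) -> list[int]:
    """Map indexes from the shifted internal representation back to
    positions in the original series.

    Assumption for illustration: the first element was dropped, so
    internal index i refers to position i + 1 in the user's series.
    """
    return [i + 1 for i in internal_indexes]
```

Under this assumption, for a series of n=6 points the internal range
[0, n-2] = [0, 4] maps back to the user-facing range [1, n-1] = [1, 5].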

henrik

On Wed, Feb 11, 2026 at 2:23 PM Dimitar Dimitrov <[email protected]>
wrote:

> Hey Denis,
>
> I'm slowly getting myself acquainted with Otava and found your doc
> extremely helpful and very well written!
>
> I'll post here some of my reading notes, slightly reworded to read like
> comments in case they might be useful to you.
>
> >
>
> https://github.com/apache/otava/pull/126/files#diff-b38fe588c0481132e3185c6c184fcb9a1208b32ae5a03dd92b12df4b0dfa46eeR23
>
> Is there a practical reason to allow the first element in the data series
> to be a change point on its own, but not the last? If there is, it might be
> nice to highlight it, as it's pretty subtle just looking at the paragraph
> here.
>
> > This process yields a series of change points
> > $$0 < \hat{\tau}_1 < \hat{\tau}_2 < \cdots < \hat{\tau}_k < T$$.
>
> nit: $$1 \leq \hat{\tau}_1$$ will be consistent with the initial definition
> (for me it's also less confusing, as the sequence here is not
> zero-indexed).
>
> >
>
> https://github.com/apache/otava/pull/126/files#diff-b38fe588c0481132e3185c6c184fcb9a1208b32ae5a03dd92b12df4b0dfa46eeR32
> and
> > and the coefficient in front of the second and third terms in
> > $$\hat{\mathcal{E}}$$ are binomial coefficients.
>
> Aren't the coefficients in front of these terms reciprocals of binomial
> coefficients?
>
> > Non-deterministicity of the results due to the permutation significance
> > test
>
> "Result non-determinism" vs "Non-deterministicity of the results"?
>
> > Starting with a zero-indexed time series
>
> Is this fundamentally needed? The shift from the one-indexed concepts above
> might be slightly confusing and e.g. invalidates the boundaries for \tau
> and \kappa which we defined before and are using here without redefining.
> Maybe we should just redefine the boundaries again, along with s (even
> if it might seem obvious what it should be).
>
> >
>
> https://github.com/apache/otava/pull/126/files#diff-b38fe588c0481132e3185c6c184fcb9a1208b32ae5a03dd92b12df4b0dfa46eeR73
> and
> >
>
> https://github.com/apache/otava/pull/126/files#diff-b38fe588c0481132e3185c6c184fcb9a1208b32ae5a03dd92b12df4b0dfa46eeR75
>
> Should we also define base values for B(s, s + 1, \kappa) and C(s, \kappa -
> 1, \kappa) to avoid dividing by zero for one-element sequences? This also
> links to some extent with my confusion about having one-element sequences
> and allowing the first, but not last element to be a change point, as well
> as to Henrik's comment on the PR that the first element in the example
> image shouldn't qualify as a change point.
>
> Regards,
> Dimitar
>
> On Tue, 10 Feb 2026 at 06:13, Alexander Sorokoumov <
> [email protected]> wrote:
>
> > Thank you for this contribution, Denis! I've responded to the PR with a
> > few suggestions. It is an awesome improvement to our documentation that will
> > help with explaining how Otava works. After the PR is merged, I will also
> > publish it to the website.
> >
> > Best,
> > Alex
> >
> > On Sun, Feb 8, 2026 at 11:54 PM Denis Shchepakin <
> > [email protected]>
> > wrote:
> >
> > > Hello everyone,
> > >
> > > Created a PR with an E-divisive algorithm doc as per issue here:
> > > Description
> > > of Underlining Statistic Methods Used · Issue #100 · apache/otava
> > > <https://github.com/apache/otava/issues/100>. Any feedback is welcome.
> > >
> > > Best,
> > > Denis
> > >
> >
>


-- 
*nyrkio.com <http://nyrkio.com/>* ~ *git blame for performance*

Henrik Ingo, CEO
[email protected]
+358 40 569 7354
LinkedIn: www.linkedin.com/in/heingo
Twitter: twitter.com/h_ingo
