Oh, I realize now I never sent this: Let me now follow up with a first attempt at defining a standard schema for the test result data and the change points.
I will suggest both a tabular/csv format and a json format. They are different syntaxes expressing the exact same thing.

Generally a timeseries table or database has 3 kinds of columns:
- The first column is a monotonically ascending timestamp.
- The next columns are often called labels, or traditionally dimensions.
- And last but not least, the values, aka metrics.

In a table these are easy to recognize from their different data types:
- a date & time value, in the native format for such a value (datetime.datetime in python)
- the labels are essentially strings. They could also be ints, but for simplicity, let's assume strings for now.
- values, let's assume they are all float
- a final column called "extra_info", with an optional json payload

This yields:

    timestamp, attr1, attr2, metric1, metric2, ..., extra_info

The time column is populated with timestamps of *when the commit in question was appended to the given branch*. Knowing which timestamp to use is surprisingly difficult. The timestamp as shown by `git log` is NOT it. In a project that uses merge commits to main, `git log --merges` is what you want. If otoh a project uses squash+rebase, then `git log --no-merges` is right. At MongoDB the CI included a timestamp service for this purpose. It would tag all commits with a timestamp of its own, plus an order number (which I've not included in this spec). Even so, from a data point of view it is straightforward: the column contains monotonically increasing values, and this defines the sort order of the data.

The following labels (aka attributes) are mandatory:

    label:      sample_val
    -------------------------
    repo_url:   https://otava/apache
    git_commit: 123adcdef

(Note that ., ~/myrepo and file:///foo/bar are also possible repo_url values.)

The following labels are optional, but implementations should use these column names if they are needed:

    git_branch: main
    test_name:  tpcc
    run:        1    (aka attempt)

In addition, a number of other labels can be used (explained below):

    version:     "2.0"
    concurrency: "64"
    os:          linux
    arch:        x64
    ...

To be discussed: The more general term for git_branch is actually git_ref, or, as we use above, git_commit. In git a commit can really be referenced by (the head of) a branch, a tag or a sha. But in practice, I find that I and other people think of the branch as the entire series, and tags and shas as individual commits. Anyway, whether the field is called git_branch or git_ref, it must be the case that git_commit is in the history of git_branch/git_ref.

As for the numeric values... they are the results, aka measurements, of the benchmark. The basic format is straightforward:

    timestamp,label1,label2,...,latency,throughput,length,weight
    datetime(2026,2,17,22,31,59),...,0.09,234,181,75
    ...

There's just one problem: Where do we encode things like the unit of each column, and its direction? And what units are permissible in the first place?

1. For e-divisive, unit and direction are unnecessary. They are not needed to compute change points, hence they should live in the manager component.
2. They can be encoded into the column name: `v latency (ms)`, `^ throughput (q/s)`
3. Extend csv to have multiple rows of header content:

    , , , v , ^ , ...
    , , ,ms,q/s,...
    timestamp,label1,label2,latency,throughput,...
    datetime(...

My opinion:
1. True, but a bit harsh
2. Kinda works, now that I see it spelled out
3. Clean and structured, but actually a bit hard to read

Now, an important note about the labels: A common question is, which labels are needed to uniquely define a timeseries consisting of git commits and their test results for each commit? Obviously git_repo and git_commit, but what else? The truth is that this is subjective!
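To make option 2 above concrete, here is a minimal sketch of how a consumer could split such a column name back into metric name, unit and direction. This is purely illustrative: the helper name and the exact "`v`/`^` prefix plus parenthesized unit" convention are my assumptions based on the examples above, not a finalized spec.

```python
import re

# Parse a column name like "v latency (ms)" or "^ throughput (q/s)"
# into (metric_name, unit, direction). The "v"/"^" prefix and the
# "(unit)" suffix follow option 2 above; the rest is illustrative.
COLUMN_RE = re.compile(r"^(?P<dir>[v^])\s+(?P<name>.+?)\s*\((?P<unit>[^)]+)\)$")

def parse_metric_column(column):
    """Return (metric_name, unit, direction), or None for a plain label column."""
    m = COLUMN_RE.match(column)
    if m is None:
        return None  # e.g. "timestamp", "os", "git_commit"
    direction = "lower_is_better" if m.group("dir") == "v" else "higher_is_better"
    return (m.group("name"), m.group("unit"), direction)

print(parse_metric_column("v latency (ms)"))      # ('latency', 'ms', 'lower_is_better')
print(parse_metric_column("^ throughput (q/s)"))  # ('throughput', 'q/s', 'higher_is_better')
print(parse_metric_column("git_commit"))          # None
```

Note that the direction strings round-trip into the JSON format's direction field.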
For example if you tested with multiple different OSes, then you probably need to add an `os` column. So the set of labels will be different for each use case, but in every case we have a single series defined by a "primary key" (label1, label2, label3, ..., time), and the metrics columns selected by that key are then all the metrics that came out of the same test, at that git commit, and that run.

Note that run is optional (and until now not used in Otava). When it's not used, a rerun of a test essentially overwrites the previous record. If it is used, then multiple runs of a test at the same commit are separated by an increasing integer in this column.

(In addition, metadata about each build can be stuffed into extra_info, such as the time of running the build/test, which is not at all the time that is in the timestamp column.)

Same in JSON

In JSON we are not limited by the above restrictions:

    {
      attributes: {
        git_repo: ...,
        test_name: ...,
        ...
      },
      timestamps: [ ... ],
      git_commits: [ ... ],
      runs: [ ... ],
      metrics: {
        latency: {
          unit: ms,
          direction: lower_is_better (or -1 or v ...TBD),
          metric_name: latency,
          values: [1.0, 2.1, 1.5, ...]
        },
        throughput: {
          unit:
          direction:
          metric_name:
          values: [...]
        }
      }
    }

The above could be more formally defined in JSON Schema.

The final discussion item is what Python container to use to store the above logical schema. I'll leave that to another email.

henrik

On Mon, Feb 16, 2026 at 6:58 PM Henrik Ingo <[email protected]> wrote:

> This discussion has popped up in a few tickets, so I thought I might just
> start a new email thread to approach it in a top down manner.
>
> On versioning:
>
> - The 0.7 branch is an ASF packaging of Hunter/Otava the way we knew it
> until we joined the ASF incubator program. It depends on the original
> signal_processing module and supports python versions 3.8-3.10. It is
> possible we will not make more releases from this branch, but if necessary,
> we could release new 0.7.x versions.
>
> - The current master branch is heading toward a series of 0.8.x and
> possibly 0.9.x releases. It already contains upgrades to support all python
> versions up to 3.14, and the core algorithm is a fresh rewrite that
> completely replaced the external dependency.
>
> - As such, I think it would be valuable to get out a 0.8.0 release
> with those improvements. Since we want to rotate the release manager role
> (a requirement in the incubator program), this is largely between myself and
> Sean to do. For me it will be a couple of weeks until I have bandwidth to do
> it, but it is in my backlog unless Sean beats me to it.
>
> - In my thinking, the refactoring proposed in this email is the major
> remaining piece of work, after which we would release 1.0.0 (and very
> likely then start discussing graduating from the incubator program, though
> that is out of scope for the current discussion).
>
> - API and all kinds of other breakage is allowed and encouraged
> between 0.7.x and 1.0.0.
>
> So, pulling together a few discussions from the past 3-6 months...
> It seems we could refactor Otava into the following modular components:
>
> () = interface / schema
> * = new, doesn't currently exist
>
> ----------------------
> | releases & packaging |
> ----------------------
>
> ( standard data structure )        ---
> - tabular/csv and json            | o |
> - one can be primary format       | t |
> - python container classes        | a |
> - replace Series, AnalyzedSeries  | v |
> -------------------               | a |
> | integrations    |               |   |
> | - data storage  |               | m |
> | - ingest/events*|               | a |
> | - notifications |               | n |
> -------------------               | a |
>                                   | g |-(* HTTP API)
> --------------                    | e |
> | otava-cli   |                   | r |
> | - csv       |                   |   |
> | - json*     |                   |   |
> --------------                    |   |
> | (otava-lib) |                   |   |
> | - edivisive |                    ---
> | - multidim* |
> | - incremental|
> | - core params|
> --------------
>
> If you start reading from the bottom left, we have the core e-divisive
> implementation, including the incremental mode (which is an optimization
> invented purely within Otava). By my count this is pretty much there,
> except we seem to have never implemented the multivariate version of
> e-divisive, which should provide some marginal improvement in the
> algorithm's accuracy.
>
> The core library does not depend on any database or other external system.
> Thus it has a limited set of configuration parameters:
> - From the e-divisive implementation: p-value and alpha.
>   (The latter we've essentially set to 1 in the previous implementation,
>   but it can have a float value 0 < alpha < 2. My understanding is that it
>   affects the "geometry" of the distance between points - analogous to how
>   the least squares line fitting algorithm uses a power of two to emphasize
>   the weight of large distances/differences. So I'm guessing here that a
>   large alpha would make the algorithm more sensitive to large change points
>   and a small alpha would give more weight to small differences?)
> - window_size, if we decide to keep the window approach
> - minimum threshold - even if it's not actually a parameter to the
>   algorithm itself
>
> - Note that it would be nice to also support an option to test against
> different implementations and versions of e-divisive, such as the
> --orig-edivisive option we currently have. But this becomes impractical
> since we cannot copy the old signal_processing dependency into the project,
> and otoh it will not work on modern python versions. So in practice, if
> someone wants to compare different versions of MongoDB e-divisive, Datastax
> Hunter and ASF Otava, then you would from now on have to actually install
> multiple versions and run them separately.
>
> The otava-cli is then a minimalist command line executable that exposes
> via ConfigArgParse the options available in otava-lib. Note that this is a
> subset of what is currently available in otava and a subset of the
> functionality exposed in the otava manager component. The operations at this
> level are just to feed data in, find change points, and get results out. For
> example, the proposal in one ticket to allow using a different p-value for
> different tests or metrics is not relevant at this level: you would in any
> case just run otava-cli two separate times, and use whatever p-value you
> want each time.
>
> Also, in otava-cli it would be the case that ConfigArgParse is the entire
> space of configuration options. Anything too complicated to express via
> ConfigArgParse would by definition move into the larger modules.
>
> The integrations module, which could actually be 3 separate modules, is
> responsible for connectivity to data stores, but also notifications. I
> believe we currently have the code for Slack notifications in Otava? In
> Nyrkiö we also implemented GitHub notifications, which is quite cool. There
> have also been email notifications, but I don't think that code is actually
> in Otava at all?
> (In practice GitHub or a hypothetical Jira integration will
> indirectly cause emails to arrive in your inbox...)
>
> The main work item here is to design a plugin mechanism so that Otava
> doesn't need to depend on every database and issue tracker in the world.
>
> Second, I've observed that the level of code re-use in the current data
> store importers is poor. So just a basic refactoring into a better
> object-oriented hierarchy should make a big difference here.
>
> Final point: Currently the data store is typically the entry point for
> test results coming into Otava for analysis = finding change points. There
> could be a need for some kind of event or streaming API where new results
> arrive. This is especially meaningful if you wanted to use the incremental
> e-divisive - you need to know which is the old data and which is the new
> point(s).
>
> Finally, on top of all of this would be a more comprehensive user
> interface, which binds all of these together. This is where you manage
> passwords to the database and issue tracker, where you can define which
> test names and metrics have which parameters, where their data is stored
> and what SQL is used to fetch it.
>
> Also, this layer could grow into some kind of service/daemon if we feel
> that is a useful direction. Currently it is just a CLI, so in a way this is
> just an extension of the otava-cli component, but I feel there's still a
> clear separation between the core otava-cli focused on the math and the
> larger manager component focusing on the messy real world realities.
>
> Finally, I think we should define and document the data structure and
> field names we use for this data. For example, each data store could then
> use a compliant schema and column names. But this would also benefit
> downstream users, like Nyrkiö, when we could directly use and extend a
> well-known data structure.
> I will return to this one in a separate message
> another day,
>
> henrik
>
> --
> *nyrkio.com <http://nyrkio.com/>* ~ *git blame for performance*
>
> Henrik Ingo, CEO
> [email protected]   LinkedIn: www.linkedin.com/in/heingo
> +358 40 569 7354   Twitter: twitter.com/h_ingo

-- 
*nyrkio.com <http://nyrkio.com/>* ~ *git blame for performance*

Henrik Ingo, CEO
[email protected]   LinkedIn: www.linkedin.com/in/heingo
+358 40 569 7354   Twitter: twitter.com/h_ingo
