You mean this timeline, right? http://speed.pypy.org/timeline/?ben=spectral-norm
Because the December 22 result is so high, the y-axis maximum goes up to 2.5, leaving less room for the more interesting < 1 range, right?

Regarding Mozilla, do you mean this site? http://arewefastyet.com/
I can see their timelines have some holes, probably failed runs...

I see a problem with the approach you suggest. Entering an arbitrary y-axis maximum is not a good thing. I think the onus is on the benchmark infrastructure not to send results that aren't statistically significant. See JavaStats (http://www.elis.ugent.be/en/JavaStats) or ReBench (https://github.com/smarr/ReBench).

Something that can be done on the Codespeed side is to treat points with too high a stddev differently. In the aforementioned spectral-norm timeline, the stddev "floor" is around 0.0050, while the spike has a stddev of 0.30, much higher. A "strict" mode could be implemented that invalidates or hides statistically unsound data.

By the way, I had written to the arewefastyet guys about the possibility of configuring a Codespeed instance for them. We may yet see collaboration there ;-)

Miquel

2011/3/8 Maciej Fijalkowski <[email protected]>:
> On Tue, Mar 8, 2011 at 8:14 AM, Laura Creighton <[email protected]> wrote:
>> In a message of Tue, 08 Mar 2011 09:10:32 +0100, Miquel Torres writes:
>>>Hi,
>>>
>>>I finished the changes to the speed.pypy.org home page last night, but
>>>alas!, I didn't have time to deploy. I will do it later today and will
>>>then ping you back.
>>>
>>>The extra info provided is really nice as an overview, you will see ;-)
>>>
>>>
>>
>> Ah good. Thank you very much. We spent yesterday afternoon with
>> the Mozilla engineers, and I got to talk to the person who maintains
>> the benchmarks for tracemonkey. He had timelines very much like ours.
>> There is one feature he has that I would like to have. Take a look
>> at the timeline for spectral.norm. There are two spikes there.
>> Mozilla has lines like that too, though mostly it is because their
>> jit decides that the whole benchmark is bogus and optimises out all the
>> code. So it takes 0 time. oops.
>>
>> At any rate, aside from knowing that something went horribly wrong with
>> that rev, you don't really need to know how wrong. And making the
>> graph display up to that point means that the dots where things really
>> do matter get crammed closer together than would otherwise be the case.
>> So he had a mode where things were displayed with an arbitrary value
>> at the bottom (in our case it would be the top) which he could specify.
>> Then the graph would be replotted, with the outliers off the graph, but
>> making it easier to read the dots for the more normal cases.
>>
>> Any chance we could do that too?
>
> Link maybe?
>
>>
>> Laura
>> _______________________________________________
>> [email protected]
>> http://codespeak.net/mailman/listinfo/pypy-dev
>>
>
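A minimal Python sketch of the two display ideas discussed in this thread, under stated assumptions: a "strict" mode that flags points whose stddev is far above the series' typical stddev (the "floor" Miquel mentions), and the Mozilla-style user-chosen y-axis cap that pins outliers to the top edge instead of stretching the axis. The `Result` class, the `strict_filter` and `clip_for_display` names, and the 10x factor over the median stddev are hypothetical, chosen for illustration; they are not Codespeed's actual data model or API, and the sample numbers are made up.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Result:
    """One timeline point (hypothetical shape, not Codespeed's real model)."""
    revision: str
    value: float
    std_dev: float


def strict_filter(results, factor=10.0):
    """Split results into (sound, suspect).

    The series' median stddev is used as the noise "floor"; any point whose
    stddev exceeds floor * factor is treated as statistically unsound.
    The factor of 10 is an arbitrary, hypothetical default.
    """
    if not results:
        return [], []
    floor = median(r.std_dev for r in results)
    limit = floor * factor
    sound = [r for r in results if r.std_dev <= limit]
    suspect = [r for r in results if r.std_dev > limit]
    return sound, suspect


def clip_for_display(results, y_max):
    """Pin values above a user-chosen y_max to y_max and flag them.

    Returns (revision, plotted_value, off_graph) tuples, so a chart can draw
    the capped points with a distinct marker at the top edge.
    """
    return [(r.revision, min(r.value, y_max), r.value > y_max) for r in results]


if __name__ == "__main__":
    # Made-up series: three ordinary runs and one noisy spike.
    series = [
        Result("r1", 0.42, 0.005),
        Result("r2", 0.41, 0.004),
        Result("r3", 2.50, 0.30),
        Result("r4", 0.40, 0.006),
    ]
    sound, suspect = strict_filter(series)
    print([r.revision for r in suspect])  # ['r3']
    print(clip_for_display(series, y_max=1.0))
```

Using the median rather than the mean as the floor keeps a single wild spike from inflating the threshold it is being compared against. Suspect points are returned rather than deleted, so a UI could still choose between hiding them and drawing them with a warning marker.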
