On 5 August 2016 at 18:34, James Greenhalgh <james.greenha...@arm.com> wrote:
> I've given the 2012-2015 numbers below, just to show that (for the files
> in gcc/*.[ch]) your hypothesis doesn't hold. The vast majority of
> committers make <20 commits in a year.

My hypothesis is that fewer people are increasingly doing most of the work. If the vast majority of committers do very few commits, that doesn't disprove (nor necessarily prove) my hypothesis. Either very few committers do most of the commits, which is consistent with my hypothesis, or most of the commits are done by a very large number of committers who individually do very few commits, which contradicts my hypothesis. Either fact is consistent with your numbers.

However, your numbers do not seem to indicate any sudden changes in trends, which argues against my hypothesis of "increasingly". Yet, I would argue that the trend not changing despite the continuous increase in code is itself worrying. But yes, the "increasingly" is definitely the weakest part of my hypothesis.

>> * 100 commits is less than 2%. Quite a low threshold. Perhaps 1%, 25%,
>> 50%, 75%, 90% are more informative.
>
> Again, just done for time. I've changed the last two buckets to 100-199
> and 200+ in this run. If you'd like to do, I'd be happy to see the
> results.

200 is around 10% of commits. Thus, 2 people do at least 20% of commits.

>> that is, most of the commits are done by smaller fraction of the
>> total.
>
> For 2015 I found the 4 "25%" marks to be:
>
> 26% 1-4
> 25% 5-13
> 25% 14-39
> 23% 40+
> So 75% of the work is being done by people who commit fewer than 40
> patches in a year. Encouragingly 50% of the people who committed in
> 2015 committed at least one patch per month (on average).

Sorry, what is each column? If 26% of people commit between 1-4, this does not mean that 26% of the work is done by people who commit between 1-4 patches.

> Personally, I think that looks like a fairly stable and healthy community,
> but you're welcome to draw your own conclusions from the data.

It looks more stable than I would have expected. I'm not totally convinced about the thresholds, though. I believe the best way to measure is closer to how openhub does it: sort committers by number of commits, calculate each one's percentage of the total, and summarize the cumulative percentage at given thresholds, then print "%total commits" "%committers". Do the same for every year. Exclude Ada, Fortran, and Go if possible.

It would be interesting if someone could generate those numbers. If you get me the raw numbers of commits per committer per year, I can easily generate the stats and even nice plots if you wish.

I'd be happy to see my hypothesis disproved.

Cheers,

Manuel.
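
P.S. To make the measurement I have in mind concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `cumulative_share`, the input shape (a dict mapping committer to number of commits for one year), and the sample counts are all hypothetical, and the default thresholds are simply some of the percentages suggested earlier in this thread. It does not use real GCC data.

```python
def cumulative_share(commits_per_committer, thresholds=(25, 50, 75, 90)):
    """Return (pct_of_commits, committers_needed) pairs.

    Committers are sorted from most to least active. For each threshold
    (a percentage of the year's total commits, in ascending order) we record
    how many of the most active committers are needed to reach it.
    """
    counts = sorted(commits_per_committer.values(), reverse=True)
    total = sum(counts)
    pending = list(thresholds)
    rows = []
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        # Integer comparison avoids floating-point rounding at the boundary.
        while pending and running * 100 >= pending[0] * total:
            rows.append((pending.pop(0), rank))
    return rows


if __name__ == "__main__":
    # Hypothetical counts for a single year, not taken from the GCC history.
    sample = {"a": 50, "b": 25, "c": 15, "d": 5, "e": 5}
    n = len(sample)
    print("%total commits  %committers")
    for pct_commits, needed in cumulative_share(sample):
        print(f"{pct_commits:>14}  {100 * needed / n:>11.0f}")
```

Running this once per year, on counts that exclude Ada, Fortran, and Go, would give one small table per year. If the "%committers" column shrinks over time for the same "%total commits" row, that would support the "increasingly" part of the hypothesis; if it stays flat, it would not.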