Dear Barry,

As far as I understand, you're telling us that a bit of data mining does no harm, whatever the data. Your example of pop music charts might support your point (although my ears disagree...), but I think it is bad policy to indulge in white-noise analysis without a well-reasoned motive for doing so. It might give bad ideas to potential "statistics patrons" (think for a moment about the sorry state of the financial markets :-().
More generally, I tend to be extremely wary of over-interpreting belly grumbles as the Voice of the Spirit... which is a very powerful urge for many statisticians and statisticians' clients. Data mining can be fine for exploratory musings, but a serious study needs a model, i.e. a set of ideas and a way to stress-test them against reality. As far as I can see (but I might be nearsighted), there is no model linking package downloads to package use(s). Data may or may not become available with more or less effort, but I can't see the point.

Emmanuel Charpentier

On Sunday 08 March 2009 at 16:08 +0000, Barry Rowlingson wrote:
> > I think the situation is worse than messy. If a client comes in with data
> > that doesn't address the question they're interested in, I think they are
> > better served to be told that, than to be given an answer that is not
> > actually valid. They should also be told how to design a study that
> > actually does address their question.
> >
> > You (and others) have mentioned Google Analytics as a possible way to
> > address the quality of data; that's helpful. But analyzing bad data will
> > just give bad conclusions.
>
> As long as we say 'package Foo is the most downloaded package on
> CRAN', and not 'package Foo is the most used package for R', we can
> leave it to the user to decide if the latter conclusion follows from
> the former. In the absence of actual usage data I would think it a
> good approximation. Not that I would risk my life on it.
>
> Pop music charts are now based on download counts, but I wouldn't
> believe they represent the songs that are listened to the most times.
> Nor would I go so far as to believe they represent the quality of the
> songs...
>
> Should R have a 'Would you like to tell CRAN every time you do
> library(foo) so we can do usage counts (no personal data is
> transmitted blah blah) ?'? I don't think so....
>
> Barry