CRAN size has grown almost exponentially at least since 2001. R history was discussed by John Fox (2009) Aspects of the Social Organization and Trajectory of the R Project, R Journal (http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Fox.pdf). Below please find his data plus 5 additional points I added and R script I used to fit a models.

I won't defend the models fit in the script below. However, unless CRAN management changes dramatically in the next 5 years, it seems likely that CRAN will have 10,000 packages some time in 2018.


By the way, if you don't already use the sos package routinely, I encourage you to consider it. For me, it's by far the fastest literature search for anything statistical. In a very few minutes, I get an Excel file with a summary by package of the matches to almost any combination of search terms. (Shameless plug by the lead author of the package ;-)


      Best Wishes,
      Spencer Graves


date    packages
2001-06-21    110
2001-12-17    129
2002-06-12    162
2003-05-27    219
2003-11-16    273
2004-06-05    357
2004-10-12    406
2005-06-18    548
2005-12-16    647
2006-05-31    739
2006-12-12    911
2007-04-12    1000
2007-11-16    1300
2008-03-18    1427
2008-10-18    1614
2009-09-17    1952
2012-06-12    3786
2012-11-01    4082
2012-12-14    4210
2013-10-28    4960
2013-11-08    5000

library(gdata)

(CRANfile <- dir(pattern='s\\.xls$'))
#readLines(CRANfile)
str(CRANhist. <- read.xls(CRANfile, stringsAsFactors=FALSE,
                           header=TRUE))
tail(CRANhist., 11)
CRANhist <- CRANhist.[1:20, 1:2]

(dt. <- as.Date(CRANhist$date))
CRANhist$date <- dt.

(day1 <- min(CRANhist$date)) # 2001-06-21
str(ddate <- CRANhist$date-day1)
# difftime in days

CRANhist$CRANdays <- as.numeric(ddate)
(growth <- lm(log(packages)~CRANdays, CRANhist))

CRANhist$pred <- exp(predict(growth))
plot(packages~date, CRANhist, log='y')
lines(pred~date, CRANhist, pch='.')

fitLogLogis <- nls(log(packages) ~ a+b*CRANdays + log(1+exp(d+b*CRANdays)),
                   CRANhist, start=c(a=4.9, b=0.0009, d=0))
# Error ... singular gradient

library(drc)
CRANlogLogis <- drm(packages~CRANdays, data=CRANhist, fct=LL.3())
plot(CRANlogLogis, log='y') # very poor through 2005

CRANlogLogis. <- drm(log(packages)~CRANdays, data=CRANhist, fct=LL.3())
plot(CRANlogLogis., log='y') # terrible:  far worse than CRANlogLogis

CRANlogLogis4 <- drm(packages~CRANdays, data=CRANhist, fct=LL.4())
plot(CRANlogLogis4, log='y') # poor for 2001 but great otherwise

CRANlogLogis4. <- drm(log(packages)~CRANdays, data=CRANhist, fct=LL.4())
plot(CRANlogLogis4., log='y') # best I've found so far.
abline(h=c(4200, 8400))

sapply(CRANhist, range)
pred.dTimes <- seq(0, 6000, 100)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))
data.frame(Date=as.Date(day1+pred.dTimes), nPkgs=exp(CRANpred))

plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)

pred.dTimes <- seq(0, 10000, 100)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))

plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)
abline(h=c(4200, 8400))
abline(v=as.Date('2012-12-14'))
abline(v=as.Date('2017-09-30'))

#########################

abline(h=20000)
abline(h=70000)

pred.dTimes <- seq(0, 1000000, 10000)
CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes))
plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y')
points(packages~date, CRANhist)


On 11/8/2013 4:43 PM, William Dunlap wrote:
"Currently, the CRAN package repository features 5001 available packages."

Going from 4000 to 5000 packages took 14.5 months - that's one new package
every 10.5 hours. Behind every package there are real people. These
user-contributed packages are maintained by ~2900 people [2] - that's 350
new maintainers and many more contributors. More people to thank than ever
before - don't forget about them, e.g. cite properly when publishing.
Congratulations!

I have often wondered about the natural history of R packages: how often they
are created and shared, how long they are used, how many people use them,
how long they are maintained, etc.  The usage numbers are hard to get, but the
"Last modified" dates in the CRAN archives do give some information on how
often new packages are shared and how long they are maintained.

Here are some summaries of derived from those dates.  The code to get the
data and calculate (and plot) the summaries follows.

newPkgsByYear
1997-01-01 1998-01-01 1999-01-01 2000-01-01
          2         12         56         41
2001-01-01 2002-01-01 2003-01-01 2004-01-01
         65         66        101        144
2005-01-01 2006-01-01 2007-01-01 2008-01-01
        209        280        329        374
2009-01-01 2010-01-01 2011-01-01 2012-01-01
        502        546        702        809
2013-01-01
        439
table(nUpdatesSinceSep2011) # number of recent updates (not including original 
submission)
nUpdatesSinceSep2011
    0    1    2    3    4    5    6    7    8    9
2079  963  528  332  238  166   75   79   50   43
   10   11   12   13   14   15   16   17   18   19
   23   22   13   14    8    9   12    5    4    1
   20   21   22   24   26   27   31   32   34
    1    3    1    2    1    2    1    1    1

The code I used is:

library(XML)
getArchiveList <- function(site = 
"http://cran.r-project.org/src/contrib/Archive/";) {
     retval <- readHTMLTable(site, stringsAsFactors=FALSE)[[1]]
     retval <- retval[!is.na(retval$Name) & grepl("/$", retval$Name), ]
     retval$Name <- gsub("/$", "", retval$Name)
     retval$"Last modified" <- as.Date(retval$"Last modified", 
format="%d-%b-%Y")
     retval
}
getArchiveEntry <- function(Name, site = 
"http://cran.r-project.org/src/contrib/Archive/";) {
     retval <- readHTMLTable(paste0(site, Name), stringsAsFactors=FALSE)[[1]]
     retval <- retval[!is.na(retval$Name) & retval$Name != "Parent Directory", ]
     retval$"Last modified" <- as.Date(retval$"Last modified", 
format="%d-%b-%Y")
     retval
}

al <- getArchiveList()
# The next may bog down the CRAN archive server - do not do it often
# ae <- lapply(structure(al$Name, names=al$Name),
#              function(Name)tryCatch(getArchiveEntry(Name),
#                                     error=function(e)data.frame(Name=character(), 
"Last Modified" = as.Date(character()))))

initialSubmissionDate <- as.Date(vapply(ae, function(e)min(e[["Last modified"]]), 0), 
origin=as.Date("1970-01-01"))
lastSubmissionDate <- as.Date(vapply(ae, function(e)max(e[["Last modified"]]), 0), 
origin=as.Date("1970-01-01"))

mths <- seq(as.Date("1997-10-01"), as.Date("2014-01-01"), by="months")
yrs <- seq(as.Date("1997-01-01"), as.Date("2014-01-01"), by="years")

par(ask=TRUE)

newPkgsByMonth <-  table(cut(initialSubmissionDate, mths))
newPkgsByYear <-  table(cut(initialSubmissionDate, yrs))
plot(mths[-1], newPkgsByMonth, log="y", ylab="# New Pkgs", main="New packages by 
month") # number of additions each month

yearsOfMaintainanceActivity <- as.numeric(lastSubmissionDate - initialSubmissionDate, 
units="days")/365.25
hist(yearsOfMaintainanceActivity, xlab="Years", main="Maintainance Duration")

newPkgsByYear
table(floor(yearsOfMaintainanceActivity))

nUpdatesSinceSep2011 <- vapply(ae, function(e){
     Lm <- e[["Last modified"]]
     sum(Lm >= as.Date("2011-09-01") & Lm != min(Lm))}, 0L)
table(nUpdatesSinceSep2011) # number of recent updates (not including original 
submission)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf
Of Henrik Bengtsson
Sent: Friday, November 08, 2013 1:59 PM
To: R Development Mailing List
Subject: [Rd] Milestone: 5000 packages on CRAN

Here we go again...

Today (2011-11-08) on The Comprehensive R Archive Network (CRAN) [1]:

"Currently, the CRAN package repository features 5001 available packages."

Going from 4000 to 5000 packages took 14.5 months - that's one new package
every 10.5 hours. Behind every package there are real people. These
user-contributed packages are maintained by ~2900 people [2] - that's 350
new maintainers and many more contributors. More people to thank than ever
before - don't forget about them, e.g. cite properly when publishing.

Milestones:

2013-11-08: 5000 packages [this post]
2012-08-23: 4000 packages [7]
2011-05-12: 3000 packages [6]
2009-10-04: 2000 packages [5]
2007-04-12: 1000 packages [4]
2004-10-01: 500 packages [3,4]
2003-04-01: 250 packages [3,4]

[1] http://cran.r-project.org/web/packages/
[2] http://cran.r-project.org/web/checks/check_summary_by_maintainer.html
[3] Private data.
[4] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[5] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[6] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
[7] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html

/Henrik

PS. These data are for CRAN only. There are more packages elsewhere, e.g.
R-Forge, Bioconductor, Github etc.

        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to