On Tue, 9 Dec 2008, tyler wrote:

Hi,

I'm analyzing a large number of large simulation datasets, and I've
isolated one of the bottlenecks. Any help in speeding it up would be
appreciated.


Cast the neighborhoods as an indicator matrix, then use matrix multiplications:

 system.time(tmp <- time.test(dat))
   user  system elapsed
   1.18    0.00    1.20
system.time({
+ mn <- with(dat,outer(1:25,1:25, function(i,j) abs(X[i]-X[j])<2 & abs(Y[i]-Y[j])<2 
& i!=j ))
+ print(all.equal(rowSums(mn%*%as.matrix(dat[,-(1:2)])>0),tmp))})
[1] TRUE
   user  system elapsed
      0       0       0


HTH,

Chuck




`dat` is a dataframe of samples from a regular grid. The first two
columns are the spatial coordinates of the samples, the remaining 20
columns are the abundances of species in each cell. I need to calculate
the species richness in adjacent cells for each cell in the sample.
For example, if I have nine cells in my dataframe (X = 1:3, Y = 1:3):

 a b c
 d e f
 g h i

I need to calculate the neighbour-richness for each cell; for a, this is
the richness of cells b, d and e combined. The neighbour richness of
cell e would be the combined richness of all the other eight cells.

The following code does what I what, but it's slow. The sample dataset
'dat', below, represents a 5x5 grid, 25 samples. It takes about 1.5
seconds on my computer. The largest samples I am working with have a 51
x 51 grid (2601 cells) and take 4.5 minutes. This is manageable, but
since I have potentially hundreds of these analyses to run, trimming
that down would be very helpful.

After loading the function and the data, the call

 system.time(tmp <- time.test(dat))

Will run the code. Note that I've excised this from a larger, more
general function, after determining that for large datasets this section
is responsible for a slowdown from 10-12 seconds to ca. 250 seconds.

Thanks for your patience,

Tyler


time.test <- function(dat) {

 cen <- dat
 grps <- 5
 n.rich <- numeric(grps^2)
 n.ind <- 1

 for(i in 1:grps)
   for (j in 1:grps) {
     n.cen <- numeric(ncol(cen) - 2)
     neighbours <- expand.grid((j-1):(j+1), (i-1):(i+1))
     neighbours <- neighbours[-5,]
     neighbours <- neighbours[which(neighbours[,1] %in% 1:grps &
                                    neighbours[,2] %in% 1:grps),]

     for (k in 1:nrow(neighbours))
       n.cen <- n.cen + cen[cen$X == neighbours[k,1] &
                            cen$Y == neighbours[k,2], -c(1:2)]

     n.rich[n.ind] <- sum(as.logical(n.cen))
     n.ind <- n.ind + 1
   }

   return(n.rich)
}

`dat` <- structure(list(
 X = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1,
 2, 3, 4, 5), Y = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4,
 4, 4, 4, 4, 5, 5, 5, 5, 5), V1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 45L, 131L, 0L, 0L, 34L,
 481L, 1744L), V2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 1L, 88L, 0L, 70L, 101L, 13L, 634L, 0L, 0L, 71L, 640L, 1636L), V3
 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 49L, 3L, 113L,
 1L, 44L, 167L, 336L, 933L, 0L, 14L, 388L, 1180L, 1709L), V4 = c(0L,
 0L, 0L, 0L, 0L, 0L, 3L, 12L, 0L, 0L, 2L, 1L, 36L, 45L, 208L, 7L,
 221L, 213L, 371L, 1440L, 26L, 211L, 389L, 1382L, 1614L), V5 = c(96L,
 7L, 0L, 0L, 0L, 10L, 17L, 0L, 5L, 0L, 0L, 11L, 151L, 127L, 160L,
 27L, 388L, 439L, 1117L, 1571L, 81L, 598L, 1107L, 1402L, 891L), V6 =
 c(16L, 30L, 13L, 0L, 0L, 10L, 195L, 60L, 29L, 29L, 1L, 107L, 698L,
 596L, 655L, 227L, 287L, 677L, 1477L, 1336L, 425L, 873L, 961L, 1360L,
 1175L), V7 = c(249L, 101L, 69L, 0L, 18L, 186L, 331L, 291L, 259L,
 248L, 336L, 404L, 642L, 632L, 775L, 455L, 801L, 697L, 1063L, 978L,
 626L, 686L, 1204L, 1138L, 627L), V8 = c(300L, 163L, 65L, 145L, 377L,
 257L, 690L, 655L, 420L, 288L, 346L, 461L, 1276L, 897L, 633L, 812L,
 1018L, 1337L, 1295L, 1163L, 550L, 1104L, 768L, 933L, 433L), V9 =
 c(555L, 478L, 374L, 349L, 357L, 360L, 905L, 954L, 552L, 438L, 703L,
 984L, 1616L, 1732L, 1234L, 1213L, 1518L, 1746L, 1191L, 967L, 1394L,
 1722L, 1706L, 610L, 169L), V10 = c(1527L, 1019L, 926L, 401L, 830L,
 833L, 931L, 816L, 1126L, 1232L, 1067L, 1169L, 1270L, 1277L, 1145L,
 1159L, 1072L, 1534L, 997L, 391L, 1328L, 1414L, 1037L, 444L, 1L), V11
 = c(1468L, 1329L, 1013L, 603L, 1096L, 1237L, 1488L, 1189L, 1064L,
 1303L, 1258L, 1479L, 1421L, 1365L, 1101L, 1415L, 1145L, 1329L,
 1325L, 236L, 1379L, 1199L, 729L, 328L, 0L), V12 = c(983L, 1459L,
 791L, 898L, 911L, 1215L, 1528L, 960L, 1172L, 1286L, 1358L, 722L,
 857L, 1478L, 1452L, 1502L, 1013L, 745L, 455L, 149L, 1686L, 917L,
 1013L, 84L, 0L), V13 = c(1326L, 1336L, 1110L, 1737L, 1062L, 1578L,
 1382L, 1537L, 1366L, 1308L, 1301L, 1357L, 746L, 622L, 934L, 1132L,
 954L, 460L, 270L, 65L, 957L, 699L, 521L, 18L, 1L), V14 = c(1047L,
 1315L, 1506L, 1562L, 1254L, 1336L, 1106L, 1213L, 1220L, 1457L, 858L,
 1606L, 590L, 726L, 598L, 945L, 732L, 258L, 45L, 6L, 937L, 436L, 43L,
 0L, 0L), V15 = c(845L, 935L, 1295L, 1077L, 1400L, 1049L, 802L,
 1247L, 1449L, 1046L, 1134L, 877L, 327L, 352L, 470L, 564L, 461L,
 166L, 0L, 0L, 230L, 110L, 29L, 0L, 0L), V16 = c(784L, 675L, 1157L,
 1488L, 1511L, 1004L, 420L, 523L, 733L, 724L, 833L, 542L, 171L, 116L,
 384L, 357L, 197L, 0L, 0L, 0L, 246L, 0L, 0L, 0L, 0L), V17 = c(444L,
 873L, 530L, 596L, 448L, 431L, 109L, 446L, 378L, 243L, 284L, 148L,
 69L, 30L, 6L, 71L, 32L, 131L, 0L, 0L, 120L, 0L, 0L, 0L, 0L), V18 =
 c(307L, 128L, 823L, 566L, 383L, 454L, 62L, 90L, 99L, 261L, 351L,
 94L, 81L, 0L, 37L, 113L, 28L, 0L, 0L, 0L, 15L, 17L, 0L, 0L, 0L), V19
 = c(53L, 138L, 207L, 451L, 282L, 40L, 31L, 7L, 128L, 137L, 166L,
 38L, 0L, 1L, 0L, 0L, 19L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V20 =
 c(0L, 14L, 121L, 127L, 71L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 7L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("X", "Y", "V1",
 "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12",
 "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20"), row.names =
 c(1L, 2L, 3L, 4L, 5L, 7L, 8L, 9L, 10L, 11L, 13L, 14L, 15L, 16L, 17L,
 19L, 20L, 21L, 22L, 23L, 25L, 26L, 27L, 28L, 29L), class =
 "data.frame")

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to