Re: [R] How can I eliminate a loop over a data.table?

Blaser Nello Tue, 19 Mar 2013 00:54:52 -0700

It seems like this is what you want to do, although there is probably a better 
way to do it.


A.DT <- data.table(a1 = A.DT[,a1], 
                   a2 = sort(ifelse(B.DT[,b2] <= N/2 &
                                      B.DT[,b1] < A.DT[nrow(A.DT):1,a1], 
                                    B.DT[nrow(A.DT):1,b1], 
                                    A.DT[nrow(A.DT):1,a2]), na.last=FALSE))
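
As for the error itself: data.table's documented way to update a single cell 
is `:=` or set(), both of which assign by reference, so it may be worth trying 
those in place of `A.DT[j,]$a2 <- ...`. A minimal sketch on a toy table (DT and 
its values are made up for illustration, and a2 is created as NA_real_ so that 
it can hold numbers):

```r
library(data.table)

# DT is a toy table made up for this illustration
DT <- data.table(a1 = c(0.1, 0.2, 0.3), a2 = NA_real_)

# update row 3 of column a2 in place with `:=`
DT[3, a2 := -0.5]

# the same kind of update with set(), which has less overhead inside a loop
set(DT, i = 2L, j = "a2", value = 0.25)

print(DT)
```

Neither form copies the table, so no `A.DT <- ...` reassignment is needed 
afterwards.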


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Matteo Richiardi
Sent: Tuesday, 19 March 2013 08:24
To: r-help@r-project.org
Subject: [R] How can I eliminate a loop over a data.table?

I've two data.tables as shown below:
***
library(data.table)
N = 10
A.DT <- data.table(a1 = c(rnorm(N,0,1)), a2 = NA)
B.DT <- data.table(b1 = c(rnorm(N,0,1)), b2 = 1:N)
setkey(A.DT,a1)
setkey(B.DT,b1)
***

I tried to change my previous data.frame implementation to a data.table 
implementation by changing the for-loop as shown below:
***
for (i in 1:nrow(B.DT)) {
  for (j in nrow(A.DT):1) {
    if (B.DT[i,b2] <= N/2
        && B.DT[i,b1] < A.DT[j,a1]) {
      A.DT[j,]$a2 <- B.DT[i,]$b1
      break
    }
  }
}
***

I get the following error message:
***
Error in `[<-.data.table`(`*tmp*`, j, a2, value = -0.391987468746123) :
  object "a2" not found
***

I think the way I access data.table is not quite right. I am new to it. I guess 
there is a quicker way of doing it than cycling up and down the two datatables.
I'd like to know if the loop shown above could be simplified/vectorised.
The data.table data for copy/paste reads:

***
# A.DT
    a1  a2
1   -1.4917779  NA
2   -1.0731161  NA
3   -0.7533091  NA
4   -0.3673273  NA
5   -0.159569   NA
6   -0.1551948  NA
7   -0.0430574  NA
8   0.1783496   NA
9   0.4276034   NA
10  1.0697412   NA

# B.DT
    b1  b2
1   0.64229018  1
2   1.00527902  2
3   0.24746294  3
4   -0.50288835 4
5   0.34447791  5
6   -0.22205129 6
7   0.60099079  7
8   -0.70242284 8
9   0.6298599   9
10  0.08917988  10
***

The output I expect is:
***
# OUTPUT
    a1  a2
1   -1.4917779  NA
2   -1.0731161  NA
3   -0.7533091  NA
4   -0.3673273  NA
5   -0.159569   NA
6   -0.1551948  NA
7   -0.0430574  NA
8   0.1783496   -0.50288835
9   0.4276034   0.24746294
10  1.0697412   0.64229018
***

The algorithm goes down one table and, for each row, goes up the other table, 
checks some conditions and modifies values accordingly. More specifically, it 
goes down B.DT, and for each row in B.DT it goes up A.DT and assigns b1 to a2 
in the first row where b1 is smaller than a1. An additional condition is 
checked before assignment (b2 being equal to or smaller than 5 in this example).

0.64229018 is the first value in B.DT, and it is assigned to the last unit of 
A.DT.
1.00527902 is the second value in B.DT, but it is left unassigned because it is 
bigger than all other values in A.DT.
0.24746294 is the third value in B.DT, and it is assigned to the second last 
unit in A.DT.
-0.50288835 is the fourth value in B.DT, and it is assigned to unit #8 in A.DT.
0.34447791 is the fifth value in B.DT, and it is left unassigned because it is 
too big.
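
The steps above can be sketched as a loop over the posted data. This is only a 
sketch under two assumptions that the worked example suggests but the loop as 
posted does not state: a row of A.DT that already holds a value is skipped, and 
B.DT is traversed in b2 order (so B.DT is not re-keyed on b1 here). a2 is 
created as NA_real_ so that it can hold numbers, and set() is data.table's 
assignment by reference.

```r
library(data.table)

# data copied from the tables above
A.DT <- data.table(
  a1 = c(-1.4917779, -1.0731161, -0.7533091, -0.3673273, -0.159569,
         -0.1551948, -0.0430574, 0.1783496, 0.4276034, 1.0697412),
  a2 = NA_real_
)
B.DT <- data.table(
  b1 = c(0.64229018, 1.00527902, 0.24746294, -0.50288835, 0.34447791,
         -0.22205129, 0.60099079, -0.70242284, 0.6298599, 0.08917988),
  b2 = 1:10
)
N <- nrow(B.DT)

for (i in seq_len(nrow(B.DT))) {
  # additional condition: only rows with b2 <= N/2 are considered
  if (B.DT$b2[i] > N / 2) next
  # assumption: rows of A.DT that already have a value are skipped
  free <- which(is.na(A.DT$a2) & A.DT$a1 > B.DT$b1[i])
  if (length(free) > 0L) {
    # going up A.DT from the bottom means taking the largest row index
    set(A.DT, i = max(free), j = "a2", value = B.DT$b1[i])
  }
}

print(A.DT)
```

On the posted data this fills rows 8, 9 and 10 of a2 with -0.50288835, 
0.24746294 and 0.64229018, as in the expected output, and leaves the rest NA.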

This is of course a simplified problem (and therefore may not make much sense). 
Thanks for your time and input.

Matteo


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
