Hi,
You could also use data.table (see ?data.table):

n <- 300000
set.seed(51)
mat1 <- as.matrix(data.frame(
  REC.TYPE = sample(c("SAO", "FAO", "FL-1", "FL-2", "FL-15"), n, replace = TRUE),
  Col2 = rnorm(n), Col3 = runif(n), stringsAsFactors = FALSE))
dat1 <- as.data.frame(mat1, stringsAsFactors = FALSE)
table(mat1[, 1])
#
#   FAO  FL-1 FL-15  FL-2   SAO
# 60046 60272 59669 59878 60135
system.time(x1 <- subset(mat1, grepl("(SAO|FL-15)", mat1[, "REC.TYPE"])))
#  user  system elapsed
# 0.076   0.004   0.082
system.time(x2 <- subset(mat1, mat1[, "REC.TYPE"] %in% c("SAO", "FL-15")))
#  user  system elapsed
# 0.028   0.000   0.030

system.time(x3 <- mat1[match(mat1[, "REC.TYPE"]
                            , c("SAO", "FL-15")
                            , nomatch = 0) != 0
                            ,, drop = FALSE]
            )
#user  system elapsed 
#  0.028   0.000   0.028 
 table(x3[,1])
#
#FL-15   SAO 
#59669 60135 
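The x3 expression relies on two details that are easy to get wrong: the argument order of match(), and drop = FALSE. The sketch below uses a small made-up vector instead of mat1, so the values are illustrative only. It shows that match(x, keep, nomatch = 0) != 0 flags every matching row (the same flags as x %in% keep), that reversing the arguments only finds the first position of each wanted value, and that drop = FALSE keeps a one-column matrix from collapsing into a plain vector.

```r
# Toy record types (made up for illustration)
x <- c("FAO", "SAO", "FL-15", "SAO", "FL-2", "FL-15")
keep <- c("SAO", "FL-15")

# match(x, table) returns, for each element of x, its position in table (0 if absent)
m <- match(x, keep, nomatch = 0)   # 0 1 2 1 0 2
rows <- m != 0                     # one TRUE/FALSE flag per row

# base R defines %in% as match(x, table, nomatch = 0L) > 0L, so the flags agree
stopifnot(identical(rows, x %in% keep))

# Reversed arguments answer a different question: where does each wanted value
# first occur in x? That gives at most one row per value, not all matching rows.
match(keep, x)                     # 2 3

# A one-column matrix: without drop = FALSE the result collapses to a plain vector
mat <- matrix(x, ncol = 1, dimnames = list(NULL, "REC.TYPE"))
v <- mat[rows, ]                   # character vector
w <- mat[rows, , drop = FALSE]     # 4 x 1 matrix that keeps its column name
is.matrix(v)                       # FALSE
is.matrix(w)                       # TRUE
```

With several columns, as in mat1, the drop only bites when a single row or column is selected, but keeping drop = FALSE makes the result shape predictable either way.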


library(data.table)

dat2 <- data.table(dat1)
system.time(x4 <- dat2[match(REC.TYPE, c("SAO", "FL-15"), nomatch = 0) != 0, , drop = FALSE])
#  user  system elapsed
# 0.024   0.000   0.025
table(x4$REC.TYPE)
#
# FL-15   SAO
# 59669 60135
A.K.

----- Original Message -----
From: jim holtman <jholt...@gmail.com>
To: Matt Borkowski <mathias1...@yahoo.com>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Sent: Sunday, March 3, 2013 11:52 AM
Subject: Re: [R] Help searching a matrix for only certain records

If you are using matrices, then here are several ways of doing it for a
matrix with 300,000 rows.  You can decide whether a difference of 0.1
seconds matters for the performance you are after.  It is taking you
more time to type in the statements than it is taking them to execute:

> n <- 300000
> testdata <- matrix(
+     sample(c("SAO ", "FL-15", "Other"), n, TRUE, prob = c(1,2,1000))
+     , nrow = n
+     , dimnames = list(NULL, "REC.TYPE")
+     )
> table(testdata[, "REC.TYPE"])

FL-15  Other   SAO
   562 299151    287
> system.time(x1 <- subset(testdata, grepl("(SAO |FL-15)", testdata[, "REC.TYPE"])))
   user  system elapsed
   0.17    0.00    0.17
> system.time(x2 <- subset(testdata, testdata[, "REC.TYPE"] %in% c("SAO ", "FL-15")))
   user  system elapsed
   0.05    0.00    0.05
> system.time(x3 <- testdata[match(testdata[, "REC.TYPE"]
+                             , c("SAO ", "FL-15")
+                             , nomatch = 0) != 0
+                             ,, drop = FALSE]
+             )
   user  system elapsed
   0.03    0.00    0.03
> identical(x1, x2)
[1] TRUE
> identical(x2, x3)
[1] TRUE
>
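Both transcripts also use grepl() with an unanchored pattern. That is safe for the record types generated here, but a pattern such as "(SAO|FL-15)" matches any string that merely contains one of the alternatives. The values below are hypothetical, chosen only to show the difference between the unanchored pattern, an anchored one, and exact matching with %in%:

```r
# Hypothetical record types, chosen to show how an unanchored pattern over-matches
rec <- c("SAO", "FL-15", "FL-150", "XSAO", "FL-1")

# Unanchored: TRUE for any string that *contains* "SAO" or "FL-15"
loose <- grepl("(SAO|FL-15)", rec)

# Anchored with ^ and $: the whole string must equal one of the alternatives
strict <- grepl("^(SAO|FL-15)$", rec)

# Exact comparison, no regular expressions involved
exact <- rec %in% c("SAO", "FL-15")

rec[loose]                 # "SAO" "FL-15" "FL-150" "XSAO"
rec[strict]                # "SAO" "FL-15"
identical(strict, exact)   # TRUE
```

For a fixed list of codes, %in% also avoids regular-expression escaping altogether: a code containing "." or "+" would otherwise need to be escaped inside the pattern.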


On Sun, Mar 3, 2013 at 11:22 AM, Jim Holtman <jholt...@gmail.com> wrote:
> There are far "more efficient" ways of doing many of these operations, but you 
> probably won't see any difference unless you have very large objects 
> (several hundred thousand entries) or have to do it many times.  My 
> background is in computer performance, and for the most part I have found that 
> the easiest, most straightforward ways are fine most of the time.
>
> A more efficient way might be:
>
> testdata <- testdata[match(testdata$REC.TYPE, c('SAO ', 'FL-15'), nomatch = 0) != 0, , drop = FALSE]
>
> You can always use system.time() to measure how long an operation takes.
>
> For matching against multiple values, use %in%.
>
> On Mar 3, 2013, at 9:22, Matt Borkowski <mathias1...@yahoo.com> wrote:
>
>> Thank you for your response, Jim! I will give this one a try! But I have a 
>> couple of follow-up questions...
>>
>> In my search for a solution, I had seen something stating match() is much 
>> more efficient than subset() and will cut down significantly on computing 
>> time. Is there any truth to that?
>>
>> Also, I found the following solution, which works for matching a single 
>> condition, but I couldn't quite figure out how to modify it to search 
>> for both of my acceptable conditions...
>>
>>> testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]
>>
>> -Matt
>>
>>
>>
>>
>> --- On Sun, 3/3/13, jim holtman <jholt...@gmail.com> wrote:
>>
>> From: jim holtman <jholt...@gmail.com>
>> Subject: Re: [R] Help searching a matrix for only certain records
>> To: "Matt Borkowski" <mathias1...@yahoo.com>
>> Cc: r-help@r-project.org
>> Date: Sunday, March 3, 2013, 8:00 AM
>>
>> Try this:
>>
>> dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))
>>
>>
>> On Sun, Mar 3, 2013 at 1:11 AM, Matt Borkowski <mathias1...@yahoo.com> wrote:
>>> Let me start by saying I am rather new to R and generally consider myself 
>>> to be a novice programmer...so don't assume I know what I'm doing :)
>>>
>>> I have a large matrix, approximately 300,000 x 14. It's essentially a 
>>> 20-year dataset of 15-minute data. However, I only need the rows where the 
>>> column I've named REC.TYPE contains the string "SAO  " or "FL-15".
>>>
>>> My horribly inefficient solution was to search the matrix row by row, test 
>>> the REC.TYPE column and essentially delete the row if it did not match my 
>>> criteria. Essentially...
>>>
>>>> j <- 1
>>>> for (i in 1:nrow(dataset)) {
>>>>     if(dataset$REC.TYPE[j] != "SAO  " && dataset$REC.TYPE[j] != "FL-15") {
>>>>       dataset <- dataset[-j,]  }
>>>>     else {
>>>>       j <- j+1  }
>>>> }
>>>
>>> After watching my code get through only about 10% of the matrix in an hour 
>>> and slowing with every row...I figure there must be a more efficient way of 
>>> pulling out only the records I need...especially when I need to repeat this 
>>> for another 8 datasets.
>>>
>>> Can anyone point me in the right direction?
>>>
>>> Thanks!
>>>
>>> Matt
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
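The row-by-row loop in the original question is slow mainly because dataset <- dataset[-j,] builds a new copy of the data on every deletion, so the total work grows roughly with the number of rows times the number of deletions. A vectorized test followed by a single subsetting step makes one pass instead. The helper below is a hypothetical sketch, not code from this thread: the name keep_rec_types, its default types, and the trimws() normalization are assumptions. It assumes each dataset is a data frame with a character REC.TYPE column, and uses lapply() to run the same filter over several datasets.

```r
# Hypothetical helper: keep only rows whose REC.TYPE is one of 'types'.
# Assumes each dataset is a data frame with a character REC.TYPE column.
# trimws() (base R >= 3.2.0) is an assumption about the data: it makes "SAO  "
# and "SAO " compare equal to "SAO"; drop it if trailing blanks are significant.
keep_rec_types <- function(df, types = c("SAO", "FL-15")) {
  hit <- trimws(df$REC.TYPE) %in% types   # one vectorized test over all rows
  df[hit, , drop = FALSE]                 # a single subsetting step, no per-row copies
}

# Made-up example data standing in for the real datasets
d1 <- data.frame(REC.TYPE = c("SAO  ", "FAO", "FL-15", "FL-1"), val = 1:4,
                 stringsAsFactors = FALSE)
d2 <- data.frame(REC.TYPE = c("FL-15", "FL-2", "SAO "), val = 5:7,
                 stringsAsFactors = FALSE)

# Apply the same filter to every dataset in one call
filtered <- lapply(list(d1, d2), keep_rec_types)
filtered[[1]]$val   # 1 3
filtered[[2]]$val   # 5 7
```

If trailing blanks are meaningful in REC.TYPE, remove the trimws() call and pass the exact strings (for example "SAO  ") in types.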



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.