There are more efficient ways of doing many of these operations, but you
probably won't see any difference unless you have very large objects (several
hundred thousand entries) or have to repeat the operation many times. My
background is in computer performance, and for the most part I have found that
the easiest, most straightforward approaches are fine most of the time.
A more efficient way might be:
testdata <- testdata[!is.na(match(testdata$REC.TYPE, c('SAO ', 'FL-15'))), ]
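As a small illustration of how match() behaves in this kind of filter: match(x, table) returns, for each element of x, the position of its first match in table, or NA when there is none. The toy data frame and the names 'wanted', 'pos', 'kept' and 'first_only' below are made up for the example; this is only a sketch, not your real data.

```r
# Toy data standing in for the real dataset (all values are invented).
testdata <- data.frame(
  REC.TYPE = c("SAO ", "XYZ", "FL-15", "SAO ", "ABC"),
  VALUE    = 1:5,
  stringsAsFactors = FALSE
)
wanted <- c("SAO ", "FL-15")

# For each row: position of its REC.TYPE within 'wanted', or NA if absent.
pos <- match(testdata$REC.TYPE, wanted)

# Keep every row whose REC.TYPE was found in 'wanted'.
kept <- testdata[!is.na(pos), ]

# With the arguments the other way round, match() returns only the FIRST
# row index for each wanted value, so later matching rows would be missed.
first_only <- match(wanted, testdata$REC.TYPE)

print(kept)
print(first_only)
```

The argument order matters: the column goes first and the wanted values second when the goal is to keep all matching rows.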
You can always use 'system.time' to determine how long an operation takes.
For multiple comparisons, use %in%, e.g. testdata$REC.TYPE %in% c('SAO ', 'FL-15').
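A rough sketch of timing two of the approaches from this thread with system.time(). The data here is synthetic (300,000 rows with invented record types "OTHER" and "SPECI" added as non-matching values), so the timings only hint at what you might see on your own data.

```r
# Synthetic stand-in for the real data: 300,000 rows, invented record types.
set.seed(1)
n <- 300000
testdata <- data.frame(
  REC.TYPE = sample(c("SAO ", "FL-15", "OTHER", "SPECI"), n, replace = TRUE),
  VALUE    = seq_len(n),
  stringsAsFactors = FALSE
)
wanted <- c("SAO ", "FL-15")

# Vectorised filter with %in%, timed.
t_in <- system.time(
  by_in <- testdata[testdata$REC.TYPE %in% wanted, ]
)

# Regular-expression filter with subset() and grepl(), timed the same way.
t_grepl <- system.time(
  by_grepl <- subset(testdata, grepl("(SAO |FL-15)", REC.TYPE))
)

print(t_in)
print(t_grepl)

# Both approaches should select the same rows on this data.
stopifnot(identical(by_in$VALUE, by_grepl$VALUE))
```

Either way the whole column is tested in one vectorised call, which is what avoids the row-by-row loop.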
On Mar 3, 2013, at 9:22, Matt Borkowski <[email protected]> wrote:
> Thank you for your response Jim! I will give this one a try! But a couple
> followup questions...
>
> In my search for a solution, I had seen something stating match() is much
> more efficient than subset() and will cut down significantly on computing
> time. Is there any truth to that?
>
> Also, I found the following solution which works for matching a single
> condition, but I couldn't quite figure out how to modify it to search for
> both my acceptable conditions...
>
>> testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]
>
> -Matt
>
>
>
>
> --- On Sun, 3/3/13, jim holtman <[email protected]> wrote:
>
> From: jim holtman <[email protected]>
> Subject: Re: [R] Help searching a matrix for only certain records
> To: "Matt Borkowski" <[email protected]>
> Cc: [email protected]
> Date: Sunday, March 3, 2013, 8:00 AM
>
> Try this:
>
> dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))
>
>
> On Sun, Mar 3, 2013 at 1:11 AM, Matt Borkowski <[email protected]> wrote:
>> Let me start by saying I am rather new to R and generally consider myself to
>> be a novice programmer...so don't assume I know what I'm doing :)
>>
>> I have a large matrix, approximately 300,000 x 14. It's essentially a
>> 20-year dataset of 15-minute data. However, I only need the rows where the
>> column I've named REC.TYPE contains the string "SAO " or "FL-15".
>>
>> My horribly inefficient solution was to search the matrix row by row, test
>> the REC.TYPE column and essentially delete the row if it did not match my
>> criteria. Essentially...
>>
>>> j <- 1
>>> for (i in 1:nrow(dataset)) {
>>>   if (dataset$REC.TYPE[j] != "SAO " && dataset$REC.TYPE[j] != "FL-15") {
>>>     dataset <- dataset[-j, ]
>>>   } else {
>>>     j <- j + 1
>>>   }
>>> }
>>
>> After watching my code get through only about 10% of the matrix in an hour
>> and slowing with every row...I figure there must be a more efficient way of
>> pulling out only the records I need...especially when I need to repeat this
>> for another 8 datasets.
>>
>> Can anyone point me in the right direction?
>>
>> Thanks!
>>
>> Matt
>>
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>