On 01/06/2014 11:14 AM, Sarah Goslee wrote:
Hi Walter,

I can't reproduce your results. Please provide some data that
demonstrates the problem.

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

subset() and [ differ in their handling of NA values, and you don't
need the dd$ in the arguments to subset().
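Below is a minimal sketch of the NA difference mentioned above. It uses a small invented data frame: the column names come from Walter's query, but the values, including the NAs, are made up for illustration. It also shows the condition written without the dd$ prefix inside subset().

```r
# Invented example data: column names borrowed from the thread,
# values (including the NAs) made up for illustration only.
dd <- data.frame(
  EVYEAR  = c(2012, 2012, NA, 2012),
  EVMONTH = c("02", "03", "02", NA),
  stringsAsFactors = FALSE
)

# The logical condition itself is NA wherever a compared value is NA.
cond <- dd$EVYEAR == 2012 & dd$EVMONTH == "02"
print(cond)  # TRUE FALSE NA NA

# subset() evaluates the condition inside dd (so no dd$ is needed)
# and treats NA as FALSE: only the genuine match is returned.
via_subset <- subset(dd, EVYEAR == 2012 & EVMONTH == "02")

# [ keeps the NA positions, returning an all-NA row for each one.
via_bracket <- dd[cond, ]

# Wrapping the condition in which() drops the NAs, matching subset().
via_which <- dd[which(cond), ]

print(nrow(via_subset))   # 1
print(nrow(via_bracket))  # 3
print(nrow(via_which))    # 1
```

In this toy case [ returns more rows than subset(), but the extra rows are NA placeholders rather than real records. If the data truly contain no NAs, this difference alone would not account for missing records.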

But those don't explain your result given the information provided.
Please provide more information.

Sarah


On Mon, Jan 6, 2014 at 12:06 PM, Walter Anderson <wandrso...@gmail.com> wrote:
I have a data frame that I am extracting some records from, and I noticed the
following issue:

I originally used tmp <- subset(dd, dd$EVYEAR==2012 & dd$EVMONTH=='02')

and noticed that I wasn't ending up with all of the records I should have;
however, when I used

tmp <- dd[dd$EVYEAR==2012 & dd$EVMONTH=='02',]

I did get all of the records I should have.

I thought the two forms were equivalent; am I mistaken?

Thanks, everyone, for the responses. I didn't provide a reproducible test because the data I experienced this issue with is quite large (> 40MB), and I have not been able to reproduce the problem with any other data set. I have also performed the subset using Microsoft Access on the original dbf file I use for the data frame and confirmed that the second query format (dd[QUERY,]) produces the correct results. As far as I can tell, none of the columns involved (or any other column in the data frame) contain NA values.

I am not really looking for any particular solution, but I was surprised to get different results from what I presumed to be the same query. If this is believed to be a possible bug, I would be glad to package up the data that is generating the issue, but I am not sure where to put such a large data set.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
