... actually, FWIW, I would say that this little discussion mostly demonstrates why the OP's request is probably not a good idea in the first place. Usually, NAs should be left as NAs, to be dealt with properly by R and its packages. In biological measurements, for example, NAs often mean "below the ability to reliably measure." Biologists with whom I've worked over many years often want to convert these to 0 or omit the cases, both of which lead to biased estimates and/or underestimates of variability, and to excess claims of "statistical significance" (for those who belong to this religious persuasion). One should never say never, but I suspect that there are relatively few circumstances where the conversion the OP requested is actually wise.
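Bert's point that both recoding to 0 and omitting the cases bias an estimate can be made concrete with a toy vector. This is only a sketch: the values and the detection limit `lod` are hypothetical, and it assumes that every NA hides a true value somewhere in [0, lod).

```r
# Hypothetical assay values; NA means "below the detection limit".
# Neither the numbers nor lod come from this thread.
x   <- c(2.1, NA, 3.4, NA, 2.8)
lod <- 1.0                                   # assumed limit of detection

# Naive option 1: omit the NAs, keeping only the measurable values.
mean_omit <- mean(x, na.rm = TRUE)

# Naive option 2: recode NA to 0, as the OP asked.
x_zero <- x
x_zero[is.na(x_zero)] <- 0
mean_zero <- mean(x_zero)

# Each hidden value lies in [0, lod), so the true mean is at least mean_zero
# and strictly below the mean obtained by substituting lod for every NA.
x_lod <- x
x_lod[is.na(x_lod)] <- lod
mean_upper <- mean(x_lod)

print(c(zero = mean_zero, upper = mean_upper, omit = mean_omit))
```

Here omitting the NAs gives a mean above the largest value the true mean could take, while recoding to 0 sits at the lowest possible value, which is only correct if every hidden measurement really is 0.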
Feel free to ignore/reject such extraneous comments of course.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Jun 23, 2016 at 12:14 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> Good point. I did not think about factors. Also your example raises another
> issue since column c is logical, but gets silently converted to numeric. This
> would seem to get the job done assuming the conversion is intended for
> numeric columns only:
>
>> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
>> sapply(test, class)
>         a         b         c
> "numeric"  "factor" "logical"
>> num <- sapply(test, is.numeric)
>> test[, num][is.na(test[, num])] <- 0
>> test
>   a    b  c
> 1 1    A NA
> 2 0    b NA
> 3 2 <NA> NA
>
> David C
>
> -----Original Message-----
> From: Bert Gunter [mailto:bgunter.4...@gmail.com]
> Sent: Thursday, June 23, 2016 1:48 PM
> To: David L Carlson
> Cc: Ivan Calandra; R Help
> Subject: Re: [R] Subscripting problem with is.na()
>
> Not in general, David:
>
> e.g.
>
>> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
>
>> is.na(test)
>          a     b    c
> [1,] FALSE FALSE TRUE
> [2,]  TRUE FALSE TRUE
> [3,] FALSE  TRUE TRUE
>
>> test[is.na(test)]
> [1] NA NA NA NA NA
>
>> test[is.na(test)] <- 0
> Warning message:
> In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
>   invalid factor level, NA generated
>
>> test
>   a    b c
> 1 1    A 0
> 2 0    b 0
> 3 2 <NA> 0
>
> The problem is the default conversion to factors and the replacement
> operation for factors. So:
>
>> test <- data.frame(a=c(1,NA,2), b = I(c("A","b",NA_character_)), c= rep(NA,3))
>> class(test$b)
> [1] "AsIs"   ## so NOT a factor
>
>> test[is.na(test)] <- 0   # now works as you describe
>> test
>   a b c
> 1 1 A 0
> 2 0 b 0
> 3 2 0 0
>
> Of course the OP (and you) probably had a data frame of all numerics
> in mind, so the problem doesn't arise.
> But I think one needs to make the distinction and issue clear.
>
> Cheers,
> Bert
>
> On Thu, Jun 23, 2016 at 8:46 AM, David L Carlson <dcarl...@tamu.edu> wrote:
>> The function is.na() returns a matrix when applied to a data.frame so you
>> can easily convert all the NAs to 0's:
>>
>>> ds_test
>>    var1 var2
>> 1     1    1
>> 2     2    2
>> 3     3    3
>> 4    NA   NA
>> 5     5    5
>> 6     6    6
>> 7     7    7
>> 8    NA   NA
>> 9     9    9
>> 10   10   10
>>> is.na(ds_test)
>>        var1  var2
>>  [1,] FALSE FALSE
>>  [2,] FALSE FALSE
>>  [3,] FALSE FALSE
>>  [4,]  TRUE  TRUE
>>  [5,] FALSE FALSE
>>  [6,] FALSE FALSE
>>  [7,] FALSE FALSE
>>  [8,]  TRUE  TRUE
>>  [9,] FALSE FALSE
>> [10,] FALSE FALSE
>>> ds_test[is.na(ds_test)] <- 0
>>> ds_test
>>    var1 var2
>> 1     1    1
>> 2     2    2
>> 3     3    3
>> 4     0    0
>> 5     5    5
>> 6     6    6
>> 7     7    7
>> 8     0    0
>> 9     9    9
>> 10   10   10
>>
>> -------------------------------------
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77840-4352
>>
>> -----Original Message-----
>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
>> Sent: Thursday, June 23, 2016 10:14 AM
>> To: R Help
>> Subject: Re: [R] Subscripting problem with is.na()
>>
>> Thank you Bert for this clarification. It is indeed an important point.
>>
>> Ivan
>>
>> --
>> Ivan Calandra, PhD
>> Scientific Mediator
>> University of Reims Champagne-Ardenne
>> GEGENAA - EA 3795
>> CREA - 2 esplanade Roland Garros
>> 51100 Reims, France
>> +33(0)3 26 77 36 89
>> ivan.calan...@univ-reims.fr
>> --
>> https://www.researchgate.net/profile/Ivan_Calandra
>> https://publons.com/author/705639/
>>
>> On 23/06/2016 at 17:06, Bert Gunter wrote:
>>> Sorry, Ivan, your statement is incorrect:
>>>
>>> "When you use a single bracket on a list with only one argument in
>>> between, then R extracts "elements", i.e. columns in the case of a
>>> data.frame. This explains your errors. "
>>>
>>> e.g.
>>>
>>>> ex <- data.frame(a = 1:3, b = letters[1:3])
>>>> a <- 1:3
>>>> identical(ex[1], a)
>>> [1] FALSE
>>>
>>>> class(ex[1])
>>> [1] "data.frame"
>>>> class(a)
>>> [1] "integer"
>>>
>>> Compare:
>>>
>>>> identical(ex[[1]], a)
>>> [1] TRUE
>>>
>>> Why? Single bracket extraction on a list results in a list; double
>>> bracket extraction results in the element of the list (a "column" in
>>> the case of a data frame, which is a specific kind of list). The
>>> relevant sections of ?Extract are:
>>>
>>> "Indexing by [ is similar to atomic vectors and selects a **list** of
>>> the specified element(s).
>>>
>>> Both [[ and $ select a **single element of the list**. "
>>>
>>> Hope this clarifies this often-confused issue.
>>>
>>> Cheers,
>>> Bert
>>>
>>> On Thu, Jun 23, 2016 at 7:34 AM, Ivan Calandra
>>> <ivan.calan...@univ-reims.fr> wrote:
>>>> My statement "Using a single bracket '[' on a data.frame does the same as
>>>> for matrices: you need to specify rows and columns" was not correct.
>>>>
>>>> When you use a single bracket on a list with only one argument in between,
>>>> then R extracts "elements", i.e. columns in the case of a data.frame. This
>>>> explains your errors.
>>>>
>>>> But it is possible to use a single bracket on a data.frame with 2 arguments
>>>> (rows, columns) separated by a comma, as with matrices. This is the
>>>> solution you received.
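Bert's distinction between `[` and `[[`, and Ivan's note about the two-argument matrix-style form, can be checked side by side on Bert's `ex` data frame. The `drop = FALSE` line goes slightly beyond what was said in the thread; it is standard base R behaviour documented in ?Extract and ?`[.data.frame`.

```r
ex <- data.frame(a = 1:3, b = letters[1:3])
a  <- 1:3

# `[` with a single index returns a list; for a data frame that is a
# one-column data frame, not the column vector.
class(ex[1])                 # "data.frame"
identical(ex[1], a)          # FALSE

# `[[` and `$` extract the element itself, i.e. the column vector.
identical(ex[[1]], a)        # TRUE
identical(ex$a, a)           # TRUE

# Matrix-style `[` with two indices (rows, columns): a single selected
# column is simplified to a vector by default (drop = TRUE) ...
identical(ex[, 1], a)        # TRUE
# ... unless drop = FALSE is given, which keeps the data frame.
class(ex[, 1, drop = FALSE]) # "data.frame"

# Rows go before the comma; an empty slot after it means "all columns".
ex[ex$a > 1, ]
```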
>>>>
>>>> Ivan
>>>>
>>>> On 23/06/2016 at 16:27, Ivan Calandra wrote:
>>>>> Dear Georg,
>>>>>
>>>>> You need to learn a bit more about the subsetting methods, depending on
>>>>> the object structure you're trying to subset.
>>>>>
>>>>> More specifically, when you run this: ds_test[is.na(ds_test$var1)]
>>>>> you get this error: "Error in `[.data.frame`(ds_test, is.na(ds_test$var1)) :
>>>>> undefined columns selected"
>>>>>
>>>>> This means that R does not understand which column you're trying to
>>>>> select. But you're actually trying to select rows.
>>>>>
>>>>> Using a single bracket '[' on a data.frame does the same as for matrices:
>>>>> you need to specify rows and columns, like this:
>>>>> ds_test[is.na(ds_test$var1), ]       ## notice the last comma
>>>>> ds_test[is.na(ds_test$var1), ] <- 0  ## works on all columns because you
>>>>>                                      ## didn't specify any after the comma
>>>>>
>>>>> If you want it only for "var1", then you need to specify the column:
>>>>> ds_test[is.na(ds_test$var1), "var1"] <- 0
>>>>>
>>>>> It's the same problem with your 2nd and 4th tries (4th one has other
>>>>> problems). Your 3rd try does not change ds_test at all.
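Ivan's single-column fix and David's whole-frame fix can be run together on Georg's data. This is a sketch; the copies `ds_one`, `ds_two` and `ds_all` are hypothetical names introduced only so the variants can be compared, and the whole-frame form assumes every column is numeric.

```r
# Georg's example data from the original post.
var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <- data.frame(var1, var2)

# Ivan's fix for one column: logical row index, then the column name.
ds_one <- ds_test
ds_one[is.na(ds_one$var1), "var1"] <- 0   # var2 keeps its NAs

# Equivalent: recode the extracted column vector in place.
ds_two <- ds_test
ds_two$var1[is.na(ds_two$var1)] <- 0

# David's approach for an all-numeric data frame: is.na() on a data frame
# returns a logical matrix, which `[<-` accepts as an index.
ds_all <- ds_test
ds_all[is.na(ds_all)] <- 0
ds_all
```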
>>>>>
>>>>> HTH,
>>>>> Ivan
>>>>>
>>>>> On 23/06/2016 at 15:57, g.maub...@weinwolf.de wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I would like to recode my NAs to 0. Using a single vector everything is
>>>>>> fine.
>>>>>>
>>>>>> But if I use a data.frame things go wrong:
>>>>>>
>>>>>> -- cut --
>>>>>>
>>>>>> var1 <- c(1:3, NA, 5:7, NA, 9:10)
>>>>>> var2 <- c(1:3, NA, 5:7, NA, 9:10)
>>>>>> ds_test <- data.frame(var1, var2)
>>>>>>
>>>>>> test <- var1
>>>>>> test[is.na(test)] <- 0
>>>>>> test # NA recoded OK
>>>>>>
>>>>>> # First try
>>>>>> ds_test[is.na(ds_test$var1)] <- 0 # duplicate subscripts WRONG
>>>>>>
>>>>>> # Second try
>>>>>> ds_test[is.na("var1")] <- 0
>>>>>> ds_test$var1 # not recoded WRONG
>>>>>>
>>>>>> # Third try: to me the most intuitive approach
>>>>>> is.na(ds_test["var1"]) <- 0 # attempt to select less than one element in
>>>>>>                             # integerOneIndex WRONG
>>>>>>
>>>>>> # Fourth try
>>>>>> ds_test[is.na(var1)] <- 0 # duplicate subscripts for columns WRONG
>>>>>>
>>>>>> -- cut --
>>>>>> How can I do it correctly?
>>>>>>
>>>>>> Where could I have found something about it?
>>>>>>
>>>>>> Kind regards
>>>>>>
>>>>>> Georg
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
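On Georg's question of where to look: besides ?Extract, the ?is.na help page explains why his third try cannot work. `is.na(x) <- value` is not a test; it calls the replacement function `is.na<-`, which sets the elements indexed by `value` to NA. A minimal vector sketch of that behaviour, followed by base R's `replace()` as one way (among several) to do the intended NA-to-0 recode:

```r
x <- c(5, 6, 7)

# `is.na(x) <- i` is a replacement call: it SETS element(s) i of x to NA.
is.na(x) <- 2
x                               # 5 NA 7

# The intended recode goes the other way: use is.na(x) as the index.
# replace() returns a modified copy and leaves x itself untouched.
y <- replace(x, is.na(x), 0)
y                               # 5 0 7
```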
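David Carlson's numeric-columns-only recoding, quoted near the top of this thread, can also be written column by column. `recode_numeric_na()` below is a hypothetical helper, not a base R function, and it swaps David's `test[, num]` indexing for `lapply()` over the columns. Since R 4.0.0, `data.frame()` defaults to `stringsAsFactors = FALSE`, so Bert's factor warning only reproduces when factors are requested explicitly, as done here.

```r
# Bert's test frame. stringsAsFactors = TRUE reproduces the pre-4.0.0
# default, under which column b became a factor.
test <- data.frame(a = c(1, NA, 2), b = c("A", "b", NA), c = rep(NA, 3),
                   stringsAsFactors = TRUE)

# Hypothetical helper: replace NAs with `value` in numeric columns only.
# Factor, character and logical columns are returned unchanged.
recode_numeric_na <- function(df, value = 0) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) col[is.na(col)] <- value
    col
  })
  df
}

out <- recode_numeric_na(test)
sapply(out, class)   # same classes as in test
out
```

One design note: an integer column comes back as double when `value` is `0`, because `0` is a double constant; passing `0L` keeps such columns integer.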