Yes, even

> summary(NA_real_)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
     NA      NA      NA     NaN      NA      NA       1
which is presumably because the mean is an empty sum (= 0) divided by a zero count, and 0/0 = NaN. Notice also the difference between

> mean(NA_real_)
[1] NA
> mean(NA_real_, na.rm=TRUE)
[1] NaN

> On 3 Sep 2021, at 09:59 , Luigi Marongiu <marongiu.lu...@gmail.com> wrote:
>
> Fair enough, I'll check the actual data to see if there are indeed any
> NaN (there should be none, since the data are categories, not generated by
> math).
> Thanks!
>
> On Fri, Sep 3, 2021 at 8:26 AM PIKAL Petr <petr.pi...@precheza.cz> wrote:
>>
>> Hi Luigi.
>>
>> Weird. But maybe it is the desired behaviour of summary when calculating
>> the mean of a numeric column full of NAs.
>>
>> See example
>>
>> dat <- data.frame(x=rep(NA, 110), y=rep(1, 110), z= rnorm(110))
>>
>> # change all values in second column to NA
>> dat[,2] <- NA
>> # change some of them to NaN
>> dat[5:6, 2:3] <- 0/0
>>
>> # see summary
>> summary(dat)
>>      x                y              z
>>  Mode:logical   Min.   : NA   Min.   :-1.9798
>>  NA's:110       1st Qu.: NA   1st Qu.:-0.4729
>>                 Median : NA   Median : 0.1745
>>                 Mean   :NaN   Mean   : 0.1856
>>                 3rd Qu.: NA   3rd Qu.: 0.8017
>>                 Max.   : NA   Max.   : 2.5075
>>                 NA's   :110   NA's   :2
>>
>> # change NaN values to NA
>> dat[sapply(dat, is.nan)] <- NA
>>
>> # summary is the same
>> summary(dat)
>>      x                y              z
>>  Mode:logical   Min.   : NA   Min.   :-1.9798
>>  NA's:110       1st Qu.: NA   1st Qu.:-0.4729
>>                 Median : NA   Median : 0.1745
>>                 Mean   :NaN   Mean   : 0.1856
>>                 3rd Qu.: NA   3rd Qu.: 0.8017
>>                 Max.   : NA   Max.   : 2.5075
>>                 NA's   :110   NA's   :2
>>
>> # but no NaN value in data
>> dat[1:10,]
>>     x  y          z
>> 1  NA NA -0.9148696
>> 2  NA NA  0.7110570
>> 3  NA NA -0.1901676
>> 4  NA NA  0.5900650
>> 5  NA NA         NA
>> 6  NA NA         NA
>> 7  NA NA  0.7987658
>> 8  NA NA -0.5225229
>> 9  NA NA  0.7673103
>> 10 NA NA -0.5263897
>>
>> So my "nice compact command"
>> dat[sapply(dat, is.nan)] <- NA
>>
>> works as expected, but summary still gives NaN as the mean.
>>
>> Cheers
>> Petr
>>
>>> -----Original Message-----
>>> From: R-help <r-help-boun...@r-project.org> On Behalf Of Luigi Marongiu
>>> Sent: Thursday, September 2, 2021 3:46 PM
>>> To: Andrew Simmons <akwsi...@gmail.com>
>>> Cc: r-help <r-help@r-project.org>
>>> Subject: Re: [R] How to globally convert NaN to NA in dataframe?
>>>
>>> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I still get
>>> NaN when using the summary function; for instance, one of the columns gives:
>>> ```
>>> Min.   : NA
>>> 1st Qu.: NA
>>> Median : NA
>>> Mean   :NaN
>>> 3rd Qu.: NA
>>> Max.   : NA
>>> NA's   :110
>>> ```
>>> I tried to implement the second solution but:
>>> ```
>>> df <- lapply(x, function(xx) {
>>>   xx[is.nan(xx)] <- NA
>>> })
>>> > str(df)
>>> List of 1
>>>  $ sd_ef_rash_loc___palm: logi NA
>>> ```
>>> What am I getting wrong?
>>> Thanks
>>>
>>> On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons <akwsi...@gmail.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I would use something like:
>>>>
>>>> x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
>>>>     as.data.frame()
>>>> x[] <- lapply(x, function(xx) {
>>>>     xx[is.nan(xx)] <- NA_real_
>>>>     xx
>>>> })
>>>>
>>>> This prevents attributes from being changed in 'x', but accomplishes the
>>>> same thing as you have above. I hope this helps!
>>>>
>>>> On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu <marongiu.lu...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>> I have some NaN values in some elements of a dataframe that I would
>>>>> like to convert to NA.
>>>>> The command `df1$col[is.nan(df1$col)]<-NA` lets me work column-wise.
>>>>> Is there an alternative for the global modification at once of all
>>>>> instances?
>>>>> I have seen from
>>>>> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
>>>>> that one could use:
>>>>> ```
>>>>>
>>>>> is.nan.data.frame <- function(x)
>>>>>   do.call(cbind, lapply(x, is.nan))
>>>>>
>>>>> data123[is.nan(data123)] <- 0
>>>>> ```
>>>>> replacing 0 with NA, but I got
>>>>> ```
>>>>> str(df)
>>>>>  logi NA
>>>>> ```
>>>>> when modifying my dataframe df.
>>>>> What would be the correct syntax?
>>>>> Thank you
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Luigi
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> --
>>> Best regards,
>>> Luigi
>
> --
> Best regards,
> Luigi
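A minimal sketch of the whole-data-frame NaN to NA conversion discussed above, assuming base R only. The data frame `x` and its columns `a` and `b` are hypothetical, and the `is.numeric()` guard is an added assumption rather than something from the thread. The two differences from Luigi's `lapply()` attempt are that the anonymous function ends by returning `xx` (an R assignment evaluates to the assigned value, so a function ending in `xx[...] <- NA` returns `NA`) and that the result is written into `x[]`, as in Andrew's version, so `x` stays a data frame.

```r
# Hypothetical data: column `a` has one NaN, column `b` has no real values
x <- data.frame(a = c(1, NaN, 3), b = c(NA_real_, NaN, NA_real_))

# Replace NaN with NA in every numeric column.
# - Return `xx` explicitly: a function ending in an assignment
#   returns the assigned value (NA), not the modified vector.
# - Assign into `x[]` rather than `x`, so the data frame structure and
#   attributes are kept instead of being replaced by lapply()'s plain list.
x[] <- lapply(x, function(xx) {
  # Assumed guard: only numeric (double/integer) columns are touched
  if (is.numeric(xx)) xx[is.nan(xx)] <- NA
  xx
})

# Check that no NaN remains in any column
has_nan <- any(vapply(x, function(xx) any(is.nan(xx)), logical(1)))
print(has_nan)  # FALSE

# Column `b` is now all NA, yet summary() still reports its mean as NaN,
# because the mean of zero non-missing values is 0/0
summary(x$b)
```

Logical and character columns cannot hold NaN, which is why the guard simply passes them through; note that `is.numeric()` is also FALSE for complex columns, which can hold NaN, so widen the guard if such columns occur.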
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
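Peter's explanation of the NaN reported by `summary()` can be checked in a few lines of base R. This is a minimal sketch that retraces the reasoning in his reply (an empty sum divided by a zero count) rather than the actual internals of `mean()`; the names `kept`, `empty_sum`, `empty_count` and `ratio` are illustrative only.

```r
x <- NA_real_

# What mean(x, na.rm = TRUE) is left with once the NA is removed
kept <- x[!is.na(x)]         # numeric(0): no values remain
empty_sum <- sum(kept)       # 0, the empty sum
empty_count <- length(kept)  # 0, the zero count
ratio <- empty_sum / empty_count
print(ratio)                 # NaN, since 0/0 is undefined

# The contrast pointed out in the reply
print(mean(x))                # NA: the missing value propagates
print(mean(x, na.rm = TRUE))  # NaN: the mean of an empty vector
```

So a Mean of NaN in `summary()` for an all-NA column does not mean the data contain NaN; it is produced by the calculation itself.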