Hi Luigi.
Weird. But maybe it is the desired behaviour of summary when calculating
mean of numeric column full of NAs.
See example
dat <- data.frame(x=rep(NA, 110), y=rep(1, 110), z= rnorm(110))
# change all values in second column to NA
dat[,2] <- NA
# change some of them to NaN
dat[5:6, 2:3]
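The `Mean :NaN` that summary() reports for a numeric column full of NAs can be reproduced in isolation. This is a minimal sketch with a made-up vector, not the poster's data; the assumption is that summary() drops the NAs first, so the mean is taken over zero values.

```r
# Hypothetical all-NA numeric column (not the poster's data)
x <- rep(NA_real_, 110)

# Once the NAs are dropped nothing is left: the mean of an empty
# numeric vector is NaN, while the quantiles come back as NA.
m <- mean(x, na.rm = TRUE)
q <- quantile(x, na.rm = TRUE)

is.nan(m)       # TRUE
all(is.na(q))   # TRUE
summary(x)
```

So converting NaN to NA in the data does not change this line of the summary; the NaN is produced by the mean itself.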
Abou,
I am not trying to be negative. Assuming you are a professor of Statistics,
your request seems odd as what you are asking about is very routine in much of
statistical work where you want to make a model or something using just part of
your data and need to reserve some to check if you
Hi Abou,
One way is to shuffle the original data frame using sample() and
split the result into three equal parts.
I was going to provide example code, but Avi's response popped up and
I kind of agree with him.
Jim
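For readers of the archive, the shuffle-and-split idea might look like the sketch below. It is not Jim's code: the data frame `dat`, its column names, and the seed are all made up for illustration, since the attached file is not shown here.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-in for the attached data: an ID column and a "Data" column
dat <- data.frame(ID = 1:12, Data = round(rnorm(12), 2))

# Shuffle the rows, then deal them into three groups of (near) equal size
shuffled <- dat[sample(nrow(dat)), ]
group <- rep(1:3, length.out = nrow(shuffled))
parts <- split(shuffled, group)

sapply(parts, nrow)
```

Using `rep(1:3, length.out = ...)` keeps the group sizes within one of each other when the number of rows is not a multiple of three.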
On Fri, Sep 3, 2021 at 11:31 AM AbouEl-Makarim Aboueissa
wrote:
>
> Dear All:
Sorry, please forget about it. I was being entirely serious when I
posted my question.
with thanks
abou
__
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern
What is stopping you Abou?
Some of us here start wondering if we have better things to do than homework
for others. Help is supposed to come after they have tried and run into issues
that we may help with.
So think about your problem. You supplied data in a file that is NOT in CSV
format but is in Tab
Dear All:
How can I split a column of data *randomly* into three groups? Please see the
attached data. I need to split column #2, titled "Data".
with many thanks
abou
__
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of
Hi Eliza
This seems to work:
plot(BFA3[, 1], BFA3[, 4],
     pch = 16, xlab = "", ylab = "",
     col = (BFA3[, 2] == BFA3[, 3]) + 2, axes = FALSE)
but I have no idea what you are trying to do with the
as.numeric(as.Date(...))
business.
Jim
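The `col = (... == ...) + 2` idiom in the plot call relies on logical values being coerced to 0 and 1 in arithmetic. A tiny sketch with a made-up logical vector (not the BFA3 data) shows the resulting colour indices:

```r
# Hypothetical comparison result standing in for BFA3[, 2] == BFA3[, 3]
same <- c(TRUE, FALSE, TRUE)

# FALSE becomes 0 and TRUE becomes 1, so the colour index is 2 or 3
cols <- same + 2
cols
```

Points where the two columns agree get palette colour 3, and the rest get palette colour 2.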
On Fri, Sep 3, 2021 at 8:44 AM Eliza Botto wrote:
>
> Dear useRs,
>
> For the
Dear useRs,
For the following dataset,
dput(BFA3)
structure(c(17532, 17533, 17534, 17535, 17536, 17537, 17538,
17539, 17540, 17541, 17542, 17543, 17544, 17545, 17546, 17547,
17548, 17549, 17550, 17551, 17552, 17553, 17554, 17555, 17556,
17557, 17558, 17559, 17560, 17561, 17562, 17563, 17564, 175
Thanks, that is perfect!
On Thu, Sep 2, 2021 at 7:02 PM Deepayan Sarkar
wrote:
>
> On Thu, Sep 2, 2021 at 9:26 PM Enrico Schumann
> wrote:
> >
> > On Thu, 02 Sep 2021, Luigi Marongiu writes:
> >
> > > Hello, is it possible to show only the header (that is: `'data.frame':
> > > x obs. of y vari
Hello,
I believe, though I have no references for it, that str was meant for
interactive use, not for use in a script or package. If this is the case, then
it should be rare to have to output to an object such as a character vector.
As for my solution, it is far from perfect; I try to avoid
capture.ou
On 02/09/2021 3:20 p.m., Greg Minshall wrote:
Andrew,
x[] <- lapply(x, function(xx) {
xx[is.nan(xx)] <- NA_real_
xx
})
is different from
x <- lapply(x, function(xx) {
xx[is.nan(xx)] <- NA_real_
xx
})
indeed, the two are different -- but some ignorance of mine is exposed.
Regardless of whether you use the lower-level split function, or the
higher-level aggregate function, or the tidyverse group_by function, the key is
learning how to create the column that is the same for all records
corresponding to the time interval of interest.
If you convert the sampdate to
On Thu, 2 Sep 2021, Andrew Simmons wrote:
You could use 'split' to create a list of data frames, and then apply a
function to each to get the means and sds.
cols <- "cfs" # add more as necessary
S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
means <- do.call("rbind",
Andrew,
> x[] <- lapply(x, function(xx) {
> xx[is.nan(xx)] <- NA_real_
> xx
> })
>
> is different from
>
> x <- lapply(x, function(xx) {
> xx[is.nan(xx)] <- NA_real_
> xx
> })
indeed, the two are different -- but some ignorance of mine is exposed.
i wonder, can you explain why t
You could use 'split' to create a list of data frames, and then apply a
function to each to get the means and sds.
cols <- "cfs" # add more as necessary
S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
sds <-
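The snippet above is cut off in the archive, so the original `sds` line is not recoverable. A self-contained sketch of the same split-then-summarise idea follows; the `discharge` data are invented, and the `sds` line is one plausible way to mirror the `means` line, not necessarily what was posted.

```r
set.seed(1)  # arbitrary seed for the made-up data

# Hypothetical daily discharge record covering January to March 2021
discharge <- data.frame(
  sampdate = as.Date("2021-01-01") + 0:89,
  cfs = runif(90, 100, 200)
)

cols <- "cfs"  # add more as necessary
# One data frame per calendar month, keyed by "YYYY-MM"
S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))

means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
# Assumed counterpart for standard deviations (not from the original post)
sds <- do.call("rbind", lapply(S, function(d) sapply(d, sd, na.rm = TRUE)))

means
sds
```

Both results are matrices with one row per month and one column per entry in `cols`.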
On Thu, 2 Sep 2021, Rich Shepard wrote:
If I correctly understand the output of as.POSIXlt each date and time
element is separate, so input such as 2016-03-03 12:00 would now be 2016 03
03 12 00 (I've not read how the elements are separated). (The TZ is not
important because all data are either
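For reference, the components of a POSIXlt value can be inspected directly. This is a small sketch using the example timestamp from the message; the choice of `tz = "UTC"` is an assumption made only to keep the result independent of the local time zone.

```r
lt <- as.POSIXlt("2016-03-03 12:00", tz = "UTC")

# The elements are stored as named list components, not as separated text
lt$year + 1900   # years are counted from 1900
lt$mon + 1       # months are counted from 0
lt$mday
lt$hour
lt$min
```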
On Mon, 30 Aug 2021, Richard O'Keefe wrote:
x <- rnorm(samples.per.day * 365)
length(x)
[1] 105120
Reshape the fake data into a matrix where each row represents one
24-hour period.
m <- matrix(x, ncol=samples.per.day, byrow=TRUE)
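A runnable version of the reshape step, with one summary per day added, might look like this. The value `samples.per.day <- 288` is an assumption inferred from the length shown above (105120 / 365), and `rowMeans()` is just one example of a per-day summary.

```r
samples.per.day <- 288   # assumed: 105120 values / 365 days
set.seed(1)              # arbitrary seed for the fake data
x <- rnorm(samples.per.day * 365)

# Each row holds one 24-hour period because the matrix is filled by row
m <- matrix(x, ncol = samples.per.day, byrow = TRUE)
dim(m)

# One mean per day
daily.means <- rowMeans(m)
length(daily.means)
```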
Richard,
Now I understand the need to keep the date and tim
On Thu, 2 Sep 2021, Enrico Schumann wrote:
There is no column 'ht'.
Enrico,
New eyeballs caught my change in variable name that I kept missing.
Thanks very much,
Rich
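For anyone hitting the same error, the effect of assigning from a column that does not exist can be reproduced with a small made-up data frame. The column name `height` is hypothetical; the real file's column name is not shown in the thread.

```r
# Hypothetical data: the column is called 'height', not 'ht'
stage <- data.frame(
  sampdate = c("2021-09-01", "2021-09-02"),
  height = c("1.5", "1.7"),
  stringsAsFactors = FALSE
)

# A missing column comes back as NULL, and as.numeric(NULL) has length 0
is.null(stage$ht)
length(as.numeric(stage$ht))

# Assigning that zero-length vector to a new column fails
res <- tryCatch({
  stage$ht <- as.numeric(stage$ht)
  "no error"
}, error = function(e) conditionMessage(e))
res
```

Checking `names(stage)` right after read.csv() is a quick way to catch this kind of mismatch.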
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://st
On Thu, Sep 2, 2021 at 9:26 PM Enrico Schumann wrote:
>
> On Thu, 02 Sep 2021, Luigi Marongiu writes:
>
> > Hello, is it possible to show only the header (that is: `'data.frame':
> > x obs. of y variables:` part) of the str function?
> > Thank you
>
> Perhaps one more solution. You could limit th
On Thu, 02 Sep 2021, Rich Shepard writes:
> The first three commands in the script are:
> stage <- read.csv('../data/water/gauge-ht.dat', header
> = TRUE, sep = ',', stringsAsFactors = FALSE)
> stage$sampdate <- as.Date(stage$sampdate)
> stage$ht <- as.numeric(stage$ht, length = 6)
>
> Running the
Thanks for the interesting method Rui. So that is a way to do a redirect of
output not to a sinkfile but to an in-memory variable as a textConnection.
Of course, one has to wonder why the makers of str thought it would be too
inefficient to have an option that returns the output in a form that c
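As an illustration of redirecting printed output into an in-memory variable, here is a minimal sketch using sink() with a textConnection. It is not Rui's original code, which is not shown in this listing, and the variable name `out_lines` is made up.

```r
# Open a write-mode text connection backed by the character vector 'out_lines'
out_lines <- character()
con <- textConnection("out_lines", open = "w")

sink(con)     # divert printed output to the connection
str(iris)
sink()        # restore normal output
close(con)

out_lines[1]  # the header line of the str() output
```

capture.output() wraps the same mechanism in a single call and is usually the shorter option.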
The first three commands in the script are:
stage <- read.csv('../data/water/gauge-ht.dat', header = TRUE, sep = ',',
stringsAsFactors = FALSE)
stage$sampdate <- as.Date(stage$sampdate)
stage$ht <- as.numeric(stage$ht, length = 6)
Running the script produces this error:
source('stage.R')
Erro
On Thu, 02 Sep 2021, Luigi Marongiu writes:
> Hello, is it possible to show only the header (that is: `'data.frame':
> x obs. of y variables:` part) of the str function?
> Thank you
Perhaps one more solution. You could limit the number
of list components to be printed, though it will leave
a "tr
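One way to try the idea of limiting the printed list components is the `list.len` argument of str(). This is a sketch on the built-in iris data, and whether it matches the code in the original message is an assumption, since that message is cut off here.

```r
# Full output versus output with no list components printed
out_full  <- capture.output(str(iris))
out_short <- capture.output(str(iris, list.len = 0))

out_short
```

The header line should still come first, followed by a note that the list output was truncated.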
Luigi,
If you are sure you are looking at something like a data.frame, and all you
want to know is how many rows and how many columns are in it, then str() is
perhaps too detailed a tool.
The functions nrow() and ncol() tell you what you want and you can get both
together with dim(). You can, of c
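A short sketch of that approach on the built-in iris data, including one way to build a str()-like header line by hand (the sprintf() format is made up for illustration):

```r
nrow(iris)
ncol(iris)

d <- dim(iris)   # rows first, then columns

# Hand-built header in the spirit of the first line of str()
header <- sprintf("'%s': %d obs. of %d variables", class(iris)[1], d[1], d[2])
header
```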
Thank you!
On Thu, Sep 2, 2021 at 4:17 PM Andrew Simmons wrote:
>
> It seems like you might've missed one more thing, you need the brackets next
> to 'x' to get it to work.
>
>
> x[] <- lapply(x, function(xx) {
> xx[is.nan(xx)] <- NA_real_
> xx
> })
>
> is different from
>
> x <- lapply(
It seems like you might've missed one more thing, you need the brackets
next to 'x' to get it to work.
x[] <- lapply(x, function(xx) {
xx[is.nan(xx)] <- NA_real_
xx
})
is different from
x <- lapply(x, function(xx) {
xx[is.nan(xx)] <- NA_real_
xx
})
Also, if all of your data is
Sorry,
still I don't get it:
```
> dim(df)
[1] 302 626
> # clean
> df <- lapply(x, function(xx) {
+ xx[is.nan(xx)] <- NA
+ xx
+ })
> dim(df)
NULL
```
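The `NULL` from dim() is what one would expect when the result of lapply() replaces the whole object: lapply() returns a plain list, which has no dimensions. A minimal sketch with a small made-up data frame (the helper name `fix_nan` is hypothetical) contrasts the two assignments:

```r
x <- data.frame(a = c(1, NaN, 3), b = c(NaN, 5, 6))

# Hypothetical helper: replace NaN with NA in one column and return the column
fix_nan <- function(xx) {
  xx[is.nan(xx)] <- NA_real_
  xx
}

# Plain assignment: the result is a list, so dim() is NULL
as_list <- lapply(x, fix_nan)
class(as_list)
dim(as_list)

# Assignment into x[]: the columns are replaced, the data frame is kept
y <- x
y[] <- lapply(y, fix_nan)
class(y)
dim(y)
```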
On Thu, Sep 2, 2021 at 3:47 PM Andrew Simmons wrote:
>
> You removed the second line 'xx' from the function, put it back and it should
> work
Hello,
In the particular case you have, to change to NA based on condition, use
`is.na<-`.
Here is some test data, 3 times the same df.
set.seed(2021)
df3 <- df2 <- df1 <- data.frame(
x = c(0, 0, 1, 2, 3),
y = c(1, 2, 3, 0, 0),
z = rbinom(5, 1, prob = c(0.25, 0.75)),
a = letters[1:5]
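The data frame definition above is cut off in the archive. As a standalone illustration of `is.na<-`, here is a minimal sketch on a made-up vector and a made-up data frame column; the condition `== 0` is chosen only for the example.

```r
# Set elements to NA wherever a condition holds
v <- c(0, 0, 1, 2, 3)
is.na(v) <- v == 0
v

# The same replacement form works on a data frame column
d <- data.frame(y = c(1, 2, 3, 0, 0))
is.na(d$y) <- d$y == 0
d
```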
Hi
you could operate with whole data frame (sometimes)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2
You removed the second line 'xx' from the function, put it back and it
should work
On Thu, Sep 2, 2021, 09:45 Luigi Marongiu wrote:
> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I
> still get NaN when using the summary function; for instance, one of the
> columns gives:
> ```
`data[sapply(data, is.nan)] <- NA` is a nice compact command, but I
still get NaN when using the summary function; for instance, one of the
columns gives:
```
Min. : NA
1st Qu.: NA
Median : NA
Mean :NaN
3rd Qu.: NA
Max. : NA
NA's :110
```
I tried to implement the second solution but:
```
df <
Hello,
it is possible to select the columns of a dataframe in sequence with:
```
for(i in 1:ncol(df)) {
  df[ , i]
}
# or
for(i in 1:ncol(df)) {
  df[i]
}
```
And change all values with, for instance:
```
for(i in 1:ncol(df)) {
df[ , i] <- df[ , i] + 10
}
```
Is it possible to apply a condition?
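One way to apply a condition inside such a loop is to build a logical index per column and change only the matching rows. This is a sketch on a made-up data frame; the threshold `> 10` and the name `hit` are arbitrary choices for the example.

```r
df <- data.frame(a = c(1, 20, 3), b = c(15, 2, 30))

for (i in seq_along(df)) {
  # TRUE where the condition holds; NAs are treated as "no match"
  hit <- !is.na(df[[i]]) & df[[i]] > 10
  # Change only the rows that satisfy the condition
  df[hit, i] <- df[hit, i] + 10
}

df
```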
Hello,
I would use something like:
x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
as.data.frame()
x[] <- lapply(x, function(xx) {
xx[is.nan(xx)] <- NA_real_
xx
})
This prevents attributes from being changed in 'x', but accomplishes the
same thing as you have abov
Hi
what about
data[sapply(data, is.nan)] <- NA
Cheers
Petr
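A quick check of that one-liner on a small made-up data frame is sketched below. It assumes every column is numeric, so that sapply() returns a logical matrix with the same shape as the data.

```r
data <- data.frame(a = c(1, NaN, 3), b = c(NaN, 5, 6))

# One logical column per data column, TRUE where the value is NaN
mask <- sapply(data, is.nan)
dim(mask)

data[mask] <- NA
data
```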
> -----Original Message-----
> From: R-help On Behalf Of Luigi Marongiu
> Sent: Thursday, September 2, 2021 3:18 PM
> To: r-help
> Subject: [R] How to globally convert NaN to NA in dataframe?
>
> Hello,
> I have some NaN values in so
Hello,
I have some NaN values in some elements of a dataframe that I would
like to convert to NA.
The command `df1$col[is.nan(df1$col)]<-NA` works column-wise.
Is there an alternative for the global modification at once of all
instances?
I have seen from
https://stackoverflow.com/questions
Thank you! better than dim() anyway.
Best regards
Luigi
On Thu, Sep 2, 2021 at 1:31 PM Rui Barradas wrote:
>
> Hello,
>
> Not perfect but works for data.frames:
>
>
> header_str <- function(x){
>   capture.output(str(x))[[1]]
> }
> header_str(iris)
> header_str(AirPassengers)
> header_str(1:10)
Hello,
Not perfect but works for data.frames:
header_str <- function(x){
  capture.output(str(x))[[1]]
}
header_str(iris)
header_str(AirPassengers)
header_str(1:10)
Hope this helps,
Rui Barradas
Às 12:02 de 02/09/21, Luigi Marongiu escreveu:
Hello, is it possible to show only the header (t
Hello, is it possible to show only the header (that is: `'data.frame':
x obs. of y variables:` part) of the str function?
Thank you
--
Best regards,
Luigi
On Wed, 1 Sep 2021 19:29:32 -0400
Duncan Murdoch wrote:
> I don't know the header of your foo() method, but let's suppose foo()
> is
>
>    foo <- function(x, data, ...) {
>      UseMethod("foo")
>    }
>
> with
>
>    foo.formula <- function(x, data, ...) {
> # do something with the
Hello,
With the new data, here are two ways.
The first with a for loop. I find it simple and readable.
for(b in unique(B[,1])){
A[which(A[,1] == b), 2] <- B[which(B[,1] == b), 2]
}
na <- is.na(A[,2])
A[!na, 2]
sum(!na) # [1] 216
sum(A[,1] %in% B[,1]) # [1] 216
# Another way,
Thank you, Eric. Very useful.
From: Eric Berger
Sent: Wednesday, September 1, 2021 12:31 PM
To: cag...@gmail.com
Cc: R mailing list
Subject: Re: [R] how to install npsm package
Instructions can be found at https://github.com/kloke/npsm
On Wed, Sep 1, 2021 at 6:27 PM mailto:cag...
Dear useRs,
I'm having trouble combining geom_boxplot and geom_point with jitter.
It is difficult to explain but the code and result should make it clear
(the example dataset is long so I copy it at the end of the email):
p <- ggplot(my_data, aes(x = Diet, y = value, color = Software))
p <
Thank you.
el
On 02/09/2021 00:41, Bill Dunlap wrote:
z <- tibble(Code=c("NA","NZ",NA), Name=c("Namibia","New Zealand","?"))
z
# A tibble: 3 x 2
Code Name
1 NA    Namibia
2 NZ    New Zealand
3 ?
subset(z, Code=="NA")
# A tibble: 1 x 2
Code Name
1 NANamibia
subset(z,
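The difference between the string "NA" and a missing value can also be seen in base R. The sketch below uses a plain data.frame instead of the tibble from the message, so the printed output will look different; the data values are the ones shown above.

```r
z <- data.frame(Code = c("NA", "NZ", NA),
                Name = c("Namibia", "New Zealand", "?"))

# subset() drops rows where the condition evaluates to NA
by_subset <- subset(z, Code == "NA")
nrow(by_subset)

# Logical indexing keeps an all-NA row for each NA in the condition
by_index <- z[z$Code == "NA", ]
nrow(by_index)

# Rows with a genuinely missing code
missing_code <- z[is.na(z$Code), ]
missing_code$Name
```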