Saturday, August 31, 2024 7:15 AM
To: r-help@R-project.org
Subject: [R] aggregating data with quality control
Dear R-list users,
I deal with semi-hourly data from automatic meteorological stations.
They have to pass a manual validation; suppose that status = "C" stands for
correct and s
Sofia
Subject: Re: [R] aggregating data with quality control
On Sat, 31 Aug 2024 11:15:10 +
Ste
At 12:15 on 31/08/2024, Stefano Sofia wrote:
Dear R-list users,
I deal with semi-hourly data from automatic meteorological stations.
They have to pass a manual validation; suppose that status = "C" stands for correct and
status = "D" for discarded.
Here is a simple example with "Snow height"
On Sat, 31 Aug 2024 11:15:10 +
Stefano Sofia wrote:
> Evaluating the daily mean independently of the status is very easy:
>
> aggregate(mydf$hs, by=list(format(mydf$data_POSIX, "%Y"),
> format(mydf$data_POSIX, "%m"), format(mydf$data_POSIX, "%d")),
> my.mean)
>
>
> Things become more comp
Dear R-list users,
I deal with semi-hourly data from automatic meteorological stations.
They have to pass a manual validation; suppose that status = "C" stands for
correct and status = "D" for discarded.
Here is a simple example with "Snow height" (HS):
mydf <- data.frame(data_POSIX=seq(as.POSIX
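One possible approach, as a minimal sketch only: it assumes the goal is a daily mean of hs computed from the rows with status "C", reported as NA when too few validated values remain in a day. The toy data and the min_valid threshold are hypothetical and are not taken from the original post.

```r
# Hypothetical semi-hourly data covering two days (not the poster's data)
mydf <- data.frame(
  data_POSIX = seq(as.POSIXct("2024-01-01 00:00", tz = "UTC"),
                   by = "30 min", length.out = 96),
  hs = rep(c(10, 20), each = 48),
  status = "C",
  stringsAsFactors = FALSE
)
mydf$status[1:4] <- "D"     # four discarded values on day 1
mydf$status[49:96] <- "D"   # the whole of day 2 discarded

# Assumed rule: a day needs at least min_valid validated values
min_valid <- 24

day <- format(mydf$data_POSIX, "%Y-%m-%d")
ok <- mydf$status == "C"

# Count validated values per day, and average only over them
n_valid <- tapply(ok, day, sum)
mean_hs <- tapply(ifelse(ok, mydf$hs, NA), day, mean, na.rm = TRUE)

daily <- data.frame(day = names(n_valid),
                    n_valid = as.vector(n_valid),
                    mean_hs = as.vector(mean_hs))

# Days with too few validated values get NA instead of a mean
daily$mean_hs[daily$n_valid < min_valid] <- NA
print(daily)
```

Keeping n_valid next to the mean makes it easy to change the acceptance rule later without recomputing the grouping.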
R-Help
Please disregard, as I figured something out, unless there is a more elegant
way ...
myData.sum <- aggregate(x =
myData[c("s72","s79","s82","s83","s116","s119")],
FUN = sum,
by = list(Group.date = myData$shortdate))
> head(myData.sum)
Group.date s7
R-Help
I created a "shortdate" for the purpose of aggregating each variable (s72 ... s119)
by daily sum, but I am not sure how to handle this using a POSIXlt object.
> myData$shortdate <- strftime(myData$time, format="%Y/%m/%d")
> head(myData)
time s72 s79 s82 s83 s116 s119 shortdate
1 2016-10-
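A minimal sketch of an alternative to the strftime() string key, assuming the time values are POSIXlt as described: as.Date() turns them into a Date grouping column that aggregate() can use directly. The values below are invented, and only two of the sensor columns are shown.

```r
# Hypothetical data: timestamps and values are invented for illustration
time <- as.POSIXlt(c("2016-10-01 00:30:00", "2016-10-01 12:00:00",
                     "2016-10-02 06:00:00", "2016-10-02 18:30:00"),
                   tz = "UTC")
myData <- data.frame(s72 = c(1, 2, 3, 4), s79 = c(10, 20, 30, 40))

# Derive a Date column from the POSIXlt values and group on it
myData$day <- as.Date(time)

# Daily sums for several columns at once via the formula interface
myData.sum <- aggregate(cbind(s72, s79) ~ day, data = myData, FUN = sum)
print(myData.sum)
```

Note that the formula interface drops rows containing NA in any of the listed columns by default.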
check this out
http://www.r-bloggers.com/pivot-tables-in-r/
Hi:
Here's a way using the reshape2 package.
library('reshape2')
rsub <- subset(rtest, concept %in% c('8.2.D', '8.3.A', '8.3.B'))
# want year ahead of concept in the variable list
rsub <- rsub[, c(1:4, 9, 5:8)]
dcast(rsub, id + test + subject + grade + year ~ concept, value.var = 'per_corr')
# Us
Hello,
I have a dataset with student performance on a math test. There are
multiple cases for each student (identified by id) and the concept as a
variable.
> rtest
id test subject grade concept correct tested per_corr year
11 83 Mathema 8 8.2.D 1
1",
> "10001", "10002", "10003", "10004"))
>
>
>
> I'd like to aggregate the data by the date. I'd like to have a table with
> the median C_lo and C_hi values grouped by date.
> I'd also like to plot these points with d
with(results, points(date, C_lo))
with(results, points(date, C_hi))
--
David.
For plyr, would it be something like: ddply(results, .(date),median,
na.rm=T)
I tried making a for loop to get the medians, but that doesn't work either.
splitresults = split (results, results$date, drop=T)
mediann <- matrix (,seq_along(splitresults),2)
for (i in seq_along(splitresults)) {
piece <- splitresults[[i]]
mediann [i,1] <- unique(piece$date)
mediann [i,2
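The per-date medians that the loop is aiming for can be sketched in base R with aggregate(). This is only an illustration: it assumes results has columns named date, C_lo and C_hi, as the question suggests, and the values are invented.

```r
# Hypothetical data with the column names mentioned in the question
results <- data.frame(
  date = rep(c("2011-08-01", "2011-08-02"), each = 3),
  C_lo = c(1, 2, 9, 4, 5, 6),
  C_hi = c(10, 20, 90, 40, 50, 60),
  stringsAsFactors = FALSE
)

# One row per date, with the median of each measurement column
med <- aggregate(cbind(C_lo, C_hi) ~ date, data = results, FUN = median)
print(med)
```

median() expects a numeric vector rather than a whole data frame, which is why the function is applied column by column here.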
Hi:
This is the type of problem at which the plyr package excels. Write a
utility function that produces the plot you want using a data frame as
its input argument, and then do something like
library('plyr')
d_ply(results, .(a, b, c), plotfun)
where plotfun is a placeholder for the name of the n
split() might be useful.
On Fri, Aug 5, 2011 at 12:55 PM, Jeffrey Joh wrote:
>
>
> I aggregated my data: aggresults <-aggregate(results, by=list(results$a,
> results$b, results$c), FUN=mean, na.rm=TRUE)
>
>
>
> results has about 8000 lines of data, and aggresults has about 80 lines. I
> woul
I aggregated my data: aggresults <-aggregate(results, by=list(results$a,
results$b, results$c), FUN=mean, na.rm=TRUE)
results has about 8000 lines of data, and aggresults has about 80 lines. I
would like to create a separate variable for each of the 80 aggregates, each
containing the 100
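A minimal sketch of the split() idea, assuming results has grouping columns a, b and c plus at least one measurement column (called value here, a hypothetical name). Instead of one variable per group, it keeps the pieces in a named list.

```r
# Hypothetical data: the grouping columns follow the question, value is invented
results <- data.frame(
  a = c(1, 1, 2, 2, 2),
  b = c("x", "x", "y", "y", "y"),
  c = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  value = c(0.5, 1.5, 2, 4, 6),
  stringsAsFactors = FALSE
)

# One data frame per observed combination of a, b and c;
# drop = TRUE omits combinations that never occur
pieces <- split(results, list(results$a, results$b, results$c), drop = TRUE)

# Each piece can then be processed on its own, for example:
group_means <- sapply(pieces, function(piece) mean(piece$value, na.rm = TRUE))
print(names(pieces))
print(group_means)
```

A list is usually easier to loop over than dozens of separately named variables, and each element is still reachable by name, e.g. pieces[["1.x.TRUE"]].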
Hi,
You can get it with "by":
foo <- function(x)c(length(x$probe), mean(x$exp))
res <- by(df[c('exp', 'probe')], df['gene'], FUN=foo)
do.call(rbind, res)
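For anyone who wants to run this, a self-contained variant with toy data follows. The data frame, its values and the result names n_probes and mean_exp are hypothetical; only the column names gene, probe and exp come from the call above.

```r
# Hypothetical data using the column names from the call above
df <- data.frame(
  gene = c("A", "A", "A", "E", "F", "F"),
  probe = 1:6,
  exp = c(0.34, 0.21, 0.10, 0.11, 0.60, 0.77),
  stringsAsFactors = FALSE
)

# Number of probes and mean expression for one gene's rows
foo <- function(x) c(n_probes = length(x$probe), mean_exp = mean(x$exp))

# Apply foo to each gene's subset, then stack the results into a matrix
res <- by(df[c("exp", "probe")], df["gene"], FUN = foo)
out <- do.call(rbind, res)
print(out)
```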
Bye,
Oscar.
--
Oscar Perpiñán Lamigueiro
Dpto. Ingeniería Eléctrica
EUITI-UPM
http://procomun.wordpress.com
El Thu, 30 Jun 2011 17:28:02
If you have a large data table, you might consider using 'data.table',
which performs better than 'plyr'
> x <- read.table(textConnection("Gene ProbeID Expression_Level
+ A 1 0.34
+ A 2 0.21
+ E 3 0
ne), median(df$exp))
  gene V1    V2
1    A  3 0.210
2    E  1 0.110
3    F  2 0.685
best
iain
--- On Thu, 30/6/11, Max Mariasegaram wrote:
> From: Max Mariasegaram
> Subject: [R] aggregating data
> To: "r-help@r-project.org"
> Date: Thursday, 30 June, 2011, 8:28
> Hi,
>
> I am interested in using the cast function in R to perform
> some aggrega
Hi,
I am interested in using the cast function in R to perform some aggregation. I
did once manage to get it working, but have now forgotten how I did this. So
here is my dilemma. I have several thousands of probes (about 180,000)
corresponding to each gene; what I'd like to do is obtain a f
Have a look at match and merge.
Hadley
On Wednesday, September 8, 2010, Michael Haenlein
wrote:
> Dear all,
>
> I'm working with two data frames.
>
> The first frame (agg_data) consists of two columns. agg_data[,1] is a unique
> ID for each row and agg_data[,2] contains a continuous variable.
>
>
Dear all,
I'm working with two data frames.
The first frame (agg_data) consists of two columns. agg_data[,1] is a unique
ID for each row and agg_data[,2] contains a continuous variable.
The second data frame (geo_data) consists of several columns. One of these
columns (geo_data$ZCTA) corresponds
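A minimal sketch of the match and merge suggestion. The question is cut off above, so this assumes that geo_data$ZCTA holds the same kind of ID as the first column of agg_data. The column names id, value and region and all values are hypothetical.

```r
# Hypothetical data: only the names agg_data, geo_data and ZCTA come from the post
agg_data <- data.frame(id = c("10001", "10002", "10003"),
                       value = c(1.5, 2.5, 3.5),
                       stringsAsFactors = FALSE)
geo_data <- data.frame(ZCTA = c("10002", "10003", "10004"),
                       region = c("north", "south", "east"),
                       stringsAsFactors = FALSE)

# merge(): a left join that keeps every geo_data row; unmatched IDs get NA
merged <- merge(geo_data, agg_data, by.x = "ZCTA", by.y = "id", all.x = TRUE)

# match(): a lookup that adds the value as a new column, keeping row order
geo_data$value <- agg_data$value[match(geo_data$ZCTA, agg_data$id)]

print(merged)
print(geo_data)
```

merge() sorts its result by the key column by default, whereas the match() lookup leaves geo_data in its original order.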