Stefano,
I see you already have an answer that works for you.
Sometimes you want to step back and see if some modification makes a problem
easier to solve.
I often simply switch to using tools in the tidyverse such as dplyr for parts
of the job albeit much of the same can be done using functio
Sofia
Subject: Re: [R] aggregating data with quality control
On Sat, 31 Aug 2024 11:15:10 +
Ste
At 12:15 on 31/08/2024, Stefano Sofia wrote:
Dear R-list users,
I deal with semi-hourly data from automatic meteorological stations.
They have to pass a manual validation; suppose that status = "C" stands for correct and
status = "D" for discarded.
Here is a simple example with "Snow height"
On Sat, 31 Aug 2024 11:15:10 +
Stefano Sofia wrote:
> Evaluating the daily mean independently from the status is very easy:
>
> aggregate(mydf$hs, by=list(format(mydf$data_POSIX, "%Y"),
> format(mydf$data_POSIX, "%m"), format(mydf$data_POSIX, "%d")),
> my.mean)
>
>
> Things become more comp
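The quoted question is cut off here, so the following is only a guess at the intent: a minimal sketch that computes the daily mean from validated values only. The column names hs and data_POSIX come from the post; the column name status, the sample values, and the rule "keep only status == "C"" are assumptions made for illustration.

```r
# Hypothetical half-hourly data covering two days (96 records).
# 'hs' and 'data_POSIX' are named in the post; 'status' is an assumed column name.
mydf <- data.frame(
  data_POSIX = as.POSIXct("2024-01-01 00:00", tz = "UTC") + 1800 * (0:95),
  hs = rep(c(10, 20), each = 48),
  status = "C",
  stringsAsFactors = FALSE
)

# Mark the first ten records as discarded and give them an implausible value
mydf$status[1:10] <- "D"
mydf$hs[1:10] <- 999

# Keep only records that passed validation, then average by calendar day
valid <- mydf[mydf$status == "C", ]
valid$day <- as.Date(valid$data_POSIX, tz = "UTC")
daily <- aggregate(hs ~ day, data = valid, FUN = mean)
print(daily)
```

Filtering before aggregating keeps the discarded values out of the mean; if a day must also be rejected when too few records are valid, that check would have to be added separately.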
check this out
http://www.r-bloggers.com/pivot-tables-in-r/
Hi:
Here's a way using the reshape2 package.
library('reshape2')
rsub <- subset(rtest, concept %in% c('8.2.D', '8.3.A', '8.3.B'))
# want year ahead of concept in the variable list
rsub <- rsub[, c(1:4, 9, 5:8)]
# In current reshape2 the function is dcast() and the argument is value.var
dcast(rsub, id + test + subject + grade + year ~ concept, value.var = 'per_corr')
# Us
1",
> "10001", "10002", "10003", "10004"))
>
>
>
> I'd like to aggregate the data by the date. I'd like to have a table with
> the median C_lo and C_hi values grouped by date.
> I'd also like to plot these points with d
with(results, points(date, C_lo))
with(results, points(date, C_hi))
--
David.
For plyr, would it be something like: ddply(results, .(date),median,
na.rm=T)
I tried making a for loop to get the medians, but that doesn't work either.
splitresults = split (results, results$date, drop=T)
mediann <- matrix (,seq_along(splitresults),2)
for (i in seq_along(splitresults)) {
piece <- splitresults[[i]]
mediann [i,1] <- unique(piece$date)
mediann [i,2
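For what it is worth, the per-date medians asked about above can probably be had without a loop. This is a minimal base R sketch, not the answer given in the thread; the column names date, C_lo and C_hi come from the post, while the sample values are invented.

```r
# Hypothetical data with the column names used in the thread
results <- data.frame(
  date = as.Date(c("2011-08-01", "2011-08-01", "2011-08-01",
                   "2011-08-02", "2011-08-02")),
  C_lo = c(1, 2, NA, 4, 6),
  C_hi = c(10, 30, 20, 40, 60)
)

# One row per date; na.rm = TRUE is passed on to median() for each column
med <- aggregate(results[c("C_lo", "C_hi")],
                 by = list(date = results$date),
                 FUN = median, na.rm = TRUE)
print(med)
```

Passing the two value columns explicitly avoids calling median() on the date column, which is one likely reason a call such as ddply(results, .(date), median, na.rm=T) does not behave as hoped.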
Hi:
This is the type of problem at which the plyr package excels. Write a
utility function that produces the plot you want using a data frame as
its input argument, and then do something like
library('plyr')
d_ply(results, .(a, b, c), plotfun)
where plotfun is a placeholder for the name of the n
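As an illustration of the "utility function applied to each piece" idea, here is a base R sketch that uses split() in place of d_ply, so it runs without plyr. The grouping columns a, b and c come from the thread; the columns x and y, the sample values, and the body of plotfun are all hypothetical.

```r
# Hypothetical data: a, b, c are the grouping columns; x and y are invented
results <- data.frame(
  a = rep(c("a1", "a2"), each = 6),
  b = rep(c("b1", "b2"), times = 6),
  c = "c1",
  x = 1:12,
  y = (1:12)^2,
  stringsAsFactors = FALSE
)

# Utility function: draws one plot for one piece of the data frame
plotfun <- function(piece) {
  plot(piece$x, piece$y,
       main = paste(piece$a[1], piece$b[1], piece$c[1]))
  invisible(nrow(piece))
}

pdf(file = NULL)  # null device so the sketch runs non-interactively; drop to see the plots
pieces <- split(results, results[c("a", "b", "c")], drop = TRUE)
sizes <- vapply(pieces, plotfun, integer(1))
dev.off()
```

With plyr loaded, d_ply(results, .(a, b, c), plotfun) should call the same function once per group.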
split() might be useful.
On Fri, Aug 5, 2011 at 12:55 PM, Jeffrey Joh wrote:
>
>
> I aggregated my data: aggresults <-aggregate(results, by=list(results$a,
> results$b, results$c), FUN=mean, na.rm=TRUE)
>
>
>
> results has about 8000 lines of data, and aggresults has about 80 lines. I
> woul
Hi,
You can get it with "by":
foo <- function(x)c(length(x$probe), mean(x$exp))
res <- by(df[c('exp', 'probe')], df['gene'], FUN=foo)
do.call(rbind, res)
Bye,
Oscar.
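To show what the by() approach returns, here is the same idea run on a small data frame. The gene, probe and exp columns and their values are borrowed from the plyr replies in the same thread; the only change to foo is that its result is named, which is an addition for readability.

```r
# Sample data as used elsewhere in the thread
df <- data.frame(gene = c("A", "A", "E", "A", "F", "F"),
                 probe = 1:6,
                 exp = c(0.34, 0.21, 0.11, 0.21, 0.56, 0.81))

# Count of probes and mean expression for one gene's rows
foo <- function(x) c(n = length(x$probe), mean_exp = mean(x$exp))

res <- by(df[c("exp", "probe")], df["gene"], FUN = foo)
out <- do.call(rbind, res)  # one row per gene, row names taken from the groups
print(out)
```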
--
Oscar Perpiñán Lamigueiro
Dpto. Ingeniería Eléctrica
EUITI-UPM
http://procomun.wordpress.com
On Thu, 30 Jun 2011 17:28:02
If you have a large data table, you might consider using 'data.table',
which performs better than 'plyr'
> x <- read.table(textConnection("Gene ProbeID
> Expression_Level
+ A 1 0.34
+ A 2 0.21
+ E 3 0
oops last reply was only half the solution:
library(plyr)
df <- data.frame(gene=c('A', 'A', 'E', 'A', 'F', 'F'), probe = c(1,2,3,4,5,6),
exp = c(0.34, 0.21, 0.11, 0.21, 0.56, 0.81))
ddply(df, .(gene), function(df) c(length(df$gene), median(df$exp)))
  gene V1    V2
1    A  3 0.210
2    E  1 0.110
Hi Max
Using plyr instead of reshape:
library(plyr)
df <- data.frame(gene=c('A', 'A', 'E', 'A', 'F', 'F'), probe = c(1,2,3,4,5,6))
ddply(df, .(gene), function(df)length(df$gene))
  gene V1
1    A  3
2    E  1
3    F  2
best
iain
--- On Thu, 30/6/11, Max Mariasegaram wrote:
> From: Max Mar
Have a look at match and merge.
Hadley
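The original question is cut off below, so this is only a sketch of how match and merge are typically used for this kind of lookup. agg_data is named in the post; the column names id and value, the second frame other, and all sample values are assumptions.

```r
# agg_data: a unique ID per row plus a continuous variable (as described in the post)
agg_data <- data.frame(id = c(101, 102, 103),
                       value = c(0.5, 1.5, 2.5))

# Hypothetical second frame that refers to those IDs
other <- data.frame(id = c(103, 101, 101, 104),
                    group = c("x", "y", "z", "w"),
                    stringsAsFactors = FALSE)

# match(): position of each other$id in agg_data$id (NA when there is no match)
other$value <- agg_data$value[match(other$id, agg_data$id)]

# merge(): the same lookup as a join; all.x = TRUE keeps unmatched rows of 'other'
merged <- merge(other[c("id", "group")], agg_data, by = "id", all.x = TRUE)
print(merged)
```

match() keeps the row order of the second frame, while merge() sorts the result by the key column unless sort = FALSE is given.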
On Wednesday, September 8, 2010, Michael Haenlein
wrote:
> Dear all,
>
> I'm working with two data frames.
>
> The first frame (agg_data) consists of two columns. agg_data[,1] is a unique
> ID for each row and agg_data[,2] contains a continuous variable.
>
>