On 04/11/14 17:02, David Winsemius wrote:
On Nov 4, 2014, at 8:35 AM, CJ Davies wrote:
On 04/11/14 16:13, PIKAL Petr wrote:
Hi
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of CJ Davies
Sent: Tuesday, November 04, 2014 2:50 PM
To: Jim Lemon; r-help@r-project.org
Subject: Re: [R] Variance of multiple non-contiguous time periods?
On 04/11/14 09:11, Jim Lemon wrote:
On Mon, 3 Nov 2014 12:45:03 PM CJ Davies wrote:
...
On 30/10/14 21:33, Jim Lemon wrote:
If I understand, you mean to calculate deviations for each individual
'chunk' of each transition & then aggregate the results? This is what
I'd been thinking about, but is there a sensible manner within R to
achieve this, or is it something for which it would be easier to
preprocess the data in an external tool? Is there some way to subset the
data such that I can work over just contiguous 'chunks'?
Exactly. If there is some combination of existing variables that can
be combined to make a set of unique values for each "chunk", you can
calculate the deviations within each "chunk", then average the squared
deviations for each type of "chunk", weighting by the duration of the
"chunks" so that you don't bias the pooled variance toward the longer
"chunks".
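A minimal sketch of one reading of the suggestion above, assuming a grouping variable already exists. The data frame `d` and its columns `value`, `type` and `chunk` are hypothetical toy data, not taken from the log. It computes the mean squared deviation inside each chunk, then combines the chunks of each type in two ways: with every chunk counting equally, and with chunks weighted by their length. Which weighting is appropriate depends on the analysis.

```r
# Toy data (hypothetical): 'value' is the measurement, 'type' the
# transition label, 'chunk' an id for each contiguous run.
d <- data.frame(
  value = c(1, 2, 3, 10, 12, 4, 6, 8, 10, 20, 24),
  type  = c("t1", "t1", "t1", "real", "real",
            "t1", "t1", "t1", "t1", "real", "real"),
  chunk = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  stringsAsFactors = FALSE
)

by_chunk <- split(d$value, d$chunk)

# Mean squared deviation from each chunk's own mean
chunk_var <- vapply(by_chunk, function(v) mean((v - mean(v))^2), numeric(1))
# Number of observations in each chunk
chunk_n <- vapply(by_chunk, length, integer(1))
# Label of each chunk (constant within a chunk by construction)
chunk_type <- vapply(split(d$type, d$chunk), function(s) s[1], character(1))

# Every chunk counts equally, so long chunks do not dominate
equal_w <- tapply(chunk_var, chunk_type, mean)

# Chunks weighted by their length, for comparison
length_w <- tapply(seq_along(chunk_var), chunk_type, function(i)
  weighted.mean(chunk_var[i], chunk_n[i]))

print(equal_w)
print(length_w)
```

The classical pooled variance would instead divide by n - 1 within each chunk and weight by those degrees of freedom; the sketch uses the plain mean of squared deviations to stay close to the wording above.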
Jim
I am stumped for a way of automating this process though. Each line of
log data looks like this;
2406 55.4 (-11.2, 1.0, -0.9) (-4.1, 1.0, 0.0) 7.077912 0.9203392 (0.0, 0.7, -0.1, 0.7) 8.129684 89.41537 -8.212769 (0.0, 0.7, -0.1, 0.7) 8.129684 89.41537 351.7872 1 0 0 False 0.15 3 37.76761 True False 0 transition 1
First you need to import it into R, which could be tricky judging by the line above.
Some values will probably need to be processed with regular expressions.
If I understand correctly, the number after "transition" is a signal that identifies
contiguous chunks. If that is true, then see
?rle, a function that computes the lengths of such runs.
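A small sketch of how rle() could be applied here, assuming the last column has already been extracted as a character vector. The vector `env` below is made-up illustration data. rle() needs an atomic vector, so a factor column would have to go through as.character() first.

```r
# Hypothetical labels, one per log line
env <- c("real", "real",
         "transition 1", "transition 1", "transition 1",
         "real",
         "transition 1", "transition 1")

runs <- rle(env)
print(runs$lengths)  # length of each contiguous run
print(runs$values)   # label of each run

# Expand the run index back to one id per log line
chunk <- rep(seq_along(runs$lengths), times = runs$lengths)
print(chunk)
```

Each distinct value of `chunk` then marks one contiguous run, even when the same label recurs later in the log.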
Cheers
Petr
Where the last variable defines which transition is currently active.
However to separate these data into 'chunks' would involve making a
comparison between each line of data & the preceding line of data to
determine whether it is part of the same contiguous 'chunk'. Is this
something that would be better achieved using external preprocessing
written in a language I am more familiar with, as I haven't the
foggiest how I would approach this within R?
Regards,
CJ Davies
______________________________________________
snipped
Importing into R wasn't an issue; some of the fields contain spaces & symbols,
but all the fields are tab separated so I can simply use;
foo <- read.csv("bar",header=T,sep="\t")
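For illustration, a self-contained sketch of the same idea using read.delim(), which defaults to header = TRUE and sep = "\t". The column names `frame` and `position` and the two data rows are invented; only the `environment` header comes from the log. Fields containing spaces, commas and parentheses survive because only tabs separate the columns.

```r
# Two made-up, tab-separated log lines with a header row
txt <- paste(
  "frame\tposition\tenvironment",
  "2406\t(-11.2, 1.0, -0.9)\ttransition 1",
  "2407\t(-11.1, 1.0, -0.9)\treal",
  sep = "\n"
)

# text = reads from the string; a real call would pass the file name
foo <- read.delim(text = txt, stringsAsFactors = FALSE)

str(foo)
```

Setting stringsAsFactors = FALSE keeps the labels as character strings, which makes line-by-line comparisons simpler.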
I've just written a hacky bit of Java that gives me the lines of each 'chunk' as a
separate list & I think I'll then calculate these particular values using Java's
Math class rather than trying to come up with a sensible way to import these 'chunks'
back into R. When it comes to string/list manipulation like this, I think my knowledge
of Java & lack of knowledge of R make the former the better option!
If you had offered the output of dput(head(foo, 20) ) and explained what defined a
"chunk-defining transition", it would have been fairly easy to show you how to
use cumsum in an ave() call to construct a grouping variable.
Regards,
CJ Davies
______________________________
David Winsemius
Alameda, CA, USA
Here are 100 example lines of the input: http://paste2.org/2LZVGP5K
The final value on each line, under the header "environment", is always
one of ["real", "transition 1", "transition 2", "transition 3",
"transition 4"]. A 'chunk-defining transition' is when this value changes.
If there is a way to do this in R in a more elegant fashion than my
hacky Java, then I would be glad to learn.
Regards,
CJ Davies
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.