Re: [R] Variance of multiple non-contiguous time periods?

CJ Davies Tue, 04 Nov 2014 09:24:30 -0800

On 04/11/14 17:02, David Winsemius wrote:


On Nov 4, 2014, at 8:35 AM, CJ Davies wrote:

On 04/11/14 16:13, PIKAL Petr wrote:

Hi

-----Original Message-----
From: [email protected] [mailto:r-help-bounces@r-project.org] On Behalf Of CJ Davies
Sent: Tuesday, November 04, 2014 2:50 PM
To: Jim Lemon; [email protected]
Subject: Re: [R] Variance of multiple non-contiguous time periods?

On 04/11/14 09:11, Jim Lemon wrote:

On Mon, 3 Nov 2014 12:45:03 PM CJ Davies wrote:

...
On 30/10/14 21:33, Jim Lemon wrote:
If I understand, you mean to calculate deviations for each individual
'chunk' of each transition & then aggregate the results? This is what
I'd been thinking about, but is there a sensible manner within R to
achieve this, or is it something for which it would be easier to
preprocess the data in an external tool? Is there some way to subset
the data such that I can work over just contiguous 'chunks'?

Exactly. If there is some combination of existing variables that can
be combined to make a set of unique values for each "chunk", you can
calculate the deviations within each "chunk", then average the squared
deviations for each type of "chunk", weighting by the duration of the
"chunks" so that you don't bias the pooled variance toward the longer
"chunks".

Jim
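
As a rough sketch of the calculation described above, assuming a unique id per contiguous chunk is already available: the column names 'value', 'env' and 'chunk' and all the numbers are made up for illustration. It uses the textbook pooled variance (each chunk contributes its squared deviations and its n - 1 degrees of freedom), which is only one reading of the weighting suggested above; a different weighting would change the last line.

```r
# Toy data: 'value' is the measurement, 'env' the chunk type, 'chunk' a
# unique id per contiguous run. All names and numbers are hypothetical.
dat <- data.frame(
  value = c(1, 2, 3, 10, 12, 4, 6, 11, 13, 15),
  env   = c("t1", "t1", "t1", "real", "real", "t1", "t1", "real", "real", "real"),
  chunk = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4),
  stringsAsFactors = FALSE
)

# Per chunk: sum of squared deviations from the chunk's own mean, the
# number of observations, and the chunk's type.
ss   <- tapply(dat$value, dat$chunk, function(v) sum((v - mean(v))^2))
n    <- tapply(dat$value, dat$chunk, length)
type <- tapply(dat$env, dat$chunk, function(e) as.character(e[1]))

per_chunk <- data.frame(
  type = as.vector(type),
  ss   = as.vector(ss),
  df   = as.vector(n) - 1,
  stringsAsFactors = FALSE
)

# Pooled variance per type: total within-chunk sum of squares divided by
# total degrees of freedom across that type's chunks.
pooled <- with(per_chunk, tapply(ss, type, sum) / tapply(df, type, sum))
pooled
```

Because deviations are taken from each chunk's own mean, differences in level between separate chunks of the same type do not inflate the result.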


I am stumped for a way of automating this process though. Each line of
log data looks like this;

2406    55.4    (-11.2, 1.0, -0.9)    (-4.1, 1.0, 0.0)    7.077912    0.9203392    (0.0, 0.7, -0.1, 0.7)    8.129684    89.41537    -8.212769    (0.0, 0.7, -0.1, 0.7)    8.129684    89.41537    351.7872    1    0    0    False    0.15    3    37.76761    True    False    0    transition 1


First you need to import it into R, which could be tricky judging by the line above.
Some values will probably need to be processed with regular expressions.

If I understand correctly, the number after "transition" is a signal which identifies 
contiguous chunks. If that is true, then

rle (see ?rle) is a function which can compute the lengths of the chunks.
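
A small sketch of what rle() returns for this kind of column. The vector 'env' and its values are made up to resemble the last field of the log line, not taken from the real data.

```r
# Hypothetical chunk labels, one per log line.
env <- c("real", "real", "transition 1", "transition 1", "transition 1",
         "real", "transition 2", "transition 2")

runs <- rle(env)
runs$values   # label of each contiguous chunk, in order
runs$lengths  # number of consecutive log lines in each chunk

# Expand back to one chunk id per log line, so it can be used for grouping.
chunk_id <- rep(seq_along(runs$lengths), times = runs$lengths)
chunk_id
```

Note that the two separate runs of "real" get different ids, which is what distinguishes non-contiguous periods of the same type.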

Cheers
Petr


Where the last variable defines which transition is currently active.
However to separate these data into 'chunks' would involve making a
comparison between each line of data & the preceding line of data to
determine whether it is part of the same contiguous 'chunk'. Is this
something that would be better achieved using external preprocessing
written in a language I am more familiar with, as I haven't the
foggiest how I would approach this within R?

Regards,
CJ Davies

______________________________________________

snipped


Importing into R wasn't an issue; some of the fields contain spaces & symbols, 
but all the fields are tab separated so I can simply use;

foo <- read.csv("bar", header = TRUE, sep = "\t")
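
For illustration, a self-contained variant that reads two made-up lines from a string rather than a file. The header names and values are hypothetical; read.delim() is the tab-separated counterpart of read.csv().

```r
# Two invented log lines; "\t" separates the fields.
txt <- paste(
  "time\tposition\tenvironment",
  "55.4\t(-11.2, 1.0, -0.9)\ttransition 1",
  "55.5\t(-11.1, 1.0, -0.9)\treal",
  sep = "\n"
)

# read.delim() defaults to sep = "\t" and header = TRUE, so fields that
# contain spaces, commas or parentheses stay in a single column.
foo <- read.delim(text = txt, stringsAsFactors = FALSE)
str(foo)
```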

I've just written a hacky bit of Java that gives me the lines of each 'chunk' as a 
separate list & I think I'll then calculate these particular values using Java's 
Math class rather than trying to come up with a sensible way to import these 'chunks' 
back into R. When it comes to string/list manipulation like this I think my knowledge 
in Java & lack of knowledge in R makes the former the better option!


If you had offered the output of dput(head(foo, 20) ) and explained what defined a 
"chunk-defining transition", it would have been fairly easy to show you how to 
use cumsum in an ave() call to construct a grouping variable.
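
One possible shape of that idea, as a sketch only and not necessarily the exact construction meant here: the data frame below is a made-up stand-in for foo, with a hypothetical numeric column 'value'.

```r
# Hypothetical stand-in for the imported log.
foo <- data.frame(
  environment = c("real", "real", "transition 1", "transition 1",
                  "real", "real", "real", "transition 1"),
  value       = c(5, 7, 1, 3, 10, 11, 12, 8),
  stringsAsFactors = FALSE
)

# TRUE wherever the label differs from the previous line; the first line
# always starts a chunk.
env <- foo$environment
changed <- c(TRUE, env[-1] != env[-length(env)])

# The running count of changes is a grouping variable with one id per
# contiguous chunk.
foo$chunk <- cumsum(changed)

# ave() applies a function within each group and returns a vector as long
# as the input: here, each value's deviation from its own chunk's mean.
foo$dev <- ave(foo$value, foo$chunk, FUN = function(v) v - mean(v))
foo
```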

Regards,
CJ Davies

______________________________



David Winsemius
Alameda, CA, USA


Here is an example 100 lines of the input --> http://paste2.org/2LZVGP5K

The final value on each line, under the header "environment", is always one of ["real", "transition 1", "transition 2", "transition 3", "transition 4"]. A 'chunk-defining transition' is when this value changes.

If there is a way to do this in R in a more elegant fashion than my hacky Java, then I would be glad to learn.


Regards,
CJ Davies

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
