On 04/11/14 17:42, David Winsemius wrote:
> On Nov 4, 2014, at 9:16 AM, CJ Davies wrote:
>
>> On 04/11/14 17:02, David Winsemius wrote:
>>> On Nov 4, 2014, at 8:35 AM, CJ Davies wrote:
>>>
>>>> On 04/11/14 16:13, PIKAL Petr wrote:
>>>>> Hi
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of CJ Davies
>>>>>> Sent: Tuesday, November 04, 2014 2:50 PM
>>>>>> To: Jim Lemon; r-help@r-project.org
>>>>>> Subject: Re: [R] Variance of multiple non-contiguous time periods?
>>>>>>
>>>>>> On 04/11/14 09:11, Jim Lemon wrote:
>>>>>>> On Mon, 3 Nov 2014 12:45:03 PM CJ Davies wrote:
>>>>>>>> ...
>>>>>>>> On 30/10/14 21:33, Jim Lemon wrote:
>>>>>>>> If I understand, you mean to calculate deviations for each individual
>>>>>>>> 'chunk' of each transition & then aggregate the results? This is what
>>>>>>>> I'd been thinking about, but is there a sensible manner within R to
>>>>>>>> achieve this, or is it something for which it would be easier to
>>>>>>>> preprocess the data in an external tool? Is there some way to subset the
>>>>>>>> data such that I can work over just contiguous 'chunks'?
>>>>>>>>
>>>>>>> Exactly. If there is some combination of existing variables that can
>>>>>>> be combined to make a set of unique values for each "chunk", you can
>>>>>>> calculate the deviations within each "chunk", then average the squared
>>>>>>> deviations for each type of "chunk", weighting by the duration of the
>>>>>>> "chunks" so that you don't bias the pooled variance toward the longer
>>>>>>> "chunks".
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>> I am stumped for a way of automating this process though.
>>>>>> Each line of log data looks like this:
>>>>>>
>>>>>> 2406 55.4 (-11.2, 1.0, -0.9) (-4.1, 1.0, 0.0) 7.077912 0.9203392 (0.0, 0.7, -0.1, 0.7) 8.129684 89.41537 -8.212769 (0.0, 0.7, -0.1, 0.7) 8.129684 89.41537 351.7872 1 0 0 False 0.15 3 37.76761 True False 0 transition 1
>>>>>
>>>>> First you need to import it into R, which could be tricky based on the
>>>>> above line. Some values will probably need to be processed with regular
>>>>> expressions.
>>>>>
>>>>> If I understand correctly, the number after "transition" is a signal that
>>>>> identifies continuous chunks. If that is true, then
>>>>>
>>>>> ?rle is a function which can compute the lengths of chunks.
>>>>>
>>>>> Cheers
>>>>> Petr
>>>>>
>>>>>> The last variable defines which transition is currently active.
>>>>>> However, separating these data into 'chunks' would involve comparing
>>>>>> each line of data with the preceding line of data to determine whether
>>>>>> it is part of the same contiguous 'chunk'. Is this something that would
>>>>>> be better achieved using external preprocessing written in a language
>>>>>> I am more familiar with, as I haven't the foggiest how I would approach
>>>>>> this within R?
>>>>>>
>>>>>> Regards,
>>>>>> CJ Davies
>>>>>>
>>> snipped
>>>> Importing into R wasn't an issue; some of the fields contain spaces &
>>>> symbols, but all the fields are tab separated so I can simply use:
>>>>
>>>> foo <- read.csv("bar", header = TRUE, sep = "\t")
>>>>
>>>> I've just written a hacky bit of Java that gives me the lines of each
>>>> 'chunk' as a separate list & I think I'll then calculate these particular
>>>> values using Java's Math class rather than trying to come up with a
>>>> sensible way to import these 'chunks' back into R. When it comes to
>>>> string/list manipulation like this, I think my knowledge of Java & lack of
>>>> knowledge of R make the former the better option!
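The comparison with the preceding line that CJ describes can be done in R without external preprocessing: comparing the column with a copy of itself shifted by one row marks where each chunk starts, and cumsum() turns those marks into a chunk ID; Petr's rle() gives the same runs in summary form. A minimal sketch, assuming the `environment` column has been read as a character vector; the short `env` vector is made-up illustration data, not taken from the log:

```r
# Made-up stand-in for the 'environment' column of the log. If read.csv()
# returned it as a factor, convert it first with as.character(), because
# rle() rejects factors.
env <- c("real", "real", "transition 1", "transition 1",
         "real", "transition 2", "transition 2", "transition 1")

# rle() collapses runs of identical consecutive values: one entry per
# contiguous chunk, giving its length and its label.
runs <- rle(env)
runs$lengths  # 2 2 1 2 1
runs$values   # "real" "transition 1" "real" "transition 2" "transition 1"

# Row-level chunk ID: compare each row with the previous one; TRUE marks
# the start of a new chunk, so cumsum() increases by one at every change.
chunk <- cumsum(c(TRUE, env[-1] != env[-length(env)]))
chunk         # 1 1 2 2 3 4 4 5

# The same ID can be rebuilt from the rle() output.
stopifnot(identical(chunk, rep(seq_along(runs$lengths), runs$lengths)))
```

Splitting the data frame by this ID, e.g. split(foo, chunk), would then give one data frame per contiguous chunk, much like the per-chunk lists the Java preprocessing produces.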
>>>>
>>>>
>>> If you had offered the output of dput(head(foo, 20)) and explained what
>>> defined a "chunk-defining transition", it would have been fairly easy to
>>> show you how to use cumsum in an ave() call to construct a grouping
>>> variable.
>>>
>>>
>>>> Regards,
>>>> CJ Davies
>>>>
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>> Here is an example of 100 lines of the input: http://paste2.org/2LZVGP5K
>>
>> The final value on each line, under the header "environment", is always one
>> of ["real", "transition 1", "transition 2", "transition 3", "transition 4"].
>> A 'chunk-defining transition' is when this value changes.
>>
>> If there is a way to do this in R in a more elegant fashion than my hacky
>> Java, then I would be glad to learn.
> That pasted material does not appear to preserve the tabs. Input with your
> suggested code "does not work" in the sense that it brings in an object like
> this:
>
>> download.file("http://paste2.org/2LZVGP5K", "bar.txt")
> trying URL 'http://paste2.org/2LZVGP5K'
> Content type 'text/html; charset=UTF-8' length unknown
> opened URL
> .......... .......... ........
> downloaded 28 Kb
>
>> foo <- read.csv("bar.txt",header=T,sep="\t")
>> str(foo)
> 'data.frame': 2829 obs. of 1 variable:
> $ X..DOCTYPE.html.: Factor w/ 669 levels ""," ",..: 106 104 219 233 220 222 221 215 217 79 ...
>
> I SAY AGAIN:
>
> Need: output of dput(head(foo, 100))
>
>
>> Regards,
>> CJ Davies
> David Winsemius
> Alameda, CA, USA
>
That was a pastebin URI, so what you downloaded was HTML instead of raw text. This is the raw text:
http://cjdavies.org/foo

Regards,
CJ Davies

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
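Putting David's "cumsum in an ave() call" together with Jim's pooling idea, a pooled within-chunk variance per environment type might be computed roughly as below. This is a sketch under stated assumptions: the column `value` and the toy data frame are hypothetical stand-ins for the real log, and the pooling uses the textbook pooled variance, weighting each chunk by its degrees of freedom (rows minus one) rather than by elapsed time as Jim describes. The two only coincide if rows are evenly spaced in time; otherwise a duration-based weight would have to come from the timestamp column.

```r
# Toy data: 'value' is a hypothetical numeric column standing in for
# whichever measurement is of interest; 'environment' mirrors the real
# column. For the real file, assuming it is tab-separated with a header
# row as described in the thread, one might start from
#   foo <- read.delim("http://cjdavies.org/foo", stringsAsFactors = FALSE)
foo <- data.frame(
  environment = c("real", "real", "transition 1", "transition 1",
                  "transition 1", "real", "real", "transition 1",
                  "transition 1"),
  value = c(1, 3, 2, 4, 6, 5, 7, 10, 14),
  stringsAsFactors = FALSE
)

# Grouping variable: a new chunk starts whenever 'environment' changes.
foo$chunk <- cumsum(c(TRUE, foo$environment[-1] != foo$environment[-nrow(foo)]))

# Deviation of each row from the mean of its own chunk, via ave().
foo$dev <- foo$value - ave(foo$value, foo$chunk, FUN = mean)

# Pooled within-chunk variance for each environment type:
# (sum of squared deviations) / (sum over chunks of (rows - 1)).
# A type whose chunks are all single rows gives 0/0 = NaN.
pooled <- sapply(split(foo, foo$environment), function(d) {
  ss  <- sum(d$dev^2)
  dof <- nrow(d) - length(unique(d$chunk))
  ss / dof
})
pooled  # real = 2, transition 1 = 16/3
```

Because each chunk's deviations are taken from that chunk's own mean, level differences between separate occurrences of the same transition do not inflate the pooled variance, which is the point of chunking rather than grouping by `environment` alone.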