Thanks, Hadley, for your interest. Below is some code that avoids explicit
environments (using timeSeries). I also experimented with .parallel = TRUE in
daply to create timeSeries objects and then bind them together, but I ran into
some problems.

Thank you in advance,
Daniele Amberti

set.seed(123)
N <- 10000
X <- data.frame(
  ID = c(rep(1,N), rep(2,N), rep(3,N), rep(4,N)),
  # format() always keeps the time part, whatever the R version
  DATE = format(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N-1), 4),
    "%Y-%m-%d %H:%M:%S", tz = "GMT"),
  VALUE = runif(N*4), stringsAsFactors = FALSE)
X <- X[sample(1:(N*4), N*4),]
str(X)
head(X)

#define a variable in global env
ATS <- NULL

buildTimeSeriesFromDataFrame <- function(x)
{
  library(timeSeries)
  if(!is.null(ATS)) # in global env
  {
    # assign in global env
    ATS <<- cbind(ATS,
      timeSeries(x$VALUE, x$DATE,
        format = '%Y-%m-%d %H:%M:%S',
        zone = 'GMT', units = as.character(x$ID[1])))
  } else
  {
    # assign in global env
    ATS <<- timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
      zone = 'GMT', units = as.character(x$ID[1]))
  }
  return(TRUE)
}

tsDaply <- function(...)
{
  # assign in global env, to clean previous run
  ATS <<- NULL
  library(plyr)
  res <- daply(X, "ID", buildTimeSeriesFromDataFrame)
  return(res)
}

tsDaply(X, X$ID)
head(ATS)

#performance tests
Time <- replicate(100,
  system.time(tsDaply(X, X$ID))[[1]])
median(Time)
hist(Time)
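
A minimal sketch of the same split-and-bind step without `<<-`: each piece returns its own column and the columns are bound once at the end. To keep it runnable with base R only, a plain named numeric vector stands in for the timeSeries() call (an assumption for illustration, not the timeSeries API). The names buildColumn, Xs and wide are hypothetical.

```r
# Small stand-in for X (hypothetical data, same column layout).
set.seed(1)
n <- 5
Xs <- data.frame(
  ID = rep(1:3, each = n),
  DATE = format(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(n - 1), 3),
    "%Y-%m-%d %H:%M:%S", tz = "GMT"),
  VALUE = runif(n * 3), stringsAsFactors = FALSE)
Xs <- Xs[sample(nrow(Xs)), ]  # shuffle rows, as in the example above

# Build one column per ID; nothing is assigned outside the function.
buildColumn <- function(x)
{
  x <- x[order(x$DATE), ]    # sort by time within the piece
  setNames(x$VALUE, x$DATE)  # named vector stands in for timeSeries()
}

pieces <- lapply(split(Xs, Xs$ID), buildColumn)
wide <- do.call(cbind, pieces)  # bind all columns in a single call
```

The plain-vector stand-in assumes every ID has the same time stamps; it does no alignment on dates.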

###
#some multithread tests:
###

library(doSMP)
w <- startWorkers(workerCount = 2)
registerDoSMP(w)

# do not cbind the ts, just create it
buildTimeSeriesFromDataFrame2 <- function(x)
{
  library(timeSeries)
  # timeSeries() is exported, so '::' is enough
  xx <- timeSeries::timeSeries(x$VALUE, x$DATE,
    format = '%Y-%m-%d %H:%M:%S',
    zone = 'GMT', units = as.character(x$ID[1]))
  return(xx)
}

#tsDaply2 <- function(...)
#{
#  library(plyr)
#  res <- daply(X, "ID", buildTimeSeriesFromDataFrame2, .parallel = TRUE)
#  return(res)
#}

# tsDaply2 with .parallel = TRUE returns an error:
#Error in do.ply(i) : task 4 failed - "subscript out of bounds"
#In addition: Warning messages:
#1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'
#2: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'


tsDaply2 <- function(...)
{
  library(plyr)
  res <- daply(X, "ID", buildTimeSeriesFromDataFrame2, .parallel = FALSE)
  return(res)
}
# tsDaply2 with .parallel = FALSE works, but the resulting list discards the timeSeries class

# bind after ts creation
res <- tsDaply2(X, X$ID)
# list is not a timeSeries object
str(cbind(t(res)))
res <- as.timeSeries(cbind(t(res)))

stopWorkers(w)
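
Another option, sketched here under the assumption that each (DATE, ID) pair occurs at most once: fill a wide matrix in one vectorised step and construct the time series object once from it, instead of binding piece by piece. The sketch uses base R only. toWide and the tiny data frame d are hypothetical names, and the final conversion is left as a comment because it is untested here.

```r
# Pivot a long data frame (ID, DATE, VALUE) into a DATE x ID matrix.
toWide <- function(d)
{
  dates <- sort(unique(d$DATE))
  ids <- sort(unique(d$ID))
  m <- matrix(NA_real_, nrow = length(dates), ncol = length(ids),
    dimnames = list(dates, as.character(ids)))
  # two-column index matrix: (row, column) position of every VALUE
  m[cbind(match(d$DATE, dates), match(d$ID, ids))] <- d$VALUE
  m
}

d <- data.frame(
  ID = c(2, 1, 2, 1),
  DATE = c("2000-01-01 00:00:01", "2000-01-01 00:00:00",
           "2000-01-01 00:00:00", "2000-01-01 00:00:01"),
  VALUE = c(0.4, 0.1, 0.3, 0.2), stringsAsFactors = FALSE)
W <- toWide(d)
# W could then be passed once to a constructor, e.g. (untested assumption):
# timeSeries(W, rownames(W), format = '%Y-%m-%d %H:%M:%S', zone = 'GMT')
```

Missing (DATE, ID) combinations stay NA in the matrix, so the series do not need to share exactly the same time stamps.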


-----Original Message-----
From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On Behalf Of Hadley 
Wickham
Sent: 14 March 2011 12:48
To: Daniele Amberti
Cc: r-help@r-project.org
Subject: Re: [R] dataframe to a timeseries object

Well, I'd start by removing all explicit use of environments, which
makes your code very hard to follow.

Hadley

On Monday, March 14, 2011, Daniele Amberti <daniele.ambe...@ors.it> wrote:
> I found that plyr::daply is more efficient than base::by (am I doing
> something wrong?). Below is updated code for comparison (I also fixed a
> couple of things).
> The daply function from the plyr package also has a .parallel argument, and
> I wonder whether creating timeSeries objects in parallel and then combining
> them would be faster (Windows XP platform). Does anyone have experience
> with this? I found only very simple examples of plyr with parallel
> computation, and I do not have a working example of this kind of
> implementation (a daply call that returns a list of timeSeries objects).
>
> Thanks in advance,
> Daniele Amberti
>
>
> set.seed(123)
>
> N <- 10000
> X <- data.frame(
>   ID = c(rep(1,N), rep(2,N), rep(3,N), rep(4,N)),
>   # format() always keeps the time part, whatever the R version
>   DATE = format(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N-1), 4),
>     "%Y-%m-%d %H:%M:%S", tz = "GMT"),
>   VALUE = runif(N*4), stringsAsFactors = FALSE)
> X <- X[sample(1:(N*4), N*4),]
> str(X)
>
> library(timeSeries)
> buildTimeSeriesFromDataFrame <- function(x, env)
> {
>   {
>     if(exists("xx", envir = env))
>       assign("xx",
>         cbind(get("xx", env), timeSeries(x$VALUE, x$DATE,
>           format = '%Y-%m-%d %H:%M:%S',
>           zone = 'GMT', units = as.character(x$ID[1]))),
>         envir = env)
>     else
>       assign("xx",
>         timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
>           zone = 'GMT', units = as.character(x$ID[1])),
>         envir = env)
>
>     return(TRUE)
>   }
> }
>
> tsBy <- function(...)
> {
>   e1 <- new.env(parent = baseenv())
>   res <- by(X, X$ID, buildTimeSeriesFromDataFrame,
>       env = e1, simplify = TRUE)
>   return(get("xx", e1))
> }
>
> Time01 <- replicate(100,
>   system.time(tsBy(X, X$ID, simplify = TRUE))[[1]])
> median(Time01)
> hist(Time01)
> ATS <- tsBy(X, X$ID, simplify = TRUE)
>
>
> library(xts)
> buildXtsFromDataFrame <- function(x, env)
> {
>   {
>     if(exists("xx", envir = env))
>       assign("xx",
>         cbind(get("xx", env), xts(x$VALUE,
>           as.POSIXct(x$DATE, tz = "GMT",
>             format = '%Y-%m-%d %H:%M:%S'),
>           tzone = 'GMT')),
>         envir = env)
>     else
>       assign("xx",
>         xts(x$VALUE, as.POSIXct(x$DATE, tz = "GMT",
>             format = '%Y-%m-%d %H:%M:%S'),
>           tzone = 'GMT'),
>         envir = env)
>
>     return(TRUE)
>   }
> }
>
> xtsBy <- function(...)
> {
>   e1 <- new.env(parent = baseenv())
>   res <- by(X, X$ID, buildXtsFromDataFrame,
>       env = e1, simplify = TRUE)
>   return(get("xx", e1))
> }
>
> Time02 <- replicate(100,
>   system.time(xtsBy(X, X$ID,simplify = TRUE))[[1]])
> median(Time02)
> hist(Time02)
> AXTS <- xtsBy(X, X$ID, simplify = TRUE)
>
> plot(density(Time02), col = "red",
>   xlim = c(min(c(Time02, Time01)), max(c(Time02, Time01))))
> lines(density(Time01), col = "blue")
> # check equality; there is still a problem with names
> AXTS2 <- as.timeSeries(AXTS)
> names(AXTS2) <- names(ATS)
> identical(getDataPart(ATS), getDataPart(AXTS2))
> identical(time(ATS), time(AXTS2))
>
> # with plyr library and daply instead of by:
> library(plyr)
>
> tsDaply <- function(...)
> {
>   e1 <- new.env(parent = baseenv())
>   res <- daply(X, "ID", buildTimeSeriesFromDataFrame,
>       env = e1)
>   return(get("xx", e1))
> }
>
> Time03 <- replicate(100,
>   system.time(tsDaply(X, X$ID))[[1]])
> median(Time03)
> hist(Time03)
>
> xtsDaply <- function(...)
> {
>   e1 <- new.env(parent = baseenv())
>   res <- daply(X, "ID", buildXtsFromDataFrame,
>       env = e1)
>   return(get("xx", e1))
> }
>
> Time04 <- replicate(100,
>   system.time(xtsDaply(X, X$ID))[[1]])
>
> median(Time04)
> hist(Time04)
>
> plot(density(Time04), col = "red",
>   xlim = c(
>     min(c(Time02, Time01, Time03, Time04)),
>     max(c(Time02, Time01, Time03, Time04))),
>   ylim = c(0,100))
> lines(density(Time03), col = "blue")
> lines(density(Time02))
> lines(density(Time01))
>
>
>
>
>
> -----Original Message-----
> From: Daniele Amberti
> Sent: 11 March 2011 14:44
> To: r-help@r-project.org
> Subject: dataframe to a timeseries object
>
> I'm wondering what the most efficient way is (in time first, then memory
> usage) to obtain a multivariate time series object from a data frame (the
> easiest data structure to get from a database through RODBC).
> I have a starting point using the timeSeries or xts packages (both can
> handle time zones); below you can find code to test.
> Parallelizing the merge (cbind) is something I'm thinking about
> (suggestions from users with experience on this topic are highly
> appreciated); any suggestion is welcome.
> My platform is Windows XP, R 2.12.1, with the latest available CRAN versions
> of timeSeries and xts.
>
>
> set.seed(123)
>
> N <- 9000
> X <- data.frame(
>   ID = c(rep(1,N), rep(2,N), rep(3,N), rep(4,N)),
>   DATE = rep(as.POSIXct("2000-01-01", tz = "GMT")+ 0:(N-1), 4),
>   VALUE = runif(N*4))
>
> library(timeSeries)
> buildTimeSeriesFromDataFrame <- function(x, env)
> {
>   {
>     if(exists("xx", envir = env))
>       assign("xx",
>         cbind(get("xx", env), timeSeries(x$VALUE, x$DATE,
>           format = '%Y-%m-%d %H:%M:%S',
>           zone = 'GMT', units = as.character(x$ID[1]))),
>         envir = env)
>     else
>       assign("xx",
>         timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
>           zone = 'GMT', units = as.character(x$ID[1])),
>         envir = env)
>
>     return(TRUE)
>   }
> }
>
>
> fooBy <- function(...)
> {
>   e1 <- new.env(parent = baseenv())
>   res <- by(X, X$ID, buildTimeSeriesFromDataFrame,
>       env = e1, simplify = TRUE)
>   return(get("xx", e1))
> }
>
> Time01 <- replicate(100,
>   system.time(fooBy(X,
>     X$ID, buildTimeSeriesFromDataFrame,
>     simplify = TRUE))[[1]])
>
> median(Time01)
> hist(Time01)
>
> library(xts)
>
> buildXtsFromDataFrame <- function(x, env)
> {
>   {
>     if(exists("xx", envir = env))
>       assign("xx",
>         cbind(get("xx", env), xts(x$VALUE,
>           as.POSIXct(x$DATE, format = '%Y-%m-%d %H:%M:%S'),
>           tzone = 'GMT')),
>         envir = env)
>     else
>       assign("xx",
>         xts(x$VALUE, as.POSIXct(x$DATE, format = '%Y-%m-%d %H:%M:%S'),
>           tzone = 'GMT'),
>         envir = env)
>
>     return(TRUE)
>   }
> }
>
> fooBy <- function(...)
> {
>   e1 <- new.env(parent = baseenv())
>   res <- by(X, X$ID, buildXtsFromDataFrame,
>       env = e1, simplify = TRUE)
>   return(get("xx", e1))
> }
>
> Time02 <- replicate(100,
>   system.time(fooBy(X,
>     X$ID, buildTimeSeriesFromDataFrame,
>     simplify = TRUE))[[1]])
>
> median(Time02)
> hist(Time02)
>
> plot(density(Time02),
>   xlim = c(min(c(Time02, Time01)), max(c(Time02, Time01))))
> lines(density(Time01))
>
>
> Best regards,
> Daniele Amberti
>
> ORS Srl
>
> Via Agostino Morando 1/3 12060 Roddi (Cn) - Italy
> Tel. +39 0173 620211
> Fax. +39 0173 620299 / +39 0173 433111
> Web Site www.ors.it
>
> ------------------------------------------------------------------------------------------------------------------------
> Any unauthorized use of this message and its attachments is prohibited and
> may constitute a criminal offence.
> If you have received this message in error, we would be grateful if you
> would destroy it and any attachments.
> Opinions, conclusions or other information contained in this e-mail that do
> not relate to the activities and/or corporate mission of O.R.S. Srl are not
> attributable to the company, nor do they bind it in any way.
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/