Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

2012-07-23 Thread David Terk
I've isolated the bug.  When the seg fault was produced there was an error
that memory had not been mapped.  Here is the odd part of the bug.  If you
comment out certain code and get a full run, then comment the code which is
causing the problem back in, it will actually run.  So I think it is safe to
assume something is going wrong with memory allocation.  Example: while
testing, I have been able to get to a point where the code will run, but if
I reboot the machine and try again, the code will not run.

The bug itself is happening somewhere in XTS or ZOO.  I will gladly upload
the data files.  It is happening on the 10th data file which is only 225k
lines in size.

Below is the simplified code.  The call to either

dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))

is what is causing R to hang or crash.  I have been able to replicate this
on Windows 7 64 bit and Ubuntu 64 bit.  Seems easiest to consistently
replicate from R Studio.

The code below will consistently replicate when the appropriate files are
used.

library(xts)  # the script needs xts (which also loads zoo for na.locf)

parseTickDataFromDir = function(tickerDir, per, subper) {
  tickerAbsFilenames = list.files(tickerDir, full.names=T)
  tickerNames = list.files(tickerDir, full.names=F)
  tickerNames = gsub("_[a-zA-Z0-9]+\\.csv", "", tickerNames)
  pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), style = 3)

  for(i in 1:length(tickerAbsFilenames)) {
    dat.i = parseTickData(tickerAbsFilenames[i])
    # Build a template index: one entry per second from 09:30:00 for each
    # trading day present in the file
    dates <- unique(substr(as.character(index(dat.i)), 1, 10))
    times <- rep("09:30:00", length(dates))
    openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
    templateTimes <- NULL

    for (j in 1:length(openDateTimes)) {
      if (is.null(templateTimes)) {
        templateTimes <- openDateTimes[j] + 0:23400
      } else {
        templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
      }
    }

    templateTimes <- as.xts(templateTimes)
    dat.i <- merge(dat.i, templateTimes, all=T)
    if (is.na(dat.i[1])) {
      dat.i[1] <- -1
    }
    dat.i <- na.locf(dat.i)
    dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
    index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))
    setTxtProgressBar(pb, i)
  }
  close(pb)
}

parseTickData <- function(inputFile) {
  DAT.list <- scan(file=inputFile, sep=",", skip=1,
                   what=list(Date="", Time="", Close=0, Volume=0), quiet=T)
  index <- as.POSIXct(paste(DAT.list$Date, DAT.list$Time),
                      format="%m/%d/%Y %H:%M:%S")
  DAT.xts <- xts(DAT.list$Close, index)
  DAT.xts <- make.index.unique(DAT.xts)
  return(DAT.xts)
}

DATTick <- parseTickDataFromDir(tickerDirSecond, "seconds", 10)

-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] 
Sent: Sunday, July 22, 2012 4:48 PM
To: David Terk
Cc: r-devel@r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
in R 2.15.1 64-bit Ubuntu

On 12-07-22 3:54 PM, David Terk wrote:
> I am reading several hundred files.  Anywhere from 50k-400k in size.  
> It appears that when I read these files with R 2.15.1 the process will 
> hang or seg fault on the scan() call.  This does not happen on R 2.14.1.

The code below doesn't do anything other than define a couple of functions.
Please simplify it to code that creates a file (or multiple files), reads it
or them, and shows a bug.

If you can't do that, then gradually add the rest of the stuff from these
functions into the mix until you figure out what is really causing the bug.

If you don't post code that allows us to reproduce the crash, it's really
unlikely that we'll be able to fix it.
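
[Editorial sketch of the kind of self-contained example being asked for;
the directory, file names, and contents below are synthetic:]

td <- tempfile("ticks"); dir.create(td)
for (i in 1:10) {
  f <- file.path(td, sprintf("SYM_%d.csv", i))
  write.csv(data.frame(Date = "07/23/2012", Time = "09:30:00",
                       Close = 100 + i, Volume = 1000), f, row.names = FALSE)
}
# read the files back the same way the original script does
dat <- lapply(list.files(td, full.names = TRUE), function(f)
  scan(f, sep = ",", skip = 1,
       what = list(Date = "", Time = "", Close = 0, Volume = 0), quiet = TRUE))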

Duncan Murdoch

>
>
>
> This is happening on the Precise (12.04) build of Ubuntu.
>
>
>
> I have included everything, but the issue appears to be when 
> performing the scan in the method parseTickData.
>
>
>
> Below is the code.  Hopefully this is the right place to post.
>
>
>
> parseTickDataFromDir = function(tickerDir, per, subper, fun) {
>
>tickerAbsFilenames = list.files(tickerDir,full.names=T)
>
>tickerNames = list.files(tickerDir,full.names=F)
>
>tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
>
>pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), 
> style = 3)
>
>
>
>for(i in 1:length(tickerAbsFilenames)) {
>
>
>
>  # Grab Raw Tick Data
>
>  dat.i = parseTickData(tickerAbsFilenames[i])
>
>  #Sys.sleep(1)
>
>  # Create Template
>
>  dates <- unique(substr(as.character(index(dat.i)), 1,10))
>
>  times <- rep("09:30:00", length(dates))
>
>  openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
>
>  templateTimes <- NULL
>
>
>
>  for (j in 1:length(openDateTimes)) {
>
>if (is.null(templateTimes)) {
>
>  templateTimes <- openDateTimes[j] + 0:23400
>
>} else {
>
>  templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
>
>}
>
>  }
>
>
>
>  # Convert templateTimes to XTS, merge wit

Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

2012-07-23 Thread David Terk
Looks like the call to:

dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)

is what is causing the issue.  If the argument name is not set, or is set to
any value other than NULL, then no hang occurs.
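
[Editorial sketch of the workaround just described; per and subper as in
the script below:]

# Either of these avoids the hang, per the report above:
dat.i <- to.period(dat.i, period=per, k=subper)               # name unset
# dat.i <- to.period(dat.i, period=per, k=subper, name="x")   # or non-NULL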

-Original Message-
From: David Terk [mailto:david.t...@gmail.com] 
Sent: Monday, July 23, 2012 1:25 AM
To: 'Duncan Murdoch'
Cc: 'r-devel@r-project.org'
Subject: RE: [Rd] Reading many large files causes R to crash - Possible Bug
in R 2.15.1 64-bit Ubuntu

I've isolated the bug.  When the seg fault was produced there was an error
that memory had not been mapped.  Here is the odd part of the bug.  If you
comment out certain code and get a full run, then comment the code which is
causing the problem back in, it will actually run.  So I think it is safe to
assume something is going wrong with memory allocation.  Example: while
testing, I have been able to get to a point where the code will run, but if
I reboot the machine and try again, the code will not run.

The bug itself is happening somewhere in XTS or ZOO.  I will gladly upload
the data files.  It is happening on the 10th data file which is only 225k
lines in size.

Below is the simplified code.  The call to either

dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))

is what is causing R to hang or crash.  I have been able to replicate this
on Windows 7 64 bit and Ubuntu 64 bit.  Seems easiest to consistently
replicate from R Studio.

The code below will consistently replicate when the appropriate files are
used.

parseTickDataFromDir = function(tickerDir, per, subper) {
  tickerAbsFilenames = list.files(tickerDir,full.names=T)
  tickerNames = list.files(tickerDir,full.names=F)
  tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
  pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), style = 3)
  
  for(i in 1:length(tickerAbsFilenames)) {
dat.i = parseTickData(tickerAbsFilenames[i])
dates <- unique(substr(as.character(index(dat.i)), 1,10))
times <- rep("09:30:00", length(dates))
openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
templateTimes <- NULL

for (j in 1:length(openDateTimes)) {
  if (is.null(templateTimes)) {
templateTimes <- openDateTimes[j] + 0:23400
  } else {
templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
  }
}

templateTimes <- as.xts(templateTimes)
dat.i <- merge(dat.i, templateTimes, all=T)
if (is.na(dat.i[1])) {
  dat.i[1] <- -1
}
dat.i <- na.locf(dat.i)
dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
index(dat.i) <- index(to.period(templateTimes, period=per,
k=subper))
setTxtProgressBar(pb, i)
  }
  close(pb)
}

parseTickData <- function(inputFile) {
  DAT.list <- scan(file=inputFile,
sep=",",skip=1,what=list(Date="",Time="",Close=0,Volume=0),quiet=T)
  index <- as.POSIXct(paste(DAT.list$Date,DAT.list$Time),format="%m/%d/%Y
%H:%M:%S")
  DAT.xts <- xts(DAT.list$Close,index)
  DAT.xts <- make.index.unique(DAT.xts)
  return(DAT.xts)
}

DATTick <- parseTickDataFromDir(tickerDirSecond, "seconds",10)

-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Sunday, July 22, 2012 4:48 PM
To: David Terk
Cc: r-devel@r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
in R 2.15.1 64-bit Ubuntu

On 12-07-22 3:54 PM, David Terk wrote:
> I am reading several hundred files.  Anywhere from 50k-400k in size.  
> It appears that when I read these files with R 2.15.1 the process will 
> hang or seg fault on the scan() call.  This does not happen on R 2.14.1.

The code below doesn't do anything other than define a couple of functions.
Please simplify it to code that creates a file (or multiple files), reads it
or them, and shows a bug.

If you can't do that, then gradually add the rest of the stuff from these
functions into the mix until you figure out what is really causing the bug.

If you don't post code that allows us to reproduce the crash, it's really
unlikely that we'll be able to fix it.

Duncan Murdoch

>
>
>
> This is happening on the Precise (12.04) build of Ubuntu.
>
>
>
> I have included everything, but the issue appears to be when 
> performing the scan in the method parseTickData.
>
>
>
> Below is the code.  Hopefully this is the right place to post.
>
>
>
> parseTickDataFromDir = function(tickerDir, per, subper, fun) {
>
>tickerAbsFilenames = list.files(tickerDir,full.names=T)
>
>tickerNames = list.files(tickerDir,full.names=F)
>
>tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
>
>pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), 
> style = 3)
>
>
>
>for(i in 1:length(tickerAbsFilenames)) {
>
>
>
>  # Grab Raw Tick Data
>
>  dat.i = parseTickData(tickerAbsFilenames[i])
>
>  #Sys.sleep(1)
>
>  # Create Template
>
>  dates <- unique(substr(as.character(index(dat.i)), 1,10))
>
>

Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

2012-07-23 Thread Joshua Ulrich
David,

You still haven't provided a reproducible example.  As Duncan already
said, "if you don't post code that allows us to reproduce the crash,
it's really unlikely that we'll be able to fix it."

And R-devel is not the appropriate venue to discuss this if it's truly
an issue with xts/zoo.

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


On Mon, Jul 23, 2012 at 12:41 AM, David Terk  wrote:
> Looks like the call to:
>
> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>
> is what is causing the issue.  If the argument name is not set, or is set
> to any value other than NULL, then no hang occurs.
>
> -Original Message-
> From: David Terk [mailto:david.t...@gmail.com]
> Sent: Monday, July 23, 2012 1:25 AM
> To: 'Duncan Murdoch'
> Cc: 'r-devel@r-project.org'
> Subject: RE: [Rd] Reading many large files causes R to crash - Possible Bug
> in R 2.15.1 64-bit Ubuntu
>
> I've isolated the bug.  When the seg fault was produced there was an error
> that memory had not been mapped.  Here is the odd part of the bug.  If you
> comment out certain code and get a full run, then comment the code which is
> causing the problem back in, it will actually run.  So I think it is safe
> to assume something is going wrong with memory allocation.  Example: while
> testing, I have been able to get to a point where the code will run, but
> if I reboot the machine and try again, the code will not run.
>
> The bug itself is happening somewhere in XTS or ZOO.  I will gladly upload
> the data files.  It is happening on the 10th data file which is only 225k
> lines in size.
>
> Below is the simplified code.  The call to either
>
> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
> index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))
>
> is what is causing R to hang or crash.  I have been able to replicate this
> on Windows 7 64 bit and Ubuntu 64 bit.  Seems easiest to consistently
> replicate from R Studio.
>
> The code below will consistently replicate when the appropriate files are
> used.
>
> parseTickDataFromDir = function(tickerDir, per, subper) {
>   tickerAbsFilenames = list.files(tickerDir,full.names=T)
>   tickerNames = list.files(tickerDir,full.names=F)
>   tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
>   pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), style = 3)
>
>   for(i in 1:length(tickerAbsFilenames)) {
> dat.i = parseTickData(tickerAbsFilenames[i])
> dates <- unique(substr(as.character(index(dat.i)), 1,10))
> times <- rep("09:30:00", length(dates))
> openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
> templateTimes <- NULL
>
> for (j in 1:length(openDateTimes)) {
>   if (is.null(templateTimes)) {
> templateTimes <- openDateTimes[j] + 0:23400
>   } else {
> templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
>   }
> }
>
> templateTimes <- as.xts(templateTimes)
> dat.i <- merge(dat.i, templateTimes, all=T)
> if (is.na(dat.i[1])) {
>   dat.i[1] <- -1
> }
> dat.i <- na.locf(dat.i)
> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
> index(dat.i) <- index(to.period(templateTimes, period=per,
> k=subper))
> setTxtProgressBar(pb, i)
>   }
>   close(pb)
> }
>
> parseTickData <- function(inputFile) {
>   DAT.list <- scan(file=inputFile,
> sep=",",skip=1,what=list(Date="",Time="",Close=0,Volume=0),quiet=T)
>   index <- as.POSIXct(paste(DAT.list$Date,DAT.list$Time),format="%m/%d/%Y
> %H:%M:%S")
>   DAT.xts <- xts(DAT.list$Close,index)
>   DAT.xts <- make.index.unique(DAT.xts)
>   return(DAT.xts)
> }
>
> DATTick <- parseTickDataFromDir(tickerDirSecond, "seconds",10)
>
> -Original Message-
> From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
> Sent: Sunday, July 22, 2012 4:48 PM
> To: David Terk
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
> in R 2.15.1 64-bit Ubuntu
>
> On 12-07-22 3:54 PM, David Terk wrote:
>> I am reading several hundred files.  Anywhere from 50k-400k in size.
>> It appears that when I read these files with R 2.15.1 the process will
>> hang or seg fault on the scan() call.  This does not happen on R 2.14.1.
>
> The code below doesn't do anything other than define a couple of functions.
> Please simplify it to code that creates a file (or multiple files), reads it
> or them, and shows a bug.
>
> If you can't do that, then gradually add the rest of the stuff from these
> functions into the mix until you figure out what is really causing the bug.
>
> If you don't post code that allows us to reproduce the crash, it's really
> unlikely that we'll be able to fix it.
>
> Duncan Murdoch
>
>>
>>
>>
>> This is happening on the Precise (12.04) build of Ubuntu.
>>
>>
>>
>> I have included everything, but the issue appears to be when
>> performing the scan in the method parseTickData.
>>
>>
>>
>> Below is the code.  Hopefully 

[Rd] duplicated() variation that goes both ways to capture all duplicates

2012-07-23 Thread Liviu Andronic
Dear all
The trouble with the current duplicated() function in is that it can
report duplicates while searching fromFirst _or_ fromLast, but not
both ways. Often users will want to identify and extract all the
copies of the item that has duplicates, not only the duplicates
themselves.

To take the example from the man page:
> data(iris)
> iris[duplicated(iris), ]  ##duplicates while searching "fromFirst"
Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
143  5.8 2.7  5.1 1.9 virginica
> iris[duplicated(iris, fromLast=T), ]  ##duplicates while searching "fromLast"
Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102  5.8 2.7  5.1 1.9 virginica


To extract all the copies of the concerned items ("original" and
duplicates) one would need to do something like this:
> iris[(duplicated(iris) | duplicated(iris, fromLast=T)), ]  ##duplicates while 
> searching "bothWays"
Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102  5.8 2.7  5.1 1.9 virginica
143  5.8 2.7  5.1 1.9 virginica


Unfortunately this is unnecessarily long and convoluted. Short of a
'bothWays' argument in duplicated(), I came up with a small wrapper
that simplifies the above:
duplicated2 <-
function(x, bothWays=TRUE, ...)
{
if(!bothWays) {
return(duplicated(x, ...))
} else if(bothWays) {
return((duplicated(x, ...) | duplicated(x, fromLast=TRUE, ...)))
}
}
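
[Editorial aside: a quick sanity check of the wrapper on an atomic vector.]

x <- c(1, 2, 2, 3)
duplicated(x)                   # FALSE FALSE  TRUE FALSE -- misses the first 2
duplicated2(x)                  # FALSE  TRUE  TRUE FALSE -- both copies of 2
duplicated2(x, bothWays=FALSE)  # same as duplicated(x)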


Now the above can be achieved simply via:
> iris[duplicated2(iris), ]  ##duplicates while searching "bothWays"
Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102  5.8 2.7  5.1 1.9 virginica
143  5.8 2.7  5.1 1.9 virginica


So here's my inquiry: Would the R Core consider adding such
functionality in 'base' R? Either the---suitably cleaned
up---duplicated2() function above, or a "bothWays" argument in
duplicated() itself? Either of the two would improve user convenience
and reduce confusion. (In my case it took some time before I
understood the correct approach to this problem.)

Regards
Liviu


-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] duplicated() variation that goes both ways to capture all duplicates

2012-07-23 Thread Duncan Murdoch

On 23/07/2012 8:49 AM, Liviu Andronic wrote:

Dear all
The trouble with the current duplicated() function in is that it can
report duplicates while searching fromFirst _or_ fromLast, but not
both ways. Often users will want to identify and extract all the
copies of the item that has duplicates, not only the duplicates
themselves.

To take the example from the man page:
> data(iris)
> iris[duplicated(iris), ]  ##duplicates while searching "fromFirst"
 Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
143  5.8 2.7  5.1 1.9 virginica
> iris[duplicated(iris, fromLast=T), ]  ##duplicates while searching "fromLast"
 Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102  5.8 2.7  5.1 1.9 virginica


To extract all the copies of the concerned items ("original" and
duplicates) one would need to do something like this:
> iris[(duplicated(iris) | duplicated(iris, fromLast=T)), ]  ##duplicates while searching 
"bothWays"
 Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102  5.8 2.7  5.1 1.9 virginica
143  5.8 2.7  5.1 1.9 virginica


Unfortunately this is unnecessarily long and convoluted. Short of a
'bothWays' argument in duplicated(), I came up with a small wrapper
that simplifies the above:
duplicated2 <-
 function(x, bothWays=TRUE, ...)
 {
 if(!bothWays) {
 return(duplicated(x, ...))
 } else if(bothWays) {
 return((duplicated(x, ...) | duplicated(x, fromLast=TRUE, 
...)))
 }
 }


Now the above can be achieved simply via:
> iris[duplicated2(iris), ]  ##duplicates while searching "bothWays"
 Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102  5.8 2.7  5.1 1.9 virginica
143  5.8 2.7  5.1 1.9 virginica


So here's my inquiry: Would the R Core consider adding such
functionality in 'base' R? Either the---suitably cleaned
up---duplicated2() function above, or a "bothWays" argument in
duplicated() itself? Either of the two would improve user convenience
and reduce confusion. (In my case it took some time before I
understood the correct approach to this problem.)


I can't speak for all of R core, but I don't see the need for this in 
base R -- your solution looks fine to me.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

2012-07-23 Thread Joshua Ulrich
Well, you still haven't convinced anyone but yourself that it's
definitely an xts problem, since you have not provided any
reproducible example...
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


On Mon, Jul 23, 2012 at 8:14 AM, David Terk  wrote:
> Where should this be discussed since it is definitely XTS related?  I will
> gladly upload the simplified script + data files to whoever is maintaining
> this part of the code.  Fortunately there is a workaround here.
>
> -Original Message-
> From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
> Sent: Monday, July 23, 2012 8:15 AM
> To: David Terk
> Cc: Duncan Murdoch; r-devel@r-project.org
> Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
> in R 2.15.1 64-bit Ubuntu
>
> David,
>
> You still haven't provided a reproducible example.  As Duncan already said,
> "if you don't post code that allows us to reproduce the crash, it's really
> unlikely that we'll be able to fix it."
>
> And R-devel is not the appropriate venue to discuss this if it's truly an
> issue with xts/zoo.
>
> Best,
> --
> Joshua Ulrich  |  about.me/joshuaulrich
> FOSS Trading  |  www.fosstrading.com
>
>
> On Mon, Jul 23, 2012 at 12:41 AM, David Terk  wrote:
>> Looks like the call to:
>>
>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>>
>> is what is causing the issue.  If the argument name is not set, or is set
>> to any value other than NULL, then no hang occurs.
>>
>> -Original Message-
>> From: David Terk [mailto:david.t...@gmail.com]
>> Sent: Monday, July 23, 2012 1:25 AM
>> To: 'Duncan Murdoch'
>> Cc: 'r-devel@r-project.org'
>> Subject: RE: [Rd] Reading many large files causes R to crash -
>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>
>> I've isolated the bug.  When the seg fault was produced there was an
>> error that memory had not been mapped.  Here is the odd part of the
>> bug.  If you comment out certain code and get a full run, then comment
>> the code which is causing the problem back in, it will actually run.
>> So I think it is safe to assume something is going wrong with memory
>> allocation.  Example: while testing, I have been able to get to a point
>> where the code will run, but if I reboot the machine and try again, the
>> code will not run.
>>
>> The bug itself is happening somewhere in XTS or ZOO.  I will gladly
>> upload the data files.  It is happening on the 10th data file which is
>> only 225k lines in size.
>>
>> Below is the simplified code.  The call to either
>>
>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>> index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))
>>
>> is what is causing R to hang or crash.  I have been able to replicate
>> this on Windows 7 64 bit and Ubuntu 64 bit.  Seems easiest to
>> consistently replicate from R Studio.
>>
>> The code below will consistently replicate when the appropriate files
>> are used.
>>
>> parseTickDataFromDir = function(tickerDir, per, subper) {
>>   tickerAbsFilenames = list.files(tickerDir,full.names=T)
>>   tickerNames = list.files(tickerDir,full.names=F)
>>   tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
>>   pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames),
>> style = 3)
>>
>>   for(i in 1:length(tickerAbsFilenames)) {
>> dat.i = parseTickData(tickerAbsFilenames[i])
>> dates <- unique(substr(as.character(index(dat.i)), 1,10))
>> times <- rep("09:30:00", length(dates))
>> openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
>> templateTimes <- NULL
>>
>> for (j in 1:length(openDateTimes)) {
>>   if (is.null(templateTimes)) {
>> templateTimes <- openDateTimes[j] + 0:23400
>>   } else {
>> templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
>>   }
>> }
>>
>> templateTimes <- as.xts(templateTimes)
>> dat.i <- merge(dat.i, templateTimes, all=T)
>> if (is.na(dat.i[1])) {
>>   dat.i[1] <- -1
>> }
>> dat.i <- na.locf(dat.i)
>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>> index(dat.i) <- index(to.period(templateTimes, period=per,
>> k=subper))
>> setTxtProgressBar(pb, i)
>>   }
>>   close(pb)
>> }
>>
>> parseTickData <- function(inputFile) {
>>   DAT.list <- scan(file=inputFile,
>> sep=",",skip=1,what=list(Date="",Time="",Close=0,Volume=0),quiet=T)
>>   index <-
>> as.POSIXct(paste(DAT.list$Date,DAT.list$Time),format="%m/%d/%Y
>> %H:%M:%S")
>>   DAT.xts <- xts(DAT.list$Close,index)
>>   DAT.xts <- make.index.unique(DAT.xts)
>>   return(DAT.xts)
>> }
>>
>> DATTick <- parseTickDataFromDir(tickerDirSecond, "seconds",10)
>>
>> -Original Message-
>> From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
>> Sent: Sunday, July 22, 2012 4:48 PM
>> To: David Terk
>> Cc: r-devel@r-project.org
>> Subject: Re: [Rd] Reading many large files causes R to crash -
>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>
>> On 12-07-22 3:54 PM, David Terk wrote:
>>> I am r

Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

2012-07-23 Thread David Terk
Where should this be discussed since it is definitely XTS related?  I will
gladly upload the simplified script + data files to whoever is maintaining
this part of the code.  Fortunately there is a workaround here.

-Original Message-
From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com] 
Sent: Monday, July 23, 2012 8:15 AM
To: David Terk
Cc: Duncan Murdoch; r-devel@r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
in R 2.15.1 64-bit Ubuntu

David,

You still haven't provided a reproducible example.  As Duncan already said,
"if you don't post code that allows us to reproduce the crash, it's really
unlikely that we'll be able to fix it."

And R-devel is not the appropriate venue to discuss this if it's truly an
issue with xts/zoo.

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


On Mon, Jul 23, 2012 at 12:41 AM, David Terk  wrote:
> Looks like the call to:
>
> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>
> is what is causing the issue.  If the argument name is not set, or is set
> to any value other than NULL, then no hang occurs.
>
> -Original Message-
> From: David Terk [mailto:david.t...@gmail.com]
> Sent: Monday, July 23, 2012 1:25 AM
> To: 'Duncan Murdoch'
> Cc: 'r-devel@r-project.org'
> Subject: RE: [Rd] Reading many large files causes R to crash - 
> Possible Bug in R 2.15.1 64-bit Ubuntu
>
> I've isolated the bug.  When the seg fault was produced there was an
> error that memory had not been mapped.  Here is the odd part of the
> bug.  If you comment out certain code and get a full run, then comment
> the code which is causing the problem back in, it will actually run.
> So I think it is safe to assume something is going wrong with memory
> allocation.  Example: while testing, I have been able to get to a point
> where the code will run, but if I reboot the machine and try again, the
> code will not run.
>
> The bug itself is happening somewhere in XTS or ZOO.  I will gladly 
> upload the data files.  It is happening on the 10th data file which is 
> only 225k lines in size.
>
> Below is the simplified code.  The call to either
>
> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
> index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))
>
> is what is causing R to hang or crash.  I have been able to replicate 
> this on Windows 7 64 bit and Ubuntu 64 bit.  Seems easiest to 
> consistently replicate from R Studio.
>
> The code below will consistently replicate when the appropriate files 
> are used.
>
> parseTickDataFromDir = function(tickerDir, per, subper) {
>   tickerAbsFilenames = list.files(tickerDir,full.names=T)
>   tickerNames = list.files(tickerDir,full.names=F)
>   tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
>   pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), 
> style = 3)
>
>   for(i in 1:length(tickerAbsFilenames)) {
> dat.i = parseTickData(tickerAbsFilenames[i])
> dates <- unique(substr(as.character(index(dat.i)), 1,10))
> times <- rep("09:30:00", length(dates))
> openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
> templateTimes <- NULL
>
> for (j in 1:length(openDateTimes)) {
>   if (is.null(templateTimes)) {
> templateTimes <- openDateTimes[j] + 0:23400
>   } else {
> templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
>   }
> }
>
> templateTimes <- as.xts(templateTimes)
> dat.i <- merge(dat.i, templateTimes, all=T)
> if (is.na(dat.i[1])) {
>   dat.i[1] <- -1
> }
> dat.i <- na.locf(dat.i)
> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
> index(dat.i) <- index(to.period(templateTimes, period=per,
> k=subper))
> setTxtProgressBar(pb, i)
>   }
>   close(pb)
> }
>
> parseTickData <- function(inputFile) {
>   DAT.list <- scan(file=inputFile,
> sep=",",skip=1,what=list(Date="",Time="",Close=0,Volume=0),quiet=T)
>   index <- 
> as.POSIXct(paste(DAT.list$Date,DAT.list$Time),format="%m/%d/%Y
> %H:%M:%S")
>   DAT.xts <- xts(DAT.list$Close,index)
>   DAT.xts <- make.index.unique(DAT.xts)
>   return(DAT.xts)
> }
>
> DATTick <- parseTickDataFromDir(tickerDirSecond, "seconds",10)
>
> -Original Message-
> From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
> Sent: Sunday, July 22, 2012 4:48 PM
> To: David Terk
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Reading many large files causes R to crash - 
> Possible Bug in R 2.15.1 64-bit Ubuntu
>
> On 12-07-22 3:54 PM, David Terk wrote:
>> I am reading several hundred files.  Anywhere from 50k-400k in size.
>> It appears that when I read these files with R 2.15.1 the process 
>> will hang or seg fault on the scan() call.  This does not happen on R
2.14.1.
>
> The code below doesn't do anything other than define a couple of
functions.
> Please simplify it to code that creates a file (or multiple files), 
> reads it or them, and shows a bug.
>
> If you can'

Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

2012-07-23 Thread David Terk
I'm attaching a runnable script and corresponding data files.  This will
freeze at 83%.

I'm not sure how much simpler to get than this. 

-Original Message-
From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com] 
Sent: Monday, July 23, 2012 9:17 AM
To: David Terk
Cc: Duncan Murdoch; r-devel@r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
in R 2.15.1 64-bit Ubuntu

Well, you still haven't convinced anyone but yourself that it's definitely
an xts problem, since you have not provided any reproducible example...
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


On Mon, Jul 23, 2012 at 8:14 AM, David Terk  wrote:
> Where should this be discussed since it is definitely XTS related?  I 
> will gladly upload the simplified script + data files to whoever is 
> maintaining this part of the code.  Fortunately there is a workaround
here.
>
> -Original Message-
> From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
> Sent: Monday, July 23, 2012 8:15 AM
> To: David Terk
> Cc: Duncan Murdoch; r-devel@r-project.org
> Subject: Re: [Rd] Reading many large files causes R to crash - 
> Possible Bug in R 2.15.1 64-bit Ubuntu
>
> David,
>
> You still haven't provided a reproducible example.  As Duncan already 
> said, "if you don't post code that allows us to reproduce the crash, 
> it's really unlikely that we'll be able to fix it."
>
> And R-devel is not the appropriate venue to discuss this if it's truly 
> an issue with xts/zoo.
>
> Best,
> --
> Joshua Ulrich  |  about.me/joshuaulrich FOSS Trading  |  
> www.fosstrading.com
>
>
> On Mon, Jul 23, 2012 at 12:41 AM, David Terk  wrote:
>> Looks like the call to:
>>
>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>>
>> is what is causing the issue.  If the argument name is not set, or is set
>> to any value other than NULL, then no hang occurs.
>>
>> -Original Message-
>> From: David Terk [mailto:david.t...@gmail.com]
>> Sent: Monday, July 23, 2012 1:25 AM
>> To: 'Duncan Murdoch'
>> Cc: 'r-devel@r-project.org'
>> Subject: RE: [Rd] Reading many large files causes R to crash - 
>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>
>> I've isolated the bug.  When the seg fault was produced there was an
>> error that memory had not been mapped.  Here is the odd part of the
>> bug.  If you comment out certain code and get a full run, then comment
>> the code which is causing the problem back in, it will actually run.
>> So I think it is safe to assume something is going wrong with memory
>> allocation.  Example: while testing, I have been able to get to a point
>> where the code will run, but if I reboot the machine and try again, the
>> code will not run.
>>
>> The bug itself is happening somewhere in XTS or ZOO.  I will gladly 
>> upload the data files.  It is happening on the 10th data file which 
>> is only 225k lines in size.
>>
>> Below is the simplified code.  The call to either
>>
>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>> index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))
>>
>> is what is causing R to hang or crash.  I have been able to replicate 
>> this on Windows 7 64 bit and Ubuntu 64 bit.  Seems easiest to 
>> consistently replicate from R Studio.
>>
>> The code below will consistently replicate when the appropriate files 
>> are used.
>>
>> parseTickDataFromDir = function(tickerDir, per, subper) {
>>   tickerAbsFilenames = list.files(tickerDir,full.names=T)
>>   tickerNames = list.files(tickerDir,full.names=F)
>>   tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
>>   pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), 
>> style = 3)
>>
>>   for(i in 1:length(tickerAbsFilenames)) {
>> dat.i = parseTickData(tickerAbsFilenames[i])
>> dates <- unique(substr(as.character(index(dat.i)), 1,10))
>> times <- rep("09:30:00", length(dates))
>> openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
>> templateTimes <- NULL
>>
>> for (j in 1:length(openDateTimes)) {
>>   if (is.null(templateTimes)) {
>> templateTimes <- openDateTimes[j] + 0:23400
>>   } else {
>> templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
>>   }
>> }
>>
>> templateTimes <- as.xts(templateTimes)
>> dat.i <- merge(dat.i, templateTimes, all=T)
>> if (is.na(dat.i[1])) {
>>   dat.i[1] <- -1
>> }
>> dat.i <- na.locf(dat.i)
>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>> index(dat.i) <- index(to.period(templateTimes, period=per,
>> k=subper))
>> setTxtProgressBar(pb, i)
>>   }
>>   close(pb)
>> }
>>
>> parseTickData <- function(inputFile) {
>>   DAT.list <- scan(file=inputFile,
>> sep=",",skip=1,what=list(Date="",Time="",Close=0,Volume=0),quiet=T)
>>   index <-
>> as.POSIXct(paste(DAT.list$Date,DAT.list$Time),format="%m/%d/%Y
>> %H:%M:%S")
>>   DAT.xts <- xts(DAT.list$Close,index)
>>   DAT.xts <- make.index.unique(DAT.xts)

Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

2012-07-23 Thread Joshua Ulrich
David,

Thank you for providing something reproducible.

This line:
templateTimes <- as.xts(templateTimes)

creates a zero-width xts object (i.e. the coredata is a zero-length
vector, but there is a non-zero-length index). So, the
to.period(templateTimes) call returns OHLC data of random memory
locations.  This is the likely cause of the segfaults.
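
[Editorial sketch of the zero-width situation just described, using the
2012-era CRAN xts; the timestamps are illustrative:]

library(xts)
tt <- as.POSIXct("2012-07-23 09:30:00", tz = "GMT") + 0:10
x <- xts(, order.by = tt)  # zero-width xts: an index but no data columns
length(index(x))           # 11 -- non-empty index
length(coredata(x))        # 0  -- zero-length coredata
# to.period() on such an object is what read uninitialized memory; the
# patched xts (r690 on R-Forge) throws an error here instead:
# to.period(x, period = "seconds", k = 10)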

Since aggregating "no data" doesn't make sense, I have patched
to.period to throw an error when run on zero-width/length objects
(revision 690 on R-Forge).  The attached file works with the CRAN
version of xts because it avoids the issue entirely.

Your script will still "hang" on the BAC_0.csv file because
as.character.POSIXt can take a long time.  Better to just call
format() directly (as I do in the attached file).
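
[Editorial sketch of that substitution; dat.i as in the posted script:]

# slow: dispatches to as.character.POSIXt, which is expensive
dates <- unique(substr(as.character(index(dat.i)), 1, 10))
# faster: format the index directly
dates <- unique(format(index(dat.i), "%Y-%m-%d"))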

If you have any follow-up questions, please send them to R-SIG-Finance.

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


On Mon, Jul 23, 2012 at 8:41 AM, David Terk  wrote:
> I'm attaching a runnable script and corresponding data files.  This will
> freeze at 83%.
>
> I'm not sure how much simpler to get than this.
>
> -Original Message-
> From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
> Sent: Monday, July 23, 2012 9:17 AM
> To: David Terk
> Cc: Duncan Murdoch; r-devel@r-project.org
> Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
> in R 2.15.1 64-bit Ubuntu
>
> Well, you still haven't convinced anyone but yourself that it's definitely
> an xts problem, since you have not provided any reproducible example...
> --
> Joshua Ulrich  |  about.me/joshuaulrich
> FOSS Trading  |  www.fosstrading.com
>
>
> On Mon, Jul 23, 2012 at 8:14 AM, David Terk  wrote:
>> Where should this be discussed since it is definitely XTS related?  I
>> will gladly upload the simplified script + data files to whoever is
>> maintaining this part of the code.  Fortunately there is a workaround
> here.
>>
>> -Original Message-
>> From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
>> Sent: Monday, July 23, 2012 8:15 AM
>> To: David Terk
>> Cc: Duncan Murdoch; r-devel@r-project.org
>> Subject: Re: [Rd] Reading many large files causes R to crash -
>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>
>> David,
>>
>> You still haven't provided a reproducible example.  As Duncan already
>> said, "if you don't post code that allows us to reproduce the crash,
>> it's really unlikely that we'll be able to fix it."
>>
>> And R-devel is not the appropriate venue to discuss this if it's truly
>> an issue with xts/zoo.
>>
>> Best,
>> --
>> Joshua Ulrich  |  about.me/joshuaulrich FOSS Trading  |
>> www.fosstrading.com
>>
>>
>> On Mon, Jul 23, 2012 at 12:41 AM, David Terk  wrote:
>>> Looks like the call to:
>>>
>>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>>>
>>> is what is causing the issue.  If the argument name is not set, or is
>>> set to any value other than NULL, then no hang occurs.
>>>
>>> -Original Message-
>>> From: David Terk [mailto:david.t...@gmail.com]
>>> Sent: Monday, July 23, 2012 1:25 AM
>>> To: 'Duncan Murdoch'
>>> Cc: 'r-devel@r-project.org'
>>> Subject: RE: [Rd] Reading many large files causes R to crash -
>>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>>
>>> I've isolated the bug.  When the seg fault was produced there was an
>>> error that memory had not been mapped.  Here is the odd part of the
>>> bug.  If you comment out certain code and get a full run, then comment
>>> the code which is causing the problem back in, it will actually run.
>>> So I think it is safe to assume something is going wrong with memory
>>> allocation.  Example: while testing, I have been able to get to a point
>>> where the code will run, but if I reboot the machine and try again, the
>>> code will not run.
>>>
>>> The bug itself is happening somewhere in XTS or ZOO.  I will gladly
>>> upload the data files.  It is happening on the 10th data file which
>>> is only 225k lines in size.
>>>
>>> Below is the simplified code.  The call to either
>>>
>>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>>> index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))
>>>
>>> is what is causing R to hang or crash.  I have been able to replicate
>>> this on Windows 7 64 bit and Ubuntu 64 bit.  Seems easiest to
>>> consistently replicate from R Studio.
>>>
>>> The code below will consistently replicate when the appropriate files
>>> are used.
>>>
>>> parseTickDataFromDir = function(tickerDir, per, subper) {
>>>   tickerAbsFilenames = list.files(tickerDir,full.names=T)
>>>   tickerNames = list.files(tickerDir,full.names=F)
>>>   tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
>>>   pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames),
>>> style = 3)
>>>
>>>   for(i in 1:length(tickerAbsFilenames)) {
>>> dat.i = parseTickData(tickerAbsFilenames[i])
>>> dates <- unique(substr(as.character(index(dat.i)), 1,10))
>>> times <- rep("09:30:00", length(dates))

Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

2012-07-23 Thread David Terk
Thank you for getting this done so quickly.  This will process now.

One quick question regarding a call to as.character.POSIXt.  When using
scan, since scan reads line by line, would it make sense to have the ability
to perform a char -> POSIXct conversion on each line that is read, rather
than after all lines have been read?  Perhaps this already exists somewhere
and I am not aware of it.

-Original Message-
From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com] 
Sent: Monday, July 23, 2012 12:00 PM
To: David Terk
Cc: Duncan Murdoch; r-devel@r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
in R 2.15.1 64-bit Ubuntu

David,

Thank you for providing something reproducible.

This line:
templateTimes <- as.xts(templateTimes)

creates a zero-width xts object (i.e. the coredata is a zero-length vector,
but there is a non-zero-length index). So, the
to.period(templateTimes) call returns OHLC data of random memory locations.
This is the likely cause of the segfaults.

Since aggregating "no data" doesn't make sense, I have patched to.period to
throw an error when run on zero-width/length objects (revision 690 on
R-Forge).  The attached file works with the CRAN version of xts because it
avoids the issue entirely.

Your script will still "hang" on the BAC_0.csv file because
as.character.POSIXt can take a long time.  Better to just call
format() directly (as I do in the attached file).

If you have any follow-up questions, please send them to R-SIG-Finance.

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


On Mon, Jul 23, 2012 at 8:41 AM, David Terk  wrote:
> I'm attaching a runnable script and corresponding data files.  This 
> will freeze at 83%.
>
> I'm not sure how much simpler to get than this.
>
> -Original Message-
> From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
> Sent: Monday, July 23, 2012 9:17 AM
> To: David Terk
> Cc: Duncan Murdoch; r-devel@r-project.org
> Subject: Re: [Rd] Reading many large files causes R to crash - 
> Possible Bug in R 2.15.1 64-bit Ubuntu
>
> Well, you still haven't convinced anyone but yourself that it's 
> definitely an xts problem, since you have not provided any reproducible
example...
> --
> Joshua Ulrich  |  about.me/joshuaulrich FOSS Trading  |  
> www.fosstrading.com
>
>
> On Mon, Jul 23, 2012 at 8:14 AM, David Terk  wrote:
>> Where should this be discussed since it is definitely XTS related?  I 
>> will gladly upload the simplified script + data files to whoever is 
>> maintaining this part of the code.  Fortunately there is a workaround
> here.
>>
>> -Original Message-
>> From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
>> Sent: Monday, July 23, 2012 8:15 AM
>> To: David Terk
>> Cc: Duncan Murdoch; r-devel@r-project.org
>> Subject: Re: [Rd] Reading many large files causes R to crash - 
>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>
>> David,
>>
>> You still haven't provided a reproducible example.  As Duncan already 
>> said, "if you don't post code that allows us to reproduce the crash, 
>> it's really unlikely that we'll be able to fix it."
>>
>> And R-devel is not the appropriate venue to discuss this if it's 
>> truly an issue with xts/zoo.
>>
>> Best,
>> --
>> Joshua Ulrich  |  about.me/joshuaulrich FOSS Trading  | 
>> www.fosstrading.com
>>
>>
>> On Mon, Jul 23, 2012 at 12:41 AM, David Terk 
wrote:
>>> Looks like the call to:
>>>
>>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>>>
>>> is what is causing the issue.  If the argument name is not set, or is
>>> set to any value other than NULL, then no hang occurs.
>>>
>>> -Original Message-
>>> From: David Terk [mailto:david.t...@gmail.com]
>>> Sent: Monday, July 23, 2012 1:25 AM
>>> To: 'Duncan Murdoch'
>>> Cc: 'r-devel@r-project.org'
>>> Subject: RE: [Rd] Reading many large files causes R to crash - 
>>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>>
>>> I've isolated the bug.  When the seg fault was produced there was an
>>> error that memory had not been mapped.  Here is the odd part of the
>>> bug.  If you comment out certain code and get a full run, then comment
>>> the code which is causing the problem back in, it will actually run.
>>> So I think it is safe to assume something is going wrong with memory
>>> allocation.  Example: while testing, I have been able to get to a point
>>> where the code will run, but if I reboot the machine and try again, the
>>> code will not run.
>>>
>>> The bug itself is happening somewhere in XTS or ZOO.  I will gladly 
>>> upload the data files.  It is happening on the 10th data file which 
>>> is only 225k lines in size.
>>>
>>> Below is the simplified code.  The call to either
>>>
>>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>>> index(dat.i) <- index(to.period(templateTimes, period=per, 
>>> k=subper))
>>>
>>> is what is causing R to hang or crash.  I have been able to 
>>> replicate this on Windows 7 64 bit and Ubuntu 64 bit

[Rd] large dataset - confused

2012-07-23 Thread walcotteric
I'm trying to load a dataset into R, but I'm completely lost. This is
probably due mostly to the fact that I'm a complete R newb, but it's got me
stuck in  a research project. 
I've tried just opening the text file in WordPad and copying the data
directly into R, but it's too big and causes the program to crash. 

Any suggestions or assistance? I'm kinda desperate and lost. 



--
View this message in context: 
http://r.789695.n4.nabble.com/large-dataset-confused-tp4637476.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] large dataset - confused

2012-07-23 Thread Prof Brian Ripley

On 23/07/2012 18:32, walcotteric wrote:

I'm trying to load a dataset into R, but I'm completely lost. This is
probably due mostly to the fact that I'm a complete R newb, but it's got me
stuck in  a research project.
I've tried just opening the text file in WordPad and copying the data
directly into R, but it's too big and causes the program to crash.

Any suggestions or assistance? I'm kinda desperate and lost.


Yes, you are lost. The R posting guide is at 
http://www.r-project.org/posting-guide.html and will point you to the 
right list and also the manuals (at e.g. 
http://cran.r-project.org/manuals.html, and one of them seems exactly 
what you need).


BTW, 'large dataset' is meaningless: when I asked a class of Statistics 
PhD students the answers differed by 7 orders of magnitude.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] large dataset - confused

2012-07-23 Thread Brian G. Peterson

On 07/23/2012 12:32 PM, walcotteric wrote:

I'm trying to load a dataset into R, but I'm completely lost. This is
probably due mostly to the fact that I'm a complete R newb, but it's got me
stuck in  a research project.
I've tried just opening the text file in WordPad and copying the data
directly into R, but it's too big and causes the program to crash.

Any suggestions or assistance? I'm kinda desperate and lost.


Check the manual about loading data:

http://cran.r-project.org/doc/manuals/R-data.html

If you're still having trouble, read the posting guide:

http://www.r-project.org/posting-guide.html

Follow its advice about reproducibility.

Also, this question should have been directed to R-Help, not R-devel.

Regards,

   - Brian

--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] large dataset - confused

2012-07-23 Thread R. Michael Weylandt
1) Move this off R-devel to R-help.

2) Read the IO manual here: http://cran.r-project.org/manuals.html

3) You probably want to look at the read.table() function's help page
by typing ?read.table
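
For example (the file name and separator here are placeholders):

dat <- read.table("mydata.txt", header = TRUE, sep = "\t")
str(dat)  # inspect what was read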

Michael

On Mon, Jul 23, 2012 at 12:32 PM, walcotteric  wrote:
> I'm trying to load a dataset into R, but I'm completely lost. This is
> probably due mostly to the fact that I'm a complete R newb, but it's got me
> stuck in  a research project.
> I've tried just opening the text file in WordPad and copying the data
> directly into R, but it's too big and causes the program to crash.
>
> Any suggestions or assistance? I'm kinda desperate and lost.
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/large-dataset-confused-tp4637476.html
> Sent from the R devel mailing list archive at Nabble.com.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] large dataset - confused

2012-07-23 Thread Sarah Goslee
Hi,

On Mon, Jul 23, 2012 at 1:32 PM, walcotteric  wrote:
> I'm trying to load a dataset into R, but I'm completely lost. This is
> probably due mostly to the fact that I'm a complete R newb, but it's got me
> stuck in  a research project.
> I've tried just opening the text file in WordPad and copying the data
> directly into R, but it's too big and causes the program to crash.
>
> Any suggestions or assistance? I'm kinda desperate and lost.

Sure. First of all, you need to post to the R-help list, not the R-devel list.

Then you need to read the Intro to R that came with R when you installed it.

Then you need to read the posting guide for R-help, and provide the
requested information, including:
how big is your dataset?
what format is it in? (text file isn't very informative)
what R commands have you used? (read.table() perhaps)
and so on.

Also, what do you mean by "crash"? R stops working? You get an error message?

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

2012-07-23 Thread Brian G. Peterson

On 07/23/2012 11:49 AM, David Terk wrote:

One quick question regarding a call to as.character.POSIXt.  When using
scan, since scan reads line by line, would it make sense to have the ability
to perform a char -> POSIXct conversion on each line that is read, rather
than after all lines have been read?  Perhaps this already exists somewhere
and I am not aware of it.


It's actually much faster to load everything into memory and then
convert it all to xts at once.  as.POSIXct will work on a vector to
create your index; this is better than calling it millions of times, once
for each row.
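
[Editorial sketch mirroring parseTickData() from earlier in the thread;
the file name is illustrative:]

DAT.list <- scan("ticks.csv", sep = ",", skip = 1,
                 what = list(Date = "", Time = "", Close = 0, Volume = 0),
                 quiet = TRUE)
# one vectorized as.POSIXct call over all rows, not one call per line:
idx <- as.POSIXct(paste(DAT.list$Date, DAT.list$Time),
                  format = "%m/%d/%Y %H:%M:%S")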


--
Brian

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] large dataset - confused

2012-07-23 Thread oliver
On Mon, Jul 23, 2012 at 06:42:17PM +0100, Prof Brian Ripley wrote:
[...]
> BTW, 'large dataset' is meaningless: when I asked a class of
> Statistics PhD students the answers differed by 7 orders of
> magnitude.
[...]

lol

But isn't 7 a "small" number? ;-)

Ciao,
   Oliver

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

2012-07-23 Thread Simon Urbanek

On Jul 23, 2012, at 12:49 PM, David Terk wrote:

> Thank you for getting this done so quickly.  This will process now.
> 
> One quick question regarding a call to as.character.POSIXt.  When using
> scan, since scan reads line by line, would it make sense to have the ability
> to perform a char -> POSIXct conversion on each line that is read, rather
> than after all lines have been read?

That's not the problem -- the problem is that converting through format
specifications is very, very slow.  If you have the standard YYYY-mm-dd
hh:mm:ss format (or a subset thereof) you can use fastPOSIXct from
http://rforge.net/fasttime - it's many orders of magnitude faster than using
format-based conversions, but it is also limited to the standard GMT format
(hence the speed). If you have a more complex format and have to go through
format conversion, you can use pvec from multicore/parallel to at least use
all cores of your machine.
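
[Editorial sketch, assuming the fasttime package from rforge.net is
installed; input must already be in the fixed GMT layout:]

# install.packages("fasttime", repos = "http://rforge.net")
library(fasttime)
stamps <- rep("2012-07-23 09:30:00", 1e6)
idx <- fastPOSIXct(stamps, tz = "GMT")  # fixed YYYY-mm-dd hh:mm:ss, GMT only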

Cheers,
Simon


>  Perhaps this already exists somewhere
> and I am not aware of it.
> 
> -Original Message-
> From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com] 
> Sent: Monday, July 23, 2012 12:00 PM
> To: David Terk
> Cc: Duncan Murdoch; r-devel@r-project.org
> Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
> in R 2.15.1 64-bit Ubuntu
> 
> David,
> 
> Thank you for providing something reproducible.
> 
> This line:
> templateTimes <- as.xts(templateTimes)
> 
> creates a zero-width xts object (i.e. the coredata is a zero-length vector,
> but there is a non-zero-length index). So, the
> to.period(templateTimes) call returns OHLC data of random memory locations.
> This is the likely cause of the segfaults.
> 
> Since aggregating "no data" doesn't make sense, I have patched to.period to
> throw an error when run on zero-width/length objects (revision 690 on
> R-Forge).  The attached file works with the CRAN version of xts because it
> avoids the issue entirely.
> 
> Your script will still "hang" on the BAC_0.csv file because
> as.character.POSIXt can take a long time.  Better to just call
> format() directly (as I do in the attached file).
> 
> If you have any follow-up questions, please send them to R-SIG-Finance.
> 
> Best,
> --
> Joshua Ulrich  |  about.me/joshuaulrich
> FOSS Trading  |  www.fosstrading.com
> 
> 
> On Mon, Jul 23, 2012 at 8:41 AM, David Terk  wrote:
>> I'm attaching a runnable script and corresponding data files.  This 
>> will freeze at 83%.
>> 
>> I'm not sure how much simpler to get than this.
>> 
>> -Original Message-
>> From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
>> Sent: Monday, July 23, 2012 9:17 AM
>> To: David Terk
>> Cc: Duncan Murdoch; r-devel@r-project.org
>> Subject: Re: [Rd] Reading many large files causes R to crash - 
>> Possible Bug in R 2.15.1 64-bit Ubuntu
>> 
>> Well, you still haven't convinced anyone but yourself that it's 
>> definitely an xts problem, since you have not provided any reproducible
> example...
>> --
>> Joshua Ulrich  |  about.me/joshuaulrich FOSS Trading  |  
>> www.fosstrading.com
>> 
>> 
>> On Mon, Jul 23, 2012 at 8:14 AM, David Terk  wrote:
>>> Where should this be discussed since it is definitely XTS related?  I 
>>> will gladly upload the simplified script + data files to whoever is 
>>> maintaining this part of the code.  Fortunately there is a workaround
>> here.
>>> 
>>> -Original Message-
>>> From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
>>> Sent: Monday, July 23, 2012 8:15 AM
>>> To: David Terk
>>> Cc: Duncan Murdoch; r-devel@r-project.org
>>> Subject: Re: [Rd] Reading many large files causes R to crash - 
>>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>> 
>>> David,
>>> 
>>> You still haven't provided a reproducible example.  As Duncan already 
>>> said, "if you don't post code that allows us to reproduce the crash, 
>>> it's really unlikely that we'll be able to fix it."
>>> 
>>> And R-devel is not the appropriate venue to discuss this if it's 
>>> truly an issue with xts/zoo.
>>> 
>>> Best,
>>> --
>>> Joshua Ulrich  |  about.me/joshuaulrich FOSS Trading  | 
>>> www.fosstrading.com
>>> 
>>> 
>>> On Mon, Jul 23, 2012 at 12:41 AM, David Terk 
> wrote:
 Looks like the call to:
 
 dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
 
 If what is causing the issue.  If variable name is not set, or set 
 to any value other than NULL.  Than no hang occurs.
 
 -Original Message-
 From: David Terk [mailto:david.t...@gmail.com]
 Sent: Monday, July 23, 2012 1:25 AM
 To: 'Duncan Murdoch'
 Cc: 'r-devel@r-project.org'
 Subject: RE: [Rd] Reading many large files causes R to crash - 
 Possible Bug in R 2.15.1 64-bit Ubuntu
 
 I've isolated the bug.  When the seg fault was produced there was an 
 error that memory had not been mapped.  Here is the odd part of the 
 bug.  If you comment out certain code and get a full run than 
 comment in
>>> the code which
 is causing the problem it w

Re: [Rd] large dataset - confused

2012-07-23 Thread oliver
On Mon, Jul 23, 2012 at 10:32:42AM -0700, walcotteric wrote:
> I'm trying to load a dataset into R, but I'm completely lost. This is
> probably due mostly to the fact that I'm a complete R newb, but it's got me
> stuck in  a research project. 
[...]

Hmhh, becoming stuck in a "research project" because you are a "complete R
newb"? Strange -- what kind of working style is this?
What about first learning the tools you use?


> I've tried just opening the text file in WordPad and copying the data
> directly into R, but it's too big and causes the program to crash. 
[...]

OMFG

> 
> Any suggestions or assistance? I'm kinda desperate and lost. 

Looks like this problem is entirely of your own making.

Check your working style...

and follow the hints to the docs that other people
on this list suggested to you.

There is a lot of good documentation online.
Even books on R are available.  Look in the library
of your "research institute"...
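
For the archives, the standard approach is to read the file from disk with
read.table()/read.csv() or scan(), never to paste it into the console.  A
minimal sketch, assuming a hypothetical comma-separated file dataset.csv
with a header row:

# Read the file directly; adjust colClasses to the real column types so
# read.table() does not have to guess them, which saves time and memory.
dat <- read.csv("dataset.csv", header = TRUE,
                colClasses = c("character", "numeric", "numeric"),
                comment.char = "")
str(dat)  # inspect the structure instead of printing the whole object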

Ciao,
   Oliver

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] S4 objects in formulas

2012-07-23 Thread David L Lorenz
Hi,
  I have very carefully developed several S4 classes that describe 
censored water-quality data. I have routines for them that will support 
their use in data.frames and so forth. I have run into a problem when I 
try to use the S4 class as the response variable in a formula and try to 
extract the model frame. I get an error like:

Error in model.frame.default(as.lcens(Y) ~ X) : object is not a matrix

  In this case, as.lcens works much like the Surv function in the survival 
package except that the object is an S4 class and not a matrix of class 
Surv. I would have expected that the model.frame function would have been 
able to manipulate any kind of object that can be subsetted and put into a 
data.frame. But that appears not to be the case. I'm using R 2.14.1 if 
that matters.
  I can supply the routines for the lcens data if needed.
  Am I looking at needing to write a wrapper to convert all of my S4 
classes into matrices and then extract the necessary data in the matrices 
according to rules for the particular kind of S4 class? Or, am I missing a 
key piece on how model.frame works?
  Thanks.
Dave

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] On RObjectTables

2012-07-23 Thread Michael Lawrence
Luke,

Please keep me advised on this, because the Qt interfaces heavily rely on
the ObjectTables (btw, it has worked great for my use cases).

Michael

On Fri, Jul 20, 2012 at 7:32 AM,  wrote:

> I believe everyone who has worked on the relevant files has tried to
> maintain this functionality, but as it seems to get used and tested
> very little I can't be sure it is functional at this point. The
> facility in its current form does complicate the internal code and
> limit some experiments we might otherwise do, so I would not be
> surprised if it was at least substantially changed in the next year or
> two.
>
> Best,
>
> luke
>
>
> On Thu, 19 Jul 2012, Jeroen Ooms wrote:
>
>> I was wondering if anyone knows more about the state of RObjectTables.
>> This largely undocumented functionality was introduced by Duncan around
>> 2002 and enables you to create an environment whose contents are
>> dynamically queried by R through a hook function. It is mentioned in R
>> Internals and ?attach. This functionality is quite powerful and allows
>> you to e.g. offload a big database of R objects to disk, yet use them
>> as if they were in your workspace. The recent RProtoBuf package also
>> uses some of this functionality to dynamically look up proto definitions.
>>
>> I would like to do something similar, but I am not sure if support for
>> this functionality will be or has been discontinued. The RObjectTables
>> package is no longer available on OmegaHat and nothing has been
>> mentioned on the mailing lists for about 5 years. I found an old version
>> of the package on github which seems to work, but as far as I
>> understand, the package still needs the hooks from within R to work. So
>> if this functionality is actually unsupported and might be removed at
>> some point, I should probably not invest in it.
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
> --
> Luke Tierney
> Chair, Statistics and Actuarial Science
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:      luke-tier...@uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Finding dynamic shared libraries loaded with a package

2012-07-23 Thread Winston Chang
Is there a way to query a package to see what dynamic shared libraries are
loaded with it?


The reason I ask is that during development I want to unload libraries
so that they can be reloaded without restarting R. I want to make it
automatic so that you can just pass in the name of the package, and it will
unload all the relevant shared libraries.

Typically, the name of the shared library is the same as the package. So
something like this usually works:
pkgname <- 'bitops'
pkgpath <- system.file(package=pkgname)
library.dynam.unload(pkgname, pkgpath)


Some R packages have shared libraries with names that differ from the
package, and this strategy won't work for them. I'm aware that the
NAMESPACE file will have an entry like this:
useDynLib(libname)

but I don't know how to access this information from within R. Is this
possible?
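
One possibility is sketched below.  parseNamespaceFile() is an internal
helper, so treat its return layout as an assumption rather than a
documented API; in current sources it exposes the useDynLib() entries as a
dynlibs component.

# Sketch: parse the installed package's NAMESPACE file.  "bitops" and the
# library path are illustrative; the package must be installed there.
nsInfo <- parseNamespaceFile("bitops", .libPaths()[1])
nsInfo$dynlibs  # character vector of useDynLib() names, possibly empty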


Another strategy I've looked at is to get all the libraries listed by
.dynLibs() and pick out those whose paths contain the package's path, but
I'd prefer not to do it this way if possible, since it seems like a bit of
a hack. For example, this code will load bitops, then unload the shared
library and unload the package.


library(bitops)
# Show what's loaded
.dynLibs()

pkgname <- 'bitops'

# Get installation path for the package
pkgpath <- system.file(package=pkgname)

# Get a vector of paths for all loaded libs
dynlib_paths <- vapply(.dynLibs(), function(x) x[["path"]], character(1))

# Find which of the lib paths start with pkgpath
pkgmatch <- pkgpath == substr(dynlib_paths, 1, nchar(pkgpath))

# Get matching lib paths and strip off the leading path and extension
# (.so or .dll)
libnames <- sub("\\.[^\\.]*$", "", basename(dynlib_paths[pkgmatch]))

# Unload each matching shared library (usually there is just one)
for (libname in libnames) {
  library.dynam.unload(libname, pkgpath)
}

# Show what's loaded
.dynLibs()

# Finally, also unload the package
detach(paste("package", pkgname, sep =":"), character.only = TRUE,
  force = TRUE, unload = TRUE)


Thanks for any help you can provide,
-Winston

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Finding dynamic shared libraries loaded with a package

2012-07-23 Thread Gabor Grothendieck
On Mon, Jul 23, 2012 at 8:29 PM, Winston Chang  wrote:
> Is there a way to query a package to see what dynamic shared libraries are
> loaded with it?
>

This gives a "DLLInfoList" class object whose components are info
associated with the loaded dll's

DLLInfoList <- library.dynam()

and this gives the components associated with package "stats"

DLLInfoList[sapply(DLLInfoList, "[[", "name") == "stats"]
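
To get from there to the on-disk paths, the same "[[" extraction works on
each component (a sketch; "path" is one of the fields a DLLInfo object
carries):

# Subset the DLL list to one package's entries, then pull each file path.
statsDLLs <- DLLInfoList[sapply(DLLInfoList, "[[", "name") == "stats"]
sapply(statsDLLs, "[[", "path")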





-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Finding dynamic shared libraries loaded with a package

2012-07-23 Thread Winston Chang
On Mon, Jul 23, 2012 at 7:47 PM, Gabor Grothendieck  wrote:

> On Mon, Jul 23, 2012 at 8:29 PM, Winston Chang  wrote:
> > Is there a way to query a package to see what dynamic shared libraries
> > are loaded with it?
> >
>
> This gives a "DLLInfoList" class object whose components are info
> associated with the loaded dll's
>
> DLLInfoList <- library.dynam()
>
> and this gives the components associated with package "stats"
>
> DLLInfoList[sapply(DLLInfoList, "[[", "name") == "stats"]
>
>
Thanks - I think this does the trick!

Although I decided to use .dynLibs() instead of library.dynam(). The latter
just calls the former when no name is passed to it.


Another mailing list member sent me a message suggesting getLoadedDLLs().
This appears to be slightly different -- if I understand correctly, it
returns all loaded DLLs, while .dynLibs() returns just the ones loaded by
packages.
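
A quick sketch of the difference:

# getLoadedDLLs() covers every DLL in the session, including R's own;
# .dynLibs() covers only those loaded via library.dynam(), i.e. by packages.
names(getLoadedDLLs())
sapply(.dynLibs(), "[[", "name")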

-Winston

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] S4 objects in formulas (really, model frames)

2012-07-23 Thread Prof Brian Ripley

The help for model.frame says

 Only variables whose type is raw, logical, integer, real, complex
 or character can be included in a model frame: this includes
 classed variables such as factors (whose underlying type is
 integer), but excludes lists.

Some S4 objects are of one of those types, but some are not.  Some 
matrices are, some are not.  Objects of class "Surv" are.
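
To make that concrete, a small sketch with hypothetical classes; typeof()
shows which underlying type model.frame() will see:

# A class that extends a basic type keeps that type and is acceptable in a
# model frame; a class built purely from slots has type "S4" and is not.
setClass("numS4", contains = "numeric")
typeof(new("numS4", c(1.5, 2.5)))       # "double"
setClass("slotS4", representation(x = "numeric"))
typeof(new("slotS4", x = c(1.5, 2.5)))  # "S4"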


On 23/07/2012 21:33, David L Lorenz wrote:

Hi,
   I have very carefully developed several S4 classes that describe
censored water-quality data. I have routines for them that will support
their use in data.frames and so forth. I have run into a problem when I
try to use the S4 class as the response variable in a formula and try to
extract the model frame. I get an error like:

Error in model.frame.default(as.lcens(Y) ~ X) : object is not a matrix

   In this case, as.lcens works much like the Surv function in the survival
package except that the object is an S4 class and not a matrix of class
Surv. I would have expected that the model.frame function would have been
able to manipulate any kind of object that can be subsetted and put into a
data.frame. But that appears not to be the case. I'm using R 2.14.1 if
that matters.
   I can supply the routines for the lcens data if needed.
   Am I looking at needing to write a wrapper to convert all of my S4
classes into matrices and then extract the necessary data in the matrices
according to rules for the particular kind of S4 class? Or, am I missing a
key piece on how model.frame works?
   Thanks.
Dave

[[alternative HTML version deleted]]


The posting guide asked you not to do that.  And to do your own homework.

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Using .onUnload to unload DLLs

2012-07-23 Thread Winston Chang
I've noticed that many of the "base" R packages have an .onUnload()
function which automatically unloads compiled shared libraries
with library.dynam.unload(). For example:

> stats:::.onUnload
function (libpath)
library.dynam.unload("stats", libpath)




I've noticed that many other packages don't do this. Is it considered good
practice to do this in packages that have compiled code?
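
For reference, the whole pattern is small.  A sketch, where "mylib" stands
for the useDynLib() name, which need not match the package name:

# In the package's R sources, e.g. R/zzz.R:
.onUnload <- function(libpath) {
  # Unload the package's compiled code when its namespace is unloaded.
  library.dynam.unload("mylib", libpath)
}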

-Winston

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Using .onUnload to unload DLLs

2012-07-23 Thread Prof Brian Ripley

On 24/07/2012 07:11, Winston Chang wrote:

I've noticed that many of the "base" R packages have an .onUnload()
function which automatically unloads compiled shared libraries
with library.dynam.unload(). For example:


stats:::.onUnload

function (libpath)
library.dynam.unload("stats", libpath)




I've noticed that many other packages don't do this. Is it considered good
practice to do this in packages that have compiled code?


Yes, but not all packages have well-enough written DSOs to do this.
(Tcl/Tk does not, so we no longer attempt to unload tcltk.dll.)




-Winston

[[alternative HTML version deleted]]


which the posting guide expressly asked you not to do.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel