Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-03 Thread Jeff Ryan
A bit late and possibly tangential. The mmap package has something called struct() which is really a row-wise array of heterogeneous columns. As Simon and others have pointed out, R has no way to handle this natively, but mmap does provide a very measurable performance gain by orienting rows to

Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-01 Thread Antonio Piccolboni
On Tue, May 1, 2012 at 11:29 AM, Simon Urbanek wrote:
> On May 1, 2012, at 1:26 PM, Antonio Piccolboni wrote:
> > It seems like people need to hear more context, happy to provide it. I am implementing a serialization format (typedbytes, HADOOP-1722 if people want the gory details)

Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-01 Thread Simon Urbanek
On May 1, 2012, at 1:26 PM, Antonio Piccolboni wrote:
> It seems like people need to hear more context, happy to provide it. I am implementing a serialization format (typedbytes, HADOOP-1722 if people want the gory details) to make R and Hadoop interoperate better (RHadoop project, package

Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-01 Thread Antonio Piccolboni
It seems like people need to hear more context, happy to provide it. I am implementing a serialization format (typedbytes, HADOOP-1722 if people want the gory details) to make R and Hadoop interoperate better (RHadoop project, package rmr). It is a row-first format and it's already implemented as a

Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-01 Thread Prof Brian Ripley
On 01/05/2012 00:28, Antonio Piccolboni wrote:
> Hi, I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in
>
> system.time({fd = data.frame(x=1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep =""))})
> user syst

Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-01 Thread Matthew Dowle
Antonio Piccolboni writes:
> Hi,
> I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in
>
> system.time({fd = data.frame(x=1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep =""))})

[Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-04-30 Thread Antonio Piccolboni
Hi, I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in

system.time({fd = data.frame(x=1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep =""))})
   user  system elapsed
  0.004   0.000   0.004

and then I try
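For readers landing on the original question, here is a minimal sketch of the conversion being discussed, written in R. It times the split()-by-row baseline the post mentions against one alternative that builds each row as a plain named list with lapply(). That alternative, and the helper name row_as_list, are illustrative assumptions, not a method proposed in these (truncated) messages.

```r
# Same shape of data as in the original post: 2000 rows, three columns.
# stringsAsFactors = FALSE is set explicitly so `id` is character in any
# R version (before R 4.0.0 the default turned it into a factor).
fd <- data.frame(x = 1:2000, y = rnorm(2000),
                 id = paste("x", 1:2000, sep = ""),
                 stringsAsFactors = FALSE)

# Baseline from the post: split() by row index.
# Each element is a one-row data.frame.
t_split <- system.time(rows_split <- split(fd, seq_len(nrow(fd))))

# Hypothetical alternative (an assumption, not from the thread): take the
# i-th element of every column, giving each row as a plain named list.
row_as_list <- function(df, i) lapply(df, `[[`, i)
t_lapply <- system.time(
  rows_list <- lapply(seq_len(nrow(fd)), function(i) row_as_list(fd, i))
)

print(t_split)
print(t_lapply)
str(rows_list[[1]])
```

The two results differ in type: split() returns a list of one-row data frames, while the sketch returns a list of named lists holding the same per-row values. Relative speed depends on the R version and the data, so compare the printed timings rather than assuming a winner.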