Re: [R] Improving data processing efficiency

2008-06-06 Thread hadley wickham
>>> install.packages("profr")
>> Warning message:
>> package 'profr' is not available
>
> I selected a different mirror in place of the Iowa one and it
> worked. Odd, I just assumed all the same packages are available
> on all mirrors.

The Iowa mirror is rather out of date as the guy who was loo

Re: [R] Improving data processing efficiency

2008-06-06 Thread Charles C. Berry
On Fri, 6 Jun 2008, Daniel Folkinshteyn wrote:
install.packages("profr")
library(profr)
p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
plot(p)
That should at least help you see where the slow bits are.
Hadley
so profiling reveals that '[.data.frame' and '[[.data.fram

Re: [R] Improving data processing efficiency

2008-06-06 Thread Esmail Bonakdarian
Esmail Bonakdarian wrote: hadley wickham wrote: Hi, I tried this suggestion as I am curious about bottlenecks in my own R code ... Why not try profiling? The profr package provides an alternative display that I find more helpful than the default tools: install.packages("profr") > inst

Re: [R] Improving data processing efficiency

2008-06-06 Thread Esmail Bonakdarian
hadley wickham wrote: Hi, I tried this suggestion as I am curious about bottlenecks in my own R code ... Why not try profiling? The profr package provides an alternative display that I find more helpful than the default tools: install.packages("profr") > install.packages("profr") Warnin

Re: [R] Improving data processing efficiency

2008-06-06 Thread Horace Tso
.) H.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Folkinshteyn
Sent: Friday, June 06, 2008 4:35 PM
To: hadley wickham
Cc: r-help@r-project.org; Patrick Burns
Subject: Re: [R] Improving data processing efficiency

> install.packages("prof

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
install.packages("profr")
library(profr)
p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
plot(p)
That should at least help you see where the slow bits are.
Hadley
so profiling reveals that '[.data.frame' and '[[.data.frame' and '[' are the biggest timesuckers... i suppose

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
on 06/06/2008 06:55 PM hadley wickham said the following:
Why not try profiling? The profr package provides an alternative display that I find more helpful than the default tools:
install.packages("profr")
library(profr)
p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
plot(p)

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
thanks for the suggestions! I'll play with this over the weekend and see what comes out. :) on 06/06/2008 06:48 PM Don MacQueen said the following: In a case like this, if you can possibly work with matrices instead of data frames, you might get significant speedup. (More accurately, I have had

Re: [R] Improving data processing efficiency

2008-06-06 Thread hadley wickham
On Fri, Jun 6, 2008 at 5:10 PM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote: > Hmm... ok... so i ran the code twice - once with a preallocated result, > assigning rows to it, and once with a nrow=0 result, rbinding rows to it, > for the first 20 quarters. There was no speedup. In fact, running wi

Re: [R] Improving data processing efficiency

2008-06-06 Thread Don MacQueen
In a case like this, if you can possibly work with matrices instead of data frames, you might get significant speedup. (More accurately, I have had situations where I obtained speed up by working with matrices instead of dataframes.) Even if you have to code character columns as numeric, it can
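Don's point is easy to check on synthetic data. The sketch below is illustrative only; the sizes and column names are invented, not taken from the thread:

```r
# Time per-row extraction from a data.frame versus the same data as a matrix.
n <- 2000
df <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
m  <- as.matrix(df)

t_df <- system.time(for (i in seq_len(n)) df[i, ])["elapsed"]
t_m  <- system.time(for (i in seq_len(n)) m[i, ])["elapsed"]

# On most systems the matrix loop is dramatically faster, because
# df[i, ] dispatches to the [.data.frame method on every iteration.
c(data.frame = t_df, matrix = t_m)
```

The conversion only works cleanly when all columns share a type, which is why Don mentions coding character columns as numeric first.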

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
Hmm... ok... so i ran the code twice - once with a preallocated result, assigning rows to it, and once with a nrow=0 result, rbinding rows to it, for the first 20 quarters. There was no speedup. In fact, running with a preallocated result matrix was slower than rbinding to the matrix: for prea
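One plausible explanation for that surprise (a general R point, not a diagnosis of the poster's exact code): row-wise assignment into a preallocated data.frame still invokes the data.frame replacement method on every iteration, so preallocation alone buys little; a preallocated matrix skips that dispatch. A sketch:

```r
n <- 500

# Preallocated data.frame, filled row by row: each assignment still goes
# through the (expensive) [<-.data.frame replacement method.
res_df <- data.frame(x = numeric(n), y = numeric(n))
t_df <- system.time(for (i in seq_len(n)) res_df[i, ] <- c(i, i^2))["elapsed"]

# Preallocated matrix, same loop: cheap element assignment.
res_m <- matrix(0, nrow = n, ncol = 2)
t_m <- system.time(for (i in seq_len(n)) res_m[i, ] <- c(i, i^2))["elapsed"]

c(data.frame = t_df, matrix = t_m)
```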

Re: [R] Improving data processing efficiency

2008-06-06 Thread Greg Snow
> -Original Message-
> From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 06, 2008 12:33 PM
> To: Greg Snow
> Cc: Patrick Burns; Daniel Folkinshteyn; r-help@r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> On Fri, Jun

Re: [R] Improving data processing efficiency

2008-06-06 Thread Patrick Burns
TECTED] On Behalf Of Patrick Burns
Sent: Friday, June 06, 2008 12:04 PM
To: Daniel Folkinshteyn
Cc: r-help@r-project.org
Subject: Re: [R] Improving data processing efficiency

That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not fa

Re: [R] Improving data processing efficiency

2008-06-06 Thread Gabor Grothendieck
c: r-help@r-project.org
>> Subject: Re: [R] Improving data processing efficiency
>>
>> That is going to be situation dependent, but if you have a
>> reasonable upper bound, then that will be much easier and not
>> far from optimal.
>>
>> If you pick the possib

Re: [R] Improving data processing efficiency

2008-06-06 Thread Greg Snow
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Burns
> Sent: Friday, June 06, 2008 12:04 PM
> To: Daniel Folkinshteyn
> Cc: r-help@r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> Th

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
Cool, I do have an upper bound, so I'll try it and how much of a speedboost it gives me. Thanks for the suggestion! on 06/06/2008 02:03 PM Patrick Burns said the following: That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not f

Re: [R] Improving data processing efficiency

2008-06-06 Thread Patrick Burns
That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from optimal. If you pick the possibly too small route, then increasing the size in largish junks is much better than adding a row at a time. Pat Daniel Folkinshteyn wrot
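Pat's "largish chunks" strategy, in sketch form. The chunk size, column count, and the row being stored are all placeholders for whatever the real code computes:

```r
# Grow the result in blocks instead of paying one rbind per row.
grow_in_chunks <- function(n_rows, n_cols = 2, chunk = 1000) {
  result <- matrix(NA_real_, nrow = chunk, ncol = n_cols)
  used <- 0
  for (i in seq_len(n_rows)) {
    if (used == nrow(result))   # out of room: add a whole chunk at once
      result <- rbind(result, matrix(NA_real_, nrow = chunk, ncol = n_cols))
    used <- used + 1
    result[used, ] <- c(i, i^2)  # stand-in for the real computed row
  }
  result[seq_len(used), , drop = FALSE]  # drop the unused tail once, at the end
}

dim(grow_in_chunks(2500))
```

The number of reallocations falls from one per row to one per chunk, which is where the savings come from.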

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
Ok, sorry about the zip, then. :) Thanks for taking the trouble to clue me in as to the best posting procedure! well, here's a dput-ed version of the small data subset you can use for testing. below that, an updated version of the function, with extra explanatory comments, and producing an ext

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
just in case, uploaded it to the server, you can get the zip file i mentioned here: http://astro.temple.edu/~dfolkins/helplistfiles.zip on 06/06/2008 01:25 PM Daniel Folkinshteyn said the following: i thought since the function code (which i provided in full) was pretty short, it would be reaso

Re: [R] Improving data processing efficiency

2008-06-06 Thread Gabor Grothendieck
I think the posting guide may not be clear enough and have suggested that it be clarified. Hopefully this better communicates what is required and why in a shorter amount of space: https://stat.ethz.ch/pipermail/r-devel/2008-June/049891.html On Fri, Jun 6, 2008 at 1:25 PM, Daniel Folkinshteyn <

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
thanks for the tip! i'll try that and see how big of a difference that makes... if i am not sure what exactly the size will be, am i better off making it larger, and then later stripping off the blank rows, or making it smaller, and appending the missing rows? on 06/06/2008 11:44 AM Patrick Bu
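On that question, the usual answer is to overshoot and strip: removing unused rows at the end is a single cheap subscript, whereas appending re-allocates the whole object each time. A minimal sketch with made-up numbers:

```r
upper_bound <- 1000                      # known upper bound on result rows
result <- matrix(NA_real_, nrow = upper_bound, ncol = 2)
used <- 0
for (i in seq_len(637)) {                # suppose only 637 rows materialize
  used <- used + 1
  result[used, ] <- c(i, 2 * i)
}
result <- result[seq_len(used), , drop = FALSE]  # strip the blanks once
nrow(result)
```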

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
i thought since the function code (which i provided in full) was pretty short, it would be reasonably easy to just read the code and see what it's doing. but ok, so... i am attaching a zip file, with a small sample of the data set (tab delimited), and the function code, in a zip file (posting

Re: [R] Improving data processing efficiency

2008-06-06 Thread Gabor Grothendieck
That is the last line of every message to r-help.

On Fri, Jun 6, 2008 at 12:05 PM, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> Its summarized in the last line to r-help. Note reproducible and
> minimal.
>
> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote:
>>

Re: [R] Improving data processing efficiency

2008-06-06 Thread Gabor Grothendieck
Its summarized in the last line to r-help. Note reproducible and minimal.

On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote:
> i did! what did i miss?
>
> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>
>> Try reading the posting guide before posting.

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
i did! what did i miss? on 06/06/2008 11:45 AM Gabor Grothendieck said the following: Try reading the posting guide before posting. On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote: Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folki

Re: [R] Improving data processing efficiency

2008-06-06 Thread Gabor Grothendieck
Try reading the posting guide before posting. On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote: > Anybody have any thoughts on this? Please? :) > > on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following: >> >> Hi everyone! >> >> I have a question about data pro

Re: [R] Improving data processing efficiency

2008-06-06 Thread Patrick Burns
One thing that is likely to speed the code significantly is if you create 'result' to be its final size and then subscript into it. Something like:

result[i, ] <- bestpeer

(though I'm not sure if 'i' is the proper index).

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-s
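Patrick's `result[i, ] <- bestpeer` pattern, fleshed out into a runnable sketch; `bestpeer` here is a dummy row standing in for whatever the real loop computes:

```r
n_iter <- 100
result <- matrix(NA_real_, nrow = n_iter, ncol = 3)  # create at final size
for (i in seq_len(n_iter)) {
  bestpeer <- c(i, i + 0.5, i * 2)  # placeholder for the real match row
  result[i, ] <- bestpeer           # subscript into the preallocated result
}
```

Compared with `result <- rbind(result, bestpeer)`, this avoids copying the whole accumulated result on every iteration.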

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following: Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had recen

Re: [R] Improving data processing efficiency

2008-06-05 Thread Daniel Folkinshteyn
Thanks, I'll take a look at Rprof... but I think what i'm missing is facility with R idiom to get around the looping, and no amount of profiling will help me with that :) also, full working code is provided in my original post (see toward the bottom). on 06/05/2008 03:43 PM bartjoosen said t

Re: [R] Improving data processing efficiency

2008-06-05 Thread bartjoosen
Maybe you should provide a minimal, working code with data, so that we all can give it a try. In the mean time: take a look at the Rprof function to see where your code can be improved.

Good luck
Bart

Daniel Folkinshteyn-2 wrote:
>
> Hi everyone!
>
> I have a question about data processing e
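Rprof ships with base R, so unlike profr it needs no extra package. A minimal session looks like this; the profiled function is a throwaway example chosen only because growing a data.frame by rbind is reliably slow:

```r
# Deliberately slow function: grows a data.frame one rbind at a time.
f <- function(n = 2000) {
  x <- NULL
  for (i in seq_len(n)) x <- rbind(x, data.frame(a = i))
  x
}

out <- tempfile()
Rprof(out)            # start the sampling profiler, writing to 'out'
f()
Rprof(NULL)           # stop profiling
head(summaryRprof(out)$by.self)  # calls ranked by time spent in themselves
```

The `$by.self` table is usually the place to look first: it points at the functions doing the work, not just the ones on the call path.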

[R] Improving data processing efficiency

2008-06-05 Thread Daniel Folkinshteyn
Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had recent IPOs, some have not (I have a binary flag set). The total dataset size is 700k+ rows. My goal is this: For
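The question is cut off above, but the task it describes (for each recent-IPO firm-quarter, find a matching non-issuing firm in the same quarter) fits a standard R pattern for avoiding a row-by-row loop over 700k+ rows: split the data by quarter and do each within-quarter match as one vectorized step. Everything below is a toy sketch; the column names, the size-based matching rule, and the data are invented for illustration:

```r
# Toy data: firms observed by quarter, with an IPO flag and a size variable.
set.seed(1)
d <- data.frame(
  firm    = 1:1000,
  quarter = sample(1:8, 1000, replace = TRUE),
  ipo     = sample(c(TRUE, FALSE), 1000, replace = TRUE, prob = c(.2, .8)),
  size    = rlnorm(1000)
)

# For each IPO firm, pick the non-IPO firm in the same quarter with the
# closest size; all matching for one quarter happens in one vectorized pass.
match_one_quarter <- function(q) {
  issuers <- q[q$ipo, ]
  peers   <- q[!q$ipo, ]
  if (nrow(issuers) == 0 || nrow(peers) == 0) return(NULL)
  idx <- vapply(issuers$size,
                function(s) which.min(abs(peers$size - s)),
                integer(1))
  data.frame(issuer = issuers$firm, peer = peers$firm[idx])
}

matches <- do.call(rbind, lapply(split(d, d$quarter), match_one_quarter))
head(matches)
```

The split/lapply/rbind shape keeps the interpreted loop at one iteration per quarter rather than one per row, which is typically where most of the speedup in problems like this comes from.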