On 11/2/06, Vladimir Dergachev <[EMAIL PROTECTED]> wrote:
> On Tuesday 31 October 2006 9:30 pm, miguel manese wrote:
> The slowness manifests itself for vectorized code as well. I believe it is due
> to the code mucking about with row.names attribute which introduces a penalty
> on any [,] operation - penalty that grows linearly with the number of rows.
>
> Thus for large data frames A[,1] is slower than A[[1]]. For example, for the
> data frame I mentioned above E<-A[[1]] took 0.46 seconds (way too much in my
> opinion), but E<-A[,1] took 62.45 seconds - more than a minute and more than
> twice the time it took to load the entire thing into memory. Silly, isn't
> it ?
>
> Also, there are good reasons to want to address individual cells. And there is
> no reason why such access cannot be constant time.

Yeah, it should be O(1), because a data frame is just a list of vectors and
everything is in memory: index the column in the list, then the row in the
vector. For non-vectorized code, the problem is more the loop overhead
(maintaining loop variables), which is incurred in R instead of in C.
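
To make the two access paths concrete, here is a minimal sketch that times them side by side. The data frame, its size `n`, the column names and the repetition count are all made up for illustration; this is not the data set discussed above, and the timings will vary with the R version and the machine.

```r
# Sketch only: a small synthetic data frame stands in for the large one
# discussed above; n and the column names are illustrative assumptions.
n <- 1e5
A <- data.frame(x = runif(n), y = seq_len(n))

# Column access as a list element.
t_list <- system.time(for (k in 1:200) E1 <- A[[1]])

# Column access through the matrix-style [,] method.
t_matrix <- system.time(for (k in 1:200) E2 <- A[, 1])

# Both routes return the same vector; only the cost may differ.
stopifnot(identical(E1, E2))

# Single-cell access: pick the column from the list, then the row.
i <- 12345
stopifnot(identical(A[[1]][i], A[i, 1]))

# Elapsed seconds for each style (values depend on the machine).
print(c(list_style   = t_list[["elapsed"]],
        matrix_style = t_matrix[["elapsed"]]))
```

For single cells, `A[[j]][i]` spells out the "column in the list, then row in the vector" lookup directly.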

> > <pimp-my-project>
> > Or, you may just use (and pour your effort on improving) SQLiteDF
> > http://cran.r-project.org/src/contrib/Descriptions/SQLiteDF.html
> > </pimp-my-project>
>
> Very nice ! The documentation mentioned something about assignment operator
> not working - is this still true ? Or, maybe, I misunderstood something ?

Yes, unfortunately, there is still no [<- operator. There are as many ways to
mutate a data frame as there are ways to index (subscript) it. There are many
other things more "fun" than coding that (graphics!, extending sqlite syntax,
R expression evaluation), but I'll do that on the weekend.

> Also, I wonder whether it would be possible to extend [[ operator so one can
> run queries: SQLDF[["SELECT * FROM a WHERE.."]]

That has been suggested before, but in retrospect this can be achieved more
"poetically" as

    sdf[sdf$a>3 & sdf$b=="i",]   # where a>3 and b == 'i'

although not as efficiently. I have been thinking of adding a method like

    select(sdf, select=<select_clause>, where=<where_clause>, order_by=<order_by_clause>)

so that sum(sdf$a) can just be done with select(sdf, "sum(a)"), and not go
through .Call("..."). It can also optimize stuff: for example, with(sdf, a+b)
can be done with select(sdf, "a+b").

M. Manese

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
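
For reference, this is how the subsetting form above behaves on an ordinary data.frame. It is a sketch only: the data and the extra numeric column `d` are made up, SQLiteDF itself is not used, and the proposed select() method is not called because it does not exist yet. Note that row selection needs the element-wise `&`; `&&` is not vectorised.

```r
# Plain data.frame stand-in with made-up values; not an SQLiteDF object.
sdf <- data.frame(a = c(1, 4, 5, 2),
                  b = c("i", "i", "j", "i"),
                  d = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)

# Element-wise & yields one logical per row: where a > 3 and b == 'i'.
hits <- sdf[sdf$a > 3 & sdf$b == "i", ]

# Ordinary-R counterparts of the aggregate and expression examples.
total  <- sum(sdf$a)        # what a "sum(a)" query would compute
a_plus <- with(sdf, a + d)  # what an "a+d" query would compute

print(hits)
print(total)
print(a_plus)
```

Here only the second row satisfies both conditions, so `hits` has a single row.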