As for where the time goes, it might be in parsing the statement or in
developing the query plan. The SQL statement for the more complex query
is obviously much longer, and its generated query plan involves 95 lines
of byte code vs 19 lines of generated code for the simpler q
I'm talking about ease of use too. The first line of the Details section in
?"[.data.table" says:
"Builds on base R functionality to reduce 2 types of time :
1. programming time (easier to write, read, debug and maintain)
2. compute time"
Once again, I am merely saying that the
I think one would only be concerned about such internals if one were
primarily interested in performance; otherwise, one would be more
interested in ease of specification and part of that ease is having it
independent of implementation and separating implementation from
specification activities. A
Are you claiming that SQL is that utopia? SQL is a row store. It cannot
give the user the benefits of column store.
For example, why does SQL take 113 seconds in the example in this thread:
http://tolstoy.newcastle.edu.au/R/e9/help/10/01/1872.html
but data.table takes 5 seconds to get the same
It's only important internally. Externally, it's undesirable that the
user has to get involved in it. The idea of making software easy to
write and use is to hide the implementation and focus on the problem.
That is why we use high level languages, object orientation, etc.
On Thu, Jan 28, 2010 at
How it represents data internally is very important, depending on the real
goal:
http://en.wikipedia.org/wiki/Column-oriented_DBMS
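
To make the row store vs column store distinction concrete, here is a minimal sketch in Python. The toy table and the names `rows` and `columns` are illustrative assumptions, not taken from the thread, and the sketch shows only the difference in layout, not the performance of any real system.

```python
# Hypothetical toy table with two fields per record: (Time, demand).
rows = [(1, 8.3), (2, 10.3), (3, 19.0)]  # row layout: one tuple per record

# Column layout: one sequence per column, built from the same data.
columns = {
    "Time": [r[0] for r in rows],
    "demand": [r[1] for r in rows],
}

# Summing one field in the row layout means walking every record.
total_from_rows = sum(r[1] for r in rows)

# In the column layout only the one column that is needed gets read.
total_from_columns = sum(columns["demand"])

print(total_from_rows, total_from_columns)
```

Both layouts hold the same data and give the same total; what differs is how much unrelated data has to be visited to compute it.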
"Gabor Grothendieck" wrote in message
news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com...
How it represents data internally should not be importa
How it represents data internally should not be important as long as
you can do what you want. SQL is declarative so you just specify what
you want rather than how to get it and invisibly to the user it
automatically draws up a query plan and then uses that plan to get the
result.
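
The query plan that is drawn up behind the scenes can be inspected. The sketch below uses Python's built-in sqlite3 module as a stand-in for the SQLite backend that sqldf uses by default. The table contents mimic R's BOD data set, and the two queries and the helper `byte_code` are illustrative assumptions, not the queries discussed in this thread.

```python
import sqlite3

# In-memory database with a small table mimicking R's BOD data frame.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE BOD (Time INTEGER, demand REAL)")
con.executemany(
    "INSERT INTO BOD VALUES (?, ?)",
    [(1, 8.3), (2, 10.3), (3, 19.0), (4, 16.0), (5, 15.6), (7, 19.8)],
)

# Two hypothetical queries: both say what is wanted, not how to get it.
simple_query = "SELECT * FROM BOD"
complex_query = (
    "SELECT Time, demand FROM BOD "
    "WHERE demand > (SELECT avg(demand) FROM BOD) "
    "ORDER BY Time DESC LIMIT 3"
)

def byte_code(sql):
    """Return the byte code program SQLite compiles for sql, one row per instruction."""
    return con.execute("EXPLAIN " + sql).fetchall()

simple_plan = byte_code(simple_query)
complex_plan = byte_code(complex_query)

# Instruction counts vary between SQLite versions, so none are quoted here.
print(len(simple_plan), "instructions for the simple query")
print(len(complex_plan), "instructions for the complex query")
```

The user writes only the SELECT statement; the longer program for the more involved query is generated without any user involvement.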
On Wed, Jan 27,
> sqldf("select * from BOD order by Time desc limit 3")
Exactly. SQL requires use of order by. It knows the order, but it isn't
ordered. That's not good, but might be fine, depending on what the real goal
is.
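
The point about order by can be illustrated with a small sketch, again using Python's built-in sqlite3 module as a stand-in for the SQLite backend that sqldf uses by default. The rows mimic R's BOD data set but are deliberately inserted out of Time order, which is an assumption made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE BOD (Time INTEGER, demand REAL)")
# Rows deliberately inserted out of Time order.
con.executemany(
    "INSERT INTO BOD VALUES (?, ?)",
    [(4, 16.0), (1, 8.3), (7, 19.8), (2, 10.3), (5, 15.6), (3, 19.0)],
)

# Without ORDER BY, SQL makes no promise about the order of the rows.
unordered = con.execute("SELECT Time FROM BOD").fetchall()

# With ORDER BY, the ordering is part of the request itself.
last_three = con.execute(
    "SELECT * FROM BOD ORDER BY Time DESC LIMIT 3"
).fetchall()

print(last_three)  # [(7, 19.8), (5, 15.6), (4, 16.0)]
```

The table itself carries no inherent order, so any query that depends on order has to say so explicitly.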
"Gabor Grothendieck" wrote in message
news:971536df1001270629w4795da89vb7d77af6e4e8b
On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle wrote:
> How many columns, and of what type are the columns ? As Olga asked too, it
> would be useful to know more about what you're really trying to do.
>
> 3.5m rows is not actually that many rows, even for 32bit R. It depends on
> the columns and
How many columns, and of what type are the columns ? As Olga asked too, it
would be useful to know more about what you're really trying to do.
3.5m rows is not actually that many rows, even for 32bit R. It depends on
the columns and what you want to do with those columns.
At the risk of sugge
Hi Nathan,
I have a table (contact) with several fields and its PK is an auto
increment field. I'm bulk loading data to this table from files
which, if successful, will be about 3.5 million rows (approx 16000 rows
per file). However, I have a linking table (an_contact) to resolve a
m:m rela
I have a table (contact) with several fields and its PK is an auto
increment field. I'm bulk loading data to this table from files which, if
successful, will be about 3.5 million rows (approx 16000 rows per file).
However, I have a linking table (an_contact) to resolve a m:m
relationship between