Shalin Shekhar Mangar wrote:
On Sat, Dec 13, 2008 at 11:03 AM, Kay Kay <kaykay.uni...@gmail.com> wrote:

Thanks Shalin for the clarification.

The point that Lucene takes more time to index the documents than DataImportHandler takes to create the input is definitely intuitive.

But I am just curious about the underlying architecture on which the test was run. Was it performed on a multi-core machine? If so, how many cores were there, and what architecture were they? Knowing more about the hardware would help in understanding the results and seeing where they could be improved.


This was with 4-CPU, dual-core, 64-bit Xeon boxes and 6GB dedicated to the
JVM. IIRC, the dataset was 3 million documents joining 3 tables from MySQL
(index size on disk: 1.3 gigs). Both the Solr and MySQL boxes had the same
configuration and were running on a gigabit network. This was done a long time
back, so these may not be the exact values, but they should be pretty close.

Thanks for the detailed configuration on which the tests were performed.
Our current architecture also looks more or less the same.
As for the query -

select * from table LIMIT 0, 5000

how database/vendor/driver neutral is this statement? I believe MySQL
supports it, but I am just curious how generic this statement is going to
be.
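
For concreteness, the kind of paging loop I am imagining around this query is sketched below. This is purely illustrative: the connection URL, credentials and "item" table name are made up, and it is not meant to be Solr's actual code.

import java.sql.*;

public class PagedFetchSketch {
    public static void main(String[] args) throws SQLException {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "user", "password");
        try {
            int pageSize = 5000;
            int offset = 0;
            while (true) {
                // MySQL-specific paging syntax: LIMIT offset, count
                String sql = "SELECT * FROM item LIMIT " + offset + ", " + pageSize;
                Statement stmt = conn.createStatement();
                ResultSet rs = stmt.executeQuery(sql);
                int rows = 0;
                while (rs.next()) {
                    rows++;            // hand each row off to indexing here
                }
                rs.close();
                stmt.close();
                if (rows < pageSize) {
                    break;             // last (partial) page - we're done
                }
                offset += pageSize;
            }
        } finally {
            conn.close();
        }
    }
}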


This is for MySQL. I believe we are discussing these workarounds only
because the MySQL driver does not support batch streaming: it fetches rows
either one-by-one or all-at-once. You probably wouldn't need these tricks
for other databases.
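
For reference, the one-row-at-a-time (streaming) mode can usually be enabled from plain JDBC along these lines. This is only an illustrative sketch, not DataImportHandler's actual code; the connection URL, credentials and table name are made up.

import java.sql.*;

public class StreamingFetchSketch {
    public static void main(String[] args) throws SQLException {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "user", "password");
        try {
            // With MySQL Connector/J, a forward-only, read-only statement whose
            // fetch size is Integer.MIN_VALUE streams rows one at a time instead
            // of materializing the whole result set in memory.
            Statement stmt = conn.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            stmt.setFetchSize(Integer.MIN_VALUE);
            ResultSet rs = stmt.executeQuery("SELECT * FROM item");
            while (rs.next()) {
                // hand each row off to indexing here
            }
            rs.close();
            stmt.close();
        } finally {
            conn.close();
        }
    }
}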

True - currently I am playing around with MySQL. But I was trying to understand more about how the Statement object gets created (in the case of a platform/vendor-specific query like this one). Are we going through JPA internally in Solr to create the Statements for these queries? Where in the Solr source code can I look to understand this better?
