Re: Solr - DataImportHandler - Large Dataset results ?

Shalin Shekhar Mangar Fri, 12 Dec 2008 21:57:48 -0800

On Sat, Dec 13, 2008 at 11:03 AM, Kay Kay <kaykay.uni...@gmail.com> wrote:


> Thanks Shalin for the clarification.
>
> The point that Lucene takes more time to index a Document than
> DataImportHandler takes to create the input is definitely intuitive.
>
> But I am curious about the hardware the test was run on. Was this
> performed on a multi-core machine? If so, how many cores were there, and
> what architecture were they? Knowing more about them would help in
> understanding the results and seeing where they could be improved.
>

This was on 4-CPU, dual-core 64-bit Xeon boxes with 6GB dedicated to the
JVM. IIRC, the dataset was 3 million documents built by joining 3 tables from
MySQL (index size on disk: 1.3 GB). The Solr and MySQL boxes had the same
configuration and were connected over a gigabit network. This was done a long
time back, so these may not be the exact values, but they should be pretty close.


>
> As for the query:
>
> select * from table LIMIT 0, 5000
>
> how database/vendor/driver neutral is this statement? I believe MySQL
> supports it, but I am curious how generic it is going to be.
>
>
This is for MySQL. I believe we are discussing these workarounds only
because the MySQL driver does not support batched streaming: it fetches rows
either one by one or all at once. You probably wouldn't need these tricks
for other databases.
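The LIMIT-based paging workaround can be sketched as follows. This is a minimal sketch, not DataImportHandler's own code: it only builds the paged SELECT statements, and the table name `item`, the key column `id`, and both function names are hypothetical. It adds an ORDER BY on a unique key, because without one the database is free to return rows in a different order for each page. For comparison it also builds the SQL:2008 `OFFSET ... FETCH FIRST` form, which is the standard syntax rather than MySQL's `LIMIT offset, count`; whether a given database and version accepts it still has to be checked.

```python
def mysql_page_queries(table, order_column, total_rows, page_size):
    """Yield MySQL-style paged SELECTs using LIMIT offset, row_count.

    table and order_column are hypothetical placeholders; they are
    interpolated directly, so pass only trusted identifiers.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # One query per page; ORDER BY keeps page boundaries stable.
    for offset in range(0, total_rows, page_size):
        yield (f"SELECT * FROM {table} ORDER BY {order_column} "
               f"LIMIT {offset}, {page_size}")


def standard_page_queries(table, order_column, total_rows, page_size):
    """Same paging, written with the SQL:2008 OFFSET/FETCH FIRST syntax."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total_rows, page_size):
        yield (f"SELECT * FROM {table} ORDER BY {order_column} "
               f"OFFSET {offset} ROWS FETCH FIRST {page_size} ROWS ONLY")


if __name__ == "__main__":
    # Hypothetical table of 12000 rows fetched 5000 at a time.
    for query in mysql_page_queries("item", "id", 12000, 5000):
        print(query)
```

Run as a script, this prints three statements ending in `LIMIT 0, 5000`, `LIMIT 5000, 5000` and `LIMIT 10000, 5000`. Note that large offsets tend to get slower, since the server still has to step over all the skipped rows before returning a page.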

-- 
Regards,
Shalin Shekhar Mangar.
