On Sat, Dec 13, 2008 at 11:03 AM, Kay Kay <kaykay.uni...@gmail.com> wrote:
> Thanks Shalin for the clarification.
>
> The case about Lucene taking more time to index the Document when compared
> to DataImportHandler creating the input is definitely intuitive.
>
> But just curious about the underlying architecture on which the test was
> being run. Was this performed on a multi-core machine? If so, how many
> cores were there? What architecture would they be? It might be useful to
> know more about them to understand more about the results and see where they
> could be improved.
>

This was with 4 CPU 64-bit Xeon dual core boxes with 6GB dedicated to the
JVM. IIRC, the dataset was 3 million documents joining 3 tables from MySQL
(index size on disk 1.3 gigs). Both the Solr and MySQL boxes had the same
configuration and were running on a gigabit network. This was done a long
time back, so these may not be the exact values, but they should be pretty
close.

>
> As about the query -
>
> select * from table LIMIT 0, 5000
>
> how database / vendor / driver neutral is this statement? I believe mysql
> supports this. But I am just curious how generic is this statement going to
> be.
>

This is for MySQL. I believe we are discussing these workarounds only
because the MySQL driver does not support batch streaming. It fetches rows
either one-by-one or all-at-once. You probably wouldn't need these tricks
for other databases.

--
Regards,
Shalin Shekhar Mangar.
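
For illustration, here is a minimal sketch of the LIMIT-based paging workaround discussed above. It is not the DataImportHandler code. It assumes an in-memory SQLite database standing in for MySQL so that it runs standalone (SQLite also accepts the `LIMIT offset, count` form), and the `item` table and the `fetch_in_batches` helper are hypothetical names. The comma form of LIMIT is not standard SQL, so other databases may need a different paging clause.

```python
import sqlite3


def fetch_in_batches(conn, table, batch_size):
    """Yield lists of rows, issuing one 'LIMIT offset, count' query per batch.

    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    offset = 0
    while True:
        # ORDER BY keeps page boundaries stable from one query to the next.
        rows = conn.execute(
            f"SELECT * FROM {table} ORDER BY id LIMIT ?, ?",
            (offset, batch_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        offset += len(rows)


# Hypothetical demo data: 12,000 rows in a stand-in table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item (id, name) VALUES (?, ?)",
    [(i, f"doc-{i}") for i in range(1, 12001)],
)

batch_sizes = [len(batch) for batch in fetch_in_batches(conn, "item", 5000)]
print(batch_sizes)  # [5000, 5000, 2000]
```

Each batch holds at most 5000 rows in memory at a time, which is the point of the workaround: it avoids both the all-at-once fetch and the row-by-row fetch mentioned above, at the cost of one extra query per batch.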