Frankly, I have never tried DIH. It is probably the best option for this specific case (they have a Java developer), but one has to be knowledgeable enough to design a Solr schema.

I have also noticed here (and on the HBase mailing list) that many first-time users still think in relational-DBMS terms: they try to index their tables as-is, with their relations and different PKs, instead of indexing documents.

I am currently indexing a constant 1000+ docs per second at 5%-15% CPU. These are small docs, about 5 KB on average, with 7 fields. So yes, correct: 3M+ docs in an hour (1000 per second is 3.6M per hour). And given the CPU load is only 5%-15%, it could be 10 times more!

Fuad
> With a relational database, the approach that has been working for us
> and many customers is to first give DataImportHandler a go. It's
> powerful and fast. 3M docs should index in about an hour or less, I'd
> speculate. But using DIH does require making access from Solr to the
> DB server solid, of course.
>
> Erik
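
To make the point about indexing documents rather than tables a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `books`, `authors` and `tags` tables, their columns, and the resulting field names are hypothetical and do not come from anyone's real schema. It only shows the join being done once, up front, so that each indexed document is flat and carries a single unique key. It does not talk to Solr at all.

```python
# Hypothetical relational rows; table and column names are made up.
authors = {1: {"name": "Ann Lee"}, 2: {"name": "Bob Ray"}}
books = [
    {"book_id": 10, "title": "Solr Basics", "author_id": 1},
    {"book_id": 11, "title": "HBase Notes", "author_id": 2},
]
tags = [
    {"book_id": 10, "tag": "search"},
    {"book_id": 10, "tag": "lucene"},
    {"book_id": 11, "tag": "storage"},
]


def build_documents(books, authors, tags):
    """Join the rows once and return one flat document per book."""
    # Group the one-to-many tag rows by the book they belong to.
    tags_by_book = {}
    for row in tags:
        tags_by_book.setdefault(row["book_id"], []).append(row["tag"])

    docs = []
    for book in books:
        docs.append({
            # One unique key per document instead of several table PKs.
            "id": str(book["book_id"]),
            "title": book["title"],
            # The many-to-one relation is resolved into a plain field.
            "author": authors[book["author_id"]]["name"],
            # The one-to-many relation becomes a multi-valued field.
            "tags": tags_by_book.get(book["book_id"], []),
        })
    return docs


docs = build_documents(books, authors, tags)
```

Each entry in `docs` is then something a schema with four fields (`id`, `title`, `author`, and a multi-valued `tags`) could hold, with no relations left to resolve at query time.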