Frankly, I have never tried DIH... it is probably the best option for this
specific case (they have a Java developer), but one should be knowledgeable
enough to design a Solr schema. I have also noticed here (and on the HBase
mailing list) that many first-time users still think in terms of a
relational DBMS and try to index their tables as-is, with relations
(and different PKs), instead of indexing their documents. I am constantly
indexing 1000+ docs per second now at 5%-15% CPU: small docs, 5 KB on
average, 7 fields. So yes, correct, 3M+ docs in an hour, and it could be
10 times more!
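
To illustrate what I mean by indexing documents rather than tables, here is a
minimal sketch in plain Python. The tables (authors, books, tags) and all field
names are hypothetical, made up only for this example; the point is just that
the joins are resolved before indexing, so each document sent to Solr is
self-contained.

```python
from collections import defaultdict

# Hypothetical relational rows; table and field names are illustrative only.
authors = [
    {"author_id": 1, "name": "Ann"},
    {"author_id": 2, "name": "Bob"},
]
books = [
    {"book_id": 10, "author_id": 1, "title": "Solr Basics"},
    {"book_id": 11, "author_id": 1, "title": "Schema Design"},
    {"book_id": 12, "author_id": 2, "title": "HBase Notes"},
]
tags = [
    {"book_id": 10, "tag": "search"},
    {"book_id": 10, "tag": "lucene"},
    {"book_id": 12, "tag": "nosql"},
]


def build_documents(authors, books, tags):
    """Flatten the three tables into one self-contained document per book."""
    # Resolve the many-to-one relation (book -> author) with a lookup table.
    author_by_id = {a["author_id"]: a for a in authors}

    # Collect the one-to-many relation (book -> tags) into lists.
    tags_by_book = defaultdict(list)
    for t in tags:
        tags_by_book[t["book_id"]].append(t["tag"])

    docs = []
    for b in books:
        docs.append({
            "id": str(b["book_id"]),           # single unique key per document
            "title": b["title"],
            "author_name": author_by_id[b["author_id"]]["name"],
            "tags": tags_by_book.get(b["book_id"], []),  # multi-valued field
        })
    return docs


docs = build_documents(authors, books, tags)
```

Each entry in `docs` would then map to one Solr document with a single unique
key and a multi-valued `tags` field, assuming the schema is designed that way.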
        Fuad

>With a relational database, the approach that has been working for us  
>and many customers is to first give DataImportHandler a go.  It's  
>powerful and fast.  3M docs should index in about an hour or less, I'd  
>speculate.  But using DIH does require making access from Solr to the  
>DB server solid, of course.
>
>       Erik


