It sounds like DataImportHandler will not perform well with SolrCloud. From what I see, it should essentially work: it sends docs to the update chain, which should distribute them via DistributedUpdateProcessor. But it works synchronously; there has been no multithreading in DIH since 4.0! Does anyone have experience with, or ideas for, fast data acquisition with DIH and SolrCloud? Excuse me for hijacking the thread.
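For illustration, the advice quoted below (index from many threads, and to many nodes rather than one) could be sketched roughly as follows. This is only a minimal sketch, not DIH and not SolrJ: the helper names (`chunked`, `post_batch`, `parallel_index`), the batch size, the worker count, and the node URLs are all hypothetical, and it assumes each node accepts a JSON array of documents on its update URL.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def chunked(docs, size):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batch(update_url, batch):
    """POST one batch as a JSON array.

    Assumption: the node at `update_url` accepts JSON documents there.
    """
    request = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()
    return len(batch)


def parallel_index(docs, update_urls, batch_size=500, workers=8, send=post_batch):
    """Send batches from `workers` threads, rotating over the node URLs."""
    targets = cycle(update_urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each batch goes to the next node in round-robin order.
        futures = [
            pool.submit(send, next(targets), batch)
            for batch in chunked(docs, batch_size)
        ]
        # Re-raises the first failure, otherwise returns the document count.
        return sum(future.result() for future in futures)
```

A call might look like `parallel_index(rows, ["http://node1:8983/solr/update", "http://node2:8983/solr/update"])`, where both URLs are placeholders. The sketch queues every batch up front and never issues a commit, so a multi-million-row import would still need a bound on in-flight batches and an explicit commit at the end.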
On Tue, Nov 27, 2012 at 8:10 PM, Mark Miller <markrmil...@gmail.com> wrote:

> To get the best speed out of SolrCloud you have to index from many clients
> (or threads). Even better is if you index to many nodes rather than one.
>
> Using a single thread against a single instance with replicas will be a
> fair amount slower with cloud than if you just used one node.
>
> - Mark
>
> On Nov 27, 2012, at 12:02 AM, deniz <denizdurmu...@gmail.com> wrote:
>
> > As I am somewhat confused, I want to check if anyone else has the same
> > confusion as I do about SolrCloud.
> >
> > I have set up an environment with 3 Solr instances and 2 ZooKeepers, and
> > tried to index some documents from a MySQL db. The total number of docs is
> > around 3.5M. Before indexing I was expecting a somewhat longer time for
> > cloud, as it does replication between nodes, but I am rather disappointed
> > after seeing that indexing took 4 to 5 times longer than indexing on a
> > single Solr instance. On a single Solr instance I am able to index those
> > docs in around 17 mins, while with cloud it took around 60 minutes. And as
> > a possible production environment will have more instances and machines
> > available for the cloud, I can't imagine the indexing time... In addition
> > to the initial indexing time, we will be updating our indexes frequently,
> > which makes me sceptical about SolrCloud.
> >
> > So in a possible production environment with SolrCloud, in case there is a
> > serious failure on some nodes, the sync operation on cloud will take a long
> > time... In this case, reindexing everything on a single instance will take
> > less than 17 mins, which is a reasonable amount of time for a crash. So in
> > this case does it make sense to use SolrCloud, although indexing time will
> > be much higher than on a single instance? Or will a traditional
> > master-slave structure be better for this case?
> >
> > I am aware that cloud does load balancing and some other things that are
> > largely concerned with searching rather than indexing, but for a
> > frequently updated system, is it still useful to set up a cloud
> > environment?
> >
> > And are there some workarounds for indexing speed on cloud, other than
> > the known ones for Solr?
> >
> > -----
> > Smart, but doesn't work... If he worked, he'd do it...
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/SolrCloud-Performance-Indexing-tp4022549.html
> > Sent from the Solr - User mailing list archive at Nabble.com.

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>