It sounds like DataImportHandler will not perform well with
SolrCloud. From what I see it should essentially work: it sends docs to
the update processor chain, which should distribute them via
DistributedUpdateProcessor. But it works synchronously; there has been no
multithreading in DIH since 4.0!
Does anyone have experience with, or ideas for, fast data acquisition with
DIH & SolrCloud?
Excuse me for thread hijacking.
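
To make Mark's advice quoted below (index from many threads, and to many
nodes rather than one) a bit more concrete, here is a rough client-side
sketch in Python. It is only an illustration, not DIH code and not SolrJ:
the function names, the batch size, the worker count, and the assumption
that each node accepts a JSON array of documents at the update URL you pass
in are all mine, so adjust them to your setup.

```python
import itertools
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def post_batch(update_url, batch):
    """POST one batch as a JSON array (assumed update endpoint, not verified here)."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_parallel(docs, update_urls, batch_size=1000, workers=8, send=post_batch):
    """Send batches from several threads, rotating over the given node URLs."""
    targets = itertools.cycle(update_urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # All batches are queued up front; fine for a sketch, but a real
        # loader for millions of docs would bound the number in flight.
        futures = [pool.submit(send, next(targets), b)
                   for b in batches(docs, batch_size)]
        return [f.result() for f in futures]
```

The `send` parameter exists only so the fan-out logic can be tried without
a running cluster; pass a stub that records its arguments and you can see
how batches are spread over the nodes.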


On Tue, Nov 27, 2012 at 8:10 PM, Mark Miller <markrmil...@gmail.com> wrote:

> To get the best speed out of SolrCloud you have to index from many clients
> (or threads). Even better is if you index to many nodes rather than one.
>
> Using a single thread against a single instance with replicas will be a
> fair amount slower with cloud than if you just used one node.
>
> - Mark
>
> On Nov 27, 2012, at 12:02 AM, deniz <denizdurmu...@gmail.com> wrote:
>
> > As I am somewhat confused, I want to check whether anyone else has the
> > same confusion about SolrCloud.
> >
> > I have set up an environment with 3 Solr instances and 2 ZooKeepers, and
> > tried to index some documents from a MySQL db. The total number of docs is
> > around 3.5M. Before indexing I was expecting a somewhat longer time for
> > cloud, as it replicates between nodes, but I am somewhat disappointed after
> > seeing that indexing took 4 to 5 times longer than indexing on a single
> > Solr instance. On a single Solr instance I am able to index those docs in
> > around 17 mins, while with cloud it took around 60 minutes. And as a
> > possible production environment will have more instances and machines
> > available for the cloud, I can't imagine the indexing time... In addition
> > to the initial indexing time, we will be updating our indexes frequently,
> > which makes me sceptical about SolrCloud.
> >
> > So in a possible production environment with SolrCloud, in case there is
> > a serious failure on some nodes, the sync operation on cloud will take a
> > long time... In this case, reindexing everything on a single instance will
> > take less than 17 mins, which is a reasonable amount of time for a crash.
> > So in this case, does it make sense to use SolrCloud although indexing
> > time will be much higher than on a single instance? Or would a traditional
> > master-slave structure be better for this case?
> >
> > I am aware that cloud handles load balancing and some other things that
> > are largely concerned with searching rather than indexing, but for a
> > frequently updated system, is it still useful to set up a cloud
> > environment?
> >
> > And are there any workarounds for indexing speed on cloud, other than
> > the known ones for Solr?
> >
> >
> >
> > -----
> > Zeki ama calismiyor... Calissa yapar...
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/SolrCloud-Performance-Indexing-tp4022549.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
