OK, clarify a bit more what you're doing with Hadoop. Are you using
the MapReduceIndexerTool? Or are your Hadoop jobs writing directly to
SolrCloud?

How are you measuring "out of sync"? Are you sure that you've
committed? Does "out of sync" mean reporting different result counts?
Different order? Different numbers of deleted docs? Completely
different search results? How do you know? Do you measure with
&distrib=false against each replica individually?
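
For that last check, here's a minimal SolrJ 4.x sketch that queries each
replica core directly with distrib=false and prints the counts. The replica
URLs below are made up; take the real core names from the Solr admin UI or
clusterstate.json:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ReplicaCountCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical core URLs -- substitute the real ones for one shard.
        String[] replicaCores = {
            "http://host1:8983/solr/collection1_shard1_replica1",
            "http://host2:8983/solr/collection1_shard1_replica2"
        };
        for (String url : replicaCores) {
            HttpSolrServer core = new HttpSolrServer(url);
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);               // we only want the count, not the docs
            q.set("distrib", "false");  // ask just this core, not the collection
            System.out.println(url + " numFound="
                + core.query(q).getResults().getNumFound());
            core.shutdown();
        }
    }
}
```

If the numFound values differ after a hard commit on a quiet index, the
replicas really are out of sync; if they match, the problem is elsewhere.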

Details matter a lot here ;)
Erick

On Sun, Oct 26, 2014 at 9:59 PM, S.L <simpleliving...@gmail.com> wrote:
> Folks,
>
> I have posted about this previously. I am using SolrCloud 4.10.1 and have
> a sharded collection with 6 nodes: 3 shards and a replication factor of 2.
>
> I am indexing Solr using a Hadoop job. I have 15 map fetch tasks, each of
> which can have up to 5 threads, so the load on the indexing side can reach
> as high as 75 concurrent threads.
>
> I am facing an issue where the replicas of particular shard(s) are
> consistently getting out of sync. Initially I thought this was because I
> was using a custom component, but I did a fresh install without the custom
> component and reindexed using the Hadoop job, and I still see the same
> behavior.
>
> I do not see any exceptions in my catalina.out, such as OOM or anything
> else. I suspect this could be because of the multi-threaded nature of the
> Hadoop indexing job. I use CloudSolrServer from my Java code to index, and
> I initialize the CloudSolrServer with a 3-node ZK ensemble.
>
> Does anyone know of any issues with highly multi-threaded indexing and
> SolrCloud?
>
> Can someone help? This issue has been slowing things down on my end for a
> while now.
>
> Thanks and much appreciated!
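
One client-side thing worth verifying while you gather those details:
CloudSolrServer is thread-safe and is meant to be shared by all indexing
threads in a JVM, not created per thread, and the job should issue an
explicit hard commit before declaring itself done. A sketch of that shape
(the ZK hosts and collection name are placeholders):

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SharedIndexer {
    public static void main(String[] args) throws Exception {
        // One CloudSolrServer per JVM, shared by every indexing thread.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");
        server.add(doc);   // in the real job, many threads call add() concurrently

        server.commit();   // explicit hard commit before the job exits
        server.shutdown();
    }
}
```

If you compare replica counts before that final commit has happened, the
cores can legitimately report different numbers without anything being wrong.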
