Re: Replication for SolrCloud

2015-04-19 Thread juergen.wag...@devoteam.com
In simple words: HDFS is good for file-oriented replication. Solr is good for index replication. Consequently, if the atomic update operations of an application (like Solr) are not atomic at the file level, HDFS is not adequate, as is the case for Solr with live index updates. Running Solr on HDFS (as a

Re: Replication for SolrCloud

2015-04-19 Thread gengmao
Thanks for the suggestion, Erick. However, what we need here is not a patch but a clarification from a practical perspective. I think Solr replication is a great feature to scale reads, and it somewhat increases reliability. However, on HDFS it is not as useful as just sharding. Sharding can scale both rea

Re: Replication for SolrCloud

2015-04-19 Thread gengmao
Please see my response in line:
On Fri, Apr 17, 2015 at 10:59 PM Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> Some comments inline:
>
> On Sat, Apr 18, 2015 at 2:12 PM, gengmao wrote:
>
> > On Sat, Apr 18, 2015 at 12:20 AM "Jürgen Wagner (DVT)" <juergen.wag...@devoteam.com> wrot

Re: Replication for SolrCloud

2015-04-18 Thread Erick Erickson
AFAIK, the HDFS replication of Solr indexes isn't something that was designed; it just came along for the ride given HDFS replication. Having 9 copies of the index around for a shard with 1 leader and two followers _is_ overkill; nobody disputes that at all. I know the folks at Cloudera (who contri

Re: Replication for SolrCloud

2015-04-18 Thread Shalin Shekhar Mangar
Some comments inline:
On Sat, Apr 18, 2015 at 2:12 PM, gengmao wrote:
> On Sat, Apr 18, 2015 at 12:20 AM "Jürgen Wagner (DVT)" <juergen.wag...@devoteam.com> wrote:
>
> > Replication on the storage layer will provide a reliable storage for the index and other data of Solr. In particular,

Re: Replication for SolrCloud

2015-04-18 Thread gengmao
On Sat, Apr 18, 2015 at 12:20 AM "Jürgen Wagner (DVT)" <juergen.wag...@devoteam.com> wrote:
> Replication on the storage layer will provide a reliable storage for the index and other data of Solr. In particular, this replication does not guarantee your index files are consistent at any time

Re: Replication for SolrCloud

2015-04-18 Thread Jürgen Wagner (DVT)
Replication on the storage layer will provide a reliable storage for the index and other data of Solr. In particular, this replication does not guarantee your index files are consistent at any time as there may be intermediate states that are only partially replicated. Replication is only a converg

Re: Replication for SolrCloud

2015-04-17 Thread gengmao
I wonder why we need to use SolrCloud replication on HDFS at all, given that HDFS already provides replication and availability. The way to optimize performance and scalability should be tweaking shards, just like tweaking regions on HBase, which doesn't provide "region replication" either, does it? I have

Re: Replication for SolrCloud

2015-04-09 Thread Erick Erickson
Yes. 3 replicas and an HDFS replication factor of 3 means 9 copies of the index are lying around. You can change your HDFS replication factor, but that affects other applications using HDFS, so that may not be an option. Best, Erick On Thu, Apr 9, 2015 at 2:31 AM, Vijaya Narayana Reddy Bhoomi Re
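
The arithmetic in the reply above can be sketched as follows. This is only an illustration, assuming each SolrCloud replica writes its own full copy of the index to HDFS and HDFS then replicates every copy independently. The function name and its parameters are hypothetical and are not part of Solr or HDFS.

```python
def physical_index_copies(solr_replication_factor: int,
                          hdfs_replication_factor: int) -> int:
    """Return how many physical copies of one shard's index end up on disk.

    Assumption (hypothetical model): every Solr replica stores a full index
    copy in HDFS, and HDFS replicates each of those copies on its own.
    """
    if solr_replication_factor < 1 or hdfs_replication_factor < 1:
        raise ValueError("replication factors must be at least 1")
    return solr_replication_factor * hdfs_replication_factor


# The case from the thread: 3 Solr replicas on HDFS with replication factor 3.
print(physical_index_copies(3, 3))

# Lowering the HDFS replication factor to 1 would leave one copy per replica.
print(physical_index_copies(3, 1))
```

Under this model, lowering either factor reduces the total multiplicatively, which is why the HDFS setting matters even though, as noted above, changing it may affect other applications on the same cluster.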

Replication for SolrCloud

2015-04-09 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi, can anyone please tell me how shard replication works when the indexes are stored in HDFS? I.e., with HDFS, the default replication factor is 3. Now, for the Solr shards, if I set the replication factor to 3 again, does that mean that internally the index data is replicated thrice and then HDFS rep