On Sat, Apr 18, 2015 at 12:20 AM "Jürgen Wagner (DVT)" < juergen.wag...@devoteam.com> wrote:
> Replication on the storage layer will provide a reliable storage for the
> index and other data of Solr. In particular, this replication does not
> guarantee your index files are consistent at any time as there may be
> intermediate states that are only partially replicated. Replication is only
> a convergent process, not an instant, atomic operation. With frequent
> changes, this becomes an issue.
>

First of all, thanks for your reply. However, I can't agree with you on this. HDFS guarantees consistency even with replicas: you always read what you wrote, and no partially replicated state is ever read. This is guaranteed by the HDFS server and client. Hence HBase can rely on HDFS for consistency and availability without implementing another replication mechanism, if I understand correctly.

> Replication inside SolrCloud as an application will not only maintain the
> consistency of the search-level interfaces to your indexes, but also scale
> in the sense of the application (query throughput).
>

Splitting one shard into two shards can increase query throughput too.

> Imagine a database: if you change one record, this may also result in an
> index change. If the record and the index are stored in different storage
> blocks, one will get replicated first. However, the replication target will
> only be consistent again when both have been replicated. So, you would have
> to suspend all accesses until the entire replication has completed. That's
> undesirable. If you replicate on the application (database management
> system) level, the application will employ a more fine-grained approach to
> replication, guaranteeing application consistency.
>

In HBase, a region is hosted on exactly one region server at any time, which guarantees its consistency. Because each read or write lands in a single region, you don't have to worry about parallel writes hitting multiple replicas of the same region. HDFS replication is totally transparent to HBase.
When an HDFS write call returns, HBase knows the data has been written and replicated, so losing one copy of the data won't impact HBase at all. So HDFS provides consistency and reliability for HBase. However, HBase doesn't use replicas (either its own or HDFS's) to scale reads. If a region is too "hot" for reads or writes, you split it into two regions, so that its reads and writes can be distributed across two region servers. Hence HBase scales. I think this is the simplicity and beauty of HBase.

Again, I am curious whether SolrCloud has a better reason to use replication on HDFS. As I described, HDFS provides consistency and reliability, while scalability can be achieved via sharding, even without Solr replication.

> Consequently, HDFS will allow you to scale storage and possibly even
> replicate static indexes that won't change, but it won't help much with
> live index replication. That's where SolrCloud jumps in.
>
> Cheers,
> --Jürgen
>
>
> On 18.04.2015 08:44, gengmao wrote:
>
> I wonder why need to use SolrCloud replication on HDFS at all, given HDFS
> already provides replication and availability? The way to optimize
> performance and scalability should be tweaking shards, just like tweaking
> regions on HBase - which doesn't provide "region replication" too, isn't
> it?
>
> I have this question for a while and I didn't find clear answer about it.
> Could some experts please explain a bit?
>
> Best regards,
> Mao Geng
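
To make the region-split argument in the reply above concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not HBase code: the names RegionTable, locate, split and load_per_server, the server names rs1 and rs2, and the sample keys are all hypothetical. It only shows that when every key belongs to exactly one region (so there is a single writer per key), splitting a hot region spreads the same requests across two servers.

```python
from bisect import bisect_right
from collections import Counter


class RegionTable:
    """Toy model: sorted key ranges, each served by exactly one server."""

    def __init__(self, server):
        # Start with one region covering the whole key space.
        self.starts = [""]        # inclusive start key of each region
        self.servers = [server]   # server hosting each region

    def locate(self, key):
        # Every key maps to exactly one region, hence exactly one server.
        i = bisect_right(self.starts, key) - 1
        return self.servers[i]

    def split(self, split_key, new_server):
        # Split the region containing split_key; the upper half
        # (keys >= split_key) is assigned to new_server.
        i = bisect_right(self.starts, split_key)
        if self.starts[i - 1] == split_key:
            raise ValueError("split key must fall strictly inside a region")
        self.starts.insert(i, split_key)
        self.servers.insert(i, new_server)


def load_per_server(table, keys):
    # Count how many requests each server would receive.
    return Counter(table.locate(k) for k in keys)


# Hypothetical request keys, all hitting one "hot" region at first.
keys = ["apple", "banana", "mango", "peach", "plum", "zebra"]
table = RegionTable("rs1")
before = load_per_server(table, keys)

table.split("n", "rs2")
after = load_per_server(table, keys)

print("before split:", dict(before))
print("after split: ", dict(after))
```

In this model no replica is ever consulted to serve a request; throughput grows only because the key space is partitioned more finely, which is the point made about HBase regions and, by analogy, Solr shards.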