In simple words:
HDFS is good for file-oriented replication. Solr is good for index replication.
Consequently, if an application's update operations (like Solr's) are not
atomic at the file level, HDFS is not adequate - as is the case for Solr with
live index updates. Running Solr on HDFS (as a
Thanks for the suggestion, Erick. However, what we need here is not a patch
but a clarification from a practical perspective.
I think Solr replication is a great feature to scale reads and, to some
extent, increase reliability. However, on HDFS it is not as useful as just
sharding. Sharding can scale both rea
Please see my response in line:
On Fri, Apr 17, 2015 at 10:59 PM Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
> Some comments inline:
>
> On Sat, Apr 18, 2015 at 2:12 PM, gengmao wrote:
>
> > On Sat, Apr 18, 2015 at 12:20 AM "Jürgen Wagner (DVT)" <
> > juergen.wag...@devoteam.com> wrot
AFAIK, the HDFS replication of Solr indexes isn't something that was
designed, it just came along for the ride given HDFS replication.
Having a shard with 1 leader and two followers keep 9 copies of the
index around _is_ overkill; nobody argues with that at all.
I know the folks at Cloudera (who contri
Some comments inline:
On Sat, Apr 18, 2015 at 2:12 PM, gengmao wrote:
> On Sat, Apr 18, 2015 at 12:20 AM "Jürgen Wagner (DVT)" <
> juergen.wag...@devoteam.com> wrote:
>
> > Replication on the storage layer will provide a reliable storage for the
> > index and other data of Solr. In particular,
On Sat, Apr 18, 2015 at 12:20 AM "Jürgen Wagner (DVT)" <
juergen.wag...@devoteam.com> wrote:
> Replication on the storage layer will provide a reliable storage for the
> index and other data of Solr. In particular, this replication does not
> guarantee your index files are consistent at any time
Replication on the storage layer will provide a reliable storage for the
index and other data of Solr. In particular, this replication does not
guarantee your index files are consistent at any time as there may be
intermediate states that are only partially replicated. Replication is
only a converg
I wonder why we need to use SolrCloud replication on HDFS at all, given that
HDFS already provides replication and availability. The way to optimize
performance and scalability should be tweaking shards, just like tweaking
regions on HBase - which doesn't provide "region replication" either, does
it?
I have
Yes. 3 replicas and an HDFS replication factor of 3 means 9 copies of
the index are lying around. You can change your HDFS replication
factor, but that affects other applications using HDFS, so that may
not be an option.
Best,
Erick
On Thu, Apr 9, 2015 at 2:31 AM, Vijaya Narayana Reddy Bhoomi Re
Hi,
Can anyone please tell me how shard replication works when the indexes
are stored in HDFS? I.e., with HDFS, the default replication factor is 3.
Now, for the Solr shards, if I set the replication factor to 3 again, does
that mean that, internally, the index data is replicated thrice and then HDFS
rep