Re: New leader/replica solution for HDFS

William Bell Wed, 25 Feb 2015 22:52:29 -0800

Use DocValues on the fields you facet on.
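
For illustration, here is a minimal sketch of what turning on DocValues for a facet field could look like through Solr's Schema API. The field name `category`, the `string` type, and the collection URL are assumptions for the example and do not come from this thread. It also assumes a managed schema; with a hand-edited schema.xml the equivalent change is adding docValues="true" to the field definition. Either way, existing documents need to be reindexed afterwards.

```python
import json


def docvalues_field_command(name, field_type="string"):
    """Build a Schema API 'replace-field' command that enables docValues.

    The field name and type passed in are hypothetical placeholders.
    """
    return {
        "replace-field": {
            "name": name,
            "type": field_type,
            "indexed": True,
            "stored": True,
            "docValues": True,  # column-oriented values written at index time
        }
    }


# Hypothetical collection URL; adjust host, port and collection name.
SCHEMA_URL = "http://localhost:8983/solr/collection1/schema"

# JSON body that would be POSTed to SCHEMA_URL (no request is sent here).
payload = json.dumps(docvalues_field_command("category"))
print("POST", SCHEMA_URL)
print(payload)
```

DocValues are built at index time and read from disk in a column-oriented form, so faceting does not have to rebuild an in-heap un-inverted structure every time a new searcher opens. That is why they tend to help when indexing and faceting run at the same time.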

On Wed, Feb 25, 2015 at 3:14 PM, Joseph Obernberger <j...@lovehorsepower.com> wrote:


> Thank you!  I'm mainly concerned about facet performance.  When we have
> indexing turned on, our facet performance suffers significantly.
> I will add replicas and measure the performance change.
>
> -Joe Obernberger
>
>
> On 2/25/2015 4:31 PM, Erick Erickson wrote:
>
>> bq: Is adding replicas going to increase search performance?
>>
>> Absolutely, assuming you've maxed out Solr. You can scale the Solr
>> query/second rate nearly linearly by adding replicas regardless of
>> whether it's over HDFS or not.
>>
>> Having multiple replicas per shard _also_ increases fault tolerance,
>> so you get both. Even with HDFS, though, a single replica (just a
>> leader) per shard means that you don't have any redundancy if the
>> motherboard on that server dies even though HDFS has multiple copies
>> of the _data_.
>>
>> Best,
>> Erick
>>
>> On Wed, Feb 25, 2015 at 12:01 PM, Joseph Obernberger
>> <j...@lovehorsepower.com> wrote:
>>
>>> I am also confused about this.  Is adding replicas going to increase search
>>> performance?  I'm not sure I see the point of any replicas when using
>>> HDFS.
>>> Is there one?
>>> Thank you!
>>>
>>> -Joe
>>>
>>>
>>> On 2/25/2015 10:57 AM, Erick Erickson wrote:
>>>
>>>> bq: And the data sync between leader/replica is always a problem
>>>>
>>>> Not quite sure what you mean by this. There shouldn't need to be
>>>> any syncing in the sense of the index being replicated; the
>>>> incoming documents should be sent to each node (and indexed
>>>> to HDFS) as they come in.
>>>>
>>>> bq: There is duplicate index computing on Replica side.
>>>>
>>>> Yes, that's the design of SolrCloud, explicitly to provide data safety.
>>>> If you instead rely on the leader to index and somehow pull that
>>>> indexed form to the replica, then you will lose data if the leader
>>>> goes down before sending the indexed form.
>>>>
>>>> bq: My thought is that the leader and the replica all bind to the same
>>>> data index directory.
>>>>
>>>> This is unsafe. They would both then try to _write_ to the same
>>>> index, which can easily corrupt it, and/or all but the first
>>>> one to access the index would be locked out.
>>>>
>>>> All that said, the HDFS triple-redundancy compounded with the
>>>> Solr leaders/replicas redundancy means a bunch of extra
>>>> storage. You can turn the HDFS replication down to 1, but that has
>>>> other implications.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Tue, Feb 24, 2015 at 11:12 PM, longsan <longsan...@sina.com> wrote:
>>>>
>>>>> We use HDFS as our Solr index storage and we have a really heavy
>>>>> update load. We have run into many problems with the current
>>>>> leader/replica solution. There is duplicate index computing on the
>>>>> replica side. And the data sync between leader/replica is always a
>>>>> problem.
>>>>>
>>>>> As HDFS already provides data replication at the data layer, could
>>>>> Solr provide just service-layer replication?
>>>>>
>>>>> My thought is that the leader and the replica all bind to the same data
>>>>> index directory. The leader would build the index for new requests, and
>>>>> the replica would just keep its index version up to date with the leader
>>>>> (such as by a periodic soft commit?). If the leader is lost, the replica
>>>>> would take over its duty immediately.
>>>>>
>>>>> Thanks for any suggestions on this idea.
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://lucene.472066.n3.nabble.com/New-leader-replica-solution-for-HDFS-tp4188735.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>
>>>
>


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076
