Use DocValues on the fields you facet on.

On Wed, Feb 25, 2015 at 3:14 PM, Joseph Obernberger <j...@lovehorsepower.com> wrote:
> Thank you! I'm mainly concerned about facet performance. When we have
> indexing turned on, our facet performance suffers significantly.
> I will add replicas and measure the performance change.
>
> -Joe Obernberger
>
> On 2/25/2015 4:31 PM, Erick Erickson wrote:
>
>> bq: Is adding replicas going to increase search performance?
>>
>> Absolutely, assuming you've maxed out Solr. You can scale the Solr
>> queries-per-second rate nearly linearly by adding replicas, regardless
>> of whether the index is on HDFS or not.
>>
>> Having multiple replicas per shard _also_ increases fault tolerance,
>> so you get both. Even with HDFS, though, a single replica (just a
>> leader) per shard means that you don't have any redundancy if the
>> motherboard on that server dies, even though HDFS has multiple copies
>> of the _data_.
>>
>> Best,
>> Erick
>>
>> On Wed, Feb 25, 2015 at 12:01 PM, Joseph Obernberger
>> <j...@lovehorsepower.com> wrote:
>>
>>> I am also confused about this. Is adding replicas going to increase
>>> search performance? I'm not sure I see the point of any replicas when
>>> using HDFS. Is there one?
>>> Thank you!
>>>
>>> -Joe
>>>
>>> On 2/25/2015 10:57 AM, Erick Erickson wrote:
>>>
>>>> bq: And the data sync between leader/replica is always a problem
>>>>
>>>> Not quite sure what you mean by this. There shouldn't be any need
>>>> for syncing in the sense of replicating the index; the incoming
>>>> documents should be sent to each node (and indexed to HDFS) as
>>>> they come in.
>>>>
>>>> bq: There is duplicate index computing on the replica side.
>>>>
>>>> Yes, that's the design of SolrCloud, explicitly to provide data safety.
>>>> If you instead rely on the leader to index and somehow pull that
>>>> indexed form to the replica, then you will lose data if the leader
>>>> goes down before sending the indexed form.
>>>>
>>>> bq: My thought is that the leader and the replica all bind to the same
>>>> data index directory.
>>>>
>>>> This is unsafe.
>>>> They would both then try to _write_ to the same
>>>> index, which can easily corrupt indexes and/or lock out all but the
>>>> first one to access the index.
>>>>
>>>> All that said, the HDFS triple redundancy compounded with the
>>>> Solr leader/replica redundancy means a lot of extra
>>>> storage. You can turn the HDFS replication down to 1, but that has
>>>> other implications.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Tue, Feb 24, 2015 at 11:12 PM, longsan <longsan...@sina.com> wrote:
>>>>
>>>>> We use HDFS as our Solr index storage, and we have a really heavy
>>>>> update load. We have run into many problems with the current
>>>>> leader/replica solution. There is duplicate index computing on the
>>>>> replica side. And the data sync between leader/replica is always a
>>>>> problem.
>>>>>
>>>>> As HDFS already provides data replication at the data layer, could
>>>>> Solr provide just service-layer replication?
>>>>>
>>>>> My thought is that the leader and the replica all bind to the same data
>>>>> index directory. The leader will build up the index for new requests,
>>>>> and the replica will just keep its index version in step with the
>>>>> leader (for example, via a periodic soft commit?). If the leader is
>>>>> lost, the replica will take over immediately.
>>>>>
>>>>> Thanks for any suggestions on this idea.
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://lucene.472066.n3.nabble.com/New-leader-replica-solution-for-HDFS-tp4188735.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
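To put rough numbers on Erick's point that HDFS replication compounds with Solr's own leader/replica redundancy, here is a minimal Python sketch. It assumes each Solr replica writes its own independent index to HDFS (as described above, every node indexes incoming documents itself) and that HDFS stores every block `hdfs_replication` times (the HDFS default for `dfs.replication` is 3). It ignores transaction logs and the temporary extra space used during segment merges. The function names are hypothetical and not part of any Solr or HDFS API.

```python
def physical_copies(solr_replicas_per_shard: int, hdfs_replication: int) -> int:
    """Copies of each shard's index on disk, assuming every Solr replica
    keeps its own index on HDFS and HDFS stores each block hdfs_replication times."""
    if solr_replicas_per_shard < 1 or hdfs_replication < 1:
        raise ValueError("both factors must be at least 1")
    return solr_replicas_per_shard * hdfs_replication


def raw_storage_gb(index_gb_per_shard: float, shards: int,
                   solr_replicas_per_shard: int, hdfs_replication: int) -> float:
    """Hypothetical estimate of total raw disk used across the cluster
    (ignores transaction logs and merge overhead)."""
    return index_gb_per_shard * shards * physical_copies(
        solr_replicas_per_shard, hdfs_replication)


if __name__ == "__main__":
    # Leader plus one replica per shard on default 3x HDFS replication:
    print(physical_copies(2, 3))          # 6 copies of every shard
    print(raw_storage_gb(100, 4, 2, 3))   # 2400 GB for a 400 GB logical index
    # Dropping HDFS replication to 1 leaves only the Solr-level copies:
    print(raw_storage_gb(100, 4, 2, 1))   # 800 GB
```

The multiplication is the whole point: both redundancy layers store full copies independently, so their factors multiply rather than add.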