Re: Solr cloud performance degradation with billions of documents

2014-08-17 Thread Erick Erickson

ctions, is it a bug? > > Thanks. > > Shushuai > > > From: Erick Erickson > To: solr-user@lucene.apache.org > Sent: Friday, August 15, 2014 7:30 PM > Subject: Re: Solr cloud performance degradation with billions of documents > > > Toke: > > bq: I would h

Re: Solr cloud performance degradation with billions of documents

2014-08-16 Thread shushuai zhu

a bug? Thanks. Shushuai From: Erick Erickson To: solr-user@lucene.apache.org Sent: Friday, August 15, 2014 7:30 PM Subject: Re: Solr cloud performance degradation with billions of documents Toke: bq: I would have agreed with you fully an hour ago. Well, I now disagree with myself too

Re: Solr cloud performance degradation with billions of documents

2014-08-15 Thread Erick Erickson

Toke: bq: I would have agreed with you fully an hour ago. Well, I now disagree with myself too :) I don't mind talking to myself. I don't even mind arguing with myself. I really _do_ mind losing the arguments I have with myself though. Scott: OK, that has a much better chance of working

RE: Solr cloud performance degradation with billions of documents

2014-08-15 Thread Toke Eskildsen

Erick Erickson [erickerick...@gmail.com] wrote: > I guess that my main issue is that from everything I've seen so far, > this project is doomed. You simply cannot put 7B documents in a single > shard, period. Lucene has a 2B hard limit. I would have agreed with you fully an hour ago and actually p

RE: Solr cloud performance degradation with billions of documents

2014-08-15 Thread Toke Eskildsen

Wilburn, Scott [scott.wilb...@verizonwireless.com.INVALID] wrote: > You make some very good valid points. Let me clear a few things up, though. > We are not trying to put 7B docs into one single shard, because we are using > collections, created daily, which spread the index across the 32 shards th

RE: Solr cloud performance degradation with billions of documents

2014-08-15 Thread Wilburn, Scott

ideal, to ensure the project succeeds and comes in under budget. Thanks, Scott -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, August 15, 2014 7:52 AM To: solr-user@lucene.apache.org Subject: Re: Solr cloud performance degradation with billions of

Re: Solr cloud performance degradation with billions of documents

2014-08-15 Thread Erick Erickson

Toke: You make valid points. You're completely right that my reflexes are for sub-second responses so I tend to think of lots and lots of memory being a requirement. I agree that depending on the problem space the percentage of the index that has to be in memory varies widely, I've seen a large va

RE: Solr cloud performance degradation with billions of documents

2014-08-14 Thread Toke Eskildsen

Erick Erickson [erickerick...@gmail.com] wrote: > Solr requires holding large parts of the index in memory. > For the entire corpus. At once. That requirement is under the assumption that one must have the lowest possible latency at each individual box. You might as well argue for the fastest po

RE: Solr cloud performance degradation with billions of documents

2014-08-14 Thread Wilburn, Scott

few things to try, thanks to all of your comments. I am very appreciative. Thanks, Scott -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, August 14, 2014 8:31 AM To: solr-user@lucene.apache.org Subject: Re: Solr cloud performance degradation with

Re: Solr cloud performance degradation with billions of documents

2014-08-14 Thread Erick Erickson

You are absolutely on the bleeding edge. I know of a couple of projects that are at that scale, but 1> they aren't being done on just a few nodes. As Jack says, this scale for SolrCloud is not common and there are no OOB templates to follow. 2> AFAIK, the projects I'm talking about aren't in

RE: Solr cloud performance degradation with billions of documents

2014-08-14 Thread Toke Eskildsen

Wilburn, Scott [scott.wilb...@verizonwireless.com.INVALID] wrote: > Thanks for your suggestion to look into MapReduceIndexerTool, I'm looking > into that now. > I agree what I am trying to do is a tall order, and the more I hear from all > of your > comments, the more I am convinced that lack of

Re: Solr cloud performance degradation with billions of documents

2014-08-14 Thread Jack Krupansky

Wilburn, Scott Sent: Thursday, August 14, 2014 11:05 AM To: solr-user@lucene.apache.org Subject: RE: Solr cloud performance degradation with billions of documents Erick, Thanks for your suggestion to look into MapReduceIndexerTool, I'm looking into that now. I agree what I am trying to do is

RE: Solr cloud performance degradation with billions of documents

2014-08-14 Thread Wilburn, Scott

nks, Scott -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, August 13, 2014 4:48 PM To: solr-user@lucene.apache.org Subject: Re: Solr cloud performance degradation with billions of documents Several points: 1> Have you considered using the MapReduceIn

Re: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Erick Erickson

Several points: 1> Have you considered using the MapReduceIndexerTool for your ingestion? Assuming you don't have duplicate IDs, i.e. each doc is new, you can spread your indexing across as many nodes as you have in your cluster. That said, it's not entirely clear that you'll gain throughput since

Re: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Jack Krupansky

ent: Wednesday, August 13, 2014 5:42 PM To: solr-user@lucene.apache.org Subject: RE: Solr cloud performance degradation with billions of documents Thanks for replying, Jack. I have 4 SolrCloud instances (or clusters), each consisting of 32 shards. The clusters do not have any interaction with eac

RE: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Markus Jelsma

Hi - You are running mapred jobs on the same nodes as Solr runs, right? The first thing I would think of is that your OS file buffer cache is abused. The mappers read all data, presumably residing on the same node. The mapper output and shuffling part would take place on the same node, only the r

RE: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Toke Eskildsen

Wilburn, Scott [scott.wilb...@verizonwireless.com.INVALID] wrote: > Hardware-wise, I have a 32-node Hadoop cluster that I use to run all of the > Solr shards and > each node has 128GB of memory. The current SolrCloud setup is split into 4 > separate and > individual clouds of 32 shards each the

RE: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Wilburn, Scott

:17 PM To: solr-user@lucene.apache.org Subject: Re: Solr cloud performance degradation with billions of documents Could you clarify what you mean by the term "cloud", as in "per cloud" and "individual clouds"? That's not a proper Solr or SolrCloud concept per

Re: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Jack Krupansky

Could you clarify what you mean by the term "cloud", as in "per cloud" and "individual clouds"? That's not a proper Solr or SolrCloud concept per se. SolrCloud works with a single "cluster" of nodes. And there is no interaction between separate SolrCloud clusters. -- Jack Krupansky -Ori

19 matches
