...ctions, is it a bug?

Thanks.

Shushuai
From: Erick Erickson
To: solr-user@lucene.apache.org
Sent: Friday, August 15, 2014 7:30 PM
Subject: Re: Solr cloud performance degradation with billions of documents
Toke:
bq: I would have agreed with you fully an hour ago.
Well, I now disagree with myself too :) I don't mind
talking to myself. I don't even mind arguing with myself. I
really _do_ mind losing the arguments I have with
myself though.
Scott:
OK, that has a much better chance of working
Erick Erickson [erickerick...@gmail.com] wrote:
> I guess that my main issue is that from everything I've seen so far,
> this project is doomed. You simply cannot put 7B documents in a single
> shard, period. Lucene has a 2B hard limit.
I would have agreed with you fully an hour ago and actually p...
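For readers keeping score, the arithmetic behind Erick's 2B warning is easy to check. A quick sketch (the 7B and 32-shard figures come from this thread; everything else is plain arithmetic):

```python
import math

LUCENE_MAX_DOCS = 2**31 - 1   # hard per-Lucene-index (per-shard-core) document ceiling
total_docs = 7_000_000_000    # the 7B figure from this thread

# Minimum shard count just to stay under the Lucene limit,
# ignoring replicas and any headroom for deletes/updates.
min_shards = math.ceil(total_docs / LUCENE_MAX_DOCS)
print(min_shards)             # 4

# With the 32 shards mentioned later in the thread, the per-shard load is modest:
print(total_docs // 32)       # 218750000
```

So 7B documents cannot live in one shard, but they are nowhere near the limit once spread across 32.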
Wilburn, Scott [scott.wilb...@verizonwireless.com.INVALID] wrote:
> You make some very good valid points. Let me clear a few things up, though.
> We are not trying to put 7B docs into one single shard, because we are using
> collections, created daily, which spread the index across the 32 shards th...

...ideal, to ensure the project succeeds and comes in under budget.

Thanks,
Scott
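For anyone following along, the daily-collection approach Scott describes maps onto the Collections API's CREATE action. A minimal sketch of building that request (the host, collection naming scheme, and configName here are hypothetical placeholders, not values from the thread):

```python
from datetime import date
from urllib.parse import urlencode

# Placeholder host and names; the thread does not give the real ones.
solr_host = "http://localhost:8983"
name = f"docs_{date.today():%Y%m%d}"   # one collection per day, e.g. docs_20140815

params = urlencode({
    "action": "CREATE",
    "name": name,
    "numShards": 32,                   # matches the 32-shard layout described above
    "replicationFactor": 1,
    "collection.configName": "docs_conf",
})
url = f"{solr_host}/solr/admin/collections?{params}"
print(url)
```

A cron job issuing this request once a day keeps each day's documents in its own 32-shard collection, which is how the 7B total avoids Lucene's per-shard ceiling.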
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, August 15, 2014 7:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud performance degradation with billions of documents
Toke:
You make valid points. You're completely right that my reflexes are
for sub-second responses, so I tend to think of lots and lots of memory
as a requirement. I agree that depending on the problem space the
percentage of the index that has to be in memory varies widely; I've
seen a large va...
Erick Erickson [erickerick...@gmail.com] wrote:
> Solr requires holding large parts of the index in memory.
> For the entire corpus. At once.
That requirement is under the assumption that one must have the lowest possible
latency at each individual box. You might as well argue for the fastest
po...
...few things to try, thanks to all of your comments. I am very
appreciative.
Thanks,
Scott
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Thursday, August 14, 2014 8:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud performance degradation with billions of documents
You are absolutely on the bleeding edge.
I know of a couple of projects that are at that scale, but
1> they aren't being done on just a few nodes. As Jack
says, this scale for SolrCloud is not common and there
are no OOB templates to follow.
2> AFAIK, the projects I'm talking about aren't in...
Wilburn, Scott
Sent: Thursday, August 14, 2014 11:05 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr cloud performance degradation with billions of documents

Erick,
Thanks for your suggestion to look into MapReduceIndexerTool, I'm looking
into that now. I agree what I am trying to do is a tall order, and the more
I hear from all of your comments, the more I am convinced that lack of...
Thanks,
Scott

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, August 13, 2014 4:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud performance degradation with billions of documents
Several points:
1> Have you considered using the MapReduceIndexerTool for your ingestion?
Assuming you don't have duplicate IDs, i.e. each doc is new, you can spread
your indexing across as many nodes as you have in your cluster. That said,
it's not entirely clear that you'll gain throughput since...
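To make Erick's suggestion concrete, here is a sketch of what a MapReduceIndexerTool invocation looks like (flag names follow the Solr 4.x map-reduce contrib; the jar path, ZooKeeper hosts, HDFS paths, morphline file, and collection name are placeholders, not values from this thread):

```python
# Assemble the command line as a list; in practice this runs on a Hadoop
# client node, so the sketch only builds and prints the argv.
cmd = [
    "hadoop", "jar", "solr-map-reduce-job.jar",      # placeholder jar path
    "org.apache.solr.hadoop.MapReduceIndexerTool",
    "--morphline-file", "morphlines.conf",           # ETL/parsing config (placeholder)
    "--output-dir", "hdfs://nn/tmp/mrit-out",        # where shard indexes are built
    "--zk-host", "zk1:2181,zk2:2181,zk3:2181/solr",  # placeholder ZK ensemble
    "--collection", "docs_20140815",                 # placeholder collection name
    "--go-live",   # merge the freshly built shards into the live SolrCloud
    "hdfs://nn/data/input",                          # input files to index
]
print(" ".join(cmd))
```

The tool builds the per-shard indexes as a MapReduce job and, with --go-live, merges them into the running cluster, which moves the indexing load off the Solr nodes themselves.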
Sent: Wednesday, August 13, 2014 5:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr cloud performance degradation with billions of documents
Thanks for replying Jack. I have 4 SolrCloud instances( or clusters ), each
consisting of 32 shards. The clusters do not have any interaction with each other.
Hi - You are running mapred jobs on the same nodes as Solr runs, right? The
first thing I would think of is that your OS file buffer cache is abused. The
mappers read all data, presumably residing on the same node. The mapper output
and shuffling part would take place on the same node, only the r...
Wilburn, Scott [scott.wilb...@verizonwireless.com.INVALID] wrote:
> Hardware wise, I have a 32-node Hadoop cluster that I use to run all of the
> Solr shards and each node has 128GB of memory. The current SolrCloud setup
> is split into 4 separate and individual clouds of 32 shards each the...
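Scott's hardware adds up to a lot of aggregate RAM, which is the quantity the page-cache argument above turns on. A back-of-the-envelope sketch (the 48 GB/node reserved for JVM heap and MapReduce is an assumed, purely illustrative figure; only the 32 nodes and 128 GB/node come from the thread):

```python
nodes = 32
ram_per_node_gb = 128
total_ram_gb = nodes * ram_per_node_gb
print(total_ram_gb)   # 4096

# Assumed split: ~48 GB/node for JVM heaps and co-located MapReduce tasks,
# leaving the remainder for the OS page cache that Lucene index reads rely on.
page_cache_gb = nodes * (ram_per_node_gb - 48)
print(page_cache_gb)  # 2560
```

Whether ~2.5 TB of page cache is "enough" depends entirely on total index size and query latency targets, which is exactly the point Toke and Erick are debating.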
:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud performance degradation with billions of documents
Could you clarify what you mean with the term "cloud", as in "per cloud" and
"individual clouds"? That's not a proper Solr or SolrCloud concept per se.
SolrCloud works with a single "cluster" of nodes. And there is no
interaction between separate SolrCloud clusters.
-- Jack Krupansky