Again, there are no hard limits, just performance-based limits and factors specific to your own environment, plus the fact that most people on this list have deeper experience with smaller clusters, so if you decide to "go big", you will be in uncharted and untested territory.

I would relax my number a little (actually, double it) to 64 nodes, to handle the 8-shard, 8-replica case, since just yesterday somebody on the list mentioned that they were using such a configuration.

In other words, with configurations of up to 16, 32, or even 64 nodes, you will readily find people here who can help support you. But if you are thinking of a 16-shard, 16-replica cluster with 256 nodes, or a 32-shard, 32-replica cluster with 1,024 nodes, the problem is not that you will hit any hard limit in Solr, but simply that fewer people will be able to provide support, answer questions, or just confirm that yes, a cluster that big is a... "slam dunk." And if you do want to try a 1,024-node cluster, you absolutely should do a Proof of Concept implementation first.
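The node counts above come from one multiplication: assuming each node hosts exactly one core (one replica of one shard), nodes = shards × replicas per shard, so 8 × 8 = 64, 16 × 16 = 256, and 32 × 32 = 1,024. A minimal Java sketch of that arithmetic; the class and method names are hypothetical, and the 64-node threshold is only the rough guidance from this message, not a Solr limit:

```java
// Hypothetical sizing helper: assumes one core (one shard replica) per node.
final class ClusterSizing {

    // Rough comfort zone from this thread, not a hard Solr limit.
    static final int COMMONLY_DISCUSSED_MAX_NODES = 64;

    // Nodes needed when every replica of every shard gets its own node.
    static int nodesNeeded(int shards, int replicasPerShard) {
        if (shards < 1 || replicasPerShard < 1) {
            throw new IllegalArgumentException("shards and replicas must be >= 1");
        }
        return Math.multiplyExact(shards, replicasPerShard); // throws on int overflow
    }

    // True if the cluster size is in the range people on the list commonly discuss.
    static boolean commonlyDiscussed(int nodes) {
        return nodes <= COMMONLY_DISCUSSED_MAX_NODES;
    }

    public static void main(String[] args) {
        int[][] layouts = {{2, 2}, {8, 8}, {16, 16}, {32, 32}};
        for (int[] layout : layouts) {
            int nodes = nodesNeeded(layout[0], layout[1]);
            System.out.printf("%d shards x %d replicas = %d nodes (%s)%n",
                    layout[0], layout[1], nodes,
                    commonlyDiscussed(nodes) ? "commonly discussed" : "do a Proof of Concept first");
        }
    }
}
```

Packing several cores onto one node changes the arithmetic, so treat this as a sizing aid, not a capacity plan.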

I actually don't have any hard, empirical evidence to back up my 32/64-node guidance, but it seems reasonable and consistent with configurations people commonly talk about. Generally, people talk about smaller clusters, so I'm stretching a little to get up to my 32/64 guidance. And, to be clear, that's just a rough guide and not intended to guarantee that a 64-node cluster will perform really well, nor to imply that a 96-node or 128-node cluster won't perform well.

-- Jack Krupansky

-----Original Message-----
From: Ramkumar R. Aiyengar
Sent: Wednesday, July 10, 2013 4:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr limitations

I understand, thanks. I just wanted to check in case there were scalability
limitations in how SolrCloud operates.
On 9 Jul 2013 12:45, "Erick Erickson" <erickerick...@gmail.com> wrote:

I think Jack was mostly thinking in "slam dunk" terms. I know of
SolrCloud demo clusters with 500+ nodes, and at that point
people said "it's going to work for our situation, we don't need
to push more".

As you start getting into that kind of scale, though, you really
have a bunch of ops considerations, etc. Mostly, when I get into
larger scales, I pretty much want to examine my assumptions,
see if they're correct, and perhaps start to trim my requirements.

FWIW,
Erick

On Tue, Jul 9, 2013 at 4:07 AM, Ramkumar R. Aiyengar
<andyetitmo...@gmail.com> wrote:
>> 5. No more than 32 nodes in your SolrCloud cluster.
>
> I hope this isn't too OT, but what tradeoffs is this based on? I would have
> thought it easy to hit this number for a big index and high load (since
> you scale horizontally by adding both shards and replicas).
>
>> 6. Don't return more than 250 results on a query.
>>
>> None of those is a hard limit, but don't go beyond them unless your
>> Proof of Concept testing proves that performance is acceptable for your
>> situation.
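One way to respect the 250-result guideline when a client needs more hits is to page with Solr's standard start and rows request parameters instead of asking for one huge rows value. A minimal sketch that only builds the query strings (it sends nothing); the class, the MAX_ROWS constant, and the method name are hypothetical, 250 is the soft guideline from this thread rather than a Solr limit, and URLEncoder.encode with a Charset needs Java 10 or later:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one large result request into pages of at most
// MAX_ROWS rows using Solr's "start" and "rows" parameters.
final class PagedQueries {

    static final int MAX_ROWS = 250; // soft guideline from this thread, not a Solr limit

    static List<String> pageParams(String q, int totalWanted) {
        List<String> pages = new ArrayList<>();
        String encodedQ = URLEncoder.encode(q, StandardCharsets.UTF_8);
        for (int start = 0; start < totalWanted; start += MAX_ROWS) {
            int rows = Math.min(MAX_ROWS, totalWanted - start); // last page may be short
            pages.add("q=" + encodedQ + "&start=" + start + "&rows=" + rows);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Each string would be the query part of one /select request.
        pageParams("title:solr", 600).forEach(System.out::println);
    }
}
```

Large start offsets still get progressively more expensive, so this spreads the load across requests rather than removing it.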
>>
>> Start with a simple 4-node, 2-shard, 2-replica cluster for preliminary
>> tests and then scale as needed.
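The 2-shard, 2-replica starting point can be created through the SolrCloud Collections API. A minimal sketch that only builds the CREATE request URL; it assumes a node at the hypothetical address http://localhost:8983/solr, a hypothetical collection name "poc", and a config set already uploaded to ZooKeeper. The action, name, numShards, replicationFactor, and maxShardsPerNode parameters belong to the Collections API, but check the reference guide for your Solr version before relying on them:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that only builds the Collections API CREATE URL;
// it does not send the request. Base URL and collection name are placeholders.
final class CreateCollectionUrl {

    static String createUrl(String solrBase, String name, int numShards, int replicationFactor) {
        return solrBase + "/admin/collections?action=CREATE"
                + "&name=" + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                // one core per node, so 2 shards x 2 replicas needs 4 nodes
                + "&maxShardsPerNode=1";
    }

    public static void main(String[] args) {
        System.out.println(createUrl("http://localhost:8983/solr", "poc", 2, 2));
    }
}
```

With maxShardsPerNode=1, this request should need four live nodes, matching the 4-node layout suggested here.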
>>
>> Dynamic and multivalued fields? Try to stay away from them. Except for
>> the simplest cases, they are usually an indicator of a weak data model.
>> Sure, it's fine to store a relatively small number of values in a
>> multivalued field (say, dozens of values), but be aware that you can't
>> directly access individual values, you can't tell which value matched a
>> query, and you can't coordinate values between multiple multivalued
>> fields. Except for very simple cases, multivalued fields should be
>> flattened into multiple documents with a parent ID.
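A minimal sketch of that flattening, assuming a hypothetical product document with two parallel multivalued fields, sku and color. Each value pair becomes its own child document carrying a parent_id, so a query such as sku:A100 AND color:red can only match a document where that sku and that color really belong together. All field names are illustrative, and the documents are plain Java maps rather than any particular Solr client class:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flattening of parallel multivalued fields into child documents.
final class FlattenMultivalued {

    static List<Map<String, String>> flatten(String parentId, List<String> skus, List<String> colors) {
        if (skus.size() != colors.size()) {
            throw new IllegalArgumentException("parallel value lists must be the same length");
        }
        List<Map<String, String>> children = new ArrayList<>();
        for (int i = 0; i < skus.size(); i++) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", parentId + "-" + i); // unique key for each child document
            doc.put("parent_id", parentId);    // link back to the original parent
            doc.put("sku", skus.get(i));       // now single-valued
            doc.put("color", colors.get(i));   // stays paired with its sku
            children.add(doc);
        }
        return children;
    }

    public static void main(String[] args) {
        flatten("p1", List.of("A100", "A200"), List.of("red", "blue"))
                .forEach(System.out::println);
    }
}
```

The parent_id field is what lets you relate the children back to one logical record at query time.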
>>
>> Since you brought up the topic of dynamic fields, I am curious how you
>> got the impression that they were a good technique to use as a starting
>> point. They're fine for prototyping and hacking, and fine when used in
>> moderation, but not when used to excess. The whole point of Solr is
>> searching, and searching is optimized within fields, not across fields,
>> so having lots of dynamic fields runs counter to the primary strengths of
>> Lucene and Solr. And... schemas with lots of dynamic fields tend to be
>> difficult to maintain. For example, if you wanted to ask a support
>> question here, one of the first things we want to know is what your
>> schema looks like, but with lots of dynamic fields it is not possible to
>> have a simple discussion of what your schema looks like.
>>
>> Sure, there is something called "schemaless design" (and Solr supports
>> that in 4.4), but that's very different from heavy reliance on dynamic
>> fields in the traditional sense. Schemaless design is A-OK, but using
>> dynamic fields for "arrays" of data in a single document is a poor match
>> for the search features of Solr (e.g., Edismax searching across multiple
>> fields).
>>
>> One other tidbit: Although Solr does not enforce naming conventions for
>> field names, and you can put special characters in them, there are plenty
>> of features in Solr, such as the common "fl" parameter, where field names
>> are expected to adhere to Java naming rules. When people start "going
>> wild" with dynamic fields, it is common that they start "going wild" with
>> their names as well, using spaces, colons, slashes, etc. that cannot be
>> parsed in the "fl" and "qf" parameters, for example. Please don't go
>> there!
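A conservative pre-flight check for field names, assuming "Java naming rules" means Java identifier rules: the first character must be a valid identifier start and every later character a valid identifier part. This is a hypothetical helper you might run over a schema or over generated dynamic-field names, not something Solr itself provides; it rejects names with spaces, colons, and slashes:

```java
// Hypothetical validator: accepts only names that are valid Java identifiers.
final class FieldNameCheck {

    static boolean isJavaStyleFieldName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        int[] cps = name.codePoints().toArray(); // handle supplementary characters correctly
        if (!Character.isJavaIdentifierStart(cps[0])) {
            return false; // e.g. a leading digit
        }
        for (int i = 1; i < cps.length; i++) {
            int cp = cps[i];
            // isJavaIdentifierPart also accepts "ignorable" control characters; reject those too
            if (!Character.isJavaIdentifierPart(cp) || Character.isIdentifierIgnorable(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"price_usd", "first name", "a:b", "path/to", "2col"}) {
            System.out.println(name + " -> " + isJavaStyleFieldName(name));
        }
    }
}
```

Java identifiers also allow $ and non-ASCII letters; restricting names further to ASCII letters, digits, and underscores is the safer choice if you want to be certain.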
>>
>> In short, put up a small cluster and start a Proof of Concept.
>> Stay within my suggested guidelines and you should do okay.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Marcelo Elias Del Valle
>> Sent: Monday, July 08, 2013 9:46 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr limitations
>>
>>
>> Hello everyone,
>>
>>    I am trying to find information about possible Solr limitations I
>> should consider in my architecture, things like the max number of dynamic
>> fields, the max number of documents in SolrCloud, etc.
>>    Does anyone know where I can find this info?
>>
>> Best regards,
>> --
>> Marcelo Elias Del Valle
>> http://mvalle.com - @mvallebr

