Yago Riveiro <yago.rive...@gmail.com> wrote:
> One thing that I forgot to mention is that my clients can aggregate
> by any field in the schema with limit=-1. This is not a problem with
> 99% of the fields, but 2 or 3 of them are URLs. URLs have very
> high cardinality, and one of the reasons for sharding collections is
> to lower the memory footprint so we don't blow up a node, and to do
> the last merge on a big machine.

That is really a job for streaming, not simple faceting.
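As a rough sketch of what that could look like with Streaming Expressions
(collection and field names below are placeholders, and the url field would
need docValues so the /export handler can stream it):

  curl --data-urlencode 'expr=rollup(
      search(mycollection,
             q="*:*",
             fl="url",
             sort="url asc",
             qt="/export"),
      over="url",
      count(*))' \
    http://localhost:8983/solr/mycollection/stream

That streams the per-URL counts out of each shard in sorted order instead of
asking the merger to hold the whole facet structure in memory.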

Even if you insist on faceting, the problem remains that your merger needs to
be powerful enough to process the full result set. Using that machine for a
single-shard collection instead would eliminate the excessive overhead of
distributed faceting on millions of values, sparing a lot of hardware
allocation that could instead be used to beef up the single-shard machine.

[Toke: You can always split later]

> Every time I run the SPLITSHARD command, the command fails
> in a different way. IMHO right now Solr doesn't have an efficient
> way to rebalance a collection's shards.

Okay. You could create a new collection with the desired number of shards and
do a full re-index into that.
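As a sketch with the Collections API (collection name, shard count and config
name are placeholders):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection_v2&numShards=4&replicationFactor=1&collection.configName=myconfig'

Once the re-index into the new collection is done, switch your clients over,
or point a collection alias at it with action=CREATEALIAS.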

[Toke: "And yes, more logistics on your part as one size no longer fits all”]

> The key point of this deployment is to reduce the amount of management
> as much as possible,

That is your prerogative. In that case, I hope my suggestions can be of use to
other people with similar challenges.

- Toke Eskildsen
