On 1/28/2019 8:12 PM, Monica Skidmore wrote:
I would have to negotiate with the middleware teams, but we've used a core per customer in master-slave mode for about 3 years now, with great success. Our pool of data is very large, so limiting a customer's searches to just their core keeps query times fast (or at least reduces the chances of one customer impacting another with expensive queries). There is also a little security added: since the customer is required to provide the core to search, there is less chance that they'll see another customer's data in their responses (like they might if they 'forgot' to add a filter to their query). We were hoping that moving to Cloud would help our management of the largest customers, some of which we'd like to sub-shard with the cloud tooling. We expected Cloud to support as many cores/collections as our Solr instances, which are two versions old, but we didn't count on all the increased network traffic or the extra complications of bringing up a large cloud cluster.
At this time, SolrCloud will not handle what you're trying to throw at it. Without Cloud, Solr can fairly easily handle thousands of indexes, because there is no communication between nodes about cluster state. The immensity of that communication (handled via ZooKeeper) is why SolrCloud can't scale to thousands of shard replicas.
The solution to this problem will be twofold: 1) Reduce the number of work items in the Overseer queue. 2) Make the Overseer do its job a lot faster. There have been small incremental improvements towards these goals, but as you've noticed, we're definitely not there yet.
On the subject of a customer forgetting to add a filter: your systems should be handling that for them. If the customer has direct access to Solr, then all bets are off. They'll be able to do just about anything they want. It is possible to configure a proxy to limit what somebody can get to, but it would be pretty complicated to come up with a proxy configuration that fully locks things down.
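
As a rough illustration of having the middle layer apply the filter, here is a minimal Python sketch. It assumes a hypothetical customer_id field in the index and a hypothetical whitelist of pass-through parameters; the only Solr-specific piece it relies on is the standard fq (filter query) request parameter. It is not a complete lockdown, just the general shape of the idea.

```python
from urllib.parse import urlencode

# Hypothetical whitelist: only these request parameters are passed through.
ALLOWED_PARAMS = {"q", "start", "rows", "sort", "fl"}


def build_select_params(customer_id, user_params):
    """Return (name, value) pairs for a Solr select request.

    Anything the customer sent that is not whitelisted (including their
    own fq) is dropped, and a tenant filter is always appended.
    """
    safe = [(k, v) for k, v in user_params.items() if k in ALLOWED_PARAMS]
    # Escape backslashes and quotes so the id stays inside the phrase.
    escaped = customer_id.replace("\\", "\\\\").replace('"', '\\"')
    # "customer_id" is an assumed field name, not something from this thread.
    safe.append(("fq", f'customer_id:"{escaped}"'))
    return safe


def build_query_string(customer_id, user_params):
    """URL-encode the filtered parameters for the request."""
    return urlencode(build_select_params(customer_id, user_params))
```

The point of the design is that the tenant filter comes from the authenticated session in the middle layer, never from anything the customer typed.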
Using shards is completely possible without SolrCloud. But SolrCloud certainly does make it a lot easier.
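
For reference, a non-cloud distributed query is an ordinary select request with a shards parameter listing the cores to fan out to. The sketch below only builds such a URL; the host and core names are made up for illustration, and it assumes your indexing code has already split the customer's documents across those cores, since nothing routes them for you outside of SolrCloud.

```python
from urllib.parse import urlencode


def distributed_select_url(coordinator, shard_cores, query):
    """Build a select URL that fans out to the given cores.

    coordinator and shard_cores use the host:port/solr/core form,
    without a scheme, which is what the shards parameter expects.
    """
    shards = ",".join(shard_cores)
    params = urlencode({"q": query, "shards": shards})
    return f"http://{coordinator}/select?{params}"


# Hypothetical hosts and core names.
url = distributed_select_url(
    "solr1:8983/solr/bigcust_a",
    ["solr1:8983/solr/bigcust_a", "solr2:8983/solr/bigcust_b"],
    "*:*",
)
```

Any core in the list can act as the coordinator. It sends the query to every listed shard and merges the results.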
How many records are in your largest customer indexes? How big are those indexes on disk?
Thanks, Shawn