On 1/28/2019 8:12 PM, Monica Skidmore wrote:
I would have to negotiate with the middle-ware teams - but, we've used a core 
per customer in master-slave mode for about 3 years now, with great success.  
Our pool of data is very large, so limiting a customer's searches to just their 
core keeps query times fast (or at least reduces the chances of one customer 
impacting another with expensive queries).  There is also a little security 
added - since the customer is required to provide the core to search, there is 
less chance that they'll see another customer's data in their responses (like 
they might if they 'forgot' to add a filter to their query).  We were hoping 
that moving to Cloud would help our management of the largest customers - some 
of which we'd like to sub-shard with the cloud tooling.  We expected cloud to 
support as many cores/collections as our 2-versions-old Solr instances - but we 
didn't count on all the increased network traffic or the extra complications of 
bringing up a large cloud cluster.

At this time, SolrCloud will not handle what you're trying to throw at it. Without Cloud, Solr can fairly easily handle thousands of indexes, because there is no communication between nodes about cluster state. The sheer volume of that communication (handled via ZooKeeper) is why SolrCloud can't scale to thousands of shard replicas.

The solution to this problem will be twofold: 1) Reduce the number of work items in the Overseer queue. 2) Make the Overseer do its job a lot faster. There have been small incremental improvements towards these goals, but as you've noticed, we're definitely not there yet.

On the subject of a customer forgetting to add a filter: your systems should be handling that for them. If the customer has direct access to Solr, then all bets are off; they'll be able to do just about anything they want. It is possible to configure a proxy to limit what somebody can get to, but it would be pretty complicated to come up with a proxy configuration that fully locks things down.
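
As a rough illustration of "your systems should be handling that", here is a minimal Python sketch of a middleware helper that builds the Solr query string itself and always appends the customer filter. The field name customer_id, the function name, and the list of allowed parameters are all assumptions made up for this example, not anything from your setup, and this is nowhere near a complete lock-down (the q parameter alone can still carry a lot).

```python
from urllib.parse import urlencode

# Hypothetical whitelist: the only parameters the customer may influence.
ALLOWED_PARAMS = {"q", "rows", "start", "sort", "fl"}


def build_customer_query(customer_id, user_params):
    """Return a query string that is always filtered to one customer.

    customer_id comes from the authenticated session, never from the
    request. user_params is a list of (name, value) pairs supplied by
    the customer.
    """
    # Refuse IDs that could break out of the filter expression.
    if not customer_id.isalnum():
        raise ValueError("unexpected characters in customer id")

    # Keep only whitelisted parameters; any caller-supplied fq is dropped.
    params = [(k, v) for k, v in user_params if k in ALLOWED_PARAMS]

    # Always add the filter (customer_id is an assumed field name).
    params.append(("fq", "customer_id:" + customer_id))
    return urlencode(params)
```

The point of the design is that the filter is added by code the customer never touches, so "forgetting" it is not possible.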

Using shards is completely possible without SolrCloud. But SolrCloud certainly does make it a lot easier.
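
For reference, sharding without SolrCloud is done with the shards request parameter, which lists the cores to query and lets the receiving core merge the results. A minimal Python sketch of building such a request URL follows; the host names, core names, and function name are placeholders invented for the example.

```python
from urllib.parse import urlencode


def distributed_select_url(entry_core_url, shard_cores, query):
    """Build a /select URL that fans a query out across several cores.

    entry_core_url: base URL of the core that receives the request.
    shard_cores:    list of "host:port/solr/core" strings (no scheme).
    """
    params = {
        "q": query,
        # Comma-separated list of every core holding part of the index.
        "shards": ",".join(shard_cores),
    }
    return entry_core_url.rstrip("/") + "/select?" + urlencode(params)


# Placeholder hosts and cores, purely illustrative.
url = distributed_select_url(
    "http://host1:8983/solr/cust_a",
    ["host1:8983/solr/cust_a", "host2:8983/solr/cust_a"],
    "*:*",
)
```

What SolrCloud saves you is maintaining that shard list and routing documents to the right shard at index time; without it, your own code has to do both.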

How many records in your largest customer indexes? How big are those indexes on disk?

Thanks,
Shawn
