Does it all have to be in a single cloud?

On Mon, Jan 28, 2019, 10:34 PM Shawn Heisey <apa...@elyograg.org> wrote:
> On 1/28/2019 8:12 PM, Monica Skidmore wrote:
> > I would have to negotiate with the middle-ware teams - but, we've
> > used a core per customer in master-slave mode for about 3 years now,
> > with great success. Our pool of data is very large, so limiting a
> > customer's searches to just their core keeps query times fast (or at
> > least reduces the chances of one customer impacting another with
> > expensive queries). There is also a little security added - since
> > the customer is required to provide the core to search, there is
> > less chance that they'll see another customer's data in their
> > responses (like they might if they 'forgot' to add a filter to their
> > query). We were hoping that moving to Cloud would help our
> > management of the largest customers - some of which we'd like to
> > sub-shard with the cloud tooling. We expected cloud to support as
> > many cores/collections as our 2-versions-old Solr instances - but we
> > didn't count on all the increased network traffic or the extra
> > complications of bringing up a large cloud cluster.
>
> At this time, SolrCloud will not handle what you're trying to throw at
> it. Without Cloud, Solr can fairly easily handle thousands of indexes,
> because there is no communication between nodes about cluster state.
> The immensity of that communication (handled via ZooKeeper) is why
> SolrCloud can't scale to thousands of shard replicas.
>
> The solution to this problem will be twofold: 1) Reduce the number of
> work items in the Overseer queue. 2) Make the Overseer do its job a lot
> faster. There have been small incremental improvements towards these
> goals, but as you've noticed, we're definitely not there yet.
>
> On the subject of a customer forgetting to add a filter ... your systems
> should be handling that for them ... if the customer has direct access
> to Solr, then all bets are off... they'll be able to do just about
> anything they want. It is possible to configure a proxy to limit what
> somebody can get to, but it would be pretty complicated to come up with
> a proxy configuration that fully locks things down.
>
> Using shards is completely possible without SolrCloud. But SolrCloud
> certainly does make it a lot easier.
>
> How many records in your largest customer indexes? How big are those
> indexes on disk?
>
> Thanks,
> Shawn
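
For what it's worth, here is a rough sketch of what sharding without SolrCloud can look like from the client side: a normal select request that lists the shard cores in the `shards` request parameter, so the receiving core fans the query out and merges the results. The host names, core names, and the helper function name are all made up for illustration, and this is only a sketch under those assumptions, not a description of anyone's actual setup.

```python
import urllib.parse


def build_sharded_query_url(base_core_url, shard_cores, query, rows=10):
    """Build a Solr select URL that queries an explicit list of shards.

    base_core_url: URL of the core that receives the request (hypothetical).
    shard_cores:   shard addresses in host:port/solr/core form, no scheme.
    """
    params = {
        "q": query,
        "rows": rows,
        # Solr's distributed-search parameter: a comma-separated shard list.
        "shards": ",".join(shard_cores),
    }
    return base_core_url + "/select?" + urllib.parse.urlencode(params)


if __name__ == "__main__":
    # Hypothetical hosts and cores for one large customer split in two.
    url = build_sharded_query_url(
        "http://solr1.example.com:8983/solr/customer_a_1",
        [
            "solr1.example.com:8983/solr/customer_a_1",
            "solr2.example.com:8983/solr/customer_a_2",
        ],
        "title:solr",
    )
    print(url)
```

The catch compared to SolrCloud is that the shard list (and routing documents to the right shard at index time) has to be maintained by your own middleware.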