On Wed, 2015-03-25 at 03:46 +0100, Ian Rose wrote:
> Thus theoretically we could actually just use one single collection for
> all of our customers (adding a 'customer:<whatever>' type fq to all
> queries) but since we never need to query across customers it seemed
> more performant (as well as safer - less chance of accidentally
> leaking data across customers) to use separate collections.

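For concreteness, the two layouts being compared could be sketched roughly as below. This is only an illustration, not anyone's actual setup: the host and port, the collection names ("all_customers", "customer_<id>") and the helper function names are all made up for the example. Only the /select handler and the q and fq parameters are standard Solr. It also assumes customer ids are plain alphanumeric tokens, so they can be dropped into the filter query without escaping.

```python
import re
from urllib.parse import urlencode

# Assumed Solr location; adjust for a real installation.
SOLR_BASE = "http://localhost:8983/solr"


def _check_customer_id(customer_id):
    # Assumption: ids are simple tokens. Rejecting anything else avoids
    # a malformed or overly broad filter query.
    if not re.fullmatch(r"[A-Za-z0-9_]+", customer_id):
        raise ValueError(f"unexpected customer id: {customer_id!r}")
    return customer_id


def shared_collection_url(customer_id, user_query, collection="all_customers"):
    """One collection for everyone; isolation depends on the fq being present."""
    cid = _check_customer_id(customer_id)
    params = [("q", user_query), ("fq", f"customer:{cid}")]
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


def per_customer_url(customer_id, user_query):
    """One collection per customer; the collection name provides the isolation."""
    cid = _check_customer_id(customer_id)
    params = [("q", user_query)]
    return f"{SOLR_BASE}/customer_{cid}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(shared_collection_url("acme", "title:foo"))
    print(per_customer_url("acme", "title:foo"))
```

In the shared layout, forgetting the fq on a single code path returns other customers' documents, which is the leak risk mentioned in the quote. In the per-customer layout the worst case is querying the wrong (or a nonexistent) collection.
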
If only a few customers are active at a given time, it is more performant to use one collection per customer. If many of them are active, the more performant option is to lump them together and filter on a field, because a larger shared index carries less redundancy than many small ones. The one-collection-per-customer solution has another edge: ranking is calculated from that customer's corpus alone rather than from the documents of all customers. If the number of customers is low enough for the individual-collections solution to work, that would be the preferable solution.

- Toke Eskildsen, State and University Library, Denmark