On 4/13/2018 1:44 AM, neotorand wrote:
Let's say I have 5 different entities with 10, 20, 30, 40 and 50 attributes (columns) respectively to be indexed/stored. If I store them all in a single collection, are empty spaces created? In other words, if I store heterogeneous data items in a single collection, is memory poorly utilized because of empty holes?
If a document doesn't have some of the fields your schema is capable of addressing, no space or memory is consumed for the missing fields.
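To illustrate the point above, here is a minimal sketch of what heterogeneous documents might look like before they are sent to a single collection. Everything named here is an assumption for illustration: the `doc_type` field, the `make_doc` helper, and the field names with `_s`/`_f` suffixes (which only work if your schema defines matching dynamic fields). Each document simply omits the fields it has no value for, rather than carrying empty placeholders.

```python
import json

def make_doc(doc_id, doc_type, **fields):
    """Build a document that carries only the fields that actually have values.

    `doc_type` is a hypothetical discriminator field, not something Solr requires.
    """
    doc = {"id": doc_id, "doc_type": doc_type}
    # Drop fields with no value instead of sending empty placeholders.
    doc.update({name: value for name, value in fields.items() if value is not None})
    return doc

# Two different entity types destined for the same collection.
docs = [
    make_doc("cust-1", "customer", name_s="Alice", city_s="Pune"),
    make_doc("ord-1", "order", total_f=99.5, status_s="shipped", coupon_s=None),
]

# JSON body of the kind you could post to an update handler.
payload = json.dumps(docs, indent=2)
print(payload)
```

The customer document never mentions the order fields, and the order document leaves out `coupon_s` because it has no value.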
What are the pros and cons of a single collection vs. multiple collections?
If you have a single collection for different kinds of data, then you do get *some* economies of scale in the total index size. Whether that means a significant size reduction or a small one depends on your data.

The downside to one collection: If you have 500,000 of each kind of document and combine five kinds into one collection, then every query must look through 2.5 million documents instead of 500,000. Both are small numbers for Solr, but the larger index is still going to take more time to search.

If there are any possible issues with security, every document will need one or more fields describing what type of document it is, so that you can filter results according to the access privileges of the user making the query.
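As a rough sketch of that kind of filtering, the request for a combined collection could carry filter queries (`fq`) on the type and access fields. The field names `doc_type` and `acl_ss`, the group names, and the helper functions are assumptions for illustration only; `q` and `fq` are the standard Solr query parameters.

```python
from urllib.parse import urlencode

def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def build_select_params(user_query, doc_type, user_groups):
    """Return query parameters that restrict results by document type and access group.

    `doc_type` and `acl_ss` are hypothetical field names; use whatever your schema defines.
    """
    if not user_groups:
        # Refuse to build an unrestricted query for a user with no groups.
        raise ValueError("user has no access groups")
    acl_clause = " OR ".join(quote_term(group) for group in user_groups)
    return [
        ("q", user_query),
        ("fq", f"doc_type:{quote_term(doc_type)}"),
        ("fq", f"acl_ss:({acl_clause})"),
    ]

params = build_select_params("laptop", "product", ["sales", "admin"])
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to a select URL for the combined collection. Keeping the type and access restrictions in `fq` rather than in `q` lets Solr cache those filters separately from the main query.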
With multiple collections, searching on each one is going to be faster than searching on a combined collection. Whether or not it's enough of a difference to matter will depend on how much data is involved, the nature of that data, and the nature of your queries. But as Erick mentioned, you'll have less capability with Solr's join functionality -- assuming you even need that functionality.
Thanks, Shawn