I've been looking through some of the posts, the wiki, and so on, and haven't fully found the answer. So, if someone might take a moment to explain some of the nuances given the following.
Say you have a large number of datasets with some common fields. Additionally, each dataset may or may not have more fields (i.e., metadata) specific to that dataset. For deployment and maintenance reasons, we would like to define a schema that uses dynamic fields, a superset of all possible fields across the datasets, or a combination of the superset with dynamic fields. When indexing data, my understanding is that if a field is defined but not required, and no data is provided, then that field will simply be empty. Our view is that each dataset will have at least one Solr index.

Focusing on the definitions in schema.xml, given the above setup, my questions are:

- How does defining fields that may not be used affect an index? Is there more of a performance hit on index/update, and/or on query?
- Does the number of fields specified in the schema affect the index size?
- If we specify dynamic fields in the schema to account for the differences, would there be a performance difference versus defining the superset?
- Do copyFields affect the performance of an index?

If there is a document or documentation that does outline this and I've missed it, I'd appreciate being pointed in that direction.

Thanks,
robbin
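For concreteness, here is a rough sketch of the two approaches we're weighing, in schema.xml terms. All field and type names here are made up for illustration; they aren't from a real schema:

```xml
<!-- Hypothetical schema.xml fragment; field and type names are illustrative only. -->
<fields>
  <!-- Common fields shared by all datasets -->
  <field name="id"    type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text"   indexed="true" stored="true"/>

  <!-- Option A (superset): declare every dataset-specific metadata field
       explicitly; documents that lack it just leave it empty. -->
  <field name="sensor_id" type="string" indexed="true" stored="true"/>

  <!-- Option B (dynamic): catch dataset-specific metadata by suffix instead,
       e.g. a document could supply "calibration_s" without declaring it above. -->
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_t" type="text"   indexed="true" stored="true"/>

  <!-- Catch-all destination for the copyField question below -->
  <field name="text_all" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>

<copyField source="title" dest="text_all"/>
```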