I've been looking through some of the posts, the wiki, and so on, and haven't fully found the answer. So, if someone might take a moment to explain some of the nuances given the following.
Say you have a large number of datasets with some common fields. Additionally, each dataset may or may not have more fields (i.e., metadata) specific to that dataset. For deployment and maintenance reasons, we would like to define a schema that uses dynamic fields, a superset of all possible fields across the datasets, or a combination of the superset with dynamic fields. When indexing data, my understanding is that if a field is defined but not required, and no data is provided, then that field will simply be empty. Our view is that each dataset will have at least one Solr index.

Focusing on the definitions in schema.xml, given the above setup, my questions are:

- How does defining fields that may not be used affect an index? Is there more of a performance hit on index/update, and/or on query?
- Does the number of fields specified in the schema affect the index size?
- If we specify dynamic fields in the schema to account for the differences, would there be a performance difference versus defining the superset?
- Do copyFields affect the performance of an index?

If there is a document or documentation that does outline this and I've missed it, I'd appreciate being pointed in that direction.

Thanks,
robbin
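For concreteness, here is a rough sketch of the two approaches we're weighing, in schema.xml terms. All field and type names here are made up for illustration; they aren't from a real schema:

```xml
<!-- Hypothetical schema.xml fragment; field and type names are illustrative only. -->
<fields>
  <!-- Common fields shared by all datasets -->
  <field name="id"    type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text"   indexed="true" stored="true"/>

  <!-- Option A (superset): declare every dataset-specific metadata field
       explicitly; documents that lack it just leave it empty. -->
  <field name="sensor_id" type="string" indexed="true" stored="true"/>

  <!-- Option B (dynamic): catch dataset-specific metadata by suffix instead,
       e.g. a document could supply "calibration_s" without declaring it above. -->
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_t" type="text"   indexed="true" stored="true"/>

  <!-- Catch-all destination for the copyField question below -->
  <field name="text_all" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>

<copyField source="title" dest="text_all"/>
```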