On 9/28/2020 8:56 AM, Edward Turner wrote:
By removing the copyfields, we've found that our index sizes have reduced by ~40% in some cases, which is great! We're just curious now as to exactly how this can be ...
That's not surprising.
My question is, given the following two schemas, if we index some data to the "description" field, will the index for schema1 be twice as large as the index of schema2? (I guess this relates to how, internally, Solr stores field + index data)

Old way -- schema1:
=======
<field name="description" type="text_general" indexed="true" multiValued="false"/>
<field name="default_field" type="text_general" indexed="true" multiValued="false" />
<copyField source="description" dest="default_field" />

New way -- schema2:
=======
<field name="description" type="text_general" indexed="true" multiValued="false"/>
If the only field in the indexed documents is "description", the index built with schema2 will be roughly half the size of the index built with schema1. Both fields referenced by "copyField" are the same type and have the same settings, so they would contain exactly the same data at the Lucene level, and schema1 simply indexes that data twice.
Having the same type for a source and destination field is normally only useful if multiple sources are copied to a destination, which requires multiValued="true" on the destination -- NOT the case in your example.
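As a rough sketch of that multiple-source case (not your schema): the "title" field below is hypothetical and only there to give the destination a second source, and the stored settings are assumptions.

```xml
<!-- "title" is a hypothetical second source field, added for illustration only. -->
<field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="description" type="text_general" indexed="true" stored="true" multiValued="false"/>

<!-- The destination receives one value per source, so it has to be multiValued. -->
<field name="default_field" type="text_general" indexed="true" stored="false" multiValued="true"/>

<copyField source="title" dest="default_field"/>
<copyField source="description" dest="default_field"/>
```

Here a query against default_field matches text from either source, which is something neither source field can do on its own.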
There is one other use case for a copyField -- using the same data differently, with different type values. For example you might have one type for faceting and one for searching.
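A minimal sketch of that second use case, assuming your schema defines a "string" type (solr.StrField), as the stock configsets do. The "category" and "category_facet" names are hypothetical.

```xml
<!-- Analyzed (tokenized) copy, used for searching. -->
<field name="category" type="text_general" indexed="true" stored="true"/>

<!-- Unanalyzed copy, used for faceting: each value is kept as one exact term. -->
<field name="category_facet" type="string" indexed="true" stored="false" docValues="true"/>

<copyField source="category" dest="category_facet"/>
```

In this setup the two fields hold different data at the Lucene level, so the copy is not redundant the way it is in schema1.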
Thanks, Shawn