: We are using Solr 7.1.0 to index a database of addresses. We have found
: that our index size increases massively when we add one extra field to
: the index, even though that field is stored and not indexed, and doesn’t
what about docValues?

: When we run an index load without the problematic field present, the
: Solr index size is 5.5GB. When we add the field into the index, the
: size grows to 13.3GB. The field itself is a maximum of 46 characters in
: length and on average is 19 characters. We have ~14,000,000 rows in
: total to index of which only ~200,000 have this field present at all
: (i.e. not null in database). Given that we don’t want to index the
: field, only store it I would have thought (perhaps naively) that the
: storage increase would be approximately 200,000 * 19 = 3.8M bytes =
: 3.6MB rather than the 7.5GB we are seeing.

if the field has docValues enabled, then there will be some overhead for
every doc in the index -- even the ones that don't have a value in this
field.

(although i'd still be very surprised if it accounted for 7G)

: - The problematic field is created through the API as follows:
:
: curl -X POST -H 'Content-type:application/json' --data-binary '{
:   "add-field":{
:     "name":"buildingName",
:     "type":"string",
:     "stored":true,
:     "indexed":false
:   }
: }' http://localhost:8983/solr/address/schema

...that's going to cause the field to inherit any (non-overridden)
settings from the fieldType "string" -- in the 7.1 _default configset,
"string" is defined with docValues="true".

You can see *all* properties set on a field -- regardless of whether they
are set on the fieldType, or are implicit hardcoded defaults in the
implementation of the fieldType -- via the 'showDefaults=true' Schema API
option.

Consider these API examples from the techproducts demo...
$ curl 'http://localhost:8983/solr/techproducts/schema/fields/cat'
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "field":{
    "name":"cat",
    "type":"string",
    "multiValued":true,
    "indexed":true,
    "stored":true}}

$ curl 'http://localhost:8983/solr/techproducts/schema/fields/cat?showDefaults=true'
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "field":{
    "name":"cat",
    "type":"string",
    "indexed":true,
    "stored":true,
    "docValues":false,
    "termVectors":false,
    "termPositions":false,
    "termOffsets":false,
    "termPayloads":false,
    "omitNorms":true,
    "omitTermFreqAndPositions":true,
    "omitPositions":false,
    "storeOffsetsWithPositions":false,
    "multiValued":true,
    "large":false,
    "sortMissingLast":true,
    "required":false,
    "tokenized":false,
    "useDocValuesAsStored":true}}


-Hoss
http://www.lucidworks.com/
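The two steps discussed here can be scripted: checking a field's effective docValues setting with showDefaults=true, and setting "docValues" explicitly on the field so it is not inherited from the "string" fieldType. What follows is a minimal Python sketch, assuming a node at http://localhost:8983/solr and the "address" collection and "buildingName" field from the question. The helper names (field_definition, schema_command, doc_values_enabled, fetch_field, post_schema) are hypothetical. It uses only the /schema/fields/<name> endpoint and the "add-field"/"replace-field" Schema API commands. Changing docValues on a field that already holds data generally means reindexing before the space actually comes back.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Solr node on the default port; adjust as needed.
SOLR_BASE = "http://localhost:8983/solr"


def field_definition(name, field_type="string", stored=True,
                     indexed=False, doc_values=False):
    """Field definition that sets docValues explicitly on the field,
    so it is not inherited from the fieldType (e.g. "string")."""
    return {
        "name": name,
        "type": field_type,
        "stored": stored,
        "indexed": indexed,
        "docValues": doc_values,
    }


def schema_command(command, definition):
    """Wrap a definition in a Schema API command body such as
    {"add-field": {...}} or {"replace-field": {...}}."""
    return {command: definition}


def doc_values_enabled(field_response):
    """Effective docValues flag from a /schema/fields/<name> response.

    Only reliable for responses fetched with showDefaults=true: without
    it, a value inherited from the fieldType is simply not listed.
    """
    field = field_response["field"]
    if "docValues" not in field:
        raise KeyError("docValues not listed; fetch with showDefaults=true")
    return bool(field["docValues"])


def fetch_field(collection, name):
    """GET a single field's properties, including implicit defaults."""
    url = (f"{SOLR_BASE}/{collection}/schema/fields/"
           f"{urllib.parse.quote(name)}?showDefaults=true")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def post_schema(collection, body):
    """POST a Schema API command body as JSON."""
    req = urllib.request.Request(
        f"{SOLR_BASE}/{collection}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a live node, `doc_values_enabled(fetch_field("address", "buildingName"))` reports the effective flag. `post_schema("address", schema_command("replace-field", field_definition("buildingName")))` sends the explicit override. Putting "docValues" on the field itself makes the choice explicit in the field definition, instead of depending on whichever fieldType defaults the configset ships with.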