Hello,

I'm using solr 4.4. I have a solr core with a schema defining a bunch of different fields, among them a date field:

- date: indexed and stored // the date used at search time

In practice it's a TrieDateField, but I think that's not relevant to the issue.

It also has a multi-valued, not required, "string" field named "tags" which contains, well, a list of tags for some of the documents.

So far, so good: everything works as expected and I'm glad. I'm able to perform partial (or atomic) updates on the tags field whenever it gets modified, and I love it.

Now I have a new source that also pushes updates to the same solr core. Unfortunately, that source's incoming documents have their date in another field, of the same type, named created_time instead of date.

- created_time: stored only // some documents come in with this field set

To be able to sort any document by time, I decided to ask solr to copy the contents of the field created_time to the field named date:

<copyField source="created_time" dest="date" />

I updated my schema and reloaded my core, and everything seemed fine. In fact, I did break something 8-) But I only figured it out later…

Quoting http://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations :

> all fields in your SchemaXml must be configured as stored="true" except for
> fields which are <copyField/> destinations -- which must be configured as
> stored="false"

However, at that time I was not aware of the limitation, and I was able to sort by time across all the documents in my solr core.

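For illustration, a partial update of the tags field looks roughly like the sketch below, posted to the core's /update handler. The unique key field name "id", the document id and the tag value are assumptions made up for the example; only the "tags" field name comes from my schema.

```xml
<!-- Sketch of an atomic update: only the "tags" field is sent. -->
<!-- "id" is assumed to be the uniqueKey field; "doc-1" and "solr" are placeholder values. -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <!-- update="add" appends a value to the multi-valued field; -->
    <!-- update="set" would replace the existing values instead. -->
    <field name="tags" update="add">solr</field>
  </doc>
</add>
```

Note that the request carries neither date nor created_time: whatever ends up in those fields after the update comes from what solr does on its side.
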
I then decided to make sure that partial (or atomic) updates could still be performed, and I was surprised:

* documents from the more recent source (having both a date and a created_time field) are updated fine: the date field is kept (the copyField directive is replayed, I guess)
* documents from the first source (having only the date field set) are a little less lucky: the date gets lost in the process (it looks like the date field was overwritten by the execution of the copyField directive with nothing in its source field)

I then became aware of the caveats and limitations of atomic updates, but now I want to understand why ;-)

So my question is: how does copyField behave differently between a normal (classic) update and a partial (atomic) update?

In practice, I don't understand why the targets of all copyField directives are *always* cleared during partial updates. Could the destination field be cleared only when one of the source fields of a copyField is present in the atomic update? Maybe this was not done because it would have added complexity where it should not be (updates must be fast), but that's just a guess.

I have two ways to handle my problem:

1/ Create a stored="false" search_date field and have two copyField directives, one for the original "date" field and another one for the newer "created_time" field, and make the search application rely on the search_date field

2/ Since I have some control over the second source pushing documents, I can make sure that documents are pushed with the same date field, and work around the limitation by removing the copyField directive entirely.

Since it simplifies my solr schema, I chose option #2.

Thank you very much for your attention

Tanguy