Jim Ferenczi created LUCENE-9300:
------------------------------------

             Summary: Index corruption with doc values updates and addIndexes
                 Key: LUCENE-9300
                 URL: https://issues.apache.org/jira/browse/LUCENE-9300
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Jim Ferenczi


Today, a doc values update creates a new field infos file for the next 
generation. That file contains the segment's original field infos as well as 
any new fields created by the doc values update.

However, existing fields are cloned through the global field infos (shared in 
the index writer) instead of the local ones (present in the segment). In 
practice this is not an issue, since field numbers are shared between segments 
created by the same index writer. But this assumption doesn't hold for segments 
created by different writers and added through 
IndexWriter#addIndexes(Directory). In that case, the number assigned to the 
same field can differ between segments, so any doc values update can corrupt 
the index by assigning the wrong field number to an existing field in the next 
generation.

When this happens, queries and merges can read the wrong fields without 
throwing any error, leading to silent corruption of the index.
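
To make the mismatch concrete, here is a toy model in plain Java. It does not 
use any Lucene classes: the class name, the rewriteViaGlobal helper and the 
field names "id" and "price" are hypothetical, and the maps only stand in for 
the global and per-segment field infos described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldNumberMismatchSketch {

    // Toy stand-in for the behaviour described above: every existing field of
    // the segment is looked up in the writer-wide (global) name -> number map,
    // and the segment's own number is ignored.
    static Map<String, Integer> rewriteViaGlobal(Map<String, Integer> segmentFields,
                                                 Map<String, Integer> globalFields) {
        Map<String, Integer> next = new LinkedHashMap<>();
        for (String name : segmentFields.keySet()) {
            next.put(name, globalFields.get(name));
        }
        return next;
    }

    public static void main(String[] args) {
        // Numbering used by the writer that receives the segment.
        Map<String, Integer> global = new LinkedHashMap<>();
        global.put("id", 0);
        global.put("price", 1);

        // Segment written by a different writer and brought in via
        // addIndexes(Directory); its files were written with these numbers.
        Map<String, Integer> foreign = new LinkedHashMap<>();
        foreign.put("price", 0);
        foreign.put("id", 1);

        Map<String, Integer> nextGen = rewriteViaGlobal(foreign, global);
        System.out.println("segment on disk: " + foreign);  // {price=0, id=1}
        System.out.println("next generation: " + nextGen);  // {price=1, id=0}
    }
}
```

In this model the segment's data for number 0 belongs to "price", but the next 
generation claims number 0 is "id", so a reader resolving "id" would be pointed 
at the data of "price" with no error raised.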

Since a field is not guaranteed to have the same number in every segment, we 
should ensure that a doc values update preserves the segment's own field 
numbers when rewriting field infos.
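
A minimal sketch of that idea, again as a toy model in plain Java rather than 
Lucene code. The class and method names are hypothetical, and the way a number 
is picked for a newly added field (the next number unused in the segment) is an 
assumption of this sketch, not something stated in this issue.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PreserveSegmentNumbersSketch {

    // Existing fields keep the numbers the segment was written with. Only
    // fields that the update introduces receive a new number.
    static Map<String, Integer> rewritePreservingLocal(Map<String, Integer> segmentFields,
                                                       List<String> updatedFields) {
        Map<String, Integer> next = new LinkedHashMap<>(segmentFields);

        // Assumption of this sketch: a new field takes the next number that
        // is unused within this segment.
        int nextFree = 0;
        for (int number : segmentFields.values()) {
            nextFree = Math.max(nextFree, number + 1);
        }
        for (String name : updatedFields) {
            if (!next.containsKey(name)) {
                next.put(name, nextFree++);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // Segment imported from another writer, with its own numbering.
        Map<String, Integer> foreign = new LinkedHashMap<>();
        foreign.put("price", 0);
        foreign.put("id", 1);

        // The update touches an existing field and adds a new one.
        System.out.println(rewritePreservingLocal(foreign, List.of("price", "rank")));
        // prints {price=0, id=1, rank=2}
    }
}
```

Here "price" and "id" keep numbers 0 and 1 across the rewrite, so the data 
already written for the segment still resolves to the right field names.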



