[ 
https://issues.apache.org/jira/browse/LUCENE-9300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074526#comment-17074526
 ] 

ASF subversion and git services commented on LUCENE-9300:
---------------------------------------------------------

Commit c2a82d58f5c4e8263a942b85380c5ac156662da8 in lucene-solr's branch 
refs/heads/branch_8x from Jim Ferenczi
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c2a82d5 ]

LUCENE-9300: Fix field infos update on doc values update (#1394)

Today a doc values update creates a new field infos file that contains the 
original field infos updated for the new generation as well as the new fields 
created by the doc values update.

However, existing fields are cloned through the global fields (shared in the 
index writer) instead of the local ones (present in the segment).
In practice, this is normally not an issue since field numbers are shared 
between segments created by the same index writer.
But this assumption doesn't hold for segments created by different writers and 
added through IndexWriter#addIndexes(Directory).
In this case, the field number of the same field can differ between segments, 
so any doc values update can corrupt the index by assigning the wrong field 
number to an existing field in the next generation.

When this happens, queries and merges can access wrong fields without throwing 
any error, leading to a silent corruption in the index.
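
A simplified model may make the mismatch concrete. The sketch below assumes 
that each writer numbers fields in the order it first sees them, and it reduces 
a segment's field infos to a plain name-to-number map. The class and method 
names (FieldNumberMismatchDemo, numberFields, rewriteUsingGlobal) are 
hypothetical and are not Lucene APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model (not Lucene's API): a segment's field infos
// are reduced to a map from field name to field number.
class FieldNumberMismatchDemo {

    // Assumption: a writer numbers fields in the order it first sees them.
    static Map<String, Integer> numberFields(List<String> fieldsInOrderSeen) {
        Map<String, Integer> numbers = new LinkedHashMap<>();
        for (String name : fieldsInOrderSeen) {
            numbers.putIfAbsent(name, numbers.size());
        }
        return numbers;
    }

    // Buggy rewrite: every existing field of the segment is renumbered from the
    // writer-wide (global) map instead of keeping its segment-local number.
    static Map<String, Integer> rewriteUsingGlobal(Map<String, Integer> local,
                                                   Map<String, Integer> global) {
        Map<String, Integer> next = new LinkedHashMap<>();
        for (String name : local.keySet()) {
            next.put(name, global.get(name));
        }
        return next;
    }

    public static void main(String[] args) {
        // The destination writer saw "title" first.
        Map<String, Integer> global = numberFields(List.of("title", "price"));
        // The segment imported via addIndexes came from a writer that saw "price" first.
        Map<String, Integer> local = numberFields(List.of("price", "title"));
        Map<String, Integer> nextGen = rewriteUsingGlobal(local, global);
        System.out.println("local:    " + local);   // numbers the segment's data was written with
        System.out.println("next gen: " + nextGen); // numbers recorded after the update
    }
}
```

Running main prints local: {price=0, title=1} and next gen: {price=1, title=0}. 
In this model the segment's data for "price" was written under number 0, but 
the next generation claims number 0 is "title", so reads of either field 
silently resolve to the other one.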

This change ensures that we preserve local field numbers when creating
a new field infos generation.
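
The fix can be sketched with the same kind of simplified model: the next 
generation starts from the segment's own numbers, so existing fields keep 
pointing at their data. How the actual patch numbers fields that are new to the 
segment is not described here; purely as an assumption, this sketch gives them 
the next number not yet used in the segment. PreserveLocalFieldNumbers and 
nextGeneration are hypothetical names, not Lucene APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not Lucene's implementation): build the next field
// infos generation while preserving each existing field's segment-local number.
class PreserveLocalFieldNumbers {

    static Map<String, Integer> nextGeneration(Map<String, Integer> local,
                                               Set<String> updatedFields) {
        // Copy the segment's own numbers so existing fields keep their mapping.
        Map<String, Integer> next = new LinkedHashMap<>(local);
        // Simplifying assumption: a field introduced by the update receives the
        // next number not used in this segment.
        int nextFree = local.values().stream()
                .mapToInt(Integer::intValue)
                .max()
                .orElse(-1) + 1;
        for (String name : updatedFields) {
            if (!next.containsKey(name)) {
                next.put(name, nextFree++);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Integer> local = new LinkedHashMap<>();
        local.put("price", 0);
        local.put("title", 1);
        // "price" already exists in the segment; "rank" is new doc values field.
        Map<String, Integer> next = nextGeneration(local, Set.of("price", "rank"));
        System.out.println("next gen: " + next);
    }
}
```

With local numbers {price=0, title=1} and an update touching "price" plus a new 
field "rank", nextGeneration returns {price=0, title=1, rank=2}: the existing 
fields keep the numbers their data was written with.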

> Index corruption with doc values updates and addIndexes
> -------------------------------------------------------
>
>                 Key: LUCENE-9300
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9300
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Jim Ferenczi
>            Priority: Major
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Since the same field is not guaranteed to have the same field number in every 
> segment, we should ensure that doc values updates preserve the segment's 
> field numbers when rewriting field infos.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
