[jira] [Commented] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index

Nhat Nguyen (Jira) Mon, 18 Apr 2022 15:23:07 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523918#comment-17523918
 ]


Nhat Nguyen commented on LUCENE-10518:
--------------------------------------

[~mayya] Thank you for your response. I understand the concern. However, I 
think the current consistency check is not strong enough to make these rewrite 
optimizations safe. We can still open an inconsistent index (created in 
Lucene 8) as read-only, and searches with that reader can then return 
incorrect results. We can also open that inconsistent index after a 
force-merge.


> FieldInfos consistency check can refuse to open Lucene 8 index
> --------------------------------------------------------------
>
>                 Key: LUCENE-10518
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10518
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 8.10.1
>            Reporter: Nhat Nguyen
>            Priority: Major
>
> A field-infos consistency check introduced in Lucene 9 (LUCENE-9334) can 
> refuse to open a Lucene 8 index. Lucene 8 can create a partial FieldInfo if 
> it hits a non-aborting exception (for example [term is too 
> long|https://github.com/apache/lucene-solr/blob/6a6484ba396927727b16e5061384d3cd80d616b2/lucene/core/src/java/org/apache/lucene/index/DefaultIndexingChain.java#L944])
> while processing the fields of a document. We don't have this problem in 
> Lucene 9 because it processes fields in two phases, with the [first 
> phase|https://github.com/apache/lucene/blob/10ebc099c846c7d96f4ff5f9b7853df850fa8442/lucene/core/src/java/org/apache/lucene/index/IndexingChain.java#L589-L614]
> processing only FieldInfos. 
> The issue can be reproduced with this snippet.
> {code:java}
> public void testWriteIndexOn8x() throws Exception {
>   FieldType KeywordField = new FieldType();
>   KeywordField.setTokenized(false);
>   KeywordField.setOmitNorms(true);
>   KeywordField.setIndexOptions(IndexOptions.DOCS);
>   KeywordField.freeze();
>   try (Directory dir = newDirectory()) {
>     IndexWriterConfig config = new IndexWriterConfig();
>     config.setCommitOnClose(false);
>     config.setMergePolicy(NoMergePolicy.INSTANCE);
>     try (IndexWriter writer = new IndexWriter(dir, config)) {
>       // first segment
>       writer.addDocument(new Document()); // an empty doc
>       Document d1 = new Document();
>       byte[] chars = new byte[IndexWriter.MAX_STORED_STRING_LENGTH + 1];
>       Arrays.fill(chars, (byte) 'a');
>       d1.add(new Field("field", new BytesRef(chars), KeywordField));
>       d1.add(new BinaryDocValuesField("field", new BytesRef(chars)));
>       // the over-long term triggers a non-aborting exception
>       expectThrows(IllegalArgumentException.class, () -> writer.addDocument(d1));
>       writer.flush();
>       // second segment
>       Document d2 = new Document();
>       d2.add(new Field("field", new BytesRef("hello world"), KeywordField));
>       d2.add(new SortedDocValuesField("field", new BytesRef("hello world")));
>       writer.addDocument(d2);
>       writer.flush();
>       writer.commit();
>       // Check for doc values types consistency
>       Map<String, DocValuesType> docValuesTypes = new HashMap<>();
>       try (DirectoryReader reader = DirectoryReader.open(dir)) {
>         for (LeafReaderContext leaf : reader.leaves()) {
>           for (FieldInfo fi : leaf.reader().getFieldInfos()) {
>             DocValuesType current =
>                 docValuesTypes.putIfAbsent(fi.name, fi.getDocValuesType());
>             if (current != null && current != fi.getDocValuesType()) {
>               fail("cannot change DocValues type from " + current + " to "
>                   + fi.getDocValuesType() + " for field \"" + fi.name + "\"");
>             }
>           }
>         }
>       }
>     }
>   }
> }
> {code}
> I would like to propose that we:
> - Backport the two-phase field processing from Lucene 9 to Lucene 8. The 
> patch should be small and contained.
> - Introduce an option in Lucene 9 to skip the field-infos consistency check 
> (i.e., behave like Lucene 8 when the option is enabled).
> /cc [~mayya] and [~jpountz]
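
The difference between single-phase and two-phase field processing described 
in the issue can be illustrated with a small standalone sketch. This is not 
Lucene code: TwoPhaseSketch, SketchField, the plain string doc-values types 
and the tiny MAX_TERM_LENGTH are all hypothetical stand-ins, and the sketch 
only assumes that the indexed field is processed before the doc-values field 
of the same name, as in the snippet above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseSketch {
  // Hypothetical limit, far smaller than any real Lucene limit.
  static final int MAX_TERM_LENGTH = 8;

  // Hypothetical stand-in for one field instance of a document.
  public static final class SketchField {
    final String name;
    final String docValuesType; // "NONE", "BINARY", "SORTED", ...
    final int termLength;       // 0 when the field has no indexed term

    public SketchField(String name, String docValuesType, int termLength) {
      this.name = name;
      this.docValuesType = docValuesType;
      this.termLength = termLength;
    }
  }

  // Simulates indexing a term; throws the non-aborting exception when too long.
  static void indexTerm(SketchField f) {
    if (f.termLength > MAX_TERM_LENGTH) {
      throw new IllegalArgumentException("term is too long: " + f.name);
    }
  }

  // Single phase: the schema is updated while each field is being indexed,
  // so an exception part-way through leaves a partial schema behind.
  public static Map<String, String> singlePhase(List<SketchField> doc) {
    Map<String, String> schema = new LinkedHashMap<>();
    try {
      for (SketchField f : doc) {
        schema.putIfAbsent(f.name, "NONE"); // field is registered first
        indexTerm(f);                       // may throw before doc values are seen
        if (!"NONE".equals(f.docValuesType)) {
          schema.put(f.name, f.docValuesType);
        }
      }
    } catch (IllegalArgumentException e) {
      // Non-aborting: the document is rejected, but the schema entry remains.
    }
    return schema;
  }

  // Two phases: the full schema is collected before any indexing can fail.
  public static Map<String, String> twoPhase(List<SketchField> doc) {
    Map<String, String> schema = new LinkedHashMap<>();
    for (SketchField f : doc) { // phase 1: schema only
      schema.merge(f.name, f.docValuesType,
          (old, cur) -> "NONE".equals(cur) ? old : cur);
    }
    try {
      for (SketchField f : doc) { // phase 2: actual indexing
        indexTerm(f);
      }
    } catch (IllegalArgumentException e) {
      // Non-aborting: the schema was already recorded in full.
    }
    return schema;
  }

  public static void main(String[] args) {
    List<SketchField> d1 = Arrays.asList(
        new SketchField("field", "NONE", 100),  // indexed term, too long
        new SketchField("field", "BINARY", 0)); // doc values for the same field
    System.out.println("single-phase: " + singlePhase(d1)); // {field=NONE}
    System.out.println("two-phase: " + twoPhase(d1));       // {field=BINARY}
  }
}
```

In the single-phase variant the segment ends up recording "field" without a 
doc-values type, so a later segment that writes SORTED doc values for the 
same field disagrees with it. In the two-phase variant the type is already 
known when the exception is thrown.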



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

