[ https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523918#comment-17523918 ]
Nhat Nguyen commented on LUCENE-10518:
--------------------------------------

[~mayya] Thank you for your response. I understand the concern. However, I think the current consistency check is not good enough to enable these rewrite optimizations. We can still open an inconsistent index (created in Lucene 8) as read-only, and searches with that reader can then return incorrect results. We can also open that inconsistent index after a force-merge.

> FieldInfos consistency check can refuse to open Lucene 8 index
> --------------------------------------------------------------
>
>                 Key: LUCENE-10518
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10518
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 8.10.1
>            Reporter: Nhat Nguyen
>            Priority: Major
>
> A field-infos consistency check introduced in Lucene 9 (LUCENE-9334) can
> refuse to open a Lucene 8 index. Lucene 8 can create a partial FieldInfo if
> it hits a non-aborting exception (for example [term is too
> long|https://github.com/apache/lucene-solr/blob/6a6484ba396927727b16e5061384d3cd80d616b2/lucene/core/src/java/org/apache/lucene/index/DefaultIndexingChain.java#L944])
> while processing the fields of a document. We don't have this problem in Lucene
> 9 because we process fields in two phases, with the [first
> phase|https://github.com/apache/lucene/blob/10ebc099c846c7d96f4ff5f9b7853df850fa8442/lucene/core/src/java/org/apache/lucene/index/IndexingChain.java#L589-L614]
> processing only FieldInfos.
> The issue can be reproduced with this snippet.
> {code:java}
> public void testWriteIndexOn8x() throws Exception {
>   FieldType KeywordField = new FieldType();
>   KeywordField.setTokenized(false);
>   KeywordField.setOmitNorms(true);
>   KeywordField.setIndexOptions(IndexOptions.DOCS);
>   KeywordField.freeze();
>   try (Directory dir = newDirectory()) {
>     IndexWriterConfig config = new IndexWriterConfig();
>     config.setCommitOnClose(false);
>     config.setMergePolicy(NoMergePolicy.INSTANCE);
>     try (IndexWriter writer = new IndexWriter(dir, config)) {
>       // first segment
>       writer.addDocument(new Document()); // an empty doc
>       Document d1 = new Document();
>       byte[] chars = new byte[IndexWriter.MAX_STORED_STRING_LENGTH + 1];
>       Arrays.fill(chars, (byte) 'a');
>       d1.add(new Field("field", new BytesRef(chars), KeywordField));
>       d1.add(new BinaryDocValuesField("field", new BytesRef(chars)));
>       // the oversized term makes addDocument fail with a non-aborting exception
>       expectThrows(IllegalArgumentException.class, () -> writer.addDocument(d1));
>       writer.flush();
>       // second segment
>       Document d2 = new Document();
>       d2.add(new Field("field", new BytesRef("hello world"), KeywordField));
>       d2.add(new SortedDocValuesField("field", new BytesRef("hello world")));
>       writer.addDocument(d2);
>       writer.flush();
>       writer.commit();
>       // Check for doc values types consistency
>       Map<String, DocValuesType> docValuesTypes = new HashMap<>();
>       try (DirectoryReader reader = DirectoryReader.open(dir)) {
>         for (LeafReaderContext leaf : reader.leaves()) {
>           for (FieldInfo fi : leaf.reader().getFieldInfos()) {
>             DocValuesType current = docValuesTypes.putIfAbsent(fi.name, fi.getDocValuesType());
>             if (current != null && current != fi.getDocValuesType()) {
>               fail("cannot change DocValues type from " + current + " to "
>                   + fi.getDocValuesType() + " for field \"" + fi.name + "\"");
>             }
>           }
>         }
>       }
>     }
>   }
> }
> {code}
> I would like to propose that we:
> - Backport the two-phase fields processing from Lucene 9 to Lucene 8. The patch
>   should be small and contained.
> - Introduce an option in Lucene 9 to skip checking field-infos consistency
>   (i.e., behave like Lucene 8 when the option is enabled).
> /cc [~mayya] and [~jpountz]
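
The single-pass versus two-phase distinction from the issue description can be sketched in a toy model. This is plain Java, not Lucene code: `TwoPhaseSketch`, `ToyField`, `ToyFieldInfo`, and the tiny `MAX_TERM_LENGTH` are all hypothetical names invented for illustration, and the real indexing chain does considerably more. The sketch only shows why recording the schema of every field before touching any value avoids a partial field info when a value is later rejected.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; every name in this class is hypothetical, not a Lucene API.
class TwoPhaseSketch {
  enum DvType { NONE, BINARY, SORTED }

  // One field instance inside a document.
  static final class ToyField {
    final String name;
    final boolean indexed;
    final DvType dvType;
    final String value;

    ToyField(String name, boolean indexed, DvType dvType, String value) {
      this.name = name;
      this.indexed = indexed;
      this.dvType = dvType;
      this.value = value;
    }
  }

  // Per-segment schema entry for one field name.
  static final class ToyFieldInfo {
    boolean indexed;
    DvType dvType = DvType.NONE;
  }

  static final int MAX_TERM_LENGTH = 8; // deliberately tiny, for the demo

  final Map<String, ToyFieldInfo> infos = new HashMap<>();

  // Merge one field's schema into the segment's field infos.
  private void updateSchema(ToyField f) {
    ToyFieldInfo fi = infos.computeIfAbsent(f.name, k -> new ToyFieldInfo());
    fi.indexed |= f.indexed;
    if (f.dvType != DvType.NONE) {
      if (fi.dvType != DvType.NONE && fi.dvType != f.dvType) {
        throw new IllegalArgumentException(
            "cannot change DocValues type from " + fi.dvType + " to " + f.dvType);
      }
      fi.dvType = f.dvType;
    }
  }

  // Reject oversized indexed terms; stands in for a non-aborting exception.
  private static void processValue(ToyField f) {
    if (f.indexed && f.value.length() > MAX_TERM_LENGTH) {
      throw new IllegalArgumentException("term is too long in field " + f.name);
    }
  }

  // Single pass: schema and value are handled field by field, so a failure
  // on an early field leaves the schema of later fields unrecorded.
  void addSinglePass(List<ToyField> doc) {
    for (ToyField f : doc) {
      updateSchema(f);
      processValue(f);
    }
  }

  // Two phases: record the schema of all fields first, then process values.
  void addTwoPhase(List<ToyField> doc) {
    for (ToyField f : doc) {
      updateSchema(f);
    }
    for (ToyField f : doc) {
      processValue(f);
    }
  }

  // Index a document whose first field has an oversized term, swallow the
  // failure, and report the doc-values type recorded for "field".
  static DvType dvTypeAfterFailure(boolean twoPhase) {
    TwoPhaseSketch segment = new TwoPhaseSketch();
    String tooLong = "a".repeat(MAX_TERM_LENGTH + 1);
    List<ToyField> doc = List.of(
        new ToyField("field", true, DvType.NONE, tooLong),
        new ToyField("field", false, DvType.BINARY, tooLong));
    try {
      if (twoPhase) {
        segment.addTwoPhase(doc);
      } else {
        segment.addSinglePass(doc);
      }
    } catch (IllegalArgumentException expected) {
      // the document is rejected, but the writer keeps going
    }
    return segment.infos.get("field").dvType;
  }

  public static void main(String[] args) {
    System.out.println("single-pass: " + dvTypeAfterFailure(false));
    System.out.println("two-phase:   " + dvTypeAfterFailure(true));
  }
}
```

In this toy model the single-pass run leaves "field" with doc-values type NONE, while the two-phase run records BINARY before the term is rejected. With BINARY recorded, a later attempt to add SORTED doc values for the same field would be refused up front rather than producing segments that disagree.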