mikemccand commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1442838857
########## lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java: ########## @@ -1937,4 +1937,97 @@ public void setMergeInfo(SegmentCommitInfo info) { targetDir.close(); sourceDir.close(); } + + public void testIllegalParentDocChange() throws Exception { + Directory dir1 = newDirectory(); + IndexWriterConfig iwc1 = newIndexWriterConfig(new MockAnalyzer(random())); + iwc1.setParentField("foobar"); + RandomIndexWriter w1 = new RandomIndexWriter(random(), dir1, iwc1); + Document parent = new Document(); + w1.addDocuments(Arrays.asList(new Document(), new Document(), parent)); + w1.commit(); + w1.addDocuments(Arrays.asList(new Document(), new Document(), parent)); + w1.commit(); + // so the index sort is in fact burned into the index: + w1.forceMerge(1); + w1.close(); + + Directory dir2 = newDirectory(); + IndexWriterConfig iwc2 = newIndexWriterConfig(new MockAnalyzer(random())); + iwc2.setParentField("foo"); + RandomIndexWriter w2 = new RandomIndexWriter(random(), dir2, iwc2); + + IndexReader r1 = DirectoryReader.open(dir1); + String message = + expectThrows( + IllegalArgumentException.class, + () -> { + w2.addIndexes((SegmentReader) getOnlyLeafReader(r1)); + }) + .getMessage(); + assertEquals( + "cannot configure [foo] as parent document field ; this index uses [foobar] as parent document field already", Review Comment: See I think this exception message is backwards :) Shouldn't it be: ``` cannot configure [foobar] as parent document field ; this index uses [foo] as parent document field already ``` I.e. "this index" is referring to `dir2`/`w2` (the index into which we are adding an external index), and it is configured to use `foo` as the parent field, while the incoming index via `addIndexes` (`dir1`) is attempting (illegally) to change that to `foobar`? Edit: OK I think this all comes down to `this index` being super confusing to me :) If it's referring to the incoming (newly added) index via addIndexes, the message makes sense. But I would think `this index` means the one you opened with IW... ########## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ########## @@ -78,6 +80,7 @@ public FieldInfos(FieldInfo[] infos) { boolean hasPointValues = false; boolean hasVectorValues = false; String softDeletesField = null; + String parentField = null; Review Comment: Maybe fix these silly pointless initializers ^^? This `FieldInfos.java` is some of the oldest, most "pristine"/"untouched" code in Lucene ;) You can almost count the tree rings ... ########## lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java: ########## @@ -1937,4 +1937,97 @@ public void setMergeInfo(SegmentCommitInfo info) { targetDir.close(); sourceDir.close(); } + + public void testIllegalParentDocChange() throws Exception { + Directory dir1 = newDirectory(); + IndexWriterConfig iwc1 = newIndexWriterConfig(new MockAnalyzer(random())); + iwc1.setParentField("foobar"); + RandomIndexWriter w1 = new RandomIndexWriter(random(), dir1, iwc1); + Document parent = new Document(); + w1.addDocuments(Arrays.asList(new Document(), new Document(), parent)); + w1.commit(); + w1.addDocuments(Arrays.asList(new Document(), new Document(), parent)); + w1.commit(); + // so the index sort is in fact burned into the index: + w1.forceMerge(1); + w1.close(); + + Directory dir2 = newDirectory(); + IndexWriterConfig iwc2 = newIndexWriterConfig(new MockAnalyzer(random())); + iwc2.setParentField("foo"); + RandomIndexWriter w2 = new RandomIndexWriter(random(), dir2, iwc2); + + IndexReader r1 = DirectoryReader.open(dir1); + String message = + expectThrows( + IllegalArgumentException.class, + () -> { + w2.addIndexes((SegmentReader) getOnlyLeafReader(r1)); + }) + .getMessage(); + assertEquals( + "cannot configure [foo] as parent document field ; this index uses [foobar] as parent document field already", + message); + + message = + expectThrows( + IllegalArgumentException.class, + () -> { + w2.addIndexes(dir1); + }) + .getMessage(); + assertEquals( + "cannot configure [foo] as parent document field ; this index uses [foobar] as parent document field already", Review Comment: Same here? ########## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ########## @@ -437,6 +488,33 @@ private void verifySoftDeletedFieldName(String fieldName, boolean isSoftDeletesF } } + private void verifyParentFieldName(String fieldName, boolean isParentField) { + if (isParentField) { + if (parentFieldName == null) { + throw new IllegalArgumentException( + "this index has [" + + fieldName + + "] as parent document field already but parent document field is not configured in IWC"); + } else if (fieldName.equals(parentFieldName) == false) { + throw new IllegalArgumentException( + "cannot configure [" + + parentFieldName + + "] as parent document field ; this index uses [" + + fieldName Review Comment: > I would like to find a way that users don't have to specify these fieldnames altogether but that should be a follow up Yeah +1 to defer. Phew, I fired up a GitHub workspace to spelunk the code/PR better! I think (maybe?) I understand what's going on and my confusion! I need more caffeine now ... I think this code is supposed to only catch the `addIndexes` mismatch case? Because, by design, if user is indexing documents, they cannot tickle this code -- the `parentFieldName` (from IWC) and the newly born `FieldInfo` (in-memory DWPT segment) will also use IWC's configured parent field, so these ifs should never trigger from simply indexing documents? Maybe we can rename `parentFieldName` to `iwcParentFieldName` making its provenance clear? Then, by contrast, it's more clear that `fieldName` is the new foreign entity that must be validated against IWC's config. And then I find `this index` super confusing -- in the test case it seems to refer to the incoming index (via `addIndexes`) and not to the currently open index that your `IndexWriter` is pointing to? How about changing the wording for all of these exceptions to something like `the current index is configured with parentName=X; cannot add external index with parentName=Y` or so? ########## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ########## @@ -132,6 +135,13 @@ public FieldInfos(FieldInfo[] infos) { } softDeletesField = info.name; } + if (info.isParentField()) { + if (parentField != null && parentField.equals(info.name) == false) { + throw new IllegalArgumentException( + "multiple parent fields [" + info.name + ", " + parentField + "]"); Review Comment: Can we improve this wording @s1monw ^^? It is confusing because it does not actually say specifically what is wrong (that you must have only one parent field). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org