mikemccand commented on code in PR #12829:
URL: https://github.com/apache/lucene/pull/12829#discussion_r1442838857


##########
lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java:
##########
@@ -1937,4 +1937,97 @@ public void setMergeInfo(SegmentCommitInfo info) {
     targetDir.close();
     sourceDir.close();
   }
+
+  public void testIllegalParentDocChange() throws Exception {
+    Directory dir1 = newDirectory();
+    IndexWriterConfig iwc1 = newIndexWriterConfig(new MockAnalyzer(random()));
+    iwc1.setParentField("foobar");
+    RandomIndexWriter w1 = new RandomIndexWriter(random(), dir1, iwc1);
+    Document parent = new Document();
+    w1.addDocuments(Arrays.asList(new Document(), new Document(), parent));
+    w1.commit();
+    w1.addDocuments(Arrays.asList(new Document(), new Document(), parent));
+    w1.commit();
+    // so the index sort is in fact burned into the index:
+    w1.forceMerge(1);
+    w1.close();
+
+    Directory dir2 = newDirectory();
+    IndexWriterConfig iwc2 = newIndexWriterConfig(new MockAnalyzer(random()));
+    iwc2.setParentField("foo");
+    RandomIndexWriter w2 = new RandomIndexWriter(random(), dir2, iwc2);
+
+    IndexReader r1 = DirectoryReader.open(dir1);
+    String message =
+        expectThrows(
+                IllegalArgumentException.class,
+                () -> {
+                  w2.addIndexes((SegmentReader) getOnlyLeafReader(r1));
+                })
+            .getMessage();
+    assertEquals(
+        "cannot configure [foo] as parent document field ; this index uses 
[foobar] as parent document field already",

Review Comment:
   See I think this exception message is backwards :)  Shouldn't it be:
   
   ```
   cannot configure [foobar] as parent document field ; this index uses [foo] 
as parent document field already
   ```
   
   I.e. "this index" is referring to `dir2`/`w2` (the index into which we are 
adding an external index), and it is configured to use `foo` as the parent 
field, while the incoming index via `addIndexes` (`dir1`) is attempting 
(illegally) to change that to `foobar`?
   
   Edit: OK I think this all comes down to `this index` being super confusing 
to me :)  If it's referring to the incoming (newly added) index via addIndexes, 
the message makes sense.  But I would think `this index` means the one you 
opened with IW...



##########
lucene/core/src/java/org/apache/lucene/index/FieldInfos.java:
##########
@@ -78,6 +80,7 @@ public FieldInfos(FieldInfo[] infos) {
     boolean hasPointValues = false;
     boolean hasVectorValues = false;
     String softDeletesField = null;
+    String parentField = null;

Review Comment:
   Maybe fix these silly pointless initializers ^^?
   
   This `FieldInfos.java` is some of the oldest, most "pristine"/"untouched" 
code in Lucene ;)  You can almost count the tree rings ...



##########
lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java:
##########
@@ -1937,4 +1937,97 @@ public void setMergeInfo(SegmentCommitInfo info) {
     targetDir.close();
     sourceDir.close();
   }
+
+  public void testIllegalParentDocChange() throws Exception {
+    Directory dir1 = newDirectory();
+    IndexWriterConfig iwc1 = newIndexWriterConfig(new MockAnalyzer(random()));
+    iwc1.setParentField("foobar");
+    RandomIndexWriter w1 = new RandomIndexWriter(random(), dir1, iwc1);
+    Document parent = new Document();
+    w1.addDocuments(Arrays.asList(new Document(), new Document(), parent));
+    w1.commit();
+    w1.addDocuments(Arrays.asList(new Document(), new Document(), parent));
+    w1.commit();
+    // so the index sort is in fact burned into the index:
+    w1.forceMerge(1);
+    w1.close();
+
+    Directory dir2 = newDirectory();
+    IndexWriterConfig iwc2 = newIndexWriterConfig(new MockAnalyzer(random()));
+    iwc2.setParentField("foo");
+    RandomIndexWriter w2 = new RandomIndexWriter(random(), dir2, iwc2);
+
+    IndexReader r1 = DirectoryReader.open(dir1);
+    String message =
+        expectThrows(
+                IllegalArgumentException.class,
+                () -> {
+                  w2.addIndexes((SegmentReader) getOnlyLeafReader(r1));
+                })
+            .getMessage();
+    assertEquals(
+        "cannot configure [foo] as parent document field ; this index uses 
[foobar] as parent document field already",
+        message);
+
+    message =
+        expectThrows(
+                IllegalArgumentException.class,
+                () -> {
+                  w2.addIndexes(dir1);
+                })
+            .getMessage();
+    assertEquals(
+        "cannot configure [foo] as parent document field ; this index uses 
[foobar] as parent document field already",

Review Comment:
   Same here?



##########
lucene/core/src/java/org/apache/lucene/index/FieldInfos.java:
##########
@@ -437,6 +488,33 @@ private void verifySoftDeletedFieldName(String fieldName, 
boolean isSoftDeletesF
       }
     }
 
+    private void verifyParentFieldName(String fieldName, boolean 
isParentField) {
+      if (isParentField) {
+        if (parentFieldName == null) {
+          throw new IllegalArgumentException(
+              "this index has ["
+                  + fieldName
+                  + "] as parent document field already but parent document 
field is not configured in IWC");
+        } else if (fieldName.equals(parentFieldName) == false) {
+          throw new IllegalArgumentException(
+              "cannot configure ["
+                  + parentFieldName
+                  + "] as parent document field ; this index uses ["
+                  + fieldName

Review Comment:
   > I would like to find a way that users don't have to specify these 
fieldnames altogether but that should be a follow up
   
   Yeah +1 to defer.
   
   Phew, I fired up a GitHub workspace to spelunk the code/PR better!  I think 
(maybe?) I understand what's going on and my confusion!  I need more caffeine 
now ...
   
   I think this code is supposed to only catch the `addIndexes` mismatch case?  
Because, by design, if user is indexing documents, they cannot tickle this code 
-- the `parentFieldName` (from IWC) and the newly born `FieldInfo` (in-memory 
DWPT segment) will also use IWC's configured parent field, so these ifs should 
never trigger from simply indexing documents?
   
   Maybe we can rename `parentFieldName` to `iwcParentFieldName` making its 
provenance clear?  Then, by contrast, it's more clear that `fieldName` is the 
new foreign entity that must be validated against IWC's config.
   
   And then I find `this index` super confusing -- in the test case it seems to 
refer to the incoming index (via `addIndexes`) and not to the currently open 
index that your `IndexWriter` is pointing to?
   
   How about changing the wording for all of these exceptions to something like 
`the current index is configured with parentName=X; cannot add external index 
with parentName=Y` or so?



##########
lucene/core/src/java/org/apache/lucene/index/FieldInfos.java:
##########
@@ -132,6 +135,13 @@ public FieldInfos(FieldInfo[] infos) {
         }
         softDeletesField = info.name;
       }
+      if (info.isParentField()) {
+        if (parentField != null && parentField.equals(info.name) == false) {
+          throw new IllegalArgumentException(
+              "multiple parent fields [" + info.name + ", " + parentField + 
"]");

Review Comment:
   Can we improve this wording @s1monw ^^?  It is confusing because it does not 
actually say specifically what is wrong (that you must have only one parent 
field).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to