mikemccand commented on code in PR #13486:
URL: https://github.com/apache/lucene/pull/13486#discussion_r1640752212


##########
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##########
@@ -1838,6 +1838,20 @@ public long updateDocument(Term term, Iterable<? extends IndexableField> doc) th
         term == null ? null : DocumentsWriterDeleteQueue.newNode(term), List.of(doc));
   }
 
+  /**
+   * Similar to {@link #updateDocuments(Term, Iterable)}, but only apply deletion once for all
+   * flushed segments. This is useful for unique filed like ES's _id.
+   *
+   * @lucene.experimental
+   */
+  // TODO: If it is unnecessary to validate unique constraint, we can add a isUnique setting to
+  // Term.
+  public long updateDocument(Term term, boolean isUnique, Iterable<? extends IndexableField> doc)

Review Comment:
   Hmm this makes me nervous -- we are relying on the application to properly
claim `isUnique`, but the application may get it wrong.  Though I suppose the
worst case, if the application gets it wrong, is that documents fail to get
deleted (only the first occurrence will be), not, for example, index corruption.
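
   To make that failure mode concrete, here is a toy sketch in plain Java. It is
not Lucene code: segments are modeled as lists of ids, and `UniqueDeleteSketch`,
`delete`, and `count` are hypothetical names. It assumes "apply deletion once"
means the delete stops at the first match.

```java
import java.util.List;

/** Toy model only: each "segment" is a list of ids. This is not Lucene code. */
final class UniqueDeleteSketch {

  /**
   * Removes docs whose id equals the given id and returns how many were removed.
   * When isUnique is true, stops after the first match (assumed meaning of
   * "apply deletion once"); otherwise scans every segment.
   */
  public static int delete(List<List<String>> segments, String id, boolean isUnique) {
    int deleted = 0;
    for (List<String> segment : segments) {
      int i = 0;
      while (i < segment.size()) {
        if (segment.get(i).equals(id)) {
          segment.remove(i); // do not advance: the next element shifted into slot i
          deleted++;
          if (isUnique) {
            return deleted; // trust the caller's claim and stop early
          }
        } else {
          i++;
        }
      }
    }
    return deleted;
  }

  /** Counts the docs with the given id that remain across all segments. */
  public static long count(List<List<String>> segments, String id) {
    return segments.stream().flatMap(List::stream).filter(id::equals).count();
  }
}
```

   With two segments `[a, b]` and `[a, c]`, calling `delete(segments, "a", true)`
removes one doc and leaves one stale `a` behind, while passing `false` removes
both. The index stays structurally fine; it just keeps a doc it should not.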
   
   Could Lucene maybe track internally that a field is actually unique, and then
apply this optimization automatically and always correctly?  We may have to
tighten the optimization to "isUnique and isNonNull (every doc has a value for
the field)", which the OpenSearch/Elasticsearch `_id` field would meet?
   
   If it weren't for deletes/updates, we could check that the number of unique
terms in the field == `totalTermFreq`.  Or, maybe we'd instead track that "this
field always has exactly one value", so that neither freqs nor positions would
have to be indexed for this optimization to apply?  And then if the number of
unique terms is >= `numDocs`, and every doc has one term in this field, then it
is a "primary key"?
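
   The two checks above could be sketched roughly as follows. This is plain Java
over hypothetical per-field statistics, not an existing Lucene API:
`FieldStatsSketch` and all of its members are made-up names, and the
"exactly one value per doc" flag is assumed to be tracked elsewhere at index
time. In real Lucene, comparable numbers are exposed by `Terms` (`size()`,
`getSumTotalTermFreq()`, `getDocCount()`), though `size()` may be unavailable
for some codecs.

```java
/** Hypothetical per-field statistics; names are illustrative, not Lucene's API. */
final class FieldStatsSketch {
  final long uniqueTerms;   // number of distinct terms in the field
  final long totalTermFreq; // term occurrences in the field, summed over all docs
  final int docsWithField;  // docs that have at least one value for the field
  final int numDocs;        // docs in the index

  FieldStatsSketch(long uniqueTerms, long totalTermFreq, int docsWithField, int numDocs) {
    this.uniqueTerms = uniqueTerms;
    this.totalTermFreq = totalTermFreq;
    this.docsWithField = docsWithField;
    this.numDocs = numDocs;
  }

  /** Every term occurs exactly once overall (only meaningful without deletes/updates). */
  public boolean isUnique() {
    return uniqueTerms == totalTermFreq;
  }

  /** Every doc has a value for the field. */
  public boolean isNonNull() {
    return docsWithField == numDocs;
  }

  /** First variant: relies on term frequencies being indexed. */
  public boolean primaryKeyByFreqs() {
    return isUnique() && isNonNull();
  }

  /**
   * Second variant: needs no freqs or positions, only a separately tracked flag
   * saying every doc has exactly one value for the field.
   */
  public boolean primaryKeyByCounts(boolean everyDocHasExactlyOneValue) {
    return everyDocHasExactlyOneValue && uniqueTerms >= numDocs;
  }
}
```

   For three docs with ids `a`, `b`, `c`, both variants report a primary key. If
two docs share an id, `uniqueTerms` drops to 2 while `totalTermFreq` and
`numDocs` stay at 3, so both variants reject the field.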



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

