mikemccand commented on code in PR #13486:
URL: https://github.com/apache/lucene/pull/13486#discussion_r1640752212

##########
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##########
@@ -1838,6 +1838,20 @@ public long updateDocument(Term term, Iterable<? extends IndexableField> doc) th
         term == null ? null : DocumentsWriterDeleteQueue.newNode(term), List.of(doc));
   }
 
+  /**
+   * Similar to {@link #updateDocuments(Term, Iterable)}, but only apply deletion once for all
+   * flushed segments. This is useful for unique filed like ES's _id.
+   *
+   * @lucene.experimental
+   */
+  // TODO: If it is unnecessary to validate unique constraint, we can add a isUnique setting to
+  // Term.
+  public long updateDocument(Term term, boolean isUnique, Iterable<? extends IndexableField> doc)

Review Comment:
   Hmm this makes me nervous -- we are relying on the application to properly claim `isUnique`, but the application may get it wrong. Though I suppose the worst case, if the application gets it wrong, is that documents fail to get deleted (only the first occurrence will be), not, for example, index corruption.

   Could Lucene maybe track internally that a field is actually unique and then apply this optimization automatically / always correctly? We may have to tighten the opto to "isUnique and isNonNull (every doc has a value for the field)", which the OpenSearch/Elasticsearch `_id` field would meet?

   If not for deletes/updates, we could check whether the number of unique terms in the field == `totalTermFreq`. Or, maybe we'd instead track that "this field always has exactly one value", so that neither freqs nor positions would have to be indexed for this opto to apply? And then if the number of unique terms is >= `numDocs`, and every doc has one term in this field, then it is a "primary key"?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.
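
The two "primary key" checks floated in the review comment above can be sketched roughly as follows. This is only an illustration of the arithmetic, not Lucene code: `PrimaryKeyHeuristic` and its method and parameter names are hypothetical, and the statistics are passed in as plain numbers. In Lucene they might plausibly be derived from per-field statistics such as `Terms#size()`, `Terms#getSumTotalTermFreq()` and `Terms#getDocCount()`, but how to combine those across segments, and in the presence of deletes, is exactly the open question in the comment.

```java
/** Hypothetical helper illustrating the review comment; not part of Lucene. */
final class PrimaryKeyHeuristic {

  private PrimaryKeyHeuristic() {}

  /**
   * First idea: assuming no deletes/updates, the field is unique when no term
   * occurs more than once, i.e. the number of distinct terms equals the total
   * number of term occurrences. "isNonNull" means every doc has a value.
   */
  static boolean uniqueByTermFreq(
      long uniqueTerms, long sumTotalTermFreq, long docsWithField, long numDocs) {
    boolean isUnique = uniqueTerms == sumTotalTermFreq;
    boolean isNonNull = docsWithField == numDocs;
    return isUnique && isNonNull;
  }

  /**
   * Second idea: if it is tracked separately that every doc has exactly one
   * value for the field, freqs are not needed; it is enough that there are at
   * least as many distinct terms as docs.
   */
  static boolean uniqueBySingleValue(
      boolean exactlyOneValuePerDoc, long uniqueTerms, long numDocs) {
    return exactlyOneValuePerDoc && uniqueTerms >= numDocs;
  }
}
```

For example, three docs with three distinct `_id` terms pass both checks, while three docs sharing only two distinct terms fail both.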

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org