[GitHub] [lucene] jpountz opened a new pull request, #12549: Run merge-on-full-flush even though no changes got flushed.

2023-09-11 Thread via GitHub
jpountz opened a new pull request, #12549: URL: https://github.com/apache/lucene/pull/12549 Currently, merge-on-full-flush only checks whether merges need to run if changes have been flushed to disk. This prevents having different merging logic for refreshes and commits, since the merge pol

[GitHub] [lucene] mikemccand commented on a diff in pull request #12337: Index arbitrary fields in taxonomy docs

2023-09-11 Thread via GitHub
mikemccand commented on code in PR #12337: URL: https://github.com/apache/lucene/pull/12337#discussion_r1321799297 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyIndexReader.java: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software F

[GitHub] [lucene] mikemccand commented on a diff in pull request #12337: Index arbitrary fields in taxonomy docs

2023-09-11 Thread via GitHub
mikemccand commented on code in PR #12337: URL: https://github.com/apache/lucene/pull/12337#discussion_r1321802426 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/ReindexingEnrichedDirectoryTaxonomyWriter.java: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apa

[GitHub] [lucene] mikemccand commented on pull request #12337: Index arbitrary fields in taxonomy docs

2023-09-11 Thread via GitHub
mikemccand commented on PR #12337: URL: https://github.com/apache/lucene/pull/12337#issuecomment-1714232934 > But as I think about this feature and how do I see it mature over time, I DO think the payload should be given when ingesting the documents Hmm -- I don't think that's great b

[GitHub] [lucene] mikemccand commented on issue #12190: Add "Expression" Facets Implementation

2023-09-11 Thread via GitHub
mikemccand commented on issue #12190: URL: https://github.com/apache/lucene/issues/12190#issuecomment-1714240627 I like this idea -- it's an "aggregation level expression", which computes an expression in "aggregation space", instead of the existing (already supported) document level expres

[GitHub] [lucene] onyxmaster commented on issue #4549: ShingleFilter should handle positionIncrement of zero, e.g. synonyms [LUCENE-3475]

2023-09-11 Thread via GitHub
onyxmaster commented on issue #4549: URL: https://github.com/apache/lucene/issues/4549#issuecomment-1714290760 Hi. Got bitten by this today after a lemmatizer filter produced two variants of base word at the same position and ShingleFilter producing a "shingle" from these variants, failing

[GitHub] [lucene] jpountz commented on pull request #12490: Reduce the overhead of ImpactsDISI.

2023-09-11 Thread via GitHub
jpountz commented on PR #12490: URL: https://github.com/apache/lucene/pull/12490#issuecomment-1714465729 I plan on merging soon if there are no objections. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [lucene] jpountz commented on pull request #12526: Speed up disjunctions by computing estimations of the score of the k-th top hit up-front.

2023-09-11 Thread via GitHub
jpountz commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1714471318 We could. These tasks are a bit malicious as the doc freq is slightly greater than the value of `k=100` so it takes lots of collected matches to find k documents that have this term. I s

[GitHub] [lucene] gokaai commented on a diff in pull request #12530: Fix CheckIndex to detect major corruption with old (not the latest) commit point

2023-09-11 Thread via GitHub
gokaai commented on code in PR #12530: URL: https://github.com/apache/lucene/pull/12530#discussion_r1322006478 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -610,6 +610,39 @@ public Status checkIndex(List<String> onlySegments, ExecutorService executorServ

[GitHub] [lucene] jainankitk commented on issue #12527: Optimize readInts24 performance for DocIdsWriter

2023-09-11 Thread via GitHub
jainankitk commented on issue #12527: URL: https://github.com/apache/lucene/issues/12527#issuecomment-1714517103 > Maybe next we should try 4 readLong() for readInts32? Though I wonder how often in this benchy are we really needing 32 bits to encode the docid deltas in a BKD leaf block?

[GitHub] [lucene] Tony-X closed pull request #12541: Document why we need `lastPosBlockOffset`

2023-09-11 Thread via GitHub
Tony-X closed pull request #12541: Document why we need `lastPosBlockOffset` URL: https://github.com/apache/lucene/pull/12541

[GitHub] [lucene] zhaih commented on issue #11537: StackOverflow when RegExp encounters a very large string [LUCENE-10501]

2023-09-11 Thread via GitHub
zhaih commented on issue #11537: URL: https://github.com/apache/lucene/issues/11537#issuecomment-1715016712 I checked the CHANGES list since last release and seems we have good amount of commits already, let me start a thread about releasing the next version. On Wed, Sep 6, 2023 at