[GitHub] [lucene] jpountz commented on issue #11393: Ghost fields and postings/points [LUCENE-10357]
jpountz commented on issue #11393: URL: https://github.com/apache/lucene/issues/11393#issuecomment-1234304794

> I see that getValues also throws exception in case FieldInfo#getPointDimensionCount is 0, which means that callers can't blindly call getValues without consulting FieldInfo first

It's a bit more complicated than that. Callers indeed cannot call `PointsReader#getValues` blindly, which is a codec API that should only be called on fields that have points enabled. However, callers can call `LeafReader#getPointValues` blindly, the user-facing API, which internally checks whether the field is indexed with points to know whether it should forward to the `PointsReader#getValues` codec API or return `null`. Queries are expected to always interact with points through `LeafReader#getPointValues`, not `PointsReader#getValues`.

If we changed `PointsReader#getValues` to never return `null`, `LeafReader#getPointValues` would still return `null` on fields that do not exist or that do not have points enabled.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
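A minimal sketch of the delegation described in the comment above, for readers who want to see the two layers side by side. The classes `CodecReader` and `UserReader` are simplified, hypothetical stand-ins and not Lucene's real `PointsReader` or `LeafReader`; only the shape of the null check is taken from the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins, NOT Lucene's real classes or signatures.
final class PointsApiSketch {

  /** Hypothetical codec-level reader: only valid for fields that have points. */
  static final class CodecReader {
    private final Map<String, long[]> valuesByField = new HashMap<>();

    long[] getValues(String field, int pointDimensionCount) {
      if (pointDimensionCount == 0) {
        throw new IllegalArgumentException("field " + field + " has no points");
      }
      return valuesByField.get(field);
    }
  }

  /** Hypothetical user-facing reader: safe to call with any field name. */
  static final class UserReader {
    // field name -> point dimension count (0 means points are not enabled)
    private final Map<String, Integer> dimsByField = new HashMap<>();
    private final CodecReader codec = new CodecReader();

    void addField(String field, int dims, long[] values) {
      dimsByField.put(field, dims);
      if (dims > 0) {
        codec.valuesByField.put(field, values);
      }
    }

    long[] getPointValues(String field) {
      Integer dims = dimsByField.get(field);
      if (dims == null || dims == 0) {
        return null; // unknown field or no points: the codec is never consulted
      }
      return codec.getValues(field, dims);
    }
  }

  public static void main(String[] args) {
    UserReader reader = new UserReader();
    reader.addField("price", 1, new long[] {3L, 7L});
    reader.addField("title", 0, null);
    System.out.println(reader.getPointValues("price").length);
    System.out.println(reader.getPointValues("title"));
    System.out.println(reader.getPointValues("missing"));
  }
}
```

In this model a query-like caller only ever touches `getPointValues`, so the exception in the codec layer is unreachable from user code regardless of whether the codec layer itself could return `null`.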
[GitHub] [lucene] msokolov opened a new pull request, #11732: fixed index order needed for TestKnnVectorQuery.testScoreEuclidean
msokolov opened a new pull request, #11732: URL: https://github.com/apache/lucene/pull/11732 This test relies on documents retaining the order in which they were indexed. I had previously tried to fix this a different way (using forceMerge), but this only masked the problem in one case. Here I switched from RandomIndexWriter to IndexWriter for this test case.
[GitHub] [lucene] msokolov merged pull request #11732: fixed index order needed for TestKnnVectorQuery.testScoreEuclidean
msokolov merged PR #11732: URL: https://github.com/apache/lucene/pull/11732
[GitHub] [lucene] msokolov closed issue #1587: SimpleTextKnnVectorsFormat to fully support byte-encoding
msokolov closed issue #1587: SimpleTextKnnVectorsFormat to fully support byte-encoding URL: https://github.com/apache/lucene/issues/1587
[GitHub] [lucene] msokolov closed issue #11706: Add a Codec class to track merge time of each index part [LUCENE-10670]
msokolov closed issue #11706: Add a Codec class to track merge time of each index part [LUCENE-10670] URL: https://github.com/apache/lucene/issues/11706
[GitHub] [lucene] msokolov commented on issue #11706: Add a Codec class to track merge time of each index part [LUCENE-10670]
msokolov commented on issue #11706: URL: https://github.com/apache/lucene/issues/11706#issuecomment-1234413675 I think we discussed and decided this approach is not viable. Due to a stored fields encoding optimization that relies on instanceof checks, we are forbidden from wrapping Codecs.
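A small illustration of why an `instanceof`-based optimization and a wrapping class do not mix. Everything here is hypothetical: `FastReader`, `TimingReader`, and the "bulk-copy" path are made-up names chosen for the sketch, and the comment above does not say which concrete check Lucene performs.

```java
// Hypothetical types for illustration only; not Lucene's stored fields classes.
final class WrapperInstanceofSketch {

  interface Reader {
    String read(int doc);
  }

  /** The concrete type that the optimized path knows how to handle. */
  static final class FastReader implements Reader {
    @Override
    public String read(int doc) {
      return "doc" + doc;
    }
  }

  /** A timing wrapper: behaves identically but has a different concrete type. */
  static final class TimingReader implements Reader {
    private final Reader in;
    long elapsedNanos;

    TimingReader(Reader in) {
      this.in = in;
    }

    @Override
    public String read(int doc) {
      long start = System.nanoTime();
      try {
        return in.read(doc);
      } finally {
        elapsedNanos += System.nanoTime() - start;
      }
    }
  }

  /** The optimized path is chosen only when the concrete type is recognized. */
  static String mergeStrategy(Reader reader) {
    return reader instanceof FastReader ? "bulk-copy" : "doc-by-doc";
  }

  public static void main(String[] args) {
    System.out.println(mergeStrategy(new FastReader()));
    System.out.println(mergeStrategy(new TimingReader(new FastReader())));
  }
}
```

The wrapper returns the same data, but the type check no longer matches, so the wrapped reader silently falls back to the slower path. Under that assumption, measuring merge time with a wrapper would change the very thing being measured.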
[GitHub] [lucene] msokolov commented on issue #11702: Multi-Value Support for Binary DocValues [LUCENE-10666]
msokolov commented on issue #11702: URL: https://github.com/apache/lucene/issues/11702#issuecomment-1234415552 I haven't seen any objections, and it makes sense to me that we may want to have multiple values here, analogous to other doc values types.
[GitHub] [lucene] nknize commented on issue #11690: New companion doc value format for LatLonShape and XYShape field types [LUCENE-10654]
nknize commented on issue #11690: URL: https://github.com/apache/lucene/issues/11690#issuecomment-1234429569 closing as implemented in #1064
[GitHub] [lucene] nknize closed issue #11690: New companion doc value format for LatLonShape and XYShape field types [LUCENE-10654]
nknize closed issue #11690: New companion doc value format for LatLonShape and XYShape field types [LUCENE-10654] URL: https://github.com/apache/lucene/issues/11690
[GitHub] [lucene] msokolov commented on pull request #11715: Add Integer awareness to RamUsageEstimator.sizeOf
msokolov commented on PR #11715: URL: https://github.com/apache/lucene/pull/11715#issuecomment-1234437820 is it worth backporting to 9.x?
[GitHub] [lucene] uschindler commented on pull request #11715: Add Integer awareness to RamUsageEstimator.sizeOf
uschindler commented on PR #11715: URL: https://github.com/apache/lucene/pull/11715#issuecomment-1234443307 Yes. Wasn't that done?
[GitHub] [lucene] madrob commented on pull request #11715: Add Integer awareness to RamUsageEstimator.sizeOf
madrob commented on PR #11715: URL: https://github.com/apache/lucene/pull/11715#issuecomment-1234453507 @msokolov should be 090b05033e742c4db779dc3e0def83e8425b7ce3?
[GitHub] [lucene] uschindler commented on pull request #11715: Add Integer awareness to RamUsageEstimator.sizeOf
uschindler commented on PR #11715: URL: https://github.com/apache/lucene/pull/11715#issuecomment-1234456942 Yes. It should be cherry picked for 9x branch
[GitHub] [lucene] uschindler commented on pull request #11715: Add Integer awareness to RamUsageEstimator.sizeOf
uschindler commented on PR #11715: URL: https://github.com/apache/lucene/pull/11715#issuecomment-1234458503 I think all is fine here. It is in change log of 9.4 and in 9x branch. So will be in release when 9.4 is branched away.
[GitHub] [lucene] msokolov commented on pull request #11715: Add Integer awareness to RamUsageEstimator.sizeOf
msokolov commented on PR #11715: URL: https://github.com/apache/lucene/pull/11715#issuecomment-1234489241 great, thanks!
[GitHub] [lucene] msokolov commented on issue #11625: Fix corner case in TestKnnVectorQuery.testRandomWithFilter [LUCENE-10589]
msokolov commented on issue #11625: URL: https://github.com/apache/lucene/issues/11625#issuecomment-1234496182 We've since added support for exact knn search to the simpletext codec so this shouldn't happen any more. FWIW I did try running the test using the repro line above (on branch 9x) and it now passes.
[GitHub] [lucene] msokolov commented on issue #11696: precompute the max level in LogMergePolicy [LUCENE-10660]
msokolov commented on issue #11696: URL: https://github.com/apache/lucene/issues/11696#issuecomment-1234510685 I removed it from 9.4.0 since I didn't find it backported to 9.x branch
[GitHub] [lucene] thomasschuerger opened a new issue, #11733: Provide a version of GermanNormalizationFilter that uses a modified Umlaut mapping
thomasschuerger opened a new issue, #11733: URL: https://github.com/apache/lucene/issues/11733

### Description

The GermanNormalizationFilter includes the following mappings: ä/ae -> a, ö/oe -> o, ü/ue -> u and ß -> ss (plus some simple rules for when "ue" should not be converted to "u"). This mapping is very uncommon in German. In German, it is common to treat ä and ae, ö and oe, ü and ue, as well as ß and ss as equivalent (the ASCII versions are used in cases where you cannot use the non-ASCII characters, e.g. when using an English keyboard or when the system doesn't allow these characters).

With the current mapping, searching for "Uber" (the company) finds the frequent word "über", which is unexpected, because "u" and "ü" are (normally) not treated as equivalent.

Therefore I would like to see a filter that normalizes German by mapping ä->ae, ö->oe, ü->ue and ß->ss, either by an additional parameter for GermanNormalizationFilter which switches to that mapping (the previous mapping should of course be the default), or by having a separate filter (GermanNormalizationFilter2?) with that mapping.

Using a charfilter is not the same, as this is done before the whole filter chain. The new filter should be a drop-in replacement for GermanNormalizationFilter in any position in the filter chain.
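A minimal sketch of the requested mapping (ä->ae, ö->oe, ü->ue, ß->ss) as a plain string function. This is not a Lucene `TokenFilter` and not the existing GermanNormalizationFilter; the class name `GermanUmlautSketch` is hypothetical, and the sketch assumes tokens have already been lowercased, which the issue does not state.

```java
// Hypothetical helper for illustration; operates on plain strings, not token streams.
final class GermanUmlautSketch {

  /** Expands lowercase umlauts and sharp s to their two-letter ASCII forms. */
  static String normalize(String token) {
    StringBuilder out = new StringBuilder(token.length() + 4);
    for (int i = 0; i < token.length(); i++) {
      char c = token.charAt(i);
      switch (c) {
        case '\u00e4': // a with diaeresis
          out.append("ae");
          break;
        case '\u00f6': // o with diaeresis
          out.append("oe");
          break;
        case '\u00fc': // u with diaeresis
          out.append("ue");
          break;
        case '\u00df': // sharp s
          out.append("ss");
          break;
        default:
          out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(normalize("\u00fcber"));
    System.out.println(normalize("uber"));
    System.out.println(normalize("stra\u00dfe"));
  }
}
```

With this mapping "über" and "ueber" normalize to the same string while "uber" stays distinct, which is the behavior the issue asks for.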
[GitHub] [lucene] kotman12 opened a new pull request, #11734: Fix repeating token sentence boundary bug
kotman12 opened a new pull request, #11734: URL: https://github.com/apache/lucene/pull/11734 Fix sentence boundary detection bug in case of repeating tokens (i.e. while using OpenNLP analysis chain in conjunction with a KeywordRepeatFilter) by keeping track of the sentence index and looking ahead one token. Move inner sentence iteration to a component to be shared by the sentence-aware OpenNLP filters.
[GitHub] [lucene] kotman12 opened a new issue, #11735: Incorrect sentence boundaries with repeating tokens in OpenNLP package
kotman12 opened a new issue, #11735: URL: https://github.com/apache/lucene/issues/11735

### Description

**Initial issue**: `KeywordRepeatFilter` + `OpenNLPLemmatizerFilter` leads to an empty token list in the case of a single-token stream.

**Steps to reproduce**: run [TestOpenNLPLemmatizerFilterFactory.testNoBreakWithRepeatKeywordFilter](https://github.com/kotman12/lucene/blob/fix-sentence-iteration/lucene/analysis/opennlp/src/test/org/apache/lucene/analysis/opennlp/TestOpenNLPLemmatizerFilterFactory.java#L298) and observe that 0 tokens are returned after processing the text “period”.

**Underlying issue**: the opennlp package mishandles sentence boundary detection in general when KeywordRepeatFilter is added. The issue flies under the radar because the tests don’t verify which tokens are processed together as one sentence. Below is a screenshot showing that the _last_ token of the _last_ sentence gets dropped. This is usually not a big deal when that token is punctuation (most of the time) but can become especially problematic when the last bit of text of a stream has no punctuation. For example, consider the text "This is some sentence". If you pass this on its own into an analysis chain identical to the one configured in [TestOpenNLPLemmatizerFilterFactory.testNoBreakWithRepeatKeywordFilter](https://github.com/kotman12/lucene/blob/fix-sentence-iteration/lucene/analysis/opennlp/src/test/org/apache/lucene/analysis/opennlp/TestOpenNLPLemmatizerFilterFactory.java#L298) you will see this:

[screenshot not preserved]

The `OpenNLPPOSFilter` has a similar issue, although not quite as dramatic as `OpenNLPLemmatizerFilter`. This is a screenshot from a breakpoint in `OpenNLPLemmatizerFilter` after running the test [TestOpenNLPPOSFilterFactory.testNoBreakWithRepeatKeywordFilter](https://github.com/kotman12/lucene/blob/fix-sentence-iteration/lucene/analysis/opennlp/src/test/org/apache/lucene/analysis/opennlp/TestOpenNLPPOSFilterFactory.java#L150):

[screenshot not preserved]

Notice how the one sentence “No period” gets processed as two separate sentences. Functionally, processing it as one sentence wouldn’t be very different (at least as far as the tests are concerned) but it is still most likely not the desired behavior.

**Suggested fix**: Linking a [PR](https://github.com/apache/lucene/pull/11734) as the suggested fix for this. The gist is to use a one-step lookahead when processing the token stream to correctly detect sentence transition in the general case of repeating tokens. I have centralized the inner sentence token loop which had been repeated across the different sentence-aware filters. The suggested fix also removes other seemingly unnecessary conditional branching and tidies up the different open-nlp filters so they behave more similarly to one another (at least wherever possible).

### Version and environment details

Latest version of lucene running jdk-17
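The one-step lookahead mentioned under "Suggested fix" can be sketched roughly as follows. This is not the code from the linked PR: the `Tok` class, the idea that each token carries an integer sentence index, and the `group` method are all assumptions made for illustration. The point is only that a sentence is closed when the *next* token belongs to a different sentence (or the stream ends), so a repeated copy of the last token stays in its sentence.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of a token stream; not the OpenNLP filter implementation.
final class SentenceLookaheadSketch {

  /** Hypothetical token: its text plus the index of the sentence it came from. */
  static final class Tok {
    final String text;
    final int sentence;

    Tok(String text, int sentence) {
      this.text = text;
      this.sentence = sentence;
    }
  }

  /** Groups tokens into sentences by peeking one token ahead. */
  static List<List<String>> group(Iterator<Tok> stream) {
    List<List<String>> sentences = new ArrayList<>();
    List<String> current = new ArrayList<>();
    Tok tok = stream.hasNext() ? stream.next() : null;
    while (tok != null) {
      current.add(tok.text);
      // Look ahead: the sentence ends only if the next token is in another
      // sentence or there is no next token at all.
      Tok next = stream.hasNext() ? stream.next() : null;
      if (next == null || next.sentence != tok.sentence) {
        sentences.add(current);
        current = new ArrayList<>();
      }
      tok = next;
    }
    return sentences;
  }

  public static void main(String[] args) {
    // "No period" with every token repeated, as a keyword-repeating filter would do.
    List<Tok> toks = new ArrayList<>();
    toks.add(new Tok("No", 0));
    toks.add(new Tok("No", 0));
    toks.add(new Tok("period", 0));
    toks.add(new Tok("period", 0));
    System.out.println(group(toks.iterator()));
  }
}
```

A flag-based approach that ends the sentence as soon as it sees the first copy of the final token would split off the repeated copy; comparing against the following token avoids that without needing to know how many times each token is repeated.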
[GitHub] [lucene] kotman12 commented on pull request #11734: Fix repeating token sentence boundary bug
kotman12 commented on PR #11734: URL: https://github.com/apache/lucene/pull/11734#issuecomment-1234648586 Linking issue #11735
[GitHub] [lucene] gsmiller commented on pull request #1058: LUCENE-10207: TermInSetQuery now provides a ScoreSupplier with cost estimation for use in IndexOrDocValuesQuery
gsmiller commented on PR #1058: URL: https://github.com/apache/lucene/pull/1058#issuecomment-1234688519 @msokolov any additional feedback or concerns on this? If not, I'll merge today so it can go with 9.4. It's not critical to get it into 9.4 though, so if you (or anyone else) would like some extra time to consider the change, I can wait on it. Thanks!
[GitHub] [lucene] gsmiller merged pull request #1058: LUCENE-10207: TermInSetQuery now provides a ScoreSupplier with cost estimation for use in IndexOrDocValuesQuery
gsmiller merged PR #1058: URL: https://github.com/apache/lucene/pull/1058
[GitHub] [lucene] gsmiller commented on pull request #1058: LUCENE-10207: TermInSetQuery now provides a ScoreSupplier with cost estimation for use in IndexOrDocValuesQuery
gsmiller commented on PR #1058: URL: https://github.com/apache/lucene/pull/1058#issuecomment-1234812574 Thanks @msokolov. Merged and backported.
[GitHub] [lucene] gsmiller opened a new issue, #11736: Promote DocValuesTermsQuery functionality from sandbox module
gsmiller opened a new issue, #11736: URL: https://github.com/apache/lucene/issues/11736

### Description

Now that `TermInSetQuery` is able to estimate its cost and work with `IndexOrDocValuesQuery`, it would be nice to have a first-class doc-values-based term-in-set approach to pair with the current postings-based implementation. `DocValuesTermsQuery` in the sandbox module provides this, and I propose we promote the functionality out of `sandbox`.

One approach for this, brought up by @rmuir over in #11244, would be to refactor `TermInSetQuery` to extend `MultiTermQuery`. If we do that, we can provide a rewrite method that creates a doc-values-based approach, avoiding some duplicate code.

The unknown right now is if extending `MultiTermQuery` would have any adverse performance side-effects on `TermInSetQuery` in general since the terms intersection is implemented a little differently. We would like to benchmark this before making the change.
[GitHub] [lucene] gsmiller commented on issue #11244: Make TermInSetQuery usable with IndexOrDocValuesQuery [LUCENE-10207]
gsmiller commented on issue #11244: URL: https://github.com/apache/lucene/issues/11244#issuecomment-1234849107 As of #1058, `TermInSetQuery` can now estimate its cost, making it usable with `IndexOrDocValuesQuery` as the index-based query. There already exists a `DocValuesTermsQuery` in the sandbox module, which provides a doc-values-based approach that it can be paired with. I've opened #11736 to suggest promoting that functionality out of the sandbox module. I propose we resolve this issue, capturing the core work of `TermInSetQuery` being able to estimate its cost, which it now does. Let's create spin-off issues (like #11736) for any additional work we'd like to try in this space.
[GitHub] [lucene] gsmiller closed issue #11244: Make TermInSetQuery usable with IndexOrDocValuesQuery [LUCENE-10207]
gsmiller closed issue #11244: Make TermInSetQuery usable with IndexOrDocValuesQuery [LUCENE-10207] URL: https://github.com/apache/lucene/issues/11244
[GitHub] [lucene] kotman12 commented on pull request #11734: Fix repeating token sentence boundary bug
kotman12 commented on PR #11734: URL: https://github.com/apache/lucene/pull/11734#issuecomment-1234911661 ./gradlew check passed locally as described in the contribution guide 😃
[GitHub] [lucene] gsmiller opened a new pull request, #11737: Simplify dense optimization check in TermInSetQuery
gsmiller opened a new pull request, #11737: URL: https://github.com/apache/lucene/pull/11737 ### Description Small simplification to some recently added logic.
[GitHub] [lucene] gsmiller commented on issue #11736: Promote DocValuesTermsQuery functionality from sandbox module
gsmiller commented on issue #11736: URL: https://github.com/apache/lucene/issues/11736#issuecomment-1234927353 I'll post a draft PR for this soon. I have the proposed changes on a local branch but just need to untangle it from some other work and rebase.
[GitHub] [lucene] gsmiller opened a new pull request, #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.
gsmiller opened a new pull request, #11738: URL: https://github.com/apache/lucene/pull/11738 ### Description This PR brings over an optimization we recently made to `TermInSetQuery` (#1062) to `MultiTermQuery` more generally.
[GitHub] [lucene] gsmiller opened a new pull request, #11739: DRAFT: TermInSetQuery refactored to extend MultiTermsQuery
gsmiller opened a new pull request, #11739: URL: https://github.com/apache/lucene/pull/11739 ### Description This is a demo PR to show how we can make `TermInSetQuery` extend `MultiTermQuery` and add "slow" doc-value-based queries by doing so. We'd need to benchmark to understand any potential regressions to the "standard" index-based term-in-set query functionality before merging this. Marking as a "draft" for now.
[GitHub] [lucene] gsmiller commented on issue #11736: Promote DocValuesTermsQuery functionality from sandbox module
gsmiller commented on issue #11736: URL: https://github.com/apache/lucene/issues/11736#issuecomment-1234938538 Here's a draft PR showing how we might do this: #11739. If that approach ends up regressing "normal" term-in-set query behavior, we could take a simpler approach and just move the `DocValuesTermsQuery` out of sandbox, I suppose.
[GitHub] [lucene] gsmiller opened a new issue, #11740: Can we improve cost estimation in TermInSetQuery's ScoreSupplier?
gsmiller opened a new issue, #11740: URL: https://github.com/apache/lucene/issues/11740

### Description

To minimize the up-front cost of creating a `ScoreSupplier`, `TermInSetQuery` doesn't actually intersect its terms with the index, which means it has no visibility into the postings length of each term for the purpose of cost estimation. Because of this, we might grossly over-estimate the cost. I wonder if we can do better somehow?

As one thought, I wonder if there are any cases where it's actually justified to intersect the terms up-front? While there's a cost of doing so, having a more accurate cost estimate for the `Scorer` might be useful in some cases?
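A toy comparison of the two estimation strategies discussed in this issue. The issue does not say which bound `TermInSetQuery` actually uses, so the lookup-free bound below (the field-wide sum of document frequencies) is an assumption, as are the class and method names. The map stands in for a term dictionary.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a map from term to document frequency stands in for the index.
final class TermSetCostSketch {

  /** Lookup-free upper bound (assumed): every posting in the field might match. */
  static long costWithoutIntersecting(Map<String, Integer> docFreqByTerm) {
    long sumDocFreq = 0;
    for (int docFreq : docFreqByTerm.values()) {
      sumDocFreq += docFreq; // in a real index this would be a precomputed field statistic
    }
    return sumDocFreq;
  }

  /** Tighter estimate: look up each query term and add its document frequency. */
  static long costAfterIntersecting(Map<String, Integer> docFreqByTerm, List<String> queryTerms) {
    long cost = 0;
    for (String term : queryTerms) {
      cost += docFreqByTerm.getOrDefault(term, 0); // absent terms contribute nothing
    }
    return cost;
  }

  public static void main(String[] args) {
    Map<String, Integer> index = new HashMap<>();
    index.put("common", 1000);
    index.put("rare1", 5);
    index.put("rare2", 2);
    List<String> query = Arrays.asList("rare1", "rare2", "absent");
    System.out.println(costWithoutIntersecting(index));
    System.out.println(costAfterIntersecting(index, query));
  }
}
```

When the query contains only rare terms, the lookup-free bound is dominated by terms the query never mentions, which is the over-estimation the issue describes. The price of the tighter number is one dictionary lookup per query term before any scoring starts.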
[GitHub] [lucene] gsmiller opened a new pull request, #11741: DRAFT: Experiment with intersecting TermInSetQuery terms up-front to better estimate cost
gsmiller opened a new pull request, #11741: URL: https://github.com/apache/lucene/pull/11741 ### Description Here's a rough sketch of what it might look like to intersect `TermInSetQuery` terms when creating a `ScoreSupplier` to more effectively estimate cost (see #11740)
[GitHub] [lucene] gsmiller commented on issue #11740: Can we improve cost estimation in TermInSetQuery's ScoreSupplier?
gsmiller commented on issue #11740: URL: https://github.com/apache/lucene/issues/11740#issuecomment-1234950938 Put up a draft PR to show how we could intersect terms early here: #11741
[GitHub] [lucene] gsmiller closed pull request #435: LUCENE-10207: Add "slow" term-in-set query support to SortedDocValuesField / SortedSetDocValuesField
gsmiller closed pull request #435: LUCENE-10207: Add "slow" term-in-set query support to SortedDocValuesField / SortedSetDocValuesField URL: https://github.com/apache/lucene/pull/435