[GitHub] [lucene] jpountz commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.

2023-02-21 Thread via GitHub
jpountz commented on PR #12055: URL: https://github.com/apache/lucene/pull/12055#issuecomment-1438199152 Thanks Greg for sharing more info about how it helped on Amazon Product search. Do your queries early terminate somehow (in which case I'd expect this change to help the most since it ca

[GitHub] [lucene] henryrneh opened a new issue, #12165: Integrating Apache Lucene into OSS-Fuzz

2023-02-21 Thread via GitHub
henryrneh opened a new issue, #12165: URL: https://github.com/apache/lucene/issues/12165 ### Description Hi Apache Lucene developers, We have prepared the [initial integration](https://github.com/google/oss-fuzz/pull/9772) of Apache Lucene into [Google OSS-Fuzz](https://github

[GitHub] [lucene] dweiss commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

2023-02-21 Thread via GitHub
dweiss commented on issue #12165: URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438317336 Thank you. Your contribution is appreciated but Lucene already uses what you call a "fuzzer" - a reproducible, pseudo-random component assembly for tests... In fact, we have used it
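
To illustrate what "reproducible, pseudo-random component assembly" can mean, here is a minimal Java sketch. It is not Lucene's test framework: the class `SeededChainSketch`, the method `randomChain`, and the component names are hypothetical, chosen only for this example. The sketch assumes nothing beyond a seeded `java.util.Random`, so the same seed always assembles the same chain and a failing run can be replayed from its seed.

```java
import java.util.Random;

// Hypothetical illustration only; this is not Lucene's randomized test code.
final class SeededChainSketch {
    // Made-up component names standing in for real analysis components.
    static final String[] COMPONENTS = {"lowercase", "stem", "stop", "ngram"};

    // Assembles a chain of `length` components, picked pseudo-randomly from the seed.
    static String randomChain(long seed, int length) {
        Random random = new Random(seed);
        StringBuilder chain = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                chain.append(" -> ");
            }
            chain.append(COMPONENTS[random.nextInt(COMPONENTS.length)]);
        }
        return chain.toString();
    }

    public static void main(String[] args) {
        long seed = 42L;
        // Both calls use the same seed, so they print the same chain.
        System.out.println(randomChain(seed, 3));
        System.out.println(randomChain(seed, 3));
    }
}
```

Reporting the seed alongside a failure is what makes this style of test replayable: rerunning with that seed rebuilds the exact same assembly.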

[GitHub] [lucene] henryrneh commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

2023-02-21 Thread via GitHub
henryrneh commented on issue #12165: URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438392493 Hello @dweiss, great to hear that Apache Lucene is already using fuzzing! The big value is that computation power is sponsored by Google and that it is fuzzed by [Jazzer](h

[GitHub] [lucene] jpountz commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.

2023-02-21 Thread via GitHub
jpountz commented on PR #12055: URL: https://github.com/apache/lucene/pull/12055#issuecomment-1438404538 OK I think I better understand the concern around the slowness with NIOFSDirectory now. With a single PostingsEnum getting reused, a single BufferedIndexInput refill would buffer posting

[GitHub] [lucene] rmuir commented on pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues

2023-02-21 Thread via GitHub
rmuir commented on PR #12162: URL: https://github.com/apache/lucene/pull/12162#issuecomment-1438418170 The major problem with newGeometryQuery is that people think it is some universal language to speak across points and shapes. It's not. Points are infinitely small and have no "ma

[GitHub] [lucene] benwtrent merged pull request #12152: Minor vector search matching doc optimizations

2023-02-21 Thread via GitHub
benwtrent merged PR #12152: URL: https://github.com/apache/lucene/pull/12152 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.a

[GitHub] [lucene] rmuir commented on pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues

2023-02-21 Thread via GitHub
rmuir commented on PR #12162: URL: https://github.com/apache/lucene/pull/12162#issuecomment-1438428404 Just look at the code to see what I mean: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/document/LatLonPoint.java#L325-L341 The worst part is, the

[GitHub] [lucene] jpountz commented on a diff in pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues

2023-02-21 Thread via GitHub
jpountz commented on code in PR #12162: URL: https://github.com/apache/lucene/pull/12162#discussion_r1112997446 ## lucene/CHANGES.txt: ## @@ -112,6 +112,9 @@ API Changes * GITHUB#12129: Move DocValuesTermsQuery from sandbox to SortedDocValuesField#newSlowSetQuery and Sorted

[GitHub] [lucene] jpountz commented on pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues

2023-02-21 Thread via GitHub
jpountz commented on PR #12162: URL: https://github.com/apache/lucene/pull/12162#issuecomment-1438438628 For `KeywordField` and `LongField`/`DoubleField` we ended up adding an option to the ctor to store the field. This PR doesn't have this, but I'm unsure what should be the canonical repre

[GitHub] [lucene] jpountz merged pull request #12139: Skip the TokenStream overhead when indexing simple keywords.

2023-02-21 Thread via GitHub
jpountz merged PR #12139: URL: https://github.com/apache/lucene/pull/12139

[GitHub] [lucene] rmuir commented on pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues

2023-02-21 Thread via GitHub
rmuir commented on PR #12162: URL: https://github.com/apache/lucene/pull/12162#issuecomment-1438469151 Perhaps we need to walk through an example. A user wants only documents within 25 km of their current location, which is very typical. This signature is pretty intuitive for that use-case:

[GitHub] [lucene] Tjianke commented on issue #11707: Re-evaluate different ways to encode postings [LUCENE-10672]

2023-02-21 Thread via GitHub
Tjianke commented on issue #11707: URL: https://github.com/apache/lucene/issues/11707#issuecomment-1438514073 Any progress on this issue?

[GitHub] [lucene] iverase commented on pull request #12162: Add LatLonField class to index both LatLonPoint and LatLonDocValues

2023-02-21 Thread via GitHub
iverase commented on PR #12162: URL: https://github.com/apache/lucene/pull/12162#issuecomment-1438631454 > For KeywordField and LongField/DoubleField we ended up adding an option to the ctor to store the field. This PR doesn't have this, but I'm unsure what should be the canonical represent

[GitHub] [lucene] nknize commented on issue #11829: Reproducible TestShapeDocValues failure

2023-02-21 Thread via GitHub
nknize commented on issue #11829: URL: https://github.com/apache/lucene/issues/11829#issuecomment-1438669547 I wasn't getting notifications on this for some reason, so thanks for stepping in @ioanatia! Closing this issue for now and will re-open if we see another failure.

[GitHub] [lucene] nknize closed issue #11829: Reproducible TestShapeDocValues failure

2023-02-21 Thread via GitHub
nknize closed issue #11829: Reproducible TestShapeDocValues failure URL: https://github.com/apache/lucene/issues/11829

[GitHub] [lucene] uschindler commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

2023-02-21 Thread via GitHub
uschindler commented on issue #12165: URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438783686 I checked the patch in the related issue. It fuzzes analyzer creation with some data and also has some fuzzing for IndexSearcher. That's nothing new, e.g., we have our own Analys

[GitHub] [lucene] rmuir commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

2023-02-21 Thread via GitHub
rmuir commented on issue #12165: URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438859463 The analyzers example given there is a good one for seeing the differences. Both approaches (OSS Fuzz and existing TestRandomChains) test "random analysis chains", but the c

[GitHub] [lucene] zhaih commented on a diff in pull request #12160: Concurrent rewrite for KnnVectorQuery

2023-02-21 Thread via GitHub
zhaih commented on code in PR #12160: URL: https://github.com/apache/lucene/pull/12160#discussion_r1113630957 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -73,17 +76,41 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOExceptio

[GitHub] [lucene] zhaih commented on a diff in pull request #12158: Clone the BytesRef[] values in KeywordField#newSetQuery

2023-02-21 Thread via GitHub
zhaih commented on code in PR #12158: URL: https://github.com/apache/lucene/pull/12158#discussion_r1113646765 ## lucene/core/src/java/org/apache/lucene/document/KeywordField.java: ## @@ -169,7 +169,7 @@ public static Query newSetQuery(String field, BytesRef... values) { Ob
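
The PR title above is about cloning the array passed to a varargs factory method. As a rough illustration of why a defensive copy can matter in that situation, here is a minimal Java sketch. It does not reproduce the PR's change or Lucene's code: `SetQuerySketch`, `withoutClone`, `withClone`, and `first` are hypothetical names, and `String` stands in for `BytesRef`. The point it relies on is that a Java varargs method called with an explicit array receives that same array, not a copy.

```java
// Hypothetical stand-in for a query holding a set of values; not Lucene's API.
final class SetQuerySketch {
    private final String[] values;

    private SetQuerySketch(String[] values) {
        this.values = values;
    }

    // Keeps the caller's array as-is, so later changes by the caller leak in.
    static SetQuerySketch withoutClone(String... values) {
        return new SetQuerySketch(values);
    }

    // Takes a defensive copy, so the query owns its own array.
    static SetQuerySketch withClone(String... values) {
        return new SetQuerySketch(values.clone());
    }

    String first() {
        return values[0];
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana"};
        SetQuerySketch shared = withoutClone(terms);
        SetQuerySketch copied = withClone(terms);
        terms[0] = "cherry"; // caller mutates the array after building both queries
        System.out.println(shared.first()); // sees the caller's mutation
        System.out.println(copied.first()); // still holds the original value
    }
}
```

The same aliasing works in the other direction: if the method sorted or otherwise modified an uncloned array, the caller's array would change too.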