[GitHub] [lucene] jpountz commented on pull request #12139: Skip the TokenStream overhead when indexing simple keywords.

2023-02-16 Thread via GitHub
jpountz commented on PR #12139: URL: https://github.com/apache/lucene/pull/12139#issuecomment-1433116951 I removed type guessing by adding a new `IndexableField#invertableType` that can be either `TERM` or `TOKEN_STREAM`. The type guessing is now contained in `Field.java`. Initially I wante

[GitHub] [lucene] rmuir commented on pull request #12139: Skip the TokenStream overhead when indexing simple keywords.

2023-02-16 Thread via GitHub
rmuir commented on PR #12139: URL: https://github.com/apache/lucene/pull/12139#issuecomment-1433128283 I'm lost; I see type guessing and an InvertableType class that does nothing. Maybe you forgot to 'git add' or something? -- This is an automated message from the Apache Git Service. To

[GitHub] [lucene] jpountz commented on pull request #12139: Skip the TokenStream overhead when indexing simple keywords.

2023-02-16 Thread via GitHub
jpountz commented on PR #12139: URL: https://github.com/apache/lucene/pull/12139#issuecomment-1433141266 Yes! Sorry about that.

[GitHub] [lucene] jpountz commented on issue #11915: Make Lucene smarter about long runs of matches

2023-02-16 Thread via GitHub
jpountz commented on issue #11915: URL: https://github.com/apache/lucene/issues/11915#issuecomment-1433171306 Thanks for looking! > peekNextNonMatchingDocID() - 1 is guaranteed to not be a match. `peekNextNonMatchingDocID() - 1` would either be the current doc ID, or a match. (
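To illustrate the point about `peekNextNonMatchingDocID() - 1`, here is a minimal sketch, assuming a hypothetical iterator over sorted, disjoint, non-adjacent `[start, end)` ranges of matching doc IDs. Every doc from the current one up to (but excluding) the next non-matching doc is a match, so `peekNextNonMatchingDocID() - 1` is either the current doc or a later match. `RangeIterator` and its methods are illustrative only and are not an existing Lucene API.

```java
import java.util.List;

public class PeekNonMatchingSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Hypothetical iterator over sorted, disjoint, non-adjacent [start, end) ranges of matching docs.
    static final class RangeIterator {
        private final List<int[]> ranges;
        private int rangeIndex = 0;
        private int doc = -1;

        RangeIterator(List<int[]> ranges) {
            this.ranges = ranges;
        }

        int docID() {
            return doc;
        }

        // Advances to the next matching doc, or NO_MORE_DOCS when exhausted.
        int nextDoc() {
            int target = doc + 1;
            while (rangeIndex < ranges.size()) {
                int[] r = ranges.get(rangeIndex);
                if (target < r[1]) {
                    doc = Math.max(target, r[0]);
                    return doc;
                }
                rangeIndex++;
            }
            doc = NO_MORE_DOCS;
            return doc;
        }

        // Only valid while positioned on a match: the first doc after docID() that does NOT match.
        // Because ranges are non-adjacent, the end of the current range is never a match.
        int peekNextNonMatchingDocID() {
            return ranges.get(rangeIndex)[1];
        }
    }

    public static void main(String[] args) {
        // Matching docs: 2, 3, 4 and 10, 11.
        RangeIterator it = new RangeIterator(List.of(new int[] {2, 5}, new int[] {10, 12}));
        for (int d = it.nextDoc(); d != NO_MORE_DOCS; d = it.nextDoc()) {
            int lastInRun = it.peekNextNonMatchingDocID() - 1;
            // lastInRun is either d itself or a later match.
            System.out.println("doc=" + d + " lastInRun=" + lastInRun);
        }
    }
}
```

A caller could use this to skip over the whole run `[docID(), peekNextNonMatchingDocID())` at once instead of visiting each doc individually.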

[GitHub] [lucene] rmuir commented on pull request #12139: Skip the TokenStream overhead when indexing simple keywords.

2023-02-16 Thread via GitHub
rmuir commented on PR #12139: URL: https://github.com/apache/lucene/pull/12139#issuecomment-1433180423 It's better; I'm only sad about a naming issue:
* InvertableType: OK
* InvertableType.TERM: Terrible, it isn't a Term at all, it's a BytesRef.
* InvertableType.TOKEN_STREAM: OK

[GitHub] [lucene] jpountz commented on pull request #12139: Skip the TokenStream overhead when indexing simple keywords.

2023-02-16 Thread via GitHub
jpountz commented on PR #12139: URL: https://github.com/apache/lucene/pull/12139#issuecomment-1433184781 Fair point, I renamed `TERM` to `BINARY`, which is consistent with `StoredValue` and the fact that the API on `IndexableField` is called `#binaryValue()`?
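A minimal sketch of the idea, assuming simplified stand-in types: the field reports its invertable type up front, and the indexer either indexes a `BINARY` value as exactly one term or runs a token stream otherwise. `SketchField`, `invert`, and the whitespace split standing in for an analyzer are hypothetical and are not Lucene's actual classes or indexing chain.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class InvertableTypeSketch {

    // Mirrors the two values discussed in the PR: a single binary term, or a token stream.
    enum InvertableType { BINARY, TOKEN_STREAM }

    // Hypothetical, simplified stand-in for an indexable field.
    record SketchField(String name, byte[] binaryValue, String text) {
        InvertableType invertableType() {
            // The field decides its type, so the indexer does not have to guess.
            return binaryValue != null ? InvertableType.BINARY : InvertableType.TOKEN_STREAM;
        }
    }

    // Hypothetical inverter: returns the terms that would be indexed for a field.
    static List<String> invert(SketchField field) {
        List<String> terms = new ArrayList<>();
        switch (field.invertableType()) {
            case BINARY -> {
                // Fast path: the whole value is a single term, no tokenization overhead.
                terms.add(new String(field.binaryValue(), StandardCharsets.UTF_8));
            }
            case TOKEN_STREAM -> {
                // Slow path: whitespace split as a stand-in for an analyzer's TokenStream.
                for (String token : field.text().split("\\s+")) {
                    if (!token.isEmpty()) {
                        terms.add(token);
                    }
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        SketchField keyword = new SketchField("id", "doc-42".getBytes(StandardCharsets.UTF_8), null);
        SketchField body = new SketchField("body", null, "skip the overhead");
        System.out.println(invert(keyword)); // [doc-42]
        System.out.println(invert(body));    // [skip, the, overhead]
    }
}
```

The `BINARY` branch captures the "singleton" nature that the old `TERM` name hinted at: one value in, one term out.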

[GitHub] [lucene] rmuir commented on pull request #12139: Skip the TokenStream overhead when indexing simple keywords.

2023-02-16 Thread via GitHub
rmuir commented on PR #12139: URL: https://github.com/apache/lucene/pull/12139#issuecomment-1433193923 Yes, better, thanks! The only thing good about the "Term" was that it did capture the singleton nature. I'd just suggest a small improvement to the javadocs for BINARY to mention that it's "

[GitHub] [lucene] tylerbertrand commented on a diff in pull request #12150: Gradle optimizations

2023-02-16 Thread via GitHub
tylerbertrand commented on code in PR #12150: URL: https://github.com/apache/lucene/pull/12150#discussion_r1108598872
## gradle/validation/jar-checks.gradle: ##
@@ -231,7 +238,8 @@ subprojects {
 }
 }
 }
-
+ def f = new File(project.buildDir.path + "

[GitHub] [lucene] tylerbertrand commented on a diff in pull request #12150: Gradle optimizations

2023-02-16 Thread via GitHub
tylerbertrand commented on code in PR #12150: URL: https://github.com/apache/lucene/pull/12150#discussion_r1108635838
## gradle/validation/jar-checks.gradle: ##
@@ -231,7 +238,8 @@ subprojects {
 }
 }
 }
-
+ def f = new File(project.buildDir.path + "

[GitHub] [lucene] dnhatn commented on pull request #12147: Ensure caching all leaves from the upper tier

2023-02-16 Thread via GitHub
dnhatn commented on PR #12147: URL: https://github.com/apache/lucene/pull/12147#issuecomment-1433551735 @jpountz Thank you!

[GitHub] [lucene] dnhatn merged pull request #12147: Ensure caching all leaves from the upper tier

2023-02-16 Thread via GitHub
dnhatn merged PR #12147: URL: https://github.com/apache/lucene/pull/12147

[GitHub] [lucene] dnhatn closed issue #12140: LRUQueryCache disabled for indices with more than 33 segments

2023-02-16 Thread via GitHub
dnhatn closed issue #12140: LRUQueryCache disabled for indices with more than 33 segments URL: https://github.com/apache/lucene/issues/12140

[GitHub] [lucene] jtibshirani merged pull request #12146: Simplify max score for kNN vector queries

2023-02-16 Thread via GitHub
jtibshirani merged PR #12146: URL: https://github.com/apache/lucene/pull/12146

[GitHub] [lucene] jtibshirani commented on pull request #12146: Simplify max score for kNN vector queries

2023-02-16 Thread via GitHub
jtibshirani commented on PR #12146: URL: https://github.com/apache/lucene/pull/12146#issuecomment-1433647800 Thanks for the review!

[GitHub] [lucene] benwtrent opened a new pull request, #12152: Fix vector search doc score query bugs

2023-02-16 Thread via GitHub
benwtrent opened a new pull request, #12152: URL: https://github.com/apache/lucene/pull/12152 This commit fixes one major bug and has two minor performance improvements. In a pure disjunction case within the `BooleanQuery` (and probably other times), the maximum score up to `NO_MORE_DOCS
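For context on the max-score discussion, here is a minimal sketch, assuming a query whose matching docs and scores are precomputed into sorted parallel arrays (as with top-k kNN hits), of computing the maximum score of matching docs between two doc IDs, inclusive. It shows why `upTo == NO_MORE_DOCS` (`Integer.MAX_VALUE` in Lucene's `DocIdSetIterator`) needs care: the bound must cover every remaining doc, and a naive `upTo + 1` overflows. The class, the two-argument `getMaxScore(from, upTo)`, and the assumption of non-negative scores are hypothetical; this is not the code from the PR.

```java
import java.util.Arrays;

public class MaxScoreUpToSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Sorted, distinct doc IDs and their scores (parallel arrays), e.g. the top-k hits of a kNN query.
    private final int[] docs;
    private final float[] scores;

    MaxScoreUpToSketch(int[] docs, float[] scores) {
        this.docs = docs;
        this.scores = scores;
    }

    // Maximum score among matching docs in [from, upTo], both inclusive.
    // Assumes non-negative scores and returns 0 if no doc falls in the range.
    float getMaxScore(int from, int upTo) {
        int start = lowerBound(from);
        // Special-case NO_MORE_DOCS: it covers all remaining docs, and upTo + 1 would overflow.
        int end = upTo == NO_MORE_DOCS ? docs.length : lowerBound(upTo + 1);
        float max = 0f;
        for (int i = start; i < end; i++) {
            max = Math.max(max, scores[i]);
        }
        return max;
    }

    // Index of the first doc >= target (the insertion point when target is absent).
    private int lowerBound(int target) {
        int idx = Arrays.binarySearch(docs, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        MaxScoreUpToSketch s = new MaxScoreUpToSketch(new int[] {3, 8, 15}, new float[] {0.5f, 0.9f, 0.7f});
        System.out.println(s.getMaxScore(0, 7));            // 0.5
        System.out.println(s.getMaxScore(9, NO_MORE_DOCS)); // 0.7
    }
}
```

Binary search keeps each call at O(log n) to locate the range; a real scorer might precompute suffix or block maxima instead of scanning the range.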

[GitHub] [lucene] benwtrent commented on pull request #12152: Fix vector search doc score query bugs

2023-02-16 Thread via GitHub
benwtrent commented on PR #12152: URL: https://github.com/apache/lucene/pull/12152#issuecomment-1433673847 I see that the maxScore was fixed within: https://github.com/apache/lucene/pull/12146 Will revert that part and simply add the tests && minor optimizations :)

[GitHub] [lucene] zhaih commented on a diff in pull request #12152: Minor vector search matching doc optimizations

2023-02-16 Thread via GitHub
zhaih commented on code in PR #12152: URL: https://github.com/apache/lucene/pull/12152#discussion_r1109161735 ## lucene/core/src/test/org/apache/lucene/search/TestDocAndScoreQuery.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mor