[GitHub] [lucene] uschindler commented on pull request #12376: Allow VectorUtilProvider tests to be executed although hardware may not fully support vectorization or if C2 is not enabled

2023-07-10 Thread via GitHub
uschindler commented on PR #12376: URL: https://github.com/apache/lucene/pull/12376#issuecomment-1628405409 You can't run Gradle with that version as it's incompatible. This has nothing to do with this PR, it is documented in Lucene's help: https://github.com/apache/lucene/blob/main/help/jv

[GitHub] [lucene] startjava opened a new issue, #12429: has integration spring-boot code demo ?

2023-07-10 Thread via GitHub
startjava opened a new issue, #12429: URL: https://github.com/apache/lucene/issues/12429 has integration spring-boot code demo ? Official documents has ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1628889057 I tested the code for vectors and scalars on wikimediumall. Comparing them to the baseline, it shows that neither of them performs better than the baseline. I found that the bottleneck i

[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1628890319 scalar. Columns: Task, QPS baseline, StdDev, QPS my_modified_version, StdDev, Pct diff, p-value. First row: AndHighLow 444.15

[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1628893134 vectorized. Columns: Task, QPS baseline, StdDev, QPS my_modified_version, StdDev, Pct diff, p-value. First row: Prefix3 195.7

[GitHub] [lucene] ChrisHegarty commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
ChrisHegarty commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1628899369 >@ChrisHegarty have you experimented at all with narrower vector lanes for lower bpv? @gsmiller It is not surprising that ByteVector is faster for narrow bit width scenarios

[GitHub] [lucene] ChrisHegarty commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
ChrisHegarty commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1628920916 > Comparing them to the baseline, it shows that neither of them performs better than the baseline. This is a little surprising, and also disappointing. I had assumed, without

[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1628938905 > you say it is in prefix sum. How are you determining that? When I noticed a significant decrease in performance even with vectorized code, I reexamined the benchmark. Surprisingl

[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1628942331 If there is a need for me to provide assistance in any subsequent performance testing, please feel free to let me know, such as the built index (building an index can be quite time-consu

[GitHub] [lucene] uschindler commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
uschindler commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1628999019 Hi, what I figured out with performance testing is the following: When you use the default settings it does not run enough queries and instead repeats the whole JVM startup

[GitHub] [lucene] ChrisHegarty commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
ChrisHegarty commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-162906 Part of the issue here is that we are (well, at least I have been), looking at bit packing and unpacking in isolation. We really need to consider the impact on the codec more holist

[GitHub] [lucene] mayya-sharipova commented on pull request #12421: Concurrent hnsw graph and builder, take two

2023-07-10 Thread via GitHub
mayya-sharipova commented on PR #12421: URL: https://github.com/apache/lucene/pull/12421#issuecomment-1629016815 Great work! Have you compared the recall of the parallel graph with the serially built graph (for example using ann-benchmarks)?

[GitHub] [lucene] nknize commented on pull request #12376: Allow VectorUtilProvider tests to be executed although hardware may not fully support vectorization or if C2 is not enabled

2023-07-10 Thread via GitHub
nknize commented on PR #12376: URL: https://github.com/apache/lucene/pull/12376#issuecomment-1629025414 > * pass a prop: `-Pruntime.java.home=...` That's what I missed. I was using `-Druntime.java=20` which does nothing. :) Thanks @uschindler!

[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1629033196 > It's silly to represent the decoded values as int[], and then coerce them into long[] for prefix sum - we should just vectorise prefix sum to work on int[] (in a reasonable way)

[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1629043326 @uschindler Understood, I will adjust the settings. But first, I need to go and find where the settings are located.😀

[GitHub] [lucene] stefanvodita commented on pull request #12409: Move sliced int buffer functionality to MemoryIndex (#11248)

2023-07-10 Thread via GitHub
stefanvodita commented on PR #12409: URL: https://github.com/apache/lucene/pull/12409#issuecomment-1629067744 @mikemccand, can you merge this change if you think it’s ready?

[GitHub] [lucene] mikemccand merged pull request #12409: Move sliced int buffer functionality to MemoryIndex (#11248)

2023-07-10 Thread via GitHub
mikemccand merged PR #12409: URL: https://github.com/apache/lucene/pull/12409

[GitHub] [lucene] mikemccand closed issue #11248: Move IntBlockPool's slice allocator and SliceReader/Writer out to MemoryIndex [LUCENE-10211]

2023-07-10 Thread via GitHub
mikemccand closed issue #11248: Move IntBlockPool's slice allocator and SliceReader/Writer out to MemoryIndex [LUCENE-10211] URL: https://github.com/apache/lucene/issues/11248

[GitHub] [lucene] mikemccand commented on pull request #12409: Move sliced int buffer functionality to MemoryIndex (#11248)

2023-07-10 Thread via GitHub
mikemccand commented on PR #12409: URL: https://github.com/apache/lucene/pull/12409#issuecomment-1629074013 I'll backport to 9.x -- this is an internal API so it's fine to change it.

[GitHub] [lucene] mikemccand commented on pull request #12409: Move sliced int buffer functionality to MemoryIndex (#11248)

2023-07-10 Thread via GitHub
mikemccand commented on PR #12409: URL: https://github.com/apache/lucene/pull/12409#issuecomment-1629089475 OK I merged to 9.x as well: https://github.com/apache/lucene/commit/79b9664224f37204504b26e3ed250af01a530f81 Thanks @stefanvodita!

[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1629213807 @ChrisHegarty @uschindler Hi! I have an idea. Since our encode and decode functions are fast, can we lazy compute prefix sums? We can first decompress them and only calculate the act
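
The lazy prefix-sum idea raised in this comment can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: `LazyPrefixSum`, its fields, and its methods are hypothetical names, not part of Lucene's `ForUtil` or of PR #12417, and the sketch ignores vectorization entirely. It keeps the decoded deltas of one block and materializes running sums only up to the highest index that has been requested so far.

```java
/** Hypothetical illustration of lazy delta decoding; not a Lucene API. */
final class LazyPrefixSum {
  private final long base;      // value preceding the block (assumed, e.g. last value of the previous block)
  private final int[] deltas;   // already-decoded deltas for one block
  private final long[] values;  // prefix sums materialized so far
  private int computed;         // number of leading entries of values that are valid

  LazyPrefixSum(long base, int[] deltas) {
    this.base = base;
    this.deltas = deltas;
    this.values = new long[deltas.length];
    this.computed = 0;
  }

  /** Returns base + deltas[0] + ... + deltas[index], summing only as far as needed. */
  long get(int index) {
    while (computed <= index) {
      long prev = (computed == 0) ? base : values[computed - 1];
      values[computed] = prev + deltas[computed];
      computed++;
    }
    return values[index];
  }

  /** Number of prefix sums materialized so far. */
  int computedCount() {
    return computed;
  }

  /** Eager variant for comparison: materializes the whole block at once. */
  static long[] eager(long base, int[] deltas) {
    long[] out = new long[deltas.length];
    long sum = base;
    for (int i = 0; i < deltas.length; i++) {
      sum += deltas[i];
      out[i] = sum;
    }
    return out;
  }
}
```

Whether such a lazy variant helps would depend on the access pattern: a query that consumes every value in a block still pays for the full sum, plus the bookkeeping.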

[GitHub] [lucene] HoustonPutman opened a new pull request, #12430: Enable search for site javadocs

2023-07-10 Thread via GitHub
HoustonPutman opened a new pull request, #12430: URL: https://github.com/apache/lucene/pull/12430 ### Description Currently the `noindex` flag is always being used when generating javadocs. For the site javadocs, we want to enable the search feature, which requires the index, so

[GitHub] [lucene] gsmiller commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
gsmiller commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1629353957 > Hi! I have an idea. Since our encode and decode functions are fast, can we lazy compute prefix sums? We can first decompress them and only calculate the actual values when the user ne

[GitHub] [lucene] madrob commented on pull request #12430: Enable search for site javadocs

2023-07-10 Thread via GitHub
madrob commented on PR #12430: URL: https://github.com/apache/lucene/pull/12430#issuecomment-1629373651 @rmuir FYI

[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1629390714 > Some queries may benefit a lot from having the delta-decoding done in bulk for an entire block at once, and some may benefit from a lazy solution? Currently, my idea is to decomp

[GitHub] [lucene] jmazanec15 commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-07-10 Thread via GitHub
jmazanec15 commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1629419274 @benwtrent I have been thinking about this and am still not completely sure of the implications. It seems like the construction of the graphs may rely on some assumption about th

[GitHub] [lucene] hossman opened a new issue, #12431: UnifiedHighlighter: DefaultPassageFormatter causes IndexOutOfBoundsException w/ setStoreTermVectorOffsets unless setStoreTermVectorPositions

2023-07-10 Thread via GitHub
hossman opened a new issue, #12431: URL: https://github.com/apache/lucene/issues/12431 ### Description Summary of mailing list thread... https://lists.apache.org/list.html?java-u...@lucene.apache.org * Using `UnifiedHighlighter` w/ `DefaultPassageFormatter` * Highlight

[GitHub] [lucene] uschindler commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
uschindler commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1629526829 > @uschindler Understood, I will adjust the settings. But first, I need to go and find where the settings are located.😀 From my experience with MMapDirectory, you should be fine

[GitHub] [lucene] jpountz commented on issue #12424: Add NO_COMPRESSION option to compression Mode

2023-07-10 Thread via GitHub
jpountz commented on issue #12424: URL: https://github.com/apache/lucene/issues/12424#issuecomment-1629631448 We probably won't add this option to the default codec as we'd like to keep the number of options limited, but we could add a `NoCompressionStoredFieldsFormat` to `lucene/codecs`.

[GitHub] [lucene] jpountz commented on pull request #12426: Introduce VerifyingQuery

2023-07-10 Thread via GitHub
jpountz commented on PR #12426: URL: https://github.com/apache/lucene/pull/12426#issuecomment-1629645539 I must say I'm surprised it's being considered for usage on Amazon Product Search: `IndexOrDocValuesQuery` should pick the best query, and this `VerifyingQuery` would run both queries, s

[GitHub] [lucene] mayya-sharipova commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-07-10 Thread via GitHub
mayya-sharipova commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1629661053 @rmuir > Can we run this test with lucene's defaults (e.g. not a 2GB rambuffer)? I've done the test and, surprisingly, indexing time decreased substantially. I

[GitHub] [lucene] dweiss commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-07-10 Thread via GitHub
dweiss commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1629666572 Leaving a higher number of segments dodges the merge costs, I think.

[GitHub] [lucene] gsmiller commented on pull request #12427: StringsToAutomaton#build to take List as parameter instead of Collection

2023-07-10 Thread via GitHub
gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1629757874 Thanks @shubhamvishu for working on this! As I look at the PR, I'm wondering if accepting a `List` is really the proper thing to do here. If users already have a sorted `Collection` (li

[GitHub] [lucene] jpountz commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-07-10 Thread via GitHub
jpountz commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1629787490 This benchmark really only measures the flushing cost, as `ConcurrentMergeScheduler` is used, so merges run in background threads. So the improvement makes sense to me as the cost o

[GitHub] [lucene] jpountz commented on issue #12394: Add the ability to compute vector similarity scores with the new ValuesSource API

2023-07-10 Thread via GitHub
jpountz commented on issue #12394: URL: https://github.com/apache/lucene/issues/12394#issuecomment-1629825363 They would be pkg-private in `org.apache.lucene.search` and exposed via factory methods on `DoubleValues`. > I was thinking if we could have a DVS where #getValues returns the

[GitHub] [lucene] jpountz commented on a diff in pull request #12183: Make some heavy query rewrites concurrent

2023-07-10 Thread via GitHub
jpountz commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1258982813 ## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ## @@ -211,4 +242,40 @@ public String toString() { return sb.toString(); } + + /** Wrapper

[GitHub] [lucene] jpountz commented on a diff in pull request #12345: LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches

2023-07-10 Thread via GitHub
jpountz commented on code in PR #12345: URL: https://github.com/apache/lucene/pull/12345#discussion_r1258987383 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -763,6 +763,11 @@ public Query rewrite(Query original) throws IOException { for (Query

[GitHub] [lucene] jpountz commented on pull request #12405: Skip docs with Docvalues in NumericLeafComparator

2023-07-10 Thread via GitHub
jpountz commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1629853306 I'm not clear if this change is still correct when there is another sort field after the one that gets optimized. It seems like it could skip hits that are still needed.

[GitHub] [lucene] gsmiller commented on pull request #12417: forutil add vectorized and scalar code

2023-07-10 Thread via GitHub
gsmiller commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1629974586 > @gsmiller It is not surprising that ByteVector is faster for narrow bit width scenarios. The issue is how to represent these different bit widths in a cohesive way, without coercing t

[GitHub] [lucene] jpountz closed issue #12297: Unnecessary float[](BM25Scorer) allocations for non-scoring queries

2023-07-10 Thread via GitHub
jpountz closed issue #12297: Unnecessary float[](BM25Scorer) allocations for non-scoring queries URL: https://github.com/apache/lucene/issues/12297

[GitHub] [lucene] jpountz commented on issue #12297: Unnecessary float[](BM25Scorer) allocations for non-scoring queries

2023-07-10 Thread via GitHub
jpountz commented on issue #12297: URL: https://github.com/apache/lucene/issues/12297#issuecomment-163669 This has been addressed via #12383

[GitHub] [lucene] startjava opened a new issue, #12432: repo.maven.apache.org no has lucene-analyzers-common 9.7.0 version

2023-07-10 Thread via GitHub
startjava opened a new issue, #12432: URL: https://github.com/apache/lucene/issues/12432 https://repo.maven.apache.org/maven2/org/apache/lucene/lucene-analyzers-common/

[GitHub] [lucene] mkhludnev closed issue #12432: repo.maven.apache.org no has lucene-analyzers-common 9.7.0 version

2023-07-10 Thread via GitHub
mkhludnev closed issue #12432: repo.maven.apache.org no has lucene-analyzers-common 9.7.0 version URL: https://github.com/apache/lucene/issues/12432

[GitHub] [lucene] mkhludnev commented on issue #12432: repo.maven.apache.org no has lucene-analyzers-common 9.7.0 version

2023-07-10 Thread via GitHub
mkhludnev commented on issue #12432: URL: https://github.com/apache/lucene/issues/12432#issuecomment-1630224797 Here we go https://repo.maven.apache.org/maven2/org/apache/lucene/lucene-analysis-common/9.7.0/ dunno why, really.

[GitHub] [lucene] dweiss commented on issue #12432: repo.maven.apache.org no has lucene-analyzers-common 9.7.0 version

2023-07-10 Thread via GitHub
dweiss commented on issue #12432: URL: https://github.com/apache/lucene/issues/12432#issuecomment-1630230349 The artifact was renamed for 9.x to reflect the directory structure in the repository.

[GitHub] [lucene] startjava commented on issue #12432: repo.maven.apache.org no has lucene-analyzers-common 9.7.0 version

2023-07-10 Thread via GitHub
startjava commented on issue #12432: URL: https://github.com/apache/lucene/issues/12432#issuecomment-1630232535 How do I find the 9.7.0 version of the lucene-analyzers-common jar? Is there a URL? Thank you! -- Original message -- From:

[GitHub] [lucene] dweiss commented on issue #12432: repo.maven.apache.org no has lucene-analyzers-common 9.7.0 version

2023-07-10 Thread via GitHub
dweiss commented on issue #12432: URL: https://github.com/apache/lucene/issues/12432#issuecomment-1630238255 There is no lucene-analyzers-common, it's been renamed to lucene-analysis-common, the JAR is here: https://repo.maven.apache.org/maven2/org/apache/lucene/lucene-analysis-common/9.

[GitHub] [lucene] shubhamvishu commented on pull request #12427: StringsToAutomaton#build to take List as parameter instead of Collection

2023-07-10 Thread via GitHub
shubhamvishu commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1630241517 @gsmiller I totally agree it's not helpful if the input/data `Collection` is sorted, since it's an unnecessary overhead to convert to a `List` here. As per `TermInSetQuery` ctor its che

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12183: Make some heavy query rewrites concurrent

2023-07-10 Thread via GitHub
shubhamvishu commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1259261938 ## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ## @@ -211,4 +242,40 @@ public String toString() { return sb.toString(); } + + /** Wra