Re: [PR] Random access term dictionary [lucene]

2023-10-26 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1373800366 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/radomaccess/TermsIndex.java: ## @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-26 Thread via GitHub
stefanvodita commented on PR #12506: URL: https://github.com/apache/lucene/pull/12506#issuecomment-1781893327 Thanks for the review @iverase! I’m putting together a new revision. Do you have a name suggestion for `ByteSlicePool`? I don’t really have a better idea. By putting it in a separat

Re: [I] xml.TestCoreParser#testSpanNearQueryWithoutSlopXML fails because of changed exception message [lucene]

2023-10-26 Thread via GitHub
uschindler commented on issue #12708: URL: https://github.com/apache/lucene/issues/12708#issuecomment-1781948929 Hi, JDK-22 EA was updated to build 21, which contains fix for https://bugs.openjdk.org/browse/JDK-8318646 -- This is an automated message from the Apache Git Service. To respon

Re: [I] xml.TestCoreParser#testSpanNearQueryWithoutSlopXML fails because of changed exception message [lucene]

2023-10-26 Thread via GitHub
uschindler commented on issue #12708: URL: https://github.com/apache/lucene/issues/12708#issuecomment-1781950343 I still think we should improve the test, as also stated by the OpenJDK people. The exception message is not part of the spec, so tests should not rely on it. This ma

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-10-26 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1373993909 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -287,9 +315,9 @@ public long getMappedStateCount() { return dedupHash == null ? 0 : no

Re: [PR] Optimize: Use DocIdSetIterator Reduce bkd docvalues iteration [lucene]

2023-10-26 Thread via GitHub
gf2121 commented on code in PR #12723: URL: https://github.com/apache/lucene/pull/12723#discussion_r1374099709 ## lucene/core/src/java/org/apache/lucene/util/DocBaseBitSetIterator.java: ## @@ -69,6 +69,9 @@ public int getDocBase() { @Override public int nextDoc() { +

Re: [PR] Optimize: Use DocIdSetIterator Reduce bkd docvalues iteration [lucene]

2023-10-26 Thread via GitHub
luyuncheng commented on code in PR #12723: URL: https://github.com/apache/lucene/pull/12723#discussion_r1374132105 ## lucene/core/src/java/org/apache/lucene/util/DocBaseBitSetIterator.java: ## @@ -69,6 +69,9 @@ public int getDocBase() { @Override public int nextDoc() { +

[PR] Fix test failures for TestCoreParser#testSpanNearQueryWithoutSlopXML [lucene]

2023-10-27 Thread via GitHub
vigyasharma opened a new pull request, #12724: URL: https://github.com/apache/lucene/pull/12724 Addresses #12708 `xml.TestCoreParser#testSpanNearQueryWithoutSlopXML` fails because of changed exception message Java 22 EA. This change removes the test's dependency on Java exception me

Re: [I] xml.TestCoreParser#testSpanNearQueryWithoutSlopXML fails because of changed exception message [lucene]

2023-10-27 Thread via GitHub
vigyasharma commented on issue #12708: URL: https://github.com/apache/lucene/issues/12708#issuecomment-1782426261 Made a small change to assert on the exception type instead of checking the exception message string.
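
The fix described above can be illustrated with a minimal sketch. It is not the code from the PR: `expectThrows` here is a hand-written stand-in for a test-framework helper, and `parseSlop` is a hypothetical parser step invented for the example. The point is only that the test checks the exception class and never inspects `getMessage()`, so a JDK that rewords the message cannot break it.

```java
/** Sketch: assert on the exception type, not on its message text. */
final class ExceptionTypeAssertion {

  /** Hypothetical stand-in for a test-framework callback type. */
  interface ThrowingRunnable {
    void run() throws Throwable;
  }

  /** Runs the body and returns the thrown exception if it has the expected type. */
  static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      if (expected.isInstance(t)) {
        return expected.cast(t);
      }
      throw new AssertionError("Unexpected exception type: " + t.getClass().getName(), t);
    }
    throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
  }

  /** Hypothetical parser step: fails with NumberFormatException on null or non-numeric input. */
  static int parseSlop(String value) {
    return Integer.parseInt(value);
  }

  public static void main(String[] args) {
    // Only the type is checked; the message wording is free to change between JDK builds.
    NumberFormatException e = expectThrows(NumberFormatException.class, () -> parseSlop(null));
    System.out.println("Caught expected type: " + e.getClass().getSimpleName());
  }
}
```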

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2023-10-27 Thread via GitHub
dungba88 commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-1782469977 I put a new revision with support for DataOutput and FileChannel. When using DataOutput, if suffix sharing is enabled one also needs to pass a RandomAccessInput for reading.

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-10-27 Thread via GitHub
bruno-roustant commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1782623703 Thank you @shubhamvishu for these experiments. The table answers the questions exactly. It means there is actually no point in changing the bit mixing, since it does not b

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-10-27 Thread via GitHub
shubhamvishu commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1782810001 @bruno-roustant Absolutely! I totally agree with your point.

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-27 Thread via GitHub
jpountz commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1782814872 FWIW I could reproduce the speedup from disabling patching locally on wikibigall: ``` TaskQPS baseline StdDevQPS my_modified_version

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-10-27 Thread via GitHub
shubhamvishu commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1782977506 @gsmiller I finally got a chance to run the benchmarks for this change. Below are the results (all looks good to me). Let me know what you think. Thanks! ```

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-27 Thread via GitHub
msokolov commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1782983831 as for the renaming maybe we can look to do it as a followup PR? IHnswGraphSearcher -> HnswGraphSearcher? Or maybe we can just use HnswSearcher right now? Re: controlling concurr

Re: [PR] Optimize: Use DocIdSetIterator Reduce bkd docvalues iteration [lucene]

2023-10-27 Thread via GitHub
msokolov commented on code in PR #12723: URL: https://github.com/apache/lucene/pull/12723#discussion_r1374646933 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDReader.java: ## @@ -216,7 +216,7 @@ private static class BKDPointTree implements PointTree { scratchMin

Re: [I] Use max BPV encoding in postings if doc buffer size less than ForUtil.BLOCK_SIZE [lucene]

2023-10-27 Thread via GitHub
jpountz commented on issue #12717: URL: https://github.com/apache/lucene/issues/12717#issuecomment-1783020278 Does this actually matter for performance? My gut feeling is that either a value has a long postings list, and then the vast majority of blocks will be encoded with PFOR and should

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-27 Thread via GitHub
ChrisHegarty commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1783135109 I've not been able to spend all that much time on this this week, but here's my current thinking. The abstractions in the PR are currently not great (as discussed above), but

Re: [I] Use max BPV encoding in postings if doc buffer size less than ForUtil.BLOCK_SIZE [lucene]

2023-10-27 Thread via GitHub
easyice commented on issue #12717: URL: https://github.com/apache/lucene/issues/12717#issuecomment-1783202938 @jpountz Thanks for your explanation. I got some flame graphs showing that `readVIntBlock` takes up a fairly large proportion; I'll try to reproduce it with some mocked data.

Re: [PR] Optimize: Use DocIdSetIterator Reduce bkd docvalues iteration [lucene]

2023-10-27 Thread via GitHub
luyuncheng commented on PR #12723: URL: https://github.com/apache/lucene/pull/12723#issuecomment-1783206971 > this seems promising. I guess the only cost is the increased memory needed because we create the FixedBitSet? Can you say how large this might get in the worst case? @msokolo

Re: [PR] Optimize: Use DocIdSetIterator Reduce bkd docvalues iteration [lucene]

2023-10-27 Thread via GitHub
luyuncheng commented on code in PR #12723: URL: https://github.com/apache/lucene/pull/12723#discussion_r1374824046 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDReader.java: ## @@ -216,7 +216,7 @@ private static class BKDPointTree implements PointTree { scratchM

Re: [PR] Fix test failures for TestCoreParser#testSpanNearQueryWithoutSlopXML [lucene]

2023-10-27 Thread via GitHub
uschindler commented on PR #12724: URL: https://github.com/apache/lucene/pull/12724#issuecomment-1783287980 Should I merge and backport it or do you want to do it?

[PR] Return the same input vector if its a unit vector in VectorUtil#l2normalize [lucene]

2023-10-27 Thread via GitHub
shubhamvishu opened a new pull request, #12726: URL: https://github.com/apache/lucene/pull/12726 ### Description While going through [VectorUtil](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/VectorUtil.java) class, I observed we don't have a
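
A minimal sketch of the idea in the PR title, written as a standalone class and not as the actual `VectorUtil` code: if the squared norm is already within a small tolerance of 1, the vector is returned untouched and the square root and per-element divisions are skipped. The class name and the `EPSILON` value are assumptions made for illustration.

```java
/** Sketch: skip normalization work when the input is already a unit vector. */
final class UnitVectorShortcut {

  // Assumed tolerance for "already unit length"; the value is illustrative only.
  static final double EPSILON = 1e-4;

  /** Normalizes v in place to unit L2 length and returns it. */
  static float[] l2normalize(float[] v) {
    double squaredNorm = 0;
    for (float x : v) {
      squaredNorm += (double) x * x;
    }
    if (squaredNorm == 0) {
      throw new IllegalArgumentException("Cannot normalize a zero-length vector");
    }
    if (Math.abs(squaredNorm - 1.0) <= EPSILON) {
      return v; // already a unit vector: no sqrt, no divisions
    }
    double norm = Math.sqrt(squaredNorm);
    for (int i = 0; i < v.length; i++) {
      v[i] = (float) (v[i] / norm);
    }
    return v;
  }
}
```

With this sketch, `l2normalize(new float[] {3f, 4f})` rescales the array to roughly `{0.6, 0.8}`, while an input that is already unit length comes back with its elements unchanged.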

Re: [PR] Return the same input vector if its a unit vector in VectorUtil#l2normalize [lucene]

2023-10-27 Thread via GitHub
benwtrent commented on PR #12726: URL: https://github.com/apache/lucene/pull/12726#issuecomment-1783318072 @shubhamvishu could you add a "CHANGES.txt" entry under optimizations?

[PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-27 Thread via GitHub
benwtrent opened a new pull request, #12727: URL: https://github.com/apache/lucene/pull/12727 We shouldn't ever return negative scores from vector similarity functions. Given vector panama and nearly antipodal float[] vectors, it is possible that cosine and (normalized) dot-product become s
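
To make the failure mode concrete, here is a small sketch assuming the common mapping of a cosine in [-1, 1] to a score via `(1 + cosine) / 2`. Floating-point error can push the computed cosine of nearly antipodal vectors slightly below -1, which makes that expression negative; clamping with `Math.max` keeps the score at or above zero. The class and method names are hypothetical, and this is not the code from the PR.

```java
/** Sketch: clamp a similarity-derived score so rounding error cannot make it negative. */
final class NonNegativeScore {

  /** Maps a raw cosine value to a score, clamped at zero. */
  static float cosineToScore(float rawCosine) {
    return Math.max((1 + rawCosine) / 2, 0f);
  }

  public static void main(String[] args) {
    // A raw cosine marginally below -1, as rounding error might produce.
    float slightlyBelowMinusOne = -1.0000001f;
    System.out.println(cosineToScore(slightlyBelowMinusOne));
  }
}
```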

[PR] Add back maxConn & beamWidth HNSW codec ctor [lucene]

2023-10-27 Thread via GitHub
benwtrent opened a new pull request, #12728: URL: https://github.com/apache/lucene/pull/12728 follow up to https://github.com/apache/lucene/pull/12582 For user convenience, I added back the two parameter ctor for the HNSW codec.

Re: [PR] Return the same input vector if its a unit vector in VectorUtil#l2normalize [lucene]

2023-10-27 Thread via GitHub
shubhamvishu commented on PR #12726: URL: https://github.com/apache/lucene/pull/12726#issuecomment-1783364698 Oh nice! @benwtrent I have added a CHANGES.txt entry under optimizations now. Thanks!

[PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-27 Thread via GitHub
benwtrent opened a new pull request, #12729: URL: https://github.com/apache/lucene/pull/12729 Currently the HNSW codec does too many things, it not only indexes vectors, but stores them and determines how to store them given the vector type. This PR extracts out the vector storage int

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-27 Thread via GitHub
stefanvodita commented on code in PR #12506: URL: https://github.com/apache/lucene/pull/12506#discussion_r1375044479 ## lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java: ## @@ -170,42 +191,42 @@ public void reset(boolean zeroFillBuffers, boolean reuseFirst) { }

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-27 Thread via GitHub
stefanvodita commented on PR #12506: URL: https://github.com/apache/lucene/pull/12506#issuecomment-1783530943 I’ve integrated most of the suggestions. There’s just the matter of the name for the slice pool class and the right package for it. I don’t have a strong opinion on this. Maybe we m

Re: [PR] Fix test failures for TestCoreParser#testSpanNearQueryWithoutSlopXML [lucene]

2023-10-27 Thread via GitHub
vigyasharma merged PR #12724: URL: https://github.com/apache/lucene/pull/12724

[PR] Remove test dependency on Java default exception message [lucene]

2023-10-27 Thread via GitHub
vigyasharma opened a new pull request, #12730: URL: https://github.com/apache/lucene/pull/12730 Backport of fix in #12724

Re: [PR] Remove test dependency on Java default exception message [lucene]

2023-10-27 Thread via GitHub
vigyasharma commented on PR #12730: URL: https://github.com/apache/lucene/pull/12730#issuecomment-1783652392 Backport change already approved in https://github.com/apache/lucene/pull/12724. Merging.

Re: [PR] Remove test dependency on Java default exception message [lucene]

2023-10-27 Thread via GitHub
vigyasharma merged PR #12730: URL: https://github.com/apache/lucene/pull/12730

Re: [PR] Fix test failures for TestCoreParser#testSpanNearQueryWithoutSlopXML [lucene]

2023-10-27 Thread via GitHub
vigyasharma commented on PR #12724: URL: https://github.com/apache/lucene/pull/12724#issuecomment-1783653077 I've backported this to `branch_9x`. Do we need it in any other branches?

[PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-27 Thread via GitHub
rmuir opened a new pull request, #12731: URL: https://github.com/apache/lucene/pull/12731 The Intel FMA is nice, and it's easier to reason about when looking at the assembly. We basically reduce the error for free where it's available. Along with another change (reducing the unrolling for cosine,
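
As a scalar illustration of what fused multiply-add buys, the sketch below accumulates a dot product with `Math.fma` (available since Java 9), which rounds once per step where a separate multiply and add would round twice. It is not the Panama vector code from the PR, and it omits the PR's "where fast and available" check: on hardware without an FMA instruction, `Math.fma` can be much slower than a plain multiply-add. The class and method names are made up for the example.

```java
/** Sketch: dot product accumulated with fused multiply-add versus separate multiply and add. */
final class FmaDotProduct {

  /** One rounding per step: acc = round(a[i] * b[i] + acc). */
  static float dotProductFma(float[] a, float[] b) {
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc = Math.fma(a[i], b[i], acc);
    }
    return acc;
  }

  /** Two roundings per step: the product is rounded, then the sum is rounded. */
  static float dotProductPlain(float[] a, float[] b) {
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc += a[i] * b[i];
    }
    return acc;
  }
}
```

For small integer-valued inputs both methods agree exactly; the difference shows up only when intermediate products are not exactly representable as floats.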

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-27 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1783709669 Thanks Mike and Ben for reviewing! I'll try to merge and backport it tomorrow, and create an issue for future refactoring.

Re: [PR] Fix test failures for TestCoreParser#testSpanNearQueryWithoutSlopXML [lucene]

2023-10-28 Thread via GitHub
uschindler commented on PR #12724: URL: https://github.com/apache/lucene/pull/12724#issuecomment-1783746086 Hi. No more branches needed. The older branches do not need it, as the issue in the JDK was already fixed, so the error no longer happens (it was only in build 20 of JDK 22). P.S.: y

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-28 Thread via GitHub
uschindler commented on code in PR #12731: URL: https://github.com/apache/lucene/pull/12731#discussion_r1375243087 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implement

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-28 Thread via GitHub
rmuir commented on code in PR #12731: URL: https://github.com/apache/lucene/pull/12731#discussion_r1375244023 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implements Ve

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-28 Thread via GitHub
rmuir commented on code in PR #12731: URL: https://github.com/apache/lucene/pull/12731#discussion_r1375248973 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implements Ve

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-28 Thread via GitHub
uschindler commented on code in PR #12731: URL: https://github.com/apache/lucene/pull/12731#discussion_r1375252807 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implement

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-28 Thread via GitHub
ChrisHegarty commented on code in PR #12731: URL: https://github.com/apache/lucene/pull/12731#discussion_r1375278303 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport impleme

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-28 Thread via GitHub
ChrisHegarty commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1783869078 ha! So just removing the overly aggressive unrolling in cosine improves things. The check on FMA is nice - I had similar thoughts ( you just beat me to it! ), and it inlines nicel

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-28 Thread via GitHub
ChrisHegarty commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1783869625 .. and yes (I've not forgotten), we need something like a `java.lang.Architecture/Platform`, that is queryable for such low-level support (rather than resorting to beans - which act

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-28 Thread via GitHub
zhaih merged PR #12660: URL: https://github.com/apache/lucene/pull/12660

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-28 Thread via GitHub
ChrisHegarty commented on PR #12727: URL: https://github.com/apache/lucene/pull/12727#issuecomment-1783913908 Hi @benwtrent I think that this is fine - LGTM, just dropping a few small comments / questions. I grabbed and modified your test, and was able to repo this on both my Linu

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-28 Thread via GitHub
ChrisHegarty commented on code in PR #12727: URL: https://github.com/apache/lucene/pull/12727#discussion_r1375316778 ## lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java: ## @@ -70,7 +74,11 @@ public float compare(byte[] v1, byte[] v2) { COSINE {

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-28 Thread via GitHub
ChrisHegarty commented on code in PR #12727: URL: https://github.com/apache/lucene/pull/12727#discussion_r1375318432 ## lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java: ## @@ -70,7 +74,11 @@ public float compare(byte[] v1, byte[] v2) { COSINE {

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-28 Thread via GitHub
ChrisHegarty commented on code in PR #12694: URL: https://github.com/apache/lucene/pull/12694#discussion_r1375318798 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -352,6 +382,11 @@ private int dotProductBody512(byte[] a, b

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-28 Thread via GitHub
uschindler commented on code in PR #12731: URL: https://github.com/apache/lucene/pull/12731#discussion_r1375324223 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -77,6 +77,47 @@ final class PanamaVectorUtilSupport implement

Re: [I] Refactor HNSW graph build such that concurrent build won't impact single thread build [lucene]

2023-10-28 Thread via GitHub
vigyasharma commented on issue #12732: URL: https://github.com/apache/lucene/issues/12732#issuecomment-1783928942 I'm trying to familiarize myself more with Lucene's HNSW implementation, and would like to help with this task.

Re: [PR] LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches [lucene]

2023-10-29 Thread via GitHub
Deepika0510 commented on PR #12345: URL: https://github.com/apache/lucene/pull/12345#issuecomment-1784082772 However, to get that `TimeoutLeafReader` in use, we would need to go through the `ReaderContext` class route(?). In a way we would need some mechanism in `ReaderClass` to know if tim

[PR] Use growNoCopy for SortingStoredFieldsConsumer#NO_COMPRESSION [lucene]

2023-10-29 Thread via GitHub
gf2121 opened a new pull request, #12733: URL: https://github.com/apache/lucene/pull/12733 (no comment)

Re: [PR] LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches [lucene]

2023-10-29 Thread via GitHub
mikemccand commented on PR #12345: URL: https://github.com/apache/lucene/pull/12345#issuecomment-1784151630 Hmm I'm confused: why would you need to get to the `TimeoutLeafReader`? Don't you create this timeout reader, passing the timeout to it (which will apply to all queries) and then you

Re: [I] Always collect sparsely in TaxonomyFacets & switch to dense if there are enough unique labels [lucene]

2023-10-29 Thread via GitHub
mikemccand commented on issue #12576: URL: https://github.com/apache/lucene/issues/12576#issuecomment-1784152872 Could we make the collection dynamic? Collect into a sparse structure at first, and if it gets too big, switch to dense.
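
The suggestion above could look roughly like the following sketch. It is not taken from `TaxonomyFacets`: the class name, the `HashMap`-backed sparse structure, and the `denseThreshold` parameter are all assumptions for illustration. Counts start in a map keyed by ordinal and are copied once into an `int[]` when the number of distinct ordinals exceeds the threshold.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: count per-ordinal hits sparsely at first, switching to a dense array when large. */
final class AdaptiveCounts {
  private final int maxOrd;          // exclusive upper bound on ordinals
  private final int denseThreshold;  // hypothetical cutoff on distinct ordinals
  private Map<Integer, Integer> sparse = new HashMap<>();
  private int[] dense;               // null until the switch happens

  AdaptiveCounts(int maxOrd, int denseThreshold) {
    this.maxOrd = maxOrd;
    this.denseThreshold = denseThreshold;
  }

  void increment(int ord) {
    if (dense != null) {
      dense[ord]++;
      return;
    }
    sparse.merge(ord, 1, Integer::sum);
    if (sparse.size() > denseThreshold) {
      switchToDense();
    }
  }

  /** One-time copy of the sparse counts into a dense array. */
  private void switchToDense() {
    dense = new int[maxOrd];
    for (Map.Entry<Integer, Integer> e : sparse.entrySet()) {
      dense[e.getKey()] = e.getValue();
    }
    sparse = null;
  }

  int get(int ord) {
    return dense != null ? dense[ord] : sparse.getOrDefault(ord, 0);
  }

  boolean isDense() {
    return dense != null;
  }
}
```

The design trade-off is the usual one: the sparse map costs more per entry but only pays for ordinals actually seen, while the dense array costs `maxOrd` integers up front but makes each increment a plain array write.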

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-29 Thread via GitHub
mikemccand commented on PR #12506: URL: https://github.com/apache/lucene/pull/12506#issuecomment-1784153400 +1 to move to `oal.index`, and make it package private if possible? `ByteSlicePool` name sounds good to me :) Naming is the hardest part!

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-29 Thread via GitHub
mikemccand commented on code in PR #12506: URL: https://github.com/apache/lucene/pull/12506#discussion_r1375466035 ## lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java: ## @@ -129,21 +143,22 @@ public ByteBlockPool(Allocator allocator) { } /** - * Resets t

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-10-29 Thread via GitHub
mikemccand commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1784156071 > Below are the results (all looks good to me). Let me know what you think. Thanks! +1 -- looks like just noise to me.

[I] Should resetting a ByteBlockPool zero out the buffers? [lucene]

2023-10-29 Thread via GitHub
stefanvodita opened a new issue, #12734: URL: https://github.com/apache/lucene/issues/12734 ### Description `ByteBlockPool.reset` can fill the buffers we're recycling with zeros. 1. Do we need the buffers to be filled with zeros? Is there some implicit assumption if we were to reus

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-29 Thread via GitHub
stefanvodita commented on code in PR #12506: URL: https://github.com/apache/lucene/pull/12506#discussion_r1375520679 ## lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java: ## @@ -129,21 +143,22 @@ public ByteBlockPool(Allocator allocator) { } /** - * Resets

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-29 Thread via GitHub
stefanvodita commented on PR #12506: URL: https://github.com/apache/lucene/pull/12506#issuecomment-1784239004 Thanks @mikemccand! I just pushed a commit that does that move.

Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-29 Thread via GitHub
dungba88 commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1784418885 If we are to move to value-based LRU cache and no longer fall back to reading FST when items are not in the map, I'm wondering why wouldn't we just use LinkedHashMap (or any doubly

[PR] Clean up inputCount [lucene]

2023-10-30 Thread via GitHub
dungba88 opened a new pull request, #12735: URL: https://github.com/apache/lucene/pull/12735 ### Description Clean-up inputCount as it no longer has an active use

Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-30 Thread via GitHub
dweiss commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1784860286 I'd check whether there's much gain from the switch first. Fill-up-then-discard caches often perform quite well and allow for much easier/faster implementation (both in terms of GC o

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12727: URL: https://github.com/apache/lucene/pull/12727#issuecomment-1785060096 @ChrisHegarty added a test for verifying VectorSimilarityFunction returns scores `>= 0`.

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785145823 > ha! So just removing the overly aggressive unrolling in cosine improves things. Well, only in combination with the switch to FMA. It seems it is then able to keep the CPU busy multiplying.

Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-30 Thread via GitHub
dungba88 commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1785150257 I ran a small test to see what RAM would be needed for some sample dictionary using a simple `LinkedHashMap`: 6MB Cache size 62457 items 977KB FST size The repor
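
For readers unfamiliar with the `LinkedHashMap` approach mentioned here, a bounded LRU cache can be sketched in a few lines using its access-order mode and the `removeEldestEntry` hook. This is a generic illustration, not the test code behind the numbers above; the class name and the `maxEntries` parameter are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: a size-bounded LRU cache built on LinkedHashMap's access-order mode. */
final class LruNodeCache<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;
  private final int maxEntries; // hypothetical capacity limit

  LruNodeCache(int maxEntries) {
    super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called after each insertion; returning true evicts the least recently used entry.
    return size() > maxEntries;
  }
}
```

With a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, since `a` was touched more recently. Each entry carries the map node plus two link references, which is one reason the RAM overhead of this approach is worth measuring.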

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-30 Thread via GitHub
jpountz commented on PR #12727: URL: https://github.com/apache/lucene/pull/12727#issuecomment-1785162374 I had a suspicion that the double promotion is not buying us anything in that case, so I ran a quick test that seems to confirm it: ```java long equals = 0; long notEquals =

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785163931 > .. and yes (I've not forgotten), we need something like a `java.lang.Architecture/Platform`, that is queryable for such low-level support (rather than resorting to beans - which actually

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785178856 Last time i tried to figure out WTF was happening here, I think i determined that floating point reproducibility was still preventing this from happening? That there isn't like a "bail out

Re: [PR] Add back maxConn & beamWidth HNSW codec ctor [lucene]

2023-10-30 Thread via GitHub
benwtrent merged PR #12728: URL: https://github.com/apache/lucene/pull/12728

Re: [PR] Return the same input vector if its a unit vector in VectorUtil#l2normalize [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12726: URL: https://github.com/apache/lucene/pull/12726#issuecomment-1785207766 @shubhamvishu I will merge and backport today

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-30 Thread via GitHub
jpountz commented on code in PR #12727: URL: https://github.com/apache/lucene/pull/12727#discussion_r1376230730 ## lucene/core/src/test/org/apache/lucene/util/TestVectorUtil.java: ## @@ -115,6 +116,21 @@ public void testNormalizeZeroThrows() { expectThrows(IllegalArgumentEx

Re: [PR] Return the same input vector if its a unit vector in VectorUtil#l2normalize [lucene]

2023-10-30 Thread via GitHub
benwtrent merged PR #12726: URL: https://github.com/apache/lucene/pull/12726

Re: [PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-30 Thread via GitHub
benwtrent merged PR #12727: URL: https://github.com/apache/lucene/pull/12727

Re: [I] Should we handle negative scores due to floating point arithmetic errors? [lucene]

2023-10-30 Thread via GitHub
benwtrent closed issue #12700: Should we handle negative scores due to floating point arithmetic errors? URL: https://github.com/apache/lucene/issues/12700

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
jpountz commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1376303898 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785376897 @jpountz the goal of this change is not just making code reusable. But: - Allowing folks who don't want HNSW to take advantage of the per-segment quantization and logic. Paging

Re: [PR] Add a specialized bulk scorer for regular conjunctions. [lucene]

2023-10-30 Thread via GitHub
jpountz merged PR #12719: URL: https://github.com/apache/lucene/pull/12719

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
uschindler commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785453474 > Last time i tried to figure out WTF was happening here, I think i determined that floating point reproducibility was still preventing this from happening? That there isn't like a "b

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12731: URL: https://github.com/apache/lucene/pull/12731#issuecomment-1785549757 > I think the Panama API should allow the user to figure out how many parallel units are available to somehow dynamically split work correctly. I'm not even sure openjdk/hotspot know

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785563758 OK, @jpountz thinking about it more. To do what you are suggesting, I think the following would work: - Force Lucene99HnswVectorsReader & Lucene99HnswVectorsWriter to take a `F

Re: [PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-30 Thread via GitHub
asfgit merged PR #12731: URL: https://github.com/apache/lucene/pull/12731

Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-30 Thread via GitHub
mikemccand commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1785695227 > Perhaps instead of UnCompiledNode, we could encode it as byte-array (could take the same format as the FST-encoded binary, but the FST operation works on absolute address value

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
jpountz commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785712360 Thanks, splitting the way you describe would make me happy. I had not understood that the flat codec was a goal. Now that I think more about it, I wonder if we should better separa

Re: [PR] Upgrade dependencies to address more CVEs [lucene-solr]

2023-10-30 Thread via GitHub
risdenk merged PR #2681: URL: https://github.com/apache/lucene-solr/pull/2681

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-10-30 Thread via GitHub
gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1785924607 +1 looks good to me as well. I like that this small change, 1) makes the API a little more general, allowing users to provide any Iterable instead of Collection, and 2) adds an explicit

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-10-30 Thread via GitHub
gsmiller merged PR #12427: URL: https://github.com/apache/lucene/pull/12427

Re: [I] DaciukMihovAutomatonBuilder#build should probably take a List instead of a Collection [lucene]

2023-10-30 Thread via GitHub
gsmiller closed issue #12319: DaciukMihovAutomatonBuilder#build should probably take a List instead of a Collection URL: https://github.com/apache/lucene/issues/12319

Re: [PR] fix explicit type declaration [lucene-solr]

2023-10-30 Thread via GitHub
nvnmandadhi closed pull request #399: fix explicit type declaration URL: https://github.com/apache/lucene-solr/pull/399

[PR] Fix NullPointerException in Monitor.getQuery when query is not present [lucene]

2023-10-30 Thread via GitHub
daviscook477 opened a new pull request, #12736: URL: https://github.com/apache/lucene/pull/12736 ### Description The [javadoc for Monitor.getQuery](https://github.com/apache/lucene/blob/a0887c7d26df6c9f32afcf8e9f0ff66275115f92/lucene/monitor/src/java/org/apache/lucene/monitor/Monitor.java

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1786059581 @jpountz updated. Flat is no longer pluggable, two HNSW formats are exposed.

Re: [PR] Fix NullPointerException in Monitor.getQuery when query is not present [lucene]

2023-10-30 Thread via GitHub
romseygeek commented on PR #12736: URL: https://github.com/apache/lucene/pull/12736#issuecomment-1786088337 This looks great, thank you @daviscook477! Would you be able to add an entry to CHANGES.txt under the 9.9.0 release?

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786392160 with all the data dependencies removed, i also gave at least one stab trying to see if i could trick the compiler into using packed instructions instead of single floats... would be awesom

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-30 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1786395055 e.g. for dotproduct case, with this patch, despite there being no data dependencies, compiler literally does 4 `VFMADD*SS` in the loop with different xmm registers. Instead of just doing 1

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-10-30 Thread via GitHub
rmuir commented on code in PR #12737: URL: https://github.com/apache/lucene/pull/12737#discussion_r1377027128 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultVectorUtilSupport.java: ## @@ -17,72 +17,46 @@ package org.apache.lucene.internal.vectorizatio
