[PR] Fix test failures for TestCoreParser#testSpanNearQueryWithoutSlopXML [lucene]

2023-10-27 Thread via GitHub
vigyasharma opened a new pull request, #12724: URL: https://github.com/apache/lucene/pull/12724 Addresses #12708 `xml.TestCoreParser#testSpanNearQueryWithoutSlopXML` fails because of changed exception message in Java 22 EA. This change removes the test's dependency on Java exception me

Re: [I] xml.TestCoreParser#testSpanNearQueryWithoutSlopXML fails because of changed exception message [lucene]

2023-10-27 Thread via GitHub
vigyasharma commented on issue #12708: URL: https://github.com/apache/lucene/issues/12708#issuecomment-1782426261 Made a small change to assert on the exception type instead of checking the exception message string. -- This is an automated message from the Apache Git Service. To respond
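The idea in this comment, asserting on the exception type rather than on its message text, can be sketched roughly as follows. This is only an illustration and not the actual test change: `expectThrows` here is a small hand-written helper, and `parseSlop` is a hypothetical stand-in for whatever code the real test exercises.

```java
// Illustrative sketch only; class and method names are hypothetical.
final class ExceptionTypeAssertion {

  /** Runs the action and returns the thrown exception if it has the expected type. */
  static <T extends Throwable> T expectThrows(Class<T> expected, Runnable action) {
    try {
      action.run();
    } catch (Throwable t) {
      if (expected.isInstance(t)) {
        return expected.cast(t);
      }
      throw new AssertionError("Unexpected exception type: " + t.getClass().getName(), t);
    }
    throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
  }

  /** Hypothetical parser step: fails with NumberFormatException on an empty attribute. */
  static int parseSlop(String rawAttribute) {
    return Integer.parseInt(rawAttribute);
  }

  public static void main(String[] args) {
    // The check depends only on the exception type, so it keeps passing
    // even if a newer JDK rewords the exception message.
    NumberFormatException e =
        expectThrows(NumberFormatException.class, () -> parseSlop(""));
    System.out.println("Caught: " + e.getClass().getSimpleName());
  }
}
```

A check written this way does not break when the JDK changes message wording, at the cost of no longer verifying which input triggered the failure.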

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2023-10-27 Thread via GitHub
dungba88 commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-1782469977 I put a new revision with support for DataOutput and FileChannel. When using DataOutput, if suffix sharing is enabled one also needs to pass a RandomAccessInput for reading.

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-10-27 Thread via GitHub
bruno-roustant commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1782623703 Thank you @shubhamvishu for these experiments. The table answers the questions exactly. It means there is actually no point in changing the bit mixing, since it does not b

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-10-27 Thread via GitHub
shubhamvishu commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1782810001 @bruno-roustant Absolutely! I totally agree with your point.

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-27 Thread via GitHub
jpountz commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1782814872 FWIW I could reproduce the speedup from disabling patching locally on wikibigall: ``` TaskQPS baseline StdDevQPS my_modified_version

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-10-27 Thread via GitHub
shubhamvishu commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1782977506 @gsmiller I finally got a chance to run the benchmarks for this change. Below are the results (all looks good to me). Let me know what you think. Thanks! ```

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-27 Thread via GitHub
msokolov commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1782983831 as for the renaming maybe we can look to do it as a followup PR? IHnswGraphSearcher -> HnswGraphSearcher? Or maybe we can just use HnswSearcher right now? Re: controlling concurr

Re: [PR] Optimize: Use DocIdSetIterator Reduce bkd docvalues iteration [lucene]

2023-10-27 Thread via GitHub
msokolov commented on code in PR #12723: URL: https://github.com/apache/lucene/pull/12723#discussion_r1374646933 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDReader.java: ## @@ -216,7 +216,7 @@ private static class BKDPointTree implements PointTree { scratchMin

Re: [I] Use max BPV encoding in postings if doc buffer size less than ForUtil.BLOCK_SIZE [lucene]

2023-10-27 Thread via GitHub
jpountz commented on issue #12717: URL: https://github.com/apache/lucene/issues/12717#issuecomment-1783020278 Does this actually matter for performance? My gut feeling is that either a value has a long postings list, and then the vast majority of blocks will be encoded with PFOR and should

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-27 Thread via GitHub
ChrisHegarty commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1783135109 I've not been able to spend all that much time on this this week, but here's my current thinking. The abstractions in the PR are currently not great (as discussed above), but

Re: [I] Use max BPV encoding in postings if doc buffer size less than ForUtil.BLOCK_SIZE [lucene]

2023-10-27 Thread via GitHub
easyice commented on issue #12717: URL: https://github.com/apache/lucene/issues/12717#issuecomment-1783202938 @jpountz Thanks for your explanation. I have some flame graphs showing that `readVIntBlock` takes up a fairly large proportion. I'll try to reproduce it with some mocked data.

Re: [PR] Optimize: Use DocIdSetIterator Reduce bkd docvalues iteration [lucene]

2023-10-27 Thread via GitHub
luyuncheng commented on PR #12723: URL: https://github.com/apache/lucene/pull/12723#issuecomment-1783206971 > this seems promising. I guess the only cost is the increased memory needed because we create the FixedBitSet? Can you say how large this might get in the worst case? @msokolo

Re: [PR] Optimize: Use DocIdSetIterator Reduce bkd docvalues iteration [lucene]

2023-10-27 Thread via GitHub
luyuncheng commented on code in PR #12723: URL: https://github.com/apache/lucene/pull/12723#discussion_r1374824046 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDReader.java: ## @@ -216,7 +216,7 @@ private static class BKDPointTree implements PointTree { scratchM

Re: [PR] Fix test failures for TestCoreParser#testSpanNearQueryWithoutSlopXML [lucene]

2023-10-27 Thread via GitHub
uschindler commented on PR #12724: URL: https://github.com/apache/lucene/pull/12724#issuecomment-1783287980 Should I merge and backport it or do you want to do it?

[PR] Return the same input vector if its a unit vector in VectorUtil#l2normalize [lucene]

2023-10-27 Thread via GitHub
shubhamvishu opened a new pull request, #12726: URL: https://github.com/apache/lucene/pull/12726 ### Description While going through [VectorUtil](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/VectorUtil.java) class, I observed we don't have a
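The optimization named in the PR title (skip the work when the input is already a unit vector) might look roughly like the sketch below. This is not the code from the PR: the class name, the `EPSILON` tolerance, and the in-place behaviour are all assumptions made for illustration.

```java
// Illustrative sketch only; not the VectorUtil implementation.
final class L2NormalizeSketch {

  // Assumed tolerance for treating a squared norm as "already 1".
  static final double EPSILON = 1e-4;

  /** Normalizes v in place to unit length and returns it; returns v untouched if already unit. */
  static float[] l2normalize(float[] v) {
    double squaredNorm = 0.0;
    for (float x : v) {
      squaredNorm += (double) x * x;
    }
    if (squaredNorm == 0.0) {
      throw new IllegalArgumentException("Cannot normalize a zero-length vector");
    }
    if (Math.abs(squaredNorm - 1.0) <= EPSILON) {
      return v; // already a unit vector: skip the square root and the divisions
    }
    double length = Math.sqrt(squaredNorm);
    for (int i = 0; i < v.length; i++) {
      v[i] /= length; // compound assignment narrows the double result back to float
    }
    return v;
  }
}
```

The early return saves one square root and one division per dimension for inputs that callers have already normalized.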

Re: [PR] Return the same input vector if its a unit vector in VectorUtil#l2normalize [lucene]

2023-10-27 Thread via GitHub
benwtrent commented on PR #12726: URL: https://github.com/apache/lucene/pull/12726#issuecomment-1783318072 @shubhamvishu could you add a "CHANGES.txt" entry under optimizations?

[PR] Ensure negative scores are not returned by vector similarity functions [lucene]

2023-10-27 Thread via GitHub
benwtrent opened a new pull request, #12727: URL: https://github.com/apache/lucene/pull/12727 We shouldn't ever return negative scores from vector similarity functions. Given vector panama and nearly antipodal float[] vectors, it is possible that cosine and (normalized) dot-product become s
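To illustrate the problem described here: if a cosine value is mapped to a score with something like `(1 + cosine) / 2` (an assumed mapping, used only for this sketch), floating-point error on nearly antipodal vectors can push the cosine slightly below -1 and the score below zero. Clamping at zero is one way to guard against that; the names below are hypothetical and the PR itself may do this differently.

```java
// Illustrative sketch only; the score mapping and names are assumptions.
final class NonNegativeScoreSketch {

  /** Maps a cosine in roughly [-1, 1] to a score, clamping rounding error so it is never negative. */
  static float cosineToScore(float cosine) {
    return Math.max((1 + cosine) / 2, 0f);
  }

  public static void main(String[] args) {
    // A cosine that overshoots -1 by one float ulp, as accumulated rounding error can produce.
    float slightlyBelowMinusOne = -1.0000001f;
    System.out.println(cosineToScore(slightlyBelowMinusOne));
  }
}
```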

[PR] Add back maxConn & beamWidth HNSW codec ctor [lucene]

2023-10-27 Thread via GitHub
benwtrent opened a new pull request, #12728: URL: https://github.com/apache/lucene/pull/12728 follow up to https://github.com/apache/lucene/pull/12582 For user convenience, I added back the two parameter ctor for the HNSW codec.

Re: [PR] Return the same input vector if its a unit vector in VectorUtil#l2normalize [lucene]

2023-10-27 Thread via GitHub
shubhamvishu commented on PR #12726: URL: https://github.com/apache/lucene/pull/12726#issuecomment-1783364698 Oh nice! @benwtrent I have added a CHANGES.txt entry under optimizations now. Thanks!

[PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-27 Thread via GitHub
benwtrent opened a new pull request, #12729: URL: https://github.com/apache/lucene/pull/12729 Currently the HNSW codec does too many things, it not only indexes vectors, but stores them and determines how to store them given the vector type. This PR extracts out the vector storage int

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-27 Thread via GitHub
stefanvodita commented on code in PR #12506: URL: https://github.com/apache/lucene/pull/12506#discussion_r1375044479 ## lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java: ## @@ -170,42 +191,42 @@ public void reset(boolean zeroFillBuffers, boolean reuseFirst) { }

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-27 Thread via GitHub
stefanvodita commented on PR #12506: URL: https://github.com/apache/lucene/pull/12506#issuecomment-1783530943 I’ve integrated most of the suggestions. There’s just the matter of the name for the slice pool class and the right package for it. I don’t have a strong opinion on this. Maybe we m

Re: [PR] Fix test failures for TestCoreParser#testSpanNearQueryWithoutSlopXML [lucene]

2023-10-27 Thread via GitHub
vigyasharma merged PR #12724: URL: https://github.com/apache/lucene/pull/12724

[PR] Remove test dependency on Java default exception message [lucene]

2023-10-27 Thread via GitHub
vigyasharma opened a new pull request, #12730: URL: https://github.com/apache/lucene/pull/12730 Backport of fix in #12724

Re: [PR] Remove test dependency on Java default exception message [lucene]

2023-10-27 Thread via GitHub
vigyasharma commented on PR #12730: URL: https://github.com/apache/lucene/pull/12730#issuecomment-1783652392 Backport change already approved in https://github.com/apache/lucene/pull/12724. Merging.

Re: [PR] Remove test dependency on Java default exception message [lucene]

2023-10-27 Thread via GitHub
vigyasharma merged PR #12730: URL: https://github.com/apache/lucene/pull/12730

Re: [PR] Fix test failures for TestCoreParser#testSpanNearQueryWithoutSlopXML [lucene]

2023-10-27 Thread via GitHub
vigyasharma commented on PR #12724: URL: https://github.com/apache/lucene/pull/12724#issuecomment-1783653077 I've backported this to `branch_9x`. Do we need it in any other branches?

[PR] Speedup float cosine vectors, use FMA where fast and available to reduce error [lucene]

2023-10-27 Thread via GitHub
rmuir opened a new pull request, #12731: URL: https://github.com/apache/lucene/pull/12731 The Intel FMA is nice, and it's easier to reason about when looking at assembly. We basically reduce the error for free where it's available. Along with another change (reducing the unrolling for cosine,
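As a rough scalar illustration of the FMA idea, the sketch below accumulates a cosine with `Math.fma`, which computes `a * b + c` with a single rounding. It is not the vectorized code from the PR, and the class and method names are hypothetical. Note that `Math.fma` can be slow on hardware without a fused multiply-add instruction, which is presumably why the PR title restricts it to where it is fast and available.

```java
// Illustrative scalar sketch only; not the Panama-vectorized code from the PR.
final class FmaCosineSketch {

  /** Cosine similarity with each accumulation step done as a fused multiply-add. */
  static float cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("Vector dimensions differ: " + a.length + " != " + b.length);
    }
    float dot = 0f;
    float normA = 0f;
    float normB = 0f;
    for (int i = 0; i < a.length; i++) {
      // One rounding per step instead of two (multiply, then add).
      dot = Math.fma(a[i], b[i], dot);
      normA = Math.fma(a[i], a[i], normA);
      normB = Math.fma(b[i], b[i], normB);
    }
    return (float) (dot / Math.sqrt((double) normA * normB));
  }
}
```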

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-27 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1783709669 Thanks Mike and Ben for reviewing! I'll try to merge and backport it tomorrow, and create an issue for future refactoring.