Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
gf2121 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378670256 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,119 +193,85 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; } -

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-01 Thread via GitHub
benwtrent commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1788822945 So, I replicated the jvector benchmark (the lucene part) using the new int8 quantization. Note, this is with `0` fan out or extra top-k gathered. Since the benchmark on JV

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378703915 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,119 +197,85 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378704913 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,119 +197,85 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [I] Would SIMD powered sort (on top of Panama) be worth it? [lucene]

2023-11-01 Thread via GitHub
mikemccand commented on issue #12399: URL: https://github.com/apache/lucene/issues/12399#issuecomment-1788845861 Some exciting updates here ... Intel continues to improve this "super fast sorting using SIMD instructions" library: https://www.phoronix.com/news/Intel-x86-simd-sort-4.0

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
ChrisHegarty commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-179191 I see reasonable speedups on both x64 and ARM, but sadly see no vectorization in the disassembly. The speed seems to come from the 2x instructions/pipelining (?) per strip mined lo

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1788910222 thank you for the explanation, so now it just adds another mystery for me. Your x64 is still i5-11400 ? I will look into this more. Yes on the ARM, the assembly didn't look like wha

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1788919960 and what is the compiler's excuse this time? :) no floating point here. I should just be able to write a simple loop and get good performance! -- This is an automated message from the Ap

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378809307 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -107,28 +121,43 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
ChrisHegarty commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1789007292 > Your x64 is still i5-11400 ? I will look into this more. Yes.

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
ChrisHegarty commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1789014829 jmh prof output from my Linux x64 rocket lake - https://gist.github.com/ChrisHegarty/508bb1857cb50df0d757f711c81fd740

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378848941 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -107,28 +121,43 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378841828 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -328,7 +323,128 @@ private void rehash(long lastNodeAddress) throws IOException { }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-01 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378882326 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,132 +214,99 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

[I] Should we not enlarge PagedGrowableWriter initial bitPerValue on NodeHash.rehash() [lucene]

2023-11-01 Thread via GitHub
dungba88 opened a new issue, #12744: URL: https://github.com/apache/lucene/issues/12744 ### Description Spawned from https://github.com/apache/lucene/pull/12738 It seems on [rehash](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHas

Re: [I] Use max BPV encoding in postings if doc buffer size less than ForUtil.BLOCK_SIZE [lucene]

2023-11-01 Thread via GitHub
easyice commented on issue #12717: URL: https://github.com/apache/lucene/issues/12717#issuecomment-1789106681 Oh.. Group-varint is an interesting encoder, I'd love to try it later in the week
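For readers unfamiliar with the encoder mentioned above, here is a minimal sketch of the general group-varint idea: four ints share one tag byte, whose four 2-bit fields record how many bytes (1 to 4) each value occupies, followed by the values' bytes in little-endian order. This illustrates the technique in general and is not the code proposed or merged in Lucene; the class and method names (`GroupVarintSketch`, `encode`, `decode`) are hypothetical, and the sketch assumes the input length is a multiple of four.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of group-varint; not Lucene's implementation.
class GroupVarintSketch {
  /** Number of bytes (1..4) needed to hold v when treated as an unsigned 32-bit value. */
  static int numBytes(int v) {
    if ((v & 0xFFFFFF00) == 0) return 1;
    if ((v & 0xFFFF0000) == 0) return 2;
    if ((v & 0xFF000000) == 0) return 3;
    return 4;
  }

  /** Encodes values in groups of four; values.length must be a multiple of 4 (assumption of this sketch). */
  static byte[] encode(int[] values) {
    if (values.length % 4 != 0) {
      throw new IllegalArgumentException("length must be a multiple of 4");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < values.length; i += 4) {
      // Tag byte: 2 bits per value, each storing (byte length - 1).
      int tag = 0;
      for (int j = 0; j < 4; j++) {
        tag |= (numBytes(values[i + j]) - 1) << (2 * j);
      }
      out.write(tag);
      // Payload: each value's bytes, least significant byte first.
      for (int j = 0; j < 4; j++) {
        int v = values[i + j];
        for (int n = numBytes(v); n > 0; n--) {
          out.write(v & 0xFF);
          v >>>= 8;
        }
      }
    }
    return out.toByteArray();
  }

  /** Decodes count values (a multiple of 4) previously written by encode. */
  static int[] decode(byte[] bytes, int count) {
    int[] values = new int[count];
    int pos = 0;
    for (int i = 0; i < count; i += 4) {
      int tag = bytes[pos++] & 0xFF;
      for (int j = 0; j < 4; j++) {
        int n = ((tag >>> (2 * j)) & 3) + 1;
        int v = 0;
        for (int k = 0; k < n; k++) {
          v |= (bytes[pos++] & 0xFF) << (8 * k);
        }
        values[i + j] = v;
      }
    }
    return values;
  }
}
```

Compared with a classic varint, the decoder reads all four lengths from one tag byte up front instead of testing a continuation bit on every byte, which is the usual motivation for the format.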

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
benwtrent commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789218896 Something weird is happening. This commit is causing the following failure: ``` ./gradlew test --tests TestUnifiedHighlighterTermIntervals.testCustomFieldValueSource -Dtests.seed

Re: [PR] speedup arm int functions? [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1789229323 thanks @ChrisHegarty ! I will try to dig in more to this. Again i don't see speedups on my x86 so something has changed (maybe architecture) that's allowing more to happen in parallel on yo

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789264349 @benwtrent is this an issue of the code only being partially rebuilt? The signature for `Automata#makeStringUnion` was changed to accept a more general `Iterable` as opposed to a `Colle

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
benwtrent commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789266991 @gsmiller you are 100% correct 🤦

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789348091 @benwtrent glad it was a simple fix. Sorry it created churn!

Re: [I] byte to int in TruncateTokenFilterFactory to TruncateTokenFilter [lucene]

2023-11-01 Thread via GitHub
asubbu90 commented on issue #12449: URL: https://github.com/apache/lucene/issues/12449#issuecomment-1789370514 Hi @robro612, as you can see, I have already opened PR #12507 for this issue. Would you like more context on it?

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-01 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1789483634 Hi @rmuir, I am fine with both approaches. Let me just create a new PR with some changes for the Constants.java class and a separate pkg private utility class for looking up JVM ar

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
dweiss commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789519097 See #12742 - it's a problem with our javac task settings. Running a clean is a workaround. I will provide a fix for this when I get a chance, it's not a critical issue (but it is a bug in

[PR] Fix javac task inputs so that they include modular dependencies #12742 [lucene]

2023-11-01 Thread via GitHub
dweiss opened a new pull request, #12745: URL: https://github.com/apache/lucene/pull/12745 Fixes #12742.

Re: [I] JavaCompile tasks may be in up-to-date state when modular dependencies have changed leading to odd runtime errors [lucene]

2023-11-01 Thread via GitHub
dweiss commented on issue #12742: URL: https://github.com/apache/lucene/issues/12742#issuecomment-1789587275 I've provided a PR that fixes this. It is a corner case of us providing custom javac parameters (modular classpath). I am surprised this took so long to be discovered and I apologize

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-01 Thread via GitHub
jpountz commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1379282330 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws

[I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]

2023-11-01 Thread via GitHub
benwtrent opened a new issue, #12746: URL: https://github.com/apache/lucene/issues/12746 ### Description TestDeletionPolicy.testOpenPriorSnapshot fails. Assertion fails on assuming we are on a previous commit with more than one leaf. Fails on main and 9.x. stack

Re: [I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]

2023-11-01 Thread via GitHub
benwtrent commented on issue #12746: URL: https://github.com/apache/lucene/issues/12746#issuecomment-1789639897 I verified this doesn't happen in 9.8.

Re: [I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]

2023-11-01 Thread via GitHub
jpountz closed issue #12746: TestDeletionPolicy.testOpenPriorSnapshot failing URL: https://github.com/apache/lucene/issues/12746

Re: [I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]

2023-11-01 Thread via GitHub
jpountz commented on issue #12746: URL: https://github.com/apache/lucene/issues/12746#issuecomment-1789651692 Thanks for reporting, I pushed a fix at https://github.com/apache/lucene/commit/66324f763fc7fb0d8e7cd6f334e5438f0171c84e.

Re: [PR] Speed up disjunctions by computing estimations of the score of the k-th top hit up-front. [lucene]

2023-11-01 Thread via GitHub
jpountz commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1789662266 @mikemccand FYI I gave a try at adding some interesting boolean queries to nightly benchmarks at https://github.com/mikemccand/luceneutil/pull/240.

Re: [PR] Clean up UnCompiledNode.inputCount [lucene]

2023-11-01 Thread via GitHub
dungba88 closed pull request #12735: Clean up UnCompiledNode.inputCount URL: https://github.com/apache/lucene/pull/12735

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1789905123 OK. sounds good to me @uschindler . I agree the user should be in control. And I'd like to avoid us adding tons of flags for this stuff unless we need it for testing. that's just my opinio

[PR] stabilize vectorutil benchmark [lucene]

2023-11-01 Thread via GitHub
rmuir opened a new pull request, #12747: URL: https://github.com/apache/lucene/pull/12747 This benchmark is too noisy across forks which makes comparisons impossible and misleading. Especially vectorized float methods on some machines: it is almost useless. I spent some time to reduc

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12747: URL: https://github.com/apache/lucene/pull/12747#issuecomment-1790041666 and i know it is annoying it runs slower, i really tried to not be overkill, but some methods here are super simple and super stable and some are unstable. So I gave all methods minimum 3

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-01 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1790097686 ARM Graviton3E (256-bit SVE) Master (well #12747 for sanity): ``` Benchmark (size) Mode Cnt Score Error Units VectorUtilBenchmark.

Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]

2023-11-01 Thread via GitHub
shubhamvishu commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1790129043 Nice! I like how we discovered a build bug (which had been hidden for some time now, I think) and fixed it. Thanks for solving it @dweiss!

Re: [PR] Fix javac task inputs so that they include modular dependencies #12742 [lucene]

2023-11-02 Thread via GitHub
dweiss merged PR #12745: URL: https://github.com/apache/lucene/pull/12745

Re: [I] JavaCompile tasks may be in up-to-date state when modular dependencies have changed leading to odd runtime errors [lucene]

2023-11-02 Thread via GitHub
dweiss closed issue #12742: JavaCompile tasks may be in up-to-date state when modular dependencies have changed leading to odd runtime errors URL: https://github.com/apache/lucene/issues/12742

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on code in PR #12747: URL: https://github.com/apache/lucene/pull/12747#discussion_r1379747014 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -24,8 +24,14 @@ @BenchmarkMode(Mode.Throughput) @OutputTimeUnit(T

Re: [PR] Clean up UnCompiledNode.inputCount [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #12735: URL: https://github.com/apache/lucene/pull/12735

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1790294665 Thanks for tackling this / persisting @slow-J, especially the glorious fun experience of having to "bump" the Codec version ;) A nice rite-of-passage in this Lucene world!

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1790333126 Thanks @dungba88 -- I will review! But first I tried running `IndexToFST` (recently born helper tool, now in luceneutil) on a `wikimediumall` index, creating the FST from all of

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1790354114 Yes, I just noticed that, and pushed out a fix. Seems like I was using the primary table pos instead of the fallback pos. And I added an assertion to catch it earlier.
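The primary/fallback wording above refers to a two-table cache. As a rough illustration only, here is a generic "double-barrel" LRU approximation built on plain `HashMap`s: lookups try the primary table first, then the fallback; a fallback hit is promoted into the primary; and when the primary fills up it becomes the new fallback while the old fallback is discarded. This is a sketch of the general pattern under those assumptions, not the `NodeHash` code from PR #12738 (which stores node addresses in packed arrays); the class name `DoubleBarrelCacheSketch` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a primary/fallback ("double-barrel") LRU approximation.
class DoubleBarrelCacheSketch<K, V> {
  private final int maxPrimarySize;
  private Map<K, V> primary = new HashMap<>();
  private Map<K, V> fallback = new HashMap<>();

  DoubleBarrelCacheSketch(int maxPrimarySize) {
    if (maxPrimarySize < 1) {
      throw new IllegalArgumentException("maxPrimarySize must be >= 1");
    }
    this.maxPrimarySize = maxPrimarySize;
  }

  /** Returns the cached value or null; a hit in the fallback table is promoted to the primary. */
  V get(K key) {
    V value = primary.get(key);
    if (value == null) {
      value = fallback.get(key);
      if (value != null) {
        put(key, value); // promote recently used entries so they survive the next swap
      }
    }
    return value;
  }

  /** Inserts into the primary table; once it is full, it becomes the fallback and the old fallback is dropped. */
  void put(K key, V value) {
    primary.put(key, value);
    if (primary.size() >= maxPrimarySize) {
      fallback = primary;
      primary = new HashMap<>();
    }
  }
}
```

The appeal of this design is that eviction costs nothing per entry: entries that are not touched between two swaps simply disappear with the discarded table.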

[PR] Specialize arc store for continuous label in FST [lucene]

2023-11-02 Thread via GitHub
easyice opened a new pull request, #12748: URL: https://github.com/apache/lucene/pull/12748 This PR resolves issue https://github.com/apache/lucene/issues/12701. Thanks to @gf2121 for the cool idea. It needs some more benchmarking.

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1379769524 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -145,7 +145,7 @@ private FSTCompiler( if (suffixRAMLimitMB < 0) { throw new I

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1790397736 This looks cool! Sorry, I caused conflicts w/ the earlier merge -- could you please resolve those @easyice? I'm happy to try benchmarking it, using the new `IndexToFST` tool in luce

Re: [PR] LUCENE-10125: Another idea of DirectWriter (v3) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #333: URL: https://github.com/apache/lucene/pull/333#issuecomment-1790411023 @uschindler -- can we close out these old cool `DirectWriter` optimization ideas/PRs? Are they stale now? `refCount` dropped to 0 but we failed to GC? -- This is an automated message

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1790414765 @jprinet -- thank you for this PR, and sorry for the insanely slow response. Is this still relevant/helpful? I don't like how slow our gradle builds are, so if we can make it faster, th

Re: [PR] LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (9.0.1 Backporting) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #587: URL: https://github.com/apache/lucene/pull/587#issuecomment-1790417301 @zacharymorn -- yikes, did we fail to backport this bugfix for so long? Is it worth backporting now, or was it separately fixed maybe?

Re: [PR] LUCENE-10059: Additional fix to handle n_best backtrace [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #284: URL: https://github.com/apache/lucene/pull/284#issuecomment-1790419043 @jimczi -- is this still relevant? Thanks.

Re: [PR] Create ConjunctionDISI:patcher [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #730: Create ConjunctionDISI:patcher URL: https://github.com/apache/lucene/pull/730

Re: [PR] Create ConjunctionDISI:patcher [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #730: URL: https://github.com/apache/lucene/pull/730#issuecomment-1790421970 Thanks for the idea @ldkjdk! It looks like we are unsure this is helpful in the general case ... I'll close the PR for now. Please re-open if you feel strongly otherwise?

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-02 Thread via GitHub
easyice commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1790428679 @mikemccand Thanks for your quick reply! The conflicts have been resolved; any comments are welcome!

Re: [PR] Use similarity.tf() in MoreLikeThis [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #940: URL: https://github.com/apache/lucene/pull/940#issuecomment-1790429476 It looks like this PR is a nice improvement to MLQ quality, and we agree we should just enable it by default (`Similarity` can turn it off if the old way is really needed), and the PR is

Re: [PR] GameGenie:1990JMH [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #365: URL: https://github.com/apache/lucene/pull/365#issuecomment-1790433776 This looks really awesome @markrmiller -- we are perpetually in need of better benchmarking tools!

Re: [PR] LUCENE-10560: Faster merging of TermsEnum [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #1052: URL: https://github.com/apache/lucene/pull/1052#issuecomment-1790441866 This is a cool idea @jpountz! And `OrdinalMap` construction is important, e.g. SSDV faceting uses it on every refresh, merging uses it, etc. Maybe let's revive it? :)

Re: [PR] LUCENE-10616: optimizing decompress when only retrieving some fields [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #1003: URL: https://github.com/apache/lucene/pull/1003#issuecomment-1790451795 Is this change still relevant? Or did we achieve laziness on subset of stored fields in a different way maybe? Thanks @JoeHF! > no obvious regression or perf improvement, guess

Re: [PR] LUCENE-10634: Speed up WANDScorer. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #999: URL: https://github.com/apache/lucene/pull/999#issuecomment-1790453084 @jpountz was this change superseded or so? Can we close this PR?

Re: [PR] [LUCENE-10624] Binary Search for Sparse IndexedDISI advanceWithinBloc… [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #968: URL: https://github.com/apache/lucene/pull/968#issuecomment-1790460458 This sounds like a nice optimization @wuwm! Is it still relevant? Lucene's nightly benchmarks include [somewhat sparse documents (NYC taxi database)](https://home.apache.org/~mikem

Re: [PR] LUCENE-10612: Introduced Lucene93CodecParameters for Lucene93Codec [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #955: LUCENE-10612: Introduced Lucene93CodecParameters for Lucene93Codec URL: https://github.com/apache/lucene/pull/955

Re: [PR] LUCENE-10612: Introduced Lucene93CodecParameters for Lucene93Codec [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #955: URL: https://github.com/apache/lucene/pull/955#issuecomment-1790464520 Somewhat related to [this newish issue](https://github.com/apache/lucene/issues/12740) (how to configure concurrent HNSW graph building). Let's stick with the straightforward "pass

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1790470336 Oooh I missed this @uschindler -- it looks like a nice possible opto for the costly `BytesRefHash` methods, and it looks like (on the issue) you and @rmuir came to agreement on approach (

Re: [PR] LUCENE-10548: Weird errors launching gradlew [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #857: URL: https://github.com/apache/lucene/pull/857#issuecomment-1790474770 @dweiss is this still relevant? It looks like the original issue was hard to repro too...

Re: [PR] LUCENE-10548: Weird errors launching gradlew [lucene]

2023-11-02 Thread via GitHub
dweiss closed pull request #857: LUCENE-10548: Weird errors launching gradlew URL: https://github.com/apache/lucene/pull/857

Re: [PR] LUCENE-10548: Weird errors launching gradlew [lucene]

2023-11-02 Thread via GitHub
dweiss commented on PR #857: URL: https://github.com/apache/lucene/pull/857#issuecomment-1790479768 I'm closing it. I don't think we can reproduce the original issue so let's not worry about it.

Re: [PR] LUCENE-10425:PostingsEnum supports to return current index of postings [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #688: LUCENE-10425:PostingsEnum supports to return current index of postings URL: https://github.com/apache/lucene/pull/688

Re: [PR] LUCENE-10425:PostingsEnum supports to return current index of postings [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #688: URL: https://github.com/apache/lucene/pull/688#issuecomment-1790484960 Thanks @wjp719. It looks like this is a nice opto for narrow use cases and the concern is adding a new API, especially to such a hot class as `PostingsEnum`, needs to meet a high b

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-02 Thread via GitHub
dweiss commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1790485212 I think some of it has been integrated already. If not, I'll take a look and go through the changes @jprinet made. It's a shame it took so long, apologies, @jprinet ! > I don't like ho

Re: [PR] LUCENE-10322: Enable -Xlint:path and -Xlint:-exports [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #681: URL: https://github.com/apache/lucene/pull/681#issuecomment-1790489894 > > Yeah, those are actually API bugs? > > They do look like API issues to me. Useful warning, by the way. It's awesome that this change uncovered such API bugs! Thanks @spik

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-02 Thread via GitHub
benwtrent commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1379919203 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throw

Re: [PR] LUCENE-10144:fix resource leak due to Files.list [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #354: URL: https://github.com/apache/lucene/pull/354

Re: [PR] LUCENE-10144:fix resource leak due to Files.list [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #354: URL: https://github.com/apache/lucene/pull/354#issuecomment-1790501879 Whoa, sneaky -- indeed the `Stream` returned from `Files.list` must be closed (it holds a `DirectoryStream` open under-the-hood)! I grep'd Lucene's sources for other places we use `
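To illustrate the leak described above: `Files.list` returns a lazily populated `Stream<Path>` backed by an open directory handle, so it should be used inside try-with-resources. A minimal sketch follows; the helper name `listNames` and the class `FilesListSketch` are hypothetical and are not taken from the merged PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of closing the Stream returned by Files.list.
class FilesListSketch {
  /** Returns the sorted entry names directly under dir, releasing the directory handle before returning. */
  static List<String> listNames(Path dir) throws IOException {
    // try-with-resources closes the Stream, which in turn closes the underlying directory stream.
    try (Stream<Path> entries = Files.list(dir)) {
      return entries
          .map(p -> p.getFileName().toString())
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

Collecting the results inside the `try` block matters: the stream is lazy, so returning it unconsumed and closing it afterwards would leave the caller with a closed stream.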

Re: [PR] LUCENE-10133: Specialize the write path for sorted doc values. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #330: URL: https://github.com/apache/lucene/pull/330#issuecomment-1790506129 @jpountz this PR looks still relevant? Are we still (unnecessarily) computing min, max, gcd, unique values for `SORTED` DVs?

Re: [PR] LUCENE-10121: More skipping in WANDScorer. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #319: URL: https://github.com/apache/lucene/pull/319#issuecomment-1790508003 @jpountz is this still relevant? There have been lots of optos to `WANDScorer` lately... maybe this is already essentially done?

Re: [PR] LUCENE-10100: same as 10091 Fix some old errors in the main branch [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #301: URL: https://github.com/apache/lucene/pull/301

Re: [I] configuration items of the alg file are adapted to the 9.0 branch [LUCENE-10100] [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on issue #11138: URL: https://github.com/apache/lucene/issues/11138#issuecomment-1790519853 Merged to 10.0 and 9.9.0. Thanks @xiaoshi2013!

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-02 Thread via GitHub
uschindler commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1790521486 Oh, I forgot about this PR. When looking at the conflicts it looks like I need to redo at least the BytesRefHash/Pool code. We can use native order at all places where it is only

Re: [PR] LUCENE-10099: Add -Ptests.asyncprofile option. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #295: URL: https://github.com/apache/lucene/pull/295#issuecomment-1790521876 Oh how nice it would be to have async profiling out of the box in a Lucene clone @markrmiller!

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1379938642 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -110,25 +117,39 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] LUCENE-10086: Fix an AssertionError when KoreanTokenizer tries to backtrace from and to the same position [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #285: URL: https://github.com/apache/lucene/pull/285#issuecomment-1790522765 @jimczi it looks like this PR is close? A small comment, and some conflicts to resolve?

Re: [PR] LUCENE-10073: Reduce merging overhead of NRT by using a greater mergeFactor on tiny segments. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #266: URL: https://github.com/apache/lucene/pull/266#issuecomment-1790526613 @jpountz it looks like this one is super-close, and a nice improvement to TMP's default behavior?

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1790534046 Hi @zacharymorn -- this change is awesome! The world of servers has rapidly become massively concurrent and Lucene has (generally) been slow to adopt it. I like this hardish switch to t

Re: [PR] LUCENE-10018 Introduce DocTermVectors in lieu of Fields. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #216: URL: https://github.com/apache/lucene/pull/216#issuecomment-1790537963 It looks like we are abandoning this idea -- too much new API surface area added?

Re: [PR] LUCENE-10018 Introduce DocTermVectors in lieu of Fields. [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #216: LUCENE-10018 Introduce DocTermVectors in lieu of Fields. URL: https://github.com/apache/lucene/pull/216

Re: [PR] LUCENE-8682: remove deprecated WordDelimiterFilter[Factory] classes [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #202: URL: https://github.com/apache/lucene/pull/202#issuecomment-1790540423 > but I don't think WordDelimiterGraphFilter is a full replacement for WordDelimiterFilter since it can't be used in conjunction with other filters that consume or produce graphs, like Sy

Re: [PR] LUCENE-10005: Improve AlreadyClosedException logging [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #187: URL: https://github.com/apache/lucene/pull/187#issuecomment-1790542516 Thanks @asalamon74 -- looks like we shouldn't fix this in Lucene, but instead Solr.

Re: [PR] LUCENE-10005: Improve AlreadyClosedException logging [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #187: LUCENE-10005: Improve AlreadyClosedException logging URL: https://github.com/apache/lucene/pull/187

Re: [PR] LUCENE-10001: Make CollectionTerminatedException handling in MultiCollector configurable [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #181: URL: https://github.com/apache/lucene/pull/181#issuecomment-1790547178 @gsmiller what should we do with this PR? Are you working on the alternative (wrapping?) approach? Should we close this PR and later open that approach? Or leave this one open...? Tha

Re: [PR] LUCENE-9951: Add InfoStream to ReplicationService [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #124: URL: https://github.com/apache/lucene/pull/124#issuecomment-1790552171 Thanks @ChristophKaser and sorry for this very late reply! I like this idea -- Replication is so tricky to debug. This now has conflicts unfortunately -- do you want to refresh the PR t

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1379960873 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException { }

Re: [PR] LUCENE-9869 allow for configuring a custom cache purge scheduler in Monitor (aka Luwak) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #99: URL: https://github.com/apache/lucene/pull/99#issuecomment-1790557226 This sounds reasonable to me @pawel-bugalski-dynatrace but I'm not familiar with Monitor/Luwak's code. It looks like there are conflicts -- is this PR still relevant? Thanks @pawel-bugals

Re: [PR] LUCENE-9798 : Fix looping bug when calculating full KNN results in KnnGraphTester [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #83: URL: https://github.com/apache/lucene/pull/83#issuecomment-1790559217 Thanks @nitirajrathore! This class has since moved to `luceneutil` I think? Do you know if this bug was resolved there? If not, could you maybe port this PR over to `luceneutil`? Thanks.

Re: [PR] Remove or repurpose obsolete JIRA tasks from release wizard [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11833: URL: https://github.com/apache/lucene/pull/11833#issuecomment-1790562631 Oooh thank you for the attention to detail here @msokolov! RM'ing a Lucene release is another rite-of-passage for each of us :) Since this PR was created there have been 4 more

Re: [PR] NeighborArray is now fixed size [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1790565433 Wow, lots of fun discussion here, including specifics of how Java conditionals are evaluated. @msokolov is this still relevant? The HNSW code has been red-hot lately; maybe this cha

Re: [PR] ReleaseWizard - Upgrade 'consolemenu' dependency to v0.7.1 [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11855: URL: https://github.com/apache/lucene/pull/11855#issuecomment-1790569350 Since 1) this looks like a great cleanup, 2) it's been approved, 3) it was already merged in Solr (thanks @janhoy for bringing to Lucene's release wizard too!), and 4) no conflicts er

Re: [PR] ReleaseWizard - Upgrade 'consolemenu' dependency to v0.7.1 [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #11855: URL: https://github.com/apache/lucene/pull/11855

Re: [I] Luke web interface [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on issue #11851: URL: https://github.com/apache/lucene/issues/11851#issuecomment-1790574821 > the Swing UI made me feel like I had stepped into a car with Marty McFly HA!
