Re: [PR] Fix javac task inputs so that they include modular dependencies #12742 [lucene]

2023-11-02 Thread via GitHub
dweiss merged PR #12745: URL: https://github.com/apache/lucene/pull/12745 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

Re: [I] JavaCompile tasks may be in up-to-date state when modular dependencies have changed leading to odd runtime errors [lucene]

2023-11-02 Thread via GitHub
dweiss closed issue #12742: JavaCompile tasks may be in up-to-date state when modular dependencies have changed leading to odd runtime errors URL: https://github.com/apache/lucene/issues/12742

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on code in PR #12747: URL: https://github.com/apache/lucene/pull/12747#discussion_r1379747014 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -24,8 +24,14 @@ @BenchmarkMode(Mode.Throughput) @OutputTimeUnit(T

Re: [PR] Clean up UnCompiledNode.inputCount [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #12735: URL: https://github.com/apache/lucene/pull/12735

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1790294665 Thanks for tackling this / persisting @slow-J, especially the glorious fun experience of having to "bump" the Codec version ;) A nice rite-of-passage in this Lucene world!

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1790333126 Thanks @dungba88 -- I will review! But first I tried running `IndexToFST` (recently born helper tool, now in luceneutil) on a `wikimediumall` index, creating the FST from all of

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1790354114 Yes, I just noticed that, and pushed out a fix. Seems like I was using the primary table pos instead of the fallback pos. And I added an assertion to catch it earlier.

[PR] Specialize arc store for continuous label in FST [lucene]

2023-11-02 Thread via GitHub
easyice opened a new pull request, #12748: URL: https://github.com/apache/lucene/pull/12748 This PR resolves issue: https://github.com/apache/lucene/issues/12701 . Thanks for the cool idea from @gf2121. It needs some more benchmarking.

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1379769524 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -145,7 +145,7 @@ private FSTCompiler( if (suffixRAMLimitMB < 0) { throw new I

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1790397736 This looks cool! Sorry, I caused conflicts w/ the earlier merge -- could you please resolve those @easyice? I'm happy to try benchmarking it, using the new `IndexToFST` tool in luce

Re: [PR] LUCENE-10125: Another idea of DirectWriter (v3) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #333: URL: https://github.com/apache/lucene/pull/333#issuecomment-1790411023 @uschindler -- can we close out these old cool `DirectWriter` optimization ideas/PRs? Are they stale now? `refCount` dropped to 0 but we failed to GC?

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1790414765 @jprinet -- thank you for this PR, and sorry for the insanely slow response. Is this still relevant/helpful? I don't like how slow our gradle builds are, so if we can make it faster, th

Re: [PR] LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (9.0.1 Backporting) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #587: URL: https://github.com/apache/lucene/pull/587#issuecomment-1790417301 @zacharymorn -- yikes, did we fail to backport this bugfix for so long? Is it worth backporting now, or was it separately fixed maybe?

Re: [PR] LUCENE-10059: Additional fix to handle n_best backtrace [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #284: URL: https://github.com/apache/lucene/pull/284#issuecomment-1790419043 @jimczi -- is this still relevant? Thanks.

Re: [PR] Create ConjunctionDISI:patcher [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #730: Create ConjunctionDISI:patcher URL: https://github.com/apache/lucene/pull/730

Re: [PR] Create ConjunctionDISI:patcher [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #730: URL: https://github.com/apache/lucene/pull/730#issuecomment-1790421970 Thanks for the idea @ldkjdk! It looks like we are unsure this is helpful in the general case ... I'll close the PR for now. Please re-open if you feel strongly otherwise?

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-02 Thread via GitHub
easyice commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1790428679 @mikemccand Thanks for your quick reply! The conflicts have been resolved; any comments are welcome!

Re: [PR] Use similarity.tf() in MoreLikeThis [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #940: URL: https://github.com/apache/lucene/pull/940#issuecomment-1790429476 It looks like this PR is a nice improvement to MLQ quality, and we agree we should just enable it by default (`Similarity` can turn it off if the old way is really needed), and the PR is

Re: [PR] GameGenie:1990JMH [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #365: URL: https://github.com/apache/lucene/pull/365#issuecomment-1790433776 This looks really awesome @markrmiller -- we are perpetually in need of better benchmarking tools!

Re: [PR] LUCENE-10560: Faster merging of TermsEnum [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #1052: URL: https://github.com/apache/lucene/pull/1052#issuecomment-1790441866 This is a cool idea @jpountz! And `OrdinalMap` construction is important, e.g. SSDV faceting uses it on every refresh, merging uses it, etc. Maybe let's revive it? :)

Re: [PR] LUCENE-10616: optimizing decompress when only retrieving some fields [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #1003: URL: https://github.com/apache/lucene/pull/1003#issuecomment-1790451795 Is this change still relevant? Or did we achieve laziness on subset of stored fields in a different way maybe? Thanks @JoeHF! > no obvious regression or perf improvement, guess

Re: [PR] LUCENE-10634: Speed up WANDScorer. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #999: URL: https://github.com/apache/lucene/pull/999#issuecomment-1790453084 @jpountz was this change superseded or so? Can we close this PR?

Re: [PR] [LUCENE-10624] Binary Search for Sparse IndexedDISI advanceWithinBloc… [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #968: URL: https://github.com/apache/lucene/pull/968#issuecomment-1790460458 This sounds like a nice optimization @wuwm! Is it still relevant? Lucene's nightly benchmarks include [somewhat sparse documents (NYC taxi database)](https://home.apache.org/~mikem

Re: [PR] LUCENE-10612: Introduced Lucene93CodecParameters for Lucene93Codec [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #955: LUCENE-10612: Introduced Lucene93CodecParameters for Lucene93Codec URL: https://github.com/apache/lucene/pull/955

Re: [PR] LUCENE-10612: Introduced Lucene93CodecParameters for Lucene93Codec [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #955: URL: https://github.com/apache/lucene/pull/955#issuecomment-1790464520 Somewhat related to [this newish issue](https://github.com/apache/lucene/issues/12740) (how to configure concurrent HNSW graph building). Let's stick with the straightforward "pass

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1790470336 Oooh I missed this @uschindler -- it looks like a nice possible opto for the costly `BytesRefHash` methods, and it looks like (on the issue) you and @rmuir came to agreement on approach (

Re: [PR] LUCENE-10548: Weird errors launching gradlew [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #857: URL: https://github.com/apache/lucene/pull/857#issuecomment-1790474770 @dweiss is this still relevant? It looks like the original issue was hard to repro too...

Re: [PR] LUCENE-10548: Weird errors launching gradlew [lucene]

2023-11-02 Thread via GitHub
dweiss closed pull request #857: LUCENE-10548: Weird errors launching gradlew URL: https://github.com/apache/lucene/pull/857

Re: [PR] LUCENE-10548: Weird errors launching gradlew [lucene]

2023-11-02 Thread via GitHub
dweiss commented on PR #857: URL: https://github.com/apache/lucene/pull/857#issuecomment-1790479768 I'm closing it. I don't think we can reproduce the original issue so let's not worry about it.

Re: [PR] LUCENE-10425:PostingsEnum supports to return current index of postings [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #688: LUCENE-10425:PostingsEnum supports to return current index of postings URL: https://github.com/apache/lucene/pull/688

Re: [PR] LUCENE-10425:PostingsEnum supports to return current index of postings [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #688: URL: https://github.com/apache/lucene/pull/688#issuecomment-1790484960 Thanks @wjp719. It looks like this is a nice opto for narrow use cases and the concern is adding a new API, especially to such a hot class as `PostingsEnum`, needs to meet a high b

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-02 Thread via GitHub
dweiss commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1790485212 I think some of it has been integrated already. If not, I'll take a look and go through the changes @jprinet made. It's a shame it took so long, apologies, @jprinet ! > I don't like ho

Re: [PR] LUCENE-10322: Enable -Xlint:path and -Xlint:-exports [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #681: URL: https://github.com/apache/lucene/pull/681#issuecomment-1790489894 > > Yeah, those are actually API bugs? > > They do look like API issues to me. Useful warning, by the way. It's awesome that this change uncovered such API bugs! Thanks @spik

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-02 Thread via GitHub
benwtrent commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1379919203 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throw

Re: [PR] LUCENE-10144:fix resource leak due to Files.list [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #354: URL: https://github.com/apache/lucene/pull/354

Re: [PR] LUCENE-10144:fix resource leak due to Files.list [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #354: URL: https://github.com/apache/lucene/pull/354#issuecomment-1790501879 Whoa, sneaky -- indeed the `Stream` returned from `Files.list` must be closed (it holds a `DirectoryStream` open under-the-hood)! I grep'd Lucene's sources for other places we use `
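
The point in the comment above, that the `Stream` returned by `Files.list` holds a directory handle open until it is closed, can be illustrated with a minimal sketch. This is not the code from PR #354; the class name `ListDirectory` and the helper `listNames` are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListDirectory {

  // Hypothetical helper: returns the sorted file names in a directory.
  // The try-with-resources block closes the Stream, which in turn
  // releases the directory handle that Files.list keeps open.
  public static List<String> listNames(Path dir) throws IOException {
    try (Stream<Path> entries = Files.list(dir)) {
      return entries
          .map(p -> p.getFileName().toString())
          .sorted()
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("files-list-demo");
    Files.createFile(dir.resolve("a.txt"));
    Files.createFile(dir.resolve("b.txt"));
    System.out.println(listNames(dir));
  }
}
```

Collecting into a `List` inside the `try` block matters here: returning the still-open `Stream` to the caller would move the responsibility for closing it elsewhere, which is the kind of leak the comment describes.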

Re: [PR] LUCENE-10133: Specialize the write path for sorted doc values. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #330: URL: https://github.com/apache/lucene/pull/330#issuecomment-1790506129 @jpountz this PR looks still relevant? Are we still (unnecessarily) computing min, max, gcd, unique values for `SORTED` DVs?

Re: [PR] LUCENE-10121: More skipping in WANDScorer. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #319: URL: https://github.com/apache/lucene/pull/319#issuecomment-1790508003 @jpountz is this still relevant? There have been lots of optos to `WANDScorer` lately... maybe this is already essentially done?

Re: [PR] LUCENE-10100: same as 10091 Fix some old errors in the main branch [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #301: URL: https://github.com/apache/lucene/pull/301

Re: [I] configuration items of the alg file are adapted to the 9.0 branch [LUCENE-10100] [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on issue #11138: URL: https://github.com/apache/lucene/issues/11138#issuecomment-1790519853 Merged to 10.0 and 9.9.0. Thanks @xiaoshi2013!

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-02 Thread via GitHub
uschindler commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1790521486 Oh, I forgot about this PR. When looking at the conflicts it looks like I need to redo at least the BytesRefHash/Pool code. We can use native order at all places where it is only

Re: [PR] LUCENE-10099: Add -Ptests.asyncprofile option. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #295: URL: https://github.com/apache/lucene/pull/295#issuecomment-1790521876 Oh how nice it would be to have async profiling out of the box in a Lucene clone @markrmiller!

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1379938642 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -110,25 +117,39 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] LUCENE-10086: Fix an AssertionError when KoreanTokenizer tries to backtrace from and to the same position [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #285: URL: https://github.com/apache/lucene/pull/285#issuecomment-1790522765 @jimczi it looks like this PR is close? A small comment, and some conflicts to resolve?

Re: [PR] LUCENE-10073: Reduce merging overhead of NRT by using a greater mergeFactor on tiny segments. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #266: URL: https://github.com/apache/lucene/pull/266#issuecomment-1790526613 @jpountz it looks like this one is super-close, and a nice improvement to TMP's default behavior?

Re: [PR] LUCENE-10002: Deprecate IndexSearcher#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1790534046 Hi @zacharymorn -- this change is awesome! The world of servers has rapidly become massively concurrent and Lucene has (generally) been slow to adopt it. I like this hardish switch to t

Re: [PR] LUCENE-10018 Introduce DocTermVectors in lieu of Fields. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #216: URL: https://github.com/apache/lucene/pull/216#issuecomment-1790537963 It looks like we are abandoning this idea -- too much new API surface area added?

Re: [PR] LUCENE-10018 Introduce DocTermVectors in lieu of Fields. [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #216: LUCENE-10018 Introduce DocTermVectors in lieu of Fields. URL: https://github.com/apache/lucene/pull/216

Re: [PR] LUCENE-8682: remove deprecated WordDelimiterFilter[Factory] classes [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #202: URL: https://github.com/apache/lucene/pull/202#issuecomment-1790540423 > but I don't think WordDelimiterGraphFilter is a full replacement for WordDelimiterFilter since it can't be used in conjunction with other filters that consume or produce graphs, like Sy

Re: [PR] LUCENE-10005: Improve AlreadyClosedException logging [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #187: URL: https://github.com/apache/lucene/pull/187#issuecomment-1790542516 Thanks @asalamon74 -- looks like we shouldn't fix this in Lucene, but instead Solr.

Re: [PR] LUCENE-10005: Improve AlreadyClosedException logging [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #187: LUCENE-10005: Improve AlreadyClosedException logging URL: https://github.com/apache/lucene/pull/187

Re: [PR] LUCENE-10001: Make CollectionTerminatedException handling in MultiCollector configurable [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #181: URL: https://github.com/apache/lucene/pull/181#issuecomment-1790547178 @gsmiller what should we do with this PR? Are you working on the alternative (wrapping?) approach? Should we close this PR and later open that approach? Or leave this one open...? Tha

Re: [PR] LUCENE-9951: Add InfoStream to ReplicationService [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #124: URL: https://github.com/apache/lucene/pull/124#issuecomment-1790552171 Thanks @ChristophKaser and sorry for this very late reply! I like this idea -- Replication is so tricky to debug. This now has conflicts unfortunately -- do you want to refresh the PR t

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1379960873 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException { }

Re: [PR] LUCENE-9869 allow for configuring a custom cache purge scheduler in Monitor (aka Luwak) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #99: URL: https://github.com/apache/lucene/pull/99#issuecomment-1790557226 This sounds reasonable to me @pawel-bugalski-dynatrace but I'm not familiar with Monitor/Luwak's code. It looks like there are conflicts -- is this PR still relevant? Thanks @pawel-bugals

Re: [PR] LUCENE-9798 : Fix looping bug when calculating full KNN results in KnnGraphTester [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #83: URL: https://github.com/apache/lucene/pull/83#issuecomment-1790559217 Thanks @nitirajrathore! This class has since moved to `luceneutil` I think? Do you know if this bug was resolved there? If not, could you maybe port this PR over to `luceneutil`? Thanks.

Re: [PR] Remove or repurpose obsolete JIRA tasks from release wizard [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11833: URL: https://github.com/apache/lucene/pull/11833#issuecomment-1790562631 Oooh thank you for the attention to detail here @msokolov! RM'ing a Lucene release is another rite-of-passage for each of us :) Since this PR was created there have been 4 more

Re: [PR] NeighborArray is now fixed size [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1790565433 Wow, lots of fun discussion here, including specifics of how Java conditionals are evaluated. @msokolov is this still relevant? The HNSW code has been red-hot lately; maybe this cha

Re: [PR] ReleaseWizard - Upgrade 'consolemenu' dependency to v0.7.1 [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11855: URL: https://github.com/apache/lucene/pull/11855#issuecomment-1790569350 Since 1) this looks like a great cleanup, 2) it's been approved, 3) it was already merged in Solr (thanks @janhoy for bringing to Lucene's release wizard too!), and 4) no conflicts er

Re: [PR] ReleaseWizard - Upgrade 'consolemenu' dependency to v0.7.1 [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #11855: URL: https://github.com/apache/lucene/pull/11855

Re: [I] Luke web interface [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on issue #11851: URL: https://github.com/apache/lucene/issues/11851#issuecomment-1790574821 > the Swing UI made me feel like I had stepped into a car with Marty McFly HA!

Re: [PR] NeighborArray is now fixed size [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1790576576 Thanks @msokolov. This looks like a nice tool, helpful for giving demos of cool Lucene features at conferences, but it looks like consensus is we should not add it to Lucene? Maybe

Re: [PR] Luke Webapp [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #11852: Luke Webapp URL: https://github.com/apache/lucene/pull/11852

Re: [PR] Luke Webapp [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11852: URL: https://github.com/apache/lucene/pull/11852#issuecomment-1790579836 Thanks @msokolov. This looks like a nice tool, helpful for giving demos of cool Lucene features at conferences, but it looks like consensus is we should not add it to Lucene? Maybe lu

Re: [PR] LUCENE-10357 Ghost fields and postings/points [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #907: URL: https://github.com/apache/lucene/pull/907#issuecomment-1790591979 Thank you for persisting so hard on this one @shahrs87 -- I'm sorry it looks like we should close it at this point, but your efforts / iterations were needed to see that we are mostly exc

Re: [PR] LUCENE-10357 Ghost fields and postings/points [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #907: LUCENE-10357 Ghost fields and postings/points URL: https://github.com/apache/lucene/pull/907

Re: [PR] Speed up sorting on unique string fields. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11903: URL: https://github.com/apache/lucene/pull/11903#issuecomment-1790595404 > @mikemccand Merging this PR will require regolding nightly benchmarks. Does it help if you can control when the PR gets merged? Oh no, I failed to reply to this, until now! N

Re: [PR] Fix a few calls to `Directory#openChecksumInput` to pass the right `IOContext`. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11934: URL: https://github.com/apache/lucene/pull/11934#issuecomment-1790597593 Looks like we have since removed `IOContext` from `openChecksumInput` since such an `IndexInput` must always be `READONCE` anyways.

Re: [PR] Fix a few calls to `Directory#openChecksumInput` to pass the right `IOContext`. [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #11934: Fix a few calls to `Directory#openChecksumInput` to pass the right `IOContext`. URL: https://github.com/apache/lucene/pull/11934

Re: [PR] LUCENE-10560: Faster merging of TermsEnum [lucene]

2023-11-02 Thread via GitHub
jpountz commented on PR #1052: URL: https://github.com/apache/lucene/pull/1052#issuecomment-1790599704 +1 I fell a bit into a trap by trying to make long shared prefixes less adversarial. Let's do progress over perfection and start with a simple approach and look into whether/how we can bet

Re: [PR] LUCENE-10560: Faster merging of TermsEnum [lucene]

2023-11-02 Thread via GitHub
jpountz commented on PR #1052: URL: https://github.com/apache/lucene/pull/1052#issuecomment-1790603130 For reference, it should speed up: - OrdinalMap construction - Merging of terms in the inverted index - Merging of terms in doc values (as a side-effect of the OrdinalMap speedu

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1380001349 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException { }

Re: [PR] Add a method allowing canonical strings to be returned from DataInput [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on code in PR #11847: URL: https://github.com/apache/lucene/pull/11847#discussion_r1380002867 ## lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94FieldInfosFormat.java: ## @@ -145,8 +145,10 @@ public FieldInfos read( // previous field'

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-02 Thread via GitHub
slow-J commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1790605861 Thanks @mikemccand and yes, the codec version bump is the majority of this change :D

Re: [PR] Add a method allowing canonical strings to be returned from DataInput [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11847: URL: https://github.com/apache/lucene/pull/11847#issuecomment-1790607699 It looks like there are strong objections to sharing string instances here, and there is a JVM command-line flag that may achieve similar gains for many indices X segments X fields so

Re: [PR] Add a method allowing canonical strings to be returned from DataInput [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #11847: Add a method allowing canonical strings to be returned from DataInput URL: https://github.com/apache/lucene/pull/11847

Re: [PR] Remove synchronization from OpenNLP integration and add thread-safety tests(checkRandomData) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11955: URL: https://github.com/apache/lucene/pull/11955#issuecomment-1790613216 It looks like this is ready to be merged @rmuir? open-nlp may have thread safety issues but 1) Lucene should not work around those bugs, and 2) the user (of open-nlp tokenizers in Lu

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1380008658 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException { }

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-02 Thread via GitHub
rmuir commented on code in PR #12747: URL: https://github.com/apache/lucene/pull/12747#discussion_r1380016895 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -24,8 +24,14 @@ @BenchmarkMode(Mode.Throughput) @OutputTimeUnit(TimeUn

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-02 Thread via GitHub
rmuir commented on code in PR #12747: URL: https://github.com/apache/lucene/pull/12747#discussion_r1380017854 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -56,84 +62,72 @@ public void init() { } @Benchmark - @Fork(valu

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-02 Thread via GitHub
rmuir commented on code in PR #12747: URL: https://github.com/apache/lucene/pull/12747#discussion_r1380018590 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -56,84 +62,72 @@ public void init() { } @Benchmark - @Fork(valu

Re: [PR] LUCENE-10121: More skipping in WANDScorer. [lucene]

2023-11-02 Thread via GitHub
jpountz commented on PR #319: URL: https://github.com/apache/lucene/pull/319#issuecomment-1790626991 It's still relevant but I'm not comfortable with the fact that it's a bit fragile. I'll close for now and think more about it.

Re: [PR] LUCENE-10121: More skipping in WANDScorer. [lucene]

2023-11-02 Thread via GitHub
jpountz closed pull request #319: LUCENE-10121: More skipping in WANDScorer. URL: https://github.com/apache/lucene/pull/319

Re: [PR] LUCENE-10133: Specialize the write path for sorted doc values. [lucene]

2023-11-02 Thread via GitHub
jpountz commented on PR #330: URL: https://github.com/apache/lucene/pull/330#issuecomment-1790631133 Yes we do! I'll look into moving this forward...

Re: [PR] Remove unnecessary sort in writeFieldUpdates [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #12273: URL: https://github.com/apache/lucene/pull/12273

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-02 Thread via GitHub
jpountz commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1380023598 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws

Re: [PR] Remove unnecessary sort in writeFieldUpdates [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12273: URL: https://github.com/apache/lucene/pull/12273#issuecomment-1790634871 Merged & backported to 9.9.0. Sorry for the long delay @luyuncheng!

[I] Explore partially decoding blocks (within-block skipping) [lucene]

2023-11-02 Thread via GitHub
slow-J opened a new issue, #12749: URL: https://github.com/apache/lucene/issues/12749 ### Description Idea from @mikemccand 's comment in https://github.com/apache/lucene/issues/12696#issuecomment-1770461719 ``` Another exciting optimization such a "patch-less" encoding coul

Re: [PR] unify exception thrown by regexp & check repetition range [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12277: URL: https://github.com/apache/lucene/pull/12277#issuecomment-1790637499 Thanks for incorporating @rmuir's feedback @tang-hi! The change looks great to me: we catch an invalid usage and throw a clean exception in that case. I'll merge! Sorry for the lon

Re: [PR] unify exception thrown by regexp & check repetition range [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #12277: URL: https://github.com/apache/lucene/pull/12277

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-11-02 Thread via GitHub
slow-J commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1790638953 > Another exciting optimization such a "patch-less" encoding could implement is within-block skipping (I believe Tantivy does this). > > Today, our skipper is forced to align t

Re: [PR] unify exception thrown by regexp & check repetition range [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12277: URL: https://github.com/apache/lucene/pull/12277#issuecomment-1790640840 Merged to 10.0 and 9.9.0. Thanks @tang-hi!

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12295: URL: https://github.com/apache/lucene/pull/12295#issuecomment-1790653367 In general it's great for Lucene devs to use the new language features we gain by setting a minimum Java version. This is (part of?) why we have such minimums! This nice `inst

Re: [I] Take advantage of bloom filter when delete terms [lucene]

2023-11-02 Thread via GitHub
s1monw commented on issue #12725: URL: https://github.com/apache/lucene/issues/12725#issuecomment-1790655491 @robro612 please subscribe to the [dev list](https://lucene.apache.org/core/discussion.html#developer-discussion-devluceneapacheorg) and post your question there. We are more than ha

Re: [PR] Improve error message if codec not found. This fixes #12300 [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12301: URL: https://github.com/apache/lucene/pull/12301#issuecomment-1790672389 @gus-asf -- looks like this one is close? @uschindler had one more small feedback (isolate the one line that requires suppression to its own method so we don't suppress more than we n
