Re: [PR] Lucene-10070 [lucene]

2023-11-02 Thread via GitHub
goankur closed pull request #282: Lucene-10070 URL: https://github.com/apache/lucene/pull/282 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-uns

Re: [PR] Lucene-10070 [lucene]

2023-11-02 Thread via GitHub
goankur commented on PR #282: URL: https://github.com/apache/lucene/pull/282#issuecomment-1791762753 > @goankur this can be closed out now right since you opened a separate PR for this change? Yep this is correct. I am closing this PR.

Re: [PR] Use similarity.tf() in MoreLikeThis [lucene]

2023-11-02 Thread via GitHub
MarcusSorealheis commented on PR #940: URL: https://github.com/apache/lucene/pull/940#issuecomment-1791704616 I can reawaken it and get it to closure. I need to carve out time on Sunday unless someone else picks it up.

Re: [PR] Random access term dictionary [lucene]

2023-11-02 Thread via GitHub
nknize commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1380840508 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/bitpacking/BitPacker.java: ## Review Comment: Looks like this is only used by tes

Re: [I] TestIndexWriterOnVMError.testUnknownError timesout [lucene]

2023-11-02 Thread via GitHub
dweiss commented on issue #12654: URL: https://github.com/apache/lucene/issues/12654#issuecomment-1791521460 In fact, I think it's this block in IW: ``` // close all the closeables we can (but important is readerPool and writeLock to prevent // leaks)

Re: [I] TestIndexWriterOnVMError.testUnknownError timesout [lucene]

2023-11-02 Thread via GitHub
dweiss commented on issue #12654: URL: https://github.com/apache/lucene/issues/12654#issuecomment-1791510733 This may be a legitimate bug somewhere. Maybe @mikemccand or @s1monw will know what the expected state here should be.

Re: [I] TestIndexWriterOnVMError.testUnknownError timesout [lucene]

2023-11-02 Thread via GitHub
dweiss commented on issue #12654: URL: https://github.com/apache/lucene/issues/12654#issuecomment-1791500183 You can reproduce this problem from the IDE as well: ``` -ea -Dtests.seed=4A059D04FCC8873 -Dtests.nightly=true -Dtests.multiplier=1 -Dtests.verbose=true ``` The last messa

Re: [PR] Improve error message if codec not found. This fixes #12300 [lucene]

2023-11-02 Thread via GitHub
gus-asf commented on PR #12301: URL: https://github.com/apache/lucene/pull/12301#issuecomment-1791479559 > Thanks @fsparv. heh whoops wrong browser instance ;)

Re: [I] TestIndexWriterOnVMError.testUnknownError timesout [lucene]

2023-11-02 Thread via GitHub
dweiss commented on issue #12654: URL: https://github.com/apache/lucene/issues/12654#issuecomment-1791471920 Well, this test is almost never "fast" for me... the conditions passed in Failure.eval are frequently called, but rarely hit the right call stack - this is particularly problematic w

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-02 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1791341717 No worries, I just wanted to merge in the benchmarking fixes so we can rely upon the results.

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-02 Thread via GitHub
JarvisCraft commented on PR #12295: URL: https://github.com/apache/lucene/pull/12295#issuecomment-1791269608 > can we keep explicit `== false` checks instead of less readable `!`? No, since javac only recognizes `(!(EXPR instanceof TYPE NAME))`
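
The scoping point in the reply above can be illustrated with a minimal sketch. This is not code from PR #12295; the class and method names are hypothetical, and it assumes Java 16 or later, where `instanceof` patterns are a standard language feature.

```java
// Hypothetical illustration only; not taken from the Lucene code base.
final class InstanceofPatternDemo {

  // Negating the pattern with `!` lets javac's flow scoping bring `s` into
  // scope after the early return, so it can be used below the `if`.
  static int lengthOrMinusOne(Object o) {
    if (!(o instanceof String s)) {
      return -1;
    }
    return s.length();
  }

  // The `== false` form does not introduce the binding, so this variant
  // is rejected by the compiler (kept as a comment for that reason):
  //
  //   if ((o instanceof String s) == false) {
  //     return -1;
  //   }
  //   return s.length(); // error: cannot find symbol `s`

  public static void main(String[] args) {
    System.out.println(lengthOrMinusOne("abc"));
    System.out.println(lengthOrMinusOne(42));
  }
}
```

The language specification defines pattern-variable scoping for `!`, `&&`, `||`, and the conditional operator, but not for `==`, which is why the comparison with `false` cannot be kept in these call sites.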

Re: [PR] LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches [lucene]

2023-11-02 Thread via GitHub
Deepika0510 commented on PR #12345: URL: https://github.com/apache/lucene/pull/12345#issuecomment-1790965164 Came across `SoftDeletesDirectoryReaderWrapper` where we have wrap [method](https://github.com/apache/lucene/blob/2d50c345fea3d1a64090d6d0cffef6b70d482a9f/lucene/core/src/java/org/apa

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-02 Thread via GitHub
iverase commented on PR #12295: URL: https://github.com/apache/lucene/pull/12295#issuecomment-1791268281 can we keep explicit `== false` checks instead of less readable `!`?

Re: [PR] Remove or repurpose obsolete JIRA tasks from release wizard [lucene]

2023-11-02 Thread via GitHub
msokolov merged PR #11833: URL: https://github.com/apache/lucene/pull/11833

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-02 Thread via GitHub
JarvisCraft commented on PR #12295: URL: https://github.com/apache/lucene/pull/12295#issuecomment-1791256716 @mikemccand, thanks for the comments! I've undone the changes to `equals()` methods and applied the fix to the remaining fixable occurrences of the pattern.

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-02 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1791239215 I will try to work on the proposed PR tomorrow (or maybe later this evening). Sorry, I am very busy :-(

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-02 Thread via GitHub
asfgit merged PR #12747: URL: https://github.com/apache/lucene/pull/12747

[PR] BaseTokenStreamTestCase.assertAnalyzesTo fails when Analyzer contains… [lucene]

2023-11-02 Thread via GitHub
lukas-vlcek opened a new pull request, #12750: URL: https://github.com/apache/lucene/pull/12750 … PathHierarchy tokenizer ### Description This PR is expected to fail. It demonstrates issue with `BaseTokenStreamTestCase.assertAnalyzesTo()` method in connection to `PathHierarchy

Re: [PR] LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12345: URL: https://github.com/apache/lucene/pull/12345#issuecomment-1791127377 Hi @Deepika0510 -- what is the problem when callers access the leaves? Since you would subclass `FilterLeafReader` (which subclasses `LeafReader`) it should be fine to existing code?

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) [lucene]

2023-11-02 Thread via GitHub
zacharymorn commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1791103958 Thanks @mikemccand for reminding me on this PR, and sorry for missing your question earlier @javanna ! This has totally fallen off my radar. @javanna Looking at the codebase, it seems

Re: [PR] LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches [lucene]

2023-11-02 Thread via GitHub
Deepika0510 commented on PR #12345: URL: https://github.com/apache/lucene/pull/12345#issuecomment-1790951857 What I meant to ask is that after creating the `TimeoutLeafReader` class, how would we make sure that this wrapped class's object is used instead of any normal `LeafReader` instance?

Re: [PR] Improve error message if codec not found. This fixes #12300 [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12301: URL: https://github.com/apache/lucene/pull/12301#issuecomment-1790965114 Thanks @fsparv.

Re: [I] Take advantage of bloom filter when delete terms [lucene]

2023-11-02 Thread via GitHub
s1monw commented on issue #12725: URL: https://github.com/apache/lucene/issues/12725#issuecomment-1790887384 @gf2121 wild idea but would it make sense to build an automaton off these terms and intersect it? We could reuse it for multiple segments? I am not sure how big the costs are for tha

Re: [PR] Fix docFreq in score calculation after rewrite of boolean query consisting of blended query and boosted term query [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12354: URL: https://github.com/apache/lucene/pull/12354#issuecomment-1790844104 Thank you @rafalh! Query scores depending on `HashMap` iteration order is really awful. And thank you @stefanvodita for reviewing. @rafalh do you want to fold in the feedback maybe

Re: [I] Take advantage of bloom filter when delete terms [lucene]

2023-11-02 Thread via GitHub
s1monw commented on issue #12725: URL: https://github.com/apache/lucene/issues/12725#issuecomment-1790864645 @gf2121 do we have any numbers if it actually helps applying deletes? I think we can assume that we make use of `seekCeil` in the common case ie. all terms have the same field. I wou

Re: [I] Blended queries with boolean rewrite can result in inconsistent scores [LUCENE-9269] [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on issue #10309: URL: https://github.com/apache/lucene/issues/10309#issuecomment-1790845238 > * while testing a solution for adding `perReaderTermState` to the current `TermQuery#equals` implementation, I found a test that I believe is not doing anything of what it was

Re: [PR] Fix comment on decode method in PForUtil [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #12495: URL: https://github.com/apache/lucene/pull/12495

Re: [PR] Skip docs with Docvalues in NumericLeafComparator [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1790852894 @LuXugang @jpountz it looks like this PR went through some great discussions / iterations and was close towards the end, but it has accumulated some conflicts now?

Re: [PR] move CSVUtil to common from analyzer nori and kuromoji [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #12390: URL: https://github.com/apache/lucene/pull/12390

Re: [PR] Enable rank-unsafe optimizations for MAXSCORE/WAND. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12446: URL: https://github.com/apache/lucene/pull/12446#issuecomment-1790831094 Rank unsafe optimizations is a neat idea! It'd give another tool for maybe more smoothly trading cost for recall.

Re: [PR] move CSVUtil to common from analyzer nori and kuromoji [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12390: URL: https://github.com/apache/lucene/pull/12390#issuecomment-1790813760 I merged to main but there are quite a few conflicts on backport to 9.x -- any chance you could open a backport PR @twosom? Thanks!

Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1790817479 Thanks @heemin32 for taking the effort to bring the fix down to Lucene, from OpenSearch test failures. A dedicated Lucene unit test would be great. Maybe @nknize could help evaluate

Re: [I] Need to resolve the duplicate CSVUtil classes in analyzer Nori and Kuromoji [lucene]

2023-11-02 Thread via GitHub
mikemccand closed issue #12389: Need to resolve the duplicate CSVUtil classes in analyzer Nori and Kuromoji URL: https://github.com/apache/lucene/issues/12389

Re: [PR] Improve error message if codec not found. This fixes #12300 [lucene]

2023-11-02 Thread via GitHub
fsparv commented on PR #12301: URL: https://github.com/apache/lucene/pull/12301#issuecomment-1790800773 Hmm, yeah I think I got busy and forgot about this. Will need to review again. Thx for the nudge.

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-02 Thread via GitHub
clayburn commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1790798362 I agree with @jprinet that the PR should probably be closed just due to its age. Many of the changes here deal with caching, which the Lucene project explicitly opts out of by default. If t

Re: [PR] Introduce the similarity as boost functionality to the Word2VecSynonyFilter [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on code in PR #12433: URL: https://github.com/apache/lucene/pull/12433#discussion_r1380176295 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymFilter.java: ## @@ -62,14 +65,16 @@ public Word2VecSynonymFilter(

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1380008658 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException { }

Re: [PR] require that float vector components are smaller than 1E17 to prevent overflowing to Infinity [lucene]

2023-11-02 Thread via GitHub
msokolov closed pull request #12373: require that float vector components are smaller than 1E17 to prevent overflowing to Infinity URL: https://github.com/apache/lucene/pull/12373

Re: [PR] require that float vector components are smaller than 1E17 to prevent overflowing to Infinity [lucene]

2023-11-02 Thread via GitHub
msokolov commented on PR #12373: URL: https://github.com/apache/lucene/pull/12373#issuecomment-1790767056 it's not clear that we need this limit and it seems somewhat complicated to maintain. I'm closing since we haven't seen any activity in quite a while and there's no consensus to impose
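
For context on the overflow named in the PR title, here is a minimal sketch of the arithmetic. It is not the PR's code; the class and method names are hypothetical. It relies only on the fact that `Float.MAX_VALUE` is roughly 3.4E38, so squaring a component of 1E20 (about 1E40) overflows a `float`, while squaring 1E17 (about 1E34) does not.

```java
// Hypothetical illustration only; not taken from PR #12373.
final class FloatOverflowDemo {

  // Sum of squares accumulated in float, as a scalar similarity
  // computation might do. Each x * x is itself a float multiplication.
  static float squaredNorm(float[] v) {
    float sum = 0f;
    for (float x : v) {
      sum += x * x;
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] tooLarge = {1e20f, 1e20f}; // each square is about 1E40, above Float.MAX_VALUE
    float[] withinBound = {1e17f, 1e17f}; // each square is about 1E34, still finite

    System.out.println(Float.isInfinite(squaredNorm(tooLarge)));
    System.out.println(Float.isFinite(squaredNorm(withinBound)));
  }
}
```

Once a score becomes `Infinity`, later comparisons and normalizations can no longer distinguish between candidates, which is the failure the proposed bound was meant to prevent.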

Re: [PR] DRAFT: Vectorize ForUtil encoding for the 9.0 codec (same format) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1790763295 Now that we are removing patching for the doc block encoding maybe vectorizing decode of these blocks is more palatable?

Re: [PR] LUCENE-10195: Improve Gradle build speed [lucene]

2023-11-02 Thread via GitHub
jprinet commented on PR #414: URL: https://github.com/apache/lucene/pull/414#issuecomment-1790708355 As far as I remember, all the relevant changes were integrated in: I think this one has been superseded by https://github.com/apache/lucene/pull/421 See the Jira issue for more cont

Re: [PR] Speed up sorting on unique string fields. [lucene]

2023-11-02 Thread via GitHub
jpountz commented on PR #11903: URL: https://github.com/apache/lucene/pull/11903#issuecomment-1790706658 @mikemccand You will need to regold before the next nightly run.

Re: [PR] Speed up sorting on unique string fields. [lucene]

2023-11-02 Thread via GitHub
jpountz merged PR #11903: URL: https://github.com/apache/lucene/pull/11903

Re: [PR] Add monster test that indexes 1M vectors [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11867: URL: https://github.com/apache/lucene/pull/11867#issuecomment-1790686276 I love this idea of a "high scale" KNN monster test! It can catch overflow exceptions that we otherwise miss, and @rmuir hit a spooky exception that might be just such an example? @

Re: [PR] Add Setter for vector Encoding in FieldType [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #12279: Add Setter for vector Encoding in FieldType URL: https://github.com/apache/lucene/pull/12279

Re: [PR] Add Setter for vector Encoding in FieldType [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12279: URL: https://github.com/apache/lucene/pull/12279#issuecomment-1790681242 Thanks @naveentatikonda! It seems strange to set vector dimensions to 0, and it looks like `FieldType.setVectorAttribute` can otherwise be used to set the `vectorEncoding`. I'

Re: [PR] Speed up sorting on unique string fields. [lucene]

2023-11-02 Thread via GitHub
jpountz commented on PR #11903: URL: https://github.com/apache/lucene/pull/11903#issuecomment-1790680106 I confirmed that there is still a speedup: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value

Re: [PR] Make MAX_DIMENSIONS configurable via a system property. [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #12306: Make MAX_DIMENSIONS configurable via a system property. URL: https://github.com/apache/lucene/pull/12306

Re: [PR] Make MAX_DIMENSIONS configurable via a system property. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12306: URL: https://github.com/apache/lucene/pull/12306#issuecomment-1790675801 We've since enabled Codec to set the limit, which is very expert and I think a safer way to change the limit than a `sysprop`? So we can close this one?

Re: [PR] Improve error message if codec not found. This fixes #12300 [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12301: URL: https://github.com/apache/lucene/pull/12301#issuecomment-1790672389 @gus-asf -- looks like this one is close? @uschindler had one more small feedback (isolate the one line that requires suppression to its own method so we don't suppress more than we n

Re: [I] Take advantage of bloom filter when delete terms [lucene]

2023-11-02 Thread via GitHub
s1monw commented on issue #12725: URL: https://github.com/apache/lucene/issues/12725#issuecomment-1790655491 @robro612 please subscribe to the [dev list](https://lucene.apache.org/core/discussion.html#developer-discussion-devluceneapacheorg) and post your question there. We are more than ha

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12295: URL: https://github.com/apache/lucene/pull/12295#issuecomment-1790653367 In general it's great for Lucene devs to use the new language features we gain by setting a minimum Java version. This is (part of?) why we have such minimums! This nice `inst

Re: [PR] unify exception thrown by regexp & check repetition range [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12277: URL: https://github.com/apache/lucene/pull/12277#issuecomment-1790640840 Merged to 10.0 and 9.9.0. Thanks @tang-hi!

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-11-02 Thread via GitHub
slow-J commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1790638953 > Another exciting optimization such a "patch-less" encoding could implement is within-block skipping (I believe Tantivy does this). > > Today, our skipper is forced to align t

Re: [PR] unify exception thrown by regexp & check repetition range [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #12277: URL: https://github.com/apache/lucene/pull/12277

Re: [PR] unify exception thrown by regexp & check repetition range [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12277: URL: https://github.com/apache/lucene/pull/12277#issuecomment-1790637499 Thanks for incorporating @rmuir's feedback @tang-hi! The change looks great to me: we catch an invalid usage and throw a clean exception in that case. I'll merge! Sorry for the lon

[I] Explore partially decoding blocks (within-block skipping) [lucene]

2023-11-02 Thread via GitHub
slow-J opened a new issue, #12749: URL: https://github.com/apache/lucene/issues/12749 ### Description Idea from @mikemccand 's comment in https://github.com/apache/lucene/issues/12696#issuecomment-1770461719 ``` Another exciting optimization such a "patch-less" encoding coul

Re: [PR] Remove unnecessary sort in writeFieldUpdates [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #12273: URL: https://github.com/apache/lucene/pull/12273#issuecomment-1790634871 Merged & backported to 9.9.0. Sorry for the long delay @luyuncheng!

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-02 Thread via GitHub
jpountz commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1380023598 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws

Re: [PR] Remove unnecessary sort in writeFieldUpdates [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #12273: URL: https://github.com/apache/lucene/pull/12273

Re: [PR] LUCENE-10133: Specialize the write path for sorted doc values. [lucene]

2023-11-02 Thread via GitHub
jpountz commented on PR #330: URL: https://github.com/apache/lucene/pull/330#issuecomment-1790631133 Yes we do! I'll look into moving this forward...

Re: [PR] LUCENE-10121: More skipping in WANDScorer. [lucene]

2023-11-02 Thread via GitHub
jpountz closed pull request #319: LUCENE-10121: More skipping in WANDScorer. URL: https://github.com/apache/lucene/pull/319

Re: [PR] LUCENE-10121: More skipping in WANDScorer. [lucene]

2023-11-02 Thread via GitHub
jpountz commented on PR #319: URL: https://github.com/apache/lucene/pull/319#issuecomment-1790626991 It's still relevant but I'm not comfortable with the fact that it's a bit fragile. I'll close for now and think more about it.

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-02 Thread via GitHub
rmuir commented on code in PR #12747: URL: https://github.com/apache/lucene/pull/12747#discussion_r1380018590 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -56,84 +62,72 @@ public void init() { } @Benchmark - @Fork(valu

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-02 Thread via GitHub
rmuir commented on code in PR #12747: URL: https://github.com/apache/lucene/pull/12747#discussion_r1380017854 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -56,84 +62,72 @@ public void init() { } @Benchmark - @Fork(valu

Re: [PR] stabilize vectorutil benchmark [lucene]

2023-11-02 Thread via GitHub
rmuir commented on code in PR #12747: URL: https://github.com/apache/lucene/pull/12747#discussion_r1380016895 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -24,8 +24,14 @@ @BenchmarkMode(Mode.Throughput) @OutputTimeUnit(TimeUn

Re: [PR] Remove synchronization from OpenNLP integration and add thread-safety tests(checkRandomData) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11955: URL: https://github.com/apache/lucene/pull/11955#issuecomment-1790613216 It looks like this is ready to be merged @rmuir? open-nlp may have thread safety issues but 1) Lucene should not work around those bugs, and 2) the user (of open-nlp tokenizers in Lu

Re: [PR] Add a method allowing canonical strings to be returned from DataInput [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #11847: Add a method allowing canonical strings to be returned from DataInput URL: https://github.com/apache/lucene/pull/11847

Re: [PR] Add a method allowing canonical strings to be returned from DataInput [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11847: URL: https://github.com/apache/lucene/pull/11847#issuecomment-1790607699 It looks like there are strong objections to sharing string instances here, and there is a JVM command-line flag that may achieve similar gains for many indices X segments X fields so

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-02 Thread via GitHub
slow-J commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1790605861 Thanks @mikemccand and yes, the codec version bump is the majority of this change :D

Re: [PR] Add a method allowing canonical strings to be returned from DataInput [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on code in PR #11847: URL: https://github.com/apache/lucene/pull/11847#discussion_r1380002867 ## lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94FieldInfosFormat.java: ## @@ -145,8 +145,10 @@ public FieldInfos read( // previous field'

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1380001349 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException { }

Re: [PR] LUCENE-10560: Faster merging of TermsEnum [lucene]

2023-11-02 Thread via GitHub
jpountz commented on PR #1052: URL: https://github.com/apache/lucene/pull/1052#issuecomment-1790603130 For reference, it should speed up: - OrdinalMap construction - Merging of terms in the inverted index - Merging of terms in doc values (as a side-effect of the OrdinalMap speedu

Re: [PR] LUCENE-10560: Faster merging of TermsEnum [lucene]

2023-11-02 Thread via GitHub
jpountz commented on PR #1052: URL: https://github.com/apache/lucene/pull/1052#issuecomment-1790599704 +1 I fell a bit into a trap by trying to make long shared prefixes less adversarial. Let's do progress over perfection and start with a simple approach and look into whether/how we can bet

Re: [PR] Fix a few calls to `Directory#openChecksumInput` to pass the right `IOContext`. [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #11934: Fix a few calls to `Directory#openChecksumInput` to pass the right `IOContext`. URL: https://github.com/apache/lucene/pull/11934

Re: [PR] Fix a few calls to `Directory#openChecksumInput` to pass the right `IOContext`. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11934: URL: https://github.com/apache/lucene/pull/11934#issuecomment-1790597593 Looks like we have since removed `IOContext` from `openChecksumInput` since such an `IndexInput` must always be `READONCE` anyways.

Re: [PR] Speed up sorting on unique string fields. [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11903: URL: https://github.com/apache/lucene/pull/11903#issuecomment-1790595404 > @mikemccand Merging this PR will require regolding nightly benchmarks. Does it help if you can control when the PR gets merged? Oh no, I failed to reply to this, until now! N

Re: [PR] LUCENE-10357 Ghost fields and postings/points [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #907: LUCENE-10357 Ghost fields and postings/points URL: https://github.com/apache/lucene/pull/907

Re: [PR] LUCENE-10357 Ghost fields and postings/points [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #907: URL: https://github.com/apache/lucene/pull/907#issuecomment-1790591979 Thank you for persisting so hard on this one @shahrs87 -- I'm sorry it looks like we should close it at this point, but your efforts / iterations were needed to see that we are mostly exc

Re: [PR] Luke Webapp [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11852: URL: https://github.com/apache/lucene/pull/11852#issuecomment-1790579836 Thanks @msokolov. This looks like a nice tool, helpful for giving demos of cool Lucene features at conferences, but it looks like consensus is we should not add it to Lucene? Maybe lu

Re: [PR] Luke Webapp [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #11852: Luke Webapp URL: https://github.com/apache/lucene/pull/11852

Re: [PR] NeighborArray is now fixed size [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1790576576 Thanks @msokolov. This looks like a nice tool, helpful for giving demos of cool Lucene features at conferences, but it looks like consensus is we should not add it to Lucene? Maybe

Re: [I] Luke web interface [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on issue #11851: URL: https://github.com/apache/lucene/issues/11851#issuecomment-1790574821 > the Swing UI made me feel like I had stepped into a car with Marty McFly HA!

Re: [PR] ReleaseWizard - Upgrade 'consolemenu' dependency to v0.7.1 [lucene]

2023-11-02 Thread via GitHub
mikemccand merged PR #11855: URL: https://github.com/apache/lucene/pull/11855

Re: [PR] ReleaseWizard - Upgrade 'consolemenu' dependency to v0.7.1 [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11855: URL: https://github.com/apache/lucene/pull/11855#issuecomment-1790569350 Since 1) this looks like a great cleanup, 2) it's been approved, 3) it was already merged in Solr (thanks @janhoy for bringing to Lucene's release wizard too!), and 4) no conflicts er

Re: [PR] NeighborArray is now fixed size [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1790565433 Wow, lots of fun discussion here, including specifics of how Java conditionals are evaluated. @msokolov is this still relevant? The HNSW code has been red-hot lately; maybe this cha

Re: [PR] Remove or repurpose obsolete JIRA tasks from release wizard [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #11833: URL: https://github.com/apache/lucene/pull/11833#issuecomment-1790562631 Oooh thank you for the attention to detail here @msokolov! RM'ing a Lucene release is another rite-of-passage for each of us :) Since this PR was created there have been 4 more

Re: [PR] LUCENE-9798 : Fix looping bug when calculating full KNN results in KnnGraphTester [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #83: URL: https://github.com/apache/lucene/pull/83#issuecomment-1790559217 Thanks @nitirajrathore! This class has since moved to `luceneutil` I think? Do you know if this bug was resolved there? If not, could you maybe port this PR over to `luceneutil`? Thanks.

Re: [PR] LUCENE-9869 allow for configuring a custom cache purge scheduler in Monitor (aka Luwak) [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #99: URL: https://github.com/apache/lucene/pull/99#issuecomment-1790557226 This sounds reasonable to me @pawel-bugalski-dynatrace but I'm not familiar with Monitor/Luwak's code. It looks like there are conflicts -- is this PR still relevant? Thanks @pawel-bugals

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-02 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1379960873 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException { }

Re: [PR] LUCENE-9951: Add InfoStream to ReplicationService [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #124: URL: https://github.com/apache/lucene/pull/124#issuecomment-1790552171 Thanks @ChristophKaser and sorry for this very late reply! I like this idea -- Replication is so tricky to debug. This now has conflicts unfortunately -- do you want to refresh the PR t

Re: [PR] LUCENE-10001: Make CollectionTerminatedException handling in MultiCollector configurable [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #181: URL: https://github.com/apache/lucene/pull/181#issuecomment-1790547178 @gsmiller what should we do with this PR? Are you working on the alternative (wrapping?) approach? Should we close this PR and later open that approach? Or leave this one open...? Tha

Re: [PR] LUCENE-10005: Improve AlreadyClosedException logging [lucene]

2023-11-02 Thread via GitHub
mikemccand closed pull request #187: LUCENE-10005: Improve AlreadyClosedException logging URL: https://github.com/apache/lucene/pull/187

Re: [PR] LUCENE-10005: Improve AlreadyClosedException logging [lucene]

2023-11-02 Thread via GitHub
mikemccand commented on PR #187: URL: https://github.com/apache/lucene/pull/187#issuecomment-1790542516 Thanks @asalamon74 -- looks like we shouldn't fix this in Lucene, but instead Solr.
