Re: [PR] Add getter for SynonymQuery#field [lucene]

2024-02-08 Thread via GitHub
AndreyBozhko commented on PR #13077: URL: https://github.com/apache/lucene/pull/13077#issuecomment-1935333285 @dungba88 If all looks good, please feel free to merge this at your earliest convenience (as I don't have access to do so myself). -- This is an automated message from the Apache

Re: [PR] Prevent humongous allocations when calculating scalar quantiles [lucene]

2024-02-08 Thread via GitHub
benwtrent merged PR #13090: URL: https://github.com/apache/lucene/pull/13090

Re: [I] Contributing a deep-learning, BERT-based analyzer [lucene]

2024-02-08 Thread via GitHub
chatman commented on issue #13065: URL: https://github.com/apache/lucene/issues/13065#issuecomment-1934843930 How about something with the source maintained in the sandbox dir (along with instructions to build), but no corresponding official release artifact? On Fri, 9 Feb, 2024, 1:

Re: [PR] Move `brToString(BytesRef)` to `ToStringUtils` [lucene]

2024-02-08 Thread via GitHub
dweiss commented on code in PR #13068: URL: https://github.com/apache/lucene/pull/13068#discussion_r1483512492 ## lucene/core/src/java/org/apache/lucene/util/ToStringUtils.java: ## @@ -32,11 +32,37 @@ public static void byteArray(StringBuilder buffer, byte[] bytes) { priva

Re: [PR] upgrade to OpenNLP 2.3.2 [lucene]

2024-02-08 Thread via GitHub
dweiss commented on code in PR #12674: URL: https://github.com/apache/lucene/pull/12674#discussion_r1483504210 ## lucene/licenses/opennlp-tools-NOTICE.txt: ## @@ -1,11 +1,101 @@ Apache OpenNLP -Copyright 2017 The Apache Software Foundation +Copyright 2021-2023 The Apache Softwa

Re: [PR] Move `brToString(BytesRef)` to `ToStringUtils` [lucene]

2024-02-08 Thread via GitHub
sabi0 commented on code in PR #13068: URL: https://github.com/apache/lucene/pull/13068#discussion_r1483496509 ## lucene/core/src/java/org/apache/lucene/util/ToStringUtils.java: ## @@ -32,11 +32,37 @@ public static void byteArray(StringBuilder buffer, byte[] bytes) { privat

Re: [PR] Throw CorruptSegmentInfoException on encountering missing segment info (_N.si) file in CheckIndex [lucene]

2024-02-08 Thread via GitHub
dweiss commented on PR #12872: URL: https://github.com/apache/lucene/pull/12872#issuecomment-1934784006 > > > Thanks @gokaai -- I'll try to review soon! > > > If possible please try not to force-push: it removes the history of the past commits and makes it harder to see what changed on th

Re: [PR] Move `brToString(BytesRef)` to `ToStringUtils` [lucene]

2024-02-08 Thread via GitHub
dweiss commented on code in PR #13068: URL: https://github.com/apache/lucene/pull/13068#discussion_r1483384841 ## lucene/core/src/java/org/apache/lucene/util/ToStringUtils.java: ## @@ -32,11 +32,37 @@ public static void byteArray(StringBuilder buffer, byte[] bytes) { priva

Re: [PR] Move `brToString(BytesRef)` to `ToStringUtils` [lucene]

2024-02-08 Thread via GitHub
sabi0 commented on code in PR #13068: URL: https://github.com/apache/lucene/pull/13068#discussion_r1483283018 ## lucene/core/src/java/org/apache/lucene/util/ToStringUtils.java: ## @@ -32,11 +32,37 @@ public static void byteArray(StringBuilder buffer, byte[] bytes) { privat

Re: [PR] Move `brToString(BytesRef)` to `ToStringUtils` [lucene]

2024-02-08 Thread via GitHub
mikemccand commented on code in PR #13068: URL: https://github.com/apache/lucene/pull/13068#discussion_r1483269901 ## lucene/core/src/java/org/apache/lucene/util/ToStringUtils.java: ## @@ -32,11 +32,37 @@ public static void byteArray(StringBuilder buffer, byte[] bytes) { p

Re: [PR] Move `brToString(BytesRef)` to `ToStringUtils` [lucene]

2024-02-08 Thread via GitHub
mikemccand commented on code in PR #13068: URL: https://github.com/apache/lucene/pull/13068#discussion_r1483268388 ## lucene/core/src/java/org/apache/lucene/util/ToStringUtils.java: ## @@ -32,11 +32,37 @@ public static void byteArray(StringBuilder buffer, byte[] bytes) { p

Re: [PR] Move `brToString(BytesRef)` to `ToStringUtils` [lucene]

2024-02-08 Thread via GitHub
mikemccand commented on code in PR #13068: URL: https://github.com/apache/lucene/pull/13068#discussion_r1483266841 ## lucene/core/src/java/org/apache/lucene/util/ToStringUtils.java: ## @@ -32,11 +32,37 @@ public static void byteArray(StringBuilder buffer, byte[] bytes) { p

Re: [PR] Move `brToString(BytesRef)` to `ToStringUtils` [lucene]

2024-02-08 Thread via GitHub
sabi0 commented on PR #13068: URL: https://github.com/apache/lucene/pull/13068#issuecomment-1934260844 Thank you @cpoerschke!

Re: [PR] Moving quantization logic to make future quantizer work simpler [lucene]

2024-02-08 Thread via GitHub
benwtrent merged PR #13091: URL: https://github.com/apache/lucene/pull/13091

Re: [PR] Prevent humongous allocations when calculating scalar quantiles [lucene]

2024-02-08 Thread via GitHub
benwtrent commented on code in PR #13090: URL: https://github.com/apache/lucene/pull/13090#discussion_r1483066398 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -281,24 +270,57 @@ static ScalarQuantizer fromVectors( } return new ScalarQu

Re: [PR] Prevent humongous allocations when calculating scalar quantiles [lucene]

2024-02-08 Thread via GitHub
mayya-sharipova commented on code in PR #13090: URL: https://github.com/apache/lucene/pull/13090#discussion_r1483044238 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -281,24 +270,57 @@ static ScalarQuantizer fromVectors( } return new Sc

Re: [I] Upgrade to OpenNLP 2.x and add [LUCENE-10621] [lucene]

2024-02-08 Thread via GitHub
cpoerschke commented on issue #11657: URL: https://github.com/apache/lucene/issues/11657#issuecomment-1934223088 Just noting that the _"add support for the new interface implementations in the OpenNLP analysis module that was added in #3973"_ part is not included in the scope of the #12674

[PR] Moving quantization logic to make future quantizer work simpler [lucene]

2024-02-08 Thread via GitHub
benwtrent opened a new pull request, #13091: URL: https://github.com/apache/lucene/pull/13091 While digging around and working on future scalar quantization work, I noticed that these formats are out of line with our other formats. Additionally, I am moving some things that were only

[PR] Prevent humongous allocations when calculating scalar quantiles [lucene]

2024-02-08 Thread via GitHub
benwtrent opened a new pull request, #13090: URL: https://github.com/apache/lucene/pull/13090 The initial release of scalar quantization would periodically create a humongous allocation, which can put unwarranted pressure on the GC & on the heap usage as a whole. This commit adjusts

Re: [PR] Make Lucene90 postings format to write FST off heap [lucene]

2024-02-08 Thread via GitHub
dungba88 commented on PR #12985: URL: https://github.com/apache/lucene/pull/12985#issuecomment-1934080177 > We might instead just switch to off-heap building once the expected FST size crosses a threshold? We can use createTempOutput to make temporary files as needed for the non-root FSTs t

Re: [PR] Make Lucene90 postings format to write FST off heap [lucene]

2024-02-08 Thread via GitHub
mikemccand commented on PR #12985: URL: https://github.com/apache/lucene/pull/12985#issuecomment-1934066710 > all the giant-baby FSTs Heh, this made me remember the awesome character, [Bôh](https://hero.fandom.com/wiki/B%C3%B4h), from the incredible movie [Spirited Away](https://en.w

Re: [PR] Make Lucene90 postings format to write FST off heap [lucene]

2024-02-08 Thread via GitHub
mikemccand commented on PR #12985: URL: https://github.com/apache/lucene/pull/12985#issuecomment-1934057526 `BlockTree` is kinda crazy how it builds up the final FST: each little backwards-recursive chunk of term-space stores its subset of terms into a baby FST, and then on grouping N such

Re: [PR] upgrade to OpenNLP 2.3.1 [lucene]

2024-02-08 Thread via GitHub
cpoerschke commented on code in PR #12674: URL: https://github.com/apache/lucene/pull/12674#discussion_r1407985494 ## lucene/licenses/opennlp-tools-NOTICE.txt: ## @@ -1,11 +1,101 @@ Apache OpenNLP Review Comment: https://github.com/apache/opennlp/blob/opennlp-2.3.2/NOTICE

Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2024-02-08 Thread via GitHub
mikemccand commented on PR #12980: URL: https://github.com/apache/lucene/pull/12980#issuecomment-1934021978 > I'm not sure why FSTPostingsFormat is different from the rest, in that it writes both the metadata and data to the same file. I think writing to separate files would be cleaner and more

Re: [PR] upgrade to OpenNLP 2.3.1 [lucene]

2024-02-08 Thread via GitHub
cpoerschke commented on PR #12674: URL: https://github.com/apache/lucene/pull/12674#issuecomment-1934017872 > FYI 2.3.1 was just released. 2.3.2 just released -- https://opennlp.apache.org/news/release-232.html -- I'll update this PR

Re: [PR] Throw CorruptSegmentInfoException on encountering missing segment info (_N.si) file in CheckIndex [lucene]

2024-02-08 Thread via GitHub
mikemccand commented on PR #12872: URL: https://github.com/apache/lucene/pull/12872#issuecomment-1933954630 > > Thanks @gokaai -- I'll try to review soon! > > If possible please try not to force-push: it removes the history of the past commits and makes it harder to see what changed on th

Re: [I] Index custom ordinal data in the taxonomy [lucene]

2024-02-08 Thread via GitHub
stefanvodita closed issue #12336: Index custom ordinal data in the taxonomy URL: https://github.com/apache/lucene/issues/12336

Re: [PR] Index arbitrary fields in taxonomy docs [lucene]

2024-02-08 Thread via GitHub
stefanvodita merged PR #12337: URL: https://github.com/apache/lucene/pull/12337

Re: [I] Contributing a deep-learning, BERT-based analyzer [lucene]

2024-02-08 Thread via GitHub
lmessinger commented on issue #13065: URL: https://github.com/apache/lucene/issues/13065#issuecomment-1933785298 Hi, got it. Pointing to the project from the documentation would actually be very valuable to the Hebrew community. How can that be done? Is the documentation also on

Re: [PR] Throw CorruptSegmentInfoException on encountering missing segment info (_N.si) file in CheckIndex [lucene]

2024-02-08 Thread via GitHub
gokaai commented on PR #12872: URL: https://github.com/apache/lucene/pull/12872#issuecomment-1933683163 > Thanks @gokaai -- I'll try to review soon! > > If possible please try not to force-push: it removes the history of the past commits and makes it harder to see what changed on this

[PR] Fix test failure for TestDocumentsWriterDeleteQueue.testUpdateDeleteSlices [lucene]

2024-02-08 Thread via GitHub
easyice opened a new pull request, #13089: URL: https://github.com/apache/lucene/pull/13089 Some years ago, there was a patch for `TestDocumentsWriterDeleteQueue.testUpdateDeleteSlices` (see https://issues.apache.org/jira/browse/LUCENE-4066). At the time, `DocumentsWriterDeleteQueue#num

Re: [PR] in BytesRefHash constructor avoid duplicate BytesStartArray.bytesUsed() call [lucene]

2024-02-08 Thread via GitHub
dweiss commented on PR #13032: URL: https://github.com/apache/lucene/pull/13032#issuecomment-1933543878 Thank you, @cpoerschke and sorry for the delay (I guess the automatic nagging bot is working though).

Re: [PR] in BytesRefHash constructor avoid duplicate BytesStartArray.bytesUsed() call [lucene]

2024-02-08 Thread via GitHub
dweiss merged PR #13032: URL: https://github.com/apache/lucene/pull/13032