Re: [PR] Random access term dictionary [lucene]

2023-12-14 Thread via GitHub
Tony-X commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1857417012 Here is the even more interesting stuff. After all those allocation optimizations, I also implemented the on-paper more "efficient" algorithm to intersect FST and FSA for Terms.intersect(

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-14 Thread via GitHub
daixque commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1427647228 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -60,15 +60,13 @@ public JapaneseHiraganaUppercaseFilter(

Re: [PR] Random access term dictionary [lucene]

2023-12-14 Thread via GitHub
Tony-X commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1857380213 ## Non-trivial amount of allocations for building IndexInput slice descriptions!? `jdk.internal.misc.Unsafe#allocateUninitializedArray()`. This was not trivial to find out

Re: [I] Where should we stream FST to disk directly? [lucene]

2023-12-14 Thread via GitHub
dungba88 commented on issue #12902: URL: https://github.com/apache/lucene/issues/12902#issuecomment-1857375041 I realized FSTPostingsFormat is an experimental one, which is only being used in 5 places! Those Lucene9xPostingsFormat seem to be active ones, which in turn use `Lucene90BlockTree

Re: [PR] Random access term dictionary [lucene]

2023-12-14 Thread via GitHub
Tony-X commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1857371557 Since the first working version, I iterated with a list of profiling-guided allocation optimizations, as they stood out quite obviously from the merged JFR reports (thanks to luceneutil !

Re: [I] Where should we stream FST to disk directly? [lucene]

2023-12-14 Thread via GitHub
dungba88 commented on issue #12902: URL: https://github.com/apache/lucene/issues/12902#issuecomment-1857304704 I just briefly looked at the code, but it seems `FSTTermsWriter` will write the field metadata (number of terms, term freq, doc freq, etc), FST metadata, and FST main body for each

Re: [I] Where should we stream FST to disk directly? [lucene]

2023-12-14 Thread via GitHub
dungba88 commented on issue #12902: URL: https://github.com/apache/lucene/issues/12902#issuecomment-1857254522 A candidate could be the `FSTTermsWriter`, which can help building FSTPostingsFormat with much less heap size. -- This is an automated message from the Apache Git Service. To res

Re: [I] Try out a tantivy's term dictionary format [lucene]

2023-12-14 Thread via GitHub
dungba88 commented on issue #12513: URL: https://github.com/apache/lucene/issues/12513#issuecomment-1857252458 I'm still consuming this thread, pardon me if I ask something that's already discussed. > Yes, I actually tried to use FSTPostingsFormat in the benchmarks game and I had to

Re: [I] Should reseting a ByteBlockPool zero out the buffers? [lucene]

2023-12-14 Thread via GitHub
stefanvodita commented on issue #12734: URL: https://github.com/apache/lucene/issues/12734#issuecomment-1856645411 Both the follow-up PRs are merged. I don't think it's worth pursuing this further. Closing.

Re: [I] Should reseting a ByteBlockPool zero out the buffers? [lucene]

2023-12-14 Thread via GitHub
stefanvodita closed issue #12734: Should reseting a ByteBlockPool zero out the buffers? URL: https://github.com/apache/lucene/issues/12734

Re: [I] Test failure in TestHnswFloatVectorGraph [lucene]

2023-12-14 Thread via GitHub
benwtrent commented on issue #12945: URL: https://github.com/apache/lucene/issues/12945#issuecomment-1856557362 Bumping the searched vectors to 70 from 60 makes the test pass, but this still bugs me a bit as that commit shouldn't have changed any behavior...

Re: [I] Test failure in TestHnswFloatVectorGraph [lucene]

2023-12-14 Thread via GitHub
benwtrent commented on issue #12945: URL: https://github.com/apache/lucene/issues/12945#issuecomment-1856480501 This is interesting, that commit shouldn't have changed anything, just a refactor. I have confirmed I can repeat it (after several attempts), but cannot when going to the

Re: [I] Test failure in TestKnnGraph.testMultiThreadedSearch [lucene]

2023-12-14 Thread via GitHub
benwtrent closed issue #12940: Test failure in TestKnnGraph.testMultiThreadedSearch URL: https://github.com/apache/lucene/issues/12940

Re: [PR] Fix flaky tests that are caused by small float vectors [lucene]

2023-12-14 Thread via GitHub
benwtrent merged PR #12943: URL: https://github.com/apache/lucene/pull/12943

Re: [I] Can we ban `Thread.sleep`? [lucene]

2023-12-14 Thread via GitHub
dweiss commented on issue #12946: URL: https://github.com/apache/lucene/issues/12946#issuecomment-1856319652 +1.

Re: [PR] Use group-varint encode the positions [lucene]

2023-12-14 Thread via GitHub
easyice commented on PR #12842: URL: https://github.com/apache/lucene/pull/12842#issuecomment-1856293777 Sorry for the late update! I spent some more time on another PR. I encoded the positions with group-varint when `storeOffsets` is false and there are no payloads. With the last commit, it
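
For readers unfamiliar with group-varint, here is a minimal sketch of the general technique, not the code in PR #12842. It assumes the common layout in which four ints share one tag byte and each 2-bit field of the tag stores that value's byte length minus one. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only: names are hypothetical, not Lucene's API.
final class GroupVarIntSketch {

  // Bytes needed to hold v when treated as an unsigned 32-bit value: 1..4.
  static int numBytes(int v) {
    if ((v & 0xFFFFFF00) == 0) return 1;
    if ((v & 0xFFFF0000) == 0) return 2;
    if ((v & 0xFF000000) == 0) return 3;
    return 4;
  }

  // Encodes exactly four ints: one tag byte, then each value in little-endian order.
  static byte[] encodeGroup(int[] values) {
    if (values.length != 4) {
      throw new IllegalArgumentException("need exactly 4 values");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int tag = 0;
    for (int i = 0; i < 4; i++) {
      // Two bits per value, first value in the highest bits of the tag.
      tag |= (numBytes(values[i]) - 1) << (6 - 2 * i);
    }
    out.write(tag);
    for (int v : values) {
      int n = numBytes(v);
      for (int b = 0; b < n; b++) {
        out.write((v >>> (8 * b)) & 0xFF);
      }
    }
    return out.toByteArray();
  }

  // Decodes one group produced by encodeGroup.
  static int[] decodeGroup(byte[] in) {
    int tag = in[0] & 0xFF;
    int pos = 1;
    int[] values = new int[4];
    for (int i = 0; i < 4; i++) {
      int n = ((tag >>> (6 - 2 * i)) & 0x3) + 1;
      int v = 0;
      for (int b = 0; b < n; b++) {
        v |= (in[pos++] & 0xFF) << (8 * b);
      }
      values[i] = v;
    }
    return values;
  }
}
```

Compared with a plain varint, the decoder reads all four lengths from a single byte instead of testing a continuation bit on every byte, which is the usual motivation for this encoding.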

Re: [I] Add a MergePolicy wrapper that preserves search concurrency? [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on issue #12877: URL: https://github.com/apache/lucene/issues/12877#issuecomment-1856109370 > > am I making this up? > > Ha! No, you are not hallucinating @jpountz! We do have something like this for Amazon product search -- it's crucial for our usage to keep long

Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on code in PR #12872: URL: https://github.com/apache/lucene/pull/12872#discussion_r1426907008 ## lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java: ## @@ -389,13 +389,39 @@ private static void parseSegmentInfos( } long totalDocs = 0;

Re: [I] Can we ban `Thread.sleep`? [lucene]

2023-12-14 Thread via GitHub
rmuir commented on issue #12946: URL: https://github.com/apache/lucene/issues/12946#issuecomment-1856094262 we can still ban it and just use `@SuppressWarnings` before SleepingLockWrapper or any other exceptional cases? It prevents any new sleeps from creeping in without someone thinking tw

Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on code in PR #12872: URL: https://github.com/apache/lucene/pull/12872#discussion_r1426901678 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -957,6 +974,9 @@ private Status.SegmentInfoStatus testSegment( SegmentReader reader = nu

Re: [PR] Fix position increment in (Reverse)PathHierarchyTokenizer [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on PR #12875: URL: https://github.com/apache/lucene/pull/12875#issuecomment-1856089585 > I am looking at `TestUnifiedHighlighter*` tests. Does it mean that I need to use specific fieldType? Can I use any fieldType(s) from existing `UHTestHelper.parametersFactoryList()`?

[PR] Replace usage of deprecated size() with length() in ByteBuffersDataInput [lucene]

2023-12-14 Thread via GitHub
easyice opened a new pull request, #12948: URL: https://github.com/apache/lucene/pull/12948 In https://github.com/apache/lucene/pull/12594, we marked `ByteBuffersDataInput#size()` as `Deprecated`. For simplicity, maybe we should replace the usage of deprecated `size()` with `length()`?

Re: [PR] Add Facets#getBulkSpecificValues method [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1426891701 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java: ## @@ -32,8 +32,8 @@ import org.apache.lucene.document.Field; im

Re: [I] Can we ban `Thread.sleep`? [lucene]

2023-12-14 Thread via GitHub
msokolov commented on issue #12946: URL: https://github.com/apache/lucene/issues/12946#issuecomment-1856076075 +1 this does seem to be shaking out a lot of dust

Re: [I] Can we ban `Thread.sleep`? [lucene]

2023-12-14 Thread via GitHub
msokolov commented on issue #12946: URL: https://github.com/apache/lucene/issues/12946#issuecomment-1856074053 and `TimerThread` in `TimeLimitingCollector` has this historical artifact. Maybe it's time to clean up that TODO: public void run() { while (!stop) {

Re: [I] Can we ban `Thread.sleep`? [lucene]

2023-12-14 Thread via GitHub
msokolov commented on issue #12946: URL: https://github.com/apache/lucene/issues/12946#issuecomment-1856070627 I agree with the idea, but we do have a lot of these now, e.g. SleepingLockWrapper, although I see this in its javadocs: "this is not a good idea" LOL

Re: [PR] Fix NPE on off-heap test and FST is null [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on code in PR #12894: URL: https://github.com/apache/lucene/pull/12894#discussion_r1426884836 ## lucene/test-framework/src/java/org/apache/lucene/tests/util/fst/FSTTester.java: ## @@ -283,14 +283,17 @@ public FST doTest() throws IOException { } }

Re: [PR] Fix flaky tests that are caused by small float vectors [lucene]

2023-12-14 Thread via GitHub
msokolov commented on code in PR #12943: URL: https://github.com/apache/lucene/pull/12943#discussion_r1426843989 ## lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java: ## @@ -100,22 +100,18 @@ public KnnVectorsFormat getKnnVectorsFormatForField(String field) {

Re: [I] Test failure in TestHnswFloatVectorGraph [lucene]

2023-12-14 Thread via GitHub
msokolov commented on issue #12945: URL: https://github.com/apache/lucene/issues/12945#issuecomment-1856057317 Here, `git bisect` identifies [18bb826564bb16fde70bab3c06a167280b6cc632] Extract the hnsw graph merging from being part of the vector writer (#12657) as the commit where this test

[PR] Update int array growth calls [lucene]

2023-12-14 Thread via GitHub
stefanvodita opened a new pull request, #12947: URL: https://github.com/apache/lucene/pull/12947 `LSBRadixSorter.sort` doesn't need the buffer to preserve the data that was written to it for a previous sort. `TaskSequence` doesn't need to grow arrays beyond the number of iterations i

Re: [I] Find and replace uses of unbounded array growth with `growInRange` [lucene]

2023-12-14 Thread via GitHub
stefanvodita commented on issue #12941: URL: https://github.com/apache/lucene/issues/12941#issuecomment-1856044036 I went through most of the calls to grow int arrays. There aren't a lot of places where there's an obvious way to improve, but I opened #12947 for the couple cases I spotted.

Re: [PR] Add BWC test to reveal #12895 [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on PR #12912: URL: https://github.com/apache/lucene/pull/12912#issuecomment-1856040733 > Solr used that, as Solr is no longer part of our tree we could add sleep to forbiddenapis (maybe globally for both tests and main code). Maybe Lucene does not sleep in tests, so it co

[I] Can we ban `Thread.sleep`? [lucene]

2023-12-14 Thread via GitHub
mikemccand opened a new issue, #12946: URL: https://github.com/apache/lucene/issues/12946 ### Description Spinoff from #12912. `Thread.sleep` should ideally never appear in our main and test sources. Let's add it to forbidden APIs?

Re: [PR] #12932: get monsters tests compiling/running again [lucene]

2023-12-14 Thread via GitHub
mikemccand merged PR #12942: URL: https://github.com/apache/lucene/pull/12942

[I] Test failure in TestHnswFloatVectorGraph [lucene]

2023-12-14 Thread via GitHub
msokolov opened a new issue, #12945: URL: https://github.com/apache/lucene/issues/12945 ### Description ./gradlew :lucene:core:test --tests "org.apache.lucene.util.hnsw.TestHnswFloatVectorGraph.testSortedAndUnsortedIndicesReturnSameResults" -Ptests.jvms=4 -Ptests.jvmargs= -Ptests

Re: [PR] Fix flaky tests that are caused by small float vectors [lucene]

2023-12-14 Thread via GitHub
benwtrent commented on code in PR #12943: URL: https://github.com/apache/lucene/pull/12943#discussion_r1426839005 ## lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java: ## @@ -100,22 +100,18 @@ public KnnVectorsFormat getKnnVectorsFormatForField(String field) {

Re: [PR] Fix flaky tests that are caused by small float vectors [lucene]

2023-12-14 Thread via GitHub
msokolov commented on code in PR #12943: URL: https://github.com/apache/lucene/pull/12943#discussion_r1426817545 ## lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java: ## @@ -100,22 +100,18 @@ public KnnVectorsFormat getKnnVectorsFormatForField(String field) {

Re: [I] Test failure in TestKnnGraph.testMultiThreadedSearch [lucene]

2023-12-14 Thread via GitHub
benwtrent commented on issue #12940: URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855974696 > I can see that in this test run we are using a quantizing scorer, but I don't think the test case explicitly calls for that. I wonder if we beefed up the test framework to rando

Re: [PR] Fix flaky tests that are caused by small float vectors [lucene]

2023-12-14 Thread via GitHub
benwtrent commented on code in PR #12943: URL: https://github.com/apache/lucene/pull/12943#discussion_r1426804219 ## lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java: ## @@ -100,22 +100,18 @@ public KnnVectorsFormat getKnnVectorsFormatForField(String field) {

Re: [PR] Fix flaky tests that are caused by small float vectors [lucene]

2023-12-14 Thread via GitHub
msokolov commented on code in PR #12943: URL: https://github.com/apache/lucene/pull/12943#discussion_r1426801907 ## lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java: ## @@ -100,22 +100,18 @@ public KnnVectorsFormat getKnnVectorsFormatForField(String field) {

Re: [I] Test failure in TestKnnGraph.testMultiThreadedSearch [lucene]

2023-12-14 Thread via GitHub
msokolov commented on issue #12940: URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855957479 ah, thanks @benwtrent I'll check your fix then

Re: [I] Test failure in TestKnnGraph.testMultiThreadedSearch [lucene]

2023-12-14 Thread via GitHub
msokolov commented on issue #12940: URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855955234 I can see that in this test run we are using a quantizing scorer, but I don't think the test case explicitly calls for that. I wonder if we beefed up the test framework to randomly

[PR] Improve exception handling for readLongs/readInts/readFloats in ByteBufferIndexInput [lucene]

2023-12-14 Thread via GitHub
easyice opened a new pull request, #12944: URL: https://github.com/apache/lucene/pull/12944 Currently, `readLongs/readInts/readFloats` in `ByteBufferIndexInput` may throw `NullPointerException` when the `IndexInput` is closed. The expected exception is `AlreadyClosedException`.
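
The failure mode described above can be sketched as follows. This is a minimal illustration, assuming an input that nulls its buffer on close and translates the resulting `NullPointerException` into a clearer exception. `ClosedInputSketch` and its nested exception class are hypothetical stand-ins, not the classes touched by PR #12944.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: hypothetical stand-ins, not Lucene's classes.
final class ClosedInputSketch {

  // Stand-in for Lucene's AlreadyClosedException.
  static final class AlreadyClosedException extends IllegalStateException {
    AlreadyClosedException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  private ByteBuffer buffer; // set to null by close()

  ClosedInputSketch(byte[] data) {
    this.buffer = ByteBuffer.wrap(data);
  }

  void close() {
    buffer = null;
  }

  // Bulk read. A null buffer means the input was closed, so the NPE raised by
  // dereferencing it is rethrown as AlreadyClosedException.
  void readLongs(long[] dst, int offset, int length) {
    try {
      for (int i = 0; i < length; i++) {
        dst[offset + i] = buffer.getLong();
      }
    } catch (NullPointerException e) {
      throw new AlreadyClosedException("Already closed: " + this, e);
    }
  }
}
```

Catching the exception rather than null-checking up front keeps the open-input path free of an extra branch. Whether the PR takes that approach is not stated in the snippet above.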

Re: [I] Test failure in TestKnnGraph.testMultiThreadedSearch [lucene]

2023-12-14 Thread via GitHub
benwtrent commented on issue #12940: URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855929384 https://github.com/apache/lucene/pull/12943 @msokolov LOL

Re: [I] Test failure in TestKnnGraph.testMultiThreadedSearch [lucene]

2023-12-14 Thread via GitHub
benwtrent commented on issue #12940: URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855929679 We both figured it out at the same time.

[PR] Fix flaky tests that are caused by small float vectors [lucene]

2023-12-14 Thread via GitHub
benwtrent opened a new pull request, #12943: URL: https://github.com/apache/lucene/pull/12943 While quantization generally works well, when the number of dimensions is tiny (just two like in our tests), and we are indexing a circle, and we have random merge policies, we can end up getting u

Re: [I] Test failure in TestKnnGraph.testMultiThreadedSearch [lucene]

2023-12-14 Thread via GitHub
msokolov commented on issue #12940: URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855926010 Thanks @vsop-479, it reproduces for me as well, both on main and 9x branches. The same test passes on 9.8.0 release. I'll try `git bisect` ... and it blames this commit: [a

Re: [PR] Writing a HOWTO migrate codec version [lucene]

2023-12-14 Thread via GitHub
slow-J commented on code in PR #12930: URL: https://github.com/apache/lucene/pull/12930#discussion_r1426770295 ## dev-docs/codec-version-bump-howto.md: ## @@ -0,0 +1,74 @@ + + +# Lucene Codec Version Bump How-To Manual + +Changing the name of the codec in Lucene is required for

Re: [I] Grow arrays up to a given limit to avoid overallocation where possible [lucene]

2023-12-14 Thread via GitHub
zhaih closed issue #12839: Grow arrays up to a given limit to avoid overallocation where possible URL: https://github.com/apache/lucene/issues/12839

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-14 Thread via GitHub
zhaih merged PR #12844: URL: https://github.com/apache/lucene/pull/12844
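
As a rough illustration of the idea behind this PR and issue #12839, the sketch below grows an array with some slack but never past a caller-supplied upper bound. The method name follows the PR title. The signature, the growth factor, and the class name are assumptions, and the merged implementation may differ.

```java
import java.util.Arrays;

// Illustrative sketch only: the merged Lucene helper may differ in signature and policy.
final class BoundedGrowthSketch {

  // Returns an array of at least minLength elements, over-allocating by roughly
  // one eighth but never exceeding maxLength.
  static int[] growInRange(int[] array, int minLength, int maxLength) {
    if (minLength > maxLength) {
      throw new IllegalArgumentException(
          "minLength " + minLength + " exceeds maxLength " + maxLength);
    }
    if (array.length >= minLength) {
      return array; // already large enough, nothing to copy
    }
    // Hypothetical growth policy; computed as long to avoid int overflow.
    long oversized = (long) minLength + (minLength >> 3) + 3;
    int newLength = (int) Math.min(oversized, maxLength);
    return Arrays.copyOf(array, newLength);
  }
}
```

With an unbounded growth call, a buffer that can never hold more than `maxLength` entries may still be allocated past that size. Clamping the new length removes that waste when the caller knows the limit.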

Re: [PR] #12932: get monsters tests compiling/running again [lucene]

2023-12-14 Thread via GitHub
benwtrent commented on code in PR #12942: URL: https://github.com/apache/lucene/pull/12942#discussion_r1426699287 ## lucene/core/src/test/org/apache/lucene/index/Test2BPoints.java: ## @@ -143,6 +143,6 @@ public void test2D() throws Exception { } private static Codec getC

Re: [PR] LUCENE-10475: Merge o.a.l.a.[ja|ko].util into o.a.l.a.[ja|ko].dict [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on PR #772: URL: https://github.com/apache/lucene/pull/772#issuecomment-1855783693 @mocobeta hello! I hit conflicts backporting https://github.com/apache/lucene/issues/12911 because this PR was never backported to 9.x. Is there any reason not to backport? It lo

Re: [PR] Ensure Nori/Kuromoji shipped binary FST is the latest version [lucene]

2023-12-14 Thread via GitHub
mikemccand merged PR #12933: URL: https://github.com/apache/lucene/pull/12933

Re: [I] Require bundled FSTs to be on the current FST version [lucene]

2023-12-14 Thread via GitHub
mikemccand closed issue #12911: Require bundled FSTs to be on the current FST version URL: https://github.com/apache/lucene/issues/12911

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1855769298 I opened https://github.com/mikemccand/luceneutil/issues/252 to try to measure the performance change of `addDocument` N times vs `addDocuments` once.

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1855755782 One small observation here: one can use the `add/updateDocuments` API today with no intention of using those as doc blocks at search time, purely as an optimization over calling separ

Re: [PR] Add BWC test to reveal #12895 [lucene]

2023-12-14 Thread via GitHub
uschindler commented on PR #12912: URL: https://github.com/apache/lucene/pull/12912#issuecomment-1855739201 Solr used that, as Solr is no longer part of our tree we could add sleep to forbiddenapis (maybe globally for both tests and main code). Maybe Lucene does not sleep in tests, so it coul

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1426637555 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -2164,6 +2166,83 @@ public void testSortedIndex() throws

Re: [PR] Add BWC test to reveal #12895 [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on PR #12912: URL: https://github.com/apache/lucene/pull/12912#issuecomment-1855716194 ... but at least it led to discovering a horrifying `Thread.sleep` in our test code! Can we ban `Thread.sleep` throughout our code? Or are there actually useful places for it

Re: [PR] Ensure Nori/Kuromoji shipped binary FST is the latest version [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on code in PR #12933: URL: https://github.com/apache/lucene/pull/12933#discussion_r1426611498 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1236,5 +1236,9 @@ public FSTMetadata( this.version = version; this.numBytes = numB

Re: [PR] Ensure Nori/Kuromoji shipped binary FST is the latest version [lucene]

2023-12-14 Thread via GitHub
uschindler commented on PR #12933: URL: https://github.com/apache/lucene/pull/12933#issuecomment-1855691742 > > The test is not the nicest looking thing, but I accept it, because it doesn't break classloading of resources. 👍 > > Ha! I take this as strong positive feedback @uschindler ;

Re: [PR] Ensure Nori/Kuromoji shipped binary FST is the latest version [lucene]

2023-12-14 Thread via GitHub
uschindler commented on code in PR #12933: URL: https://github.com/apache/lucene/pull/12933#discussion_r1426604242 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1236,5 +1236,9 @@ public FSTMetadata( this.version = version; this.numBytes = numB

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1426601556 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -60,15 +60,13 @@ public JapaneseHiraganaUppercaseFilt

Re: [I] Create a simple JMH benchmark to measure FST compilation / traversal times [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on issue #12884: URL: https://github.com/apache/lucene/issues/12884#issuecomment-1855669679 Thanks @dungba88.

Re: [I] Create a simple JMH benchmark to measure FST compilation / traversal times [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on issue #12884: URL: https://github.com/apache/lucene/issues/12884#issuecomment-1855669391 > I can look into this. Is this place https://github.com/apache/lucene/tree/main/lucene/benchmark/src/java/org/apache/lucene/benchmark the correct path to add the benchmark, or i

Re: [PR] Modernize LineFileDocs. [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on PR #12929: URL: https://github.com/apache/lucene/pull/12929#issuecomment-1855665643 > @mikemccand luceneutil is better at remaining up-to-date with Lucene than Lucene itself :) [mikemccand/luceneutil@76ff349](https://github.com/mikemccand/luceneutil/commit/76ff349499

Re: [PR] Ensure Nori/Kuromoji shipped binary FST is the latest version [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on PR #12933: URL: https://github.com/apache/lucene/pull/12933#issuecomment-1855660806 > The test is not the nicest looking thing, but I accept it, because it doesn't break classloading of resources. 👍 Ha! I take this as strong positive feedback @uschindler ;)

Re: [I] Build should statically detect when an invisible unicode character, such as U+200B (zero width space), sneak into our sources [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on issue #12931: URL: https://github.com/apache/lucene/issues/12931#issuecomment-1855650638 Thanks @uschindler and @rmuir!

Re: [PR] enable error-prone's DisableUnicodeInCode check [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on PR #12936: URL: https://github.com/apache/lucene/pull/12936#issuecomment-1855649826 Thanks @rmuir!

Re: [I] Should we clean up the few remaining references to `Lucene/Solr`? [lucene]

2023-12-14 Thread via GitHub
mikemccand closed issue #12934: Should we clean up the few remaining references to `Lucene/Solr`? URL: https://github.com/apache/lucene/issues/12934

Re: [PR] Attempting to clean up some remaining Solr references [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on code in PR #12939: URL: https://github.com/apache/lucene/pull/12939#discussion_r1426567725 ## gradle/help.gradle: ## @@ -46,7 +46,7 @@ configure(rootProject) { help { doLast { println "" - println "This is an experimental Lucene/Solr g

Re: [PR] Attempting to clean up some remaining Solr references [lucene]

2023-12-14 Thread via GitHub
mikemccand merged PR #12939: URL: https://github.com/apache/lucene/pull/12939

Re: [I] Let's run our Monster tests, at least once? [lucene]

2023-12-14 Thread via GitHub
mikemccand commented on issue #12932: URL: https://github.com/apache/lucene/issues/12932#issuecomment-1855574409 > For me it fails when running: `./gradlew check -Ptests.heapsize=16g -Dtests.monster=true` with It fails for me too -- it's silly (trying to write to an arbitrary "example

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-14 Thread via GitHub
stefanvodita commented on PR #12844: URL: https://github.com/apache/lucene/pull/12844#issuecomment-1855460382 Done, thank you @zhaih! I've opened #12941 to replace other uses of the unbounded growth API.