Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-13 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1855355717 After copy-pasting the code into `MemorySegmentIndexInput`, they work well: java19: ``` GroupVIntBenchmark.benchMMapDirectoryInputs_readGroupVInt64 thrpt 5 9

Re: [PR] Writing a HOWTO migrate codec version [lucene]

2023-12-13 Thread via GitHub
shubhamvishu commented on code in PR #12930: URL: https://github.com/apache/lucene/pull/12930#discussion_r1426159471 ## dev-docs/codec-version-bump-howto.md: ## @@ -0,0 +1,74 @@ + + +# Lucene Codec Version Bump How-To Manual + +Changing the name of the codec in Lucene is require

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-13 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1855104523 > Let's keep ByteBufferIndexInput (without s) as is. Maybe work on that later. We would need to figure out what is causing slowness here. Yes, it is very strange. I have spent some

[I] Test failure in TestKnnGraph.testMultiThreadedSearch [lucene]

2023-12-13 Thread via GitHub
vsop-479 opened a new issue, #12940: URL: https://github.com/apache/lucene/issues/12940 ### Description @msokolov Please take a look when you get a chance! ### Gradle command to reproduce ./gradlew test --tests TestKnnGraph.testMultiThreadedSearch -Dtests.seed=15C41D6B33

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-13 Thread via GitHub
zhaih commented on PR #12844: URL: https://github.com/apache/lucene/pull/12844#issuecomment-1854889857 @stefanvodita Could you move the change entry to 9.10? Then I can merge it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] Fix bug where NFARunAutomaton#getTransition does not set Transition correctly [lucene]

2023-12-13 Thread via GitHub
zhaih commented on code in PR #12909: URL: https://github.com/apache/lucene/pull/12909#discussion_r1426010905 ## lucene/core/src/test/org/apache/lucene/util/automaton/TestNFARunAutomaton.java: ## @@ -73,6 +73,37 @@ public void testWithRandomRegex() { } } + public void

Re: [PR] Refactor around NeighborArray [lucene]

2023-12-13 Thread via GitHub
zhaih merged PR #12910: URL: https://github.com/apache/lucene/pull/12910

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-13 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1854877951 Let's keep ByteBufferIndexInput (without s) as is. Maybe work on that later. We would need to figure out what is causing slowness here. So revert the change and copy paste code

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-13 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1854874240 I checked: the NIOFS IndexInput still uses an on-heap buffer, so I have no idea why it is slower in that case.

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-13 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1854866407 > I agree with you: getInt() seems more expensive than `MemorySegment`. I prefer to revert the change to `ByteBufferIndexInput`, then make a similar improvement on java20, java19 (to
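
For readers unfamiliar with the encoding discussed in the group-varint threads above, here is a minimal, generic sketch of the technique: four ints share one flag byte that records how many bytes (1 to 4) each value occupies, followed by the values themselves. The class name `GroupVarIntSketch`, the flag-bit order, and the little-endian byte order are assumptions made for illustration; this is not the code from PR #12841 and is not claimed to match Lucene's on-disk format.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of group-varint; not Lucene's implementation.
final class GroupVarIntSketch {

  /** Number of bytes (1 to 4) needed to hold v when treated as unsigned. */
  static int numBytes(int v) {
    if ((v >>> 8) == 0) return 1;
    if ((v >>> 16) == 0) return 2;
    if ((v >>> 24) == 0) return 3;
    return 4;
  }

  /** Encodes exactly four ints as one flag byte plus 4 to 16 data bytes. */
  static byte[] encode(int[] values) {
    if (values.length != 4) {
      throw new IllegalArgumentException("expected exactly 4 values");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int flag = 0;
    for (int v : values) {
      // Two bits per value store (length - 1); the first value ends up in the top bits.
      flag = (flag << 2) | (numBytes(v) - 1);
    }
    out.write(flag);
    for (int v : values) {
      // Write only the bytes that are needed, lowest byte first.
      for (int n = numBytes(v); n > 0; n--) {
        out.write(v & 0xFF);
        v >>>= 8;
      }
    }
    return out.toByteArray();
  }

  /** Decodes one group of four ints produced by encode. */
  static int[] decode(byte[] in) {
    int flag = in[0] & 0xFF;
    int pos = 1;
    int[] values = new int[4];
    for (int i = 0; i < 4; i++) {
      int n = ((flag >>> (6 - 2 * i)) & 0x3) + 1; // byte length of value i
      int v = 0;
      for (int b = 0; b < n; b++) {
        v |= (in[pos++] & 0xFF) << (8 * b);
      }
      values[i] = v;
    }
    return values;
  }
}
```

Because the lengths are read from a single flag byte up front, a decoder avoids the per-byte continuation-bit test of a plain varint, which is why where the decoding loop lives (and how the underlying buffer is read) can matter for throughput.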

Re: [I] Let's run our Monster tests, at least once? [lucene]

2023-12-13 Thread via GitHub
gsmiller commented on issue #12932: URL: https://github.com/apache/lucene/issues/12932#issuecomment-1854692766 @slow-J I think you’re hitting a known issue on AWS ec2 hosts with that exception. Have you tried on a non-ec2 host?

Re: [I] Let's run our Monster tests, at least once? [lucene]

2023-12-13 Thread via GitHub
slow-J commented on issue #12932: URL: https://github.com/apache/lucene/issues/12932#issuecomment-1854630320 Is there a specific method to follow for running `Test20NewsgroupsClassification`? For me it fails when running: `./gradlew check -Ptests.heapsize=16g -Dtests.monster=true` with

Re: [PR] Attempting to clean up some remaining Solr references [lucene]

2023-12-13 Thread via GitHub
slow-J commented on PR #12939: URL: https://github.com/apache/lucene/pull/12939#issuecomment-1854550113 > Looks good to me. If anything removed is used by Solr then it's perhaps high time to tweak it over downstream. Thanks for the review @dweiss!

Re: [PR] Fixing some potential cases of null value dereference (committing 8 year old patch) [lucene]

2023-12-13 Thread via GitHub
slow-J commented on PR #12935: URL: https://github.com/apache/lucene/pull/12935#issuecomment-1854467696 Cancelling as per the discussion in https://github.com/apache/lucene/issues/7721#issuecomment-1854032554

Re: [PR] Fixing some potential cases of null value dereference (committing 8 year old patch) [lucene]

2023-12-13 Thread via GitHub
slow-J closed pull request #12935: Fixing some potential cases of null value dereference (committing 8 year old patch) URL: https://github.com/apache/lucene/pull/12935

Re: [I] Null value dereference [LUCENE-6663] [lucene]

2023-12-13 Thread via GitHub
msokolov commented on issue #7721: URL: https://github.com/apache/lucene/issues/7721#issuecomment-1854464015 thanks for the ping, I explained why we won't do this above

Re: [I] Null value dereference [LUCENE-6663] [lucene]

2023-12-13 Thread via GitHub
msokolov closed issue #7721: Null value dereference [LUCENE-6663] URL: https://github.com/apache/lucene/issues/7721

Re: [PR] Attempting to clean up some remaining Solr references [lucene]

2023-12-13 Thread via GitHub
dweiss commented on code in PR #12939: URL: https://github.com/apache/lucene/pull/12939#discussion_r1425698715 ## gradle/help.gradle: ## @@ -46,7 +46,7 @@ configure(rootProject) { help { doLast { println "" - println "This is an experimental Lucene/Solr gradl

Re: [I] Null value dereference [LUCENE-6663] [lucene]

2023-12-13 Thread via GitHub
slow-J commented on issue #7721: URL: https://github.com/apache/lucene/issues/7721#issuecomment-1854443688 > If this has been open for 8 years, yet we have never had an NPE resulting from missing these checks, then I think we can conclude the checks are not needed. As a rule, we only need

[PR] Attempting to clean up some remaining Solr references [lucene]

2023-12-13 Thread via GitHub
slow-J opened a new pull request, #12939: URL: https://github.com/apache/lucene/pull/12939 Cleaned up some old references to Solr in the codebase, including references to some Solr packages and directories in tests. Please let me know if these changes would affect anything in Solr itself?

Re: [I] Build should statically detect when an invisible unicode character, such as U+200B (zero width space), sneak into our sources [lucene]

2023-12-13 Thread via GitHub
uschindler commented on issue #12931: URL: https://github.com/apache/lucene/issues/12931#issuecomment-1854281422 To give a bit of context: I would also add ZWNJ (and if I do, 3 tests trigger the validation violation). But Robert does not like it because he wants to write tests without escap

Re: [PR] Prevent the common zero-width code points and detect invalid UTF-8 encoding in our sources and selected resource files [lucene]

2023-12-13 Thread via GitHub
uschindler merged PR #12937: URL: https://github.com/apache/lucene/pull/12937

[PR] Made the UnifiedHighlighter's hasUnrecognizedQuery function processes FunctionQuery the same way as MatchAllDocsQuery and MatchNoDocsQuery queries for performance reasons. [lucene]

2023-12-13 Thread via GitHub
ljak opened a new pull request, #12938: URL: https://github.com/apache/lucene/pull/12938 ### Description At [Lexum](https://lexum.com/en/), we deploy Solr with a slightly modified version of the UnifiedHighlighter: the `FunctionQuery` type is added as a knownLeaf to the `QueryVisitor

Re: [I] Reproducible failure in TestFloatVectorSimilarityQuery.testApproximate [lucene]

2023-12-13 Thread via GitHub
benwtrent closed issue #12921: Reproducible failure in TestFloatVectorSimilarityQuery.testApproximate URL: https://github.com/apache/lucene/issues/12921

Re: [I] Reproducible failure in TestFloatVectorSimilarityQuery.testApproximate [lucene]

2023-12-13 Thread via GitHub
benwtrent commented on issue #12921: URL: https://github.com/apache/lucene/issues/12921#issuecomment-1854147565 Fixed via: https://github.com/apache/lucene/pull/12922

Re: [PR] Fix failing BaseVectorSimilarityQueryTestCase#testApproximate [lucene]

2023-12-13 Thread via GitHub
benwtrent merged PR #12922: URL: https://github.com/apache/lucene/pull/12922

Re: [I] Improve bytes copy in NodeHash [lucene]

2023-12-13 Thread via GitHub
dungba88 closed issue #12760: Improve bytes copy in NodeHash URL: https://github.com/apache/lucene/issues/12760

Re: [I] Build should statically detect when an invisible unicode character, such as U+200B (zero width space), sneak into our sources [lucene]

2023-12-13 Thread via GitHub
uschindler commented on issue #12931: URL: https://github.com/apache/lucene/issues/12931#issuecomment-1853992568 I will merge the other PR soon, will just make reporting better. The other PR also detects invalid UTF-8.

Re: [PR] Prevent the common zero-width code points and detect invalid UTF-8 encoding in our sources and selected resource files [lucene]

2023-12-13 Thread via GitHub
uschindler commented on code in PR #12937: URL: https://github.com/apache/lucene/pull/12937#discussion_r1425402899 ## lucene/queryparser/docs/xml/cctree.js: ## @@ -1,16 +1,16 @@ /* This code is based on the one originally provided by - Geir Landr� in his dTree 2.05 package. Y

Re: [PR] Prevent the common zero-width code points and detect invalid UTF-8 encoding in our sources and selected resource files [lucene]

2023-12-13 Thread via GitHub
uschindler commented on PR #12937: URL: https://github.com/apache/lucene/pull/12937#issuecomment-1853977504 Thanks. I will merge this later to 9.x and main. I'd like to make the reporting a bit better. When a regex matches I'd like to show the fragment around (similar to highlighting).

Re: [I] Build should statically detect when an invisible unicode character, such as U+200B (zero width space), sneak into our sources [lucene]

2023-12-13 Thread via GitHub
rmuir commented on issue #12931: URL: https://github.com/apache/lucene/issues/12931#issuecomment-1853928042 I'm OK with the ZWSP and BOM marker, just not other characters that one might consider invisible, such as ZWJ. I want the tests to use human-readable strings. Sorry, I'll be sticking t

Re: [PR] Prevent the common zero-width code points and detect invalid UTF-8 encoding in our sources and selected resource files [lucene]

2023-12-13 Thread via GitHub
rmuir commented on code in PR #12937: URL: https://github.com/apache/lucene/pull/12937#discussion_r1425354197 ## lucene/queryparser/docs/xml/cctree.js: ## @@ -1,16 +1,16 @@ /* This code is based on the one originally provided by - Geir Landr� in his dTree 2.05 package. You ca

Re: [PR] Prevent the common zero-width code points and detect invalid UTF-8 encoding in our sources and selected resource files [lucene]

2023-12-13 Thread via GitHub
rmuir commented on code in PR #12937: URL: https://github.com/apache/lucene/pull/12937#discussion_r1425346463 ## lucene/queryparser/docs/xml/cctree.js: ## @@ -1,16 +1,16 @@ /* This code is based on the one originally provided by - Geir Landr� in his dTree 2.05 package. You ca

Re: [PR] enable error-prone's DisableUnicodeInCode check [lucene]

2023-12-13 Thread via GitHub
rmuir merged PR #12936: URL: https://github.com/apache/lucene/pull/12936

Re: [I] Build should statically detect when an invisible unicode character, such as U+200B (zero width space), sneak into our sources [lucene]

2023-12-13 Thread via GitHub
rmuir closed issue #12931: Build should statically detect when an invisible unicode character, such as U+200B (zero width space), sneak into our sources URL: https://github.com/apache/lucene/issues/12931

Re: [PR] Modernize LineFileDocs. [lucene]

2023-12-13 Thread via GitHub
jpountz commented on PR #12929: URL: https://github.com/apache/lucene/pull/12929#issuecomment-1853880383 @mikemccand luceneutil is better at remaining up-to-date with Lucene than Lucene itself :) https://github.com/mikemccand/luceneutil/commit/76ff349499fe6226c9ea74c37dd2fa9db3a46272 http

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-13 Thread via GitHub
kuramitsu commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r1421180794 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseReadingFormFilter.java: ## @@ -43,10 +43,30 @@ public JapaneseReadingFormFilter(TokenStrea

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-13 Thread via GitHub
kuramitsu commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r1421172943 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseReadingFormFilter.java: ## @@ -43,10 +43,30 @@ public JapaneseReadingFormFilter(TokenStrea

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-13 Thread via GitHub
kuramitsu commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r1425322578 ## lucene/analysis/kuromoji/src/test/org/apache/lucene/analysis/ja/TestJapaneseReadingFormFilter.java: ## @@ -88,6 +88,11 @@ protected TokenStreamComponents createCo

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-13 Thread via GitHub
kuramitsu commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r1425312895 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseReadingFormFilter.java: ## @@ -43,10 +43,38 @@ public JapaneseReadingFormFilter(TokenStrea

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-13 Thread via GitHub
kuramitsu commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r142534 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseReadingFormFilter.java: ## @@ -43,10 +43,38 @@ public JapaneseReadingFormFilter(TokenStrea

Re: [PR] Simple patch to prevent the common zero-width code points in our source and some types of resource files [lucene]

2023-12-13 Thread via GitHub
uschindler commented on PR #12937: URL: https://github.com/apache/lucene/pull/12937#issuecomment-1853848495 Hi, I further improved the source patterns checker and found another violation: ``` Execution failed for task ':lucene:queryparser:validateSourcePatterns'. > Found 2 sourc

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-13 Thread via GitHub
dungba88 commented on PR #12915: URL: https://github.com/apache/lucene/pull/12915#issuecomment-1853744401 Besides the optimization of manipulating the internal byte[] directly, I think this is good to go.

Re: [I] Build should statically detect when an invisible unicode character, such as U+200B (zero width space), sneak into our sources [lucene]

2023-12-13 Thread via GitHub
uschindler commented on issue #12931: URL: https://github.com/apache/lucene/issues/12931#issuecomment-1853466501 Hi, here is my simplest one to just prevent 2 unicode characters: #12937 In `validateSourcePatterns` we have more of those. It also detected the violation I introduced y

[PR] Simple patch to prevent the common zero-width code points in our source and some types of resource files [lucene]

2023-12-13 Thread via GitHub
uschindler opened a new pull request, #12937: URL: https://github.com/apache/lucene/pull/12937 This would have prevented the problem with our gradle file. This is just another extension for `validateSourcePatterns` task. We have other bad codepoints already in there, so I'd suggest to add t
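
The kind of check described in this thread can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `SourceCharCheck`, the choice of U+200B and U+FEFF as the forbidden code points, and the message format are hypothetical, and the real `validateSourcePatterns` task is a Gradle build script whose actual patterns are not shown in this listing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; not the code from PR #12937.
final class SourceCharCheck {

  // Assumed forbidden set: U+200B (zero width space) and U+FEFF (byte order mark).
  static final int[] FORBIDDEN = {0x200B, 0xFEFF};

  /** Returns one message per violation found in the given file content. */
  static List<String> check(byte[] content) {
    List<String> violations = new ArrayList<>();
    String text;
    try {
      // A strict decoder reports malformed bytes instead of silently replacing them.
      text = StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(content))
          .toString();
    } catch (CharacterCodingException e) {
      violations.add("invalid UTF-8 encoding");
      return violations;
    }
    int line = 1;
    for (int i = 0; i < text.length(); ) {
      int cp = text.codePointAt(i);
      if (cp == '\n') {
        line++;
      }
      for (int bad : FORBIDDEN) {
        if (cp == bad) {
          violations.add(String.format(Locale.ROOT, "line %d: U+%04X", line, cp));
        }
      }
      i += Character.charCount(cp);
    }
    return violations;
  }
}
```

Decoding strictly before scanning covers both concerns raised in the thread: a file that is not valid UTF-8 fails immediately, and a valid file is then scanned code point by code point for the forbidden characters.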

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-13 Thread via GitHub
ChrisHegarty closed issue #12895: Corruption read on term dictionaries in Lucene 9.9 URL: https://github.com/apache/lucene/issues/12895

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-13 Thread via GitHub
ChrisHegarty commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1853461139 Closing as all work has been done.