Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423575442 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software F

Re: [PR] IntersectTermsEnum should accumulate from output prefix instead of current output [lucene]

2023-12-11 Thread via GitHub
gf2121 commented on code in PR #12900: URL: https://github.com/apache/lucene/pull/12900#discussion_r1423547781 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnumFrame.java: ## @@ -89,6 +89,9 @@ final class IntersectTermsEnumFrame { final

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1851433406 Could we consider not changing `MemorySegmentIndexInput` for Java 19 and Java 20? Since it is a preview feature there, it seems reasonable that we only do optimizations in higher versions, and they ar

[PR] Fix failing BaseVectorSimilarityQueryTestCase#testApproximate [lucene]

2023-12-11 Thread via GitHub
kaivalnp opened a new pull request, #12922: URL: https://github.com/apache/lucene/pull/12922 Discovered in #12921, and introduced in #12679. The first issue is that we weren't advancing the `VectorScorer` [here](https://github.com/apache/lucene/blob/cf13a9295052288b748ed8f279f05ee26f3

Re: [PR] IntersectTermsEnum should accumulate from output prefix instead of current output [lucene]

2023-12-11 Thread via GitHub
gf2121 commented on code in PR #12900: URL: https://github.com/apache/lucene/pull/12900#discussion_r1423505676 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnumFrame.java: ## @@ -89,6 +89,9 @@ final class IntersectTermsEnumFrame { final

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
daixque commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423470044 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
daixque commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423469570 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseKatakanaUppercaseFilter.java: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] IntersectTermsEnum should accumulate from output prefix instead of current output [lucene]

2023-12-11 Thread via GitHub
gf2121 commented on code in PR #12900: URL: https://github.com/apache/lucene/pull/12900#discussion_r1423493276 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnum.java: ## @@ -198,6 +204,7 @@ private IntersectTermsEnumFrame pushFrame(int state)

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
dweiss commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1851380728 I agree, I think it's a poor solution to something that is probably not a problem in the first place... -- This is an automated message from the Apache Git Service. To respond to t

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
daixque commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423482123 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1851319229 > I have a better idea. Let's keep the 2 different methods, but do another trick: With this great idea, the performance comes back! java21 ``` Benchmark

Re: [PR] Optimize FST on-heap BytesReader [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1423403578 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { froz

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423402789 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423384044 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseKatakanaUppercaseFilter.java: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423383461 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseKatakanaUppercaseFilter.java: ## @@ -0,0 +1,83 @@ +package org.apache.lucene.analysis.ja; +

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423382747 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423381320 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423380431 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software F

[I] Reproducible failure in TestIndexWriter.testHasUncommittedChanges [lucene]

2023-12-11 Thread via GitHub
easyice opened a new issue, #12921: URL: https://github.com/apache/lucene/issues/12921 ### Description Seems to be related to https://github.com/apache/lucene/pull/12679 ### Gradle command to reproduce ./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestF

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r1423374521 ## lucene/analysis/kuromoji/src/test/org/apache/lucene/analysis/ja/TestJapaneseReadingFormFilter.java: ## @@ -88,6 +88,11 @@ protected TokenStreamComponents createCom

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r1423370558 ## lucene/analysis/kuromoji/src/test/org/apache/lucene/analysis/ja/TestJapaneseReadingFormFilter.java: ## @@ -88,6 +88,11 @@ protected TokenStreamComponents createCom

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r1423365449 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseReadingFormFilter.java: ## @@ -43,10 +43,38 @@ public JapaneseReadingFormFilter(TokenStream
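
The diff for this fix is truncated here, so the following is only an illustration of one common preprocessing step, not necessarily what PR #12885 does. Assuming the romanization step expects katakana input, hiragana letters in the range U+3041..U+3096 can be shifted to their katakana counterparts by adding 0x60 before romanizing. The class and method names are hypothetical.

```java
final class KanaShift {
  // Offset between a hiragana letter and its katakana counterpart
  // (e.g. U+3042 あ -> U+30A2 ア).
  private static final int HIRAGANA_TO_KATAKANA = 0x60;

  // Converts hiragana letters (U+3041..U+3096) to katakana; other characters are unchanged.
  static String hiraganaToKatakana(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c >= '\u3041' && c <= '\u3096') {
        sb.append((char) (c + HIRAGANA_TO_KATAKANA));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}
```

For example, `hiraganaToKatakana("ひらがな")` returns `"ヒラガナ"`, while Latin text and existing katakana pass through unchanged.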

Re: [PR] Make TestDrillSideways#testCollectionTerminated less strict [lucene]

2023-12-11 Thread via GitHub
gsmiller merged PR #12920: URL: https://github.com/apache/lucene/pull/12920

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-11 Thread via GitHub
dungba88 commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1423317624 ## lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java: ## @@ -330,15 +330,36 @@ public static int[] growExact(int[] array, int newLength) { return copy;
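
The body of the new `growInRange` helper is not shown in this excerpt. A minimal sketch of the idea, assuming the helper takes a minimum and a maximum length: grow with some over-allocation so repeated appends stay amortized, but never allocate past the caller-supplied upper bound. The class name and the 1/8 growth factor are assumptions for illustration, not Lucene's `ArrayUtil` implementation.

```java
import java.util.Arrays;

final class GrowInRangeSketch {
  // Returns an array with length >= minLength, over-allocating by ~1/8
  // (hypothetical policy) but never exceeding maxLength.
  static int[] growInRange(int[] array, int minLength, int maxLength) {
    if (minLength > maxLength) {
      throw new IllegalArgumentException(
          "minLength (" + minLength + ") > maxLength (" + maxLength + ")");
    }
    if (array.length >= minLength) {
      return array; // already large enough, no copy needed
    }
    long oversized = (long) minLength + (minLength >>> 3); // long avoids int overflow
    int newLength = (int) Math.min(oversized, maxLength);  // cap at the upper bound
    return Arrays.copyOf(array, newLength);
  }
}
```

With `minLength = 10` and a generous bound the array grows to 11; with `maxLength = 10` it grows to exactly 10, which is the over-allocation the cap is meant to avoid.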

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
daixque commented on PR #12915: URL: https://github.com/apache/lucene/pull/12915#issuecomment-1851177452 Hi @mikemccand and @kojisekig, thank you for your reviews. I updated some code following the comments and added lines to module-info and resources to make `gradle check` green.

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
daixque commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423277326 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseKatakanaUppercaseFilter.java: ## @@ -0,0 +1,83 @@ +package org.apache.lucene.analysis.ja; +

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
daixque commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423277099 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,65 @@ +package org.apache.lucene.analysis.ja; R

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
daixque commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1423277455 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,65 @@ +package org.apache.lucene.analysis.ja; +

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
kojisekig commented on PR #12915: URL: https://github.com/apache/lucene/pull/12915#issuecomment-1851116639 From a Japanese perspective, the necessity sounds reasonable. Thank you for the contribution!

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1851115440 Thanks for the detailed description, got it! Looks better :)

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1851098019 Hi @easyice, I have a better idea. Let's keep the 2 different methods, but do another trick: - The base class DataInput implements the public outer loop as a final implementation,
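
The shape of that proposal, a fixed outer loop that decodes full groups of four with group varint and falls back to plain vInts for the tail, can be sketched without Lucene. This is a minimal standalone sketch, not Lucene's `DataInput`/`DataOutput` code: the class name `GroupVarint` is hypothetical, and the layout (one flag byte holding a 2-bit "length minus one" per value, followed by four little-endian values of 1 to 4 bytes each) is one common group-varint layout assumed here, not necessarily Lucene's exact on-disk format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

final class GroupVarint {
  // Bytes (1-4) needed to hold v as an unsigned 32-bit value.
  static int numBytes(int v) {
    return Math.max(1, 4 - Integer.numberOfLeadingZeros(v) / 8);
  }

  // Plain variable-length int (7 bits per byte, high bit = continuation) for the tail.
  static void writeVInt(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  static int readVInt(ByteBuffer in) {
    int b = in.get();
    int v = b & 0x7F;
    for (int shift = 7; b < 0; shift += 7) { // b < 0: continuation bit was set
      b = in.get();
      v |= (b & 0x7F) << shift;
    }
    return v;
  }

  static byte[] encode(int[] values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int i = 0;
    for (; i + 4 <= values.length; i += 4) {
      int flag = 0;
      for (int j = 0; j < 4; j++) {
        flag |= (numBytes(values[i + j]) - 1) << (j * 2); // 2 bits per value
      }
      out.write(flag);
      for (int j = 0; j < 4; j++) {
        int v = values[i + j];
        for (int k = 0; k < numBytes(v); k++) {
          out.write(v >>> (8 * k)); // little-endian, low byte first
        }
      }
    }
    for (; i < values.length; i++) {
      writeVInt(out, values[i]); // leftover values that do not fill a group
    }
    return out.toByteArray();
  }

  // Fixed outer loop: full groups via group varint, then the tail via vInt.
  static int[] decode(ByteBuffer in, int count) {
    int[] dst = new int[count];
    int i = 0;
    for (; i + 4 <= count; i += 4) {
      int flag = in.get() & 0xFF;
      for (int j = 0; j < 4; j++) {
        int len = ((flag >>> (j * 2)) & 0x3) + 1;
        int v = 0;
        for (int k = 0; k < len; k++) {
          v |= (in.get() & 0xFF) << (8 * k);
        }
        dst[i + j] = v;
      }
    }
    for (; i < count; i++) {
      dst[i] = readVInt(in);
    }
    return dst;
  }
}
```

Encoding `{1, 300, 70000, 0x12345678, 5, 1 << 20}` takes 15 bytes: a flag byte plus 1 + 2 + 3 + 4 value bytes for the group, then 1 and 3 vInt bytes for the two tail values.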

Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-12-11 Thread via GitHub
epotyom commented on issue #12180: URL: https://github.com/apache/lucene/issues/12180#issuecomment-1851065973 @mikemccand, > can you open a new issue for the followon tasks?

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-11 Thread via GitHub
epotyom commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1851062374 I see random test failures that could be related to this change: ``` > java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 123 > at

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1851052636 I don't understand the problem they'd like to solve. Ant's DirectoryScanner class is able to find those loops; otherwise Gradle or Ant would have the same problem. Se

[I] Add Facets#getBulkSpecificValues [lucene]

2023-12-11 Thread via GitHub
epotyom opened a new issue, #12919: URL: https://github.com/apache/lucene/issues/12919 ### Description In #12180 we added the TaxonomyReader#getBulkOrdinals method. Opening a separate issue for the 2 remaining tasks from #12180: 1. (#12862) Add `Facets#getBulkSpecificValues` method

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
dweiss commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850924293 I asked Infra, here: https://issues.apache.org/jira/browse/INFRA-25269

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850883618 On Policeman Jenkins I would be able to pass sysprops using the shell script, but that's non-standard. By default it works with SSH automatically.

Re: [PR] Fix bug where NFARunAutomaton#getTransition does not set Transition correctly [lucene]

2023-12-11 Thread via GitHub
Tony-X commented on code in PR #12909: URL: https://github.com/apache/lucene/pull/12909#discussion_r1423089503 ## lucene/core/src/test/org/apache/lucene/util/automaton/TestNFARunAutomaton.java: ## @@ -73,6 +73,37 @@ public void testWithRandomRegex() { } } + public voi

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850836944 > https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.x/7134/consoleText > > So, is the trick you found out applicable to Apache's CI? Could infra help us with that

Re: [I] JVM SIGSEGV crash when compiling computeCommonPrefixLengthAndBuildHistogram Lucene 9.9.0 [lucene]

2023-12-11 Thread via GitHub
ChrisHegarty closed issue #12898: JVM SIGSEGV crash when compiling computeCommonPrefixLengthAndBuildHistogram Lucene 9.9.0 URL: https://github.com/apache/lucene/issues/12898

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
dweiss commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850813067 https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.x/7134/consoleText So, is the trick you found out applicable to Apache's CI? Could infra help us with that somehow?

Re: [PR] [branch_9_9] Reflow computeCommonPrefixLengthAndBuildHistogram to avoid crash [lucene]

2023-12-11 Thread via GitHub
ChrisHegarty merged PR #12903: URL: https://github.com/apache/lucene/pull/12903

Re: [PR] Reflow computeCommonPrefixLengthAndBuildHistogram to avoid crash [lucene]

2023-12-11 Thread via GitHub
ChrisHegarty merged PR #12905: URL: https://github.com/apache/lucene/pull/12905

Re: [PR] Allow FST builder to use different writer (alternative reverse BytesReader) [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12879: URL: https://github.com/apache/lucene/pull/12879#issuecomment-1850764585 Actually 300 -> 245 seconds is quite a performance gain. It seems FST's own store is quite a bit faster than the `ByteBuffer` backed store. I think we should make this change? I'll

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-11 Thread via GitHub
benwtrent merged PR #12679: URL: https://github.com/apache/lucene/pull/12679

Re: [PR] Remove some redundant modifiers from code [lucene]

2023-12-11 Thread via GitHub
gsmiller commented on PR #12880: URL: https://github.com/apache/lucene/pull/12880#issuecomment-1850668550 Went ahead and merged this onto main/branch_9x since I saw some support for this and no real blocking concerns.

[I] Write a HOWTO migrate Codec format version [lucene]

2023-12-11 Thread via GitHub
slow-J opened a new issue, #12918: URL: https://github.com/apache/lucene/issues/12918 ### Description Changing PFOR encoding to FOR for doc blocks in https://github.com/apache/lucene/pull/12741 required bumping the Codec version from 95 to 99. This is not a stra

Re: [PR] Remove some redundant modifiers from code [lucene]

2023-12-11 Thread via GitHub
gsmiller merged PR #12880: URL: https://github.com/apache/lucene/pull/12880

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850602851 On Policeman Jenkins, the stack traces are gone, and so is the message (Main, Mac, Win).

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1850587232 Otherwise I like the new code very much. Also, this is a micro-benchmark, so the differences in speed won't be visible in query benchmarks.

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850576793 I figured it out: the stack trace is only printed when running remotely. The master agent does not print the stack trace. Too bad. On Policeman Jenkins Windows and Jenkins Mac logs you

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850567620 Looks like you can't pass sysprops to individual workers. There are different types of workers (I use the shell script launcher, as it works better with VirtualBox), but the default

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850559511 Let me wait and see on Policeman Jenkins how the global property affects the workers (internally they are still called slaves). There might be the option to ask special syspr

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850553512 This is a global option; you can only change it for the main Jenkins. The workers are started remotely and get the same flag over the wire on startup.

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
dweiss commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850545653 Thanks, Uwe! The problem is on ASF infrastructure: that's where I see those exceptions most often in my emails/log messages. I don't know if there is a way to tweak it just for Lucene wo

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850536421 But I can't change ASF Jenkins. In addition, Policeman Jenkins did not show a stack trace. It only had the message with 1 files. I have the feeling ASF Jenkins is an older versio

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850529839 Hi, I found the problem: /etc/default/jenkins is no longer read by Jenkins under systemd. I moved the settings to an override file (and deleted the defaults file). Now it looks fine on Po

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1850511554 Thank you @uschindler, I was thinking about that too; I will try a second lambda tomorrow.

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz merged PR #12908: URL: https://github.com/apache/lucene/pull/12908

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1422804214 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,65 @@ +package org.apache.lucene.analysis.ja;

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1850500493 Hi, the problem with MMapDir is that the seek method also has to update the current block number. Maybe we pass a second lambda to update the position? Let's just try this out!

Re: [PR] Refactor around NeighborArray [lucene]

2023-12-11 Thread via GitHub
benwtrent commented on code in PR #12910: URL: https://github.com/apache/lucene/pull/12910#discussion_r1422799676 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -201,9 +225,69 @@ private int descSortFindRightMostInsertionPoint(float newScore, int
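
The diff context names `descSortFindRightMostInsertionPoint(float newScore, int ...`, but its full signature and body are truncated here. A hypothetical standalone sketch of what such a search typically does: binary search over scores sorted in descending order, returning the rightmost position where `newScore` can be inserted while keeping the order, so existing entries with an equal score stay ahead of the new one. The class name, method name, and `size` parameter are assumptions, not Lucene's `NeighborArray` API.

```java
final class DescSortInsertion {
  // scores[0..size) is sorted in descending order. Returns the rightmost index at
  // which newScore can be inserted without breaking the order, i.e. the number of
  // elements that are >= newScore.
  static int rightmostInsertionPoint(float[] scores, int size, float newScore) {
    int lo = 0;
    int hi = size; // search window is [lo, hi)
    while (lo < hi) {
      int mid = (lo + hi) >>> 1; // unsigned shift avoids overflow
      if (scores[mid] >= newScore) {
        lo = mid + 1; // equal scores stay to the left of the new element
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}
```

For scores `{0.9f, 0.7f, 0.7f, 0.5f}`, inserting `0.7f` returns 3 (after both existing 0.7 entries), `1.0f` returns 0, and `0.1f` returns 4.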

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850476822 Hi @dweiss: I am a bit confused: on Policeman Jenkins the limit is already raised: `-Dhudson.FilePath.VALIDATE_ANT_FILE_MASK_BOUND=6`. This is in `/etc/default/jenkins`

Re: [I] IntTaxonomyFacets chooses dense values array when FacetsCollector has no MatchingDocs [lucene]

2023-12-11 Thread via GitHub
gsmiller closed issue #12558: IntTaxonomyFacets chooses dense values array when FacetsCollector has no MatchingDocs URL: https://github.com/apache/lucene/issues/12558

Re: [I] Reproducible TestDrillSideways failure [lucene]

2023-12-11 Thread via GitHub
gsmiller closed issue #12418: Reproducible TestDrillSideways failure URL: https://github.com/apache/lucene/issues/12418

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850470237 If you set the bound to `Integer#MAX_VALUE` then it uses the default directory scanner without time-based limitations: https://github.com/jenkinsci/jenkins/blob/f9a777bc682963de46403

Re: [PR] Removing @lucene.experimental tags in testXXX methods in CheckIndex [lucene]

2023-12-11 Thread via GitHub
mikemccand merged PR #12893: URL: https://github.com/apache/lucene/pull/12893

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1850424855 The performance of the new approach seems to have regressed a bit more on java21_benchMMapDirectoryInputs_readGroupVInt; here is the difference in speedup relative to the baseline.

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
dweiss commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850392837 I thought changing the pattern to lucene/**/build/hs_err_pid* would help, but there are way too many files/folders in there, so it'll eventually hit that threshold anyway. Unless we set t

Re: [PR] IntersectTermsEnum should accumulate from output prefix instead of current output [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12900: URL: https://github.com/apache/lucene/pull/12900#discussion_r1422705408 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnum.java: ## @@ -198,6 +204,7 @@ private IntersectTermsEnumFrame pushFrame(int st

[I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-11 Thread via GitHub
mikemccand opened a new issue, #12916: URL: https://github.com/apache/lucene/issues/12916 ### Description There is an exciting [upstream (OpenJ9) comment here](https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1846199023) (thank you @singh264), copied below: T

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422609478 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene90/Lucene90RWPostingsFormat.java: ## @@ -75,7 +75,11 @@ public FieldsConsumer fieldsConsumer

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850255060 > OK, let me try feeding LineFileDocs into this test case. FTR I will look into it, but it's probably best done in a follow-up PR rather than this one, let's merge this PR first?

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422599871 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene90/Lucene90RWPostingsFormat.java: ## @@ -75,7 +75,11 @@ public FieldsConsumer fieldsConsu

[PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
daixque opened a new pull request, #12915: URL: https://github.com/apache/lucene/pull/12915 ### Description Sutegana (捨て仮名) are the small forms of hiragana and katakana letters in Japanese. Unlike modern text, older Japanese text does not use sutegana (捨て仮名). For example: - "ストップウ
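
The normalization that `JapaneseHiraganaUppercaseFilter` and `JapaneseKatakanaUppercaseFilter` are meant to perform can be sketched on a plain string, outside any Lucene `TokenFilter`. This is a minimal sketch under assumptions: the class and method names are hypothetical, and it covers only the basic small letters of the Hiragana and Katakana blocks, mapping each to its full-size counterpart; the filters in the PR may handle more characters.

```java
final class SuteganaNormalizer {
  // Small (sutegana) letters and their full-size counterparts, position by position.
  // Covers only the basic small letters of the Hiragana and Katakana blocks.
  private static final String SMALL  = "ぁぃぅぇぉっゃゅょゎゕゖァィゥェォッャュョヮヵヶ";
  private static final String NORMAL = "あいうえおつやゆよわかけアイウエオツヤユヨワカケ";

  static String normalize(String text) {
    StringBuilder sb = new StringBuilder(text.length());
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      int idx = SMALL.indexOf(c);
      // Replace a small letter with its full-size form; leave everything else untouched.
      sb.append(idx >= 0 ? NORMAL.charAt(idx) : c);
    }
    return sb.toString();
  }
}
```

For example, `normalize("ストップウォッチ")` returns `"ストツプウオツチ"`, so a query in modern spelling and a document in older spelling reduce to the same form.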

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422549348 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene90/Lucene90RWPostingsFormat.java: ## @@ -75,7 +75,11 @@ public FieldsConsumer fieldsConsumer

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422541990 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene90/Lucene90RWPostingsFormat.java: ## @@ -75,7 +75,11 @@ public FieldsConsumer fieldsConsu

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422538272 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -109,10 +109,20 @@ public enum INPUT_TYPE { // Increment version to change it private sta

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850181921 > Hmmm... I agree we can't expect `BasePostingsFormatTestCase` to catch all bw compat problems, but the `TestLucene90PostingsFormat` from this PR writes data in the 9.8 format of the

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850184681 OK, let me try feeding LineFileDocs into this test case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] Upgrade ECJ to 3.36.0 [lucene]

2023-12-11 Thread via GitHub
ChrisHegarty commented on PR #12888: URL: https://github.com/apache/lucene/pull/12888#issuecomment-1850178914 relates: #12753

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-11 Thread via GitHub
stefanvodita commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1422523818 ## lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java: ## @@ -330,15 +330,36 @@ public static int[] growExact(int[] array, int newLength) { return c
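As the PR title suggests, the idea is that amortized growth over-allocates, but when the caller knows an upper bound on the final size, the over-allocated length can be clamped to that bound so memory is not wasted. The sketch below is a hypothetical illustration of that idea. The method name, its `(array, minLength, maxLength)` parameters and the roughly 1/8 growth factor are assumptions; this is not the PR's `ArrayUtil` code.

```java
import java.util.Arrays;

// Hypothetical sketch of "grow, but never past a known maximum".
class GrowInRangeSketch {
  /**
   * Returns an array of length >= minLength. The array is over-allocated by about 1/8
   * (at least 3 slots) for amortized appends, but never beyond maxLength.
   */
  public static int[] growInRange(int[] array, int minLength, int maxLength) {
    if (minLength > maxLength) {
      throw new IllegalArgumentException(
          "minLength (" + minLength + ") > maxLength (" + maxLength + ")");
    }
    if (array.length >= minLength) {
      return array; // already large enough, so no copy
    }
    // Compute the over-allocated size in long arithmetic to avoid int overflow.
    long candidate = (long) minLength + Math.max(3, minLength >> 3);
    int newLength = (int) Math.min(candidate, maxLength);
    return Arrays.copyOf(array, newLength); // copies existing contents, zero-fills the rest
  }

  public static void main(String[] args) {
    int[] a = new int[4];
    System.out.println(growInRange(a, 10, 100).length); // 13: over-allocated
    System.out.println(growInRange(a, 10, 11).length);  // 11: clamped to maxLength
  }
}
```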

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-11 Thread via GitHub
stefanvodita commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1422523360 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -757,6 +757,30 @@ public void testRamUsageEstimate() throws IOException { l

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422523889 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -109,10 +109,20 @@ public enum INPUT_TYPE { // Increment version to change it private static

Re: [PR] IntersectTermsEnum should accumulate from output prefix instead of current output [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12900: URL: https://github.com/apache/lucene/pull/12900#issuecomment-1850151623 I've confirmed the new (failing) BWC test from #12901 now passes with this PR. I'll review ...

Re: [PR] Add BWC test to reveal #12895 [lucene]

2023-12-11 Thread via GitHub
gf2121 commented on code in PR #12912: URL: https://github.com/apache/lucene/pull/12912#discussion_r1422514730 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -2265,4 +2268,47 @@ public void testReadNMinusTwoSegmentInfos

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422507453 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { }
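For context on what this PR moves into `DataOutput`/`DataInput`: group-varint writes ints in groups of four. A single flag byte holds the byte length (1 to 4) of each of the four values in 2 bits, and the value bytes follow. The decoder therefore knows every length up front instead of testing a continuation bit on each byte, as plain vInts require. The sketch below shows the general scheme on plain byte arrays. The flag-bit order (value `i` in bits `2i..2i+1`) and little-endian value bytes are assumptions and may not match Lucene's exact on-disk layout. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch of group-varint, not Lucene's DataOutput/DataInput code.
class GroupVarintSketch {
  /** Bytes (1..4) needed to hold v as an unsigned 32-bit value. */
  public static int numBytes(int v) {
    return Math.max(1, 4 - (Integer.numberOfLeadingZeros(v) >> 3));
  }

  /** Writes values[offset..offset+3]: one flag byte, then each value little-endian. */
  public static void encodeGroup(int[] values, int offset, ByteArrayOutputStream out) {
    int flag = 0;
    for (int i = 0; i < 4; i++) {
      flag |= (numBytes(values[offset + i]) - 1) << (i * 2); // 2 bits per value
    }
    out.write(flag);
    for (int i = 0; i < 4; i++) {
      int v = values[offset + i];
      for (int b = 0; b < numBytes(v); b++) {
        out.write(v >>> (8 * b)); // write(int) keeps only the low 8 bits
      }
    }
  }

  /** Decodes one group starting at pos into dst[dstOff..dstOff+3]; returns the next position. */
  public static int decodeGroup(byte[] in, int pos, int[] dst, int dstOff) {
    int flag = in[pos++] & 0xFF;
    for (int i = 0; i < 4; i++) {
      int n = ((flag >>> (i * 2)) & 0x3) + 1;
      int v = 0;
      for (int b = 0; b < n; b++) {
        v |= (in[pos++] & 0xFF) << (8 * b);
      }
      dst[dstOff + i] = v;
    }
    return pos;
  }

  public static void main(String[] args) {
    int[] values = {1, 300, 70000, -1}; // needs 1, 2, 3 and 4 bytes
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    encodeGroup(values, 0, out);
    byte[] bytes = out.toByteArray();
    int[] decoded = new int[4];
    decodeGroup(bytes, 0, decoded, 0);
    System.out.println(bytes.length + " bytes, round trip ok: " + Arrays.equals(values, decoded));
  }
}
```

Here the four values take 11 bytes (1 flag byte + 1 + 2 + 3 + 4).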

Re: [I] Improve BWC tests to reveal #12895 and confirm its fix [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on issue #12901: URL: https://github.com/apache/lucene/issues/12901#issuecomment-1850135758 > Actually I can un-@ignore at least in main. I'll go do that. D'oh! No, I cannot -- it will still fail in main, 9.x and 9.9.x until we get the fix (#12900) in.

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850129755 Hmmm... I agree we can't expect `BasePostingsFormatTestCase` to catch all bw compat problems, but the `TestLucene90PostingsFormat` from this PR writes data in the 9.8 format of the terms

Re: [I] Improve BWC tests to reveal #12895 and confirm its fix [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on issue #12901: URL: https://github.com/apache/lucene/issues/12901#issuecomment-1850129073 OK this is done -- I pushed the new BWC test case (@Ignore'd) to 9.9.x, 9.x and main. Actually I can un-@Ignore at least in main. I'll go do that.

Re: [PR] #12901: add TestBackwardsCompatibility test case that reveals the block tree IntersectTermsEnum bug #12895 [lucene]

2023-12-11 Thread via GitHub
mikemccand merged PR #12913: URL: https://github.com/apache/lucene/pull/12913

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422494949 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { } }

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850117651 > It's actually bad news that all tests pass here, as this means that our `BasePostingsFormatTestCase` is not good enough to uncover the recent problem with `Terms#intersect`... So
