[I] Lexical error using lucene 9 but lucene 2 works [lucene]

2024-05-29 Thread via GitHub
roykyle8 opened a new issue, #13437: URL: https://github.com/apache/lucene/issues/13437 ### Description I have a project using Lucene 2 and am trying to upgrade to Lucene 9. The following simple program works in Lucene 2. ``` public class CustomMultiFieldQueryParser { priv

Re: [I] How to speedup concurrent merge [lucene]

2024-05-29 Thread via GitHub
hanqiushi commented on issue #13432: URL: https://github.com/apache/lucene/issues/13432#issuecomment-2138559678 > What version are you using? Lucene 9.11 introduced concurrent merging to help speed up merging, see #13124. I'm currently using version 9.9.1; I'll check version 9.11, than

Re: [I] How to speedup concurrent merge [lucene]

2024-05-29 Thread via GitHub
hanqiushi commented on issue #13432: URL: https://github.com/apache/lucene/issues/13432#issuecomment-2138552362 > When this is happening (you are waiting for merge) are you continuing to index new documents, or do you commit() and then during that time not submit any new docs (ie with Index

Re: [PR] Add a double addressing vector scorer [lucene]

2024-05-29 Thread via GitHub
github-actions[bot] commented on PR #13370: URL: https://github.com/apache/lucene/pull/13370#issuecomment-2138451221 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Improve Lucene's I/O concurrency [lucene]

2024-05-29 Thread via GitHub
jpountz commented on issue #13179: URL: https://github.com/apache/lucene/issues/13179#issuecomment-2138323749 > Am I correct in understanding that prefetching an already-fetched page is (at least approximately) a no-op? We tried to make it cheap (see e.g. the logic to disable calling

Re: [I] TestIndexWriterOnVMError.testUnknownError fails with unclosed file handles [lucene]

2024-05-29 Thread via GitHub
jpountz closed issue #13434: TestIndexWriterOnVMError.testUnknownError fails with unclosed file handles URL: https://github.com/apache/lucene/issues/13434 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] Fix TestIndexWriterOnError.testIOError failure. [lucene]

2024-05-29 Thread via GitHub
jpountz merged PR #13436: URL: https://github.com/apache/lucene/pull/13436

Re: [PR] Fix TestIndexWriterOnError.testIOError failure. [lucene]

2024-05-29 Thread via GitHub
jpountz commented on code in PR #13436: URL: https://github.com/apache/lucene/pull/13436#discussion_r1619417391 ## lucene/core/src/java/org/apache/lucene/index/SegmentDocValues.java: ## @@ -76,10 +76,12 @@ synchronized DocValuesProducer getDocValuesProducer( /** Decrement t

Re: [PR] Fix TestIndexWriterOnError.testIOError failure. [lucene]

2024-05-29 Thread via GitHub
bruno-roustant commented on code in PR #13436: URL: https://github.com/apache/lucene/pull/13436#discussion_r1619393930 ## lucene/core/src/java/org/apache/lucene/index/SegmentDocValues.java: ## @@ -76,10 +76,12 @@ synchronized DocValuesProducer getDocValuesProducer( /** Decr

Re: [I] Improve Lucene's I/O concurrency [lucene]

2024-05-29 Thread via GitHub
sohami commented on issue #13179: URL: https://github.com/apache/lucene/issues/13179#issuecomment-2138104220 > This should work, though I'm wary of making it the new way that collectors need to interact with doc values if they want to be able to take advantage of prefetching To add t

Re: [PR] WIP - Add minimum number of segments to TieredMergePolicy [lucene]

2024-05-29 Thread via GitHub
carlosdelest commented on PR #13430: URL: https://github.com/apache/lucene/pull/13430#issuecomment-2138089436 Thanks @mikemccand and @jpountz for your explanations! This is much clearer now. I've updated the PR, LMKWYT about the approach! I'm looking into effective ways of test

Re: [I] Improve Lucene's I/O concurrency [lucene]

2024-05-29 Thread via GitHub
msfroh commented on issue #13179: URL: https://github.com/apache/lucene/issues/13179#issuecomment-2138047304 > This should work, though I'm wary of making it the new way that collectors need to interact with doc values if they want to be able to take advantage of prefetching. E.g. we also h

Re: [PR] Fix TestIndexWriterOnError.testIOError failure. [lucene]

2024-05-29 Thread via GitHub
benwtrent commented on code in PR #13436: URL: https://github.com/apache/lucene/pull/13436#discussion_r1619307513 ## lucene/core/src/java/org/apache/lucene/index/SegmentDocValues.java: ## @@ -76,10 +76,12 @@ synchronized DocValuesProducer getDocValuesProducer( /** Decrement

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-05-29 Thread via GitHub
shatejas commented on PR #13407: URL: https://github.com/apache/lucene/pull/13407#issuecomment-2137932850 For HNSW, efSearch is a core parameter at search time. This is convenient for users, who would not need their own logic to strip off the top k values. Another way is to e

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-05-29 Thread via GitHub
benwtrent commented on PR #13407: URL: https://github.com/apache/lucene/pull/13407#issuecomment-2137917437 I am not overly convinced that this is necessary. I am not hard and fast against it, I just don't see the need.

[PR] Fix TestIndexWriterOnError.testIOError failure. [lucene]

2024-05-29 Thread via GitHub
jpountz opened a new pull request, #13436: URL: https://github.com/apache/lucene/pull/13436 Pull request #13406 inadvertently broke Lucene's handling of tragic exceptions by stopping after the first `DocValuesProducer` whose `close()` call throws an exception, instead of continuing to call `clos
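
The failure mode described in this pull request (stopping at the first `close()` that throws) can be contrasted with a loop that keeps closing the remaining resources and only rethrows at the end. The following is a minimal sketch of that general pattern in plain Java, not the actual change made in the PR; the class name `CloseAll` and the method `closeAll` are hypothetical names chosen for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

// Hypothetical helper for illustration only; not Lucene code.
public final class CloseAll {
  private CloseAll() {}

  /**
   * Calls close() on every resource, even if some of them throw.
   * The first failure is rethrown at the end; later failures are
   * attached to it as suppressed exceptions.
   */
  public static void closeAll(List<? extends Closeable> resources) throws IOException {
    Throwable first = null;
    for (Closeable resource : resources) {
      try {
        resource.close();
      } catch (Throwable t) {
        if (first == null) {
          first = t; // remember the first failure
        } else {
          first.addSuppressed(t); // keep later failures visible
        }
      }
    }
    // Rethrow only after every resource has had its close() called.
    if (first instanceof IOException) throw (IOException) first;
    if (first instanceof RuntimeException) throw (RuntimeException) first;
    if (first instanceof Error) throw (Error) first;
  }
}
```

The design choice here is to defer the rethrow until the loop finishes, so a failing resource early in the list cannot leave later resources open.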

Re: [PR] Improve Test Coverage: added tests for IntRange [lucene]

2024-05-29 Thread via GitHub
stefanvodita commented on code in PR #13418: URL: https://github.com/apache/lucene/pull/13418#discussion_r1619076421 ## lucene/core/src/test/org/apache/lucene/document/TestIntRange.java: ## @@ -23,4 +23,16 @@ public void testToString() { IntRange range = new IntRange("foo",

Re: [PR] Add new test case "testGetLines" for lucene/core/analysis/WordlistLoader [lucene]

2024-05-29 Thread via GitHub
stefanvodita commented on code in PR #13419: URL: https://github.com/apache/lucene/pull/13419#discussion_r1619067114 ## lucene/core/src/test/org/apache/lucene/analysis/TestWordlistLoader.java: ## @@ -77,4 +82,17 @@ public void testSnowballListLoading() throws IOException {

[I] TestFloatVectorSimilarityQuery.testSomeDeletes fails in 9x [lucene]

2024-05-29 Thread via GitHub
benwtrent opened a new issue, #13435: URL: https://github.com/apache/lucene/issues/13435 ### Description TestFloatVectorSimilarityQuery.testSomeDeletes fails reliably in 9x with: ``` org.apache.lucene.search.TestFloatVectorSimilarityQuery > testSomeDeletes FAILED java.lan

Re: [I] TestIndexWriterOnVMError.testUnknownError fails with unclosed file handles [lucene]

2024-05-29 Thread via GitHub
benwtrent commented on issue #13434: URL: https://github.com/apache/lucene/issues/13434#issuecomment-2137471698 Also note, ``` gradlew test --tests TestIndexWriterOnError.testIOError -Dtests.seed=D02646EAEA66A6CB -Dtests.nightly=true -Dtests.locale=ff-Adlm-BF -Dtests.timezone=Eur

Re: [PR] Add BitVectors format and make flat vectors format easier to extend [lucene]

2024-05-29 Thread via GitHub
benwtrent commented on PR #13288: URL: https://github.com/apache/lucene/pull/13288#issuecomment-2137382358 1. No, BitVector format is not in the backwards compatible package. 2. Correct, there have been previous discussions in an effort to add it as a similarity value, but those conversat

Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-29 Thread via GitHub
benwtrent commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2137366217 @jmazanec15 this is accounted for in the corrections. Moving from signed to unsigned is still just a linear transformation; we are not manually flipping signs, but instead doi

Re: [PR] WIP - Add minimum number of segments to TieredMergePolicy [lucene]

2024-05-29 Thread via GitHub
mikemccand commented on PR #13430: URL: https://github.com/apache/lucene/pull/13430#issuecomment-2137365637 Thank you for tackling this @carlosdelest! What a hairy challenge ... TMP really is its own little baby chess engine, with many things it is trying to optimize towards (what the ML w

Re: [I] How to speedup concurrent merge [lucene]

2024-05-29 Thread via GitHub
jpountz commented on issue #13432: URL: https://github.com/apache/lucene/issues/13432#issuecomment-2137311843 What version are you using? Lucene 9.11 introduced concurrent merging to help speed up merging, see #13124.

Re: [PR] WIP - Add minimum number of segments to TieredMergePolicy [lucene]

2024-05-29 Thread via GitHub
jpountz commented on PR #13430: URL: https://github.com/apache/lucene/pull/13430#issuecomment-2137276912 > There are at most 2 segment tiers. Well, there can be more tiers, but since tiers have exponential sizes (e.g. if your merge factor is 10, each tier has segments that are 10x bigg

[PR] Fix test failure on TestPoint#testEqualsAndHashCode [lucene]

2024-05-29 Thread via GitHub
easyice opened a new pull request, #13433: URL: https://github.com/apache/lucene/pull/13433 Two different objects may still have the same hash code. In this case, `Point(-180.0,90.0)` and `Point(180.0,-90.0)` have the same hash code value. ``` org.apache.lucene.geo.Test
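
The general point in this pull request, that unequal objects can share a hash code, can be shown with a small sketch in plain Java. It uses two strings rather than Lucene's `Point`, and makes no claim about how `Point` computes its hash; the class name `HashCollisionDemo` and the method `collide` are hypothetical.

```java
// Hypothetical demo for illustration only; not Lucene code.
public final class HashCollisionDemo {
  private HashCollisionDemo() {}

  /** Returns true when two non-equal objects nevertheless share a hash code. */
  public static boolean collide(Object a, Object b) {
    return !a.equals(b) && a.hashCode() == b.hashCode();
  }

  public static void main(String[] args) {
    // "Aa" and "BB" are different strings whose String.hashCode() is 2112 for both.
    System.out.println(collide("Aa", "BB")); // prints "true"
  }
}
```

A test that asserts two different objects have different hash codes can therefore fail on valid code; asserting on `equals()` is the safer check.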

Re: [I] Reproducible failure TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-29 Thread via GitHub
benwtrent closed issue #13380: Reproducible failure TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults URL: https://github.com/apache/lucene/issues/13380

Re: [I] Reproducible failure TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-29 Thread via GitHub
benwtrent commented on issue #13380: URL: https://github.com/apache/lucene/issues/13380#issuecomment-2137189569 closed by https://github.com/apache/lucene/pull/13361

Re: [PR] WIP - Add minimum number of segments to TieredMergePolicy [lucene]

2024-05-29 Thread via GitHub
carlosdelest commented on PR #13430: URL: https://github.com/apache/lucene/pull/13430#issuecomment-2137112864 Hi @jpountz, thanks for your thoughts! Let me check whether I got your suggestion right: - There are at most 2 segment tiers. The first has `minNumSegments` at most, the s

Re: [PR] Improve Test Coverage: added tests for SlowLog [lucene]

2024-05-29 Thread via GitHub
stefanvodita commented on code in PR #13417: URL: https://github.com/apache/lucene/pull/13417#discussion_r1618536495 ## lucene/monitor/src/test/org/apache/lucene/monitor/TestSlowLog.java: ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [I] Support getting counts from "association" facets [LUCENE-10246] [lucene]

2024-05-29 Thread via GitHub
stefanvodita commented on issue #11282: URL: https://github.com/apache/lucene/issues/11282#issuecomment-2136753263 > Should we just put another "count" field in LabelAndValue and have both value and count be populated with a count for non-association cases? That sounds weird. We did

Re: [PR] Add prefetching support to stored fields. [lucene]

2024-05-29 Thread via GitHub
gf2121 commented on code in PR #13424: URL: https://github.com/apache/lucene/pull/13424#discussion_r1618348032 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java: ## @@ -609,6 +622,23 @@ public void skipBytes(long num

[I] How to speedup concurrent merge [lucene]

2024-05-29 Thread via GitHub
hanqiushi opened a new issue, #13432: URL: https://github.com/apache/lucene/issues/13432 ### Description Hi, I have encountered a problem when using Lucene on Windows. When I run IndexWriter.commit(), I may wait several minutes for the background index merge. The amount of dat

Re: [PR] Add prefetching support to stored fields. [lucene]

2024-05-29 Thread via GitHub
jpountz commented on code in PR #13424: URL: https://github.com/apache/lucene/pull/13424#discussion_r1618325471 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java: ## @@ -609,6 +622,23 @@ public void skipBytes(long nu