[I] org.apache.lucene.index.IndexFormatTooNewException on ARM64 [lucene]

2024-06-04 Thread via GitHub
suddendust opened a new issue, #13452: URL: https://github.com/apache/lucene/issues/13452 ### Description I am trying to run Apache Pinot on ARM64 (Graviton). While the table partitions are loaded successfully on x86, I get the following exception on ARM: ``` org.apache.luce

[I] The Closeable interface of CloseableThreadLocal seems redundant [lucene]

2024-06-04 Thread via GitHub
Daniel-Chang-T opened a new issue, #13451: URL: https://github.com/apache/lucene/issues/13451 ### Description While reading the source code, I noticed that the `CloseableThreadLocal` implementation should release the stored hard references even without invoking `clo

Re: [PR] Give a hint to `IndexInput` about slices that have a forward-only access pattern. [lucene]

2024-06-04 Thread via GitHub
rmuir commented on code in PR #13450: URL: https://github.com/apache/lucene/pull/13450#discussion_r1626828513 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -370,6 +370,16 @@ public void prefetch(long offset, long length) throws IOExceptio

Re: [PR] Add BitVectors format and make flat vectors format easier to extend [lucene]

2024-06-04 Thread via GitHub
navneet1v commented on PR #13288: URL: https://github.com/apache/lucene/pull/13288#issuecomment-2148642670 @benwtrent I am a little confused here. I am still looking for an answer to this question: `Does this mean now Lucene supports BitVectorsFormat officially? Or it was more a prototyp

Re: [PR] Fix IndexOutOfBoundsException thrown in DefaultPassageFormatter by unordered matches [lucene]

2024-06-04 Thread via GitHub
github-actions[bot] commented on PR #13315: URL: https://github.com/apache/lucene/pull/13315#issuecomment-2148615171 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-06-04 Thread via GitHub
navneet1v commented on PR #13407: URL: https://github.com/apache/lucene/pull/13407#issuecomment-2148532097 > There has been talk of adding a "flat" codec to Lucene, that simply takes advantage of quantization and stores the vectors not in the HNSW graph. In that instance, what would efSearc

Re: [I] Improve Lucene's I/O concurrency [lucene]

2024-06-04 Thread via GitHub
sohami commented on issue #13179: URL: https://github.com/apache/lucene/issues/13179#issuecomment-2148405028 > @sohami I gave a try at a possible approach at #13450 in case you're curious. @jpountz Thanks for sharing this. Originally I was thinking the prefetch optimization only in c

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-06-04 Thread via GitHub
benwtrent commented on PR #13407: URL: https://github.com/apache/lucene/pull/13407#issuecomment-2148390886 > For HNSW efSearch is a core parameter during search time. This is convenient for users to not have to have the logic to strip off top k values on their end. I understand, but

Re: [PR] Sparse index: optional skip list on top of doc values [lucene]

2024-06-04 Thread via GitHub
ChrisHegarty commented on code in PR #13449: URL: https://github.com/apache/lucene/pull/13449#discussion_r1626571087 ## lucene/core/src/java/org/apache/lucene/index/DocValuesSkipper.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Sparse index: optional skip list on top of doc values [lucene]

2024-06-04 Thread via GitHub
ChrisHegarty commented on code in PR #13449: URL: https://github.com/apache/lucene/pull/13449#discussion_r1626526797 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -1690,4 +1722,78 @@ long getLongValue(long index) throws IOExcepti

Re: [I] AnalyzingSuggester exception because of length restriction: java.lang.IllegalArgumentException: len must be <= 32767; got 38751 [LUCENE-6012] [lucene]

2024-06-04 Thread via GitHub
dmaziuk commented on issue #7074: URL: https://github.com/apache/lucene/issues/7074#issuecomment-2148222112 +1: trying to set up the suggester, got `len must be <= 32767; got 38822` How am I supposed to guarantee that the field I'm pulling from external sources? Use the `string` field

Re: [PR] Give a hint to `IndexInput` about slices that have a forward-only access pattern. [lucene]

2024-06-04 Thread via GitHub
rmuir commented on code in PR #13450: URL: https://github.com/apache/lucene/pull/13450#discussion_r1626474331 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -370,6 +370,16 @@ public void prefetch(long offset, long length) throws IOExceptio

Re: [PR] WIP - Add minimum number of segments to TieredMergePolicy [lucene]

2024-06-04 Thread via GitHub
carlosdelest commented on code in PR #13430: URL: https://github.com/apache/lucene/pull/13430#discussion_r1626064947 ## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ## @@ -522,21 +550,28 @@ private MergeSpecification doFindMerges( final List cand

Re: [PR] Rewrite newSlowRangeQuery to MatchNoDocsQuery when upper > lower [lucene]

2024-06-04 Thread via GitHub
jpountz merged PR #13425: URL: https://github.com/apache/lucene/pull/13425 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Removed Scorer#getWeight [lucene]

2024-06-04 Thread via GitHub
jpountz commented on code in PR #13440: URL: https://github.com/apache/lucene/pull/13440#discussion_r1620860434 ## lucene/core/src/java/org/apache/lucene/search/Scorer.java: ## @@ -39,15 +39,6 @@ protected Scorer(Weight weight) { this.weight = Objects.requireNonNull(weight)

Re: [PR] Removed Scorer#getWeight [lucene]

2024-06-04 Thread via GitHub
jpountz commented on code in PR #13440: URL: https://github.com/apache/lucene/pull/13440#discussion_r1622334950 ## lucene/core/src/java/org/apache/lucene/search/Scorer.java: ## @@ -26,27 +25,7 @@ * increasing order of doc id. */ public abstract class Scorer extends Scorable

Re: [I] Improve Lucene's I/O concurrency [lucene]

2024-06-04 Thread via GitHub
jpountz commented on issue #13179: URL: https://github.com/apache/lucene/issues/13179#issuecomment-2147824728 @sohami I gave a try at a possible approach at #13450 in case you're curious.

[PR] Give a hint to `IndexInput` about slices that have a forward-only access pattern. [lucene]

2024-06-04 Thread via GitHub
jpountz opened a new pull request, #13450: URL: https://github.com/apache/lucene/pull/13450 This introduces a new API that allows directories to optimize access to `IndexInput`s that have a forward-only access pattern by reading ahead of the current position. It would be applicable to:

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-06-04 Thread via GitHub
jpountz commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2147657867 > do we have such a class already (that would distinguish the tenants via filename prefix or so)? That's a nice idea all by itself (separate from this use case) -- maybe open a spin

Re: [I] Instrument IndexOrDocValuesQuery to report on its decisions [lucene]

2024-06-04 Thread via GitHub
jpountz commented on issue #13442: URL: https://github.com/apache/lucene/issues/13442#issuecomment-2147633407 > A general framework on IndexSearcher sounds nice, but it's hard to generalize with just this one use case? Can it be something like IndexWriter's InfoStream, but for search?

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-06-04 Thread via GitHub
vsop-479 commented on code in PR #13253: URL: https://github.com/apache/lucene/pull/13253#discussion_r1625269547 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -434,8 +436,29 @@ public boolean seekExact(BytesRef target) throws I

[PR] Sparse index: optional skip list on top of doc values [lucene]

2024-06-04 Thread via GitHub
iverase opened a new pull request, #13449: URL: https://github.com/apache/lucene/pull/13449 Speaking to Adrien about what a sparse index would look like in Lucene, he suggested that the sparse indexing does not need to be a new format but an additional responsibility of `DocValuesFormat`.

Re: [PR] WIP - Add minimum number of segments to TieredMergePolicy [lucene]

2024-06-04 Thread via GitHub
jpountz commented on code in PR #13430: URL: https://github.com/apache/lucene/pull/13430#discussion_r1625497128 ## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ## @@ -522,21 +550,28 @@ private MergeSpecification doFindMerges( final List candidate