[PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
guojialiang92 opened a new pull request, #14417: URL: https://github.com/apache/lucene/pull/14417 ### Description This PR aims to address issue [14362](https://github.com/apache/lucene/issues/14362). This issue includes a discussion of the benefits of this modification.
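For context on what `segmentInfos.counter` controls: `IndexWriter` derives each new segment name from this counter, formatting it in base 36 with a leading underscore (`_0`, `_1`, ..., `_a`, ..., `_10`). The stdlib-only model below shows why letting a caller move the counter forward can keep future names clear of names already in use. `SegmentNameAllocator` and `advanceCounter` are hypothetical and are not the API added by this PR. The forward-only rule is also an assumption of the sketch.

```java
// Hypothetical stdlib-only model; not Lucene's IndexWriter API.
final class SegmentNameAllocator {
  private long counter; // plays the role of SegmentInfos.counter

  SegmentNameAllocator(long initialCounter) {
    this.counter = initialCounter;
  }

  /** Returns "_" + counter in base 36, then increments the counter. */
  synchronized String newSegmentName() {
    return "_" + Long.toString(counter++, Character.MAX_RADIX);
  }

  /**
   * Moves the counter forward so later names skip values already taken.
   * Refusing to move backwards is an assumption of this sketch.
   */
  synchronized void advanceCounter(long newCounter) {
    if (newCounter < counter) {
      throw new IllegalArgumentException(
          "counter can only move forward: " + newCounter + " < " + counter);
    }
    counter = newCounter;
  }

  synchronized long counter() {
    return counter;
  }
}
```

For example, a counter of 10 yields `_a` and a counter of 36 yields `_10`.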

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-27 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2756943569 It'll work with those indented lines as well - it actually will align block indentation to the column they should be starting at. So: ``` /// [Collector] for cutter+recorder
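For readers new to the tag discussed in this thread, here is a minimal example of an inline `{@snippet}` in a traditional Javadoc comment (JDK 18+, JEP 413). `SnippetDemo.sortedCopy` is a made-up method used only to have something to document. The snippet body must keep its braces balanced, which is why the array literals are written in full.

```java
import java.util.Arrays;

/** Small utility whose Javadoc shows an inline {@code @snippet} tag. */
final class SnippetDemo {
  private SnippetDemo() {}

  /**
   * Returns a sorted copy of the input, leaving the argument untouched.
   *
   * {@snippet lang="java" :
   * int[] sorted = SnippetDemo.sortedCopy(new int[] {3, 1, 2});
   * // sorted is {1, 2, 3}
   * }
   *
   * @param values the values to sort
   * @return a new, sorted array
   */
  static int[] sortedCopy(int[] values) {
    int[] copy = Arrays.copyOf(values, values.length); // never mutate the caller's array
    Arrays.sort(copy);
    return copy;
  }
}
```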

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-27 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2756947753 [here's the gjf issue I asked for the direction they'd like to follow](https://github.com/google/google-java-format/issues/1193) -- This is an automated message from the Apache Git

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-27 Thread via GitHub
mayya-sharipova commented on code in PR #14331: URL: https://github.com/apache/lucene/pull/14331#discussion_r2005461835 ## lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentHnswMerger.java: ## @@ -51,19 +57,85 @@ protected HnswBuilder createBuilder(KnnVectorValues merg

Re: [PR] Revert "gh-12627: HnswGraphBuilder connects disconnected HNSW graph components (#13566)" [lucene]

2025-03-27 Thread via GitHub
benwtrent commented on PR #14411: URL: https://github.com/apache/lucene/pull/14411#issuecomment-2757791544 @txwei have you seen this behavior in production? I am wondering on the urgency.

Re: [I] HNSW connect components can take an inordinate amount of time [lucene]

2025-03-27 Thread via GitHub
benwtrent commented on issue #14214: URL: https://github.com/apache/lucene/issues/14214#issuecomment-2757791987 > Can we expose a graph construction parameter in Lucene99HnswVectorsFormat to gate the connectComponents() call? This would allow us to mitigate this issue while a more comprehen

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-27 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2757854863 Let's wait a few days and see if there's any feedback. Like I mentioned above, what we use for formatting/checking format adherence shouldn't really matter for anybody who uses grad

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-27 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2757832416 awesome! I really hope the patch is accepted: this will definitely give us a path forward.

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-27 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2757929583 yeah: only thing I will say about choice of formatter is that normally I try to configure the editor in the repo to match what the build expects. e.g. for languages like python and ty

Re: [PR] #14410 - Add Anytime Ranking Searching - SLA-constrained ranking With Range Boosting and Dynamic SLA [lucene]

2025-03-27 Thread via GitHub
atris commented on PR #14409: URL: https://github.com/apache/lucene/pull/14409#issuecomment-2758008422 My bad - should have set some more context. In reference to https://github.com/apache/lucene/issues/13675 The paper referred to by Adrien has a component on anytime sea

Re: [PR] #14410 - Add Anytime Ranking Searching - SLA-constrained ranking With Range Boosting and Dynamic SLA [lucene]

2025-03-27 Thread via GitHub
benwtrent commented on PR #14409: URL: https://github.com/apache/lucene/pull/14409#issuecomment-2758025254 @atris Ah, thank you, I will take a look at the paper first. Do you have any benchmarking to replicate the paper's findings within the Apache Lucene context?

Re: [PR] #14410 - Add Anytime Ranking Searching - SLA-constrained ranking With Range Boosting and Dynamic SLA [lucene]

2025-03-27 Thread via GitHub
benwtrent commented on PR #14409: URL: https://github.com/apache/lucene/pull/14409#issuecomment-2757970091 @atris I am gonna be frank, I haven't a clue what this is doing :D. Why is this being added? What is it supposed to accomplish? Maybe there is context I am missing...

Re: [PR] skip keyword in German Normalization Filter [lucene]

2025-03-27 Thread via GitHub
rmuir commented on PR #14416: URL: https://github.com/apache/lucene/pull/14416#issuecomment-2757816185 i think it will introduce a ton more complexity: that's why we've pushed back on doing this for anything that isn't a stemmer. otherwise people will want LowerCaseFilter to respect it too.

Re: [PR] Add Issue Tracker Link under 'Editing Content on the Lucene™ Sites' [lucene-site]

2025-03-27 Thread via GitHub
dweiss commented on code in PR #78: URL: https://github.com/apache/lucene-site/pull/78#discussion_r2003502701 ## content/pages/site-instructions.md: ## @@ -3,8 +3,10 @@ URL: site-instructions.html save_as: site-instructions.html template: lucene/tlp/page + ## Editing Conten

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017546602 ## lucene/core/src/java/org/apache/lucene/search/DISIDocIdStream.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017552582 ## lucene/core/src/java/org/apache/lucene/search/BooleanScorer.java: ## @@ -207,8 +164,32 @@ private void scoreWindowIntoBitSetAndReplay( acceptDocs.applyMask(m

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017556183 ## lucene/core/src/java/org/apache/lucene/search/BitSetDocIdStream.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more
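To illustrate why a bitset-backed stream of doc IDs can speed up counting, here is a stdlib-only model built on `java.util.BitSet`. Lucene's `BitSetDocIdStream` uses `FixedBitSet` and Lucene's own `DocIdStream` API; `BitSetDocStream` and its methods are hypothetical. The point is that `count()` can be answered with a popcount over whole words instead of visiting every matching doc.

```java
import java.util.BitSet;
import java.util.function.IntConsumer;

// Hypothetical model of a bitset-backed doc-ID stream; not Lucene's class.
final class BitSetDocStream {
  private final BitSet bits;
  private final int offset; // doc ID represented by bit 0 (e.g. start of a scoring window)

  BitSetDocStream(BitSet bits, int offset) {
    this.bits = bits;
    this.offset = offset;
  }

  /** Visits every matching doc ID, one at a time, in increasing order. */
  void forEach(IntConsumer consumer) {
    for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
      consumer.accept(offset + i);
    }
  }

  /** Counts matches via BitSet.cardinality(), without visiting individual docs. */
  int count() {
    return bits.cardinality();
  }
}
```

A collector that only needs a hit count can call `count()`, while one that needs each doc (for example, to read a field value for a histogram) still uses `forEach`.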

Re: [PR] Preparing existing profiler for adding concurrent profiling [lucene]

2025-03-27 Thread via GitHub
msfroh commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2759357863 Does it make sense to create a separate `QueryProfilerBreakDown` per leaf? Or should it create one per slice? Can this be implemented as part of the `CollectorManager#newCollector`
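The per-slice idea raised here can be sketched without Lucene: each slice gets its own breakdown object, created where a `CollectorManager#newCollector` would create a collector, so no locking is needed while searching; the breakdowns are merged afterwards, as a `reduce` step would. `PerSliceProfiling`, `Breakdown`, and the counted "calls" are hypothetical stand-ins, not the profiler classes from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of per-slice profiling with a final merge step.
final class PerSliceProfiling {
  /** Counters owned by exactly one slice/thread, so plain fields are safe. */
  static final class Breakdown {
    long nextDocCalls;
    long scoreCalls;

    void merge(Breakdown other) {
      nextDocCalls += other.nextDocCalls;
      scoreCalls += other.scoreCalls;
    }
  }

  /** Simulates searching one slice made of several leaves into a private breakdown. */
  static Breakdown searchSlice(int[] leafSizes) {
    Breakdown b = new Breakdown(); // analogous to creating it in newCollector()
    for (int leafSize : leafSizes) {
      for (int doc = 0; doc < leafSize; doc++) {
        b.nextDocCalls++;
        b.scoreCalls++;
      }
    }
    return b;
  }

  /** Runs slices concurrently, then merges the per-slice breakdowns. */
  static Breakdown profile(List<int[]> slices) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, slices.size()));
    try {
      List<Future<Breakdown>> futures = new ArrayList<>();
      for (int[] slice : slices) {
        futures.add(pool.submit(() -> searchSlice(slice)));
      }
      Breakdown total = new Breakdown(); // analogous to reduce()
      for (Future<Breakdown> f : futures) {
        total.merge(f.get());
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

The design choice shown is per slice rather than per leaf: a slice is processed by one thread, so its breakdown never needs synchronization, and the number of objects to merge stays equal to the number of slices.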

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017544601 ## lucene/core/src/java/org/apache/lucene/search/DocIdStream.java: ## @@ -34,12 +33,34 @@ protected DocIdStream() {} * Iterate over doc IDs contained in this strea

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-27 Thread via GitHub
sgup432 commented on code in PR #14412: URL: https://github.com/apache/lucene/pull/14412#discussion_r2017693759 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -122,12 +123,30 @@ public LRUQueryCache( long maxRamBytesUsed, Predicate leave
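A simplified sketch of what "updated dynamically" can mean for the skip-cache check: the factor lives in a `volatile` field so searcher threads see a new value without locking, and each decision reads it once. The rule used here (skip caching when the clause cost divided by the factor exceeds the lead cost) is a simplification of how we understand the query cache to use this factor. `SkipCachePolicy` and its methods are hypothetical, as is the `>= 1` constraint.

```java
// Hypothetical, simplified model; not LRUQueryCache's code.
final class SkipCachePolicy {
  // volatile: updates become visible to searching threads without locking
  private volatile float skipCacheFactor;

  SkipCachePolicy(float skipCacheFactor) {
    setSkipCacheFactor(skipCacheFactor);
  }

  /** Assumed constraint: the factor must be at least 1 (this also rejects NaN). */
  void setSkipCacheFactor(float skipCacheFactor) {
    if (!(skipCacheFactor >= 1f)) {
      throw new IllegalArgumentException("skipCacheFactor must be >= 1, got " + skipCacheFactor);
    }
    this.skipCacheFactor = skipCacheFactor;
  }

  /**
   * Caching a clause means evaluating it over the whole segment, so skip it
   * when the clause is much more expensive than the lead iterator.
   */
  boolean shouldCache(long clauseCost, long leadCost) {
    float factor = skipCacheFactor; // read once so a single decision uses a single value
    return clauseCost / factor <= leadCost;
  }
}
```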

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-27 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r2006889147 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/TrieBuilder.java: ## @@ -0,0 +1,552 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on PR #14401: URL: https://github.com/apache/lucene/pull/14401#issuecomment-2759494063 > This would also benefit https://github.com/apache/lucene/pull/14273 I don't think so, or rather taking advantage of range collection shouldn't help more than what #14273 does wit

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14412: URL: https://github.com/apache/lucene/pull/14412#discussion_r2017616847 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -99,7 +100,7 @@ public class LRUQueryCache implements QueryCache, Accountable { private

Re: [I] Leverage sparse doc value indexes for range and value facet collection [lucene]

2025-03-27 Thread via GitHub
jpountz commented on issue #14406: URL: https://github.com/apache/lucene/issues/14406#issuecomment-2759414888 Thanks for the explanation, that makes sense.

Re: [PR] Revert "gh-12627: HnswGraphBuilder connects disconnected HNSW graph components (#13566)" [lucene]

2025-03-27 Thread via GitHub
txwei commented on PR #14411: URL: https://github.com/apache/lucene/pull/14411#issuecomment-2759555261 @benwtrent we haven't released this to prod yet. We spotted a severe perf regression with the degenerate case of all vectors being the same. Since we have enough customers using all kinds

Re: [PR] Preparing existing profiler for adding concurrent profiling [lucene]

2025-03-27 Thread via GitHub
jpountz commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2759377415 @jainankitk OK. In my opinion, it's more important to handle the concurrent and non-concurrent cases consistently than to save some overhead when searches are not concurrent. I'd really

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
gsmiller commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017721363 ## lucene/core/src/java/org/apache/lucene/util/FixedBitSet.java: ## @@ -204,6 +205,40 @@ public int cardinality() { return Math.toIntExact(tot); } + /** +

Re: [PR] Avoid reload block when seeking backward in SegmentTermsEnum. [lucene]

2025-03-27 Thread via GitHub
github-actions[bot] commented on PR #13253: URL: https://github.com/apache/lucene/pull/13253#issuecomment-2759878401 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
gsmiller commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017767190 ## lucene/core/src/java/org/apache/lucene/search/DISIDocIdStream.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
guojialiang92 commented on code in PR #14417: URL: https://github.com/apache/lucene/pull/14417#discussion_r2015958210 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -5037,4 +5037,47 @@ public void testDocValuesSkippingIndexWithoutDocValues() throws

Re: [PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
guojialiang92 commented on code in PR #14417: URL: https://github.com/apache/lucene/pull/14417#discussion_r2015951779 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -1427,6 +1427,24 @@ public synchronized void advanceSegmentInfosVersion(long newVersion)

Re: [PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
guojialiang92 commented on code in PR #14417: URL: https://github.com/apache/lucene/pull/14417#discussion_r2015954336 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -1427,6 +1427,24 @@ public synchronized void advanceSegmentInfosVersion(long newVersion)

Re: [PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
guojialiang92 commented on code in PR #14417: URL: https://github.com/apache/lucene/pull/14417#discussion_r2015957918 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -5037,4 +5037,47 @@ public void testDocValuesSkippingIndexWithoutDocValues() throws

Re: [PR] #14410 - Add Anytime Ranking Searching - SLA-constrained ranking With Range Boosting and Dynamic SLA [lucene]

2025-03-27 Thread via GitHub
atris commented on PR #14409: URL: https://github.com/apache/lucene/pull/14409#issuecomment-2756755022 @jpountz @benwtrent requesting your review please

Re: [PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
guojialiang92 commented on PR #14417: URL: https://github.com/apache/lucene/pull/14417#issuecomment-2757291661 Thanks for helping with the code review, I have made modifications according to the suggestions

Re: [PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
guojialiang92 commented on code in PR #14417: URL: https://github.com/apache/lucene/pull/14417#discussion_r2015872376 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -5037,4 +5037,45 @@ public void testDocValuesSkippingIndexWithoutDocValues() throws

Re: [PR] skip keyword in German Normalization Filter [lucene]

2025-03-27 Thread via GitHub
xzhang9292 commented on PR #14416: URL: https://github.com/apache/lucene/pull/14416#issuecomment-2756971573 > This keyword is legacy, for stemmers not normalizers. Just use ProtectedTermFilter which works with any tokenfilter without requiring modification to its code? @rmuir Thank y
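The suggestion to use `ProtectedTermFilter` rests on one idea: wrap an arbitrary filter so that terms in a protected set bypass it, instead of teaching each filter about a keyword flag. The stdlib-only model below shows that idea on a plain list of strings. `ProtectedTerms.apply` is hypothetical, and a lowercasing function stands in for the wrapped filter; Lucene's real filter works on a `TokenStream` and wraps a chain of other filters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Conceptual model only; not Lucene's ProtectedTermFilter.
final class ProtectedTerms {
  private ProtectedTerms() {}

  /** Applies {@code filter} to every token except those in {@code protectedTerms}. */
  static List<String> apply(
      Set<String> protectedTerms, List<String> tokens, UnaryOperator<String> filter) {
    List<String> out = new ArrayList<>(tokens.size());
    for (String token : tokens) {
      // protected tokens pass through unchanged; the wrapped filter never sees them
      out.add(protectedTerms.contains(token) ? token : filter.apply(token));
    }
    return out;
  }
}
```

Because the protection lives in the wrapper, the wrapped filter (a normalizer, in this thread) needs no code changes.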

Re: [I] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
vigyasharma commented on issue #14362: URL: https://github.com/apache/lucene/issues/14362#issuecomment-2756963298 Sounds good to me. Will watch out for your PR.

Re: [PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
vigyasharma commented on code in PR #14417: URL: https://github.com/apache/lucene/pull/14417#discussion_r2015817265 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -5037,4 +5037,45 @@ public void testDocValuesSkippingIndexWithoutDocValues() throws Ex

Re: [PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
vigyasharma commented on code in PR #14417: URL: https://github.com/apache/lucene/pull/14417#discussion_r2015825615 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -1427,6 +1427,24 @@ public synchronized void advanceSegmentInfosVersion(long newVersion) {

Re: [PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-27 Thread via GitHub
vigyasharma commented on code in PR #14417: URL: https://github.com/apache/lucene/pull/14417#discussion_r2015832195 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -5037,4 +5037,47 @@ public void testDocValuesSkippingIndexWithoutDocValues() throws Ex

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2758283399 I played with the geonames dataset, by filtering out docs that don't have a value for the `elevation` field (2.3M docs left), enabling index sorting on the `elevation` field and computin
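One reason index sorting on the histogram field matters for an experiment like this: when docs are sorted by that field, every bucket covers a contiguous run of doc IDs, so a bucket count is just the distance between two boundaries found by binary search. The sketch below shows that property on a sorted `long[]`. `SortedHistogram` is hypothetical, and this is not the collection strategy implemented in the PR.

```java
// Illustrative only: per-bucket counts over values already in sorted order.
final class SortedHistogram {
  private SortedHistogram() {}

  /** Counts values into buckets [origin + b*interval, origin + (b+1)*interval). */
  static long[] histogram(long[] sortedValues, long origin, long interval, int numBuckets) {
    long[] counts = new long[numBuckets];
    for (int b = 0; b < numBuckets; b++) {
      long lo = origin + b * interval;
      long hi = lo + interval;
      // contiguous run: count = (first index >= hi) - (first index >= lo)
      counts[b] = lowerBound(sortedValues, hi) - lowerBound(sortedValues, lo);
    }
    return counts;
  }

  /** Returns the first index whose value is >= key (or a.length if none). */
  private static int lowerBound(long[] a, long key) {
    int lo = 0;
    int hi = a.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (a[mid] < key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}
```

This costs O(numBuckets · log n) instead of O(n) per-value work.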

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-27 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2758276309 You're right - this may be a problem. I use intellij and even gjf there does not work exactly the same way (https://github.com/google/google-java-format/pull/1165). I stopped caring

Re: [PR] Make PointValues.intersect iterative instead of recursive [lucene]

2025-03-27 Thread via GitHub
iverase commented on code in PR #14391: URL: https://github.com/apache/lucene/pull/14391#discussion_r2009830096 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -351,35 +351,32 @@ public final void intersect(IntersectVisitor visitor) throws IOException {
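As a generic illustration of the recursive-to-iterative transformation named in this PR's title, the toy 1-D tree below is intersected with a value range, first recursively and then with an explicit stack. Both return the same values in the same order. `ToyPointTree` is hypothetical; Lucene's `PointValues.intersect` walks a `PointTree`, and the PR may avoid an explicit stack altogether.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical toy tree; each cell knows the min/max value it covers.
final class ToyPointTree {
  final long min, max;             // value range covered by this cell
  final ToyPointTree left, right;  // null for leaves
  final long[] values;             // non-null only for leaves

  private ToyPointTree(long min, long max, ToyPointTree left, ToyPointTree right, long[] values) {
    this.min = min; this.max = max; this.left = left; this.right = right; this.values = values;
  }

  /** Builds a balanced tree over sorted[from, to); requires from < to. */
  static ToyPointTree build(long[] sorted, int from, int to, int leafSize) {
    if (to - from <= leafSize) {
      return new ToyPointTree(sorted[from], sorted[to - 1], null, null,
          Arrays.copyOfRange(sorted, from, to));
    }
    int mid = (from + to) >>> 1;
    ToyPointTree l = build(sorted, from, mid, leafSize);
    ToyPointTree r = build(sorted, mid, to, leafSize);
    return new ToyPointTree(l.min, r.max, l, r, null);
  }

  static void intersectRecursive(ToyPointTree node, long lo, long hi, List<Long> out) {
    if (node.max < lo || node.min > hi) return; // cell outside the query: prune
    if (node.values != null) {
      for (long v : node.values) if (v >= lo && v <= hi) out.add(v);
      return;
    }
    intersectRecursive(node.left, lo, hi, out);
    intersectRecursive(node.right, lo, hi, out);
  }

  static void intersectIterative(ToyPointTree root, long lo, long hi, List<Long> out) {
    Deque<ToyPointTree> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      ToyPointTree node = stack.pop();
      if (node.max < lo || node.min > hi) continue; // same pruning as above
      if (node.values != null) {
        for (long v : node.values) if (v >= lo && v <= hi) out.add(v);
        continue;
      }
      stack.push(node.right); // pushed first so the left child is visited first
      stack.push(node.left);
    }
  }
}
```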

Re: [PR] #14410 - Add Anytime Ranking Searching - SLA-constrained ranking With Range Boosting and Dynamic SLA [lucene]

2025-03-27 Thread via GitHub
jpountz commented on PR #14409: URL: https://github.com/apache/lucene/pull/14409#issuecomment-2758339188 I don't think that the timeout support that you are introducing buys anything compared with the existing timeout support via `IndexSearcher#setTimeout` and `TimeLimitingBulkScorer`. To me
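For readers comparing the two approaches, here is a toy model of how time-limited bulk scoring typically works: docs are scored window by window, and the timeout is checked only between windows so its cost is amortized. Our understanding is that `TimeLimitingBulkScorer` follows this general pattern, but its interval policy may differ. `WindowedScorer` and its parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntConsumer;

// Hypothetical model of windowed scoring with a timeout check between windows.
final class WindowedScorer {
  private WindowedScorer() {}

  /**
   * Scores docs in [0, maxDoc) window by window.
   *
   * @return the first doc ID not scored (maxDoc if scoring completed)
   */
  static int score(int maxDoc, int windowSize, BooleanSupplier shouldExit, IntConsumer collector) {
    if (windowSize <= 0) {
      throw new IllegalArgumentException("windowSize must be > 0");
    }
    int min = 0;
    while (min < maxDoc) {
      if (shouldExit.getAsBoolean()) {
        return min; // stop early; results collected so far are partial
      }
      int max = (int) Math.min((long) min + windowSize, maxDoc);
      for (int doc = min; doc < max; doc++) {
        collector.accept(doc);
      }
      min = max;
    }
    return maxDoc;
  }
}
```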

Re: [PR] Preparing existing profiler for adding concurrent profiling [lucene]

2025-03-27 Thread via GitHub
jainankitk commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2758741304 > Can you explain why we need two impls? I would have assumed that the ConcurrentQueryProfilerBreakdown could also be used for searches that are not concurrent? `ConcurrentQuer

Re: [PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-03-27 Thread via GitHub
gsmiller commented on PR #14401: URL: https://github.com/apache/lucene/pull/14401#issuecomment-2758696779 It makes sense to me to expose the idea of doc range collection as a first-class API on leaf collectors for the reasons you outlined above. This would also benefit #14273 as well right?
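A minimal sketch of what "doc range collection as a first-class API" could look like: a collector gets a range method whose default falls back to per-doc calls, and collectors that can exploit a range override it. `RangeAwareCollector` and `collectRange(min, max)` (half-open range) are hypothetical names chosen for this illustration, not Lucene's `LeafCollector` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface illustrating range collection; not Lucene's API.
interface RangeAwareCollector {
  /** Collects a single matching doc ID. */
  void collect(int doc);

  /** Collects every doc ID in [min, max); the default visits each doc. */
  default void collectRange(int min, int max) {
    for (int doc = min; doc < max; doc++) {
      collect(doc);
    }
  }
}

/** A hit counter handles a whole range in O(1) instead of O(max - min). */
final class CountingCollector implements RangeAwareCollector {
  long count;

  @Override
  public void collect(int doc) {
    count++;
  }

  @Override
  public void collectRange(int min, int max) {
    count += max - min;
  }
}

/** A collector that records each doc gains nothing, so it keeps the default. */
final class ListCollector implements RangeAwareCollector {
  final List<Integer> docs = new ArrayList<>();

  @Override
  public void collect(int doc) {
    docs.add(doc);
  }
}
```

The default method keeps every existing collector correct, while the override shows where pre-aggregated data pays off.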

Re: [PR] skip keyword in German Normalization Filter [lucene]

2025-03-27 Thread via GitHub
xzhang9292 closed pull request #14416: skip keyword in German Normalization Filter URL: https://github.com/apache/lucene/pull/14416

[PR] quick exit on filter query matching no docs when rewriting knn query [lucene]

2025-03-27 Thread via GitHub
bugmakerr opened a new pull request, #14418: URL: https://github.com/apache/lucene/pull/14418 ### Description

Re: [PR] Fix TestHnswByteVectorGraph.testBuildingJoinSet [lucene]

2025-03-27 Thread via GitHub
mayya-sharipova merged PR #14398: URL: https://github.com/apache/lucene/pull/14398

Re: [PR] Preparing existing profiler for adding concurrent profiling [lucene]

2025-03-27 Thread via GitHub
jainankitk commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2759703236 > Does it make sense to create a separate QueryProfilerBreakDown per leaf? Or should it create one per slice? Actually, creating one per slice makes a lot of sense. > Can thi

Re: [PR] Preparing existing profiler for adding concurrent profiling [lucene]

2025-03-27 Thread via GitHub
jainankitk commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2759709664 > In my opinion, it's more important to handle the concurrent and non-concurrent cases consistently than to save some overhead when searches are not concurrent. I'd really like non-co

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
gsmiller commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017239514 ## lucene/core/src/java/org/apache/lucene/search/DISIDocIdStream.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [I] Leverage sparse doc value indexes for range and value facet collection [lucene]

2025-03-27 Thread via GitHub
gsmiller commented on issue #14406: URL: https://github.com/apache/lucene/issues/14406#issuecomment-2759371300 > Out of curiosity, is it common for the union of the configured ranges to only match a small subset of the index? I would naively expect users to want to collect stats about all t
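A stdlib-only sketch of how a sparse skip index could help range facet counting, as this issue proposes: each block of docs records the min and max of its values, so a block that overlaps no configured range is skipped without reading values, and a block that falls entirely inside one range is counted wholesale. `SkipIndexRangeCounts` and `Block` are hypothetical. The sketch also assumes single-valued docs and disjoint, inclusive ranges.

```java
import java.util.List;

// Hypothetical model; a real skip index exposes per-block min/max and doc counts.
final class SkipIndexRangeCounts {
  private SkipIndexRangeCounts() {}

  /** One block of docs with the min/max of their (single) values. */
  record Block(long min, long max, long[] values) {}

  /** Counts values per inclusive range [lo[i], hi[i]]; ranges assumed disjoint. */
  static long[] count(List<Block> blocks, long[] lo, long[] hi) {
    long[] counts = new long[lo.length];
    blockLoop:
    for (Block b : blocks) {
      boolean overlapsAny = false;
      for (int i = 0; i < lo.length; i++) {
        if (b.min() >= lo[i] && b.max() <= hi[i]) {
          counts[i] += b.values().length; // whole block inside one range: no value reads
          continue blockLoop;
        }
        if (b.max() >= lo[i] && b.min() <= hi[i]) {
          overlapsAny = true;
        }
      }
      if (!overlapsAny) {
        continue; // block cannot contribute to any range: skip it entirely
      }
      for (long v : b.values()) { // partial overlap: fall back to per-value checks
        for (int i = 0; i < lo.length; i++) {
          if (v >= lo[i] && v <= hi[i]) {
            counts[i]++;
            break;
          }
        }
      }
    }
    return counts;
  }
}
```

The benefit depends on the question asked above: the smaller the fraction of the index the ranges cover, the more blocks are skipped outright.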

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-27 Thread via GitHub
jpountz commented on code in PR #14273: URL: https://github.com/apache/lucene/pull/14273#discussion_r2017547274 ## lucene/core/src/java/org/apache/lucene/search/BitSetDocIdStream.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Case-insensitive TermInSetQuery Implementation (Proof of Concept) [lucene]

2025-03-27 Thread via GitHub
github-actions[bot] commented on PR #14349: URL: https://github.com/apache/lucene/pull/14349#issuecomment-2759877032 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Reduce the number of comparisons when lowerPoint is equal to upperPoint [lucene]

2025-03-27 Thread via GitHub
github-actions[bot] commented on PR #14267: URL: https://github.com/apache/lucene/pull/14267#issuecomment-2759877138 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Disable the query cache by default. [lucene]

2025-03-27 Thread via GitHub
github-actions[bot] commented on PR #14187: URL: https://github.com/apache/lucene/pull/14187#issuecomment-2759877242 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] PointInSetQuery clips segments by lower and upper [lucene]

2025-03-27 Thread via GitHub
github-actions[bot] commented on PR #14268: URL: https://github.com/apache/lucene/pull/14268#issuecomment-2759877100 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Terminate automaton when it can match all suffixes, and match suffixes directly. [lucene]

2025-03-27 Thread via GitHub
github-actions[bot] commented on PR #13072: URL: https://github.com/apache/lucene/pull/13072#issuecomment-2759878644 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi