Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-16 Thread via GitHub
jpountz commented on code in PR #13563: URL: https://github.com/apache/lucene/pull/13563#discussion_r1678955876 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -1792,61 +1794,88 @@ public DocValuesSkipper getSkipper(FieldInfo field

Re: [I] Examine performance of individual data accessor methods of MemorySegmentIndexInput when IndexInputs are closed in other threads (deoptimizations,...) [lucene]

2024-07-16 Thread via GitHub
uschindler commented on issue #13325: URL: https://github.com/apache/lucene/issues/13325#issuecomment-2230338249 To be clear: With a rising number of threads, the process of closing an IndexInput gets slower. With recent Lucene improvements this is mitigated a bit by grouping files together

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
uschindler commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1679005037 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -125,4 +134,31 @@ private final MemorySegment[] map( } retur

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
uschindler commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1679005565 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -125,4 +134,31 @@ private final MemorySegment[] map( } retur

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
uschindler commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1679007057 ## lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java: ## @@ -199,20 +202,27 @@ public IndexInput openInput(String name, IOContext context) throws IOE

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
uschindler commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1678835899 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -125,4 +134,31 @@ private final MemorySegment[] map( } retur

Re: [PR] Reduce overhead for FSTs in FieldReader [lucene]

2024-07-16 Thread via GitHub
jpountz commented on PR #13524: URL: https://github.com/apache/lucene/pull/13524#issuecomment-2230451789 > Happy to :) I was meaning to look into testing this stuff and was way overthinking it :D just counting clones+slices seems like a really cool idea here :) I'll see what I can do!

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
Shradha26 commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679167237 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java: ## @@ -115,6 +116,69 @@ public final LongValuesSource toLongValuesSource() { return ne

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
ChrisHegarty commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1679168485 ## lucene/core/src/java21/org/apache/lucene/store/RefCountedSharedArena.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o
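The idea behind aggregating a segment's files into one shared arena can be illustrated with a small reference-counting sketch. This is a hypothetical illustration, not the `RefCountedSharedArena` code in PR #13570: the class name `RefCountedShared` is invented, and a plain `Runnable` stands in for something like `Arena::close`. Each file opened from the same segment acquires a reference, and the underlying close runs exactly once, when the last reference is released, so the expensive shared-arena close is paid once per segment rather than once per file.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of a reference-counted shared resource. The close action
 * (standing in for Arena::close) runs exactly once, when the last holder releases.
 */
final class RefCountedShared {
  // > 0: live holders; 0: no holders (not yet closed); -1: closed
  private final AtomicInteger refCount = new AtomicInteger();
  private final Runnable closeAction;

  RefCountedShared(Runnable closeAction) {
    this.closeAction = closeAction;
  }

  /** Registers one more holder, e.g. one more mapped file of the same segment. */
  void acquire() {
    while (true) {
      int current = refCount.get();
      if (current < 0) {
        throw new IllegalStateException("already closed");
      }
      if (refCount.compareAndSet(current, current + 1)) {
        return;
      }
    }
  }

  /** Drops one holder; the release that brings the count to zero closes the resource. */
  void release() {
    while (true) {
      int current = refCount.get();
      if (current <= 0) {
        throw new IllegalStateException("release() without matching acquire()");
      }
      if (refCount.compareAndSet(current, current - 1)) {
        // Only one thread can win the 0 -> -1 transition, so closeAction runs once.
        if (current == 1 && refCount.compareAndSet(0, -1)) {
          closeAction.run();
        }
        return;
      }
    }
  }

  boolean isClosed() {
    return refCount.get() < 0;
  }
}
```

The compare-and-set loops keep the sketch lock-free; if another thread re-acquires between the last release and the close transition, the close is simply deferred to that thread's release.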

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
ChrisHegarty commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1679170017 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -125,4 +134,31 @@ private final MemorySegment[] map( } ret

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
ChrisHegarty commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1679175111 ## lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java: ## @@ -199,20 +202,27 @@ public IndexInput openInput(String name, IOContext context) throws I

[I] Luke does not support `spanNear` queries [lucene]

2024-07-16 Thread via GitHub
stdedos opened a new issue, #13573: URL: https://github.com/apache/lucene/issues/13573 ### Description The following query ``` +((id:*me*)^0.6 (label:*me*)^1.2 (group:*me*)^0.9 (renderer:*me*)^0.9 (prettyFormula:*me*)^0.1) +((id:*sa*)^0.6 (label:*sa*)^1.2 (group:*sa*)^0.9 (

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-16 Thread via GitHub
benwtrent commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2230645036 @MilindShyani that is just a bad comment. All that will have to be cleaned up. The 127 is when the int7 is the default. You can see in the ctor, we use the provided bits to
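As a rough illustration of why a hard-coded 127 only fits the 7-bit case, here is a minimal sketch of a linear min/max quantizer whose range is derived from the bit width. It assumes scale = (2^bits - 1) / (maxQuantile - minQuantile), so the largest code is 127 for 7 bits and 255 for 8 bits; the class `LinearQuantizerSketch` is hypothetical and is not Lucene's `ScalarQuantizer`.

```java
/** Hypothetical sketch of linear scalar quantization parameterized by bit width. */
final class LinearQuantizerSketch {
  final float minQuantile;
  final float maxQuantile;
  final int maxCode; // 2^bits - 1: 127 for 7 bits, 255 for 8 bits
  final float scale;

  LinearQuantizerSketch(float minQuantile, float maxQuantile, int bits) {
    if (bits < 1 || bits > 8) {
      throw new IllegalArgumentException("bits must be in [1, 8]");
    }
    if (!(maxQuantile > minQuantile)) {
      throw new IllegalArgumentException("maxQuantile must exceed minQuantile");
    }
    this.minQuantile = minQuantile;
    this.maxQuantile = maxQuantile;
    this.maxCode = (1 << bits) - 1;
    this.scale = maxCode / (maxQuantile - minQuantile);
  }

  /** Clamps v to [minQuantile, maxQuantile] and maps it to an integer code in [0, maxCode]. */
  int quantize(float v) {
    float clamped = Math.max(minQuantile, Math.min(maxQuantile, v));
    return Math.round((clamped - minQuantile) * scale);
  }

  /** Approximate inverse of quantize(). */
  float dequantize(int code) {
    return minQuantile + code / scale;
  }
}
```

Note that 8-bit codes in [0, 255] do not fit a signed Java `byte` directly, so code that handles both widths has to treat stored bytes as unsigned in the 8-bit case.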

Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]

2024-07-16 Thread via GitHub
jpountz commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2230692592 > We might also get memory locality benefits This is what got me to thinking of BP for HNSW search: intuitively, it could help a lot when the dataset size exceeds the size of

[PR] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-16 Thread via GitHub
ChrisHegarty opened a new pull request, #13574: URL: https://github.com/apache/lucene/pull/13574 This commit uses IOContext.READONCE in more places where the index input is clearly being read once by the thread opening it. We can then enforce that segment files are only opened with READONC
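One way a test directory could enforce the "read once by the thread opening it" contract is to remember the opening thread and fail fast when any other thread touches the input. The sketch below is hypothetical (`ConfinedInput` and its byte-array backing are invented for illustration) and is not Lucene's `MockDirectoryWrapper` check.

```java
/** Hypothetical sketch: an input that may only be read by the thread that opened it. */
final class ConfinedInput {
  private final byte[] data;
  // Captured in the constructor, i.e. the thread that "opened" the input.
  private final Thread owner = Thread.currentThread();
  private int position;

  ConfinedInput(byte[] data) {
    this.data = data;
  }

  private void ensureOwner() {
    if (Thread.currentThread() != owner) {
      throw new IllegalStateException(
          "read-once input opened by " + owner.getName()
              + " but accessed from " + Thread.currentThread().getName());
    }
  }

  byte readByte() {
    ensureOwner();
    if (position >= data.length) {
      throw new IllegalStateException("read past EOF");
    }
    return data[position++];
  }
}
```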

Re: [PR] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-16 Thread via GitHub
ChrisHegarty commented on code in PR #13574: URL: https://github.com/apache/lucene/pull/13574#discussion_r1679257142 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -172,12 +172,12 @@ public ChecksumIndexInput openChecksumInput(String name) throws IOExcept

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
ChrisHegarty commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1679257863 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -125,4 +134,31 @@ private final MemorySegment[] map( } ret

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
stefanvodita commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679392013 ## lucene/core/src/java/org/apache/lucene/search/CollectorOwner.java: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679400752 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -630,27 +630,47 @@ private TopFieldDocs searchAfter( */ public T search(Query query,

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679415416 ## lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java: ## @@ -0,0 +1,714 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679427106 ## lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java: ## @@ -0,0 +1,714 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679580447 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/abstracts/FacetCutter.java: ## @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679590672 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/taxonomy/TaxonomyFacetsCutter.java: ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679663901 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/taxonomy/TaxonomyFacetsCutter.java: ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
Shradha26 commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679664044 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/taxonomy/TaxonomyFacetsCutter.java: ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
Shradha26 commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679672923 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/abstracts/FacetRecorder.java: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
Shradha26 commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679680303 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/CountFacetRecorder.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
Shradha26 commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679687355 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/CountFacetRecorder.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
Shradha26 commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679689124 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/package-info.java: ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679695936 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/taxonomy/TaxonomyFacetsCutter.java: ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
Shradha26 commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679700068 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/FacetFieldLeafCollector.java: ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Add target search concurrency to TieredMergePolicy [lucene]

2024-07-16 Thread via GitHub
carlosdelest commented on PR #13430: URL: https://github.com/apache/lucene/pull/13430#issuecomment-2231325670 > I would suggest improving LuceneTestCase#newTieredMergePolicy(Random) so that it randomly sets the target concurrency on the merge policy. And remove the call to the setter to the Base

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679702196 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/abstracts/FacetRecorder.java: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
Shradha26 commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679706408 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ComparableUtils.java: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
dsmiley commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1679708555 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -125,4 +134,31 @@ private final MemorySegment[] map( } return s

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
Shradha26 commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679711129 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ComparableUtils.java: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
Shradha26 commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679714451 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/taxonomy/TaxonomyChildrenOrdinalIterator.java: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
uschindler commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1679741852 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -125,4 +134,31 @@ private final MemorySegment[] map( } retur

Re: [PR] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-16 Thread via GitHub
uschindler commented on code in PR #13574: URL: https://github.com/apache/lucene/pull/13574#discussion_r1679759640 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -177,7 +177,7 @@ public ChecksumIndexInput openChecksumInput(String name) throws IOException

Re: [PR] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-16 Thread via GitHub
uschindler commented on PR #13574: URL: https://github.com/apache/lucene/pull/13574#issuecomment-2231403877 If you find other use cases for READONCE, feel free to add them. I really like the MockDirectory check! -- This is an automated message from the Apache Git Service. To respond t

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-16 Thread via GitHub
uschindler commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1679766918 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -125,4 +134,31 @@ private final MemorySegment[] map( } retur

Re: [I] Examine performance of individual data accessor methods of MemorySegmentIndexInput when IndexInputs are closed in other threads (deoptimizations,...) [lucene]

2024-07-16 Thread via GitHub
dsmiley commented on issue #13325: URL: https://github.com/apache/lucene/issues/13325#issuecomment-2231607373 Solr already takes care to ensure a SolrCore is closed while nobody is accessing it via its refcount mechanism on the SolrCore. I understand the JVM doesn't know that, and needs to

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679962280 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/abstracts/FacetLeafRecorder.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679971693 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/abstracts/FacetLeafRecorder.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1679992791 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/abstracts/FacetRecorder.java: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1680024142 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/CountFacetRecorder.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (

[PR] WIP do not merge [lucene]

2024-07-16 Thread via GitHub
msokolov opened a new pull request, #13577: URL: https://github.com/apache/lucene/pull/13577 ### Description This is a follow-on to https://github.com/apache/lucene/pull/13566 that adds some unit testing for HnswUtil plus an implementation that identifies strongly-connected components o
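For readers unfamiliar with strongly-connected components, here is a generic, iterative version of Tarjan's algorithm over an adjacency-list graph. It is a sketch under stated assumptions (the class `SccSketch` and the `int[][]` graph representation are hypothetical), not the `HnswUtil` implementation. A single component on a graph level would mean every node can reach every other node; the iterative form avoids deep recursion on large graphs.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sketch: iterative Tarjan SCC over graph[v] = out-neighbors of v. */
final class SccSketch {
  /** Returns comp[v], the component id of node v; ids are 0..(components - 1). */
  static int[] stronglyConnectedComponents(int[][] graph) {
    int n = graph.length;
    int[] index = new int[n];
    Arrays.fill(index, -1); // -1 = not yet visited
    int[] low = new int[n];
    int[] comp = new int[n];
    boolean[] onStack = new boolean[n];
    int[] nextEdge = new int[n]; // per-node cursor into its adjacency list
    ArrayDeque<Integer> tarjanStack = new ArrayDeque<>();
    ArrayDeque<Integer> callStack = new ArrayDeque<>(); // replaces recursion
    int counter = 0;
    int components = 0;

    for (int root = 0; root < n; root++) {
      if (index[root] != -1) continue;
      index[root] = low[root] = counter++;
      tarjanStack.push(root);
      onStack[root] = true;
      callStack.push(root);
      while (!callStack.isEmpty()) {
        int v = callStack.peek();
        if (nextEdge[v] < graph[v].length) {
          int w = graph[v][nextEdge[v]++];
          if (index[w] == -1) {
            // Tree edge: "recurse" into w.
            index[w] = low[w] = counter++;
            tarjanStack.push(w);
            onStack[w] = true;
            callStack.push(w);
          } else if (onStack[w]) {
            // Back edge into the current search path.
            low[v] = Math.min(low[v], index[w]);
          }
        } else {
          // All edges of v explored: "return" from v.
          callStack.pop();
          if (low[v] == index[v]) {
            // v is the root of a component: pop its members.
            int w;
            do {
              w = tarjanStack.pop();
              onStack[w] = false;
              comp[w] = components;
            } while (w != v);
            components++;
          }
          if (!callStack.isEmpty()) {
            int parent = callStack.peek();
            low[parent] = Math.min(low[parent], low[v]);
          }
        }
      }
    }
    return comp;
  }
}
```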

Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]

2024-07-16 Thread via GitHub
msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2231810578 > This is what got me to thinking of BP for HNSW search: intuitively, it could help a lot when the dataset size exceeds the size of the page cache? Yes, and as we keep compre

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1680091617 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/CountFacetRecorder.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1680094312 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/CountFacetRecorder.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Compute facets while collecting [lucene]

2024-07-16 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1680096226 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/CountFacetRecorder.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-07-16 Thread via GitHub
navneet1v commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2231959095 @vigyasharma is there a reason for adding multi-vector field support rather than using the parent-child relationship of the documents to fulfill this use case?

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-07-16 Thread via GitHub
benwtrent commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2232052499 @navneet1v The pattern doesn't work well with ColBERT-esque models.

Re: [PR] Nrt snapshot 9x [lucene]

2024-07-16 Thread via GitHub
github-actions[bot] commented on PR #13534: URL: https://github.com/apache/lucene/pull/13534#issuecomment-2232053198 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2024-07-16 Thread via GitHub
github-actions[bot] commented on PR #12868: URL: https://github.com/apache/lucene/pull/12868#issuecomment-2232054089 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Check whether liveDoc is null out of loop in Weight.scoreAll [lucene]

2024-07-16 Thread via GitHub
vsop-479 commented on PR #13557: URL: https://github.com/apache/lucene/pull/13557#issuecomment-2232216545 FWIW, I also measured performance with luceneutil on `wikimediumall`: Task | QPS baseline | StdDev | QPS my_modified_version | StdDev
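The change named in the title, hoisting the `liveDocs == null` check out of the per-document loop, can be illustrated with a simplified sketch. `ScoreAllSketch`, its nested `Bits` interface, and the plain array of doc IDs are hypothetical stand-ins for Lucene's iterator-based `scoreAll` code.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch of hoisting a loop-invariant null check out of a hot loop. */
final class ScoreAllSketch {
  /** Minimal stand-in for a live-docs bitset. */
  interface Bits {
    boolean get(int docId);
  }

  /** Null check inside the loop: evaluated once per document. */
  static void scoreAllCheckInLoop(int[] docs, Bits liveDocs, IntConsumer collector) {
    for (int doc : docs) {
      if (liveDocs == null || liveDocs.get(doc)) {
        collector.accept(doc);
      }
    }
  }

  /** Null check hoisted: choose the loop once, so the no-deletions case has no per-doc branch. */
  static void scoreAllCheckHoisted(int[] docs, Bits liveDocs, IntConsumer collector) {
    if (liveDocs == null) {
      for (int doc : docs) {
        collector.accept(doc);
      }
    } else {
      for (int doc : docs) {
        if (liveDocs.get(doc)) {
          collector.accept(doc);
        }
      }
    }
  }
}
```

The JIT can sometimes perform this loop unswitching on its own, which is why a benchmark like the luceneutil run above is needed to tell whether the manual version actually helps.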

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-16 Thread via GitHub
goankur commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1680353139 ## lucene/core/build.gradle: ## @@ -14,10 +14,43 @@ * See the License for the specific language governing permissions and * limitations under the License. */ +pl
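For reference, the scalar computation that a native int8 kernel such as `vdot8s` would be compared against is a widening dot product over signed bytes. A minimal Java sketch follows; the class `Int8DotSketch` is hypothetical, while in Lucene itself `VectorUtil.dotProduct(byte[], byte[])` plays this role.

```java
/** Hypothetical sketch of a scalar, widening int8 dot product. */
final class Int8DotSketch {
  static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // Bytes are promoted to int before multiplying; each product is at most 16384 in
      // magnitude, so the int sum cannot overflow for fewer than 131,072 dimensions.
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```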

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-16 Thread via GitHub
MilindShyani commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2232355996 The math is quite simple as we saw above, but the code (I guess because it's trying to do 7 and 8 bits at the same time) is giving me a really hard time 😅 How is the quadr