[GitHub] [lucene] zacharymorn commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-10 Thread via GitHub
zacharymorn commented on PR #12194: URL: https://github.com/apache/lucene/pull/12194#issuecomment-1463441417 Thanks @gsmiller for your review and suggestions! > What about updating FixedBitSet#or(disi) to use this? That's used when rewriting MultiTermQuery instances, and I would think

[GitHub] [lucene] zacharymorn commented on a diff in pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-10 Thread via GitHub
zacharymorn commented on code in PR #12194: URL: https://github.com/apache/lucene/pull/12194#discussion_r1132077572 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsReader.java: ## @@ -479,6 +481,31 @@ private void refillDocs() throws IOException {

[GitHub] [lucene] alessandrobenedetti commented on pull request #12169: Introduced the Word2VecSynonymFilter

2023-03-10 Thread via GitHub
alessandrobenedetti commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1463774563 I'll leave the BoostAttribute discussion for another time as I don't have access to the code right now, but javaDocs or not, it seems extremely suspicious to have a public cl

[GitHub] [lucene] jpountz opened a new pull request, #12198: Reduce contention in DocumentsWriterFlushControl.

2023-03-10 Thread via GitHub
jpountz opened a new pull request, #12198: URL: https://github.com/apache/lucene/pull/12198 lucene-util's `IndexGeoNames` benchmark is heavily contended when running with many indexing threads, 20 in my case. The main offender is `DocumentsWriterFlushControl#doAfterDocument`, which runs aft

[GitHub] [lucene] gsmiller closed pull request #12089: Modify TermInSetQuery to "self optimize" if doc values are available

2023-03-10 Thread via GitHub
gsmiller closed pull request #12089: Modify TermInSetQuery to "self optimize" if doc values are available URL: https://github.com/apache/lucene/pull/12089 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [lucene] gsmiller commented on pull request #12089: Modify TermInSetQuery to "self optimize" if doc values are available

2023-03-10 Thread via GitHub
gsmiller commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1463886798 Closing this out for now. I think my last comment provides a pretty good summary of where things landed here. I did experiment a bit with other ways of estimating cost within `TermInSet

[GitHub] [lucene] gsmiller commented on a diff in pull request #12184: [nocommit] Introduce ExpressionFacets along with a demo.

2023-03-10 Thread via GitHub
gsmiller commented on code in PR #12184: URL: https://github.com/apache/lucene/pull/12184#discussion_r1132530610 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/ExpressionFacets.java: ## @@ -0,0 +1,253 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [lucene] gsmiller commented on pull request #11746: Deprecate LongValueFacetCounts#getTopChildrenSortByCount since it provides redundant functionality

2023-03-10 Thread via GitHub
gsmiller commented on PR #11746: URL: https://github.com/apache/lucene/pull/11746#issuecomment-1463976493 I back ported this manually. Closing out the PR.

[GitHub] [lucene] gsmiller commented on pull request #12171: Add APIs to get ordinal and category cache hit/miss count and hit rate in DirectoryTaxonomyReader

2023-03-10 Thread via GitHub
gsmiller commented on PR #12171: URL: https://github.com/apache/lucene/pull/12171#issuecomment-1463977181 Closing this out for now since I don't think we want to add these APIs.

[GitHub] [lucene] gsmiller closed pull request #12171: Add APIs to get ordinal and category cache hit/miss count and hit rate in DirectoryTaxonomyReader

2023-03-10 Thread via GitHub
gsmiller closed pull request #12171: Add APIs to get ordinal and category cache hit/miss count and hit rate in DirectoryTaxonomyReader URL: https://github.com/apache/lucene/pull/12171

[GitHub] [lucene] gsmiller closed pull request #11746: Deprecate LongValueFacetCounts#getTopChildrenSortByCount since it provides redundant functionality

2023-03-10 Thread via GitHub
gsmiller closed pull request #11746: Deprecate LongValueFacetCounts#getTopChildrenSortByCount since it provides redundant functionality URL: https://github.com/apache/lucene/pull/11746

[GitHub] [lucene] gsmiller commented on pull request #12171: Add APIs to get ordinal and category cache hit/miss count and hit rate in DirectoryTaxonomyReader

2023-03-10 Thread via GitHub
gsmiller commented on PR #12171: URL: https://github.com/apache/lucene/pull/12171#issuecomment-1463977754 Thanks again @shubhamvishu for raising the idea and looking into the effectiveness of the caches more generally!

[GitHub] [lucene] gsmiller commented on pull request #11739: DRAFT: TermInSetQuery refactored to extend MultiTermsQuery

2023-03-10 Thread via GitHub
gsmiller commented on PR #11739: URL: https://github.com/apache/lucene/pull/11739#issuecomment-1463987636 We finally did this in #12156!

[GitHub] [lucene] gsmiller closed pull request #11739: DRAFT: TermInSetQuery refactored to extend MultiTermsQuery

2023-03-10 Thread via GitHub
gsmiller closed pull request #11739: DRAFT: TermInSetQuery refactored to extend MultiTermsQuery URL: https://github.com/apache/lucene/pull/11739

[GitHub] [lucene] gsmiller closed pull request #11741: DRAFT: Experiment with intersecting TermInSetQuery terms up-front to better estimate cost

2023-03-10 Thread via GitHub
gsmiller closed pull request #11741: DRAFT: Experiment with intersecting TermInSetQuery terms up-front to better estimate cost URL: https://github.com/apache/lucene/pull/11741

[GitHub] [lucene] jpountz opened a new pull request, #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-10 Thread via GitHub
jpountz opened a new pull request, #12199: URL: https://github.com/apache/lucene/pull/12199 Obtaining a DWPT and putting it back into the pool is subject to contention. This change reduces contention by using 8 sub pools that are tried sequentially. When applied on top of #12198, this reduc
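The idea of several sub pools that are tried sequentially can be illustrated with a small, generic sketch. This is not the code from #12199: the class name `StripedPoolSketch`, the per-sub-pool `ReentrantLock`, and the two-pass `tryLock`-then-block strategy are all assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical pool split into independently locked sub pools (illustration only). */
public class StripedPoolSketch<T> {

  /** One sub pool: its own lock and its own queue of free items. */
  private static final class SubPool<T> {
    final ReentrantLock lock = new ReentrantLock();
    final ArrayDeque<T> items = new ArrayDeque<>();
  }

  private final SubPool<T>[] subPools;

  @SuppressWarnings("unchecked")
  public StripedPoolSketch(int numSubPools) {
    subPools = new SubPool[numSubPools];
    for (int i = 0; i < numSubPools; i++) {
      subPools[i] = new SubPool<>();
    }
  }

  /** Returns an item to the pool, preferring a sub pool whose lock is free. */
  public void add(T item) {
    for (SubPool<T> p : subPools) {
      if (p.lock.tryLock()) {
        try {
          p.items.push(item);
          return;
        } finally {
          p.lock.unlock();
        }
      }
    }
    // Every sub pool was busy: fall back to blocking on the first one.
    SubPool<T> p = subPools[0];
    p.lock.lock();
    try {
      p.items.push(item);
    } finally {
      p.lock.unlock();
    }
  }

  /** Takes a free item, or returns null if every sub pool is empty. */
  public T poll() {
    // First pass: only look at sub pools that can be locked without waiting.
    for (SubPool<T> p : subPools) {
      if (p.lock.tryLock()) {
        try {
          T item = p.items.poll();
          if (item != null) {
            return item;
          }
        } finally {
          p.lock.unlock();
        }
      }
    }
    // Second pass: wait for each lock so that no free item is missed.
    for (SubPool<T> p : subPools) {
      p.lock.lock();
      try {
        T item = p.items.poll();
        if (item != null) {
          return item;
        }
      } finally {
        p.lock.unlock();
      }
    }
    return null;
  }
}
```

With a single lock, every thread that obtains or returns an item serializes on it. Spreading the items over several locks lets threads that would otherwise wait move on to the next sub pool instead.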

[GitHub] [lucene] gsmiller commented on pull request #11780: GH#11601: Add ability to compute reader states after refresh

2023-03-10 Thread via GitHub
gsmiller commented on PR #11780: URL: https://github.com/apache/lucene/pull/11780#issuecomment-1464091101 Sorry @stefanvodita, just now coming back to this after a break. OK, so current state-of-the-world: Right now, users have to instantiate reader state instances for each field t

[GitHub] [lucene] gsmiller commented on a diff in pull request #11780: GH#11601: Add ability to compute reader states after refresh

2023-03-10 Thread via GitHub
gsmiller commented on code in PR #11780: URL: https://github.com/apache/lucene/pull/11780#discussion_r1132608504 ## lucene/facet/src/java/org/apache/lucene/facet/sortedset/SSDVReaderStatesCalculator.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [lucene] dweiss commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-10 Thread via GitHub
dweiss commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1132628795 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter() {

[GitHub] [lucene] jpountz commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-10 Thread via GitHub
jpountz commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1132647604 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter()

[GitHub] [lucene] dweiss commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-10 Thread via GitHub
dweiss commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1132698882 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter() {

[GitHub] [lucene] dweiss commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-10 Thread via GitHub
dweiss commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1132703519 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter() {

[GitHub] [lucene] mdmarshmallow commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-10 Thread via GitHub
mdmarshmallow commented on PR #12194: URL: https://github.com/apache/lucene/pull/12194#issuecomment-1464188468 > > I think @mdmarshmallow might be working on this as per [#11915 (comment)](https://github.com/apache/lucene/issues/11915#issuecomment-1459502217). As part of this PR, I'

[GitHub] [lucene] jpountz commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-10 Thread via GitHub
jpountz commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1132726733 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter()

[GitHub] [lucene] dweiss commented on a diff in pull request #12199: Reduce contention in DocumentsWriterPerThreadPool.

2023-03-10 Thread via GitHub
dweiss commented on code in PR #12199: URL: https://github.com/apache/lucene/pull/12199#discussion_r1132739002 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -113,15 +113,19 @@ private synchronized DocumentsWriterPerThread newWriter() {

[GitHub] [lucene] frzhanguber opened a new issue, #12200: Lucene94HnswVectorsReader integer overflow when calculating graphOffsetsByLevel

2023-03-10 Thread via GitHub
frzhanguber opened a new issue, #12200: URL: https://github.com/apache/lucene/issues/12200 ### Description when loading with a dense graph with M=64 and beamWidth=400, the following part graphOffsetsByLevel calculation overflows, since both M, Integer.BYTES and numNodesOnLevel0 are i
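The kind of overflow described in this report can be reproduced in isolation. The sketch below is not the code from `Lucene94HnswVectorsReader`: the per-node size formula (one int for the neighbor count plus up to `2 * M` neighbor ids) and the node count of 5,000,000 are assumptions chosen only to show how a product of `int` operands wraps around, and how widening one operand to `long` avoids it.

```java
public class GraphOffsetOverflowSketch {

  // Hypothetical offset formula evaluated entirely in int arithmetic.
  // The product silently wraps once it exceeds Integer.MAX_VALUE.
  static int offsetWithIntMath(int m, int numNodesOnLevel0) {
    return (1 + (m * 2)) * Integer.BYTES * numNodesOnLevel0;
  }

  // Same formula, but the first operand is a long, so every
  // following multiplication is carried out in 64-bit arithmetic.
  static long offsetWithLongMath(int m, int numNodesOnLevel0) {
    return (1L + (m * 2L)) * Integer.BYTES * numNodesOnLevel0;
  }

  public static void main(String[] args) {
    int m = 64;                 // M from the issue description
    int numNodes = 5_000_000;   // assumed node count, large enough to overflow

    // 129 * 4 * 5,000,000 = 2,580,000,000, which does not fit in an int.
    System.out.println(offsetWithIntMath(m, numNodes));  // negative (wrapped)
    System.out.println(offsetWithLongMath(m, numNodes)); // 2580000000

    // Math.multiplyExact turns the silent wrap into an exception.
    try {
      Math.multiplyExact((1 + (m * 2)) * Integer.BYTES, numNodes);
    } catch (ArithmeticException e) {
      System.out.println("overflow detected: " + e.getMessage());
    }
  }
}
```

Casting only the final result to `long` would not help, because the wrap has already happened by then. At least one operand has to be widened before the multiplication.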

[GitHub] [lucene] gsmiller commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-10 Thread via GitHub
gsmiller commented on PR #12194: URL: https://github.com/apache/lucene/pull/12194#issuecomment-1464329664 Woohoo! Thanks @zacharymorn / @mdmarshmallow! I suspect we may not really see any benefit though if the DISI can only expose the next non-matching doc within its current block. I think

[GitHub] [lucene] shaikhu opened a new pull request, #12201: Github 10633 Update Javadoc comment to mention gradle instead of ant

2023-03-10 Thread via GitHub
shaikhu opened a new pull request, #12201: URL: https://github.com/apache/lucene/pull/12201 I believe this is a fix for issue [10633](https://github.com/apache/lucene/issues/10633). Although the path is slightly different, this is the only class I could find with the name `TestBackwa

[GitHub] [lucene] rmuir closed issue #12200: Lucene94HnswVectorsReader integer overflow when calculating graphOffsetsByLevel

2023-03-10 Thread via GitHub
rmuir closed issue #12200: Lucene94HnswVectorsReader integer overflow when calculating graphOffsetsByLevel URL: https://github.com/apache/lucene/issues/12200

[GitHub] [lucene] rmuir commented on issue #12200: Lucene94HnswVectorsReader integer overflow when calculating graphOffsetsByLevel

2023-03-10 Thread via GitHub
rmuir commented on issue #12200: URL: https://github.com/apache/lucene/issues/12200#issuecomment-1464410196 duplicate of #11905

[GitHub] [lucene] erikhatcher commented on pull request #12159: Remove the Now Unused Class `pointInPolygon`.

2023-03-10 Thread via GitHub
erikhatcher commented on PR #12159: URL: https://github.com/apache/lucene/pull/12159#issuecomment-1464504770 Looks good to me. Eagle eye Marcus.

[GitHub] [lucene] gsmiller commented on pull request #12153: Unrelated code in TestIndexSortSortedNumericDocValuesRangeQuery

2023-03-10 Thread via GitHub
gsmiller commented on PR #12153: URL: https://github.com/apache/lucene/pull/12153#issuecomment-1464578469 +1 Did a little digging, and it appears to have been carried over from some copy/paste of `TestDocValuesQueries#testToString`. Let's remove. We've already got this (exact) cover

[GitHub] [lucene] zacharymorn commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-10 Thread via GitHub
zacharymorn commented on PR #12194: URL: https://github.com/apache/lucene/pull/12194#issuecomment-1464741062 Thanks @mdmarshmallow for working on it! Btw, I just pushed a commit (https://github.com/apache/lucene/pull/12194/commits/f78182bae7e92b23136b5975dbb9d3199e5e3065) that fixed some bu

[GitHub] [lucene] mdmarshmallow commented on pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-10 Thread via GitHub
mdmarshmallow commented on PR #12194: URL: https://github.com/apache/lucene/pull/12194#issuecomment-1464803846 Yeah I was doing some of my own debugging and saw some of those issues. I think this fixed a decent amount of the issues I was seeing but I'm still seeing problems with some tests.