[GitHub] [lucene] mdmarshmallow commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-20 Thread GitBox
mdmarshmallow commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1251965881 Thanks for taking the time to look into this! I think 5% would be a good start, it would be near the threshold we want to test (we were thinking 2% but looking at your initial

[GitHub] [lucene] jpountz opened a new pull request, #11792: Fix handling of ghost fields in string sorts.

2022-09-20 Thread GitBox
jpountz opened a new pull request, #11792: URL: https://github.com/apache/lucene/pull/11792 Introduction of dynamic pruning for string sorts (#11669) introduced a bug with string sorts and ghost fields, triggering a `NullPointerException` because the code assumes that `LeafReader#terms` is

[GitHub] [lucene] javanna opened a new pull request, #11793: Prevent PointValues from returning null for ghost fields

2022-09-20 Thread GitBox
javanna opened a new pull request, #11793: URL: https://github.com/apache/lucene/pull/11793 getPointValues may currently return null for unknown fields or fields that don't index points. It can happen that a field no longer has points for any document in a segment after delete+merge, which

[GitHub] [lucene] javanna commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

2022-09-20 Thread GitBox
javanna commented on code in PR #11793: URL: https://github.com/apache/lucene/pull/11793#discussion_r975046441 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -467,4 +467,91 @@ public final long estimateDocCount(IntersectVisitor visitor) { /** Retur

[GitHub] [lucene] javanna commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

2022-09-20 Thread GitBox
javanna commented on code in PR #11793: URL: https://github.com/apache/lucene/pull/11793#discussion_r975048307 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -467,4 +467,91 @@ public final long estimateDocCount(IntersectVisitor visitor) { /** Retur

[GitHub] [lucene] jpountz commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

2022-09-20 Thread GitBox
jpountz commented on code in PR #11793: URL: https://github.com/apache/lucene/pull/11793#discussion_r975049657 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -467,4 +467,91 @@ public final long estimateDocCount(IntersectVisitor visitor) { /** Retur

[GitHub] [lucene] jpountz commented on pull request #11781: Diversity check bugfix

2022-09-20 Thread GitBox
jpountz commented on PR #11781: URL: https://github.com/apache/lucene/pull/11781#issuecomment-1252026867 It looks like some recent test failures affecting the 9.4 branch are caused by this change, e.g. https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-9.4/18/ -- This is an

[GitHub] [lucene] jpountz commented on issue #11791: cardinality estimation for query filters

2022-09-20 Thread GitBox
jpountz commented on issue #11791: URL: https://github.com/apache/lucene/issues/11791#issuecomment-1252033446 The cheapest way to get an estimation of the "cost" of a filter is to run `Weight#scorerSupplier` to retrieve a `ScorerSupplier` and then `ScorerSupplier#cost` to get an estimation

[GitHub] [lucene] jpountz closed issue #11791: cardinality estimation for query filters

2022-09-20 Thread GitBox
jpountz closed issue #11791: cardinality estimation for query filters URL: https://github.com/apache/lucene/issues/11791 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

[GitHub] [lucene] jpountz commented on pull request #11781: Diversity check bugfix

2022-09-20 Thread GitBox
jpountz commented on PR #11781: URL: https://github.com/apache/lucene/pull/11781#issuecomment-1252034476 Apologies, I just noticed that you opened https://github.com/apache/lucene/issues/11787.

[GitHub] [lucene] jpountz commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

2022-09-20 Thread GitBox
jpountz commented on code in PR #11793: URL: https://github.com/apache/lucene/pull/11793#discussion_r975074845 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -467,4 +467,90 @@ public final long estimateDocCount(IntersectVisitor visitor) { /** Retur

[GitHub] [lucene] javanna opened a new pull request, #11794: Guard FieldExistsQuery against null pointers

2022-09-20 Thread GitBox
javanna opened a new pull request, #11794: URL: https://github.com/apache/lucene/pull/11794 FieldExistsQuery checks if there are points for a certain field, and then retrieves the corresponding point values. When all documents that had points for a certain field have been deleted from a cer

[GitHub] [lucene] jpountz commented on a diff in pull request #11794: Guard FieldExistsQuery against null pointers

2022-09-20 Thread GitBox
jpountz commented on code in PR #11794: URL: https://github.com/apache/lucene/pull/11794#discussion_r975093030 ## lucene/core/src/test/org/apache/lucene/search/TestFieldExistsQuery.java: ## @@ -702,6 +702,28 @@ private float[] randomVector(int dim) { return v; } + pub

[GitHub] [lucene] javanna commented on a diff in pull request #11794: Guard FieldExistsQuery against null pointers

2022-09-20 Thread GitBox
javanna commented on code in PR #11794: URL: https://github.com/apache/lucene/pull/11794#discussion_r975098043 ## lucene/core/src/test/org/apache/lucene/search/TestFieldExistsQuery.java: ## @@ -702,6 +702,28 @@ private float[] randomVector(int dim) { return v; } + pub

[GitHub] [lucene] dweiss commented on issue #11246: Gradle wrapper validation gh workflow step fails with odd messages [LUCENE-10209]

2022-09-20 Thread GitBox
dweiss commented on issue #11246: URL: https://github.com/apache/lucene/issues/11246#issuecomment-1252075673 Does not reproduce.

[GitHub] [lucene] dweiss closed issue #11246: Gradle wrapper validation gh workflow step fails with odd messages [LUCENE-10209]

2022-09-20 Thread GitBox
dweiss closed issue #11246: Gradle wrapper validation gh workflow step fails with odd messages [LUCENE-10209] URL: https://github.com/apache/lucene/issues/11246

[GitHub] [lucene] dweiss commented on issue #11450: Add fn:fuzzyTerm interval function to flexible query parser [LUCENE-10414]

2022-09-20 Thread GitBox
dweiss commented on issue #11450: URL: https://github.com/apache/lucene/issues/11450#issuecomment-1252082733 Merged to 9x and main.

[GitHub] [lucene] dweiss closed issue #11450: Add fn:fuzzyTerm interval function to flexible query parser [LUCENE-10414]

2022-09-20 Thread GitBox
dweiss closed issue #11450: Add fn:fuzzyTerm interval function to flexible query parser [LUCENE-10414] URL: https://github.com/apache/lucene/issues/11450

[GitHub] [lucene] dweiss commented on pull request #11789: GitHub Workflows security hardening

2022-09-20 Thread GitBox
dweiss commented on PR #11789: URL: https://github.com/apache/lucene/pull/11789#issuecomment-1252085469 There is a test failure in precommit but it's unrelated. I'll merge this in.

[GitHub] [lucene] dweiss merged pull request #11789: GitHub Workflows security hardening

2022-09-20 Thread GitBox
dweiss merged PR #11789: URL: https://github.com/apache/lucene/pull/11789

[GitHub] [lucene] dweiss commented on pull request #11734: Fix repeating token sentence boundary bug

2022-09-20 Thread GitBox
dweiss commented on PR #11734: URL: https://github.com/apache/lucene/pull/11734#issuecomment-1252099555 Hi @kotman12 . Sorry for the delay. I'm not that familiar with this part of the codebase but I think I see what's happening and how you managed to fix it. Looks good to me. It'd be good t

[GitHub] [lucene] dweiss commented on a diff in pull request #11734: Fix repeating token sentence boundary bug

2022-09-20 Thread GitBox
dweiss commented on code in PR #11734: URL: https://github.com/apache/lucene/pull/11734#discussion_r975127196 ## lucene/analysis/opennlp/src/test/org/apache/lucene/analysis/opennlp/TestOpenNLPLemmatizerFilterFactory.java: ## @@ -290,4 +299,61 @@ public void testKeywordAttribute

[GitHub] [lucene] janhoy commented on pull request #591: LUCENE-10365 Wizard changes contributed from Solr

2022-09-20 Thread GitBox
janhoy commented on PR #591: URL: https://github.com/apache/lucene/pull/591#issuecomment-1252112200 I'm going to merge this in now, and then any bug fixing can happen in followup commits.

[GitHub] [lucene] javanna commented on pull request #11793: Prevent PointValues from returning null for ghost fields

2022-09-20 Thread GitBox
javanna commented on PR #11793: URL: https://github.com/apache/lucene/pull/11793#issuecomment-1252119563 I pushed an update. I have removed null checks from consumers that were already checking field info. In most cases we already check field info for compatibility, hence we can expect poin

[GitHub] [lucene] janhoy merged pull request #591: LUCENE-10365 Wizard changes contributed from Solr

2022-09-20 Thread GitBox
janhoy merged PR #591: URL: https://github.com/apache/lucene/pull/591

[GitHub] [lucene] janhoy closed issue #11401: releaseWizard improvements from the Solr 9.0 release [LUCENE-10365]

2022-09-20 Thread GitBox
janhoy closed issue #11401: releaseWizard improvements from the Solr 9.0 release [LUCENE-10365] URL: https://github.com/apache/lucene/issues/11401

[GitHub] [lucene] jpountz commented on a diff in pull request #11794: Guard FieldExistsQuery against null pointers

2022-09-20 Thread GitBox
jpountz commented on code in PR #11794: URL: https://github.com/apache/lucene/pull/11794#discussion_r975170911 ## lucene/core/src/test/org/apache/lucene/search/TestFieldExistsQuery.java: ## @@ -711,13 +711,15 @@ public void testDeleteAllPointDocs() throws Exception { doc.

[GitHub] [lucene] jpountz merged pull request #11792: Fix handling of ghost fields in string sorts.

2022-09-20 Thread GitBox
jpountz merged PR #11792: URL: https://github.com/apache/lucene/pull/11792

[GitHub] [lucene] jpountz commented on a diff in pull request #11794: Guard FieldExistsQuery against null pointers

2022-09-20 Thread GitBox
jpountz commented on code in PR #11794: URL: https://github.com/apache/lucene/pull/11794#discussion_r975269403 ## lucene/core/src/test/org/apache/lucene/search/TestFieldExistsQuery.java: ## @@ -702,6 +704,30 @@ private float[] randomVector(int dim) { return v; } + pub

[GitHub] [lucene] javanna commented on a diff in pull request #11794: Guard FieldExistsQuery against null pointers

2022-09-20 Thread GitBox
javanna commented on code in PR #11794: URL: https://github.com/apache/lucene/pull/11794#discussion_r975282324 ## lucene/core/src/test/org/apache/lucene/search/TestFieldExistsQuery.java: ## @@ -702,6 +704,30 @@ private float[] randomVector(int dim) { return v; } + pub

[GitHub] [lucene] jpountz commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

2022-09-20 Thread GitBox
jpountz commented on code in PR #11793: URL: https://github.com/apache/lucene/pull/11793#discussion_r975300662 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -104,28 +104,28 @@ public abstract class NumericLeafComparator implements Le

[GitHub] [lucene] jpountz merged pull request #11794: Guard FieldExistsQuery against null pointers

2022-09-20 Thread GitBox
jpountz merged PR #11794: URL: https://github.com/apache/lucene/pull/11794

[GitHub] [lucene] jpountz commented on pull request #11793: Prevent PointValues from returning null for ghost fields

2022-09-20 Thread GitBox
jpountz commented on PR #11793: URL: https://github.com/apache/lucene/pull/11793#issuecomment-1252371627 I merged your other PR that adds a null check in FieldExistsQuery, we should now be able to remove this null check with this change?

[GitHub] [lucene] jpountz commented on a diff in pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search

2022-09-20 Thread GitBox
jpountz commented on code in PR #687: URL: https://github.com/apache/lucene/pull/687#discussion_r975393095 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java: ## @@ -214,12 +220,166 @@ public int count(LeafReaderContext con

[GitHub] [lucene] patelprateek commented on issue #11791: cardinality estimation for query filters

2022-09-20 Thread GitBox
patelprateek commented on issue #11791: URL: https://github.com/apache/lucene/issues/11791#issuecomment-1252575078 @jpountz: will this work for any query and not just filters? Since I am new to Lucene, can you please elaborate a bit on how much cheaper it would be relative to actually

[GitHub] [lucene] jpountz commented on issue #11791: cardinality estimation for query filters

2022-09-20 Thread GitBox
jpountz commented on issue #11791: URL: https://github.com/apache/lucene/issues/11791#issuecomment-1252597988 Yes, Lucene makes no difference between queries and filters. It's cheap in the sense that it only performs terms dictionary lookups and quick checks in KD tree indexes but doe

[GitHub] [lucene] patelprateek commented on issue #11791: cardinality estimation for query filters

2022-09-20 Thread GitBox
patelprateek commented on issue #11791: URL: https://github.com/apache/lucene/issues/11791#issuecomment-1252641368 Thanks, that makes sense. I will do some benchmarking on my end as well to get a better understanding of whether it fits our SLA requirements. Can you give me some more details on the

[GitHub] [lucene] mdmarshmallow opened a new issue, #11795: Add FilterDirectory to track write amplification factor

2022-09-20 Thread GitBox
mdmarshmallow opened a new issue, #11795: URL: https://github.com/apache/lucene/issues/11795 ### Description I recently opened another issue to lower the allowable delete percentage in `TieredMergePolicy` [here](https://github.com/apache/lucene/issues/11761). One of the concerns that

[GitHub] [lucene] patelprateek commented on issue #11791: cardinality estimation for query filters

2022-09-20 Thread GitBox
patelprateek commented on issue #11791: URL: https://github.com/apache/lucene/issues/11791#issuecomment-1252655054 @jpountz: I do have some questions regarding estimation accuracy. Usually I have seen sketch data structures being used, which have different error bounds based on

[GitHub] [lucene] jpountz commented on issue #11791: cardinality estimation for query filters

2022-09-20 Thread GitBox
jpountz commented on issue #11791: URL: https://github.com/apache/lucene/issues/11791#issuecomment-1252680394 > Can you give me some more details on the KD tree? IIRC KD trees were used for n-dim data points like geo-spatial, right? This is correct, we also use them for the 1D case, i

[GitHub] [lucene] jtibshirani commented on a diff in pull request #11790: Mark HNSW search results incomplete when fewer than topK are found

2022-09-20 Thread GitBox
jtibshirani commented on code in PR #11790: URL: https://github.com/apache/lucene/pull/11790#discussion_r975678640 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -267,6 +267,9 @@ private NeighborQueue searchLevel( while (results.size() > t

[GitHub] [lucene] gsmiller commented on a diff in pull request #11775: Minor refactoring and cleanup to taxonomy index code

2022-09-20 Thread GitBox
gsmiller commented on code in PR #11775: URL: https://github.com/apache/lucene/pull/11775#discussion_r975666927 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FacetLabel.java: ## @@ -120,11 +120,10 @@ public int compareTo(FacetLabel other) { @Override public b

[GitHub] [lucene] stevenschlansker commented on issue #11674: PrimaryNode close waits for replicas to close, but there is no guarantee they ever will [LUCENE-10638]

2022-09-20 Thread GitBox
stevenschlansker commented on issue #11674: URL: https://github.com/apache/lucene/issues/11674#issuecomment-1252817249 We just hit this bug in production again - a Primary node hung forever waiting for a replica that never closed, amplifying an unrelated problem with cluster configuration.

[GitHub] [lucene] mdmarshmallow opened a new pull request, #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-20 Thread GitBox
mdmarshmallow opened a new pull request, #11796: URL: https://github.com/apache/lucene/pull/11796 Added a `WriteAmplificationTrackingDirectoryWrapper` that simply keeps track of bytes flushed and bytes merged. This allows us to get an estimate of the write amplification factor by doing `(by
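The formula in the message above is cut off, so the following is only a rough sketch of the idea, not the code from the PR. It assumes the common definition of write amplification, (bytes flushed + bytes merged) / bytes flushed, which may differ from what the PR actually computes. The class name `WriteAmplificationEstimate` and its methods are hypothetical and do not use any Lucene API.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not the PR's WriteAmplificationTrackingDirectoryWrapper):
 * keeps two byte counters and derives an estimated write amplification factor.
 */
final class WriteAmplificationEstimate {
    private final AtomicLong flushedBytes = new AtomicLong();
    private final AtomicLong mergedBytes = new AtomicLong();

    /** Record bytes written while flushing new segments. */
    void recordFlush(long bytes) {
        flushedBytes.addAndGet(bytes);
    }

    /** Record bytes written while merging existing segments. */
    void recordMerge(long bytes) {
        mergedBytes.addAndGet(bytes);
    }

    /**
     * Assumed definition: (flushed + merged) / flushed.
     * Returns 1.0 when nothing has been flushed yet (an arbitrary choice for this sketch).
     */
    double factor() {
        long flushed = flushedBytes.get();
        if (flushed == 0) {
            return 1.0;
        }
        return (double) (flushed + mergedBytes.get()) / flushed;
    }
}
```

Under this assumed definition, flushing 100 bytes and later rewriting 300 bytes through merges gives a factor of 4.0, meaning each flushed byte was written four times in total.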

[GitHub] [lucene] mdmarshmallow commented on issue #11795: Add FilterDirectory to track write amplification factor

2022-09-20 Thread GitBox
mdmarshmallow commented on issue #11795: URL: https://github.com/apache/lucene/issues/11795#issuecomment-1252988900 I'm not sure how PRs get automatically linked to the issue, but here is the PR I created: https://github.com/apache/lucene/pull/11796

[GitHub] [lucene] mdmarshmallow commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-20 Thread GitBox
mdmarshmallow commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1252993085 Here is the issue I created with a PR attached in case you were interested: https://github.com/apache/lucene/issues/11795

[GitHub] [lucene] msokolov commented on a diff in pull request #11790: Mark HNSW search results incomplete when fewer than topK are found

2022-09-20 Thread GitBox
msokolov commented on code in PR #11790: URL: https://github.com/apache/lucene/pull/11790#discussion_r975881444 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -267,6 +267,9 @@ private NeighborQueue searchLevel( while (results.size() > topK

[GitHub] [lucene] msokolov closed pull request #11790: Mark HNSW search results incomplete when fewer than topK are found

2022-09-20 Thread GitBox
msokolov closed pull request #11790: Mark HNSW search results incomplete when fewer than topK are found URL: https://github.com/apache/lucene/pull/11790

[GitHub] [lucene] gsmiller opened a new pull request, #11797: DrillSideways uses advance instead of next when multiple dims miss

2022-09-20 Thread GitBox
gsmiller opened a new pull request, #11797: URL: https://github.com/apache/lucene/pull/11797 ### Description We can do better than calling `next` in drill sideways when two "sideways" dims miss. When two dims miss, we know that the next candidate doc we need to consider can't come be

[GitHub] [lucene] shaie commented on a diff in pull request #11775: Minor refactoring and cleanup to taxonomy index code

2022-09-20 Thread GitBox
shaie commented on code in PR #11775: URL: https://github.com/apache/lucene/pull/11775#discussion_r976038157 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FacetLabel.java: ## @@ -120,11 +120,10 @@ public int compareTo(FacetLabel other) { @Override public bool

[GitHub] [lucene] LuXugang commented on issue #11773: Could `PointRangeQuery`'s boundary values used for `NumericComparator` to calculate `estimatedNumberOfMatches`

2022-09-20 Thread GitBox
LuXugang commented on issue #11773: URL: https://github.com/apache/lucene/issues/11773#issuecomment-1253245181 > The estimatedNumberOfMatches should still be very close to the actual number Actually `estimatedNumberOfMatches` may be far away from the actual number. I wrote a [te