[GitHub] [lucene] rmuir commented on pull request #12064: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String)

2023-01-10 Thread GitBox
rmuir commented on PR #12064: URL: https://github.com/apache/lucene/pull/12064#issuecomment-1377152361 > I agree. We need to address this. Makes me wonder about the work done here: #10177. Seems promising, though the cost of flush increases (because of clustering), but the data structure se

[GitHub] [lucene] rmuir commented on issue #12069: Long rewrite times for deeply nested, non-scoring Boolean queries

2023-01-10 Thread GitBox
rmuir commented on issue #12069: URL: https://github.com/apache/lucene/issues/12069#issuecomment-1377193266 As long as we don't trade off a single nanosecond of performance for *real use-cases* for this garbage... -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [lucene] rmuir commented on issue #12067: Getting exception on search after upgrading to Lucene 9.4

2023-01-10 Thread GitBox
rmuir commented on issue #12067: URL: https://github.com/apache/lucene/issues/12067#issuecomment-137713 If you use a different field name for the lowercased version, then you won't have an issue. For example `description_lowercase`. The problem is that you call it `description` s

[GitHub] [lucene] rmuir commented on issue #12067: Getting exception on search after upgrading to Lucene 9.4

2023-01-10 Thread GitBox
rmuir commented on issue #12067: URL: https://github.com/apache/lucene/issues/12067#issuecomment-1377207063 @vstrout You may also use this setter as a quick-fix: https://github.com/apache/lucene/blob/branch_9x/lucene/core/src/java/org/apache/lucene/search/SortField.java#L636-L657 But

[GitHub] [lucene] jpountz commented on issue #12068: Is it right to throttle the creation of compound files?

2023-01-10 Thread GitBox
jpountz commented on issue #12068: URL: https://github.com/apache/lucene/issues/12068#issuecomment-1377296297 I wanted to see whether any tests would fail if I removed throttling when creating compound files, and none did, so I opened a PR. I'm still interested in thoughts regarding whether

[GitHub] [lucene] jpountz opened a new pull request, #12070: Never throttle creation of compound files.

2023-01-10 Thread GitBox
jpountz opened a new pull request, #12070: URL: https://github.com/apache/lucene/pull/12070 `ConcurrentMergeScheduler` uses the rate at which a merge writes bytes as a proxy for CPU usage, in order to prevent merging from disrupting searches too much. However, creating compound files is lig

[GitHub] [lucene] jpountz opened a new issue, #12071: Can we better take advantage of compact strings?

2023-01-10 Thread GitBox
jpountz opened a new issue, #12071: URL: https://github.com/apache/lucene/issues/12071 ### Description There's a non-negligible time that we spend on UTF-16 / UTF-8 conversions using our own `UnicodeUtil`, e.g. via the `BytesRef(String)` constructor. But since the introduction of com

[GitHub] [lucene] benwtrent opened a new pull request, #12072: Fix exponential runtime for Boolean#rewrite

2023-01-10 Thread GitBox
benwtrent opened a new pull request, #12072: URL: https://github.com/apache/lucene/pull/12072 When https://github.com/apache/lucene/pull/672 was introduced, it added many nice rewrite optimizations. However, in the case where there are many nested `Boolean` queries under a top-level

[GitHub] [lucene] benwtrent commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-10 Thread GitBox
benwtrent commented on code in PR #12072: URL: https://github.com/apache/lucene/pull/12072#discussion_r1065950478 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -203,9 +203,17 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) throws IOExce

[GitHub] [lucene] benwtrent commented on pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-10 Thread GitBox
benwtrent commented on PR #12072: URL: https://github.com/apache/lucene/pull/12072#issuecomment-1377475349 @jpountz You probably want to review this one as it relates to your original optimizations.

[GitHub] [lucene] jpountz commented on a diff in pull request #12064: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String)

2023-01-10 Thread GitBox
jpountz commented on code in PR #12064: URL: https://github.com/apache/lucene/pull/12064#discussion_r1066124189 ## lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [lucene] benwtrent commented on a diff in pull request #12064: Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String)

2023-01-10 Thread GitBox
benwtrent commented on code in PR #12064: URL: https://github.com/apache/lucene/pull/12064#discussion_r1066187876 ## lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [lucene] gsmiller opened a new pull request, #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

2023-01-10 Thread GitBox
gsmiller opened a new pull request, #12073: URL: https://github.com/apache/lucene/pull/12073 ### Description When the exclusion scorer in `ReqExclScorer` has no second-phase check, we can move the exclusion checking into the first phase fairly easily. This intuitively seems like the

[GitHub] [lucene] rmuir commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

2023-01-10 Thread GitBox
rmuir commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1378122618 But the benchmark checks only the best-case scenario here... TermQuery? What about a more expensive excl? e.g. a wildcard or something (with the ConstantScoreScorer that has no two-phase)

[GitHub] [lucene] rmuir commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

2023-01-10 Thread GitBox
rmuir commented on PR #12073: URL: https://github.com/apache/lucene/pull/12073#issuecomment-1378128948 and maybe wildcard query isn't a great example either: confirming matches is gonna be fast since it's a bitset. But I am still suspicious of the logic here, I don't think we can infer