[GitHub] [lucene] benwtrent opened a new pull request, #12130: Fix TestFeatureField#testBasicsNonScoringCase test
benwtrent opened a new pull request, #12130: URL: https://github.com/apache/lucene/pull/12130 Sometimes the random search lucene test searcher will wrap the reader. Consequently, we need to make sure to use the reader provided by the test `IndexSearcher` or the reader may be different between creating the weight with the searcher vs. accessing the leaf context for the scorer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on pull request #12089: Modify TermInSetQuery to "self optimize" if doc values are available
gsmiller commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1419161514

@rmuir I grabbed your patch for adding a `ScorerSupplier` to `DocValuesTermsQuery` (#12129) and reran benchmarks. The gap between IndexOrDV and the "self-optimizing" TermInSetQuery has closed with this change. It looks like I was wrong about the way IndexOrDV plans PK-type queries. I thought it was choosing to use doc values based on what I saw in profiler output, but what I was really seeing was the up-front ordinal lookups in `DocValuesTermsQuery` as a result of not having the `ScorerSupplier` abstraction. With your patch, that goes away.

The only gap that remains now is when the field is _not_ a PK-style field but the terms being used in the disjunction have a low aggregate cost (relative to the other terms in the field; e.g., `Medium Cardinality + Low Cost Country Code Filter Terms`). In this case, IndexOrDV is always using doc values (due to the field-level stats used for cost), but, by doing some term-seeking, we could better decide to use postings.

Here are updated benchmark results: [TiSBenchResults_Simplified_DVSSPatch.md.txt](https://github.com/apache/lucene/files/10663766/TiSBenchResults_Simplified_DVSSPatch.md.txt) (Note that "low cardinality" cases are kind of terrible still because the TiSQuery is being rewritten to a BooleanQuery)

> to me the issue is a problem with TermInSetQuery ScorerSupplier cost method

+1. Maybe there's a way to address this remaining gap by being smarter about the cost function without term-seeking? That would be ideal.

I also played around with the idea of a "cost iterator" abstraction on `ScorerSupplier` as a way to allow something like `TermInSetQuery` to provide incremental costs to `IndexOrDocValuesQuery` as it term-seeks. This feels clunky to me, and I'm not proposing it as a "good idea" right now, but I'll share it as another approach.

I was able to get comparable benchmark results with this technique, and it still allows `IndexOrDocValuesQuery` to "own" the decision between postings and doc values: https://github.com/apache/lucene/compare/main...gsmiller:lucene:explore/tis-score-supplier-cost-iterator. Benchmark results for this approach are here: [TiSBenchResults_SSIterator.md.txt](https://github.com/apache/lucene/files/10663947/TiSBenchResults_SSIterator.md.txt). It feels overly complicated though.
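As a rough illustration of what "incremental costs" from term-seeking could mean, here is a minimal stdlib-only sketch. It is not the code on the linked branch: the class name, the `Map` standing in for a terms dictionary, and the `budget` parameter are all assumptions made for the example. The idea shown is only that per-term document frequencies are summed one seek at a time, and seeking stops as soon as the running cost exceeds what the caller is willing to pay for postings.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: refine a postings cost term by term instead of using field-level stats. */
final class IncrementalCostSketch {

  /**
   * Returns true if the summed docFreq of the query terms stays within the budget.
   * docFreqByTerm is a stand-in for seeking each term in a terms dictionary;
   * terms that are absent contribute a cost of zero.
   */
  static boolean postingsCostWithin(
      List<String> queryTerms, Map<String, Integer> docFreqByTerm, long budget) {
    long cost = 0;
    for (String term : queryTerms) {
      cost += docFreqByTerm.getOrDefault(term, 0);
      if (cost > budget) {
        return false; // stop seeking early: postings are already too expensive
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Integer> docFreq = Map.of("us", 500, "fr", 20, "de", 30);
    System.out.println(postingsCostWithin(List.of("fr", "de"), docFreq, 100));
    System.out.println(postingsCostWithin(List.of("us", "fr"), docFreq, 100));
  }
}
```

In this toy model a low-cost set of country codes stays under the budget even though the field as a whole is expensive, which is the case the comment describes as still favoring postings.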
[GitHub] [lucene] rmuir commented on pull request #12089: Modify TermInSetQuery to "self optimize" if doc values are available
rmuir commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1419214500 That's good that it made progress. I will look more into it tonight. I want to get these patches landed to simplify benchmarking. It's true there is one benchmark where this combined query does better ("Medium Cardinality + Low Cost Country Code Filter Terms") but there is also one benchmark where it does substantially worse ("Low Cardinality + High Cost Country Code Filter Terms"). So net/net I would say they are equivalent. But I will look into this case to see if we can still do better.
[GitHub] [lucene] benwtrent merged pull request #12130: Fix TestFeatureField#testBasicsNonScoringCase test
benwtrent merged PR #12130: URL: https://github.com/apache/lucene/pull/12130
[GitHub] [lucene] uschindler opened a new pull request, #12131: Port over gradle setting generator from Solr
uschindler opened a new pull request, #12131: URL: https://github.com/apache/lucene/pull/12131

In Apache Solr we improved the local settings generation to be done directly in gradlew startup (similar to gradle downloader). This has several positive effects:
- We can do our GitHub CI and Jenkins checks in one go, as the file is now generated before gradle even starts, so the build will succeed on first run.
- The template file is editable by committers without going into script files. Number of processors for threads is inserted by templating.

See https://github.com/apache/solr/pull/1320 and https://issues.apache.org/jira/browse/SOLR-16641 for details.
[GitHub] [lucene] jpountz commented on a diff in pull request #12089: Modify TermInSetQuery to "self optimize" if doc values are available
jpountz commented on code in PR #12089: URL: https://github.com/apache/lucene/pull/12089#discussion_r1097579916

## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ##
@@ -380,21 +431,28 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti
         // cost estimates.
         final long cost;
         final long queryTermsCount = termData.size();
-        long potentialExtraCost = indexTerms.getSumDocFreq();
+        final long sumDocFreq = indexTerms.getSumDocFreq();
+        long potentialExtraCost = sumDocFreq;
         final long indexedTermCount = indexTerms.size();
         if (indexedTermCount != -1) {
           potentialExtraCost -= indexedTermCount;
         }
         cost = queryTermsCount + potentialExtraCost;
+        final boolean isPrimaryKeyField = indexedTermCount != -1 && sumDocFreq == indexedTermCount;

Review Comment:
Since `terms.size()` is an optional index statistic, maybe check `sumDocFreq == docCount` instead?

## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ##
@@ -258,13 +271,41 @@ public Matches matches(LeafReaderContext context, int doc) throws IOException {
    * On the given leaf context, try to either rewrite to a disjunction if there are few matching
    * terms, or build a bitset containing matching docs.
    */
-  private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
+  private WeightOrDocIdSet rewrite(
+      LeafReaderContext context, long leadCost, boolean isPrimaryKeyField, DocValuesType dvType)
+      throws IOException {
     final LeafReader reader = context.reader();
     Terms terms = reader.terms(field);
     if (terms == null) {
       return null;
     }
+
+    long costThreshold = Long.MAX_VALUE;
+    if (dvType == DocValuesType.SORTED || dvType == DocValuesType.SORTED_SET) {
+      // Establish a threshold for switching to doc values. Give postings a significant
+      // advantage for the primary-key case, since many of the primary-key terms may not
+      // actually be in this segment. The 8x factor is arbitrary, based on IndexOrDVQuery,
+      // but has performed well in benchmarks:
+      costThreshold = isPrimaryKeyField ? leadCost << 3 : leadCost;
+
+      if (termData.size() > costThreshold) {
+        // If the number of terms is > the number of candidates, DV should perform better.

Review Comment:
I wonder if this is right given that the doc-values query still eagerly evaluates all terms against the terms dictionary. For this to work correctly, we'd need a query that looks up terms lazily rather than eagerly?
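The cost arithmetic in the first hunk can be tried out in isolation. The sketch below is a stdlib-only model, not Lucene code: the class and method names are made up for illustration, and the statistics are passed in as plain `long` values instead of being read from `Terms`. It also shows the two primary-key checks side by side. Note that they are not the same test: `sumDocFreq == indexedTermCount` says every term occurs in exactly one document, while `sumDocFreq == docCount` says every document with the field has exactly one term.

```java
/** Hypothetical, stdlib-only model of the cost arithmetic quoted in the diff above. */
final class TermInSetCostSketch {

  /**
   * Mirrors the quoted estimate: one unit per query term, plus the postings that remain
   * after giving each indexed term one document. indexedTermCount is -1 when unavailable.
   */
  static long estimateCost(long queryTermsCount, long sumDocFreq, long indexedTermCount) {
    long potentialExtraCost = sumDocFreq;
    if (indexedTermCount != -1) {
      potentialExtraCost -= indexedTermCount;
    }
    return queryTermsCount + potentialExtraCost;
  }

  /** Check as written in the PR: depends on the optional term-count statistic. */
  static boolean isPrimaryKeyByTermCount(long sumDocFreq, long indexedTermCount) {
    return indexedTermCount != -1 && sumDocFreq == indexedTermCount;
  }

  /** Variant suggested in the review comment: compare against docCount instead. */
  static boolean isPrimaryKeyByDocCount(long sumDocFreq, long docCount) {
    return sumDocFreq == docCount;
  }

  public static void main(String[] args) {
    // 10 query terms against a field where each of 1000 terms occurs in exactly one doc.
    System.out.println(estimateCost(10, 1000, 1000));
    // Same field, but the term count statistic is missing.
    System.out.println(estimateCost(10, 1000, -1));
  }
}
```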
[GitHub] [lucene] jpountz merged pull request #12116: Improve document API for stored fields.
jpountz merged PR #12116: URL: https://github.com/apache/lucene/pull/12116
[GitHub] [lucene] colvinco commented on pull request #12131: Port over gradle setting generator from Solr
colvinco commented on PR #12131: URL: https://github.com/apache/lucene/pull/12131#issuecomment-1419339752 There's another reference in smokeTestRelease.py https://github.com/apache/lucene/blob/8df59fc878795dd94e10d4c15a7bc4f1a919843b/dev-tools/scripts/smokeTestRelease.py#L612-L613
[GitHub] [lucene] colvinco closed pull request #12123: Generate gradle.properties from gradlew
colvinco closed pull request #12123: Generate gradle.properties from gradlew URL: https://github.com/apache/lucene/pull/12123
[GitHub] [lucene] gsmiller commented on pull request #12089: Modify TermInSetQuery to "self optimize" if doc values are available
gsmiller commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1419344879

> but there is also one benchmark where it does substantially worse ("Low Cardinality + High Cost Country Code Filter Terms").

100%. The issue here is that `TermInSetQuery` gets rewritten to a `BooleanQuery` because there are fewer than 16 terms, so it doesn't have a chance to "self-optimize" to use doc values. We can fix this by not eagerly rewriting to a `BooleanQuery`, but I held off doing that for now. So this is "easily" fixable I think.
[GitHub] [lucene] uschindler commented on pull request #12131: Port over gradle setting generator from Solr
uschindler commented on PR #12131: URL: https://github.com/apache/lucene/pull/12131#issuecomment-1419364815

> There's another reference in smokeTestRelease.py
>
> https://github.com/apache/lucene/blob/8df59fc878795dd94e10d4c15a7bc4f1a919843b/dev-tools/scripts/smokeTestRelease.py#L612-L613

Fixed.
[GitHub] [lucene] gsmiller commented on a diff in pull request #12089: Modify TermInSetQuery to "self optimize" if doc values are available
gsmiller commented on code in PR #12089: URL: https://github.com/apache/lucene/pull/12089#discussion_r1097623311

## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ##
@@ -258,13 +271,41 @@ public Matches matches(LeafReaderContext context, int doc) throws IOException {
    * On the given leaf context, try to either rewrite to a disjunction if there are few matching
    * terms, or build a bitset containing matching docs.
    */
-  private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
+  private WeightOrDocIdSet rewrite(
+      LeafReaderContext context, long leadCost, boolean isPrimaryKeyField, DocValuesType dvType)
+      throws IOException {
     final LeafReader reader = context.reader();
     Terms terms = reader.terms(field);
     if (terms == null) {
       return null;
     }
+
+    long costThreshold = Long.MAX_VALUE;
+    if (dvType == DocValuesType.SORTED || dvType == DocValuesType.SORTED_SET) {
+      // Establish a threshold for switching to doc values. Give postings a significant
+      // advantage for the primary-key case, since many of the primary-key terms may not
+      // actually be in this segment. The 8x factor is arbitrary, based on IndexOrDVQuery,
+      // but has performed well in benchmarks:
+      costThreshold = isPrimaryKeyField ? leadCost << 3 : leadCost;
+
+      if (termData.size() > costThreshold) {
+        // If the number of terms is > the number of candidates, DV should perform better.

Review Comment:
I'm not sure actually. The up-front term-seeking you refer to is certainly a cost, but it doesn't scale with the number of lead hits. So this can still be cheaper. But also, +1 to the idea of trying out on-demand term seeking for these situations!
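The threshold rule in the quoted hunk can be sketched on its own. This is a simplified model under stated assumptions: the class name, the `Strategy` enum and the `hasSortedDocValues` flag (standing in for the `dvType` check) are hypothetical, and only the first comparison visible in the diff is modeled. Whatever the PR does after that comparison is not shown in the quoted hunk and is not reproduced here.

```java
/** Hypothetical sketch of the postings-vs-doc-values threshold from the quoted diff. */
final class DocValuesThresholdSketch {

  enum Strategy { POSTINGS, DOC_VALUES }

  /** 8x advantage for postings on primary-key fields, as in the quoted line (no overflow guard). */
  static long costThreshold(long leadCost, boolean isPrimaryKeyField) {
    return isPrimaryKeyField ? leadCost << 3 : leadCost;
  }

  /**
   * Picks doc values only when the field has sorted doc values and the number of query
   * terms exceeds the threshold derived from the lead cost.
   */
  static Strategy choose(
      long termCount, long leadCost, boolean isPrimaryKeyField, boolean hasSortedDocValues) {
    if (!hasSortedDocValues) {
      return Strategy.POSTINGS; // threshold stays at Long.MAX_VALUE in the original
    }
    long threshold = costThreshold(leadCost, isPrimaryKeyField);
    return termCount > threshold ? Strategy.DOC_VALUES : Strategy.POSTINGS;
  }

  public static void main(String[] args) {
    // 50 terms, 10 lead candidates: doc values for a regular field, postings for a PK field.
    System.out.println(choose(50, 10, false, true));
    System.out.println(choose(50, 10, true, true));
  }
}
```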
[GitHub] [lucene] rmuir merged pull request #12127: Remove useless abstractions in DocValues-based queries
rmuir merged PR #12127: URL: https://github.com/apache/lucene/pull/12127
[GitHub] [lucene] rmuir merged pull request #12128: Speed up docvalues set query by making use of sortedness
rmuir merged PR #12128: URL: https://github.com/apache/lucene/pull/12128
[GitHub] [lucene] jpountz commented on pull request #12054: Introduce a new `KeywordField`.
jpountz commented on PR #12054: URL: https://github.com/apache/lucene/pull/12054#issuecomment-1419458041

I updated this PR to
- add a `Field.Store` parameter to the constructor that does not rely on Field's guessing
- update the demo to pass Field.Store.YES as a value for this parameter instead of adding a separate StoredField
- add a `newSetQuery` that creates a `TermInSetQuery` and hopefully soon benefits from @gsmiller 's optimization
[GitHub] [lucene] rmuir merged pull request #12129: Speedup sandbox/DocValuesTermsQuery
rmuir merged PR #12129: URL: https://github.com/apache/lucene/pull/12129
[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.
rmuir commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1097736939

## lucene/core/src/java/org/apache/lucene/document/KeywordField.java: ##
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.util.Collection;
+import java.util.Objects;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.IndexOptions;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.ConstantScoreQuery;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.SortField;
+import org.apache.lucene.search.SortedSetSelector;
+import org.apache.lucene.search.SortedSetSortField;
+import org.apache.lucene.search.TermInSetQuery;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * Field that indexes a per-document String or {@link BytesRef} into an inverted index for fast
+ * filtering, stores values in a columnar fashion using {@link DocValuesType#SORTED_SET} doc values
+ * for sorting and faceting, and optionally stores values as stored fields for top-hits retrieval.
+ * This field does not support scoring: queries produce constant scores. If you also need to store

Review Comment:
We can nuke this sentence about "if you also need to store the value" now
[GitHub] [lucene] janhoy commented on pull request #12065: Update copyright year in NOTICE.txt file.
janhoy commented on PR #12065: URL: https://github.com/apache/lucene/pull/12065#issuecomment-1419513356 Interesting find. At least we don't include years in every single file as some projects do, so it is not a huge burden, and we are not obliged to keep or remove years; we can do as we want. It's not a big deal to me, but I think I lean towards keeping only the year of initial publication, as proposed [here](https://matija.suklje.name/how-and-why-to-properly-write-copyright-statements-in-your-code#why-keep-the-year) and by Roy Fielding [here](https://daniel.haxx.se/blog/2023/01/08/copyright-without-years/#comment-26544).
[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.
rmuir commented on code in PR #12054: URL: https://github.com/apache/lucene/pull/12054#discussion_r1097738909

## lucene/core/src/java/org/apache/lucene/document/KeywordField.java: ##
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.util.Collection;
+import java.util.Objects;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.IndexOptions;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.ConstantScoreQuery;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.SortField;
+import org.apache.lucene.search.SortedSetSelector;
+import org.apache.lucene.search.SortedSetSortField;
+import org.apache.lucene.search.TermInSetQuery;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * Field that indexes a per-document String or {@link BytesRef} into an inverted index for fast
+ * filtering, stores values in a columnar fashion using {@link DocValuesType#SORTED_SET} doc values
+ * for sorting and faceting, and optionally stores values as stored fields for top-hits retrieval.
+ * This field does not support scoring: queries produce constant scores. If you also need to store
+ * the value, you should add a separate {@link StoredField} instance. If you need more fine-grained
+ * control you can use {@link StringField}, {@link SortedDocValuesField} or {@link
+ * SortedSetDocValuesField}, and {@link StoredField}.
+ *
+ * This field defines static factory methods for creating common query objects:
+ *
+ *
+ * {@link #newExactQuery} for matching a value.
+ * {@link #newSetQuery} for matching any of the values coming from a set.
+ * {@link #newSortField} for matching a value.
+ *
+ */
+public class KeywordField extends Field {
+
+  private static final FieldType FIELD_TYPE = new FieldType();
+  private static final FieldType FIELD_TYPE_STORED;
+
+  static {
+    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS);
+    FIELD_TYPE.setOmitNorms(true);
+    FIELD_TYPE.setTokenized(false);
+    FIELD_TYPE.setDocValuesType(DocValuesType.SORTED_SET);
+    FIELD_TYPE.freeze();
+
+    FIELD_TYPE_STORED = new FieldType(FIELD_TYPE);
+    FIELD_TYPE_STORED.setStored(true);
+    FIELD_TYPE_STORED.freeze();
+  }
+
+  private final StoredValue storedValue;
+
+  /**
+   * Creates a new KeywordField.
+   *
+   * @param name field name
+   * @param value the BytesRef value
+   * @param stored whether to store the field
+   * @throws IllegalArgumentException if the field name or value is null.
+   */
+  public KeywordField(String name, BytesRef value, Store stored) {
+    super(name, value, stored == Field.Store.YES ? FIELD_TYPE_STORED : FIELD_TYPE);
+    if (stored == Store.YES) {
+      storedValue = new StoredValue(value);
+    } else {
+      storedValue = null;
+    }
+  }
+
+  /**
+   * Creates a new KeywordField from a String value, by indexing its UTF-8 representation.
+   *
+   * @param name field name
+   * @param value the BytesRef value
+   * @param stored whether to store the field
+   * @throws IllegalArgumentException if the field name or value is null.
+   */
+  public KeywordField(String name, String value, Store stored) {
+    super(name, value, stored == Field.Store.YES ? FIELD_TYPE_STORED : FIELD_TYPE);
+    if (stored == Store.YES) {
+      storedValue = new StoredValue(value);
+    } else {
+      storedValue = null;
+    }
+  }
+
+  @Override
+  public BytesRef binaryValue() {
+    BytesRef binaryValue = super.binaryValue();
+    if (binaryValue != null) {
+      return binaryValue;
+    } else {
+      return new BytesRef(stringValue());
+    }
+  }
+
+  @Override
+  public void setStringValue(String value) {
+    super.setStringValue(value);
+    if (storedValue != null) {
+      storedValue.setStringValue(value);
+    }
+  }
+
+  @Override
+  public void setBytesValue(BytesRef value) {
+    super.setBytesValue(value);
+    if (storedValue != null) {
+      storedValue.setBinaryValue(value);
+    }
+  }
+
+  @Override
+  public StoredValue storedValue() {
+    return storedValue;
+  }
+
+  /**
+   * Create a query for matching an exact {@link BytesRef} valu
[GitHub] [lucene] uschindler commented on pull request #12123: Generate gradle.properties from gradlew
uschindler commented on PR #12123: URL: https://github.com/apache/lucene/pull/12123#issuecomment-1419549026 Oh, I did not see that PR. Sorry, I created a duplicate!
[GitHub] [lucene] uschindler commented on pull request #12123: Generate gradle.properties from gradlew
uschindler commented on PR #12123: URL: https://github.com/apache/lucene/pull/12123#issuecomment-141914 See #12131
[GitHub] [lucene] uschindler commented on pull request #12131: Port over gradle setting generator from Solr
uschindler commented on PR #12131: URL: https://github.com/apache/lucene/pull/12131#issuecomment-1419575660 Hi @colvinco, I merged your PR into my branch and found only a small difference in the windows script, which I fixed. Not sure why Solr did not apply the JAVA_OPTS for the generator.
[GitHub] [lucene] uschindler merged pull request #12131: Port over gradle setting generator from Solr
uschindler merged PR #12131: URL: https://github.com/apache/lucene/pull/12131
[GitHub] [lucene] rmuir opened a new pull request, #12132: Implement ScorerSupplier for Sorted(Set)DocValuesField#newSlowRangeQuery
rmuir opened a new pull request, #12132: URL: https://github.com/apache/lucene/pull/12132 Similar to the use of ScorerSupplier in #12129, implement it here too, because creation of a Scorer requires `lookupTerm()` operations in the DV terms dictionary. This results in wasted effort/random accesses if, based on the cost(), IndexOrDocValuesQuery decides not to use this query.
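The benefit of deferring the dictionary lookups can be shown with a small stdlib-only model. Nothing below is Lucene API: `CostedSupplier`, `lookupOrdinal` and the lookup counter are hypothetical stand-ins for a supplier with a cheap `cost()`, for `lookupTerm()`, and for the random accesses it causes. The point illustrated is that when the expensive work lives in `get()`, a caller that compares costs and picks the other side never pays for it.

```java
import java.util.List;

/** Hypothetical sketch: defer expensive term lookups until the supplier is actually used. */
final class LazyScorerSupplierSketch {

  /** Stand-in for a supplier that can report a cost before building anything. */
  interface CostedSupplier {
    long cost();

    long[] get();
  }

  private int lookups = 0;

  int lookups() {
    return lookups;
  }

  /** Stand-in for an expensive terms-dictionary lookup; counts how often it is called. */
  private long lookupOrdinal(String term) {
    lookups++;
    return term.length();
  }

  /** The lookups happen inside get(), not when the supplier is created. */
  CostedSupplier docValuesSupplier(List<String> terms, long cost) {
    return new CostedSupplier() {
      @Override
      public long cost() {
        return cost;
      }

      @Override
      public long[] get() {
        long[] ords = new long[terms.size()];
        for (int i = 0; i < ords.length; i++) {
          ords[i] = lookupOrdinal(terms.get(i));
        }
        return ords;
      }
    };
  }

  /** A trivial supplier with a fixed cost and no work to do. */
  static CostedSupplier fixed(long cost) {
    return new CostedSupplier() {
      @Override
      public long cost() {
        return cost;
      }

      @Override
      public long[] get() {
        return new long[0];
      }
    };
  }

  /** Compares costs only, then builds just the cheaper side. */
  static long[] pickCheaper(CostedSupplier a, CostedSupplier b) {
    return (a.cost() <= b.cost() ? a : b).get();
  }
}
```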
[GitHub] [lucene] uschindler opened a new pull request, #12133: Simplify LongHashSet by completely removing java.util.Set APIs
uschindler opened a new pull request, #12133: URL: https://github.com/apache/lucene/pull/12133 Instead return LongStream for toString() and testing (and possibly other use-cases). This is a followup of @rmuir's PR #12128 and trashes even more code.
[GitHub] [lucene] uschindler merged pull request #12133: Simplify LongHashSet by completely removing java.util.Set APIs
uschindler merged PR #12133: URL: https://github.com/apache/lucene/pull/12133
[GitHub] [lucene] uschindler opened a new pull request, #12134: Add tests for size() and contains() to LongHashSet
uschindler opened a new pull request, #12134: URL: https://github.com/apache/lucene/pull/12134 Another followup for #12128: because the tests previously exercised only the `java.util.Set` interface, they never verified that `size()` and direct calls to `contains(long)` worked correctly.
[GitHub] [lucene] uschindler commented on pull request #12134: Add tests for size() and contains() to LongHashSet
uschindler commented on PR #12134: URL: https://github.com/apache/lucene/pull/12134#issuecomment-1419932992 I found a bug: the first test passes, the second one fails:
```java
public void testSameValue() {
  LongHashSet set2 = new LongHashSet(new long[] {42L, 42L});
  assertEquals(1, set2.size());
  assertEquals(42L, set2.minValue);
  assertEquals(42L, set2.maxValue);
}

public void testSameMissingPlaceholder() {
  LongHashSet set2 = new LongHashSet(new long[] {Long.MIN_VALUE, Long.MIN_VALUE});
  assertEquals(1, set2.size());
  assertEquals(Long.MIN_VALUE, set2.minValue);
  assertEquals(Long.MIN_VALUE, set2.maxValue);
}
```
The problem is that `MISSING` is counted twice, because it is not added to the hash table and is handled separately in the constructor. The fix is easy; I will commit it, too.
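To illustrate the kind of double counting described above, here is a minimal sketch of an open-addressing set of longs that uses `Long.MIN_VALUE` as its empty-slot marker. It is not Lucene's `LongHashSet`: the class name `SentinelLongSet`, its fields, and its sizing policy are assumptions for the example. The point is that the sentinel value cannot be stored in the table, so it has to be tracked with a flag (seen or not seen) rather than a counter that increments on every occurrence.

```java
import java.util.Arrays;

/** Hypothetical simplified set; not Lucene's LongHashSet. */
final class SentinelLongSet {
  /** Marks an empty slot, so this value can never be stored in the table itself. */
  private static final long MISSING = Long.MIN_VALUE;

  private final long[] table;
  private final int mask;
  private final boolean hasMissingValue;
  private final int size;

  SentinelLongSet(long[] values) {
    // Power-of-two capacity at least twice the input, so an empty slot always exists.
    int capacity = 2;
    while (capacity < values.length * 2) {
      capacity <<= 1;
    }
    table = new long[capacity];
    Arrays.fill(table, MISSING);
    mask = capacity - 1;

    boolean sawMissing = false;
    int count = 0;
    for (long v : values) {
      if (v == MISSING) {
        sawMissing = true; // a flag, so duplicates of the sentinel count once
      } else if (insert(v)) {
        count++; // only newly inserted values increase the size
      }
    }
    hasMissingValue = sawMissing;
    size = count + (sawMissing ? 1 : 0);
  }

  /** Linear probing insert; returns false if the value was already present. */
  private boolean insert(long v) {
    int slot = Long.hashCode(v) & mask;
    while (table[slot] != MISSING) {
      if (table[slot] == v) {
        return false;
      }
      slot = (slot + 1) & mask;
    }
    table[slot] = v;
    return true;
  }

  boolean contains(long v) {
    if (v == MISSING) {
      return hasMissingValue;
    }
    int slot = Long.hashCode(v) & mask;
    while (table[slot] != MISSING) {
      if (table[slot] == v) {
        return true;
      }
      slot = (slot + 1) & mask;
    }
    return false;
  }

  int size() {
    return size;
  }
}
```

Had the constructor incremented the count for every `MISSING` it encountered, an input of `{Long.MIN_VALUE, Long.MIN_VALUE}` would report a size of 2, which is the failure mode the second test above is designed to catch.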
[GitHub] [lucene] jimmykobe1171 commented on a diff in pull request #12126: Refactor part of IndexFileDeleter and ReplicaFileDeleter into a common utility class
jimmykobe1171 commented on code in PR #12126: URL: https://github.com/apache/lucene/pull/12126#discussion_r1098024345
##
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/CopyJob.java:
##
@@ -206,7 +206,7 @@ private synchronized void _transferAndCancel(CopyJob prevJob) throws IOException
       if (Node.VERBOSE_FILES) {
         dest.message("remove partial file " + prevJob.current.tmpName);
       }
-      dest.deleter.deleteNewFile(prevJob.current.tmpName);
+      dest.deleter.deleteIfNoRef(prevJob.current.tmpName);
Review Comment: Seems like **deleteIfNoRef** is always safer than **deleteNewFile**. Do we still need the method deleteNewFile?
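One way to read "safer" in the review comment: a delete that first checks a reference count cannot remove a file that something still uses, while an unconditional delete relies on the caller knowing the file is unreferenced. The sketch below illustrates only that general distinction. `RefCountedFiles` and all of its methods are hypothetical names, and it makes no claim about how `deleteIfNoRef` or `deleteNewFile` are actually implemented in the PR.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical tracker; not the Lucene deleter classes under review. */
final class RefCountedFiles {
  private final Map<String, Integer> refCounts = new HashMap<>();
  private final Set<String> onDisk = new HashSet<>(); // stands in for a directory

  void create(String name) {
    onDisk.add(name);
  }

  void incRef(String name) {
    refCounts.merge(name, 1, Integer::sum);
  }

  void decRef(String name) {
    // Drop the entry entirely once the count would reach zero.
    refCounts.computeIfPresent(name, (k, v) -> v > 1 ? v - 1 : null);
  }

  /** Unconditional delete: the caller must already know nothing references the file. */
  void deleteUnchecked(String name) {
    onDisk.remove(name);
  }

  /** Guarded delete: a no-op (returns false) while the file is still referenced. */
  boolean deleteIfUnreferenced(String name) {
    if (refCounts.containsKey(name)) {
      return false;
    }
    return onDisk.remove(name);
  }

  boolean exists(String name) {
    return onDisk.contains(name);
  }
}
```

In this sketch the guarded variate costs one extra map lookup per delete, which is the usual trade-off behind keeping an unchecked method around.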
[GitHub] [lucene] uschindler commented on pull request #12134: Add tests for size() and contains() to LongHashSet
uschindler commented on PR #12134: URL: https://github.com/apache/lucene/pull/12134#issuecomment-1419949170 Fixed. The code is actually more readable now.
[GitHub] [lucene] uschindler merged pull request #12134: Add tests for size() and contains() to LongHashSet
uschindler merged PR #12134: URL: https://github.com/apache/lucene/pull/12134
[GitHub] [lucene] zacharymorn commented on issue #11428: Handle soft deletes via LiveDocsFormat [LUCENE-10392]
zacharymorn commented on issue #11428: URL: https://github.com/apache/lucene/issues/11428#issuecomment-1420073239 Thanks @dnhatn @rmuir @s1monw for the additional information! Yeah, I can see now how changing it to use live docs, rather than relying on an explicit field, would potentially require changes to the `softUpdateDocument` API to differentiate between a "regular" doc and a tombstone doc, and would also make the live docs format itself and its usage more complicated (indeed, nothing can beat a bitset in terms of simplicity!). I'll pause exploring this from my end then, but will be happy to work on it further if preferences change down the road.