[GitHub] [lucene] jpountz commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms
jpountz commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047526553

In my opinion the API as it is today isn't bad. The only thing we might want to change is to make `DocIdSetBuilder#grow` take a long instead of an int. Maybe it's a javadocs issue because `DocIdSetBuilder#grow` says that it returns "a `BulkAdder` object that can be used to add up to `numDocs` documents", which might suggest that `numDocs` is the number of unique documents contributed, when in fact this number is simply an upper bound of the number of times that you may call `BulkAdder#add` on the returned `BulkAdder` object.

> I'm still a bit confused about why we need to grow(long) on a bitset that can only hold Integer.MAX_VALUE elements.

This doesn't have anything to do with the `long counter` that you looked at. The point of `BulkAdder#add` is to call it every time we find a matching (docID, value) pair, and the number of matching pairs may be larger than `Integer#MAX_VALUE` (e.g. a range over a multi-valued field that matches all docs but one), hence the long. This is the same reason why e.g. `SortedSetDocValues#nextOrd` returns a long.

> in the sparse/buffer case, wouldn't a much simpler estimation simply be the length of int array?

This is already the case today, see the `else` block in `DocIdSetBuilder#build`. The cost estimation logic only happens in the dense case when a `FixedBitSet` is used to hold the set of matching docs. FWIW we could change the estimation logic to perform a popCount over a subset of the `FixedBitSet` and scale it according to the size of the bitset or something along these lines, if we think that it would be better than tracking this counter and dividing it by the number of values per doc.

> I'm also confused why we have this sorted array buffer case instead of using SparseFixedBitSet

`SparseFixedBitSet` is the right choice for the sparse case when you need something that implements the `BitSet` API. Here we only need to produce a `DocIdSet` and buffering doc IDs into an array and sorting them using radix sort proved to be faster than accumulating doc IDs into a `SparseFixedBitSet`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
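The popCount idea floated in the comment above (count the set bits in a subset of the bitset, then scale by the size of the bitset) could look roughly like the sketch below. It works on a plain `long[]` rather than on Lucene's `FixedBitSet`, and the class name, the method name and the `stride` parameter are assumptions made purely for illustration; nothing here is taken from the pull request.

```java
/** Hypothetical illustration; operates on raw 64-bit words, not on Lucene's FixedBitSet. */
final class SampledCardinality {

  /**
   * Estimates the number of set bits by counting every {@code stride}-th word
   * and scaling the result to the full length of the array.
   */
  static long estimate(long[] words, int stride) {
    if (stride <= 0) {
      throw new IllegalArgumentException("stride must be positive");
    }
    long sampledBits = 0;
    int sampledWords = 0;
    for (int i = 0; i < words.length; i += stride) {
      sampledBits += Long.bitCount(words[i]);
      sampledWords++;
    }
    if (sampledWords == 0) {
      return 0; // empty bitset
    }
    // Scale the sampled count up to the whole bitset.
    return Math.round((double) sampledBits * words.length / sampledWords);
  }
}
```

With `stride == 1` the result is the exact cardinality; larger strides trade accuracy for speed, and the estimate is only as good as the assumption that set bits are spread evenly across the words.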
[GitHub] [lucene] jpountz commented on pull request #698: LUCENE-10429: Change how DocIdSetBuilder compute the cost of the dense iterator
jpountz commented on pull request #698:
URL: https://github.com/apache/lucene/pull/698#issuecomment-1047529128

> This is inconsistent with the #grow method where the counter is increased as it expects grow to be called for documents and not values.

Actually my expectation is that `grow()` is called with a number of values, not unique documents. Javadocs say "documents" today, which might be a source of confusion, but it is really an upper bound of the number of times `BulkAdder#add` may be called, i.e. an upper bound of the number of matching *values*?
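To make the distinction between values and unique documents concrete, here is a minimal toy buffer. It is not Lucene's `DocIdSetBuilder`: the class name is invented, `grow` returns the buffer itself instead of a `BulkAdder`, and `Arrays.sort` stands in for the radix sort mentioned in the thread. The point it tries to show is only that the argument to `grow` bounds the number of `add` calls, while the built set may contain far fewer documents.

```java
import java.util.Arrays;

/** Toy stand-in for the buffer-based (sparse) path; not Lucene's DocIdSetBuilder. */
final class ToyDocIdBuffer {
  private int[] buffer = new int[0];
  private int size = 0;

  /** Reserves room for up to {@code numAdds} further calls to {@link #add(int)}. */
  ToyDocIdBuffer grow(int numAdds) {
    buffer = Arrays.copyOf(buffer, Math.addExact(size, numAdds));
    return this;
  }

  /** Called once per matching (docID, value) pair, so the same doc may be added repeatedly. */
  void add(int doc) {
    buffer[size++] = doc;
  }

  /** Sorts the buffered doc IDs and removes duplicates. */
  int[] build() {
    int[] sorted = Arrays.copyOf(buffer, size);
    Arrays.sort(sorted);
    int unique = 0;
    for (int i = 0; i < sorted.length; i++) {
      if (i == 0 || sorted[i] != sorted[i - 1]) {
        sorted[unique++] = sorted[i];
      }
    }
    return Arrays.copyOf(sorted, unique);
  }
}
```

For a multi-valued field, a caller might reserve five slots with `grow(5)`, add docs 3, 1, 3, 2, 1 (one call per matching value), and still end up with a set of only three documents.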
[jira] [Commented] (LUCENE-10424) Optimize the "everything matches" case for count query in PointRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495945#comment-17495945 ]

Adrien Grand commented on LUCENE-10424:
---------------------------------------

With the linked pull request, we limit this new case to single-valued 1D fields, but it actually works with fields that have multiple dimensions and/or that are multi-valued?

> Optimize the "everything matches" case for count query in PointRangeQuery
> -------------------------------------------------------------------------
>
> Key: LUCENE-10424
> URL: https://issues.apache.org/jira/browse/LUCENE-10424
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 9.1
> Reporter: Lu Xugang
> Assignee: Ignacio Vera
> Priority: Minor
> Fix For: 9.1
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> In the implementation of Weight#count in PointRangeQuery, should we also handle the case where PointValues#getDocCount() == IndexReader#maxDoc(), the range's lower bound is less than the field's min value and the range's upper bound is greater than the field's max value, and then return reader.maxDoc() directly?

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
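The shortcut described in the issue can be sketched as a small standalone check. This is a hypothetical helper, not the code from the linked pull request: the class and parameter names are invented, values are modelled as plain `long`s for a 1D field, the range bounds are treated as inclusive, and `-1` is used here to mean "the shortcut does not apply".

```java
/** Hypothetical helper; the names and the 1D long encoding are illustrative only. */
final class EverythingMatches {

  /**
   * Returns {@code maxDoc} when every document has a value and the inclusive range
   * [lower, upper] covers all indexed values, or -1 when the shortcut does not apply.
   */
  static int count(int docCount, int maxDoc, long minValue, long maxValue, long lower, long upper) {
    boolean everyDocHasValue = docCount == maxDoc;
    boolean rangeCoversAllValues = lower <= minValue && upper >= maxValue;
    return (everyDocHasValue && rangeCoversAllValues) ? maxDoc : -1;
  }
}
```

The reasoning carries over to multi-valued fields: if every document has at least one value and every indexed value lies inside the range, every document matches regardless of how many values it holds. The sketch assumes a segment without deleted documents; a real implementation would need to account for deletions.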
[GitHub] [lucene] iverase commented on pull request #698: LUCENE-10429: Change how DocIdSetBuilder compute the cost of the dense iterator
iverase commented on pull request #698:
URL: https://github.com/apache/lucene/pull/698#issuecomment-1047575915

> Actually my expectation is that grow() is called with a number of values, not unique documents.

Then it is wrong that it accepts an int, and it should accept a long? That is what Robert complains about.
[jira] [Created] (LUCENE-10431) AssertionError in BooleanQuery.hashCode()
Michael Bien created LUCENE-10431:
-------------------------------------

Summary: AssertionError in BooleanQuery.hashCode()
Key: LUCENE-10431
URL: https://issues.apache.org/jira/browse/LUCENE-10431
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 8.11.1
Reporter: Michael Bien

Hello devs,

the constructor of BooleanQuery can under some circumstances trigger a hash code computation before "clauseSets" is fully filled. Since BooleanClause is using its query field for the hash code too, it can happen that the "wrong" hash code is stored, since adding the clause to the set triggers its hashCode().

If assertions are enabled the check in BooleanQuery, which recomputes the hash code, will notice it and throw an error.

exception:
{code:java}
java.lang.AssertionError
    at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:614)
    at java.base/java.util.Objects.hashCode(Objects.java:103)
    at java.base/java.util.HashMap$Node.hashCode(HashMap.java:298)
    at java.base/java.util.AbstractMap.hashCode(AbstractMap.java:527)
    at org.apache.lucene.search.Multiset.hashCode(Multiset.java:119)
    at java.base/java.util.EnumMap.entryHashCode(EnumMap.java:717)
    at java.base/java.util.EnumMap.hashCode(EnumMap.java:709)
    at java.base/java.util.Arrays.hashCode(Arrays.java:4498)
    at java.base/java.util.Objects.hash(Objects.java:133)
    at org.apache.lucene.search.BooleanQuery.computeHashCode(BooleanQuery.java:597)
    at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:611)
    at java.base/java.util.HashMap.hash(HashMap.java:340)
    at java.base/java.util.HashMap.put(HashMap.java:612)
    at org.apache.lucene.search.Multiset.add(Multiset.java:82)
    at org.apache.lucene.search.BooleanQuery.<init>(BooleanQuery.java:154)
    at org.apache.lucene.search.BooleanQuery.<init>(BooleanQuery.java:42)
    at org.apache.lucene.search.BooleanQuery$Builder.build(BooleanQuery.java:133)
{code}

I noticed this while trying to upgrade the NetBeans maven indexer modules from lucene 5.x to 8.x
https://github.com/apache/netbeans/pull/3558
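The general hazard behind the report above, a hash code that is cached before the object has received all of its contents, can be reproduced with a toy class. This is only an illustration of the pattern and is unrelated to the actual `BooleanQuery` sources; the class and method names are invented, and `equals` is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy class, unrelated to Lucene's sources, that caches its hash code on first use. */
final class CachingBag {
  private final List<String> items = new ArrayList<>();
  private int cachedHash;
  private boolean hashComputed;

  void add(String item) {
    items.add(item);
  }

  int computeHashCode() {
    return items.hashCode();
  }

  @Override
  public int hashCode() {
    if (!hashComputed) {
      cachedHash = computeHashCode();
      hashComputed = true;
    }
    return cachedHash;
  }

  /** Mirrors the kind of consistency check that an assertion could perform. */
  boolean cachedHashIsStale() {
    return hashComputed && cachedHash != computeHashCode();
  }
}
```

If `hashCode()` is called after the first `add` and a second item is added afterwards, the cached value no longer matches a fresh computation, which is the kind of mismatch a recomputing assertion would flag.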
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r811815617

## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.SearcherManager;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex extends QueryIndex {
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();
+    this.populateQueryCache(serializer, decomposer);

Review comment:
Yes. @romseygeek Do you think it could make sense to use the purge executor here too?
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r811817151

## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.SearcherManager;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex extends QueryIndex {
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();
+    this.populateQueryCache(serializer, decomposer);
+  }
+
+  @Override
+  public void commit(List updates) throws IOException {
+    throw new IllegalStateException("Monitor is readOnly cannot commit");
+  }
+
+  @Override
+  long search(final Query query, QueryCollector matcher) throws IOException {
+    QueryBuilder builder = termFilter -> query;
+    return search(builder, matcher);
+  }
+
+  @Override
+  public long search(QueryBuilder queryBuilder, QueryCollector matcher) throws IOException {
+    IndexSearcher searcher = null;
+    try {
+      searcher = manager.acquire();
+      return searchInMemory(queryBuilder, matcher, searcher, this.queries);
+    } finally {
+      if (searcher != null) {
+        manager.release(searcher);
+      }
+    }
+  }
+
+  @Override
+  void purgeCache(CachePopulator populator) throws IOException {
+    final ConcurrentMap newCache = new ConcurrentHashMap<>();

Review comment:
True, but then we have to assign it to `queries`, which is in the abstract class and is concurrent.
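The pattern under review in `purgeCache` (fill a fresh map, then assign it to the shared `queries` field) can be illustrated in isolation. The sketch below is hypothetical: the diff does not show the real type parameters of `queries` or the shape of `CachePopulator`, so `String` keys and values and a `Consumer`-based populator are stand-ins chosen only to make the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Consumer;

/** Hypothetical sketch of rebuilding a cache off to the side and then swapping it in. */
final class SwappableCache {
  // volatile so that readers on other threads see the replacement map.
  private volatile ConcurrentMap<String, String> queries = new ConcurrentHashMap<>();

  void purge(Consumer<Map<String, String>> populator) {
    ConcurrentMap<String, String> newCache = new ConcurrentHashMap<>();
    populator.accept(newCache); // fill the fresh map while readers still use the old one
    queries = newCache;         // a single reference assignment publishes the new contents
  }

  String get(String id) {
    return queries.get(id);
  }
}
```

Because the field in this sketch is declared as a `ConcurrentMap`, the temporary map has to be a `ConcurrentMap` as well to be assignable to it, which may be the constraint the review comment refers to.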
[jira] [Commented] (LUCENE-10416) Update Korean Dictionary for Nori
[ https://issues.apache.org/jira/browse/LUCENE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496032#comment-17496032 ]

ASF subversion and git services commented on LUCENE-10416:
----------------------------------------------------------

Commit c22d6d09d9b9b9d44fd88e886ed3105c5a927a63 in lucene's branch refs/heads/branch_9x from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c22d6d0 ]

Revert "LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 for Nori"

This reverts commit b2b35964663bfbf2063884d7dcda6818d5b590e1.

> Update Korean Dictionary for Nori
> ---------------------------------
>
> Key: LUCENE-10416
> URL: https://issues.apache.org/jira/browse/LUCENE-10416
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: Uihyun Kim
> Priority: Minor
> Fix For: 9.1, 10.0 (main)
>
> Attachments: LUCENE-10416.patch
>
> For Nori - Korean analyzer, there is Korean dictionary named mecab-ko-dic, which is available under an Apache license here: [https://bitbucket.org/eunjeon/mecab-ko-dic]
>
> The dictionary hasn't been updated in Nori although it has some updates to provide better analysis results. Downloading is available here: [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads]
> * Currently used in Nori: [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.0.3-20170922.tar.gz]
> * Latest: [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.0.3-20170922.tar.gz]
>
> There are changes between the currently used version and the latest release version (change log: [https://bitbucket.org/eunjeon/mecab-ko-dic/src/master/CHANGES.md])
> * New feature: added semantic class for NNG - 장소, 행위, 상태변화, 정적상태
> * Fix: correct unexpectedly huge cost on NNG/장소
> * New words
>
> There's no issue with testing :lucene:analysis:nori:test and building a new binary.
[jira] [Commented] (LUCENE-10416) Update Korean Dictionary for Nori
[ https://issues.apache.org/jira/browse/LUCENE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496034#comment-17496034 ]

ASF subversion and git services commented on LUCENE-10416:
----------------------------------------------------------

Commit f8040d565fc25c6b7388d9300c2cc890315bc9cd in lucene's branch refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f8040d5 ]

LUCENE-10416: move changes entry to v10.0.0
[jira] [Updated] (LUCENE-10416) Update Korean Dictionary for Nori
[ https://issues.apache.org/jira/browse/LUCENE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomoko Uchida updated LUCENE-10416:
-----------------------------------
Fix Version/s: (was: 9.1)

> Fix For: 10.0 (main)
[jira] [Commented] (LUCENE-10416) Update Korean Dictionary for Nori
[ https://issues.apache.org/jira/browse/LUCENE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496038#comment-17496038 ]

Tomoko Uchida commented on LUCENE-10416:
----------------------------------------

I'd revert it from the 9x branch since I can't estimate the impact. It'd be easy to backport this again to 9x. Let me know if you'd like to have this in 9.1.
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r811871746

## File path: lucene/monitor/src/test/org/apache/lucene/monitor/TestMonitorReadonly.java ##
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import java.util.Collections;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.index.IndexNotFoundException;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.store.FSDirectory;
+import org.junit.Test;
+
+public class TestMonitorReadonly extends MonitorTestBase {
+  private static final Analyzer ANALYZER = new WhitespaceAnalyzer();
+
+  @Test
+  public void testReadonlyMonitorThrowsOnInexistentIndex() {
+    Path indexDirectory = createTempDir();
+    MonitorConfiguration config =
+        new MonitorConfiguration()
+            .setDirectoryProvider(
+                () -> FSDirectory.open(indexDirectory),
+                MonitorQuerySerializer.fromParser(MonitorTestBase::parse),
+                true);
+    assertThrows(
+        IndexNotFoundException.class,
+        () -> {
+          new Monitor(ANALYZER, config);
+        });
+  }
+
+  @Test
+  public void testReadonlyMonitorThrowsWhenCallingWriteRequests() throws IOException {
+    Path indexDirectory = createTempDir();
+    MonitorConfiguration writeConfig =
+        new MonitorConfiguration()
+            .setIndexPath(
+                indexDirectory, MonitorQuerySerializer.fromParser(MonitorTestBase::parse));
+
+    // this will create the index
+    Monitor writeMonitor = new Monitor(ANALYZER, writeConfig);
+    writeMonitor.close();
+
+    MonitorConfiguration config =
+        new MonitorConfiguration()
+            .setDirectoryProvider(
+                () -> FSDirectory.open(indexDirectory),
+                MonitorQuerySerializer.fromParser(MonitorTestBase::parse),
+                true);
+    try (Monitor monitor = new Monitor(ANALYZER, config)) {
+      assertThrows(
+          IllegalStateException.class,
+          () -> {
+            TermQuery query = new TermQuery(new Term(FIELD, "test"));
+            monitor.register(
+                new MonitorQuery("query1", query, query.toString(), Collections.emptyMap()));
+          });
+
+      assertThrows(
+          IllegalStateException.class,
+          () -> {
+            monitor.deleteById("query1");
+          });
+
+      assertThrows(
+          IllegalStateException.class,
+          () -> {
+            monitor.clear();
+          });
+    }
+  }
+
+  @Test
+  public void testSettingCustomDirectory() throws IOException {
+    Path indexDirectory = createTempDir();
+    Document doc = new Document();
+    doc.add(newTextField(FIELD, "This is a Foobar test document", Field.Store.NO));
+
+    MonitorConfiguration writeConfig =
+        new MonitorConfiguration()
+            .setDirectoryProvider(
+                () -> FSDirectory.open(indexDirectory),
+                MonitorQuerySerializer.fromParser(MonitorTestBase::parse));
+
+    try (Monitor writeMonitor = new Monitor(ANALYZER, writeConfig)) {
+      TermQuery query = new TermQuery(new Term(FIELD, "test"));
+      writeMonitor.register(
+          new MonitorQuery("query1", query, query.toString(), Collections.emptyMap()));
+      TermQuery query2 = new TermQuery(new Term(FIELD, "Foobar"));
+      writeMonitor.register(
+          new MonitorQuery("query2", query2, query.toString(), Collections.emptyMap()));
+      MatchingQueries matches = writeMonitor.match(doc, QueryMatch.SIMPLE_MATCHER);
+      assertNotNull(matches.getMatches());
+      assertEquals(2, matches.getMatchCount());
+      assertNotNull(matches.matches("query2"));
+    }
+  }
+
+  public void testMonitorReadOnlyCouldReadOnTheSameIndex() throws IOException {
+    Path indexDirectory = createTempDir();
+    Document doc = new Document();
+    doc.add(newTextField(FIELD, "This is a te
[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms
rmuir commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047729980

> In my opinion the API as it is today isn't bad. The only thing we might want to change is to make `DocIdSetBuilder#grow` take a long instead of an int.

I've really tried, I think I have to just give up. Having a `grow(long)` on something with `DocIdSet` in its name is beyond bad, it is terrible. Please, please, please don't make this change to take a long.

> > I'm still a bit confused about why we need to grow(long) on a bitset that can only hold Integer.MAX_VALUE elements.
>
> This doesn't have anything to do with the `long counter` that you looked at.
>
> The point of `BulkAdder#add` is to call it every time we find a matching (docID, value) pair, and the number of matching pairs may be larger than `Integer#MAX_VALUE` (e.g. a range over a multi-valued field that matches all docs but one), hence the long. This is the same reason why e.g. `SortedSetDocValues#nextOrd` returns a long.

Sure it does. I'm looking at the only code using the 64-bit value, and that's the `counter`.
[GitHub] [lucene] jpountz commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms
jpountz commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047748712

> Having a grow(long) on something with DocIdSet in its name is beyond bad, it is terrible.

Would it look better if we gave it a different name that doesn't suggest that it relates to the number of docs in the set, e.g. `prepareAdd` or something along these lines?

> Please, please, please don't make this change to take a long.

I have a preference for making it a long but I'm ok with keeping it an integer. The downside is that it pushes the problem to callers, which need to make sure that they never add more than `Integer.MAX_VALUE` documents with the same `BulkAdder`.
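If `grow` keeps taking an int, the burden on callers mentioned above might amount to something like the following chunking helper. This is a hypothetical caller-side sketch, not code from Lucene or from the pull request: it only computes how a `long` number of pending `add` calls could be split into int-sized requests, and the comment marks where a real caller would invoke `grow`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical caller-side helper; not part of Lucene. */
final class GrowChunks {

  /** Splits a long number of pending add() calls into chunks that each fit in an int. */
  static List<Integer> split(long totalAdds) {
    if (totalAdds < 0) {
      throw new IllegalArgumentException("totalAdds must be non-negative");
    }
    List<Integer> chunks = new ArrayList<>();
    long remaining = totalAdds;
    while (remaining > 0) {
      int chunk = (int) Math.min(remaining, Integer.MAX_VALUE);
      chunks.add(chunk); // a caller would invoke grow(chunk) here and add at most chunk entries
      remaining -= chunk;
    }
    return chunks;
  }
}
```

Each chunk would get its own adder, so no single adder ever receives more than `Integer.MAX_VALUE` calls; whether that bookkeeping belongs in callers or behind a `grow(long)` is exactly the disagreement in this thread.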
[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms
rmuir commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047750276

Screenshot: https://user-images.githubusercontent.com/504194/155133007-71ec1d81-a2bd-485d-b7e6-17a10cd78fdf.png

I've uploaded a screenshot here of how the only thing using 64-bits is this stupid `counter`. Guys, we really have to agree on this simple fact to proceed. It is a fact!
[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms
rmuir commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047752932

Yeah, there seems to be some disagreement about what the code is actually doing. Probably because it is too confusing. I recommend (as I did before) temporarily removing `counter` and cost estimation from here. Then you will see that 64 bits is not needed.
[GitHub] [lucene] iverase commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms
iverase commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047752116

I don't understand all this discussion. Looking at the cost of a DocIdSetIterator:

```
/**
 * Returns the estimated cost of this {@link DocIdSetIterator}.
 *
 * This is generally an upper bound of the number of documents this iterator might match, but
 * may be a rough heuristic, hardcoded value, or otherwise completely inaccurate.
 */
public abstract long cost();
```

Why is a long OK here? I think the dance we are doing in the BKD reader when we are visiting more than Integer.MAX_VALUE documents is wrong and should be fixed.
[GitHub] [lucene] iverase edited a comment on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms
iverase edited a comment on pull request #692: URL: https://github.com/apache/lucene/pull/692#issuecomment-1047752116 I don't understand all this discussion. Looking at the cost of a DocIdSetIterator:
```
/**
 * Returns the estimated cost of this {@link DocIdSetIterator}.
 *
 * This is generally an upper bound of the number of documents this iterator might match, but
 * may be a rough heuristic, hardcoded value, or otherwise completely inaccurate.
 */
public abstract long cost();
```
Why is a long OK here? I think the dance we are doing in the BKD reader when we are visiting more than Integer.MAX_VALUE ~documents~ points is wrong and should be fixed.
[GitHub] [lucene] iverase commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms
iverase commented on pull request #692: URL: https://github.com/apache/lucene/pull/692#issuecomment-1047786510 If you go a bit higher up in that class: https://user-images.githubusercontent.com/29038686/155139791-fb87fedb-22a0-44a7-86a6-60b6af84f177.png Are we throwing away 32 bits there now?
[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms
rmuir commented on pull request #692: URL: https://github.com/apache/lucene/pull/692#issuecomment-1047788256 It's fine to do that since only 32 bits are needed. Nothing uses 64 bits here, hence changing the API signature to a `long` is wrong.
[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms
rmuir commented on pull request #692: URL: https://github.com/apache/lucene/pull/692#issuecomment-1047789904 Seriously, let's remove this `counter` and cost estimation. @jpountz tells me I am wrong, but you can plainly see from the code, this issue is all about that. Everything else is only using 32 bits. If we remove the silly `counter` and bad cost estimator, it will be clear that adding a `long` to this API is not needed: nothing needs the extra 32 bits, nothing uses the extra 32 bits!
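To make the arithmetic behind the int-versus-long question concrete, here is a minimal sketch in plain Java. It uses no Lucene classes; `PairCountSketch` and its method names are made up for illustration. It only shows the point raised earlier in the thread that the number of matching (docID, value) pairs can exceed `Integer.MAX_VALUE` even when the number of documents fits in an int. It does not settle whether `DocIdSetBuilder#grow` should take a long.

```java
/** Hypothetical illustration only; not part of Lucene. */
final class PairCountSketch {
  /** Upper bound on the number of (docID, value) pairs for a multi-valued field. */
  static long maxPairs(int docCount, int valuesPerDoc) {
    // Widen to long before multiplying so the product cannot overflow an int.
    return (long) docCount * valuesPerDoc;
  }

  /** True if the pair count no longer fits in a 32-bit signed int. */
  static boolean exceedsInt(int docCount, int valuesPerDoc) {
    return maxPairs(docCount, valuesPerDoc) > Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    int docCount = 1_500_000_000; // assumed example value; fits in an int
    int valuesPerDoc = 2;         // assumed example value for a multi-valued field
    System.out.println(maxPairs(docCount, valuesPerDoc));
    System.out.println(exceedsInt(docCount, valuesPerDoc));
  }
}
```

The document count stays within int range here, which is the side of the argument that says the resulting set itself never needs more than 32 bits.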
[jira] [Commented] (LUCENE-10424) Optimize the "everything matches" case for count query in PointRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496168#comment-17496168 ]

Lu Xugang commented on LUCENE-10424:
------------------------------------

??but it actually works with fields that have multiple dimensions and/or that are multi-valued??

Yes, but I am not sure why the implementation of Weight#count only considers the 1D case. It seems a count query could work on multiple dimensions; please tell me if I missed something.

??we limit this new case to single-valued 1D fields??

If so, maybe we should support multiple dimensions in Weight#count?

> Optimize the "everything matches" case for count query in PointRangeQuery
> -------------------------------------------------------------------------
>
> Key: LUCENE-10424
> URL: https://issues.apache.org/jira/browse/LUCENE-10424
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 9.1
> Reporter: Lu Xugang
> Assignee: Ignacio Vera
> Priority: Minor
> Fix For: 9.1
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> In the implementation of Weight#count in PointRangeQuery, should we additionally handle the case where PointValues#getDocCount() == IndexReader#maxDoc(), the range's lower bound is less than the field's min value, and the range's upper bound is greater than the field's max value, and then return reader.maxDoc() directly?

--
This message was sent by Atlassian Jira (v8.20.1#820001)
- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-10424) Optimize the "everything matches" case for count query in PointRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496168#comment-17496168 ]

Lu Xugang edited comment on LUCENE-10424 at 2/22/22, 3:22 PM:
--------------------------------------------------------------

{quote}but it actually works with fields that have multiple dimensions and/or that are multi-valued{quote}

Yes, but I am not sure why the implementation of Weight#count only considers the 1D case. It seems a count query could work on multiple dimensions; please tell me if I missed something.

{quote}we limit this new case to single-valued 1D fields{quote}

If so, maybe we should support multiple dimensions in Weight#count?

was (Author: chrislu): ??but it actually works with fields that have multiple dimensions and/or that are multi-valued?? Yes, but I am not sure why in the implementation of Weight#count , only 1D fields case was considered, it seems count query can work on multi dimensions, please tell me if I missed something. ??we limit this new case to single-valued 1D fields?? If so, maybe we should support multi dimensions in Weight#count?

> Optimize the "everything matches" case for count query in PointRangeQuery
> -------------------------------------------------------------------------
>
> Key: LUCENE-10424
> URL: https://issues.apache.org/jira/browse/LUCENE-10424
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 9.1
> Reporter: Lu Xugang
> Assignee: Ignacio Vera
> Priority: Minor
> Fix For: 9.1
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> In the implementation of Weight#count in PointRangeQuery, should we additionally handle the case where PointValues#getDocCount() == IndexReader#maxDoc(), the range's lower bound is less than the field's min value, and the range's upper bound is greater than the field's max value, and then return reader.maxDoc() directly?
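The condition described in the issue can be sketched as follows. This is a simplified illustration, not the PointRangeQuery code: `CountShortcutSketch` is a hypothetical class, plain `long` values stand in for the encoded 1D point bounds, the bounds are treated as inclusive, and deleted documents are ignored.

```java
/** Hypothetical stand-in for the per-segment check; not a Lucene class. */
final class CountShortcutSketch {
  /**
   * Returns maxDoc when every document has a value and the query range covers
   * [fieldMin, fieldMax]; returns -1 to signal "no shortcut, count the usual way".
   */
  static int count(int docCount, int maxDoc, long fieldMin, long fieldMax,
                   long lower, long upper) {
    // Assumption: docCount plays the role of PointValues#getDocCount().
    boolean everyDocHasValue = docCount == maxDoc;
    // Assumption: inclusive bounds, so equality with min/max still covers the field.
    boolean rangeCoversField = lower <= fieldMin && upper >= fieldMax;
    return (everyDocHasValue && rangeCoversField) ? maxDoc : -1;
  }
}
```

Because every value of a 1D field lies between the field's min and max, this check also holds for multi-valued 1D fields; whether it carries over to multiple dimensions is the open question in the comment above.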
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r811871746 ## File path: lucene/monitor/src/test/org/apache/lucene/monitor/TestMonitorReadonly.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.monitor; + +import java.io.IOException; +import java.nio.file.Path; +import java.util.Collections; +import org.apache.lucene.analysis.Analyzer; +import org.apache.lucene.analysis.core.WhitespaceAnalyzer; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.Field; +import org.apache.lucene.index.IndexNotFoundException; +import org.apache.lucene.index.Term; +import org.apache.lucene.search.TermQuery; +import org.apache.lucene.store.FSDirectory; +import org.junit.Test; + +public class TestMonitorReadonly extends MonitorTestBase { + private static final Analyzer ANALYZER = new WhitespaceAnalyzer(); + + @Test + public void testReadonlyMonitorThrowsOnInexistentIndex() { +Path indexDirectory = createTempDir(); +MonitorConfiguration config = +new MonitorConfiguration() +.setDirectoryProvider( +() -> FSDirectory.open(indexDirectory), +MonitorQuerySerializer.fromParser(MonitorTestBase::parse), +true); +assertThrows( +IndexNotFoundException.class, +() -> { + new Monitor(ANALYZER, config); +}); + } + + @Test + public void testReadonlyMonitorThrowsWhenCallingWriteRequests() throws IOException { +Path indexDirectory = createTempDir(); +MonitorConfiguration writeConfig = +new MonitorConfiguration() +.setIndexPath( +indexDirectory, MonitorQuerySerializer.fromParser(MonitorTestBase::parse)); + +// this will create the index +Monitor writeMonitor = new Monitor(ANALYZER, writeConfig); +writeMonitor.close(); + +MonitorConfiguration config = +new MonitorConfiguration() +.setDirectoryProvider( +() -> FSDirectory.open(indexDirectory), +MonitorQuerySerializer.fromParser(MonitorTestBase::parse), +true); +try (Monitor monitor = new Monitor(ANALYZER, config)) { + assertThrows( + IllegalStateException.class, + () -> { +TermQuery query = new TermQuery(new Term(FIELD, "test")); +monitor.register( +new MonitorQuery("query1", query, query.toString(), Collections.emptyMap())); + }); + + assertThrows( + IllegalStateException.class, + () 
-> { +monitor.deleteById("query1"); + }); + + assertThrows( + IllegalStateException.class, + () -> { +monitor.clear(); + }); +} + } + + @Test + public void testSettingCustomDirectory() throws IOException { +Path indexDirectory = createTempDir(); +Document doc = new Document(); +doc.add(newTextField(FIELD, "This is a Foobar test document", Field.Store.NO)); + +MonitorConfiguration writeConfig = +new MonitorConfiguration() +.setDirectoryProvider( +() -> FSDirectory.open(indexDirectory), +MonitorQuerySerializer.fromParser(MonitorTestBase::parse)); + +try (Monitor writeMonitor = new Monitor(ANALYZER, writeConfig)) { + TermQuery query = new TermQuery(new Term(FIELD, "test")); + writeMonitor.register( + new MonitorQuery("query1", query, query.toString(), Collections.emptyMap())); + TermQuery query2 = new TermQuery(new Term(FIELD, "Foobar")); + writeMonitor.register( + new MonitorQuery("query2", query2, query.toString(), Collections.emptyMap())); + MatchingQueries matches = writeMonitor.match(doc, QueryMatch.SIMPLE_MATCHER); + assertNotNull(matches.getMatches()); + assertEquals(2, matches.getMatchCount()); + assertNotNull(matches.matches("query2")); +} + } + + public void testMonitorReadOnlyCouldReadOnTheSameIndex() throws IOException { +Path indexDirectory = createTempDir(); +Document doc = new Document(); +doc.add(newTextField(FIELD, "This is a te
[jira] [Resolved] (LUCENE-10412) Improve handling of MatchNoDocsQuery in rewrite rules
[ https://issues.apache.org/jira/browse/LUCENE-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-10412.
-----------------------------------
Fix Version/s: 9.1
Resolution: Fixed

> Improve handling of MatchNoDocsQuery in rewrite rules
> -----------------------------------------------------
>
> Key: LUCENE-10412
> URL: https://issues.apache.org/jira/browse/LUCENE-10412
> Project: Lucene - Core
> Issue Type: Wish
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: 9.1
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Having MatchNoDocsQuery in your query tree usually doesn't make the query slower, but by recognizing it in rewrite rules, we could perform rewrites which would then sometimes unlock other rewrite rules.
> For instance if you have a boolean query with 2 should clauses where one is a MatchAllDocsQuery and the other one is a MatchNoDocsQuery, we would naively run it as a disjunction today, while we could rewrite it to a MatchAllDocsQuery and leverage its specialized bulk scorer.
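The kind of rewrite described in this issue can be illustrated with a toy query tree. Everything below is hypothetical (`RewriteSketch`, `Leaf`, `Term`, `Or`) and unrelated to Lucene's real `Query` classes. The sketch drops match-none clauses from a disjunction, then unwraps a disjunction that is left with a single clause.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy query tree for illustration; not Lucene's Query API. */
final class RewriteSketch {
  interface Q {}

  enum Leaf implements Q { MATCH_ALL, MATCH_NONE }

  static final class Term implements Q {
    final String text;
    Term(String text) { this.text = text; }
  }

  /** Disjunction of optional ("should") clauses. */
  static final class Or implements Q {
    final List<? extends Q> clauses;
    Or(List<? extends Q> clauses) { this.clauses = clauses; }
  }

  static Q rewrite(Q q) {
    if (!(q instanceof Or)) {
      return q; // leaves and terms are already in simplest form
    }
    List<Q> kept = new ArrayList<>();
    for (Q clause : ((Or) q).clauses) {
      Q rewritten = rewrite(clause);
      if (rewritten != Leaf.MATCH_NONE) {
        kept.add(rewritten); // match-none clauses contribute nothing and are dropped
      }
    }
    if (kept.isEmpty()) {
      return Leaf.MATCH_NONE; // a disjunction with no remaining clauses matches nothing
    }
    if (kept.size() == 1) {
      return kept.get(0); // dropping clauses unlocked the single-clause rule
    }
    return new Or(kept);
  }
}
```

With this sketch, a disjunction of `MATCH_ALL` and `MATCH_NONE` rewrites to `MATCH_ALL`, which mirrors the example in the issue: removing the match-none clause is what makes the second rule applicable.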
[GitHub] [lucene] mogui commented on pull request #679: Monitor Improvements LUCENE-10422
mogui commented on pull request #679: URL: https://github.com/apache/lucene/pull/679#issuecomment-1047984458 @romseygeek I think I have fixed everything, and I also added a few lines of docs to explain the read-only behaviour.
[jira] [Created] (LUCENE-10432) Add optional 'name' property to org.apache.lucene.search.Explanation
Andriy Redko created LUCENE-10432:
-------------------------------------
Summary: Add optional 'name' property to org.apache.lucene.search.Explanation
Key: LUCENE-10432
URL: https://issues.apache.org/jira/browse/LUCENE-10432
Project: Lucene - Core
Issue Type: Improvement
Reporter: Andriy Redko

Right now, the `Explanation` class has the `description` property, which is used pretty much as a placeholder for a free-style, human-readable summary of what is happening. This is totally fine, but it would be great to have a slightly more formal way to link the explanation with the corresponding function / query / filter, if supported by the underlying engine.

Example: OpenSearch / Elasticsearch has the concept of named queries / filters [1]. This is not supported by Apache Lucene at the moment, but it would be helpful to propagate this information back as part of the Explanation tree, for example by introducing an optional 'name' property:

{noformat}
{
  "value": 0.0,
  "description": "script score function, computed with script: ...",
  "name": "script1",
  "details": [
    {
      "value": 1.0,
      "description": "_score: ",
      "details": [
        {
          "value": 1.0,
          "description": "*:*",
          "details": []
        }
      ]
    }
  ]
}
{noformat}

On the other hand, the `name` property may look like it does not belong here. The alternative suggestion would be to add support for a `properties` / `parameters` / `tags` key/value bag, for example:

{noformat}
{
  "value": 0.0,
  "description": "script score function, computed with script: ...",
  "tags": [
    { "name": "script1" }
  ],
  "details": [
    {
      "value": 1.0,
      "description": "_score: ",
      "details": [
        {
          "value": 1.0,
          "description": "*:*",
          "details": []
        }
      ]
    }
  ]
}
{noformat}

The change should be non-breaking but quite useful for engines to enrich the `Explanation` with additional context.
[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.16/query-dsl-bool-query.html#named-queries
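As a rough illustration of the second alternative (a key/value bag), here is a minimal sketch in Java. `TaggedExplanation` is a hypothetical class written for this example; it is not Lucene's `Explanation`, and the field and method names are assumptions chosen to mirror the JSON in the proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an explanation node carrying optional tags; not a Lucene class. */
final class TaggedExplanation {
  final double value;
  final String description;
  final Map<String, String> tags;        // optional key/value bag, may be empty
  final List<TaggedExplanation> details; // nested sub-explanations

  TaggedExplanation(double value, String description,
                    Map<String, String> tags, List<TaggedExplanation> details) {
    this.value = value;
    this.description = description;
    // Defensive copies keep the node immutable after construction.
    this.tags = Collections.unmodifiableMap(new LinkedHashMap<>(tags));
    this.details = Collections.unmodifiableList(new ArrayList<>(details));
  }

  /** Convenience accessor for the proposed optional name; null when no name tag is set. */
  String name() {
    return tags.get("name");
  }
}
```

Keeping the name inside a generic bag means callers that ignore tags see the same value, description, and details as before, which is one way to read the "non-breaking" requirement in the proposal.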
[jira] [Commented] (LUCENE-10432) Add optional 'name' property to org.apache.lucene.search.Explanation
[ https://issues.apache.org/jira/browse/LUCENE-10432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496292#comment-17496292 ]

Andriy Redko commented on LUCENE-10432:
---------------------------------------

[~jpountz] my apologies for pinging you directly. I am curious whether this small improvement makes sense before doing any work on a pull request. Thank you!

> Add optional 'name' property to org.apache.lucene.search.Explanation
> ---------------------------------------------------------------------
>
> Key: LUCENE-10432
> URL: https://issues.apache.org/jira/browse/LUCENE-10432
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Andriy Redko
> Priority: Minor
>
> Right now, the `Explanation` class has the `description` property, which is used pretty much as a placeholder for a free-style, human-readable summary of what is happening. This is totally fine, but it would be great to have a slightly more formal way to link the explanation with the corresponding function / query / filter, if supported by the underlying engine.
> Example: OpenSearch / Elasticsearch has the concept of named queries / filters [1]. This is not supported by Apache Lucene at the moment, but it would be helpful to propagate this information back as part of the Explanation tree, for example by introducing an optional 'name' property:
>
> {noformat}
> {
>   "value": 0.0,
>   "description": "script score function, computed with script: ...",
>   "name": "script1",
>   "details": [
>     {
>       "value": 1.0,
>       "description": "_score: ",
>       "details": [
>         {
>           "value": 1.0,
>           "description": "*:*",
>           "details": []
>         }
>       ]
>     }
>   ]
> }{noformat}
>
> On the other hand, the `name` property may look like it does not belong here. The alternative suggestion would be to add support for a `properties` / `parameters` / `tags` key/value bag, for example:
>
> {noformat}
> {
>   "value": 0.0,
>   "description": "script score function, computed with script: ...",
>   "tags": [
>     { "name": "script1" }
>   ],
>   "details": [
>     {
>       "value": 1.0,
>       "description": "_score: ",
>       "details": [
>         {
>           "value": 1.0,
>           "description": "*:*",
>           "details": []
>         }
>       ]
>     }
>   ]
> }{noformat}
> The change should be non-breaking but quite useful for engines to enrich the `Explanation` with additional context.
> [1] https://www.elastic.co/guide/en/elasticsearch/reference/7.16/query-dsl-bool-query.html#named-queries
[GitHub] [lucene-solr] andywebb1975 opened a new pull request #2643: Make Config API work for warming queries
andywebb1975 opened a new pull request #2643: URL: https://github.com/apache/lucene-solr/pull/2643 This is my attempt at resolving https://issues.apache.org/jira/browse/SOLR-9359 - it's still very work-in-progress, hence all the debug output etc, but if anyone has thoughts on it please let me know. I don't know if there's a better way to do this without all the `getClass()`/`instanceof` checking? With this patch in place it becomes possible to send `add/update-listener` commands to the Config API like this, and they take effect as expected rather than throwing a `ClassCastException`:
```
{
  "update-listener": {
    "name": "warming-queries",
    "event": "newSearcher",
    "class": "solr.QuerySenderListener",
    "queries": [
      [
        { "q": "foo" },
        { "q": "bar" }
      ]
    ]
  }
}
```
Note the nested array: without that, only the first query in the list is picked up - the rest don't appear in the `getArgs().get("queries")` response at all. I don't know if that's fixable but I suspect it'd require more widespread changes so I've steered clear of that thus far. (Also, this class is virtually the same in the new Solr repo - I'd raise a PR for that too.)
[jira] [Created] (LUCENE-10433) we should pass l instead of d to getFallbackSelector(d).select in RadixSelector.select()
kkewwei created LUCENE-10433:
--------------------------------
Summary: we should pass l instead of d to getFallbackSelector(d).select in RadixSelector.select()
Key: LUCENE-10433
URL: https://issues.apache.org/jira/browse/LUCENE-10433
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 8.6.2
Reporter: kkewwei

In `RadixSelector.select`:
{code:java}
private void select(int from, int to, int k, int d, int l) {
  if (to - from <= LENGTH_THRESHOLD || d >= LEVEL_THRESHOLD) {
    getFallbackSelector(d).select(from, to, k);
  } else {
    radixSelect(from, to, k, d, l);
  }
}
{code}
We know that `l` represents the level of recursion, not `d`, but when we check the level of recursion we use `d >= LEVEL_THRESHOLD`, which is not right.
[jira] [Updated] (LUCENE-10433) we should pass l instead of d to getFallbackSelector(d).select in RadixSelector.select()
[ https://issues.apache.org/jira/browse/LUCENE-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kkewwei updated LUCENE-10433:
-----------------------------
Component/s: core/other

> we should pass l instead of d to getFallbackSelector(d).select in RadixSelector.select()
> ----------------------------------------------------------------------------------------
>
> Key: LUCENE-10433
> URL: https://issues.apache.org/jira/browse/LUCENE-10433
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 8.6.2
> Reporter: kkewwei
> Priority: Major
>
> In the `RadixSelector.select`
> {code:java}
> private void select(int from, int to, int k, int d, int l) {
>   if (to - from <= LENGTH_THRESHOLD || d >= LEVEL_THRESHOLD) {
>     getFallbackSelector(d).select(from, to, k);
>   } else {
>     radixSelect(from, to, k, d, l);
>   }
> }
> {code}
> we know that `l` represent the levels of recursion, not the `d`, but when we check the levels of recursion, we use `d >= LEVEL_THRESHOLD`, it's not right.
[jira] [Resolved] (LUCENE-10433) we should pass l instead of d to getFallbackSelector(d).select in RadixSelector.select()
[ https://issues.apache.org/jira/browse/LUCENE-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kkewwei resolved LUCENE-10433.
------------------------------
Resolution: Resolved

> we should pass l instead of d to getFallbackSelector(d).select in RadixSelector.select()
> ----------------------------------------------------------------------------------------
>
> Key: LUCENE-10433
> URL: https://issues.apache.org/jira/browse/LUCENE-10433
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 8.6.2
> Reporter: kkewwei
> Priority: Major
>
> In the `RadixSelector.select`
> {code:java}
> private void select(int from, int to, int k, int d, int l) {
>   if (to - from <= LENGTH_THRESHOLD || d >= LEVEL_THRESHOLD) {
>     getFallbackSelector(d).select(from, to, k);
>   } else {
>     radixSelect(from, to, k, d, l);
>   }
> }
> {code}
> we know that `l` represent the levels of recursion, not the `d`, but when we check the levels of recursion, we use `d >= LEVEL_THRESHOLD`, it's not right.
[GitHub] [lucene] jtibshirani opened a new pull request #699: LUCENE-10054: Make sure to use Lucene90 codec in unit tests
jtibshirani opened a new pull request #699: URL: https://github.com/apache/lucene/pull/699 Before we were using the default Lucene91 codec, so we weren't exercising the old format.
[GitHub] [lucene] jtibshirani opened a new pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs
jtibshirani opened a new pull request #700: URL: https://github.com/apache/lucene/pull/700 The original PR that added kNN filtering support overlooked non-default codecs. This follow-up ensures that other codecs work with the new filtering logic:
* Make sure to check the visited nodes limit in `SimpleTextKnnVectorsReader` and `Lucene90HnswVectorsReader`
* Add a test `BaseKnnVectorsFormatTestCase` to cover this case
* Fix failures in `TestKnnVectorQuery#testRandomWithFilter`, whose assumptions don't hold when SimpleText is used

This PR also clarifies the limit checking logic for `Lucene91HnswVectorsReader`. Now we always check the limit before visiting a new node, whereas before we only checked it in an outer loop.
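The difference between checking a visited-node budget once per outer iteration and checking it before every new node can be sketched with a plain graph traversal. This is an illustrative sketch only: `VisitLimitSketch` is a hypothetical class, the traversal is a simple breadth-first walk over an adjacency array, and it is not the HNSW search used by the Lucene readers named above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a visited-node budget; not Lucene's HNSW code. */
final class VisitLimitSketch {
  /** Walks the graph from entryPoint and returns how many nodes were visited. */
  static int traverse(int[][] neighbors, int entryPoint, int visitedLimit) {
    if (visitedLimit <= 0) {
      return 0; // no budget at all, so not even the entry point is visited
    }
    Set<Integer> visited = new HashSet<>();
    Deque<Integer> queue = new ArrayDeque<>();
    visited.add(entryPoint);
    queue.add(entryPoint);
    while (!queue.isEmpty()) {
      int node = queue.poll();
      for (int next : neighbors[node]) {
        if (visited.contains(next)) {
          continue;
        }
        // Check the budget before every new node, not once per outer iteration,
        // so a node with many neighbors cannot push the count past the limit.
        if (visited.size() >= visitedLimit) {
          return visited.size();
        }
        visited.add(next);
        queue.add(next);
      }
    }
    return visited.size();
  }
}
```

If the check sat only at the top of the `while` loop, a single node with many unvisited neighbors could overshoot the limit by the size of its neighbor list.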
[GitHub] [lucene] jtibshirani commented on pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs
jtibshirani commented on pull request #700: URL: https://github.com/apache/lucene/pull/700#issuecomment-1048394775 This will fix the nightly test failures. Example repro:
```
./gradlew test --tests TestKnnVectorQuery.testRandomWithFilter -Dtests.seed=C4BEEB7EDCFB4E6C -Dtests.slow=true
```
[jira] [Created] (LUCENE-10434) Improve handling of DocValuesRangeQuery in rewrite rules
Lu Xugang created LUCENE-10434:
----------------------------------
Summary: Improve handling of DocValuesRangeQuery in rewrite rules
Key: LUCENE-10434
URL: https://issues.apache.org/jira/browse/LUCENE-10434
Project: Lucene - Core
Issue Type: Improvement
Reporter: Lu Xugang

Since DocValuesFieldExistsQuery's rewrite rule has been implemented in [LUCENE-10084|https://issues.apache.org/jira/browse/LUCENE-10084], maybe queries that rewrite to DocValuesFieldExistsQuery should be rewritten further?
[jira] [Resolved] (LUCENE-10434) Improve handling of DocValuesRangeQuery in rewrite rules
[ https://issues.apache.org/jira/browse/LUCENE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lu Xugang resolved LUCENE-10434.
--------------------------------
Resolution: Not A Problem

> Improve handling of DocValuesRangeQuery in rewrite rules
> --------------------------------------------------------
>
> Key: LUCENE-10434
> URL: https://issues.apache.org/jira/browse/LUCENE-10434
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Lu Xugang
> Priority: Minor
>
> Since DocValuesFieldExistsQuery's rewrite rule has been implemented in [LUCENE-10084|https://issues.apache.org/jira/browse/LUCENE-10084], maybe those Queries who rewrite to the DocValuesFieldExistsQuery should be rewrite further?
[jira] [Commented] (LUCENE-10434) Improve handling of DocValuesRangeQuery in rewrite rules
[ https://issues.apache.org/jira/browse/LUCENE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496460#comment-17496460 ]

Lu Xugang commented on LUCENE-10434:
------------------------------------

Oh, it seems IndexSearcher#rewrite will handle this.

> Improve handling of DocValuesRangeQuery in rewrite rules
> --------------------------------------------------------
>
> Key: LUCENE-10434
> URL: https://issues.apache.org/jira/browse/LUCENE-10434
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Lu Xugang
> Priority: Minor
>
> Since DocValuesFieldExistsQuery's rewrite rule has been implemented in [LUCENE-10084|https://issues.apache.org/jira/browse/LUCENE-10084], maybe those Queries who rewrite to the DocValuesFieldExistsQuery should be rewrite further?
[GitHub] [lucene] LuXugang commented on a change in pull request #677: LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field
LuXugang commented on a change in pull request #677: URL: https://github.com/apache/lucene/pull/677#discussion_r812597747

## File path: lucene/core/src/java/org/apache/lucene/search/DocValuesFieldExistsQuery.java ##
@@ -64,6 +67,24 @@ public void visit(QueryVisitor visitor) {
     }
   }
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    int rewritableReaders = 0;
+    for (LeafReaderContext context : reader.leaves()) {
+      LeafReader leaf = context.reader();
+      Terms terms = leaf.terms(field);
+      PointValues pointValues = leaf.getPointValues(field);
+      if ((terms != null && terms.getDocCount() == leaf.maxDoc())

Review comment: If the condition is false, maybe we should break out of the for loop early?
[jira] [Created] (LUCENE-10435) Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery
Lu Xugang created LUCENE-10435:
----------------------------------
Summary: Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery
Key: LUCENE-10435
URL: https://issues.apache.org/jira/browse/LUCENE-10435
Project: Lucene - Core
Issue Type: Improvement
Reporter: Lu Xugang

In the implementation of Query#rewrite in DocValuesFieldExistsQuery, as soon as one segment does not satisfy the condition, maybe we should break out of the loop directly.
[GitHub] [lucene] LuXugang opened a new pull request #701: LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery
LuXugang opened a new pull request #701: URL: https://github.com/apache/lucene/pull/701 In the implementation of Query#rewrite in DocValuesFieldExistsQuery, as soon as one segment does not satisfy the condition, maybe we should break out of the loop directly.
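The early exit proposed here can be sketched without any Lucene types. In this hypothetical example, `EarlyBreakSketch` and its parameters are made up: two parallel int arrays stand in for each segment's doc count for the field and its maxDoc, and the method answers the same yes/no question as the rewrite check.

```java
/** Hypothetical sketch; plain ints stand in for per-segment statistics. */
final class EarlyBreakSketch {
  /** True only if, in every segment, all documents have a value for the field. */
  static boolean allSegmentsFullyCovered(int[] docCounts, int[] maxDocs) {
    for (int i = 0; i < docCounts.length; i++) {
      if (docCounts[i] != maxDocs[i]) {
        // One failing segment already decides the answer, so stop scanning.
        return false;
      }
    }
    return true;
  }
}
```

Counting the rewritable segments and comparing the total afterwards gives the same result, but it keeps inspecting segments after the outcome is already known.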