[jira] [Created] (LUCENE-10424) Optimize the "everything matches" case for count queries in PointRangeQuery
Lu Xugang created LUCENE-10424: -- Summary: Optimize the "everything matches" case for count queries in PointRangeQuery Key: LUCENE-10424 URL: https://issues.apache.org/jira/browse/LUCENE-10424 Project: Lucene - Core Issue Type: Improvement Affects Versions: 9.1 Reporter: Lu Xugang In the implementation of Weight#count in PointRangeQuery, should we additionally handle the case where PointValues#getDocCount() == IndexReader#maxDoc(), the range's lower bound is less than the field's min value, and the range's upper bound is greater than the field's max value, and return reader.maxDoc() directly? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
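The check proposed in this issue can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Lucene implementation: the class and method names are hypothetical, plain parameters stand in for the PointValues and IndexReader statistics, the field is assumed to be single-dimensional with bounds compared as unsigned bytes, the range bounds are assumed to be inclusive, and the guard against deleted documents is an added assumption that the issue text does not mention.

```java
import java.util.Arrays;

// Hypothetical stand-in for the shortcut discussed above; not Lucene code.
class EverythingMatchesSketch {

  /**
   * Returns the match count when it can be derived from statistics alone,
   * or -1 when the caller must fall back to the regular counting path.
   */
  static int countIfEverythingMatches(
      int docCount,          // stands in for PointValues#getDocCount()
      int maxDoc,            // stands in for IndexReader#maxDoc()
      boolean hasDeletions,  // assumption: deleted docs disable the shortcut
      byte[] fieldMin,
      byte[] fieldMax,
      byte[] lower,
      byte[] upper) {
    // Every document must have a value, and none may be deleted.
    if (hasDeletions || docCount != maxDoc) {
      return -1;
    }
    // The query range must cover the whole value range of the field.
    boolean coversMin = Arrays.compareUnsigned(lower, fieldMin) <= 0;
    boolean coversMax = Arrays.compareUnsigned(upper, fieldMax) >= 0;
    return (coversMin && coversMax) ? maxDoc : -1;
  }
}
```

Returning -1 for "unknown" keeps the shortcut purely additive: any case it does not recognize is handled by whatever counting logic already exists.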
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
mayya-sharipova commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r808862885 ## File path: lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java ## @@ -455,6 +484,61 @@ public void testRandom() throws IOException { } } + /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */ + public void testRandomWithFilter() throws IOException { +int numDocs = 200; +int dimension = atLeast(5); +int numIters = atLeast(10); +try (Directory d = newDirectory()) { + RandomIndexWriter w = new RandomIndexWriter(random(), d); + for (int i = 0; i < numDocs; i++) { +Document doc = new Document(); +doc.add(new KnnVectorField("field", randomVector(dimension))); +doc.add(new NumericDocValuesField("tag", i)); +doc.add(new IntPoint("tag", i)); +w.addDocument(doc); + } + w.close(); + + try (IndexReader reader = DirectoryReader.open(d)) { +IndexSearcher searcher = newSearcher(reader); +for (int i = 0; i < numIters; i++) { + int lower = random().nextInt(50); + + // Check that when filter is restrictive, we use exact search + Query filter = IntPoint.newRangeQuery("tag", lower, lower + 6); + KnnVectorQuery query = new KnnVectorQuery("field", randomVector(dimension), 5, filter); + TopDocs results = searcher.search(query, numDocs); + assertEquals(TotalHits.Relation.EQUAL_TO, results.totalHits.relation); + assertEquals(results.totalHits.value, 5); Review comment: Thanks for the explanation, I missed a part about rewriting to `DocAndScoreQuery`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova merged pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
mayya-sharipova merged pull request #649: URL: https://github.com/apache/lucene/pull/649
[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422
romseygeek commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r808905427 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.monitor; + +import java.io.IOException; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.index.IndexReader; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.Term; +import org.apache.lucene.search.*; +import org.apache.lucene.store.Directory; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.IOUtils; + +class ReadonlyQueryIndex implements QueryIndex { + private final SearcherManager manager; + private final QueryDecomposer decomposer; + private final MonitorQuerySerializer serializer; + + final Map termFilters = new HashMap<>(); + + public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException { +if (configuration.getDirectoryProvider() == null) { + throw new IllegalStateException( + "You must specify a Directory when configuring a Monitor as read-only."); +} +Directory directory = configuration.getDirectoryProvider().get(); +this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters)); +this.decomposer = configuration.getQueryDecomposer(); +this.serializer = configuration.getQuerySerializer(); Review comment: AIUI, this implementation doesn't have an in-memory query cache, and re-parses the queries every time we do a match. I think having a lazy parser is definitely a valid use-case but I think we should decouple it from the notion of a read-only monitor. ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/Monitor.java ## @@ -125,14 +108,21 @@ public Monitor(Analyzer analyzer, Presearcher presearcher, MonitorConfiguration * Monitor's queryindex * * @param listener listener to register + * @throws IllegalStateException when Monitor is readonly */ public void addQueryIndexUpdateListener(MonitorUpdateListener listener) { -listeners.add(listener); Review comment: I think we can just make `addListener()` a method on `QueryIndex` and delegate there? 
And then we don't need the `readOnly` member variable on `Monitor` ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.monitor; + +import java.io.IOException; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.index.IndexReader; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.Term; +import org.apache.lucene.search.*; +import org.apache.lucene.store.Directory; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.IOUtils; + +class ReadonlyQueryIndex implements QueryIndex { + private final SearcherManager manager; + private final QueryDecomposer decomposer; + private final MonitorQuerySerializer serializer; + + final Map termFilters = new HashMap<>(); + + public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException { +if (configuration.getDirectoryProvider() == null) { + throw new Ill
[jira] [Commented] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat
[ https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493858#comment-17493858 ] Mayya Sharipova commented on LUCENE-10408: -- This issue concerns the dense case, where all documents have vectors. We will explore the sparse case in a follow-up. > Better dense encoding of doc Ids in Lucene91HnswVectorsFormat > - > > Key: LUCENE-10408 > URL: https://issues.apache.org/jira/browse/LUCENE-10408 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Time Spent: 4h 40m > Remaining Estimate: 0h > > Currently we write doc Ids of all documents that have vectors as is. We > should improve their encoding either using delta encoding or bitset.
[jira] [Updated] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat
[ https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova updated LUCENE-10408: - Summary: Better dense encoding of doc Ids in Lucene91HnswVectorsFormat (was: Better encoding of doc Ids in Lucene91HnswVectorsFormat) > Better dense encoding of doc Ids in Lucene91HnswVectorsFormat > - > > Key: LUCENE-10408 > URL: https://issues.apache.org/jira/browse/LUCENE-10408 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Time Spent: 4h 40m > Remaining Estimate: 0h > > Currently we write doc Ids of all documents that have vectors as is. We > should improve their encoding either using delta encoding or bitset.
[jira] [Resolved] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat
[ https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova resolved LUCENE-10408. -- Fix Version/s: 9.1 Resolution: Fixed > Better dense encoding of doc Ids in Lucene91HnswVectorsFormat > - > > Key: LUCENE-10408 > URL: https://issues.apache.org/jira/browse/LUCENE-10408 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Fix For: 9.1 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Currently we write doc Ids of all documents that have vectors as is. We > should improve their encoding either using delta encoding or bitset.
[jira] [Closed] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat
[ https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova closed LUCENE-10408. > Better dense encoding of doc Ids in Lucene91HnswVectorsFormat > - > > Key: LUCENE-10408 > URL: https://issues.apache.org/jira/browse/LUCENE-10408 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Fix For: 9.1 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Currently we write doc Ids of all documents that have vectors as is. We > should improve their encoding either using delta encoding or bitset.
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r808924539 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/Monitor.java ## @@ -125,14 +108,21 @@ public Monitor(Analyzer analyzer, Presearcher presearcher, MonitorConfiguration * Monitor's queryindex * * @param listener listener to register + * @throws IllegalStateException when Monitor is readonly */ public void addQueryIndexUpdateListener(MonitorUpdateListener listener) { -listeners.add(listener); Review comment: yes better
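The delegation suggested in this review thread (move listener registration onto the query index so that Monitor needs no read-only flag) might look roughly like the sketch below. All type names here are simplified, hypothetical stand-ins and not the classes from the pull request; only the idea of delegating and letting the read-only index throw IllegalStateException comes from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MonitorUpdateListener.
interface UpdateListenerSketch {
  void afterUpdate();
}

// Hypothetical stand-in for QueryIndex, with listener registration moved onto it.
interface QueryIndexSketch {
  void addListener(UpdateListenerSketch listener);
}

class WritableIndexSketch implements QueryIndexSketch {
  final List<UpdateListenerSketch> listeners = new ArrayList<>();

  @Override
  public void addListener(UpdateListenerSketch listener) {
    listeners.add(listener);
  }
}

class ReadonlyIndexSketch implements QueryIndexSketch {
  @Override
  public void addListener(UpdateListenerSketch listener) {
    // A read-only index never produces updates, so registration is rejected.
    throw new IllegalStateException("Monitor is read-only; cannot register listeners");
  }
}

class MonitorSketch {
  private final QueryIndexSketch queryIndex;

  MonitorSketch(QueryIndexSketch queryIndex) {
    this.queryIndex = queryIndex;
  }

  // No readOnly field: the index implementation decides what is allowed.
  void addQueryIndexUpdateListener(UpdateListenerSketch listener) {
    queryIndex.addListener(listener);
  }
}
```

With this shape the read-only behaviour lives in one place, the index implementation, instead of being checked in every Monitor method.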
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r808924635 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/MonitorConfiguration.java ## @@ -47,16 +49,39 @@ private static IndexWriterConfig defaultIndexWriterConfig() { return iwc; } + public Boolean isReadOnly() { Review comment: sure
[jira] [Commented] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat
[ https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493867#comment-17493867 ] ASF subversion and git services commented on LUCENE-10408: -- Commit 3355273630b8396cfd51a770caf6213a9e2fba3f in lucene's branch refs/heads/branch_9x from Mayya Sharipova [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=3355273 ] LUCENE-10408 Better encoding of doc Ids in vectors (#649) Better encoding of doc Ids in Lucene91HnswVectorsFormat for the dense case where all docs have vectors. Currently we write the doc Ids of all documents that have vectors not very efficiently. This improves their encoding: when all documents have vectors, we don't write document IDs, but just write a single short value, a dense marker. > Better dense encoding of doc Ids in Lucene91HnswVectorsFormat > - > > Key: LUCENE-10408 > URL: https://issues.apache.org/jira/browse/LUCENE-10408 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Fix For: 9.1 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Currently we write doc Ids of all documents that have vectors as is. We > should improve their encoding either using delta encoding or bitset.
[jira] [Commented] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat
[ https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493866#comment-17493866 ] ASF subversion and git services commented on LUCENE-10408: -- Commit f8c5408be78fe98e1e8ed61ce999d6fb1f643eb2 in lucene's branch refs/heads/main from Mayya Sharipova [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f8c5408 ] LUCENE-10408 Better encoding of doc Ids in vectors (#649) Better encoding of doc Ids in Lucene91HnswVectorsFormat for the dense case where all docs have vectors. Currently we write the doc Ids of all documents that have vectors not very efficiently. This improves their encoding: when all documents have vectors, we don't write document IDs, but just write a single short value, a dense marker. > Better dense encoding of doc Ids in Lucene91HnswVectorsFormat > - > > Key: LUCENE-10408 > URL: https://issues.apache.org/jira/browse/LUCENE-10408 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Fix For: 9.1 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Currently we write doc Ids of all documents that have vectors as is. We > should improve their encoding either using delta encoding or bitset.
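The dense-marker idea in this commit message can be illustrated with a small sketch. It is not the Lucene91HnswVectorsFormat on-disk layout: the marker values, the fallback layout for the non-dense case, and all names are assumptions chosen only to show why the dense case shrinks to a single short.

```java
import java.nio.ByteBuffer;

// Hypothetical encoder; marker values and layout are illustrative assumptions.
class DenseDocIdEncodingSketch {
  static final short DENSE_MARKER = -1;  // assumed value for "all docs have vectors"
  static final short SPARSE_MARKER = 0;  // assumed value for "doc ids follow"

  /**
   * Encodes the ids of documents that have vectors. docsWithVectors is assumed
   * to hold distinct ids in [0, maxDoc), so a length equal to maxDoc means
   * every document has a vector.
   */
  static byte[] encode(int[] docsWithVectors, int maxDoc) {
    if (docsWithVectors.length == maxDoc) {
      // Dense case: the ids are implicitly 0..maxDoc-1, so only the marker is written.
      return ByteBuffer.allocate(Short.BYTES).putShort(DENSE_MARKER).array();
    }
    // Otherwise: marker, count, then each id as a plain int.
    ByteBuffer buf =
        ByteBuffer.allocate(Short.BYTES + Integer.BYTES * (1 + docsWithVectors.length));
    buf.putShort(SPARSE_MARKER).putInt(docsWithVectors.length);
    for (int doc : docsWithVectors) {
      buf.putInt(doc);
    }
    return buf.array();
  }
}
```

In this sketch a segment where every document has a vector costs 2 bytes for doc ids regardless of its size, while the fallback grows linearly with the number of documents.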
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r808925141 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.monitor; + +import java.io.IOException; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.index.IndexReader; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.Term; +import org.apache.lucene.search.*; +import org.apache.lucene.store.Directory; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.IOUtils; + +class ReadonlyQueryIndex implements QueryIndex { + private final SearcherManager manager; + private final QueryDecomposer decomposer; + private final MonitorQuerySerializer serializer; + + final Map termFilters = new HashMap<>(); + + public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException { +if (configuration.getDirectoryProvider() == null) { + throw new IllegalStateException( + "You must specify a Directory when configuring a Monitor as read-only."); +} +Directory directory = configuration.getDirectoryProvider().get(); +this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters)); +this.decomposer = configuration.getQueryDecomposer(); +this.serializer = configuration.getQuerySerializer(); + } + + @Override + public void commit(List updates) throws IOException { +throw new IllegalStateException("Monitor is readOnly cannot commit"); + } + + @Override + public MonitorQuery getQuery(String queryId) throws IOException { +if (serializer == null) { + throw new IllegalStateException( + "Cannot get queries from an index with no MonitorQuerySerializer"); +} +BytesRef[] bytesHolder = new BytesRef[1]; +search( +new TermQuery(new Term(WritableQueryIndex.FIELDS.query_id, queryId)), Review comment: Yes, it's something that I thought about; I'll drop the interface in favor of an abstract class.
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r808930507 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.monitor; + +import java.io.IOException; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.index.IndexReader; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.Term; +import org.apache.lucene.search.*; +import org.apache.lucene.store.Directory; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.IOUtils; + +class ReadonlyQueryIndex implements QueryIndex { + private final SearcherManager manager; + private final QueryDecomposer decomposer; + private final MonitorQuerySerializer serializer; + + final Map termFilters = new HashMap<>(); + + public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException { +if (configuration.getDirectoryProvider() == null) { + throw new IllegalStateException( + "You must specify a Directory when configuring a Monitor as read-only."); +} +Directory directory = configuration.getDirectoryProvider().get(); +this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters)); +this.decomposer = configuration.getQueryDecomposer(); +this.serializer = configuration.getQuerySerializer(); Review comment: Yes, it all goes through the `MonitorQueryCollector`, which relies on the in-memory query cache and is an internal class of WritableQueryIndex. Are you suggesting decoupling `ReadonlyMonitorQueryCollector` as a lazy query parser, outside the read-only Monitor? Am I getting it right? @romseygeek
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r808938064 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.monitor; + +import java.io.IOException; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.index.IndexReader; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.Term; +import org.apache.lucene.search.*; +import org.apache.lucene.store.Directory; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.IOUtils; + +class ReadonlyQueryIndex implements QueryIndex { + private final SearcherManager manager; + private final QueryDecomposer decomposer; + private final MonitorQuerySerializer serializer; + + final Map termFilters = new HashMap<>(); + + public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException { +if (configuration.getDirectoryProvider() == null) { + throw new IllegalStateException( + "You must specify a Directory when configuring a Monitor as read-only."); +} +Directory directory = configuration.getDirectoryProvider().get(); +this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters)); +this.decomposer = configuration.getQueryDecomposer(); +this.serializer = configuration.getQuerySerializer(); Review comment: Or maybe restore all the query cache logic in the abstract class and let each implementation choose whether to use it?
[jira] [Resolved] (LUCENE-10420) Move functional interfaces in IOUtils to top-level interfaces
[ https://issues.apache.org/jira/browse/LUCENE-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida resolved LUCENE-10420. Fix Version/s: 9.1 10.0 (main) Resolution: Fixed > Move functional interfaces in IOUtils to top-level interfaces > - > > Key: LUCENE-10420 > URL: https://issues.apache.org/jira/browse/LUCENE-10420 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Time Spent: 5h 10m > Remaining Estimate: 0h > > Suggested at https://github.com/apache/lucene/pull/643#discussion_r802285404.
[jira] [Created] (LUCENE-10425) Lucene supports bkd binary search and return current index of posting list
jianping weng created LUCENE-10425: -- Summary: Lucene supports bkd binary search and return current index of posting list Key: LUCENE-10425 URL: https://issues.apache.org/jira/browse/LUCENE-10425 Project: Lucene - Core Issue Type: New Feature Reporter: jianping weng
[GitHub] [lucene] wjp719 opened a new pull request #687: support binary search in PointValues
wjp719 opened a new pull request #687: URL: https://github.com/apache/lucene/pull/687 Add a common function that lets callers binary search in bkdPointTree. One possible use case: for log data whose index is sorted in ascending order by the @timestamp field, when we want to run a count aggregation query to find the number of documents in many time intervals, we can use binary search to find the min/max docId in one time interval, and the doc count = max docId - min docId + 1.
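The counting idea in this description can be sketched with an ordinary sorted array. This is only an illustration under stated assumptions: a plain long[] indexed by docId stands in for the BKD tree, every document is assumed to have exactly one timestamp, there are no deleted documents, and the class and method names are hypothetical rather than taken from the pull request.

```java
// Hypothetical illustration; a sorted array stands in for the BKD point tree.
class TimeRangeCountSketch {

  /** First index whose value is >= target, or sorted.length if there is none. */
  static int lowerBound(long[] sorted, long target) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** First index whose value is > target, or sorted.length if there is none. */
  static int upperBound(long[] sorted, long target) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] <= target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /**
   * Counts docs with from <= timestamp <= to. Because the index is sorted by
   * timestamp, position i in the array is also docId i.
   */
  static int count(long[] timestampsByDocId, long from, long to) {
    int minDocId = lowerBound(timestampsByDocId, from);
    int maxDocId = upperBound(timestampsByDocId, to) - 1;
    return Math.max(0, maxDocId - minDocId + 1);
  }
}
```

Each interval costs two binary searches, so counting many intervals avoids visiting the matching documents at all.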
[GitHub] [lucene] wjp719 opened a new pull request #688: PostingsEnum supports to return current index of postings
wjp719 opened a new pull request #688: URL: https://github.com/apache/lucene/pull/688 PostingsEnum supports returning the current index within the postings. As we know, the docId list in .doc is like an array of integers in which every element is a docId; when we call the nextDoc or advance method, it moves to another element of this array. This PR adds support for returning the index of the current element in the posting list. We can use this method to compute the difference in index across a call to advance; that difference is the number of docs skipped in this docId list.
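A rough sketch of the index-difference idea follows. It does not use Lucene's PostingsEnum: the class is a hypothetical array-backed iterator, index() is an assumed accessor name, and advance here is simplified to "move forward to the first docId >= target, never backwards". Note that the sketch advances to maxDocId + 1 rather than maxDocId so that a document sitting exactly at maxDocId is included in the count; that choice is an assumption of this example.

```java
// Hypothetical array-backed postings iterator; not Lucene's PostingsEnum.
class IndexedPostingsSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docIds; // sorted ascending, like the docId list of one term
  private int index = 0;      // position of the current element in docIds

  IndexedPostingsSketch(int[] docIds) {
    this.docIds = docIds;
  }

  /** Moves forward to the first docId >= target and returns it, or NO_MORE_DOCS. */
  int advance(int target) {
    // A linear scan keeps the sketch short; real postings would use skip data.
    while (index < docIds.length && docIds[index] < target) {
      index++;
    }
    return index < docIds.length ? docIds[index] : NO_MORE_DOCS;
  }

  /** Assumed accessor: the current position within the posting list. */
  int index() {
    return index;
  }

  /** Counts postings with minDocId <= docId <= maxDocId (maxDocId < Integer.MAX_VALUE). */
  static int countInRange(int[] docIds, int minDocId, int maxDocId) {
    IndexedPostingsSketch postings = new IndexedPostingsSketch(docIds);
    postings.advance(minDocId);
    int start = postings.index();
    postings.advance(maxDocId + 1);
    // The number of elements skipped between the two positions is the count.
    return postings.index() - start;
  }
}
```

The count comes from two position reads rather than from iterating over every matching document and incrementing a counter.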
[jira] [Updated] (LUCENE-10425) Lucene supports bkd binary search and return current index of posting list
[ https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianping weng updated LUCENE-10425: --- Component/s: core/search Description: In the log scenario, we usually want to know the number of documents in each time interval. One possible optimization is to sort the documents in ascending order by the @timestamp field within one segment. Then we can use pr[[https://github.com/apache/lucene/pull/687]] to find the min/max docId in one time interval. If there is no other filter query, the doc count of one time interval is (max docId - min docId + 1). If there is only one additional term filter query, we can use pr[[https://github.com/apache/lucene/pull/688]] to get the difference in index between the calls advance(minId) and advance(maxId); that difference is also the doc count of one time interval. > Lucene supports bkd binary search and return current index of posting list > -- > > Key: LUCENE-10425 > URL: https://issues.apache.org/jira/browse/LUCENE-10425 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Reporter: jianping weng >Priority: Major > > In the log scenario, we usually want to know the number of documents in each > time interval. One possible optimization is to sort the documents in > ascending order by the @timestamp field within one segment. Then we can use > pr[[https://github.com/apache/lucene/pull/687]] to find the min/max docId > in one time interval.
> If there is no other filter query, the doc count of one time interval is (max > docId - min docId + 1). > If there is only one additional term filter query, we can use > pr[[https://github.com/apache/lucene/pull/688]] to get the difference in > index between the calls advance(minId) and advance(maxId); that difference is also > the doc count of one time interval. >
[jira] [Updated] (LUCENE-10425) Lucene supports bkd binary search and return current index of posting list
[ https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianping weng updated LUCENE-10425: --- Description: In log scenarios, we usually want to know the number of documents in each time interval. One possible optimization is to sort the documents within a segment in ascending order of the @timestamp field. Then we can use [pr|[https://github.com/apache/lucene/pull/687]] to find the min/max docId in one time interval. If there is no other filter query, the doc count of one time interval is (max docId - min docId + 1). If there is exactly one additional term filter query, we can use pr[[https://github.com/apache/lucene/pull/688]] to get the difference between the index values returned after calling advance(minId) and advance(maxId); that difference is also the doc count of the time interval. was: In log scenarios, we usually want to know the number of documents in each time interval. One possible optimization is to sort the documents within a segment in ascending order of the @timestamp field. Then we can use pr[[https://github.com/apache/lucene/pull/687]] to find the min/max docId in one time interval. If there is no other filter query, the doc count of one time interval is (max docId - min docId + 1). If there is exactly one additional term filter query, we can use pr[[https://github.com/apache/lucene/pull/688]] to get the difference between the index values returned after calling advance(minId) and advance(maxId); that difference is also the doc count of the time interval. > Lucene supports bkd binary search and return current index of posting list > -- > > Key: LUCENE-10425 > URL: https://issues.apache.org/jira/browse/LUCENE-10425 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search > Reporter: jianping weng > Priority: Major
[jira] [Updated] (LUCENE-10425) Lucene supports bkd binary search and return current index of posting list
[ https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianping weng updated LUCENE-10425: --- Description: In log scenarios, we usually want to know the number of documents in each time interval. One possible optimization is to sort the documents within a segment in ascending order of the @timestamp field. Then we can use this pr [https://github.com/apache/lucene/pull/687] to find the min/max docId in one time interval. If there is no other filter query, the doc count of one time interval is (max docId - min docId + 1). If there is exactly one additional term filter query, we can use this pr [https://github.com/apache/lucene/pull/688 |https://github.com/apache/lucene/pull/688] to get the difference between the index values returned after calling advance(minId) and advance(maxId); that difference is also the doc count of the time interval. was: In log scenarios, we usually want to know the number of documents in each time interval. One possible optimization is to sort the documents within a segment in ascending order of the @timestamp field. Then we can use [pr|[https://github.com/apache/lucene/pull/687]] to find the min/max docId in one time interval. If there is no other filter query, the doc count of one time interval is (max docId - min docId + 1). If there is exactly one additional term filter query, we can use pr[[https://github.com/apache/lucene/pull/688]] to get the difference between the index values returned after calling advance(minId) and advance(maxId); that difference is also the doc count of the time interval. > Lucene supports bkd binary search and return current index of posting list > -- > > Key: LUCENE-10425 > URL: https://issues.apache.org/jira/browse/LUCENE-10425 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search > Reporter: jianping weng > Priority: Major
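The counting rule in the issue description (max docId - min docId + 1 on a segment sorted by timestamp) can be illustrated with a small, self-contained sketch. It makes several assumptions that are not stated in the issue: the segment is modelled as a plain `long[]` whose array position is the docId, every document has exactly one timestamp, and there are no deleted documents. A plain binary search over that array stands in for the BKD-based search of the referenced pull request; `SortedSegmentCountSketch` and its methods are hypothetical names.

```java
/** Hypothetical sketch: counting docs per time interval on a timestamp-sorted segment. */
final class SortedSegmentCountSketch {

  /** First position whose value is >= key, or values.length if there is none. */
  static int lowerBound(long[] values, long key) {
    int lo = 0, hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  /** First position whose value is > key, or values.length if there is none. */
  static int upperBound(long[] values, long key) {
    int lo = 0, hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] <= key) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  /** Number of docs with from <= timestamp <= to; timestampsByDocId must be sorted ascending. */
  static int countInInterval(long[] timestampsByDocId, long from, long to) {
    int minDocId = lowerBound(timestampsByDocId, from);
    int maxDocId = upperBound(timestampsByDocId, to) - 1;
    // An empty interval shows up as maxDocId < minDocId.
    return maxDocId < minDocId ? 0 : maxDocId - minDocId + 1;
  }

  public static void main(String[] args) {
    long[] timestamps = {100, 100, 105, 110, 110, 120, 130};
    System.out.println(countInInterval(timestamps, 105, 120));
  }
}
```

Because the count comes from two binary searches, its cost in this sketch grows with the logarithm of the segment size rather than with the number of matching documents.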
[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422
romseygeek commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r809038865

## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ##
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.*;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex implements QueryIndex {
+  private final SearcherManager manager;
+  private final QueryDecomposer decomposer;
+  private final MonitorQuerySerializer serializer;
+
+  final Map termFilters = new HashMap<>();
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();

Review comment:
So the way the Monitor works at the moment is that it parses all the serialized queries in its QueryIndex on startup, and stores them in an in-memory cache. This means that when we run a document through the Monitor, once it has identified which candidate queries to run against it we don't need to re-parse them, they are already instantiated in RAM. The alternative, which I think is what you've implemented here, is to re-parse the query every time we need to run it. This is perfectly reasonable (in fact it's what elasticsearch's percolator does) but it is a significant change in behaviour so I don't think we should fold it in as part of this ticket.

The Writeable query index needs to have all the tricksy behaviour around re-populating the cache, as we need to remove deleted entries or replace updated entries when queries are added or deleted (cache invalidation is hard, apparently!); the read-only index can just hold everything in a single Map that is populated on startup and never changes.
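The trade-off described in this review comment, parsing every stored query once at startup versus re-parsing a query each time it is needed, can be sketched without any Lucene types. Everything below is an assumption made for illustration: `QueryCacheSketch` is a hypothetical class, queries are identified by a string id, and the parser is an arbitrary `Function` standing in for whatever the monitor's query serializer does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of eager (parse at startup) versus lazy (parse per use) query lookup. */
final class QueryCacheSketch<Q> {
  private final Map<String, String> serialized;   // id -> serialized query
  private final Function<String, Q> parser;       // stand-in for the real deserializer
  private final Map<String, Q> parsedAtStartup;   // null in lazy mode
  private int parseCalls;                         // counts parser invocations

  QueryCacheSketch(Map<String, String> serialized, Function<String, Q> parser, boolean eager) {
    this.serialized = new HashMap<>(serialized);
    this.parser = parser;
    if (eager) {
      // Eager mode: parse everything once; the map never changes afterwards.
      parsedAtStartup = new HashMap<>();
      for (Map.Entry<String, String> entry : this.serialized.entrySet()) {
        parsedAtStartup.put(entry.getKey(), parse(entry.getValue()));
      }
    } else {
      parsedAtStartup = null;
    }
  }

  private Q parse(String source) {
    parseCalls++;
    return parser.apply(source);
  }

  /** Returns the parsed query for id, or null if the id is unknown. */
  Q get(String id) {
    if (parsedAtStartup != null) {
      return parsedAtStartup.get(id); // already instantiated in memory
    }
    String source = serialized.get(id);
    return source == null ? null : parse(source); // lazy mode re-parses on every call
  }

  int parseCalls() {
    return parseCalls;
  }
}
```

In eager mode the parse count is fixed by the number of stored queries, while in lazy mode it grows with the number of lookups; that is the behavioural difference the comment asks to keep out of this ticket.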
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r809070886 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ## Review comment: OK, it's clearer now. I can do it that way, but the read-only monitor would need a way to repopulate the cache too: assuming other writers insert and delete on the same index, the read-only monitor would never see the delta until it is re-instantiated to populate its Map. That way it would not be very useful.
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r809076507 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ## Review comment: What do you think about keeping all the in-memory cache logic, along with the purgeExecutor, in the abstract QueryIndex class and letting the read-only monitor use it too, closing this round of improvements? I would then open another changeset to implement the lazy parsing.
[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422
romseygeek commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r809078793 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ## Review comment: At the moment you wouldn't pick up any changes anyway, because you're not calling `maybeRefresh()` on the SearcherManager so you would always get the same view of the index. If you want a dynamic view then you'll need a background refresh thread.
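A background refresh thread of the kind mentioned in this comment could look roughly like the sketch below. To stay self-contained it does not depend on Lucene: the nested `Refreshable` interface is a hypothetical stand-in for an object with a `maybeRefresh()` method such as the SearcherManager named above, and `BackgroundRefreshSketch`, its thread name and the decision to swallow `IOException` are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a periodic refresher; not part of the pull request. */
final class BackgroundRefreshSketch implements AutoCloseable {

  /** Stand-in for the object being refreshed, e.g. a wrapper around a searcher manager. */
  interface Refreshable {
    boolean maybeRefresh() throws IOException;
  }

  private final Refreshable target;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(
          runnable -> {
            Thread thread = new Thread(runnable, "refresh-sketch");
            thread.setDaemon(true); // do not keep the JVM alive just for refreshing
            return thread;
          });

  BackgroundRefreshSketch(Refreshable target) {
    this.target = target;
  }

  /** Schedules a refresh attempt every periodMillis milliseconds. */
  void start(long periodMillis) {
    scheduler.scheduleWithFixedDelay(
        this::refreshOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
  }

  /** One refresh attempt; returns false if the refresh failed or reported false. */
  boolean refreshOnce() {
    try {
      return target.maybeRefresh();
    } catch (IOException e) {
      // Assumption of this sketch: a failed refresh is skipped and retried on the next tick.
      return false;
    }
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

A caller would wrap the real refresh call in a lambda, call `start(...)` once, and call `close()` on shutdown. A read-only index that caches parsed queries would also need to update that cache after a refresh, which this sketch does not attempt.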
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r809081950 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ## Review comment: Ah, good to know! What do you think is best to do? Does what I proposed make sense?
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r809081950 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java ## Review comment: Ah, good to know! What do you think is best to do? Does what I proposed make sense?
[GitHub] [lucene] msokolov commented on pull request #688: LUCENE-10425:PostingsEnum supports to return current index of postings
msokolov commented on pull request #688: URL: https://github.com/apache/lucene/pull/688#issuecomment-1043123035 I'm concerned exposing this could limit future implementations by requiring them to support it. Do you have a proposed use case that would justify needing to add this? Maybe post to java-u...@lucene.apache.org explaining the use case?
[GitHub] [lucene] gsmiller commented on pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
gsmiller commented on pull request #678: URL: https://github.com/apache/lucene/pull/678#issuecomment-1043211273 Going ahead and merging this now since there's been no opposition (of course it could always be backed out later if someone did take strong issue with this). Thanks again @spike-liu !
[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494092#comment-17494092 ] ASF subversion and git services commented on LUCENE-10398: -- Commit fc3c790ab421122e7aa2f20453cb468def712123 in lucene's branch refs/heads/main from spike.liu [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fc3c790 ] LUCENE-10398: Add static method for getting Terms from LeafReader (#678) Co-authored-by: cheng...@ctrip.com > Add static method for getting Terms from LeafReader > --- > > Key: LUCENE-10398 > URL: https://issues.apache.org/jira/browse/LUCENE-10398 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Marc D'Mello >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} > that return {{null}} values if the field is not indexed. These methods also > have equivalent {{DocValues}} static methods, such as > {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a > {{null}} if there is no field. I noticed that {{Terms}} does not have an > equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or > something similar. I was wondering if there was a reason for this, or if a > method like this could be useful. Thanks! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
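The pattern requested in this issue, a static accessor that returns an empty object instead of `null` when a field is absent, can be sketched independently of Lucene. The thread does not show the signature that was finally committed, so the names below (`EmptyTermsSketch`, `TermsLike`, `ReaderLike`, `getTerms`) are hypothetical stand-ins rather than the real API.

```java
/** Hypothetical sketch of a null-safe static accessor; names are not Lucene's. */
final class EmptyTermsSketch {

  /** Minimal stand-in for a per-field terms view. */
  interface TermsLike {
    long size();
  }

  /** Minimal stand-in for a reader whose lookup returns null for a missing field. */
  interface ReaderLike {
    TermsLike terms(String field);
  }

  /** Shared empty instance returned in place of null. */
  static final TermsLike EMPTY = () -> 0L;

  private EmptyTermsSketch() {}

  /** Returns the reader's terms for field, or EMPTY if the reader has none. */
  static TermsLike getTerms(ReaderLike reader, String field) {
    TermsLike terms = reader.terms(field);
    return terms == null ? EMPTY : terms;
  }
}
```

With such a helper, callers can use the result directly without a null check, which mirrors how the `DocValues` static methods mentioned in the description behave.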
[GitHub] [lucene] gsmiller merged pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
gsmiller merged pull request #678: URL: https://github.com/apache/lucene/pull/678
[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494102#comment-17494102 ] ASF subversion and git services commented on LUCENE-10398: -- Commit 00029f1ec4a952b4345d966c00dc5abe7b9b8af1 in lucene's branch refs/heads/main from Greg Miller [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=00029f1 ] Add CHANGES entry for LUCENE-10398 > Add static method for getting Terms from LeafReader > --- > > Key: LUCENE-10398 > URL: https://issues.apache.org/jira/browse/LUCENE-10398 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Marc D'Mello >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} > that return {{null}} values if the field is not indexed. These methods also > have equivalent {{DocValues}} static methods, such as > {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a > {{null}} if there is no field. I noticed that {{Terms}} does not have an > equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or > something similar. I was wondering if there was a reason for this, or if a > method like this could be useful. Thanks! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller opened a new pull request #689: LUCENE-10398: Add static method for getting Terms from LeafReader
gsmiller opened a new pull request #689: URL: https://github.com/apache/lucene/pull/689 Backporting
[GitHub] [lucene] gsmiller merged pull request #689: LUCENE-10398: Add static method for getting Terms from LeafReader
gsmiller merged pull request #689: URL: https://github.com/apache/lucene/pull/689
[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494136#comment-17494136 ] ASF subversion and git services commented on LUCENE-10398: -- Commit db2cd347a7aa543d123d157a9ca8e2c63844a82d in lucene's branch refs/heads/branch_9x from Greg Miller [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=db2cd34 ] LUCENE-10398: Add static method for getting Terms from LeafReader (#689) Co-authored-by: spike.liu > Add static method for getting Terms from LeafReader > --- > > Key: LUCENE-10398 > URL: https://issues.apache.org/jira/browse/LUCENE-10398 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Marc D'Mello >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} > that return {{null}} values if the field is not indexed. These methods also > have equivalent {{DocValues}} static methods, such as > {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a > {{null}} if there is no field. I noticed that {{Terms}} does not have an > equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or > something similar. I was wondering if there was a reason for this, or if a > method like this could be useful. Thanks! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10398) Add static method for getting Terms from LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller resolved LUCENE-10398. -- Fix Version/s: 9.1 Resolution: Fixed > Add static method for getting Terms from LeafReader > --- > > Key: LUCENE-10398 > URL: https://issues.apache.org/jira/browse/LUCENE-10398 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Marc D'Mello >Priority: Minor > Fix For: 9.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} > that return {{null}} values if the field is not indexed. These methods also > have equivalent {{DocValues}} static methods, such as > {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a > {{null}} if there is no field. I noticed that {{Terms}} does not have an > equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or > something similar. I was wondering if there was a reason for this, or if a > method like this could be useful. Thanks!
[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494137#comment-17494137 ] Greg Miller commented on LUCENE-10398: -- Merged onto {{main}} and {{branch_9x}}. Resolving. Thanks again [~spike.liu]! > Add static method for getting Terms from LeafReader > --- > > Key: LUCENE-10398 > URL: https://issues.apache.org/jira/browse/LUCENE-10398 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Marc D'Mello >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} > that return {{null}} values if the field is not indexed. These methods also > have equivalent {{DocValues}} static methods, such as > {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a > {{null}} if there is no field. I noticed that {{Terms}} does not have an > equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or > something similar. I was wondering if there was a reason for this, or if a > method like this could be useful. Thanks!
[jira] [Created] (LUCENE-10426) Should we create a static factory method for loading VectorValues?
Greg Miller created LUCENE-10426: Summary: Should we create a static factory method for loading VectorValues? Key: LUCENE-10426 URL: https://issues.apache.org/jira/browse/LUCENE-10426 Project: Lucene - Core Issue Type: Wish Components: core/index Reporter: Greg Miller Similar to the recent work in LUCENE-10398, it might be useful to add a static factory method for loading {{VectorValues}} that returns an "empty" {{VectorValues}} instance if the field doesn't exist in a segment (and also throws if the field is not configured as a vector field). This follows the same pattern of the static factory methods found in {{DocValues}}. I'm less convinced this is useful to add right now since I don't really see any existing usages of {{LeafReader#getVectorValues}}, so maybe the value isn't there right now?
[GitHub] [lucene] jtibshirani merged pull request #677: LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field
jtibshirani merged pull request #677: URL: https://github.com/apache/lucene/pull/677
[jira] [Commented] (LUCENE-10084) Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == maxDoc
[ https://issues.apache.org/jira/browse/LUCENE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494169#comment-17494169 ] ASF subversion and git services commented on LUCENE-10084: -- Commit c132bbf677b5eb4d3ff0acf838b4d8f2c4e0327e in lucene's branch refs/heads/main from Vigya Sharma [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c132bbf ] LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (#677) Since all documents are required to use the same features (LUCENE-9334) we can rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or points have a docCount that is equal to maxDoc. > Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == > maxDoc > > > Key: LUCENE-10084 > URL: https://issues.apache.org/jira/browse/LUCENE-10084 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Now that we require all documents to use the same features (LUCENE-9334) we > could rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms > or points have a docCount that is equal to maxDoc.
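The rewrite condition in the commit message can be pictured as a check over per-segment statistics. The sketch below is only a rough model of that decision: {{FieldStats}}, {{FieldExistsRewriteSketch}} and {{Plan}} are hypothetical names, and it assumes the terms or points statistics for the field are available for every segment.

```java
import java.util.List;

// Hypothetical per-segment statistics for one field (not a Lucene type).
final class FieldStats {
  final int docCount; // documents that have at least one value for the field
  final int maxDoc;   // total number of documents in the segment

  FieldStats(int docCount, int maxDoc) {
    this.docCount = docCount;
    this.maxDoc = maxDoc;
  }
}

final class FieldExistsRewriteSketch {
  enum Plan { MATCH_ALL, FIELD_EXISTS }

  // Rewrite to match-all only if every segment reports docCount == maxDoc.
  static Plan rewrite(List<FieldStats> segments) {
    for (FieldStats stats : segments) {
      if (stats.docCount != stats.maxDoc) {
        // At least one document lacks the field: keep the original query.
        return Plan.FIELD_EXISTS;
      }
    }
    return Plan.MATCH_ALL;
  }
}
```

The check relies on the consistency requirement from LUCENE-9334: if terms or points cover every document, the doc values for the same field must cover every document too.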
[jira] [Commented] (LUCENE-9334) Require consistency between data-structures on a per-field basis
[ https://issues.apache.org/jira/browse/LUCENE-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494170#comment-17494170 ] ASF subversion and git services commented on LUCENE-9334: - Commit c132bbf677b5eb4d3ff0acf838b4d8f2c4e0327e in lucene's branch refs/heads/main from Vigya Sharma [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c132bbf ] LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (#677) Since all documents are required to use the same features (LUCENE-9334) we can rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or points have a docCount that is equal to maxDoc. > Require consistency between data-structures on a per-field basis > > > Key: LUCENE-9334 > URL: https://issues.apache.org/jira/browse/LUCENE-9334 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Blocker > Fix For: 9.0 > > Time Spent: 14.5h > Remaining Estimate: 0h > > Follow-up of > https://lists.apache.org/thread.html/r747de568afd7502008c45783b74cc3aeb31dab8aa60fcafaf65d5431%40%3Cdev.lucene.apache.org%3E. > We would like to start requiring consistency across data-structures on a > per-field basis in order to make it easier to do the right thing by default: > range queries can run faster if doc values are enabled, sorted queries can > run faster if points are indexed, etc. > This would be a big change, so it should be rolled out in a major. > Strict validation is tricky to implement, but we should still implement > best-effort validation: > - Documents all use the same data-structures, e.g. it is illegal for a > document to only enable points and another document to only enable doc values, > - When possible, check whether values are consistent too.
[jira] [Commented] (LUCENE-9334) Require consistency between data-structures on a per-field basis
[ https://issues.apache.org/jira/browse/LUCENE-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494176#comment-17494176 ] ASF subversion and git services commented on LUCENE-9334: - Commit a9532f32866cad79c65fb4b0f220140b69757c42 in lucene's branch refs/heads/branch_9x from Vigya Sharma [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a9532f3 ] LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (#677) Since all documents are required to use the same features (LUCENE-9334) we can rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or points have a docCount that is equal to maxDoc. > Require consistency between data-structures on a per-field basis > > > Key: LUCENE-9334 > URL: https://issues.apache.org/jira/browse/LUCENE-9334 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Blocker > Fix For: 9.0 > > Time Spent: 14.5h > Remaining Estimate: 0h > > Follow-up of > https://lists.apache.org/thread.html/r747de568afd7502008c45783b74cc3aeb31dab8aa60fcafaf65d5431%40%3Cdev.lucene.apache.org%3E. > We would like to start requiring consistency across data-structures on a > per-field basis in order to make it easier to do the right thing by default: > range queries can run faster if doc values are enabled, sorted queries can > run faster if points are indexed, etc. > This would be a big change, so it should be rolled out in a major. > Strict validation is tricky to implement, but we should still implement > best-effort validation: > - Documents all use the same data-structures, e.g. it is illegal for a > document to only enable points and another document to only enable doc values, > - When possible, check whether values are consistent too.
[jira] [Commented] (LUCENE-10084) Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == maxDoc
[ https://issues.apache.org/jira/browse/LUCENE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494175#comment-17494175 ] ASF subversion and git services commented on LUCENE-10084: -- Commit a9532f32866cad79c65fb4b0f220140b69757c42 in lucene's branch refs/heads/branch_9x from Vigya Sharma [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a9532f3 ] LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (#677) Since all documents are required to use the same features (LUCENE-9334) we can rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or points have a docCount that is equal to maxDoc. > Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == > maxDoc > > > Key: LUCENE-10084 > URL: https://issues.apache.org/jira/browse/LUCENE-10084 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > Now that we require all documents to use the same features (LUCENE-9334) we > could rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms > or points have a docCount that is equal to maxDoc.
[jira] [Resolved] (LUCENE-10084) Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == maxDoc
[ https://issues.apache.org/jira/browse/LUCENE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani resolved LUCENE-10084. --- Fix Version/s: 9.1 Resolution: Fixed > Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == > maxDoc > > > Key: LUCENE-10084 > URL: https://issues.apache.org/jira/browse/LUCENE-10084 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 40m > Remaining Estimate: 0h > > Now that we require all documents to use the same features (LUCENE-9334) we > could rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms > or points have a docCount that is equal to maxDoc.
[GitHub] [lucene] jtibshirani merged pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani merged pull request #656: URL: https://github.com/apache/lucene/pull/656
[jira] [Commented] (LUCENE-10382) Allow KnnVectorQuery to operate over a subset of liveDocs
[ https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494181#comment-17494181 ] ASF subversion and git services commented on LUCENE-10382: -- Commit 8ca372573dba0f4755b982b0c36a2b87aaf4705b in lucene's branch refs/heads/main from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8ca3725 ] LUCENE-10382: Support filtering in KnnVectorQuery (#656) This PR adds support for a query filter in KnnVectorQuery. First, we gather the query results for each leaf as a bit set. Then the HNSW search skips over the non-matching documents (using the same approach as for live docs). To prevent HNSW search from visiting too many documents when the filter is very selective, we short-circuit if HNSW has already visited more than the number of documents that match the filter, and execute an exact search instead. This bounds the number of visited documents at roughly 2x the cost of just running the exact filter, while in most cases HNSW completes successfully and does a lot better. Co-authored-by: Joel Bernstein > Allow KnnVectorQuery to operate over a subset of liveDocs > - > > Key: LUCENE-10382 > URL: https://issues.apache.org/jira/browse/LUCENE-10382 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 9.0 >Reporter: Joel Bernstein >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > Currently the KnnVectorQuery selects the top K vectors from all live docs. > This ticket will change the interface to make it possible for the top K > vectors to be selected from a subset of the live docs.
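The fallback described in the commit message (abandon the graph search once it has visited more documents than the filter matches, then run an exact search) can be sketched in plain Java as below. This is a simplified model, not the Lucene implementation: a single-layer best-first traversal stands in for HNSW, and {{FilteredKnnSketch}}, {{graphSearch}}, {{exactSearch}}, the {{neighbors}} adjacency array and the entry point at document 0 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class FilteredKnnSketch {

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Exact search: score every document accepted by the filter and keep the k nearest.
  static List<Integer> exactSearch(float[][] vectors, float[] query, int k, BitSet accepted) {
    List<Integer> docs = new ArrayList<>();
    for (int doc = accepted.nextSetBit(0); doc >= 0; doc = accepted.nextSetBit(doc + 1)) {
      docs.add(doc);
    }
    docs.sort(Comparator.comparingDouble((Integer doc) -> squaredDistance(vectors[doc], query)));
    return new ArrayList<>(docs.subList(0, Math.min(k, docs.size())));
  }

  // Simplified graph search starting at document 0. Documents rejected by the filter are
  // traversed but never collected. Returns null once more than visitLimit nodes were visited.
  static List<Integer> graphSearch(float[][] vectors, int[][] neighbors, float[] query,
                                   int k, BitSet accepted, int visitLimit) {
    Comparator<Integer> byDistance =
        Comparator.comparingDouble((Integer doc) -> squaredDistance(vectors[doc], query));
    PriorityQueue<Integer> candidates = new PriorityQueue<>(byDistance);          // nearest first
    PriorityQueue<Integer> results = new PriorityQueue<>(byDistance.reversed());  // farthest first
    BitSet visited = new BitSet(vectors.length);
    candidates.add(0);
    visited.set(0);
    int visitedCount = 1;
    while (!candidates.isEmpty()) {
      int current = candidates.poll();
      if (results.size() == k
          && squaredDistance(vectors[current], query)
              > squaredDistance(vectors[results.peek()], query)) {
        break; // heuristic stop: the nearest candidate is worse than the current k-th result
      }
      if (accepted.get(current)) {
        results.add(current);
        if (results.size() > k) {
          results.poll(); // drop the farthest result
        }
      }
      for (int next : neighbors[current]) {
        if (!visited.get(next)) {
          if (++visitedCount > visitLimit) {
            return null; // budget exceeded: signal the caller to fall back
          }
          visited.set(next);
          candidates.add(next);
        }
      }
    }
    List<Integer> ordered = new ArrayList<>(results);
    ordered.sort(byDistance);
    return ordered;
  }

  // Graph search with a visit budget equal to the number of documents matching the filter;
  // falls back to exact search when the budget is exceeded.
  static List<Integer> search(float[][] vectors, int[][] neighbors, float[] query,
                              int k, BitSet accepted) {
    List<Integer> approximate =
        graphSearch(vectors, neighbors, query, k, accepted, accepted.cardinality());
    return approximate != null ? approximate : exactSearch(vectors, query, k, accepted);
  }
}
```

Because the graph search gives up after visiting at most as many nodes as the filter matches, and the exact search then scores exactly that many documents, the total work stays within roughly twice the cost of the exact search alone, which is the bound the commit message mentions.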
[GitHub] [lucene] jtibshirani opened a new pull request #690: LUCENE-10408: Fix vector values iteration bug
jtibshirani opened a new pull request #690: URL: https://github.com/apache/lucene/pull/690 Now that there is special logic to handle the dense case, we need to adjust some assertions in VectorValues#advance.
[GitHub] [lucene] jtibshirani commented on pull request #690: LUCENE-10408: Fix vector values iteration bug
jtibshirani commented on pull request #690: URL: https://github.com/apache/lucene/pull/690#issuecomment-1043407860 This fixes recent test failures in `TestKnnVectorQuery`. One example: ``` ./gradlew test --tests TestKnnVectorQuery.testRandomWithFilter -Dtests.seed=1748D2B9616D6A6B ```
[jira] [Commented] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat
[ https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494235#comment-17494235 ] Julie Tibshirani commented on LUCENE-10408: --- A test we recently added caught a small issue with the iteration logic. I opened https://github.com/apache/lucene/pull/690 to address it. > Better dense encoding of doc Ids in Lucene91HnswVectorsFormat > - > > Key: LUCENE-10408 > URL: https://issues.apache.org/jira/browse/LUCENE-10408 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Fix For: 9.1 > > Time Spent: 5h > Remaining Estimate: 0h > > Currently we write doc Ids of all documents that have vectors as is. We > should improve their encoding either using delta encoding or bitset.
[jira] [Commented] (LUCENE-10382) Allow KnnVectorQuery to operate over a subset of liveDocs
[ https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494283#comment-17494283 ] ASF subversion and git services commented on LUCENE-10382: -- Commit af40b448227e07e93d12c62f9dcf083b92f6eb51 in lucene's branch refs/heads/branch_9x from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=af40b44 ] LUCENE-10382: Support filtering in KnnVectorQuery (#656) This PR adds support for a query filter in KnnVectorQuery. First, we gather the query results for each leaf as a bit set. Then the HNSW search skips over the non-matching documents (using the same approach as for live docs). To prevent HNSW search from visiting too many documents when the filter is very selective, we short-circuit if HNSW has already visited more than the number of documents that match the filter, and execute an exact search instead. This bounds the number of visited documents at roughly 2x the cost of just running the exact filter, while in most cases HNSW completes successfully and does a lot better. Co-authored-by: Joel Bernstein > Allow KnnVectorQuery to operate over a subset of liveDocs > - > > Key: LUCENE-10382 > URL: https://issues.apache.org/jira/browse/LUCENE-10382 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 9.0 >Reporter: Joel Bernstein >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > Currently the KnnVectorQuery selects the top K vectors from all live docs. > This ticket will change the interface to make it possible for the top K > vectors to be selected from a subset of the live docs.
[GitHub] [lucene] spike-liu commented on pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
spike-liu commented on pull request #678: URL: https://github.com/apache/lucene/pull/678#issuecomment-1043674843 > Going ahead and merging this now since there's been no opposition (of course it could always be backed out later if someone did take strong issue with this). Thanks again @spike-liu ! It is my pleasure, Greg.
[GitHub] [lucene] spike-liu edited a comment on pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
spike-liu edited a comment on pull request #678: URL: https://github.com/apache/lucene/pull/678#issuecomment-1043674843 > Going ahead and merging this now since there's been no opposition (of course it could always be backed out later if someone did take strong issue with this). Thanks again @spike-liu ! It is my pleasure, Greg. I really appreciate your note about force-pushing changes, which will save me a lot of time.
[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494306#comment-17494306 ] spike liu commented on LUCENE-10398: It is my pleasure, Greg. > Add static method for getting Terms from LeafReader > --- > > Key: LUCENE-10398 > URL: https://issues.apache.org/jira/browse/LUCENE-10398 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Marc D'Mello >Priority: Minor > Fix For: 9.1 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} > that return {{null}} values if the field is not indexed. These methods also > have equivalent {{DocValues}} static methods, such as > {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a > {{null}} if there is no field. I noticed that {{Terms}} does not have an > equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or > something similar. I was wondering if there was a reason for this, or if a > method like this could be useful. Thanks!
[jira] [Comment Edited] (LUCENE-10398) Add static method for getting Terms from LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494306#comment-17494306 ] spike liu edited comment on LUCENE-10398 at 2/18/22, 12:41 AM: --- It is my pleasure, [~gsmiller] was (Author: spike.liu): It is my pleasure, Greg. > Add static method for getting Terms from LeafReader > --- > > Key: LUCENE-10398 > URL: https://issues.apache.org/jira/browse/LUCENE-10398 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Marc D'Mello >Priority: Minor > Fix For: 9.1 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} > that return {{null}} values if the field is not indexed. These methods also > have equivalent {{DocValues}} static methods, such as > {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a > {{null}} if there is no field. I noticed that {{Terms}} does not have an > equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or > something similar. I was wondering if there was a reason for this, or if a > method like this could be useful. Thanks!
[GitHub] [lucene] spike-liu commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
spike-liu commented on a change in pull request #678: URL: https://github.com/apache/lucene/pull/678#discussion_r809637709 ## File path: lucene/core/src/java/org/apache/lucene/document/FeatureQuery.java ## @@ -111,12 +111,9 @@ public Explanation explain(LeafReaderContext context, int doc) throws IOExceptio @Override public Scorer scorer(LeafReaderContext context) throws IOException { -Terms terms = context.reader().terms(fieldName); -if (terms == null) { - return null; -} +Terms terms = Terms.terms(context.reader(), fieldName); TermsEnum termsEnum = terms.iterator(); -if (termsEnum.seekExact(new BytesRef(featureName)) == false) { +if (!termsEnum.seekExact(new BytesRef(featureName))) { Review comment: Gotcha.
[jira] [Updated] (LUCENE-10424) Optimize the "everything matches" case for count query in PointRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xugang updated LUCENE-10424: --- Summary: Optimize the "everything matches" case for count query in PointRangeQuery (was: Optimize the "everything matches" case for count queries in PointRangeQuery) > Optimize the "everything matches" case for count query in PointRangeQuery > - > > Key: LUCENE-10424 > URL: https://issues.apache.org/jira/browse/LUCENE-10424 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 9.1 >Reporter: Lu Xugang >Priority: Minor > > In the implementation of Weight#count in PointRangeQuery, should we also > handle the case where PointValues#getDocCount() == IndexReader#maxDoc(), > the range's lower bound is less than the field's min value, and the range's > upper bound is greater than the field's max value, and then return > reader.maxDoc() directly?
[GitHub] [lucene] LuXugang opened a new pull request #691: LUCENE-10424: Optimize the "everything matches" case for count query in PointRangeQuery
LuXugang opened a new pull request #691: URL: https://github.com/apache/lucene/pull/691 In the implementation of Weight#count in PointRangeQuery, should we also handle the case where PointValues#getDocCount() == IndexReader#maxDoc(), the range's lower bound is less than the field's min value, and the range's upper bound is greater than the field's max value, and then return reader.maxDoc() directly?
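The proposed shortcut can be pictured for a one-dimensional int field as follows. This is a sketch under stated assumptions rather than the code in the pull request: {{PointRangeCountSketch}} and {{countShortcut}} are hypothetical names, the range bounds are treated as inclusive (so the comparisons use <= and >= where the description says "less than" and "greater than"), the segment is assumed to have no deleted documents, and -1 is used here as a sentinel meaning "fall back to the regular counting path".

```java
final class PointRangeCountSketch {

  // Returns the number of matching documents when it can be derived from field
  // statistics alone, or -1 when the regular counting path is still required.
  static int countShortcut(int lower, int upper, int fieldMin, int fieldMax,
                           int docCount, int maxDoc) {
    // Every document in the segment has at least one value for the field.
    boolean everyDocHasValue = docCount == maxDoc;
    // The query range (inclusive bounds assumed) contains every indexed value.
    boolean rangeCoversAllValues = lower <= fieldMin && upper >= fieldMax;
    if (everyDocHasValue && rangeCoversAllValues) {
      return maxDoc; // assumes no deletions, so maxDoc equals the live document count
    }
    return -1;
  }
}
```

For example, with values spanning 5 to 50 in a 20-document segment where every document has a value, a query for the range 0 to 100 would return 20 without visiting the index, while a query for 10 to 100 would return -1 and use the normal path.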
[GitHub] [lucene] wjp719 commented on pull request #688: LUCENE-10425:PostingsEnum supports to return current index of postings
wjp719 commented on pull request #688: URL: https://github.com/apache/lucene/pull/688#issuecomment-1043888635 @msokolov Hi, I described a use case in this issue: https://issues.apache.org/jira/browse/LUCENE-10425. Thanks a lot.
[jira] [Updated] (LUCENE-10425) log count aggregation optimization inside one segment
[ https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianping weng updated LUCENE-10425: --- Summary: log count aggregation optimization inside one segment (was: Lucene supports bkd binary search and return current index of posting list) > log count aggregation optimization inside one segment > - > > Key: LUCENE-10425 > URL: https://issues.apache.org/jira/browse/LUCENE-10425 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Reporter: jianping weng >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > In log scenarios, we usually want to know the number of documents in > each time interval. One possible optimization is to sort the documents in > ascending order of the @timestamp field within one segment. Then we can use > this PR [https://github.com/apache/lucene/pull/687] to find the min/max > docId in one time interval. > If there is no other filter query, the doc count of one time interval is (max > docId - min docId + 1). > If there is only one additional term filter query, we can use this PR > [https://github.com/apache/lucene/pull/688] to get the difference between the index values > returned when we call advance(minId) and advance(maxId); that difference is also the > doc count of one time interval.
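The counting idea in this description can be sketched with plain arrays. The code below is an illustration only and does not use the APIs from the linked pull requests: {{SortedSegmentCountSketch}} and its methods are hypothetical, a {{long[]}} indexed by docId stands in for the sorted @timestamp field, an {{int[]}} of ascending docIds stands in for a posting list, and the second binary search looks for the first posting past maxDocId so that the index difference is exact.

```java
import java.util.function.IntPredicate;

final class SortedSegmentCountSketch {

  // Smallest index in [0, n) for which the predicate holds, or n if there is none.
  // The predicate must be monotone: false for a prefix of indexes, then true.
  static int firstIndex(int n, IntPredicate pastBoundary) {
    int lo = 0;
    int hi = n;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (pastBoundary.test(mid)) {
        hi = mid;
      } else {
        lo = mid + 1;
      }
    }
    return lo;
  }

  // First docId whose timestamp is >= from (timestamps ascend with docId).
  static int minDocId(long[] timestamps, long from) {
    return firstIndex(timestamps.length, i -> timestamps[i] >= from);
  }

  // Last docId whose timestamp is <= to, or -1 if there is none.
  static int maxDocId(long[] timestamps, long to) {
    return firstIndex(timestamps.length, i -> timestamps[i] > to) - 1;
  }

  // No other filter: the count is maxDocId - minDocId + 1.
  static int count(long[] timestamps, long from, long to) {
    return Math.max(0, maxDocId(timestamps, to) - minDocId(timestamps, from) + 1);
  }

  // One term filter: the count is the difference between two positions in the posting list.
  static int countWithTermFilter(int[] postings, int minDocId, int maxDocId) {
    int start = firstIndex(postings.length, i -> postings[i] >= minDocId);
    int end = firstIndex(postings.length, i -> postings[i] > maxDocId);
    return Math.max(0, end - start);
  }
}
```

With timestamps {10, 20, 20, 30, 40, 50} and the interval 20 to 40, the docId range is 1 to 4, so the unfiltered count is 4; a term with postings {0, 2, 3, 5} matches 2 documents in that range.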
[jira] [Updated] (LUCENE-10425) count aggregation optimization inside one segment in log scenario
[ https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianping weng updated LUCENE-10425: --- Summary: count aggregation optimization inside one segment in log scenario (was: count aggregation optimization inside one segment in log sc) > count aggregation optimization inside one segment in log scenario > - > > Key: LUCENE-10425 > URL: https://issues.apache.org/jira/browse/LUCENE-10425 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Reporter: jianping weng >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > In log scenarios, we usually want to know the number of documents in > each time interval. One possible optimization is to sort the documents in > ascending order of the @timestamp field within one segment. Then we can use > this PR [https://github.com/apache/lucene/pull/687] to find the min/max > docId in one time interval. > If there is no other filter query, the doc count of one time interval is (max > docId - min docId + 1). > If there is only one additional term filter query, we can use this PR > [https://github.com/apache/lucene/pull/688] to get the difference between the index values > returned when we call advance(minId) and advance(maxId); that difference is also the > doc count of one time interval.
[jira] [Updated] (LUCENE-10425) count aggregation optimization inside one segment in log sc
[ https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianping weng updated LUCENE-10425: --- Summary: count aggregation optimization inside one segment in log sc (was: log count aggregation optimization inside one segment) > count aggregation optimization inside one segment in log sc > --- > > Key: LUCENE-10425 > URL: https://issues.apache.org/jira/browse/LUCENE-10425 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Reporter: jianping weng >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > In log scenarios, we usually want to know the number of documents in > each time interval. One possible optimization is to sort the documents in > ascending order of the @timestamp field within one segment. Then we can use > this PR [https://github.com/apache/lucene/pull/687] to find the min/max > docId in one time interval. > If there is no other filter query, the doc count of one time interval is (max > docId - min docId + 1). > If there is only one additional term filter query, we can use this PR > [https://github.com/apache/lucene/pull/688] to get the difference between the index values > returned when we call advance(minId) and advance(maxId); that difference is also the > doc count of one time interval.