[jira] [Commented] (LUCENE-10322) Enable -Xlint:path and -Xlint:-exports
[ https://issues.apache.org/jira/browse/LUCENE-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492453#comment-17492453 ] spike liu commented on LUCENE-10322: I would like to take care of this. > Enable -Xlint:path and -Xlint:-exports > -- > > Key: LUCENE-10322 > URL: https://issues.apache.org/jira/browse/LUCENE-10322 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > > Enable these and fix the resulting errors. This applies both to javac.gradle > and ecj. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] spike-liu opened a new pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
spike-liu opened a new pull request #681: URL: https://github.com/apache/lucene/pull/681 https://issues.apache.org/jira/browse/LUCENE-10322 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
rmuir commented on pull request #681: URL: https://github.com/apache/lucene/pull/681#issuecomment-1040079874 why do we have to make all these internal classes public to do this? I don't think this is a good tradeoff. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10322) Enable -Xlint:path and -Xlint:-exports
[ https://issues.apache.org/jira/browse/LUCENE-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492499#comment-17492499 ] Dawid Weiss commented on LUCENE-10322: -- Please do, if you like. I don't know how complex it'll get - I'd split the patch into each of these options. > Enable -Xlint:path and -Xlint:-exports > -- > > Key: LUCENE-10322 > URL: https://issues.apache.org/jira/browse/LUCENE-10322 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Enable these and fix the resulting errors. This applies both to javac.gradle > and ecj. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
dweiss commented on pull request #681: URL: https://github.com/apache/lucene/pull/681#issuecomment-1040099139 I would leave -Xlint:exports entirely out of the patch - this is not easily fixable; the module API is broken in that it exposes types that are not public but making them public is not really a solution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a change in pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
dweiss commented on a change in pull request #681: URL: https://github.com/apache/lucene/pull/681#discussion_r806675876 ## File path: gradle/java/javac.gradle ## @@ -52,12 +51,12 @@ allprojects { "-Xlint:overrides", // TODO: some tests seem to have bad classpaths? // this check seems to be a good sanity check for gradle? -"-Xlint:-path", +"-Xlint:path", "-Xlint:processing", "-Xlint:rawtypes", "-Xlint:removal", "-Xlint:requires-automatic", -"-Xlint:requires-transitive-automatic", +"-Xlint:-requires-transitive-automatic", Review comment: Just disabling this check globally is not a good way to solve it - suppress on each module that does it. Also - this should be a more fine-grained discussion on which pieces of the API should be exposed and which are simply mistakes (and should be maybe hidden entirely). ## File path: gradle/java/javac.gradle ## @@ -52,12 +51,12 @@ allprojects { "-Xlint:overrides", // TODO: some tests seem to have bad classpaths? // this check seems to be a good sanity check for gradle? -"-Xlint:-path", +"-Xlint:path", Review comment: I would take care of this one separately to see if it can be a self-contained, small patch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422
romseygeek commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r806633322 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/MonitorConfiguration.java ## @@ -35,8 +34,9 @@ private long purgeFrequency = 5; private TimeUnit purgeFrequencyUnits = TimeUnit.MINUTES; private QueryDecomposer queryDecomposer = new QueryDecomposer(); - private Path indexPath = null; private MonitorQuerySerializer serializer; + private Boolean readOnly = false; + private DirectoryProviderFunctionalInterface directoryProvider; Review comment: We can set this to a default value of `() -> new ByteBuffersDirectory()` which would then simplify the `buildIndexWriter()` implementation. ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/DirectoryProviderFunctionalInterface.java ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.monitor; + +import java.io.IOException; +import org.apache.lucene.store.Directory; + +/** A functional interface to provide a Directory instance */ +public interface DirectoryProviderFunctionalInterface { Review comment: Can we use `org.apache.lucene.util.IOSupplier` here instead please? ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: Given that QueryIndex is package-private I think we have some leeway to split things up here a bit and make things clearer. Could you try making QueryIndex itself an interface, and then having read-only and writeable implementations? We could also move the purge executor from the Monitor into the writeable QueryIndex implementation as it's not necessary to clear the cache at all on a read-only query index. ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -36,15 +36,7 @@ import org.apache.lucene.document.Field; import org.apache.lucene.document.SortedDocValuesField; import org.apache.lucene.document.StringField; -import org.apache.lucene.index.BinaryDocValues; Review comment: The code formatter will complain about these imports I think -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r806709538 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: HI @romseygeek, thanks for the reply, ok for everything! I'll work on that, regarding the interface implementation split, that I agree it's more clear, How you suggest to let user select the implementattion? Keeping the readonly flag in MonitorConfiguration and instantiating the correct on in Monitor, should be enough? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422
romseygeek commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r806711683 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: > Keeping the readonly flag in MonitorConfiguration and instantiating the correct on in Monitor, should be enough? I think so. Maybe add it as a second parameter to `setDirectoryProvider` as it only really makes sense in combination with a custom Directory? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r806723684 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: Yes, it makes sense. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dantuzi opened a new pull request #682: Testing when the gradlew check fails in github action
dantuzi opened a new pull request #682: URL: https://github.com/apache/lucene/pull/682 Fake pull request created to test the action -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dantuzi closed pull request #682: Testing when the gradlew check fails in github action
dantuzi closed pull request #682: URL: https://github.com/apache/lucene/pull/682 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
msokolov commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r806991813 ## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ## @@ -70,18 +121,125 @@ public Query rewrite(IndexReader reader) throws IOException { return createRewrittenQuery(reader, topK); } - private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf) throws IOException { -Bits liveDocs = ctx.reader().getLiveDocs(); -TopDocs results = ctx.reader().searchNearestVectors(field, target, kPerLeaf, liveDocs); -if (results == null) { + private TopDocs searchLeaf(LeafReaderContext ctx, BitSetCollector filterCollector) + throws IOException { + +if (filterCollector == null) { + Bits acceptDocs = ctx.reader().getLiveDocs(); + return approximateSearch(ctx, acceptDocs, Integer.MAX_VALUE); +} else { + BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord); + if (filterIterator == null || filterIterator.cost() == 0) { +return NO_RESULTS; + } + + if (filterIterator.cost() <= k) { +// If there <= k possible matches, short-circuit and perform exact search, since HNSW must Review comment: "If there are" ## File path: lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java ## @@ -58,6 +60,12 @@ protected KnnVectorsReader() {} * true k closest neighbors. For large values of k (for example when k is close to the total * number of documents), the search may also retrieve fewer than k documents. * + * The returned {@link TopDocs} will contain a {@link ScoreDoc} for each nearest neighbor, in + * order of their similarity to the query vector (decreasing scores). The {@link TotalHits} + * contains the number of documents visited during the search. If the search stopped early because + * it hit {@code visitedLimit}, it is indicated through the relation {@code Review comment: Would it be enough to know that `TopDocs.totalHits.value==visitedLimit`? Do we need to use the relation as a sentinel? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r807030169 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: I'm trying to wrap my head around it and tried few things, but from what I understand search methods cannot be isolted from cache population, so without a middle abstract class I'll end up with a lot of duplicate code. I don't like this Interface -> BaseAbstract and two implemtation hierarchy so much. We could skip the interface, but still I don't like so much. Maybe I'm missing something, but Monitor.search seems very coupled with loading in memory all stored queries in `ConcurrentMap queries` so I have to replicate all query cache releated code. Am I wrong? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r807030169 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: I'm trying to wrap my head around it and tried few things, but from what I understand search methods cannot be isolated from cache population, so without a middle abstract class I'll end up with a lot of duplicate code. I don't like to have `Interface -> BaseAbstract` and two implemtation hierarchy so much. We could skip the interface, but still I don't like it. Maybe I'm missing something, but Monitor.search seems very coupled with loading in memory all stored queries in `ConcurrentMap queries` so I have to replicate all query cache releated code. Am I wrong? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r807105140 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: Nevermind, I got it or maybe I just think i did :) Tomorrow I'll update the pull request. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492755#comment-17492755 ] Greg Miller commented on LUCENE-10398: -- Ok great [~spike.liu]! As I thought about this more, I'm not quite as enthusiastic about adding this, but could still be convinced it's a good thing to do :) . It seems quite common that users will want to directly access doc values for various reasons, and providing nice ways to load them (i.e., the static factory methods in {{{}DocValues{}}}) is nice. On the other hand, I'm not sure how common it really is for users to directly load {{Terms}} for a given field. Before adding this functionality, I think it would be nice to have some concrete uses for it inside of the Lucene code base. Could we look for some places where this sort of factory method would be useful to better motivate the need? Or maybe we have some common user needs we're convinced of? > Add static method for getting Terms from LeafReader > --- > > Key: LUCENE-10398 > URL: https://issues.apache.org/jira/browse/LUCENE-10398 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Marc D'Mello >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} > that return {{null}} values if the field is not indexed. These methods also > have equivalent {{DocValues}} static methods, such as > {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a > {{null}} if there is no field. I noticed that {{Terms}} does not have an > equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or > something similar. I was wondering if there was a reason for this, or if a > method like this could be useful. Thanks! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase commented on pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
iverase commented on pull request #658: URL: https://github.com/apache/lucene/pull/658#issuecomment-1040625361 I will push this tomorrow if it is ok with you @gautamworah96 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler merged pull request #680: LUCENE-10420: Remove deprecated interfaces and methods in IOUtils in main
uschindler merged pull request #680: URL: https://github.com/apache/lucene/pull/680 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r806709538 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: HI @romseygeek, thanks for the reply, ok for everything! I'll work on that, regarding the interface implementation split, that I agree it's more clear, How you suggest to let user select the implementattion? Keeping the readonly flag in MonitorConfiguration and instantiating the correct on in Monitor, should be enough? ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: Yes, it makes sense. ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: I'm trying to wrap my head around it and tried few things, but from what I understand search methods cannot be isolted from cache population, so without a middle abstract class I'll end up with a lot of duplicate code. I don't like this Interface -> BaseAbstract and two implemtation hierarchy so much. We could skip the interface, but still I don't like so much. Maybe I'm missing something, but Monitor.search seems very coupled with loading in memory all stored queries in `ConcurrentMap queries` so I have to replicate all query cache releated code. Am I wrong? ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: I'm trying to wrap my head around it and tried few things, but from what I understand search methods cannot be isolated from cache population, so without a middle abstract class I'll end up with a lot of duplicate code. I don't like to have `Interface -> BaseAbstract` and two implemtation hierarchy so much. We could skip the interface, but still I don't like it. Maybe I'm missing something, but Monitor.search seems very coupled with loading in memory all stored queries in `ConcurrentMap queries` so I have to replicate all query cache releated code. Am I wrong? ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: Nevermind, I got it or maybe I just think i did :) Tomorrow I'll update the pull request. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler closed pull request #173: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) Panama APIs (>= JDK-16-ea-b32)
uschindler closed pull request #173: URL: https://github.com/apache/lucene/pull/173 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on a change in pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
jtibshirani commented on a change in pull request #649: URL: https://github.com/apache/lucene/pull/649#discussion_r806160008 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsWriter.java ## @@ -206,14 +203,22 @@ private void writeMeta( meta.writeVLong(vectorIndexOffset); meta.writeVLong(vectorIndexLength); meta.writeInt(field.getVectorDimension()); -meta.writeInt(docIds.length); -for (int docId : docIds) { - // TODO: delta-encode, or write as bitset - meta.writeVInt(docId); + +// write docIDs +int count = docsWithField.cardinality(); +meta.writeInt(count); +if (count == maxDoc) { + meta.writeByte((byte) -1); + ; // dense marker, each document has a vector value Review comment: I think you accidentally added a newline before the comment. ## File path: lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java ## @@ -1018,4 +1020,57 @@ public void testAdvance() throws Exception { } } } + + public void testVectorValuesReportCorrectDocs() throws Exception { +final int numDocs = atLeast(1000); +final int dim = random().nextInt(20) + 1; +final VectorSimilarityFunction similarityFunction = +VectorSimilarityFunction.values()[ +random().nextInt(VectorSimilarityFunction.values().length)]; + +float fieldValuesCheckSum = 0f; +int fieldDocCount = 0; +long fieldSumDocIDs = 0; + +try (Directory dir = newDirectory(); +RandomIndexWriter w = new RandomIndexWriter(random(), dir, newIndexWriterConfig())) { + for (int i = 0; i < numDocs; i++) { +Document doc = new Document(); +int docID = random().nextInt(numDocs); +doc.add(new StoredField("id", docID)); +if (random().nextInt(4) == 3) { + float[] vector = randomVector(dim); + doc.add(new KnnVectorField("knn_vector", vector, similarityFunction)); + fieldValuesCheckSum += vector[0]; + fieldDocCount++; + fieldSumDocIDs += docID; +} +w.addDocument(doc); + } + + if (random().nextBoolean()) { +w.forceMerge(1); + } + + try (IndexReader r = w.getReader()) { +float checksum = 0; Review comment: Would the checksum and doc IDs still match if the vectors were out of order? Maybe worth strengthening this check. ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java ## @@ -424,38 +448,45 @@ public int docID() { @Override public int nextDoc() { - if (++ord >= size()) { + if (++ord >= size) { doc = NO_MORE_DOCS; } else { -doc = ordToDoc[ord]; +doc = ordToDocOperator.applyAsInt(ord); } return doc; } @Override public int advance(int target) { assert docID() < target; - ord = Arrays.binarySearch(ordToDoc, ord + 1, ordToDoc.length, target); - if (ord < 0) { -ord = -(ord + 1); + + if (ordToDoc == null) { +ord = target; + } else { +ord = Arrays.binarySearch(ordToDoc, ord + 1, ordToDoc.length, target); +if (ord < 0) { + ord = -(ord + 1); +} } - assert ord <= ordToDoc.length; - if (ord == ordToDoc.length) { + + assert ord <= size; + if (ord == size) { doc = NO_MORE_DOCS; } else { -doc = ordToDoc[ord]; +doc = ordToDocOperator.applyAsInt(ord); +; Review comment: stray semicolon ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java ## @@ -266,12 +268,12 @@ private Bits getAcceptOrds(Bits acceptDocs, FieldEntry fieldEntry) { return new Bits() { @Override public boolean get(int index) { -return acceptDocs.get(fieldEntry.ordToDoc[index]); +return acceptDocs.get(fieldEntry.ordToDoc(index)); Review comment: We could add a small optimization here where we check `fieldEntry.ordToDocs == null`, and if so just return `acceptDocs`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492803#comment-17492803 ] Greg Miller commented on LUCENE-10398: -- Just noticed there's a PR already for this (sorry, I think I forgot to refresh a browser tab and didn't notice). From a first glance at the PR, it looks like you _did_ already go find places this can be used, so thanks! I'll have a look at the PR shortly! > Add static method for getting Terms from LeafReader > --- > > Key: LUCENE-10398 > URL: https://issues.apache.org/jira/browse/LUCENE-10398 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Marc D'Mello >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} > that return {{null}} values if the field is not indexed. These methods also > have equivalent {{DocValues}} static methods, such as > {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a > {{null}} if there is no field. I noticed that {{Terms}} does not have an > equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or > something similar. I was wondering if there was a reason for this, or if a > method like this could be useful. Thanks! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422
romseygeek commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r806633322 ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/MonitorConfiguration.java ## @@ -35,8 +34,9 @@ private long purgeFrequency = 5; private TimeUnit purgeFrequencyUnits = TimeUnit.MINUTES; private QueryDecomposer queryDecomposer = new QueryDecomposer(); - private Path indexPath = null; private MonitorQuerySerializer serializer; + private Boolean readOnly = false; + private DirectoryProviderFunctionalInterface directoryProvider; Review comment: We can set this to a default value of `() -> new ByteBuffersDirectory()` which would then simplify the `buildIndexWriter()` implementation. ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/DirectoryProviderFunctionalInterface.java ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.monitor; + +import java.io.IOException; +import org.apache.lucene.store.Directory; + +/** A functional interface to provide a Directory instance */ +public interface DirectoryProviderFunctionalInterface { Review comment: Can we use `org.apache.lucene.util.IOSupplier` here instead please? ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: Given that QueryIndex is package-private I think we have some leeway to split things up here a bit and make things clearer. Could you try making QueryIndex itself an interface, and then having read-only and writeable implementations? We could also move the purge executor from the Monitor into the writeable QueryIndex implementation as it's not necessary to clear the cache at all on a read-only query index. ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -36,15 +36,7 @@ import org.apache.lucene.document.Field; import org.apache.lucene.document.SortedDocValuesField; import org.apache.lucene.document.StringField; -import org.apache.lucene.index.BinaryDocValues; Review comment: The code formatter will complain about these imports I think ## File path: lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java ## @@ -87,8 +81,20 @@ final Map termFilters = new HashMap<>(); QueryIndex(MonitorConfiguration config, Presearcher presearcher) throws IOException { Review comment: > Keeping the readonly flag in MonitorConfiguration and instantiating the correct on in Monitor, should be enough? I think so. Maybe add it as a second parameter to `setDirectoryProvider` as it only really makes sense in combination with a custom Directory? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10371) Make IndexRearranger able to arrange segment into a determined order
[ https://issues.apache.org/jira/browse/LUCENE-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492814#comment-17492814 ] ASF subversion and git services commented on LUCENE-10371: -- Commit 6157854523d35f3359632361d87965572552c134 in lucene's branch refs/heads/main from Patrick Zhai [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6157854 ] LUCENE-10371 Make IndexRearranger able to arrange segment in a determined order (#630) > Make IndexRearranger able to arrange segment into a determined order > > > Key: LUCENE-10371 > URL: https://issues.apache.org/jira/browse/LUCENE-10371 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Patrick Zhai >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Previously when I tried to make change to luceneutil to let it use > {{IndexRearranger}} for a faster deterministic index construction, I found > that even each segment contains the same documents set, the order of segments > will impact the estimated hit number (using BMW): > [https://markmail.org/message/zl6zsqvbg7nwfq6w] > At that time the discussion tend to tolerant the small hit count difference > to resolve the issue, after some time when I discuss this issue again with > [~mikemccand] , we thought it might also be a good idea to just add ability > of rearranging the segments order to {{IndexRearranger}}, so that we can > ensure each time the rearranged index is truly the same. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #673: LUCENE-10420: Move functional interfaces in IOUtils to top-level interfaces
uschindler commented on pull request #673: URL: https://github.com/apache/lucene/pull/673#issuecomment-1039214098 Followup PR: #680 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani commented on pull request #656: URL: https://github.com/apache/lucene/pull/656#issuecomment-1039714498 @msokolov @jpountz @mayya-sharipova this is ready for another look. Notable changes: * When computing the filter results, only include documents that actually contain a vector. This gives an accurate estimate of the filter selectivity. To support this I introduced `KnnVectorFieldExistsQuery`, which seemed useful in its own right. * I stopped using `CollectionTerminationException` to indicate that the search hit the visited limit. Instead, we pass the information in `TopDocs` through `TotalHits`. The value is always the number of visited docs, but the relation is `GREATER_THAN_OR_EQUAL_TO` iff the search stopped early. This is kind of arbitrary but felt natural -- I'm very open to suggestions here! It's a fairly low-level API and it's marked experimental, so there is also room to refine it later. This update does not change the output of `KnnVectorQuery`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
rmuir commented on pull request #681: URL: https://github.com/apache/lucene/pull/681#issuecomment-1040079874 why do we have to make all these internal classes public to do this? I don't think this is a good tradeoff. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a change in pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
dweiss commented on a change in pull request #681: URL: https://github.com/apache/lucene/pull/681#discussion_r806675876 ## File path: gradle/java/javac.gradle ## @@ -52,12 +51,12 @@ allprojects { "-Xlint:overrides", // TODO: some tests seem to have bad classpaths? // this check seems to be a good sanity check for gradle? -"-Xlint:-path", +"-Xlint:path", "-Xlint:processing", "-Xlint:rawtypes", "-Xlint:removal", "-Xlint:requires-automatic", -"-Xlint:requires-transitive-automatic", +"-Xlint:-requires-transitive-automatic", Review comment: Just disabling this check globally is not a good way to solve it - suppress on each module that does it. Also - this should be a more fine-grained discussion on which pieces of the API should be exposed and which are simply mistakes (and should be maybe hidden entirely). ## File path: gradle/java/javac.gradle ## @@ -52,12 +51,12 @@ allprojects { "-Xlint:overrides", // TODO: some tests seem to have bad classpaths? // this check seems to be a good sanity check for gradle? -"-Xlint:-path", +"-Xlint:path", Review comment: I would take care of this one separately to see if it can be a self-contained, small patch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #173: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) Panama APIs (>= JDK-16-ea-b32)
uschindler commented on pull request #173: URL: https://github.com/apache/lucene/pull/173#issuecomment-1039342764 This PR is no longer maintained! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
dweiss commented on pull request #681: URL: https://github.com/apache/lucene/pull/681#issuecomment-1040099139 I would leave -Xlint:exports entirely out of the patch - this is not easily fixable; the module API is broken in that it exposes types that are not public but making them public is not really a solution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih merged pull request #630: LUCENE-10371 Make IndexRearranger able to arrange segment in a determined order
zhaih merged pull request #630: URL: https://github.com/apache/lucene/pull/630 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
gautamworah96 commented on pull request #658: URL: https://github.com/apache/lucene/pull/658#issuecomment-1040653243 Yes. I was wondering why you had kept it open :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase commented on pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
iverase commented on pull request #658: URL: https://github.com/apache/lucene/pull/658#issuecomment-1040625361 I will push this tomorrow if it is ok with you @gautamworah96 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih opened a new pull request #683: [Backport] LUCENE-10371 Make IndexRearranger able to arrange segment in a determined order
zhaih opened a new pull request #683: URL: https://github.com/apache/lucene/pull/683 Backporting #630 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
msokolov commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r806991813 ## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ## @@ -70,18 +121,125 @@ public Query rewrite(IndexReader reader) throws IOException { return createRewrittenQuery(reader, topK); } - private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf) throws IOException { -Bits liveDocs = ctx.reader().getLiveDocs(); -TopDocs results = ctx.reader().searchNearestVectors(field, target, kPerLeaf, liveDocs); -if (results == null) { + private TopDocs searchLeaf(LeafReaderContext ctx, BitSetCollector filterCollector) + throws IOException { + +if (filterCollector == null) { + Bits acceptDocs = ctx.reader().getLiveDocs(); + return approximateSearch(ctx, acceptDocs, Integer.MAX_VALUE); +} else { + BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord); + if (filterIterator == null || filterIterator.cost() == 0) { +return NO_RESULTS; + } + + if (filterIterator.cost() <= k) { +// If there <= k possible matches, short-circuit and perform exact search, since HNSW must Review comment: "If there are" ## File path: lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java ## @@ -58,6 +60,12 @@ protected KnnVectorsReader() {} * true k closest neighbors. For large values of k (for example when k is close to the total * number of documents), the search may also retrieve fewer than k documents. * + * The returned {@link TopDocs} will contain a {@link ScoreDoc} for each nearest neighbor, in + * order of their similarity to the query vector (decreasing scores). The {@link TotalHits} + * contains the number of documents visited during the search. If the search stopped early because + * it hit {@code visitedLimit}, it is indicated through the relation {@code Review comment: Would it be enough to know that `TopDocs.totalHits.value==visitedLimit`? Do we need to use the relation as a sentinel? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dantuzi closed pull request #682: Testing when the gradlew check fails in github action
dantuzi closed pull request #682: URL: https://github.com/apache/lucene/pull/682 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on a change in pull request #677: LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field
jtibshirani commented on a change in pull request #677: URL: https://github.com/apache/lucene/pull/677#discussion_r807297322 ## File path: lucene/core/src/test/org/apache/lucene/search/TestDocValuesFieldExistsQuery.java ## @@ -38,6 +39,100 @@ public class TestDocValuesFieldExistsQuery extends LuceneTestCase { + public void testRewriteWithTermsPresent() throws IOException { +Directory dir = newDirectory(); +RandomIndexWriter iw = new RandomIndexWriter(random(), dir); +final int numDocs = atLeast(100); +for (int i = 0; i < numDocs; ++i) { + Document doc = new Document(); + doc.add(new StringField("f", random().nextBoolean() ? "yes" : "no", Store.NO)); + iw.addDocument(doc); +} +iw.commit(); +final IndexReader reader = iw.getReader(); +final IndexSearcher searcher = newSearcher(reader); +iw.close(); + +Query query = new DocValuesFieldExistsQuery("f"); +Query rewrittenQuery = query.rewrite(reader); +assertTrue(rewrittenQuery instanceof MatchAllDocsQuery); +assertSameMatches(searcher, query, rewrittenQuery, false); Review comment: I think we can skip these `assertSameMatches`, because we always rewrite the query as part of the search. So it is not asserting anything insightful about `DocValuesFieldExistsQuery`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
gsmiller commented on a change in pull request #678: URL: https://github.com/apache/lucene/pull/678#discussion_r807228024 ## File path: lucene/core/src/java/org/apache/lucene/index/Terms.java ## @@ -124,6 +124,69 @@ protected BytesRef nextSeekTerm(BytesRef term) throws IOException { /** Zero-length array of {@link Terms}. */ public static final Terms[] EMPTY_ARRAY = new Terms[0]; + /** An empty {@link Terms} which returns no terms */ + public static Terms emptyTerms() { Review comment: What about defining this once and exposing it through a static variable much like `TermsEnum.EMPTY` (e.g., `Terms.EMPTY`)? This looks perfectly safe to share/reuse to me, so we can avoid the overhead of instantiating over-and-over. ## File path: lucene/core/src/java/org/apache/lucene/document/FeatureQuery.java ## @@ -111,12 +111,9 @@ public Explanation explain(LeafReaderContext context, int doc) throws IOExceptio @Override public Scorer scorer(LeafReaderContext context) throws IOException { -Terms terms = context.reader().terms(fieldName); -if (terms == null) { - return null; -} +Terms terms = Terms.terms(context.reader(), fieldName); TermsEnum termsEnum = terms.iterator(); -if (termsEnum.seekExact(new BytesRef(featureName)) == false) { +if (!termsEnum.seekExact(new BytesRef(featureName))) { Review comment: As a best-practice, we prefer using `== false` instead of `!` syntax in this code base. Can you please retain this style? Thanks! ## File path: lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java ## @@ -244,8 +244,8 @@ public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float bo @Override public Matches matches(LeafReaderContext context, int doc) throws IOException { -Terms terms = context.reader().terms(field); -if (terms == null || terms.hasPositions() == false) { +Terms terms = Terms.terms(context.reader(), field); +if (!terms.hasPositions()) { Review comment: minor: Another place to use `== false` instead of `!` please ## File path: lucene/core/src/java/org/apache/lucene/index/Terms.java ## @@ -124,6 +124,69 @@ protected BytesRef nextSeekTerm(BytesRef term) throws IOException { /** Zero-length array of {@link Terms}. */ public static final Terms[] EMPTY_ARRAY = new Terms[0]; + /** An empty {@link Terms} which returns no terms */ + public static Terms emptyTerms() { Review comment: Also, since we don't actually need this outside of the `Terms` class right now, I think I'd also prefer we make access `private`. We can always open up access if a use-case demands it. ## File path: lucene/queries/src/java/org/apache/lucene/queries/function/IndexReaderFunctions.java ## @@ -212,10 +212,10 @@ private TermFreqDoubleValuesSource(Term term) { @Override public DoubleValues getValues(LeafReaderContext ctx, DoubleValues scores) throws IOException { - Terms terms = ctx.reader().terms(term.field()); - TermsEnum te = terms == null ? null : terms.iterator(); + Terms terms = Terms.terms(ctx.reader(), term.field()); + TermsEnum te = terms.iterator(); - if (te == null || te.seekExact(term.bytes()) == false) { + if (!te.seekExact(term.bytes())) { Review comment: minor: Just another place to use `== false` instead of `!` please. ## File path: lucene/core/src/java/org/apache/lucene/index/Terms.java ## @@ -124,6 +124,69 @@ protected BytesRef nextSeekTerm(BytesRef term) throws IOException { /** Zero-length array of {@link Terms}. */ public static final Terms[] EMPTY_ARRAY = new Terms[0]; + /** An empty {@link Terms} which returns no terms */ + public static Terms emptyTerms() { +return new Terms() { + @Override + public TermsEnum iterator() throws IOException { +return TermsEnum.EMPTY; + } + + @Override + public long size() throws IOException { +return 0; + } + + @Override + public long getSumTotalTermFreq() throws IOException { +return 0; + } + + @Override + public long getSumDocFreq() throws IOException { +return 0; + } + + @Override + public int getDocCount() throws IOException { +return 0; + } + + @Override + public boolean hasFreqs() { +return false; + } + + @Override + public boolean hasOffsets() { +return false; + } + + @Override + public boolean hasPositions() { +return false; + } + + @Override + public boolean hasPayloads() { +return false; + } +}; + } + + /** + * Returns the {@link Terms} index for this field, or {@link #emptyTerms} if it has none. + * @return terms instance, or an empty instance if {@code field} does not exis
[jira] [Commented] (LUCENE-10371) Make IndexRearranger able to arrange segment into a determined order
[ https://issues.apache.org/jira/browse/LUCENE-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492881#comment-17492881 ] ASF subversion and git services commented on LUCENE-10371: -- Commit ceccc1bd7b96e9b0dcb8a723e02e40ae6e1891ab in lucene's branch refs/heads/branch_9x from Patrick Zhai [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ceccc1b ] LUCENE-10371 Make IndexRearranger able to arrange segment in a determined order (#683) > Make IndexRearranger able to arrange segment into a determined order > > > Key: LUCENE-10371 > URL: https://issues.apache.org/jira/browse/LUCENE-10371 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Patrick Zhai >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Previously when I tried to make change to luceneutil to let it use > {{IndexRearranger}} for a faster deterministic index construction, I found > that even each segment contains the same documents set, the order of segments > will impact the estimated hit number (using BMW): > [https://markmail.org/message/zl6zsqvbg7nwfq6w] > At that time the discussion tend to tolerant the small hit count difference > to resolve the issue, after some time when I discuss this issue again with > [~mikemccand] , we thought it might also be a good idea to just add ability > of rearranging the segments order to {{IndexRearranger}}, so that we can > ensure each time the rearranged index is truly the same. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih merged pull request #683: [Backport] LUCENE-10371 Make IndexRearranger able to arrange segment in a determined order
zhaih merged pull request #683: URL: https://github.com/apache/lucene/pull/683 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10371) Make IndexRearranger able to arrange segment into a determined order
[ https://issues.apache.org/jira/browse/LUCENE-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Zhai resolved LUCENE-10371. --- Resolution: Fixed > Make IndexRearranger able to arrange segment into a determined order > > > Key: LUCENE-10371 > URL: https://issues.apache.org/jira/browse/LUCENE-10371 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Patrick Zhai >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Previously when I tried to make change to luceneutil to let it use > {{IndexRearranger}} for a faster deterministic index construction, I found > that even each segment contains the same documents set, the order of segments > will impact the estimated hit number (using BMW): > [https://markmail.org/message/zl6zsqvbg7nwfq6w] > At that time the discussion tend to tolerant the small hit count difference > to resolve the issue, after some time when I discuss this issue again with > [~mikemccand] , we thought it might also be a good idea to just add ability > of rearranging the segments order to {{IndexRearranger}}, so that we can > ensure each time the rearranged index is truly the same. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10391) Reuse data structures across HnswGraph invocations
[ https://issues.apache.org/jira/browse/LUCENE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492885#comment-17492885 ] Julie Tibshirani commented on LUCENE-10391: --- I've been keeping an eye on nightly benchmarks and don't see a noticeable improvement in search or indexing performance. The updated heap profiles do look better though. Apparently allocating the results queue is still non-trivial, so it could be worth trying to share that as well (which my initial PR did not do): {code} 8.39% 43814Morg.apache.lucene.util.LongHeap#() at org.apache.lucene.util.hnsw.NeighborQueue#() at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() {code} > Reuse data structures across HnswGraph invocations > -- > > Key: LUCENE-10391 > URL: https://issues.apache.org/jira/browse/LUCENE-10391 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Julie Tibshirani >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Creating HNSW graphs involves doing many repeated calls to HnswGraph#search. > Profiles from nightly benchmarks suggest that allocating data-structures > incurs both lots of heap allocations > ([http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_heap)] > and CPU usage > ([http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_cpu).] > It looks like reusing data structures across invocations would be a > low-hanging fruit that could help save significant CPU? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
gsmiller commented on a change in pull request #678: URL: https://github.com/apache/lucene/pull/678#discussion_r807375030 ## File path: lucene/core/src/java/org/apache/lucene/index/Terms.java ## @@ -124,6 +124,69 @@ protected BytesRef nextSeekTerm(BytesRef term) throws IOException { /** Zero-length array of {@link Terms}. */ public static final Terms[] EMPTY_ARRAY = new Terms[0]; + /** An empty {@link Terms} which returns no terms */ + public static Terms emptyTerms() { +return new Terms() { + @Override + public TermsEnum iterator() throws IOException { +return TermsEnum.EMPTY; + } + + @Override + public long size() throws IOException { +return 0; + } + + @Override + public long getSumTotalTermFreq() throws IOException { +return 0; + } + + @Override + public long getSumDocFreq() throws IOException { +return 0; + } + + @Override + public int getDocCount() throws IOException { +return 0; + } + + @Override + public boolean hasFreqs() { +return false; + } + + @Override + public boolean hasOffsets() { +return false; + } + + @Override + public boolean hasPositions() { +return false; + } + + @Override + public boolean hasPayloads() { +return false; + } +}; + } + + /** + * Returns the {@link Terms} index for this field, or {@link #emptyTerms} if it has none. + * @return terms instance, or an empty instance if {@code field} does not exist in this reader Review comment: FYI: spotless is complaining about this javadoc formatting (that's why `./gradlew check` is failing). You should be able to run `./gradlew spotlessApply` to fix this automatically. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492890#comment-17492890 ] Greg Miller commented on LUCENE-10398: -- OK, after looking over the PR, I'm back to liking this idea :) Looks like it allowed null checks to be removed in a number of places, which is nice. I'm in favor of this and left some minor comments on the PR, but I'm curious if anyone else feels like the API shouldn't be added for any particular reason. Thanks again [~spike.liu] for picking this up! > Add static method for getting Terms from LeafReader > --- > > Key: LUCENE-10398 > URL: https://issues.apache.org/jira/browse/LUCENE-10398 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Marc D'Mello >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} > that return {{null}} values if the field is not indexed. These methods also > have equivalent {{DocValues}} static methods, such as > {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a > {{null}} if there is no field. I noticed that {{Terms}} does not have an > equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or > something similar. I was wondering if there was a reason for this, or if a > method like this could be useful. Thanks! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r807399758 ## File path: lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java ## @@ -58,6 +60,12 @@ protected KnnVectorsReader() {} * true k closest neighbors. For large values of k (for example when k is close to the total * number of documents), the search may also retrieve fewer than k documents. * + * The returned {@link TopDocs} will contain a {@link ScoreDoc} for each nearest neighbor, in + * order of their similarity to the query vector (decreasing scores). The {@link TotalHits} + * contains the number of documents visited during the search. If the search stopped early because + * it hit {@code visitedLimit}, it is indicated through the relation {@code Review comment: It's possible (but unlikely) that the search completed normally with exactly `numVisited == visitedLimit`. The `visitedLimit` is inclusive. To me it felt more solid and obvious to use the relation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani commented on pull request #656: URL: https://github.com/apache/lucene/pull/656#issuecomment-1040900652 Thanks for the review! I'll wait for the others to take a look too. I'm working on adding kNN with filtering to luceneutil. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob opened a new pull request #2641: SOLR-15965 Use proper signatures for SolrAuth
madrob opened a new pull request #2641: URL: https://github.com/apache/lucene-solr/pull/2641 Internode communication secured by PKI Authentication has changed formats. Difference from 9x is that on 8x we will still default to send v1 and accept v1,v2 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] spike-liu commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
spike-liu commented on a change in pull request #678: URL: https://github.com/apache/lucene/pull/678#discussion_r807560361 ## File path: lucene/core/src/java/org/apache/lucene/index/Terms.java ## @@ -124,6 +124,69 @@ protected BytesRef nextSeekTerm(BytesRef term) throws IOException { /** Zero-length array of {@link Terms}. */ public static final Terms[] EMPTY_ARRAY = new Terms[0]; + /** An empty {@link Terms} which returns no terms */ + public static Terms emptyTerms() { Review comment: Thanks for your review, Greg. Fixed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] spike-liu commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
spike-liu commented on a change in pull request #678: URL: https://github.com/apache/lucene/pull/678#discussion_r807561090 ## File path: lucene/core/src/java/org/apache/lucene/document/FeatureQuery.java ## @@ -111,12 +111,9 @@ public Explanation explain(LeafReaderContext context, int doc) throws IOExceptio @Override public Scorer scorer(LeafReaderContext context) throws IOException { -Terms terms = context.reader().terms(fieldName); -if (terms == null) { - return null; -} +Terms terms = Terms.terms(context.reader(), fieldName); TermsEnum termsEnum = terms.iterator(); -if (termsEnum.seekExact(new BytesRef(featureName)) == false) { +if (!termsEnum.seekExact(new BytesRef(featureName))) { Review comment: Absolutely. Fixed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] spike-liu commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
spike-liu commented on a change in pull request #678: URL: https://github.com/apache/lucene/pull/678#discussion_r807561320 ## File path: lucene/queries/src/java/org/apache/lucene/queries/function/IndexReaderFunctions.java ## @@ -212,10 +212,10 @@ private TermFreqDoubleValuesSource(Term term) { @Override public DoubleValues getValues(LeafReaderContext ctx, DoubleValues scores) throws IOException { - Terms terms = ctx.reader().terms(term.field()); - TermsEnum te = terms == null ? null : terms.iterator(); + Terms terms = Terms.terms(ctx.reader(), term.field()); + TermsEnum te = terms.iterator(); - if (te == null || te.seekExact(term.bytes()) == false) { + if (!te.seekExact(term.bytes())) { Review comment: fixed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] spike-liu commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
spike-liu commented on a change in pull request #678: URL: https://github.com/apache/lucene/pull/678#discussion_r807561536 ## File path: lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java ## @@ -244,8 +244,8 @@ public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float bo @Override public Matches matches(LeafReaderContext context, int doc) throws IOException { -Terms terms = context.reader().terms(field); -if (terms == null || terms.hasPositions() == false) { +Terms terms = Terms.terms(context.reader(), field); +if (!terms.hasPositions()) { Review comment: fixed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] spike-liu commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
spike-liu commented on a change in pull request #678: URL: https://github.com/apache/lucene/pull/678#discussion_r807563855 ## File path: lucene/core/src/java/org/apache/lucene/index/Terms.java ## @@ -124,6 +124,69 @@ protected BytesRef nextSeekTerm(BytesRef term) throws IOException { /** Zero-length array of {@link Terms}. */ public static final Terms[] EMPTY_ARRAY = new Terms[0]; + /** An empty {@link Terms} which returns no terms */ + public static Terms emptyTerms() { +return new Terms() { + @Override + public TermsEnum iterator() throws IOException { +return TermsEnum.EMPTY; + } + + @Override + public long size() throws IOException { +return 0; + } + + @Override + public long getSumTotalTermFreq() throws IOException { +return 0; + } + + @Override + public long getSumDocFreq() throws IOException { +return 0; + } + + @Override + public int getDocCount() throws IOException { +return 0; + } + + @Override + public boolean hasFreqs() { +return false; + } + + @Override + public boolean hasOffsets() { +return false; + } + + @Override + public boolean hasPositions() { +return false; + } + + @Override + public boolean hasPayloads() { +return false; + } +}; + } + + /** + * Returns the {@link Terms} index for this field, or {@link #emptyTerms} if it has none. + * @return terms instance, or an empty instance if {@code field} does not exist in this reader + * @throws IOException if an I/O error occurs. + */ + public static Terms terms(LeafReader reader, String field) throws IOException { Review comment: fixed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] spike-liu commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
spike-liu commented on a change in pull request #678: URL: https://github.com/apache/lucene/pull/678#discussion_r807570862 ## File path: lucene/core/src/java/org/apache/lucene/index/Terms.java ## @@ -124,6 +124,69 @@ protected BytesRef nextSeekTerm(BytesRef term) throws IOException { /** Zero-length array of {@link Terms}. */ public static final Terms[] EMPTY_ARRAY = new Terms[0]; + /** An empty {@link Terms} which returns no terms */ + public static Terms emptyTerms() { +return new Terms() { + @Override + public TermsEnum iterator() throws IOException { +return TermsEnum.EMPTY; + } + + @Override + public long size() throws IOException { +return 0; + } + + @Override + public long getSumTotalTermFreq() throws IOException { +return 0; + } + + @Override + public long getSumDocFreq() throws IOException { +return 0; + } + + @Override + public int getDocCount() throws IOException { +return 0; + } + + @Override + public boolean hasFreqs() { +return false; + } + + @Override + public boolean hasOffsets() { +return false; + } + + @Override + public boolean hasPositions() { +return false; + } + + @Override + public boolean hasPayloads() { +return false; + } +}; + } + + /** + * Returns the {@link Terms} index for this field, or {@link #emptyTerms} if it has none. + * @return terms instance, or an empty instance if {@code field} does not exist in this reader Review comment: Thanks for your kindly instruction. Fixed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase merged pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
iverase merged pull request #658: URL: https://github.com/apache/lucene/pull/658 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10378) Implement Weight#count on PointRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493021#comment-17493021 ] ASF subversion and git services commented on LUCENE-10378: -- Commit dd25fabb03c86835f4e0c312b8d7f5c3ba045c8d in lucene's branch refs/heads/main from Gautam Worah [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=dd25fab ] LUCENE-10378 Implement Weight#count for PointRangeQuery (#658) Implement Weight#count for PointRangeQuery to provide a faster way to calculate the number of matching range docs when each doc has at-most one point and the points are 1-dimensional. > Implement Weight#count on PointRangeQuery > - > > Key: LUCENE-10378 > URL: https://issues.apache.org/jira/browse/LUCENE-10378 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 4h 50m > Remaining Estimate: 0h > > When there are no deletions and the field is single-valued (docCount == size) > and has a single-dimension, PointRangeQuery could implement {{Weight#count}} > by only counting matches on the two leaves that cross with the query. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase opened a new pull request #684: Backport of LUCENE-10378
iverase opened a new pull request #684: URL: https://github.com/apache/lucene/pull/684 Implement Weight#count on PointRangeQuery. Also fixed a small style inconsistency that I noticed Issue: https://issues.apache.org/jira/browse/LUCENE-10378 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
gautamworah96 commented on pull request #658: URL: https://github.com/apache/lucene/pull/658#issuecomment-1041175108 I see you already opened a backport PR. Thanks! I'll approve that as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 edited a comment on pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
gautamworah96 edited a comment on pull request #658: URL: https://github.com/apache/lucene/pull/658#issuecomment-1041175108 @iverase I see you already opened a backport PR. Thanks! I'll approve that as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase merged pull request #684: Backport of LUCENE-10378
iverase merged pull request #684: URL: https://github.com/apache/lucene/pull/684 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10378) Implement Weight#count on PointRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-10378. --- Fix Version/s: 9.1 Assignee: Ignacio Vera Resolution: Fixed > Implement Weight#count on PointRangeQuery > - > > Key: LUCENE-10378 > URL: https://issues.apache.org/jira/browse/LUCENE-10378 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Ignacio Vera >Priority: Minor > Fix For: 9.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > When there are no deletions and the field is single-valued (docCount == size) > and has a single-dimension, PointRangeQuery could implement {{Weight#count}} > by only counting matches on the two leaves that cross with the query. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-10378) Implement Weight#count on PointRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera reassigned LUCENE-10378: - Assignee: (was: Ignacio Vera) > Implement Weight#count on PointRangeQuery > - > > Key: LUCENE-10378 > URL: https://issues.apache.org/jira/browse/LUCENE-10378 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > When there are no deletions and the field is single-valued (docCount == size) > and has a single-dimension, PointRangeQuery could implement {{Weight#count}} > by only counting matches on the two leaves that cross with the query. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-10378) Implement Weight#count on PointRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera reassigned LUCENE-10378: - Assignee: Ignacio Vera > Implement Weight#count on PointRangeQuery > - > > Key: LUCENE-10378 > URL: https://issues.apache.org/jira/browse/LUCENE-10378 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Ignacio Vera >Priority: Minor > Fix For: 9.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > When there are no deletions and the field is single-valued (docCount == size) > and has a single-dimension, PointRangeQuery could implement {{Weight#count}} > by only counting matches on the two leaves that cross with the query. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10378) Implement Weight#count on PointRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493037#comment-17493037 ] ASF subversion and git services commented on LUCENE-10378: -- Commit 08d3daef1ae2379b6e585b63eae4775fb24fea44 in lucene's branch refs/heads/branch_9x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=08d3dae ] LUCENE-10378 Implement Weight#count for PointRangeQuery (apache#658) Implement Weight#count for PointRangeQuery to provide a faster way to calculate the number of matching range docs when each doc has at-most one point and the points are 1-dimensional. > Implement Weight#count on PointRangeQuery > - > > Key: LUCENE-10378 > URL: https://issues.apache.org/jira/browse/LUCENE-10378 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Ignacio Vera >Priority: Minor > Fix For: 9.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > When there are no deletions and the field is single-valued (docCount == size) > and has a single-dimension, PointRangeQuery could implement {{Weight#count}} > by only counting matches on the two leaves that cross with the query. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
mayya-sharipova commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r807638868 ## File path: build.gradle ## @@ -183,3 +183,5 @@ apply from: file('gradle/hacks/turbocharge-jvm-opts.gradle') apply from: file('gradle/hacks/dummy-outputs.gradle') apply from: file('gradle/pylucene/pylucene.gradle') +sourceCompatibility = JavaVersion.VERSION_16 Review comment: Are these changes intentional? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] spike-liu commented on pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
spike-liu commented on pull request #681: URL: https://github.com/apache/lucene/pull/681#issuecomment-1041211089 > why do we have to make all these internal classes public to do this? I don't think this is a good tradeoff. Thanks for your review, Robert. Just like David mentioned, simply making them public is really not a good solution. + 1 example for your reference: - `public class ByteBufferIndexInput` uses `private class ByteBufferGuard` as a parameter of method; https://user-images.githubusercontent.com/9884987/154220157-d4f664d0-6f67-4809-8c38-3633aa073b31.png";> - `org.apache.lucene.search.suggest.document.NRTSuggester` uses `ByteBufferIndexInput` like below: https://user-images.githubusercontent.com/9884987/154218470-77e7fb39-dddb-4987-8a85-68887bb43987.png";> Hence it violates the existing rule of java module system, which seems a bit hard to comply in our case right now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] spike-liu commented on a change in pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
spike-liu commented on a change in pull request #681: URL: https://github.com/apache/lucene/pull/681#discussion_r807641222 ## File path: gradle/java/javac.gradle ## @@ -52,12 +51,12 @@ allprojects { "-Xlint:overrides", // TODO: some tests seem to have bad classpaths? // this check seems to be a good sanity check for gradle? -"-Xlint:-path", +"-Xlint:path", Review comment: Thanks for you review, Dawid. Got it. Any more work needed, please feel free to let me know. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] spike-liu edited a comment on pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
spike-liu edited a comment on pull request #681: URL: https://github.com/apache/lucene/pull/681#issuecomment-1041211089 > why do we have to make all these internal classes public to do this? I don't think this is a good tradeoff. Thanks for your review, Robert. Just like Dawid mentioned, simply making them public is really not a good solution. + 1 example for your reference: - `public class ByteBufferIndexInput` uses `private class ByteBufferGuard` as a parameter of method; https://user-images.githubusercontent.com/9884987/154220157-d4f664d0-6f67-4809-8c38-3633aa073b31.png";> - `org.apache.lucene.search.suggest.document.NRTSuggester` uses `ByteBufferIndexInput` like below: https://user-images.githubusercontent.com/9884987/154218470-77e7fb39-dddb-4987-8a85-68887bb43987.png";> Hence it violates the existing rule of java module system, which seems a bit hard to comply in our case right now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org