[GitHub] [lucene] jpountz commented on a change in pull request #413: LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0
jpountz commented on a change in pull request #413:
URL: https://github.com/apache/lucene/pull/413#discussion_r737165184

## File path: lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java

@@ -25,18 +25,10 @@
 import java.io.IOException;
 import java.util.HashSet;
 import java.util.Set;
-import org.apache.lucene.document.Document;
-import org.apache.lucene.document.Field;
-import org.apache.lucene.document.KnnVectorField;
-import org.apache.lucene.document.StringField;
-import org.apache.lucene.index.DirectoryReader;
-import org.apache.lucene.index.IndexReader;
-import org.apache.lucene.index.IndexWriter;
-import org.apache.lucene.index.IndexWriterConfig;
-import org.apache.lucene.index.RandomIndexWriter;
-import org.apache.lucene.index.Term;
-import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.document.*;
+import org.apache.lucene.index.*;

Review comment: Oh, I thought we failed the build on wildcard imports, but apparently we don't. Maybe still use explicit imports to reduce line changes of this PR?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [lucene] jpountz merged pull request #415: LUCENE-10206 Implement O(1) count on query cache
jpountz merged pull request #415:
URL: https://github.com/apache/lucene/pull/415
[jira] [Commented] (LUCENE-10206) Implement O(1) count on query cache
[ https://issues.apache.org/jira/browse/LUCENE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434733#comment-17434733 ]

ASF subversion and git services commented on LUCENE-10206:

Commit 941df98c3f718371af4702c92bf6537739120064 in lucene's branch refs/heads/main from Nik Everett
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=941df98 ]

LUCENE-10206 Implement O(1) count on query cache (#415)

When we load a query into the query cache we always calculate the count of matching documents. This uses that count to power the new `O(1)` `Weight#count` method.

> Implement O(1) count on query cache
>
> Key: LUCENE-10206
> URL: https://issues.apache.org/jira/browse/LUCENE-10206
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Nik Everett
> Priority: Minor
> Time Spent: 1h
> Remaining Estimate: 0h
>
> I'd like to implement the `Weight#count` method in `LRUQueryCache` so cached queries can quickly return their counts. We already have a count on all of the bit sets we use for the query cache; we just have to store it and "plug it in".
>
> I got here because we frequently end up wanting to get counts and I saw `RoaringDocIdSet`'s iterator hot spotting. I don't think it's slow or anything, but when the collector is just `count++` the iterator overhead is substantial. It seems like we could frequently avoid the whole thing by implementing `count` in the query cache.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
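The mechanism described in the commit is simple to model outside Lucene: the cached bit set's cardinality is computed once at cache-load time and stored, so a later count request becomes a field read instead of an iteration. A self-contained sketch (class and method names are illustrative, not Lucene's actual `LRUQueryCache` internals):

```java
import java.util.BitSet;

// Minimal model of the idea behind LUCENE-10206: store the match count
// alongside the cached bit set so a count query is O(1) after caching.
public class CachedCount {
    static final class CacheEntry {
        final BitSet matches;
        final int count; // computed once when the entry is loaded

        CacheEntry(BitSet matches) {
            this.matches = matches;
            this.count = matches.cardinality(); // O(n), paid a single time
        }

        int count() {
            return count; // O(1) on every subsequent call
        }
    }

    public static void main(String[] args) {
        BitSet b = new BitSet();
        b.set(3);
        b.set(7);
        b.set(42);
        CacheEntry e = new CacheEntry(b);
        System.out.println(e.count()); // 3
    }
}
```

The design point is that the cache already iterates every matching doc while building the bit set, so remembering the count costs nothing extra.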
[GitHub] [lucene-solr] janhoy commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
janhoy commented on a change in pull request #2594:
URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r737229976

## File path: solr/solr-ref-guide/src/quickstart.adoc

@@ -0,0 +1,140 @@
+= Quickstart Guide
+:experimental:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+Here's a quickstart guide to start Solr, add some documents and perform some searches.
+
+== Starting Solr
+
+Start a Solr node in cluster mode (SolrCloud mode)
+
+[source,subs="verbatim,attributes+"]
+$ bin/solr -c
+
+Waiting up to 180 seconds to see Solr running on port 8983 [\]
+Started Solr server on port 8983 (pid=34942). Happy searching!
+
+To start another Solr node and have it join the cluster alongside the first node,
+
+[source,subs="verbatim,attributes+"]
+$ bin/solr -c -z localhost:9983 -p 8984
+
+An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to .
+
+== Creating a collection
+
+Like a database system holds data in tables, Solr holds data in collections.
+A collection can be created as follows:
+
+[source,subs="verbatim,attributes+"]
+$ curl --request POST \
+  --url http://localhost:8983/api/collections \
+  --header 'Content-Type: application/json' \
+  --data '{
+  "create": {
+    "name": "techproducts",
+    "numShards": 1,
+    "replicationFactor": 1

Review comment: I thought the same. If the consensus is that we're going away from field guessing, then we should not promote the current _default config, but rather be explicit and reference the bundled `techproducts` configset. Or better, show them how to use Schema Designer to set up a configset for a certain dataset?
[jira] [Resolved] (LUCENE-10206) Implement O(1) count on query cache
[ https://issues.apache.org/jira/browse/LUCENE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-10206.
Fix Version/s: main (9.0)
Resolution: Fixed

> Implement O(1) count on query cache
>
> Key: LUCENE-10206
> URL: https://issues.apache.org/jira/browse/LUCENE-10206
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Nik Everett
> Priority: Minor
> Fix For: main (9.0)
> Time Spent: 1h 10m
> Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434769#comment-17434769 ]

Michael McCandless commented on LUCENE-10207:

I love this idea! Using the aggregate term statistics already in the index to efficiently guesstimate the cost on the index side of things. The user can always override the decision if they know something is unusual about their index? (Hmm, maybe not: looks like the logic is hardcoded deep inside an anonymous {{ScorerSupplier}} in {{IoDVQ}}).

Should we try to take deletions into account at all? Because a PK field with deletions will look like it is not "precisely" PK based on the aggregate stats. Though I suppose even with e.g. 50% deletions in the index, this proposed cost metric is close enough.

> Make TermInSetQuery usable with IndexOrDocValuesQuery
>
> Key: LUCENE-10207
> URL: https://issues.apache.org/jira/browse/LUCENE-10207
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a query depending on other bits of the query tree. We would like to be able to use it to optimize execution of TermInSetQuery. However IndexOrDocValuesQuery only works well if the "index" query can give an estimation of the cost of the query without doing anything expensive (like looking up all terms of the TermInSetQuery in the terms dict). Maybe we could implement it for primary keys (terms.size() == sumDocFreq) by returning the number of terms of the query? Another idea is to multiply the number of terms by the average postings length, though this could be dangerous if the field has a zipfian distribution and some terms have a much higher doc frequency than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently [~mikemccand] and [~gsmiller] again independently. So it looks like there is interest in this. Here is an email thread where this was recently discussed: https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E
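The heuristic proposed in the issue can be sketched in isolation: `terms.size() == sumDocFreq` identifies a primary-key-like field (every term occurs in exactly one document), where the cost of a term-set query is just its number of terms; otherwise the fallback multiplies by the average postings length. This is a toy model with invented names, not the actual `TermInSetQuery` code:

```java
// Toy model of the cost estimation discussed in LUCENE-10207.
public class TermInSetCost {
    // termCount: number of unique terms in the field
    // sumDocFreq: total number of term-document pairs for the field
    // queryTermCount: number of terms in the TermInSet-style query
    static long estimateCost(long termCount, long sumDocFreq, int queryTermCount) {
        if (termCount == sumDocFreq) {
            // Primary-key-like field: each term matches exactly one doc,
            // so the query matches at most queryTermCount docs.
            return queryTermCount;
        }
        // Fallback: scale by the average postings length. As the issue notes,
        // this can badly underestimate for zipfian fields.
        long avgPostings = sumDocFreq / Math.max(1, termCount);
        return (long) queryTermCount * avgPostings;
    }

    public static void main(String[] args) {
        System.out.println(estimateCost(1_000_000, 1_000_000, 50)); // 50
        System.out.println(estimateCost(1_000, 10_000, 50)); // 500
    }
}
```

Both statistics are cheap aggregate values already stored per field, which is the whole appeal: no per-term lookups are needed to produce the estimate.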
[jira] [Commented] (LUCENE-9660) gradle task cache should not cache --tests
[ https://issues.apache.org/jira/browse/LUCENE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434803#comment-17434803 ]

ASF subversion and git services commented on LUCENE-9660:

Commit 486141f0eb01c892dbeeed67060b5b4adc77d38d in lucene's branch refs/heads/hnsw from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=486141f ]

LUCENE-9660: correct help/tests.txt.

> gradle task cache should not cache --tests
>
> Key: LUCENE-9660
> URL: https://issues.apache.org/jira/browse/LUCENE-9660
> Project: Lucene - Core
> Issue Type: Improvement
> Components: general/build
> Reporter: David Smiley
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: main (9.0)
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I recently ran a specific test at the CLI via gradle to see if a particular build failure repeats. It includes the {{--tests}} command line option to specify the test. The test passed. Later I wanted to run it again; I suspected it might be flaky. Gradle completed in 10 seconds, and I'm certain it didn't actually run the test. There was no printout, and the build/test-results/test/outputs/... from the test run still had not changed from previously.
> Mike Drob informed me of "gradlew cleanTest" but I'd prefer to not have to know about that, at least not for the specific case of wanting to execute a specific test.
> CC [~dweiss]
[jira] [Commented] (LUCENE-10163) Review top-level *.txt and *.md files
[ https://issues.apache.org/jira/browse/LUCENE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434808#comment-17434808 ]

ASF subversion and git services commented on LUCENE-10163:

Commit 1613355149e5fc11d0804b457742f5862e843ae2 in lucene's branch refs/heads/hnsw from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1613355 ]

LUCENE-10163: update smoke tester - README inside lucene/ is no longer there in the source release.

> Review top-level *.txt and *.md files
>
> Key: LUCENE-10163
> URL: https://issues.apache.org/jira/browse/LUCENE-10163
> Project: Lucene - Core
> Issue Type: Sub-task
> Reporter: Dawid Weiss
> Priority: Major
> Fix For: main (9.0)
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Some of them contain obsolete pointers and information (SYSTEM_REQUIREMENTS.md, etc.).
> Also, move the files that are distribution-specific (lucene/README.md) to the distribution project. Otherwise they give odd, incorrect information like:
> {code}
> To review the documentation, read the main documentation page, located at:
> `docs/index.html`
> {code}
[jira] [Commented] (LUCENE-9660) gradle task cache should not cache --tests
[ https://issues.apache.org/jira/browse/LUCENE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434801#comment-17434801 ]

ASF subversion and git services commented on LUCENE-9660:

Commit 81f5b4d6423958890876bd755e4ed68c73fbb612 in lucene's branch refs/heads/hnsw from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=81f5b4d ]

LUCENE-9660: add tests.neverUpToDate=true option which, by default, makes test tasks always execute. (#410)

> gradle task cache should not cache --tests
>
> Key: LUCENE-9660
> URL: https://issues.apache.org/jira/browse/LUCENE-9660
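The direction of the committed change can be sketched as a Gradle configuration fragment. The `tests.neverUpToDate` property name comes from the commit message above; the exact wiring in Lucene's build files may differ, so treat this as an illustration of the mechanism only:

```groovy
// Sketch: make Test tasks always re-run unless the user opts out.
// With "tests.neverUpToDate" defaulting to true, `gradlew test --tests Foo`
// re-executes the test even when Gradle would consider the task up to date.
tasks.withType(Test).configureEach {
    def neverUpToDate = providers.gradleProperty("tests.neverUpToDate")
            .getOrElse("true").toBoolean()
    if (neverUpToDate) {
        outputs.upToDateWhen { false }
    }
}
```

`outputs.upToDateWhen { false }` is the standard Gradle lever for disabling up-to-date checks on a task, which addresses the original complaint without requiring users to know about `cleanTest`.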
[jira] [Commented] (LUCENE-10198) Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
[ https://issues.apache.org/jira/browse/LUCENE-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434807#comment-17434807 ]

ASF subversion and git services commented on LUCENE-10198:

Commit 4329450392f11303fdd8ed5352d9cfffca8dc8c1 in lucene's branch refs/heads/hnsw from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4329450 ]

LUCENE-10198: remove debug statement that crept in.

> Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
>
> Key: LUCENE-10198
> URL: https://issues.apache.org/jira/browse/LUCENE-10198
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Major
> Fix For: main (9.0)
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10154) NumericLeafComparator to define getPointValues
[ https://issues.apache.org/jira/browse/LUCENE-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434802#comment-17434802 ]

ASF subversion and git services commented on LUCENE-10154:

Commit 2ed6e4aa78eb6d1fbb90c21c9723313ab5077e83 in lucene's branch refs/heads/hnsw from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2ed6e4a ]

LUCENE-10154 NumericLeafComparator to define getPointValues (#364)

This patch adds getPointValues to NumericLeafComparator, similar to how it has getNumericDocValues. Numeric sort optimization with points relies on the assumption that points and doc values record the same information, as we substitute the iterator over doc values with one over points. If we override getNumericDocValues, it almost certainly means that whatever PointValues NumericComparator is going to look at shouldn't be used to skip non-competitive documents. Returning null for pointValues in this case will force the comparator NOT to use sort optimization with points, and to continue with the traditional way of iterating over doc values.

> NumericLeafComparator to define getPointValues
>
> Key: LUCENE-10154
> URL: https://issues.apache.org/jira/browse/LUCENE-10154
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Mayya Sharipova
> Priority: Minor
> Fix For: main (9.0), 8.11
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> NumericLeafComparator must have a method getPointValues similar to how it has getNumericDocValues. Numeric sort optimization with points relies on the assumption that points and doc values record the same information, as we substitute the iterator over doc_values with one over points. If we extend {{getNumericDocValues}}, it almost certainly means that whatever {{PointValues}} NumericComparator is going to look at shouldn't be used to skip non-competitive documents. Returning null for pointValues in this case will force the comparator NOT to use sort optimization with points, and to continue with the traditional way of iterating over doc values.
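The decision described in the commit message reduces to a null check: a null point source disables the skipping optimization and forces plain doc-values iteration. A toy model with invented names (not the real `NumericComparator`):

```java
// Toy model of the fallback decision in LUCENE-10154: a comparator may only
// skip non-competitive docs via points when points and doc values are known
// to record the same information; a null PointValues signals "don't trust it".
public class SortOptChoice {
    interface PointValues {} // stand-in for the real per-field point source

    static String chooseStrategy(PointValues pointValues) {
        if (pointValues == null) {
            // Subclass overrode the doc-values source: points may disagree,
            // so fall back to iterating doc values without skipping.
            return "iterate-doc-values";
        }
        return "skip-with-points";
    }

    public static void main(String[] args) {
        System.out.println(chooseStrategy(null)); // iterate-doc-values
        System.out.println(chooseStrategy(new PointValues() {})); // skip-with-points
    }
}
```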
[jira] [Commented] (LUCENE-10198) Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
[ https://issues.apache.org/jira/browse/LUCENE-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434804#comment-17434804 ]

ASF subversion and git services commented on LUCENE-10198:

Commit 780846a732b9c3f9c8b0abeae7d1d2c19df524e4 in lucene's branch refs/heads/hnsw from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=780846a ]

LUCENE-10198: Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies) (#405)

Co-authored-by: balmukundblr

> Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
>
> Key: LUCENE-10198
> URL: https://issues.apache.org/jira/browse/LUCENE-10198
[jira] [Commented] (LUCENE-10199) Drop ZIP binary distribution from release artifacts
[ https://issues.apache.org/jira/browse/LUCENE-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434806#comment-17434806 ]

ASF subversion and git services commented on LUCENE-10199:

Commit fb6aaa7b2c28749c93553c7ffb7e5f5a372ad9b3 in lucene's branch refs/heads/hnsw from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb6aaa7 ]

LUCENE-10199: drop binary .zip artifact. (#407)

> Drop ZIP binary distribution from release artifacts
>
> Key: LUCENE-10199
> URL: https://issues.apache.org/jira/browse/LUCENE-10199
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: main (9.0)
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10206) Implement O(1) count on query cache
[ https://issues.apache.org/jira/browse/LUCENE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434809#comment-17434809 ]

ASF subversion and git services commented on LUCENE-10206:

Commit 941df98c3f718371af4702c92bf6537739120064 in lucene's branch refs/heads/hnsw from Nik Everett
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=941df98 ]

LUCENE-10206 Implement O(1) count on query cache (#415)

When we load a query into the query cache we always calculate the count of matching documents. This uses that count to power the new `O(1)` `Weight#count` method.

> Implement O(1) count on query cache
>
> Key: LUCENE-10206
> URL: https://issues.apache.org/jira/browse/LUCENE-10206
[jira] [Commented] (LUCENE-10163) Review top-level *.txt and *.md files
[ https://issues.apache.org/jira/browse/LUCENE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434805#comment-17434805 ]

ASF subversion and git services commented on LUCENE-10163:

Commit 08c03566648c0b024b8160869b3d694c3cebaabd in lucene's branch refs/heads/hnsw from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=08c0356 ]

LUCENE-10163: clean up and remove some old cruft in readme files. Move binary release only README.md to the distribution project so that it doesn't look weird in the source tree. (#406)

> Review top-level *.txt and *.md files
>
> Key: LUCENE-10163
> URL: https://issues.apache.org/jira/browse/LUCENE-10163
[GitHub] [lucene] nik9000 commented on pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 commented on pull request #415:
URL: https://github.com/apache/lucene/pull/415#issuecomment-952923025

> jpountz merged commit 941df98 into apache:main 5 hours ago

Thanks!
[jira] [Created] (LUCENE-10208) Minimum score can decrease in concurrent search
Jim Ferenczi created LUCENE-10208:

Summary: Minimum score can decrease in concurrent search
Key: LUCENE-10208
URL: https://issues.apache.org/jira/browse/LUCENE-10208
Project: Lucene - Core
Issue Type: Bug
Reporter: Jim Ferenczi

TestLatLonPointDistanceFeatureQuery#testCompareSorting started to fail sporadically after https://github.com/apache/lucene/pull/331. The test change added in this PR exposes an existing bug in top-docs collectors: they re-set the minimum score multiple times per segment when a bulk scorer is used. In practice this is not a problem because the local minimum score cannot decrease. However, when concurrent search is used, the global minimum score is updated after the local one, which breaks the assertion.
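The invariant the failing assertion checks can be modeled in a few lines: a minimum competitive score shared across concurrent collectors must be non-decreasing, so publishing a segment-local minimum has to be clamped against the current global value. This is a sketch of the invariant only, with invented names, not the actual Lucene fix:

```java
// Model of the monotonicity invariant behind LUCENE-10208: the shared
// minimum competitive score may only go up, never down, even when
// concurrent collectors publish their local minimums out of order.
public class MinScore {
    private float globalMin = 0f;

    // Clamp so a stale or smaller local value can never lower the bound.
    synchronized float publish(float localMin) {
        globalMin = Math.max(globalMin, localMin);
        return globalMin;
    }

    public static void main(String[] args) {
        MinScore m = new MinScore();
        System.out.println(m.publish(0.5f)); // 0.5
        System.out.println(m.publish(0.3f)); // still 0.5: the bound holds
    }
}
```

Without the `Math.max` clamp, a collector that updates the global value after a later local update would make the observed bound decrease, which is exactly what the test's assertion caught.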
[GitHub] [lucene-solr] chatman commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
chatman commented on a change in pull request #2594:
URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r737538697

## File path: solr/solr-ref-guide/src/quickstart.adoc

Review comment: For quickstart examples, we don't need the user to use their own configsets. They can start with the default configset, add fields (schema API) and their indexing/searching.

> If the consensus is that we're going away from field guessing, then we should not promote the current _default config, but rather be explicit and reference the bundled techproducts configset.

I'm more inclined to remove the techproducts configset. It can be downloaded from some web resource for those who need it.

> Or better, show them how to use Schema Designer to setup a configset for a certain dataset?

+1
[GitHub] [lucene] apanimesh061 commented on a change in pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
apanimesh061 commented on a change in pull request #412:
URL: https://github.com/apache/lucene/pull/412#discussion_r737588612

## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java

@@ -143,6 +143,106 @@
   private int cacheFieldValCharsThreshold = DEFAULT_CACHE_CHARS_THRESHOLD;

+  /** Builder for UnifiedHighlighter. */
+  public abstract static class Builder<T extends Builder<T>> {
+    private IndexSearcher searcher;
+    private Analyzer indexAnalyzer;
+    private boolean handleMultiTermQuery = true;
+    private boolean highlightPhrasesStrictly = true;
+    private boolean passageRelevancyOverSpeed = true;
+    private int maxLength = DEFAULT_MAX_LENGTH;
+    private Supplier<BreakIterator> breakIterator =
+        () -> BreakIterator.getSentenceInstance(Locale.ROOT);
+    private Predicate<String> fieldMatcher;
+    private PassageScorer scorer = new PassageScorer();
+    private PassageFormatter formatter = new DefaultPassageFormatter();
+    private int maxNoHighlightPassages = -1;
+    private int cacheFieldValCharsThreshold = DEFAULT_CACHE_CHARS_THRESHOLD;
+
+    public T withSearcher(IndexSearcher value) {
+      this.searcher = value;
+      return self();
+    }
+
+    public T withIndexAnalyzer(Analyzer value) {
+      this.indexAnalyzer = value;
+      return self();
+    }
+
+    public T withHandleMultiTermQuery(boolean value) {
+      this.handleMultiTermQuery = value;
+      return self();
+    }
+
+    public T withHighlightPhrasesStrictly(boolean value) {
+      this.highlightPhrasesStrictly = value;
+      return self();
+    }
+
+    public T withPassageRelevancyOverSpeed(boolean value) {
+      this.passageRelevancyOverSpeed = value;
+      return self();
+    }
+
+    public T withMaxLength(int value) {
+      if (value < 0 || value == Integer.MAX_VALUE) {
+        // two reasons: no overflow problems in BreakIterator.preceding(offset+1),
+        // our sentinel in the offsets queue uses this value to terminate.
+        throw new IllegalArgumentException("maxLength must be < Integer.MAX_VALUE");
+      }
+      this.maxLength = value;
+      return self();
+    }
+
+    public T withBreakIterator(Supplier<BreakIterator> value) {
+      this.breakIterator = value;
+      return self();
+    }
+
+    public T withFieldMatcher(Predicate<String> value) {
+      this.fieldMatcher = value;
+      return self();
+    }
+
+    public T withScorer(PassageScorer value) {
+      this.scorer = value;
+      return self();
+    }
+
+    public T withFormatter(PassageFormatter value) {
+      this.formatter = value;
+      return self();
+    }
+
+    public T withMaxNoHighlightPassages(int value) {
+      this.maxNoHighlightPassages = value;
+      return self();
+    }
+
+    public T withCacheFieldValCharsThreshold(int value) {
+      this.cacheFieldValCharsThreshold = value;
+      return self();
+    }
+
+    protected abstract T self();
+
+    public UnifiedHighlighter build() {
+      return new UnifiedHighlighter(this);
+    }
+  }
+
+  // Why? https://web.archive.org/web/20150920054846/https://weblogs.java.net/node/642849

Review comment: @dsmiley Is there a way to run the checks again on the code? I see that 1/3 checks failed. The failure was due to `socket hang up`. I wonder if retrying might work.
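The `Builder<T extends Builder<T>>` shape referenced by the archived "Why?" link is the recursive-generic (self-type) builder idiom: each `with*` method returns `self()`, so chained calls on inherited setters keep the subclass's type. A standalone illustration with invented names, independent of `UnifiedHighlighter`:

```java
// The F-bounded builder idiom: Builder<T extends Builder<T>> lets a base
// class's fluent setters return the concrete subclass type via self().
public class SelfTypedBuilder {
    abstract static class Builder<T extends Builder<T>> {
        int maxLength = 10_000;

        T withMaxLength(int value) {
            if (value < 0 || value == Integer.MAX_VALUE) {
                throw new IllegalArgumentException("maxLength must be < Integer.MAX_VALUE");
            }
            this.maxLength = value;
            return self(); // returns the subclass type, not Builder
        }

        protected abstract T self();
    }

    static final class ConcreteBuilder extends Builder<ConcreteBuilder> {
        boolean extra;

        ConcreteBuilder withExtra(boolean value) {
            this.extra = value;
            return self();
        }

        @Override
        protected ConcreteBuilder self() {
            return this;
        }
    }

    public static void main(String[] args) {
        // Calling the inherited withMaxLength first still yields a
        // ConcreteBuilder, which is the whole point of self().
        ConcreteBuilder b = new ConcreteBuilder().withMaxLength(100).withExtra(true);
        System.out.println(b.maxLength + " " + b.extra); // 100 true
    }
}
```

Without the self-type, `withMaxLength` would return the base `Builder` and the subsequent `withExtra` call would not compile.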
[GitHub] [lucene] jtibshirani commented on a change in pull request #413: LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0
jtibshirani commented on a change in pull request #413: URL: https://github.com/apache/lucene/pull/413#discussion_r737599366

## File path: lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java
## @@ -25,18 +25,10 @@
 import java.io.IOException;
 import java.util.HashSet;
 import java.util.Set;
-import org.apache.lucene.document.Document;
-import org.apache.lucene.document.Field;
-import org.apache.lucene.document.KnnVectorField;
-import org.apache.lucene.document.StringField;
-import org.apache.lucene.index.DirectoryReader;
-import org.apache.lucene.index.IndexReader;
-import org.apache.lucene.index.IndexWriter;
-import org.apache.lucene.index.IndexWriterConfig;
-import org.apache.lucene.index.RandomIndexWriter;
-import org.apache.lucene.index.Term;
-import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.document.*;
+import org.apache.lucene.index.*;

Review comment: I also noticed our static analysis is totally fine with it (surprisingly?) I'll need to fix my IntelliJ setup :)
[GitHub] [lucene] mayya-sharipova opened a new pull request #416: LUCENE-10054 Make HnswGraph hierarchical
mayya-sharipova opened a new pull request #416: URL: https://github.com/apache/lucene/pull/416 Currently HNSW has only a single layer. This patch attempts to make it multi-layered.
[GitHub] [lucene] mayya-sharipova commented on pull request #416: LUCENE-10054 Make HnswGraph hierarchical
mayya-sharipova commented on pull request #416: URL: https://github.com/apache/lucene/pull/416#issuecomment-953142944

Benchmarking based on @jtibshirani [setup](https://github.com/jtibshirani/lucene/pull/1)

baseline: main branch
candidate: this PR

**glove-25-angular**

| | baseline recall | baseline QPS | candidate recall | candidate QPS |
| --- | ---: | ---: | ---: | ---: |
| n_cands=10 | 0.626 | 10962.821 | 0.631 | 8869.807 |
| n_cands=50 | 0.888 | 4409.952 | 0.889 | 4111.685 |
| n_cands=100 | 0.946 | 2621.846 | 0.947 | 2734.787 |
| n_cands=500 | 0.994 | 661.253 | 0.994 | 686.700 |
| n_cands=800 | 0.997 | 430.172 | 0.997 | 459.356 |
| n_cands=1000 | 0.998 | 342.915 | 0.998 | 355.238 |

**sift-128-euclidean**

| | baseline recall | baseline QPS | candidate recall | candidate QPS |
| --- | ---: | ---: | ---: | ---: |
| n_cands=10 | 0.601 | 6948.736 | 0.607 | 6677.931 |
| n_cands=50 | 0.889 | 3003.781 | 0.892 | 3202.925 |
| n_cands=100 | 0.952 | 1622.276 | 0.953 | 1996.992 |
| n_cands=500 | 0.996 | 444.135 | 0.996 | 540.368 |
| n_cands=800 | 0.998 | 296.835 | 0.998 | 367.316 |
| n_cands=1000 | 0.999 | 245.498 | 0.999 | 311.339 |
[GitHub] [lucene] mayya-sharipova edited a comment on pull request #416: LUCENE-10054 Make HnswGraph hierarchical
mayya-sharipova edited a comment on pull request #416: URL: https://github.com/apache/lucene/pull/416#issuecomment-953142944

As can be seen from the comparison, the hierarchy brings only a very slight change: a small increase in recall at the expense of lower QPS.
[GitHub] [lucene] jtibshirani merged pull request #413: LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0
jtibshirani merged pull request #413: URL: https://github.com/apache/lucene/pull/413
[jira] [Commented] (LUCENE-9614) Implement KNN Query
[ https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434996#comment-17434996 ] ASF subversion and git services commented on LUCENE-9614:

Commit abd5ec4ff0b56b1abfc2883e47e75871e60d3cad in lucene's branch refs/heads/main from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=abd5ec4 ]

LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0 (#413)

When the reader has no live docs, `KnnVectorQuery` can error out. This happens because `IndexReader#numDocs` is 0, and we end up passing an illegal value of `k = 0` to the search method. This commit removes the problematic optimization in `KnnVectorQuery` and replaces it with a lower-level check based on the total number of vectors in the segment.

> Implement KNN Query
> Key: LUCENE-9614
> URL: https://issues.apache.org/jira/browse/LUCENE-9614
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Michael Sokolov
> Priority: Major
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> Now we have a vector index format, and one vector indexing/KNN search implementation, but the interface is low-level: you can search across a single segment only. We would like to expose a Query implementation. Initially, we want to support a usage where the KnnVectorQuery selects the k-nearest neighbors without regard to any other constraints, and these can then be filtered as part of an enclosing Boolean or other query. Later we will want to explore some kind of filtering *while* performing vector search, or a re-entrant search process that can yield further results. Because of the nature of knn search (all documents having any vector value match), it is more like a ranking than a filtering operation, and it doesn't really make sense to provide an iterator interface that can be merged in the usual way, in docid order, skipping ahead. It's not yet clear how to satisfy a query that is "k nearest neighbors satisfying some arbitrary Query", at least not without realizing a complete bitset for the Query. But this is for a later issue; *this* issue is just about performing the knn search in isolation, computing a set of (some given) K nearest neighbors, and providing an iterator over those.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
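The fix described in the commit message boils down to a guard on `k`: never hand the per-segment vector search a non-positive `k`, and skip segments that hold no vectors. A standalone illustration in plain Java; `effectiveK` and `shouldSkipSegment` are hypothetical helper names, not Lucene's actual API:

```java
// Sketch of the guard the commit above describes: never hand the vector
// search an illegal k. Method names here are illustrative, not Lucene's API.
public class KnnGuard {
  /** Clamp the requested k to the number of vectors actually present. */
  static int effectiveK(int requestedK, int numVectorsInSegment) {
    if (requestedK <= 0) {
      throw new IllegalArgumentException("k must be positive, got " + requestedK);
    }
    return Math.min(requestedK, numVectorsInSegment);
  }

  /** A segment with no vectors (e.g. no live docs) contributes no hits at all. */
  static boolean shouldSkipSegment(int numVectorsInSegment) {
    return numVectorsInSegment == 0;
  }

  public static void main(String[] args) {
    System.out.println(effectiveK(10, 3));    // prints 3: only 3 vectors exist
    System.out.println(shouldSkipSegment(0)); // prints true: skip, don't search with k=0
  }
}
```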
[GitHub] [lucene] jtibshirani commented on pull request #416: LUCENE-10054 Make HnswGraph hierarchical
jtibshirani commented on pull request #416: URL: https://github.com/apache/lucene/pull/416#issuecomment-953198759

> As can be seen from the comparison, there is very slight change that the hierarchy brings: a small increase in recall by at the expense of lower QPSs

It looks like QPS is sometimes worse, but often better (like in all the sift-128-euclidean runs, num_cands >= 50). I wonder if the first runs are affected by a lack of warm-up? My original set-up you linked to didn't include a warm-up, and in LUCENE-9937 we found that this can have a big impact on the first runs.
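The warm-up effect mentioned here is easy to reproduce outside Lucene: the first timed runs include JIT-compilation and cache-population costs, so un-warmed QPS under-reports steady-state throughput. A minimal self-contained sketch in plain Java, with a synthetic workload standing in for real queries:

```java
// Minimal illustration of why benchmark warm-up matters: the first timed
// batch pays interpreter/JIT ramp-up costs that later batches do not.
public class WarmupDemo {
  static volatile long sink;

  // Synthetic "query": enough work that timing it is meaningful.
  static void query() {
    long acc = 0;
    for (int i = 0; i < 10_000; i++) {
      acc += Long.rotateLeft(acc + i, 7) ^ i;
    }
    sink = acc; // publish the result so the JIT cannot eliminate the loop
  }

  static double measureQps(int iterations) {
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      query();
    }
    double seconds = Math.max((System.nanoTime() - start) / 1e9, 1e-9);
    return iterations / seconds;
  }

  public static void main(String[] args) {
    double cold = measureQps(500); // first runs: interpreter + JIT ramp-up
    double warm = measureQps(500); // later runs: mostly compiled, caches hot
    // warm is usually (though not on every run) noticeably higher than cold
    System.out.printf("cold=%.0f qps, warm=%.0f qps%n", cold, warm);
  }
}
```

This is why a benchmark harness typically runs a discarded warm-up phase (as the ann-benchmarks setup was changed to do) before recording any numbers.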
[jira] [Updated] (LUCENE-10200) Restructure and modernize the release artifacts
[ https://issues.apache.org/jira/browse/LUCENE-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10200: - Description: This is an umbrella issue for various sub-tasks as per my e-mail [1]. [1] [https://markmail.org/thread/f7yrggnynq2ijgmy] In this order, perhaps: * Apply small text file changes (LUCENE-10163) * Simplify artifacts (LUCENE-10199 drop ZIP binary), (LUCENE-10192 drop third party JARs). * Create an additional binary artifact for Luke (LUCENE-9978). * Review the content of licenses/ - there are some entries there that relate to tests only (jetty). * Test everything with the smoke tester. was: This is an umbrella issue for various sub-tasks as per my e-mail [1]. [1] https://markmail.org/thread/f7yrggnynq2ijgmy In this order, perhaps: * Apply small text file changes (LUCENE-10163) * Simplify artifacts (LUCENE-10199 drop ZIP binary), (LUCENE-10192 drop third party JARs). * Create an additional binary artifact for Luke (LUCENE-9978). * Review the content of licenses/ - there are some entries there that relate to tests only (jetty) or oddballs like elegant-icon-font or a stray pddl*.txt. * Test everything with the smoke tester. > Restructure and modernize the release artifacts > --- > > Key: LUCENE-10200 > URL: https://issues.apache.org/jira/browse/LUCENE-10200 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > > This is an umbrella issue for various sub-tasks as per my e-mail [1]. > [1] [https://markmail.org/thread/f7yrggnynq2ijgmy] > In this order, perhaps: > * Apply small text file changes (LUCENE-10163) > * Simplify artifacts (LUCENE-10199 drop ZIP binary), (LUCENE-10192 drop > third party JARs). > * Create an additional binary artifact for Luke (LUCENE-9978). > * Review the content of licenses/ - there are some entries there that relate > to tests only (jetty). > * Test everything with the smoke tester. 
[jira] [Commented] (LUCENE-10200) Restructure and modernize the release artifacts
[ https://issues.apache.org/jira/browse/LUCENE-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435007#comment-17435007 ] ASF subversion and git services commented on LUCENE-10200: -- Commit 62eb9a809e8e6327df0006efd342b980b2d18bd9 in lucene's branch refs/heads/main from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=62eb9a8 ] LUCENE-10200: remove unused dangling license exclusions. Add references to the remaining ones. > Restructure and modernize the release artifacts > --- > > Key: LUCENE-10200 > URL: https://issues.apache.org/jira/browse/LUCENE-10200 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > > This is an umbrella issue for various sub-tasks as per my e-mail [1]. > [1] https://markmail.org/thread/f7yrggnynq2ijgmy > In this order, perhaps: > * Apply small text file changes (LUCENE-10163) > * Simplify artifacts (LUCENE-10199 drop ZIP binary), (LUCENE-10192 drop third > party JARs). > * Create an additional binary artifact for Luke (LUCENE-9978). > * Review the content of licenses/ - there are some entries there that relate > to tests only (jetty) or oddballs like elegant-icon-font or a stray pddl*.txt. > * Test everything with the smoke tester. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10209) Gradle wrapper validation gh workflow step fails with odd messages
[ https://issues.apache.org/jira/browse/LUCENE-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435016#comment-17435016 ] ASF subversion and git services commented on LUCENE-10209:

Commit 727c6b1e0b1429bc521174ab5c60bebf0e0178e1 in lucene's branch refs/heads/main from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=727c6b1 ]

LUCENE-10209: Temporarily comment out gradle validation.

> Gradle wrapper validation gh workflow step fails with odd messages
> Key: LUCENE-10209
> URL: https://issues.apache.org/jira/browse/LUCENE-10209
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Dawid Weiss
> Priority: Minor
>
> I will comment it out for the time being. Don't know what's causing it.
> https://github.com/gradle/wrapper-validation-action/issues/46
[jira] [Created] (LUCENE-10209) Gradle wrapper validation gh workflow step fails with odd messages
Dawid Weiss created LUCENE-10209:

Summary: Gradle wrapper validation gh workflow step fails with odd messages
Key: LUCENE-10209
URL: https://issues.apache.org/jira/browse/LUCENE-10209
Project: Lucene - Core
Issue Type: Task
Reporter: Dawid Weiss

I will comment it out for the time being. Don't know what's causing it.
https://github.com/gradle/wrapper-validation-action/issues/46
[jira] [Assigned] (LUCENE-10209) Gradle wrapper validation gh workflow step fails with odd messages
[ https://issues.apache.org/jira/browse/LUCENE-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss reassigned LUCENE-10209:

Assignee: Dawid Weiss

> Gradle wrapper validation gh workflow step fails with odd messages
> Key: LUCENE-10209
> URL: https://issues.apache.org/jira/browse/LUCENE-10209
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
>
> I will comment it out for the time being. Don't know what's causing it.
> https://github.com/gradle/wrapper-validation-action/issues/46
[GitHub] [lucene] dsmiley commented on a change in pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
dsmiley commented on a change in pull request #412: URL: https://github.com/apache/lucene/pull/412#discussion_r737770485

## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java

Review comment: I suspect you intended a top-level comment but responded to a previous thread about builder subclassing. Anyway, I re-ran them and they passed. I wouldn't worry about this; just run locally.
[GitHub] [lucene] mayya-sharipova edited a comment on pull request #416: LUCENE-10054 Make HnswGraph hierarchical
mayya-sharipova edited a comment on pull request #416: URL: https://github.com/apache/lucene/pull/416#issuecomment-953142944

Benchmarking based on @jtibshirani [setup](https://github.com/jtibshirani/lucene/pull/1)

baseline: main branch
candidate: this PR
[GitHub] [lucene] mayya-sharipova commented on pull request #416: LUCENE-10054 Make HnswGraph hierarchical
mayya-sharipova commented on pull request #416: URL: https://github.com/apache/lucene/pull/416#issuecomment-953265461 @jtibshirani Thanks for the comment. > I wonder if the first runs are affected by a lack of warm-up? I've added a warmup stage as well, but starting with bogus query args in ann benchmarking algorithm. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova edited a comment on pull request #416: LUCENE-10054 Make HnswGraph hierarchical
mayya-sharipova edited a comment on pull request #416: URL: https://github.com/apache/lucene/pull/416#issuecomment-953265461 @jtibshirani Thanks for the comment. > I wonder if the first runs are affected by a lack of warm-up? I've added a warmup stage as well, but starting with bogus query args in [ann benchmarking algorithm](https://github.com/jtibshirani/ann-benchmarks/blob/lucene-hnsw/algos.yaml#L70) . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] apanimesh061 commented on a change in pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
apanimesh061 commented on a change in pull request #412: URL: https://github.com/apache/lucene/pull/412#discussion_r737860855

## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java

Review comment: @dsmiley ah yes you are right. Thanks for rerunning the tests. Does the new builder class look right to you? Is it expected to remove all setters from this class? This would mean I'll have to modify all their references in other classes and unit tests and replace them with builders.
[GitHub] [lucene] dsmiley commented on pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
dsmiley commented on pull request #412: URL: https://github.com/apache/lucene/pull/412#issuecomment-953329902 > Does the new builder class look right to you? Is it expected to remove all setters from this class? This would mean I'll have to modify all their references in other classes and unit tests and replace them with builders. It looks good at a glance... you/I will see better if you update one of the clients that might want to subclass with extra configuration. Is there any or is this builder subclassing issue entirely hypothetical at this point? I suspect only hypothetical. We'll want nice Javadocs on the builder setters since this is where consumers/clients will see it. We can merely move the docs there from the existing locations, and add javadoc references pointing to the builder from the existing fields/enum values as desired. > This would mean I'll have to modify all their references in other classes and unit tests and replace them with builders. Yes, that's the point of this issue. You might try updating just one/two source files (presumably tests) and see how it goes. If there's some ugliness that brings doubt then maybe stop and share, otherwise continue. RE 9.0. If this doesn't make 9.0, then the actual removal of the setters would happen in 10 but the rest of it (the builder) could arrive in 9.1. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] apanimesh061 commented on pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
apanimesh061 commented on pull request #412: URL: https://github.com/apache/lucene/pull/412#issuecomment-95323 > It looks good at a glance... you/I will see better if you update one of the clients that might want to subclass with extra configuration. Is there any or is this builder subclassing issue entirely hypothetical at this point? I suspect only hypothetical. We'll want nice Javadocs on the builder setters since this is where consumers/clients will see it. We can merely move the docs there from the existing locations, and add javadoc references pointing to the builder from the existing fields/enum values as desired. Great. I added a unit test (just for demo) and a class `SubUnifiedHighlighter` in `TestUnifiedHighlighter.java` where I've added a new test field and also tested it. It does look right to me since it is able to use the new field and also fields from parent class. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
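The subclassing scenario discussed above (a `SubUnifiedHighlighter` whose builder adds one extra field while still chaining the parent's setters) can be sketched in plain Java. All names below are illustrative stand-ins, not the actual PR test code:

```java
// Sketch of a subclassed highlighter + builder in the self-typed builder style.
// "Base"/"Sub" are hypothetical stand-ins for UnifiedHighlighter and a subclass.
class Base {
  final int maxLength;

  protected Base(Builder<?> b) {
    this.maxLength = b.maxLength;
  }

  public abstract static class Builder<T extends Builder<T>> {
    int maxLength = 10_000;

    public T withMaxLength(int v) {
      this.maxLength = v;
      return self();
    }

    protected abstract T self();

    public Base build() {
      return new Base(this);
    }
  }
}

class Sub extends Base {
  final String extraField; // configuration that only the subclass has

  private Sub(Builder b) {
    super(b);
    this.extraField = b.extraField;
  }

  public static class Builder extends Base.Builder<Builder> {
    String extraField = "default";

    public Builder withExtraField(String v) {
      this.extraField = v;
      return self();
    }

    @Override
    protected Builder self() {
      return this;
    }

    @Override
    public Sub build() {
      return new Sub(this); // covariant return: callers get a Sub, not a Base
    }
  }
}

public class SubBuilderDemo {
  public static void main(String[] args) {
    // Parent setter first, then subclass setter: self() keeps the Sub.Builder type.
    Sub s = new Sub.Builder().withMaxLength(500).withExtraField("demo").build();
    System.out.println(s.maxLength + " " + s.extraField); // prints "500 demo"
  }
}
```

The chained call compiles only because `withMaxLength` returns `T` (here `Sub.Builder`) rather than the base builder type; that is the whole payoff of the `Builder<T extends Builder<T>>` signature.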
[jira] [Updated] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-10207: - Attachment: LUCENE-10207_multitermquery.patch > Make TermInSetQuery usable with IndexOrDocValuesQuery > - > > Key: LUCENE-10207 > URL: https://issues.apache.org/jira/browse/LUCENE-10207 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-10207_multitermquery.patch > > > IndexOrDocValuesQuery is very useful to pick the right execution mode for a > query depending on other bits of the query tree. > We would like to be able to use it to optimize execution of TermInSetQuery. > However IndexOrDocValuesQuery only works well if the "index" query can give > an estimation of the cost of the query without doing anything expensive (like > looking up all terms of the TermInSetQuery in the terms dict). Maybe we could > implement it for primary keys (terms.size() == sumDocFreq) by returning the > number of terms of the query? Another idea is to multiply the number of terms > by the average postings length, though this could be dangerous if the field > has a zipfian distribution and some terms have a much higher doc frequency > than the average. > [~romseygeek] and I were discussing this a few weeks ago, and more recently > [~mikemccand] and [~gsmiller] again independently. So it looks like there is > interest in this. Here is an email thread where this was recently discussed: > https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
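The cheap cost estimate the issue description proposes can be sketched numerically: for a primary-key-like field (where `terms.size() == sumDocFreq`) each query term matches at most one document, and otherwise one could scale by the average postings length, with the zipfian-distribution caveat the description notes. A plain-Java sketch under those assumptions, with illustrative names only:

```java
// Sketch of the cheap cost estimate discussed above for a TermInSetQuery with
// `queryTermCount` terms against a field with `fieldTermCount` unique terms and
// `sumDocFreq` total postings. Names and the heuristic itself are illustrative,
// taken from the issue discussion, not from a committed Lucene implementation.
public class TermInSetCostEstimate {
  static long estimateCost(long queryTermCount, long fieldTermCount, long sumDocFreq) {
    if (fieldTermCount == sumDocFreq) {
      // Primary-key-like field: every term matches exactly one document.
      return queryTermCount;
    }
    // Otherwise scale by the average postings length. This can be far off for
    // zipfian fields where a few terms dominate -- the caveat in the issue.
    long avgPostings = Math.max(1, sumDocFreq / Math.max(1, fieldTermCount));
    return queryTermCount * avgPostings;
  }

  public static void main(String[] args) {
    System.out.println(estimateCost(100, 1_000_000, 1_000_000)); // PK-like field -> 100
    System.out.println(estimateCost(100, 1_000, 50_000));        // avg 50 postings -> 5000
  }
}
```

The point of keeping the estimate this cheap is that `IndexOrDocValuesQuery` needs a cost before any terms-dictionary lookups happen, so anything that seeks the terms dict per query term defeats the purpose.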
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435109#comment-17435109 ]

Robert Muir commented on LUCENE-10207:
--------------------------------------

I attached a patch that refactors {{TermInSetQuery}} to extend {{MultiTermQuery}}. Instead of {{seekExact}}'ing to e.g. thousands of terms like the current query, it acts more like AutomatonQuery: it ping-pong intersects the {{PrefixCodedTerms}} against the terms dictionary.

With the change, if you want it to run against doc values instead of terms/postings, you can just call {{termInSetQuery.setRewriteMethod(new DocValuesRewriteMethod())}} and it should work, providing a "slow" implementation.
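The "ping-pong" intersection Robert describes can be illustrated with a toy sketch over two sorted term lists. This is not the actual Lucene implementation (which works on `PrefixCodedTerms` and the terms dictionary via `TermsEnum#intersect`); it only shows the high-level idea of each side skipping ahead to the other side's current position instead of doing an independent lookup per query term.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of ping-pong intersection between a sorted list of query
// terms and a sorted terms dictionary. All names here are for illustration.
public final class PingPongIntersect {

  public static List<String> intersect(String[] queryTerms, String[] dictTerms) {
    List<String> matches = new ArrayList<>();
    int q = 0, d = 0;
    while (q < queryTerms.length && d < dictTerms.length) {
      int cmp = queryTerms[q].compareTo(dictTerms[d]);
      if (cmp == 0) {
        // Both sides agree on a term: it matches.
        matches.add(queryTerms[q]);
        q++;
        d++;
      } else if (cmp < 0) {
        // Query term is absent from the dictionary; advance the query side.
        q++;
      } else {
        // Dictionary term is not queried; advance the dictionary side.
        d++;
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    String[] query = {"apple", "kiwi", "pear"};
    String[] dict = {"apple", "banana", "pear", "plum"};
    System.out.println(intersect(query, dict)); // prints [apple, pear]
  }
}
```

In the real terms dictionary each "advance" can jump via the terms index rather than stepping one entry at a time, which is where the potential win over thousands of independent `seekExact` calls comes from.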
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435115#comment-17435115 ]

Robert Muir commented on LUCENE-10207:
--------------------------------------

{quote}
Should we try to take deletions into account at all? Because a PK field with deletions will look like it is not "precisely" PK based on the aggregate stats. Though I suppose even with e.g. 50% deletions in the index, this proposed cost metric is close enough.
{quote}

Deletions are irrelevant: term statistics don't reflect deletions. If the same term is in segmentM (with its doc deleted) and also in segmentN (with the updated doc), it causes no issue for the proposed estimation here, because the stats are per-segment.
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435116#comment-17435116 ]

Robert Muir commented on LUCENE-10207:
--------------------------------------

cc [~uschindler] if you get a chance to look at the MultiTermQuery patch. It has been a long time since I tried to subclass FilteredTermsEnum. The assert statements in this subclass were extremely helpful.
[GitHub] [lucene] mayya-sharipova edited a comment on pull request #416: LUCENE-10054 Make HnswGraph hierarchical
mayya-sharipova edited a comment on pull request #416: URL: https://github.com/apache/lucene/pull/416#issuecomment-953142944

Benchmarking based on @jtibshirani's [setup](https://github.com/jtibshirani/lucene/pull/1)

baseline: main branch
candidate: this PR

**glove-25-angular**

| | baseline recall | baseline QPS | candidate recall | candidate QPS |
| --- | ---: | ---: | ---: | ---: |
| n_cands=10 | 0.626 | 10962.821 | 0.631 | 8869.807 |
| n_cands=50 | 0.888 | 4409.952 | 0.889 | 4111.685 |
| n_cands=100 | 0.946 | 2621.846 | 0.947 | 2734.787 |
| n_cands=500 | 0.994 | 661.253 | 0.994 | 686.700 |
| n_cands=800 | 0.997 | 430.172 | 0.997 | 459.356 |
| n_cands=1000 | 0.998 | 342.915 | 0.998 | 355.238 |

**glove-200-angular**

| | baseline recall | baseline QPS | candidate recall | candidate QPS |
| --- | ---: | ---: | ---: | ---: |
| n_cands=10 | 0.285 | 4843.028 | 0.312 | 5208.453 |
| n_cands=50 | 0.556 | 2119.933 | 0.558 | 2250.213 |
| n_cands=100 | 0.655 | 1399.261 | 0.648 | 1454.996 |
| n_cands=500 | 0.806 | 379.745 | 0.806 | 410.553 |
| n_cands=800 | 0.836 | 252.796 | 0.836 | 276.456 |
| n_cands=1000 | 0.849 | 201.012 | 0.849 | 220.739 |

**sift-128-euclidean**

| | baseline recall | baseline QPS | candidate recall | candidate QPS |
| --- | ---: | ---: | ---: | ---: |
| n_cands=10 | 0.601 | 6948.736 | 0.607 | 6677.931 |
| n_cands=50 | 0.889 | 3003.781 | 0.892 | 3202.925 |
| n_cands=100 | 0.952 | 1622.276 | 0.953 | 1996.992 |
| n_cands=500 | 0.996 | 444.135 | 0.996 | 540.368 |
| n_cands=800 | 0.998 | 296.835 | 0.998 | 367.316 |
| n_cands=1000 | 0.999 | 245.498 | 0.999 | 311.339 |

As can be seen from the comparison, the hierarchy brings only a very slight change: a small increase in recall, at the expense of lower QPS in some cases.
[GitHub] [lucene] apanimesh061 edited a comment on pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
apanimesh061 edited a comment on pull request #412: URL: https://github.com/apache/lucene/pull/412#issuecomment-95323

> It looks good at a glance... you/I will see better if you update one of the clients that might want to subclass with extra configuration. Is there any or is this builder subclassing issue entirely hypothetical at this point? I suspect only hypothetical. We'll want nice Javadocs on the builder setters since this is where consumers/clients will see it. We can merely move the docs there from the existing locations, and add javadoc references pointing to the builder from the existing fields/enum values as desired.

@dsmiley Great. I added a unit test (just for a demo) and a class `SubUnifiedHighlighter` in `TestUnifiedHighlighter.java` where I've added a new test field and also tested it. It looks right to me since it is able to use the new field as well as fields from the parent class.

I can add some unit tests for the new builder, and then I can focus on modifying the javadocs to introduce the builder.
[GitHub] [lucene-solr] noblepaul merged pull request #2596: SOLR-15722: Delete Replica does not delete the Per replica state
noblepaul merged pull request #2596: URL: https://github.com/apache/lucene-solr/pull/2596
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435190#comment-17435190 ]

Adrien Grand commented on LUCENE-10207:
---------------------------------------

I have vague memories of playing with the MultiTermQuery approach in the past, and it wasn't an obvious win: seekExact could return false by just looking at the terms index, while the MultiTermQuery approach would always advance to the next term after the target, which would in turn always decode a frame of the terms dictionary. (It's been a very long time though, so I might remember wrong, or maybe other changes have been made since then so that this is no longer a problem.)