[jira] [Assigned] (SOLR-13893) BlobRepository looks at the wrong system variable (runtme.lib.size)
[ https://issues.apache.org/jira/browse/SOLR-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N reassigned SOLR-13893: --- Assignee: Munendra S N > BlobRepository looks at the wrong system variable (runtme.lib.size) > --- > > Key: SOLR-13893 > URL: https://issues.apache.org/jira/browse/SOLR-13893 > Project: Solr > Issue Type: Bug >Reporter: Erick Erickson >Assignee: Munendra S N >Priority: Major > Attachments: SOLR-13893.patch > > > Tim Swetland on the user's list pointed out this line in BlobRepository: > private static final long MAX_JAR_SIZE = > Long.parseLong(System.getProperty("runtme.lib.size", String.valueOf(5 * 1024 > * 1024))); > "runtme" can't be right. > [~ichattopadhyaya][~noblepaul] what's your opinion? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
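For illustration, a minimal sketch of what the corrected lookup would presumably look like, assuming the intended property name is {{runtime.lib.size}}; the attached SOLR-13893.patch may differ (for example, it could also keep honoring the misspelled name for back-compatibility):

{code:java}
// Sketch only: assumes the intended system property is "runtime.lib.size";
// the actual SOLR-13893 patch may differ (e.g. it might keep reading the old
// misspelled "runtme.lib.size" as a fallback for back-compat).
private static final long MAX_JAR_SIZE =
    Long.parseLong(System.getProperty("runtime.lib.size", String.valueOf(5 * 1024 * 1024)));
{code}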
[jira] [Assigned] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N reassigned SOLR-13944: --- Assignee: Munendra S N > CollapsingQParserPlugin throws NPE instead of bad request > - > > Key: SOLR-13944 > URL: https://issues.apache.org/jira/browse/SOLR-13944 > Project: Solr > Issue Type: Bug >Affects Versions: 7.3.1 >Reporter: Stefan >Assignee: Munendra S N >Priority: Minor > > I noticed the following NPE: > {code:java} > java.lang.NullPointerException at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584) > {code} > If I am correct, the problem was already addressed in SOLR-8807. The fix does > was not working in this case though, because of a syntax error in the query > (I used the local parameter syntax twice instead of combining it). The > relevant part of the query is: > {code:java} > &fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price > asc, id asc'} > {code} > After discussing that on the mailing list, I was asked to open a ticket, > because this situation should result in a bad request instead of a > NullpointerException (see > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E]) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
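For context, the working form of that filter query presumably combines the two local-parameter blocks into one, since {{tag}} is itself a local parameter that can sit alongside the {{collapse}} parser; this is a sketch of the corrected syntax, not the exact query from the report:

{code:java}
&fq={!collapse tag=collapser field=productId sort='merchantOrder asc, price asc, id asc'}
{code}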
[jira] [Updated] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yannick Welsch updated LUCENE-9264: --- Fix Version/s: master (9.0) > Remove SimpleFSDirectory in favor of NIOFsDirectory > --- > > Key: LUCENE-9264 > URL: https://issues.apache.org/jira/browse/LUCENE-9264 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yannick Welsch >Priority: Minor > Fix For: master (9.0) > > > {{SimpleFSDirectory}} looks to duplicate what's already offered by > {{NIOFsDirectory}}. The only difference is that {{SimpleFSDirectory}} is > using non-positional reads on the {{FileChannel}} (i.e., reads that are > stateful, changing the current position), and {{SimpleFSDirectory}} therefore > has to externally synchronize access to the read method. > On Windows, positional reads are not supported, which is why {{FileChannel}} > is already internally using synchronization to guarantee only access by one > thread at a time for positional reads (see {{read(ByteBuffer dst, long > position)}} in {{FileChannelImpl}}, and {{FileDispatcher.needsPositionLock}}, > which returns true on Windows) and the JDK implementation for Windows is > emulating positional reads by using non-positional ones, see > [http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/native/sun/nio/ch/FileDispatcherImpl.c#l139]. > This means that on Windows, there should be no difference between > {{NIOFsDirectory}} and {{SimpleFSDirectory}} in terms of performance (it > should be equally poor as both implementations only allow one thread at a > time to read). On Linux/Mac, {{NIOFsDirectory}} is superior to > {{SimpleFSDirectory}}, however, as positional reads (pread) can be done > concurrently. > My proposal is to remove {{SimpleFSDirectory}} and replace its uses with > {{NIOFsDirectory}}, given how similar these two directory implementations are > ({{SimpleFSDirectory}} isn't really simpler). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
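To make the distinction concrete, here is a small standalone illustration (not the actual Lucene code) of the two read styles the issue contrasts: a stateful read that must be externally synchronized because it moves the channel's position, versus a positional read (pread) that carries its own offset:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Illustration only; SimpleFSDirectory/NIOFSDirectory are more involved than this.
class ReadStyles {
  // SimpleFSDirectory-style: the read advances the channel's shared position,
  // so access must be serialized externally.
  static int statefulRead(FileChannel ch, ByteBuffer dst, long offset) throws IOException {
    synchronized (ch) {
      ch.position(offset);
      return ch.read(dst);
    }
  }

  // NIOFSDirectory-style: positional read (pread); no shared state is mutated,
  // so multiple threads can read concurrently on Linux/macOS.
  static int positionalRead(FileChannel ch, ByteBuffer dst, long offset) throws IOException {
    return ch.read(dst, offset);
  }
}
{code}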
[jira] [Created] (LUCENE-9265) Deprecate SimpleFSDirectory
Yannick Welsch created LUCENE-9265: -- Summary: Deprecate SimpleFSDirectory Key: LUCENE-9265 URL: https://issues.apache.org/jira/browse/LUCENE-9265 Project: Lucene - Core Issue Type: Sub-task Reporter: Yannick Welsch -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
s1monw commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit URL: https://github.com/apache/lucene-solr/pull/1155#discussion_r388786805 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterEvents.java ## @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. + +package org.apache.lucene.index; + +/** + * Callback interface to signal various actions taken by IndexWriter. + * + * @lucene.experimental + */ +public interface IndexWriterEvents { + /** + * A default implementation that ignores all events. + */ + IndexWriterEvents NULL_EVENTS = new IndexWriterEvents() { +@Override +public void beginMergeOnCommit() { } + +@Override +public void finishMergeOnCommit() { } + +@Override +public void abandonedMergesOnCommit(int abandonedCount) { } + }; + + /** + * Signals the start of waiting for a merge on commit, returned from + * {@link MergePolicy#findFullFlushMerges(MergeTrigger, SegmentInfos, MergePolicy.MergeContext)}. + */ + void beginMergeOnCommit(); Review comment: I am not really happy with this interface. First and foremost it's only partially used in this PR. I also think it doesn't belong here but rather into a merge policy? I think IW and merge lifecycle should not be tightly coupled. Can we achieve the same with an interface a MP can provide to the IW rather than setting it on the IW config. A pull model should be used here instead IMO. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
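As a rough illustration of the pull model suggested in the review (hypothetical names only; this is not part of the PR or of Lucene), the merge policy could expose the listener itself and IndexWriter would ask the policy for it, instead of the listener being set on the IndexWriter config:

{code:java}
// Hypothetical sketch only -- these names do not exist in the PR or in Lucene.
public interface MergeOnCommitListener {
  MergeOnCommitListener NO_OP = abandonedCount -> {};

  /** Called when merges triggered on commit are abandoned (e.g. they did not finish in time). */
  void abandonedMergesOnCommit(int abandonedCount);
}

// A MergePolicy subclass could then provide it, and IndexWriter would pull it from the policy:
//   MergeOnCommitListener listener = mergePolicy.getMergeOnCommitListener();
{code}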
[jira] [Commented] (LUCENE-9033) Update Release docs an scripts with new site instructions
[ https://issues.apache.org/jira/browse/LUCENE-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053200#comment-17053200 ] Jan Høydahl commented on LUCENE-9033: - I have started work on releaseWizard.py but not yet ready. > Update Release docs an scripts with new site instructions > - > > Key: LUCENE-9033 > URL: https://issues.apache.org/jira/browse/LUCENE-9033 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/tools >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > * releaseWizard.py > * ReleaseTODO page > * addBackcompatIndexes.py > * archive-solr-ref-guide.sh > * createPatch.py > * publish-solr-ref-guide.sh > * solr-ref-gudie/src/meta-docs/publish.adoc > There may be others -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9033) Update Release docs an scripts with new site instructions
[ https://issues.apache.org/jira/browse/LUCENE-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated LUCENE-9033: Description: * releaseWizard.py (Started: janhoy) * ReleaseTODO page * addBackcompatIndexes.py * archive-solr-ref-guide.sh * createPatch.py * publish-solr-ref-guide.sh * -solr-ref-gudie/src/meta-docs/publish.adoc- (/) Done There may be others was: * releaseWizard.py * ReleaseTODO page * addBackcompatIndexes.py * archive-solr-ref-guide.sh * createPatch.py * publish-solr-ref-guide.sh * solr-ref-gudie/src/meta-docs/publish.adoc There may be others > Update Release docs an scripts with new site instructions > - > > Key: LUCENE-9033 > URL: https://issues.apache.org/jira/browse/LUCENE-9033 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/tools >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > * releaseWizard.py (Started: janhoy) > * ReleaseTODO page > * addBackcompatIndexes.py > * archive-solr-ref-guide.sh > * createPatch.py > * publish-solr-ref-guide.sh > * -solr-ref-gudie/src/meta-docs/publish.adoc- (/) Done > There may be others -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ywelsch opened a new pull request #1321: LUCENE-9264: Remove SimpleFSDirectory in favor of NIOFSDirectory
ywelsch opened a new pull request #1321: LUCENE-9264: Remove SimpleFSDirectory in favor of NIOFSDirectory URL: https://github.com/apache/lucene-solr/pull/1321 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053208#comment-17053208 ] Bruno Roustant commented on SOLR-14040: --- This is a good opportunity for me to learn how we deal with feature development and work in progress. As I understand shared-schema (SolrCloud or not) is still in development and not documented, still with some limitations/problems. Do we document in the ref guide features in progress? Because it seems weird to me to document a limitation in the ref guide for a feature that is not yet documented as available. If we document the limitation, shouldn't we also document the feature itself, but actually it is not ready... difficult. Is there a section specific to "coming" features? In fact, where do the users learn about this undocumented feature? Directly in the code? This is where we should explain the current limitations and risks. > solr.xml shareSchema does not work in SolrCloud > --- > > Key: SOLR-14040 > URL: https://issues.apache.org/jira/browse/SOLR-14040 > Project: Solr > Issue Type: Improvement > Components: Schema and Analysis >Reporter: David Smiley >Assignee: David Smiley >Priority: Blocker > Fix For: 8.5 > > Time Spent: 0.5h > Remaining Estimate: 0h > > solr.xml has a shareSchema boolean option that can be toggled from the > default of false to true in order to share IndexSchema objects within the > Solr node. This is silently ignored in SolrCloud mode. The pertinent code > is {{org.apache.solr.core.ConfigSetService#createConfigSetService}} which > creates a CloudConfigSetService that is not related to the SchemaCaching > class. This may not be a big deal in SolrCloud which tends not to deal well > with many cores per node but I'm working on changing that. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
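For reference, a minimal sketch of the toggle being discussed, assuming the usual solr.xml named-property syntax (per the issue, the setting is currently ignored when running in SolrCloud mode):

{code:xml}
<!-- solr.xml (sketch): ask the node to share IndexSchema objects across cores
     using the same schema; per SOLR-14040 this is silently ignored in SolrCloud. -->
<solr>
  <str name="shareSchema">true</str>
</solr>
{code}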
[jira] [Commented] (SOLR-9723) "Error writing document" on document add caused by NegativeArraySizeException
[ https://issues.apache.org/jira/browse/SOLR-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053221#comment-17053221 ] Kenny Knecht commented on SOLR-9723: still happens with 7.2.1 > "Error writing document" on document add caused by NegativeArraySizeException > - > > Key: SOLR-9723 > URL: https://issues.apache.org/jira/browse/SOLR-9723 > Project: Solr > Issue Type: Bug > Components: update >Affects Versions: 6.2.1 > Environment: Windows Server 2012 R2 x64, Java 1.8.0_111 >Reporter: Seva Alekseyev >Priority: Major > > I'm adding documents to SOLR 6.2.1 via /solr/corename/update in a tight loop > on multiple threads. After some time, SOLR starts throwing intermittent > errors. They don't reproduce. Here's one: > 2016-11-02 02:29:10.997 ERROR (qtp1389647288-10719) [ x:fscan] > o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception > writing document id 72513253_HS-RNA-Valenzuela-2.xls to the index; possible > analysis error. > at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:178) > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:939) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1094) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:720) > at > org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48) > at > 
org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:91) > at > org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250) > at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177) > at > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208) > a
[GitHub] [lucene-solr] janhoy opened a new pull request #1322: Remove some unused lines from addBackcompatIndexes.py related to svn
janhoy opened a new pull request #1322: Remove some unused lines from addBackcompatIndexes.py related to svn URL: https://github.com/apache/lucene-solr/pull/1322 This is dead code in a Python script. We don't use svn anymore. I did not add corresponding git commands since the releaseWizard explicitly does an add after running the script, and no one has complained for such a long time :) Tagging @sarowe since it seems you have touched this script in the past. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy opened a new pull request #1324: LUCENE-9033 Update ReleaseWizard for new website instructions
janhoy opened a new pull request #1324: LUCENE-9033 Update ReleaseWizard for new website instructions URL: https://github.com/apache/lucene-solr/pull/1324 See https://issues.apache.org/jira/browse/LUCENE-9033 This is still work in progress This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053231#comment-17053231 ] Simon Willnauer commented on LUCENE-8962: - I read through this issue and I want to share some of my thoughts. First, I understand the need for this and the motivation, yet every time we add something like this to the IndexWriter to do something _as part of_ another method it triggers an alarm on my end. I have spent hours and days thinking about how IW can be simpler and the biggest issues that I see is that the primitives on IW like commit or openReader are doing too much. Just look at openReader it's pretty involved and changing the bus factor or making it easier to understand is hard. Adding stuff like _wait for merge_ with something like a timeout is not what I think we should do neither to _openReader_ nor to _commit_. That said, I think we can make the same things happen but we should think in primitives rather than changing method behavior with configuration. Let me explain what I mean: Lets say we keep _commit_ and _openReader_ the way it is and would instead allow to use an existing reader NRT or not and allow itself to _optimize_ itself (yeah I said that - it might be a good name after all). With a slightly refactored IW we can share the merge logic and let the reader re-write itself since we are talking about very small segments the overhead is very small. This would in turn mean that we are doing the work twice ie. the IW would do its normal work and might merge later etc. We might even merge this stuff into heap-space or so if we have enough I haven't thought too much about that. This way we can clean up IW potentially and add a very nice optimization that works for commit as well as NRT. We should strive for making IW simpler not do more. I hope I wasn't too discouraging. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 9h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... 
> I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9033) Update Release docs an scripts with new site instructions
[ https://issues.apache.org/jira/browse/LUCENE-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated LUCENE-9033: Description: *releaseWizard.py:* Janhoy has started on this, but will likely not finish before the 8.5 release *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] page:* I suggest we deprecate this page if folks are happy with releaseWizard, which should encapsulate all steps and details, and can also generate an HTML TODO document per release. *publish-solr-ref-guide.sh:* This script can be deleted, not in use since we do not publish PDF anymore *(/) solr-ref-gudie/src/meta-docs/publish.adoc:* Done There may be other places affected, such as other WIKI pages? was: * releaseWizard.py (Started: janhoy) * ReleaseTODO page * addBackcompatIndexes.py * archive-solr-ref-guide.sh * createPatch.py * publish-solr-ref-guide.sh * -solr-ref-gudie/src/meta-docs/publish.adoc- (/) Done There may be others > Update Release docs an scripts with new site instructions > - > > Key: LUCENE-9033 > URL: https://issues.apache.org/jira/browse/LUCENE-9033 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/tools >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > *releaseWizard.py:* Janhoy has started on this, but will likely not finish > before the 8.5 release > *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] > page:* I suggest we deprecate this page if folks are happy with > releaseWizard, which should encapsulate all steps and details, and can also > generate an HTML TODO document per release. > *publish-solr-ref-guide.sh:* This script can be deleted, not in use since we > do not publish PDF anymore > *(/) solr-ref-gudie/src/meta-docs/publish.adoc:* Done > > There may be other places affected, such as other WIKI pages? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ywelsch opened a new pull request #1323: LUCENE-9265: Deprecate SimpleFSDirectory in favor of NIOFSDirectory
ywelsch opened a new pull request #1323: LUCENE-9265: Deprecate SimpleFSDirectory in favor of NIOFSDirectory URL: https://github.com/apache/lucene-solr/pull/1323 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9033) Update Release docs an scripts with new site instructions
[ https://issues.apache.org/jira/browse/LUCENE-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated LUCENE-9033: Description: *releaseWizard.py:* [PR#1324|https://github.com/apache/lucene-solr/pull/1324] Janhoy has started on this, but will likely not finish before the 8.5 release *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] page:* I suggest we deprecate this page if folks are happy with releaseWizard, which should encapsulate all steps and details, and can also generate an HTML TODO document per release. *publish-solr-ref-guide.sh:* This script can be deleted, not in use since we do not publish PDF anymore *(/) solr-ref-gudie/src/meta-docs/publish.adoc:* Done There may be other places affected, such as other WIKI pages? was: *releaseWizard.py:* Janhoy has started on this, but will likely not finish before the 8.5 release *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] page:* I suggest we deprecate this page if folks are happy with releaseWizard, which should encapsulate all steps and details, and can also generate an HTML TODO document per release. *publish-solr-ref-guide.sh:* This script can be deleted, not in use since we do not publish PDF anymore *(/) solr-ref-gudie/src/meta-docs/publish.adoc:* Done There may be other places affected, such as other WIKI pages? > Update Release docs an scripts with new site instructions > - > > Key: LUCENE-9033 > URL: https://issues.apache.org/jira/browse/LUCENE-9033 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/tools >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > *releaseWizard.py:* [PR#1324|https://github.com/apache/lucene-solr/pull/1324] > Janhoy has started on this, but will likely not finish before the 8.5 release > *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] > page:* I suggest we deprecate this page if folks are happy with > releaseWizard, which should encapsulate all steps and details, and can also > generate an HTML TODO document per release. > *publish-solr-ref-guide.sh:* This script can be deleted, not in use since we > do not publish PDF anymore > *(/) solr-ref-gudie/src/meta-docs/publish.adoc:* Done > > There may be other places affected, such as other WIKI pages? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053233#comment-17053233 ] Yannick Welsch commented on LUCENE-9264: I've opened a pull request for the removal (linked in this issue) and one for the deprecation (see sub-task). > Remove SimpleFSDirectory in favor of NIOFsDirectory > --- > > Key: LUCENE-9264 > URL: https://issues.apache.org/jira/browse/LUCENE-9264 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yannick Welsch >Priority: Minor > Fix For: master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > {{SimpleFSDirectory}} looks to duplicate what's already offered by > {{NIOFsDirectory}}. The only difference is that {{SimpleFSDirectory}} is > using non-positional reads on the {{FileChannel}} (i.e., reads that are > stateful, changing the current position), and {{SimpleFSDirectory}} therefore > has to externally synchronize access to the read method. > On Windows, positional reads are not supported, which is why {{FileChannel}} > is already internally using synchronization to guarantee only access by one > thread at a time for positional reads (see {{read(ByteBuffer dst, long > position)}} in {{FileChannelImpl}}, and {{FileDispatcher.needsPositionLock}}, > which returns true on Windows) and the JDK implementation for Windows is > emulating positional reads by using non-positional ones, see > [http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/native/sun/nio/ch/FileDispatcherImpl.c#l139]. > This means that on Windows, there should be no difference between > {{NIOFsDirectory}} and {{SimpleFSDirectory}} in terms of performance (it > should be equally poor as both implementations only allow one thread at a > time to read). On Linux/Mac, {{NIOFsDirectory}} is superior to > {{SimpleFSDirectory}}, however, as positional reads (pread) can be done > concurrently. > My proposal is to remove {{SimpleFSDirectory}} and replace its uses with > {{NIOFsDirectory}}, given how similar these two directory implementations are > ({{SimpleFSDirectory}} isn't really simpler). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] rmuir commented on issue #1321: LUCENE-9264: Remove SimpleFSDirectory in favor of NIOFSDirectory
rmuir commented on issue #1321: LUCENE-9264: Remove SimpleFSDirectory in favor of NIOFSDirectory URL: https://github.com/apache/lucene-solr/pull/1321#issuecomment-595695483 Looks great! Thanks for doing this cleanup. will merge it shortly... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] s1monw opened a new pull request #1325: Consolidated process event logic after CRUD action
s1monw opened a new pull request #1325: Consolidated process event logic after CRUD action URL: https://github.com/apache/lucene-solr/pull/1325 Today we have duplicated logic on how to convert a seqNo into a real seqNo and process events based on this. This change consolidated the logic into a single method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy opened a new pull request #1326: Remove unused scripts in dev-tools folder
janhoy opened a new pull request #1326: Remove unused scripts in dev-tools folder URL: https://github.com/apache/lucene-solr/pull/1326 Cleanup of unused scripts. Please validate my assumption that this is not in use :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] asfgit closed pull request #1320: LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes.
asfgit closed pull request #1320: LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes. URL: https://github.com/apache/lucene-solr/pull/1320 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9257) FSTLoadMode should not be BlockTree specific as it is used more generally in index package
[ https://issues.apache.org/jira/browse/LUCENE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053247#comment-17053247 ] ASF subversion and git services commented on LUCENE-9257: - Commit 97336434661cf32f4674ddb43901219f678e2608 in lucene-solr's branch refs/heads/master from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9733643 ] LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode and Reader attributes. Closes #1320 > FSTLoadMode should not be BlockTree specific as it is used more generally in > index package > -- > > Key: LUCENE-9257 > URL: https://issues.apache.org/jira/browse/LUCENE-9257 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > FSTLoadMode and its associate attribute key (static String) are currently > defined in BlockTreeTermsReader, but they are actually used outside of > BlockTree in the general "index" package. > CheckIndex and ReadersAndUpdates are using these enum and attribute key to > drive the FST load mode through the SegmentReader which is not specific to a > postings format. They have an unnecessary dependency to BlockTreeTermsReader. > We could move FSTLoadMode out of BlockTreeTermsReader, to make it a public > enum of the "index" package. That way CheckIndex and ReadersAndUpdates do not > import anymore BlockTreeTermsReader. > This would also allow other postings formats to use the same enum (e.g. > LUCENE-9254) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] asfgit closed pull request #1321: LUCENE-9264: Remove SimpleFSDirectory in favor of NIOFSDirectory
asfgit closed pull request #1321: LUCENE-9264: Remove SimpleFSDirectory in favor of NIOFSDirectory URL: https://github.com/apache/lucene-solr/pull/1321 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053282#comment-17053282 ] ASF subversion and git services commented on LUCENE-9264: - Commit 624f5a3c2f5ab25a44b3e3843dbef36d4ed70602 in lucene-solr's branch refs/heads/master from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=624f5a3 ] LUCENE-9264: Remove SimpleFSDirectory in favor of NIOFSDirectory Closes #1321 > Remove SimpleFSDirectory in favor of NIOFsDirectory > --- > > Key: LUCENE-9264 > URL: https://issues.apache.org/jira/browse/LUCENE-9264 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yannick Welsch >Priority: Minor > Fix For: master (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > {{SimpleFSDirectory}} looks to duplicate what's already offered by > {{NIOFsDirectory}}. The only difference is that {{SimpleFSDirectory}} is > using non-positional reads on the {{FileChannel}} (i.e., reads that are > stateful, changing the current position), and {{SimpleFSDirectory}} therefore > has to externally synchronize access to the read method. > On Windows, positional reads are not supported, which is why {{FileChannel}} > is already internally using synchronization to guarantee only access by one > thread at a time for positional reads (see {{read(ByteBuffer dst, long > position)}} in {{FileChannelImpl}}, and {{FileDispatcher.needsPositionLock}}, > which returns true on Windows) and the JDK implementation for Windows is > emulating positional reads by using non-positional ones, see > [http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/native/sun/nio/ch/FileDispatcherImpl.c#l139]. > This means that on Windows, there should be no difference between > {{NIOFsDirectory}} and {{SimpleFSDirectory}} in terms of performance (it > should be equally poor as both implementations only allow one thread at a > time to read). On Linux/Mac, {{NIOFsDirectory}} is superior to > {{SimpleFSDirectory}}, however, as positional reads (pread) can be done > concurrently. > My proposal is to remove {{SimpleFSDirectory}} and replace its uses with > {{NIOFsDirectory}}, given how similar these two directory implementations are > ({{SimpleFSDirectory}} isn't really simpler). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] rmuir merged pull request #1323: LUCENE-9265: Deprecate SimpleFSDirectory in favor of NIOFSDirectory
rmuir merged pull request #1323: LUCENE-9265: Deprecate SimpleFSDirectory in favor of NIOFSDirectory URL: https://github.com/apache/lucene-solr/pull/1323 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9265) Deprecate SimpleFSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9265: Fix Version/s: 8.5 > Deprecate SimpleFSDirectory > --- > > Key: LUCENE-9265 > URL: https://issues.apache.org/jira/browse/LUCENE-9265 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Yannick Welsch >Priority: Minor > Fix For: 8.5 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9265) Deprecate SimpleFSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-9265. - Resolution: Fixed > Deprecate SimpleFSDirectory > --- > > Key: LUCENE-9265 > URL: https://issues.apache.org/jira/browse/LUCENE-9265 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Yannick Welsch >Priority: Minor > Fix For: 8.5 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-9264. - Resolution: Fixed > Remove SimpleFSDirectory in favor of NIOFsDirectory > --- > > Key: LUCENE-9264 > URL: https://issues.apache.org/jira/browse/LUCENE-9264 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yannick Welsch >Priority: Minor > Fix For: master (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > {{SimpleFSDirectory}} looks to duplicate what's already offered by > {{NIOFsDirectory}}. The only difference is that {{SimpleFSDirectory}} is > using non-positional reads on the {{FileChannel}} (i.e., reads that are > stateful, changing the current position), and {{SimpleFSDirectory}} therefore > has to externally synchronize access to the read method. > On Windows, positional reads are not supported, which is why {{FileChannel}} > is already internally using synchronization to guarantee only access by one > thread at a time for positional reads (see {{read(ByteBuffer dst, long > position)}} in {{FileChannelImpl}}, and {{FileDispatcher.needsPositionLock}}, > which returns true on Windows) and the JDK implementation for Windows is > emulating positional reads by using non-positional ones, see > [http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/native/sun/nio/ch/FileDispatcherImpl.c#l139]. > This means that on Windows, there should be no difference between > {{NIOFsDirectory}} and {{SimpleFSDirectory}} in terms of performance (it > should be equally poor as both implementations only allow one thread at a > time to read). On Linux/Mac, {{NIOFsDirectory}} is superior to > {{SimpleFSDirectory}}, however, as positional reads (pread) can be done > concurrently. > My proposal is to remove {{SimpleFSDirectory}} and replace its uses with > {{NIOFsDirectory}}, given how similar these two directory implementations are > ({{SimpleFSDirectory}} isn't really simpler). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9265) Deprecate SimpleFSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053286#comment-17053286 ] ASF subversion and git services commented on LUCENE-9265: - Commit c3d9cd1bf35e858cdb2efa550e8ad17d0e5106ef in lucene-solr's branch refs/heads/branch_8x from Yannick Welsch [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c3d9cd1 ] LUCENE-9265: Deprecate SimpleFSDirectory in favor of NIOFSDirectory (#1323) > Deprecate SimpleFSDirectory > --- > > Key: LUCENE-9265 > URL: https://issues.apache.org/jira/browse/LUCENE-9265 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Yannick Welsch >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9265) Deprecate SimpleFSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9265: Fix Version/s: (was: 8.5) 8.6 > Deprecate SimpleFSDirectory > --- > > Key: LUCENE-9265 > URL: https://issues.apache.org/jira/browse/LUCENE-9265 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Yannick Welsch >Priority: Minor > Fix For: 8.6 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9265) Deprecate SimpleFSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053291#comment-17053291 ] ASF subversion and git services commented on LUCENE-9265: - Commit 775900c77680058baae5969241c4b3c5bfd82d2b in lucene-solr's branch refs/heads/branch_8x from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=775900c ] LUCENE-9265: move entry to 8.6 section > Deprecate SimpleFSDirectory > --- > > Key: LUCENE-9265 > URL: https://issues.apache.org/jira/browse/LUCENE-9265 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Yannick Welsch >Priority: Minor > Fix For: 8.6 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9033) Update Release docs an scripts with new site instructions
[ https://issues.apache.org/jira/browse/LUCENE-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated LUCENE-9033: Description: *releaseWizard.py:* [PR#1324|https://github.com/apache/lucene-solr/pull/1324] Janhoy has started on this, but will likely not finish before the 8.5 release *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] page:* I suggest we deprecate this page if folks are happy with releaseWizard, which should encapsulate all steps and details, and can also generate an HTML TODO document per release. *publish-solr-ref-guide.sh:* [PR#1326|https://github.com/apache/lucene-solr/pull/1326] This script can be deleted, not in use since we do not publish PDF anymore *(/) solr-ref-gudie/src/meta-docs/publish.adoc:* Done There may be other places affected, such as other WIKI pages? was: *releaseWizard.py:* [PR#1324|https://github.com/apache/lucene-solr/pull/1324] Janhoy has started on this, but will likely not finish before the 8.5 release *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] page:* I suggest we deprecate this page if folks are happy with releaseWizard, which should encapsulate all steps and details, and can also generate an HTML TODO document per release. *publish-solr-ref-guide.sh:* This script can be deleted, not in use since we do not publish PDF anymore *(/) solr-ref-gudie/src/meta-docs/publish.adoc:* Done There may be other places affected, such as other WIKI pages? > Update Release docs an scripts with new site instructions > - > > Key: LUCENE-9033 > URL: https://issues.apache.org/jira/browse/LUCENE-9033 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/tools >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > *releaseWizard.py:* [PR#1324|https://github.com/apache/lucene-solr/pull/1324] > Janhoy has started on this, but will likely not finish before the 8.5 release > *[ReleaseTODO|https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo] > page:* I suggest we deprecate this page if folks are happy with > releaseWizard, which should encapsulate all steps and details, and can also > generate an HTML TODO document per release. > *publish-solr-ref-guide.sh:* > [PR#1326|https://github.com/apache/lucene-solr/pull/1326] This script can be > deleted, not in use since we do not publish PDF anymore > *(/) solr-ref-gudie/src/meta-docs/publish.adoc:* Done > > There may be other places affected, such as other WIKI pages? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy closed pull request #880: Tweak header format.
janhoy closed pull request #880: Tweak header format. URL: https://github.com/apache/lucene-solr/pull/880 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy commented on issue #404: Comment to explain how to use URLClassifyProcessorFactory
janhoy commented on issue #404: Comment to explain how to use URLClassifyProcessorFactory URL: https://github.com/apache/lucene-solr/pull/404#issuecomment-595723278 @ohtwadi Do you want to address the review comment so we can merge this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053338#comment-17053338 ] ASF subversion and git services commented on SOLR-13942: Commit 4cf37ade3531305d508e383b9c16a0c5690bacae in lucene-solr's branch refs/heads/master from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4cf37ad ] Revert "SOLR-13942: /api/cluster/zk/* to fetch raw ZK data" This reverts commit bc6fa3b65060b17a88013a0378f4a9d285067d82. > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: New Feature > Components: v2 API >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > example > download the {{state.json}} of > {code} > GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json > {code} > get a list of all children under {{/live_nodes}} > {code} > GET http://localhost:8983/api/cluster/zk/live_nodes > {code} > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053339#comment-17053339 ] ASF subversion and git services commented on SOLR-13942: Commit a8e7895c3007f3aa7e58bc52fb610416e80850a6 in lucene-solr's branch refs/heads/branch_8x from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a8e7895 ] Revert "SOLR-13942: /api/cluster/zk/* to fetch raw ZK data" This reverts commit 2044f8c83ebb0775d76b1e96c168ca936701abd4. > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: New Feature > Components: v2 API >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > example > download the {{state.json}} of > {code} > GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json > {code} > get a list of all children under {{/live_nodes}} > {code} > GET http://localhost:8983/api/cluster/zk/live_nodes > {code} > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] noblepaul opened a new pull request #1327: SOLR-13942: /api/cluster/zk/* to fetch raw ZK data
noblepaul opened a new pull request #1327: SOLR-13942: /api/cluster/zk/* to fetch raw ZK data URL: https://github.com/apache/lucene-solr/pull/1327 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053344#comment-17053344 ] Noble Paul commented on SOLR-13942: --- I've opened a new PR. added more tests . Please review > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: New Feature > Components: v2 API >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > example > download the {{state.json}} of > {code} > GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json > {code} > get a list of all children under {{/live_nodes}} > {code} > GET http://localhost:8983/api/cluster/zk/live_nodes > {code} > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14309) Expose GC logs via an HTTP API
Noble Paul created SOLR-14309: - Summary: Expose GC logs via an HTTP API Key: SOLR-14309 URL: https://issues.apache.org/jira/browse/SOLR-14309 Project: Solr Issue Type: Sub-task Security Level: Public (Default Security Level. Issues are Public) Reporter: Noble Paul -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14310) Expose solr logs with basic filters via HTTP
Noble Paul created SOLR-14310: - Summary: Expose solr logs with basic filters via HTTP Key: SOLR-14310 URL: https://issues.apache.org/jira/browse/SOLR-14310 Project: Solr Issue Type: Sub-task Security Level: Public (Default Security Level. Issues are Public) Reporter: Noble Paul -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053361#comment-17053361 ] Dawid Weiss commented on LUCENE-9241: - I wasn't really that much concerned; just pointing out the (sad) fact of how it's implemented for Windows. > fix most memory-hungry tests > > > Key: LUCENE-9241 > URL: https://issues.apache.org/jira/browse/LUCENE-9241 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9241.patch > > > Currently each test jvm has Xmx of 512M. With a modern macbook pro this is > 4GB which is pretty crazy. > On the other hand, if we fix a few edge cases, tests can work with lower > heaps such as 128M. This can save many gigabytes (also it finds interesting > memory waste/issues). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13944: Attachment: SOLR-13944.patch > CollapsingQParserPlugin throws NPE instead of bad request > - > > Key: SOLR-13944 > URL: https://issues.apache.org/jira/browse/SOLR-13944 > Project: Solr > Issue Type: Bug >Affects Versions: 7.3.1 >Reporter: Stefan >Assignee: Munendra S N >Priority: Minor > Attachments: SOLR-13944.patch > > > I noticed the following NPE: > {code:java} > java.lang.NullPointerException at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584) > {code} > If I am correct, the problem was already addressed in SOLR-8807. The fix does > was not working in this case though, because of a syntax error in the query > (I used the local parameter syntax twice instead of combining it). The > relevant part of the query is: > {code:java} > &fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price > asc, id asc'} > {code} > After discussing that on the mailing list, I was asked to open a ticket, > because this situation should result in a bad request instead of a > NullpointerException (see > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E]) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053381#comment-17053381 ] ASF subversion and git services commented on LUCENE-8962: - Commit 90aced5a51f92ffd6e97449eb7c44aacc643c8a3 in lucene-solr's branch refs/heads/branch_8x from Michael Sokolov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=90aced5 ] LUCENE-8962: Split test case (#1313) * LUCENE-8962: Simplify test case The testMergeOnCommit test case was trying to verify too many things at once: basic semantics of merge on commit and proper behavior when a bunch of indexing threads are writing and committing all at once. Now we just verify basic behavior, with strict assertions on invariants, while leaving it to MockRandomMergePolicy to enable merge on commit in existing test cases to verify that indexing generally works as expected and no new unexpected exceptions are thrown. * LUCENE-8962: Only update toCommit if merge was committed The code was previously assuming that if mergeFinished() was called and isAborted() was false, then the merge must have completed successfully. Instead, we should know for sure if a given merge was committed, and only then update our pending commit SegmentInfos. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 9h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13944: Status: Patch Available (was: Open) > CollapsingQParserPlugin throws NPE instead of bad request > - > > Key: SOLR-13944 > URL: https://issues.apache.org/jira/browse/SOLR-13944 > Project: Solr > Issue Type: Bug >Affects Versions: 7.3.1 >Reporter: Stefan >Assignee: Munendra S N >Priority: Minor > Attachments: SOLR-13944.patch > > > I noticed the following NPE: > {code:java} > java.lang.NullPointerException at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584) > {code} > If I am correct, the problem was already addressed in SOLR-8807. The fix does > was not working in this case though, because of a syntax error in the query > (I used the local parameter syntax twice instead of combining it). The > relevant part of the query is: > {code:java} > &fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price > asc, id asc'} > {code} > After discussing that on the mailing list, I was asked to open a ticket, > because this situation should result in a bad request instead of a > NullpointerException (see > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E]) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053385#comment-17053385 ] Munendra S N commented on SOLR-13944: - [^SOLR-13944.patch] Initial patch for fixing NPE. This is valid, as defType for fq is by default is lucene and then localParams syntax is parsed but the case of tagging for collapse filter wasn't handled in SOLR-8807 (it was doing a simple string match). Here, I have replaced it with filter parsing, without that we can't know if there is collapse filter or not. {noformat} fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price asc, id asc'} {noformat} [~tflobbe] As you had asked the user to create the JIRA issue, I would prefer if you could take look at this patch > CollapsingQParserPlugin throws NPE instead of bad request > - > > Key: SOLR-13944 > URL: https://issues.apache.org/jira/browse/SOLR-13944 > Project: Solr > Issue Type: Bug >Affects Versions: 7.3.1 >Reporter: Stefan >Assignee: Munendra S N >Priority: Minor > Attachments: SOLR-13944.patch > > > I noticed the following NPE: > {code:java} > java.lang.NullPointerException at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584) > {code} > If I am correct, the problem was already addressed in SOLR-8807. The fix does > was not working in this case though, because of a syntax error in the query > (I used the local parameter syntax twice instead of combining it). The > relevant part of the query is: > {code:java} > &fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price > asc, id asc'} > {code} > After discussing that on the mailing list, I was asked to open a ticket, > because this situation should result in a bad request instead of a > NullpointerException (see > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E]) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
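Until the fix lands, users hitting this NPE can usually avoid it by collapsing the two local-params blocks into one, since {{tag}} is an ordinary local parameter and can be placed inside the collapse parser itself. A hedged example of the combined form of the filter quoted above:
{noformat}
fq={!collapse tag=collapser field=productId sort='merchantOrder asc, price asc, id asc'}
{noformat}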
[jira] [Commented] (SOLR-11725) json.facet's stddev() function should be changed to use the "Corrected sample stddev" formula
[ https://issues.apache.org/jira/browse/SOLR-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053387#comment-17053387 ] Munendra S N commented on SOLR-11725: - I'm planning to commit this weekend (only to master), let me know if there are any concerns > json.facet's stddev() function should be changed to use the "Corrected sample > stddev" formula > - > > Key: SOLR-11725 > URL: https://issues.apache.org/jira/browse/SOLR-11725 > Project: Solr > Issue Type: Sub-task > Components: Facet Module >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-11725.patch, SOLR-11725.patch, SOLR-11725.patch > > > While working on some equivalence tests/demonstrations for > {{facet.pivot+stats.field}} vs {{json.facet}} I noticed that the {{stddev}} > calculations done between the two code paths can be measurably different, and > realized this is due to them using very different code... > * {{json.facet=foo:stddev(foo)}} > ** {{StddevAgg.java}} > ** {{Math.sqrt((sumSq/count)-Math.pow(sum/count, 2))}} > * {{stats.field=\{!stddev=true\}foo}} > ** {{StatsValuesFactory.java}} > ** {{Math.sqrt(((count * sumOfSquares) - (sum * sum)) / (count * (count - > 1.0D)))}} > Since I"m not really a math guy, I consulting with a bunch of smart math/stat > nerds I know online to help me sanity check if these equations (some how) > reduced to eachother (In which case the discrepancies I was seeing in my > results might have just been due to the order of intermediate operation > execution & floating point rounding differences). > They confirmed that the two bits of code are _not_ equivalent to each other, > and explained that the code JSON Faceting is using is equivalent to the > "Uncorrected sample stddev" formula, while StatsComponent's code is > equivalent to the the "Corrected sample stddev" formula... > https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation > When I told them that stuff like this is why no one likes mathematicians and > pressed them to explain which one was the "most canonical" (or "most > generally applicable" or "best") definition of stddev, I was told that: > # This is something statisticians frequently disagree on > # Practically speaking the diff between the calculations doesn't tend to > differ significantly when count is "very large" > # _"Corrected sample stddev" is more appropriate when comparing two > distributions_ > Given that: > * the primary usage of computing the stddev of a field/function against a > Solr result set (or against a sub-set of results defined by a facet > constraint) is probably to compare that distribution to a different Solr > result set (or to compare N sub-sets of results defined by N facet > constraints) > * the size of the sets of documents (values) can be relatively small when > computing stats over facet constraint sub-sets > ...it seems like {{StddevAgg.java}} should be updated to use the "Corrected > sample stddev" equation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
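To make the difference between the two quoted formulas concrete, here is a small standalone sketch (illustrative only, not Solr code) that computes both from the same running sums the aggregations keep:
{code:java}
public class StddevFormulas {
  public static void main(String[] args) {
    double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
    long count = values.length;
    double sum = 0, sumSq = 0;
    for (double v : values) {
      sum += v;
      sumSq += v * v;
    }
    // "Uncorrected sample stddev" -- the formula json.facet's StddevAgg was using.
    double uncorrected = Math.sqrt((sumSq / count) - Math.pow(sum / count, 2));
    // "Corrected sample stddev" -- the formula StatsValuesFactory (stats.field) uses.
    double corrected = Math.sqrt(((count * sumSq) - (sum * sum)) / (count * (count - 1.0D)));
    System.out.println("uncorrected = " + uncorrected); // 2.0
    System.out.println("corrected   = " + corrected);   // ~2.138
  }
}
{code}
The gap between the two shrinks as count grows, which matches the observation above that the choice mostly matters for small facet buckets.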
[GitHub] [lucene-solr] bruno-roustant opened a new pull request #1328: LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter.
bruno-roustant opened a new pull request #1328: LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter. URL: https://github.com/apache/lucene-solr/pull/1328 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9257) FSTLoadMode should not be BlockTree specific as it is used more generally in index package
[ https://issues.apache.org/jira/browse/LUCENE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053400#comment-17053400 ] Bruno Roustant commented on LUCENE-9257: While preparing the port to the 8x branch I saw that I forgot a significant cleanup: the openedFromWriter boolean, which was also added to support FSTLoadMode logic. So I also remove it. For visibility I added PR#1328, but I'll commit it immediately. > FSTLoadMode should not be BlockTree specific as it is used more generally in > index package > -- > > Key: LUCENE-9257 > URL: https://issues.apache.org/jira/browse/LUCENE-9257 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > FSTLoadMode and its associate attribute key (static String) are currently > defined in BlockTreeTermsReader, but they are actually used outside of > BlockTree in the general "index" package. > CheckIndex and ReadersAndUpdates are using these enum and attribute key to > drive the FST load mode through the SegmentReader which is not specific to a > postings format. They have an unnecessary dependency to BlockTreeTermsReader. > We could move FSTLoadMode out of BlockTreeTermsReader, to make it a public > enum of the "index" package. That way CheckIndex and ReadersAndUpdates do not > import anymore BlockTreeTermsReader. > This would also allow other postings formats to use the same enum (e.g. > LUCENE-9254) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant closed pull request #1305: LUCENE-9257: Make FSTLoadMode enum not BlockTree specific.
bruno-roustant closed pull request #1305: LUCENE-9257: Make FSTLoadMode enum not BlockTree specific. URL: https://github.com/apache/lucene-solr/pull/1305 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant commented on issue #1305: LUCENE-9257: Make FSTLoadMode enum not BlockTree specific.
bruno-roustant commented on issue #1305: LUCENE-9257: Make FSTLoadMode enum not BlockTree specific. URL: https://github.com/apache/lucene-solr/pull/1305#issuecomment-595761243 Replaced by https://github.com/apache/lucene-solr/pull/1320 to always keep FST off-heap. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] sigram opened a new pull request #1329: SOLR-14275: Policy calculations are very slow for large clusters and large operations
sigram opened a new pull request #1329: SOLR-14275: Policy calculations are very slow for large clusters and large operations URL: https://github.com/apache/lucene-solr/pull/1329 # Description See JIRA for the explanation of the problem. # Solution Try and reduce the combinatoric explosion in the candidate placements. Use caching more effectively. # Tests Manual performance tests using the scenario.txt attached to JIRA. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9257) FSTLoadMode should not be BlockTree specific as it is used more generally in index package
[ https://issues.apache.org/jira/browse/LUCENE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053412#comment-17053412 ] ASF subversion and git services commented on LUCENE-9257: - Commit c73d2c15ba7c5936715408807184c99ab7cfdfd4 in lucene-solr's branch refs/heads/master from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c73d2c1 ] LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter. > FSTLoadMode should not be BlockTree specific as it is used more generally in > index package > -- > > Key: LUCENE-9257 > URL: https://issues.apache.org/jira/browse/LUCENE-9257 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > FSTLoadMode and its associate attribute key (static String) are currently > defined in BlockTreeTermsReader, but they are actually used outside of > BlockTree in the general "index" package. > CheckIndex and ReadersAndUpdates are using these enum and attribute key to > drive the FST load mode through the SegmentReader which is not specific to a > postings format. They have an unnecessary dependency to BlockTreeTermsReader. > We could move FSTLoadMode out of BlockTreeTermsReader, to make it a public > enum of the "index" package. That way CheckIndex and ReadersAndUpdates do not > import anymore BlockTreeTermsReader. > This would also allow other postings formats to use the same enum (e.g. > LUCENE-9254) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn edited a comment on issue #1155: LUCENE-8962: Add ability to selectively merge on commit
dnhatn edited a comment on issue #1155: LUCENE-8962: Add ability to selectively merge on commit URL: https://github.com/apache/lucene-solr/pull/1155#issuecomment-595607002 I missed the fact that `mergeFinished` is executed under IndexWriter lock. I will dig into this again. Please ignore my previous comments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053427#comment-17053427 ] David Smiley commented on LUCENE-8962: -- Thanks so much for your input Simon! We need to fight the complexity here. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 9.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9257) FSTLoadMode should not be BlockTree specific as it is used more generally in index package
[ https://issues.apache.org/jira/browse/LUCENE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053432#comment-17053432 ] ASF subversion and git services commented on LUCENE-9257: - Commit e7a61eadf6d2f3c722c791e7470a79b2e919cdeb in lucene-solr's branch refs/heads/branch_8x from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e7a61ea ] LUCENE-9257: Always keep FST off-heap. Remove FSTLoadMode, Reader attributes and openedFromWriter. > FSTLoadMode should not be BlockTree specific as it is used more generally in > index package > -- > > Key: LUCENE-9257 > URL: https://issues.apache.org/jira/browse/LUCENE-9257 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > FSTLoadMode and its associate attribute key (static String) are currently > defined in BlockTreeTermsReader, but they are actually used outside of > BlockTree in the general "index" package. > CheckIndex and ReadersAndUpdates are using these enum and attribute key to > drive the FST load mode through the SegmentReader which is not specific to a > postings format. They have an unnecessary dependency to BlockTreeTermsReader. > We could move FSTLoadMode out of BlockTreeTermsReader, to make it a public > enum of the "index" package. That way CheckIndex and ReadersAndUpdates do not > import anymore BlockTreeTermsReader. > This would also allow other postings formats to use the same enum (e.g. > LUCENE-9254) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9257) FSTLoadMode should not be BlockTree specific as it is used more generally in index package
[ https://issues.apache.org/jira/browse/LUCENE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Roustant resolved LUCENE-9257. Fix Version/s: 8.6 Resolution: Fixed Thanks reviewers! > FSTLoadMode should not be BlockTree specific as it is used more generally in > index package > -- > > Key: LUCENE-9257 > URL: https://issues.apache.org/jira/browse/LUCENE-9257 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Minor > Fix For: 8.6 > > Time Spent: 1.5h > Remaining Estimate: 0h > > FSTLoadMode and its associate attribute key (static String) are currently > defined in BlockTreeTermsReader, but they are actually used outside of > BlockTree in the general "index" package. > CheckIndex and ReadersAndUpdates are using these enum and attribute key to > drive the FST load mode through the SegmentReader which is not specific to a > postings format. They have an unnecessary dependency to BlockTreeTermsReader. > We could move FSTLoadMode out of BlockTreeTermsReader, to make it a public > enum of the "index" package. That way CheckIndex and ReadersAndUpdates do not > import anymore BlockTreeTermsReader. > This would also allow other postings formats to use the same enum (e.g. > LUCENE-9254) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant closed pull request #1328: LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter.
bruno-roustant closed pull request #1328: LUCENE-9257: Always keep FST off-heap. Remove SegmentReadState.openedFromWriter. URL: https://github.com/apache/lucene-solr/pull/1328 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13199: Attachment: SOLR-13199.patch > NPE due to unexpected null return value from QueryBitSetProducer.getBitSet > -- > > Key: SOLR-13199 > URL: https://issues.apache.org/jira/browse/SOLR-13199 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. The > attached file ({{home.zip}}) gives the contents of folder {{/tmp/home}} that > you will obtain by following the steps below: > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > {noformat} >Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > Attachments: SOLR-13199.patch, home.zip > > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fl=[child%20parentFilter=ge]&q=*:* > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.NullPointerException > at > org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformer.java:92) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:103) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:1) > at > org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184) > at > org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292) > at org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73) > {noformat} > In ChildDocTransformer.transform, we have the following lines: > {noformat} > final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext); > final int segPrevRootId = segRootId==0? 
-1: > segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay > {noformat} > But getBitSet can return null if the set of DocIds is empty: > {noformat} > return docIdSet == DocIdSet.EMPTY ? null : ((BitDocIdSet) docIdSet).bits(); > {noformat} > We found this bug using [Diffblue Microservices > Testing|https://www.diffblue.com/labs/?utm_source=solr-br]. Find more > information on this [fuzz testing > campaign|https://www.diffblue.com/blog/2018/12/19/diffblue-microservice-testing-a-sneak-peek-at-our-early-product-and-results?utm_source=solr-br]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13199: Status: Patch Available (was: Open) > NPE due to unexpected null return value from QueryBitSetProducer.getBitSet > -- > > Key: SOLR-13199 > URL: https://issues.apache.org/jira/browse/SOLR-13199 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. The > attached file ({{home.zip}}) gives the contents of folder {{/tmp/home}} that > you will obtain by following the steps below: > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > {noformat} >Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > Attachments: SOLR-13199.patch, home.zip > > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fl=[child%20parentFilter=ge]&q=*:* > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.NullPointerException > at > org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformer.java:92) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:103) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:1) > at > org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184) > at > org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292) > at org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73) > {noformat} > In ChildDocTransformer.transform, we have the following lines: > {noformat} > final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext); > final int segPrevRootId = segRootId==0? 
-1: > segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay > {noformat} > But getBitSet can return null if the set of DocIds is empty: > {noformat} > return docIdSet == DocIdSet.EMPTY ? null : ((BitDocIdSet) docIdSet).bits(); > {noformat} > We found this bug using [Diffblue Microservices > Testing|https://www.diffblue.com/labs/?utm_source=solr-br]. Find more > information on this [fuzz testing > campaign|https://www.diffblue.com/blog/2018/12/19/diffblue-microservice-testing-a-sneak-peek-at-our-early-product-and-results?utm_source=solr-br]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N reassigned SOLR-13199: --- Assignee: Munendra S N > NPE due to unexpected null return value from QueryBitSetProducer.getBitSet > -- > > Key: SOLR-13199 > URL: https://issues.apache.org/jira/browse/SOLR-13199 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. The > attached file ({{home.zip}}) gives the contents of folder {{/tmp/home}} that > you will obtain by following the steps below: > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > {noformat} >Reporter: Johannes Kloos >Assignee: Munendra S N >Priority: Minor > Labels: diffblue, newdev > Attachments: SOLR-13199.patch, home.zip > > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fl=[child%20parentFilter=ge]&q=*:* > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.NullPointerException > at > org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformer.java:92) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:103) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:1) > at > org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184) > at > org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292) > at org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73) > {noformat} > In ChildDocTransformer.transform, we have the following lines: > {noformat} > final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext); > final int segPrevRootId = segRootId==0? 
-1: > segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay > {noformat} > But getBitSet can return null if the set of DocIds is empty: > {noformat} > return docIdSet == DocIdSet.EMPTY ? null : ((BitDocIdSet) docIdSet).bits(); > {noformat} > We found this bug using [Diffblue Microservices > Testing|https://www.diffblue.com/labs/?utm_source=solr-br]. Find more > information on this [fuzz testing > campaign|https://www.diffblue.com/blog/2018/12/19/diffblue-microservice-testing-a-sneak-peek-at-our-early-product-and-results?utm_source=solr-br]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053452#comment-17053452 ] Munendra S N commented on SOLR-13199: - [^SOLR-13199.patch] NPE is still occurring when using without nestedPath field. I have removed version check which wasn't required. When parentFilter is null then, setting parentFilter to {{MatchNoDocsQuery}} as parentFilter String is specified after parsing it resolves to {{null}} [~dsmiley] Could you please review this once? > NPE due to unexpected null return value from QueryBitSetProducer.getBitSet > -- > > Key: SOLR-13199 > URL: https://issues.apache.org/jira/browse/SOLR-13199 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. The > attached file ({{home.zip}}) gives the contents of folder {{/tmp/home}} that > you will obtain by following the steps below: > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > {noformat} >Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > Attachments: SOLR-13199.patch, home.zip > > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fl=[child%20parentFilter=ge]&q=*:* > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.NullPointerException > at > org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformer.java:92) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:103) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:1) > at > org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184) > at > org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292) > at org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73) > {noformat} > In 
ChildDocTransformer.transform, we have the following lines: > {noformat} > final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext); > final int segPrevRootId = segRootId==0? -1: > segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay > {noformat} > But getBitSet can return null if the set of DocIds is empty: > {noformat} > return docIdSet == DocIdSet.EMPTY ? null : ((BitDocIdSet) docIdSet).bits(); > {noformat} > We found this bug using [Diffblue Microservices > Testing|https://www.diffblue.com/labs/?utm_source=solr-br]. Find more > information on this [fuzz testing > campaign|https://www.diffblue.com/blog/2018/12/19/diffblue-microservice-testing-a-sneak-peek-at-our-early-product-and-results?utm_source=solr-br]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
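For readers following along, the shape of a defensive fix at the call site quoted in the description is simply a null check before touching the bitset. This is only a sketch of that idea, not the attached patch (which additionally handles the parentFilter resolution described in the comment above):
{code:java}
// Sketch only: QueryBitSetProducer.getBitSet may return null for an empty DocIdSet,
// so don't dereference the parent bitset unconditionally.
final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
final int segPrevRootId = (segParentsBitSet == null || segRootId == 0)
    ? -1
    : segParentsBitSet.prevSetBit(segRootId - 1);
{code}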
[jira] [Commented] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053517#comment-17053517 ] David Smiley commented on LUCENE-8103: -- Notice that {{TwoPhaseIterator.asDocIdSetIterator(tpi);}} will return an implementation whose {{advance(docId)}} method will move beyond the passed-in docID and call matches() until it finds a match. That is a waste _if the user of this DISI doesn't care what the next matching document is if the approximation doesn't match_. So QueryValueSource's exists() method could work with the approximation first and, only if that matches the target document, call TPI.matches(). If there is no TPI then the scorer's DISI is accurate. > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch > > > QueryValueSource (in "queries" module) is a ValueSource representation of a > Query; the score is the value. It ought to try to use a TwoPhaseIterator > from the query if it can be offered. This will prevent possibly expensive > advancing beyond documents that we aren't interested in. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
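A rough sketch of what that suggestion could look like inside {{QueryValueSource}}; the method shape and variable names are illustrative, not the attached patch. The point is to advance only the cheap approximation and run the expensive {{matches()}} check solely when the approximation already sits on the requested document:
{code:java}
// Illustrative only: assumes a per-segment Scorer was obtained for the wrapped query.
boolean exists(int doc, org.apache.lucene.search.Scorer scorer) throws java.io.IOException {
  if (scorer == null) {
    return false; // the query matches nothing in this segment
  }
  org.apache.lucene.search.TwoPhaseIterator tpi = scorer.twoPhaseIterator();
  if (tpi == null) {
    // No two-phase support: the scorer's iterator is already exact.
    org.apache.lucene.search.DocIdSetIterator disi = scorer.iterator();
    int cur = disi.docID();
    if (cur < doc) {
      cur = disi.advance(doc);
    }
    return cur == doc;
  }
  // Two-phase case: advance the cheap approximation only...
  org.apache.lucene.search.DocIdSetIterator approximation = tpi.approximation();
  int cur = approximation.docID();
  if (cur < doc) {
    cur = approximation.advance(doc);
  }
  // ...and confirm with the expensive check only when the approximation is on target,
  // instead of letting asDocIdSetIterator() hunt for the next match we don't care about.
  return cur == doc && tpi.matches();
}
{code}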
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053545#comment-17053545 ] Nhat Nguyen commented on LUCENE-8962: - Some engine tests in Elasticsearch are failing because of this change. I am working to backport them to Lucene so that we can catch similar issues in Lucene. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 9.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13893) BlobRepository looks at the wrong system variable (runtme.lib.size)
[ https://issues.apache.org/jira/browse/SOLR-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13893: Status: Patch Available (was: Open) > BlobRepository looks at the wrong system variable (runtme.lib.size) > --- > > Key: SOLR-13893 > URL: https://issues.apache.org/jira/browse/SOLR-13893 > Project: Solr > Issue Type: Bug >Reporter: Erick Erickson >Assignee: Munendra S N >Priority: Major > Attachments: SOLR-13893.patch, SOLR-13893.patch > > > Tim Swetland on the user's list pointed out this line in BlobRepository: > private static final long MAX_JAR_SIZE = > Long.parseLong(System.getProperty("runtme.lib.size", String.valueOf(5 * 1024 > * 1024))); > "runtme" can't be right. > [~ichattopadhyaya][~noblepaul] what's your opinion? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13893) BlobRepository looks at the wrong system variable (runtme.lib.size)
[ https://issues.apache.org/jira/browse/SOLR-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053562#comment-17053562 ] Munendra S N commented on SOLR-13893: - [^SOLR-13893.patch] Slightly modified patch > BlobRepository looks at the wrong system variable (runtme.lib.size) > --- > > Key: SOLR-13893 > URL: https://issues.apache.org/jira/browse/SOLR-13893 > Project: Solr > Issue Type: Bug >Reporter: Erick Erickson >Assignee: Munendra S N >Priority: Major > Attachments: SOLR-13893.patch, SOLR-13893.patch > > > Tim Swetland on the user's list pointed out this line in BlobRepository: > private static final long MAX_JAR_SIZE = > Long.parseLong(System.getProperty("runtme.lib.size", String.valueOf(5 * 1024 > * 1024))); > "runtme" can't be right. > [~ichattopadhyaya][~noblepaul] what's your opinion? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
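For readers following along, the one-line fix presumably looks like the sketch below, assuming the intended property name is {{runtime.lib.size}} (the attached patch is authoritative, not this snippet):
{code:java}
// In BlobRepository: read the correctly spelled property, defaulting to 5 MB.
private static final long MAX_JAR_SIZE = Long.parseLong(
    System.getProperty("runtime.lib.size", String.valueOf(5 * 1024 * 1024)));
{code}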
[jira] [Commented] (SOLR-14289) Solr may attempt to check Chroot after already having connected once
[ https://issues.apache.org/jira/browse/SOLR-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053561#comment-17053561 ] Mike Drob commented on SOLR-14289: -- [~dsmiley] - seems like we're working on similar problems around speeding up core startup - can you take a look at this and let me know what you think? > Solr may attempt to check Chroot after already having connected once > > > Key: SOLR-14289 > URL: https://issues.apache.org/jira/browse/SOLR-14289 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Attachments: Screen Shot 2020-02-26 at 2.56.14 PM.png > > Time Spent: 10m > Remaining Estimate: 0h > > On server startup, we will attempt to load the solr.xml from zookeeper if we > have the right properties set, and then later when starting up the core > container will take time to verify (and create) the chroot even if it is the > same string that we already used before. We can likely skip the second > short-lived zookeeper connection to speed up our startup sequence a little > bit. > > See this attached image from thread profiling during startup. > !Screen Shot 2020-02-26 at 2.56.14 PM.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13893) BlobRepository looks at the wrong system variable (runtme.lib.size)
[ https://issues.apache.org/jira/browse/SOLR-13893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13893: Attachment: SOLR-13893.patch > BlobRepository looks at the wrong system variable (runtme.lib.size) > --- > > Key: SOLR-13893 > URL: https://issues.apache.org/jira/browse/SOLR-13893 > Project: Solr > Issue Type: Bug >Reporter: Erick Erickson >Assignee: Munendra S N >Priority: Major > Attachments: SOLR-13893.patch, SOLR-13893.patch > > > Tim Swetland on the user's list pointed out this line in BlobRepository: > private static final long MAX_JAR_SIZE = > Long.parseLong(System.getProperty("runtme.lib.size", String.valueOf(5 * 1024 > * 1024))); > "runtme" can't be right. > [~ichattopadhyaya][~noblepaul] what's your opinion? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-11725) json.facet's stddev() function should be changed to use the "Corrected sample stddev" formula
[ https://issues.apache.org/jira/browse/SOLR-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N reassigned SOLR-11725: --- Assignee: Munendra S N > json.facet's stddev() function should be changed to use the "Corrected sample > stddev" formula > - > > Key: SOLR-11725 > URL: https://issues.apache.org/jira/browse/SOLR-11725 > Project: Solr > Issue Type: Sub-task > Components: Facet Module >Reporter: Chris M. Hostetter >Assignee: Munendra S N >Priority: Major > Attachments: SOLR-11725.patch, SOLR-11725.patch, SOLR-11725.patch > > > While working on some equivalence tests/demonstrations for > {{facet.pivot+stats.field}} vs {{json.facet}} I noticed that the {{stddev}} > calculations done between the two code paths can be measurably different, and > realized this is due to them using very different code... > * {{json.facet=foo:stddev(foo)}} > ** {{StddevAgg.java}} > ** {{Math.sqrt((sumSq/count)-Math.pow(sum/count, 2))}} > * {{stats.field=\{!stddev=true\}foo}} > ** {{StatsValuesFactory.java}} > ** {{Math.sqrt(((count * sumOfSquares) - (sum * sum)) / (count * (count - > 1.0D)))}} > Since I"m not really a math guy, I consulting with a bunch of smart math/stat > nerds I know online to help me sanity check if these equations (some how) > reduced to eachother (In which case the discrepancies I was seeing in my > results might have just been due to the order of intermediate operation > execution & floating point rounding differences). > They confirmed that the two bits of code are _not_ equivalent to each other, > and explained that the code JSON Faceting is using is equivalent to the > "Uncorrected sample stddev" formula, while StatsComponent's code is > equivalent to the the "Corrected sample stddev" formula... > https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation > When I told them that stuff like this is why no one likes mathematicians and > pressed them to explain which one was the "most canonical" (or "most > generally applicable" or "best") definition of stddev, I was told that: > # This is something statisticians frequently disagree on > # Practically speaking the diff between the calculations doesn't tend to > differ significantly when count is "very large" > # _"Corrected sample stddev" is more appropriate when comparing two > distributions_ > Given that: > * the primary usage of computing the stddev of a field/function against a > Solr result set (or against a sub-set of results defined by a facet > constraint) is probably to compare that distribution to a different Solr > result set (or to compare N sub-sets of results defined by N facet > constraints) > * the size of the sets of documents (values) can be relatively small when > computing stats over facet constraint sub-sets > ...it seems like {{StddevAgg.java}} should be updated to use the "Corrected > sample stddev" equation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
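A small, self-contained illustration of how the two formulas quoted above diverge on a tiny sample; the variable names mirror the accumulators in the issue, and the data set is made up:
{code:java}
public class StddevFormulas {
  public static void main(String[] args) {
    double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
    double count = values.length, sum = 0, sumSq = 0;
    for (double v : values) { sum += v; sumSq += v * v; }
    // "Uncorrected sample stddev" -- the formula currently in StddevAgg
    double uncorrected = Math.sqrt((sumSq / count) - Math.pow(sum / count, 2));
    // "Corrected sample stddev" -- the formula in StatsValuesFactory
    double corrected = Math.sqrt(((count * sumSq) - (sum * sum)) / (count * (count - 1.0D)));
    System.out.println(uncorrected + " vs " + corrected);  // 2.0 vs ~2.14 on this sample
  }
}
{code}
The gap between the two shrinks as count grows, which matches point 2 of the description; it is most visible for small facet buckets.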
[jira] [Comment Edited] (SOLR-11359) An autoscaling/suggestions endpoint to recommend operations
[ https://issues.apache.org/jira/browse/SOLR-11359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052544#comment-17052544 ] Megan Carey edited comment on SOLR-11359 at 3/6/20, 4:38 PM: - Would it be possible to explicitly return the URL to hit for applying the suggestion? i.e. rather than return an HTTP method, operation type, etc. just return the constructed URL for executing the action? Also, are you considering writing a cron to periodically execute these suggestions? Or was the intention for these to be manually applied? [~noble.paul] was (Author: megancarey): Would it be possible to explicitly return the URL to hit for applying the suggestion? i.e. rather than return an HTTP method, operation type, etc. just return the constructed URL for executing the action? Also, are you considering writing a cron to periodically execute these suggestions? > An autoscaling/suggestions endpoint to recommend operations > --- > > Key: SOLR-11359 > URL: https://issues.apache.org/jira/browse/SOLR-11359 > Project: Solr > Issue Type: New Feature > Components: AutoScaling >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Attachments: SOLR-11359.patch > > > Autoscaling can make suggestions to users on what operations they can perform > to improve the health of the cluster > The suggestions will have the following information > * http end point > * http method (POST,DELETE) > * command payload -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant commented on issue #1301: LUCENE-9254: UniformSplit supports FST off-heap.
bruno-roustant commented on issue #1301: LUCENE-9254: UniformSplit supports FST off-heap. URL: https://github.com/apache/lucene-solr/pull/1301#issuecomment-595856255 Updated after LUCENE-9257 removed FSTLoadMode. Now FST is off-heap by default. It is possible to force it with a boolean in the UniformSplitPostingsFormat. Also, FST is always on-heap if there is block encoding/decoding. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley reassigned LUCENE-9258: Assignee: David Smiley > DocTermsIndexDocValues should not assume it's operating on a SortedDocValues > field > -- > > Key: LUCENE-9258 > URL: https://issues.apache.org/jira/browse/LUCENE-9258 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.7.2, 8.4 >Reporter: Michele Palmia >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-9258.patch > > > When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from > _DocTermsIndexDocValues_ , the latter instantiates a new iterator on > _SortedDocValues_ regardless of the fact that the underlying field can > actually be of a different type (e.g. a _SortedSetDocValues_ processed > through a _SortedSetSelector_). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053589#comment-17053589 ] David Smiley commented on LUCENE-9258: -- Makes sense to me; your test is perfect. I'm curious; how did you see this at a higher level (e.g. Solr or ES)? The issue title & details here are a bit geeky / low-level and I'm trying to think of a good CHANGES.txt entry that might be more meaningful to users. > DocTermsIndexDocValues should not assume it's operating on a SortedDocValues > field > -- > > Key: LUCENE-9258 > URL: https://issues.apache.org/jira/browse/LUCENE-9258 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.7.2, 8.4 >Reporter: Michele Palmia >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-9258.patch > > > When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from > _DocTermsIndexDocValues_ , the latter instantiates a new iterator on > _SortedDocValues_ regardless of the fact that the underlying field can > actually be of a different type (e.g. a _SortedSetDocValues_ processed > through a _SortedSetSelector_). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Comment: was deleted (was: Hi, [~jtibshirani], thanks for your suggestions! ??"I wonder if this clustering-based approach could fit more closely in the current search framework. In the current prototype, we keep all the cluster information on-heap. We could instead try storing each cluster as its own 'term' with a postings list. The kNN query would then be modelled as an 'OR' over these terms."?? In the previous implementation ([https://github.com/irvingzhang/lucene-solr/commit/eb5f79ea7a705595821f73f80a0c5752061869b2]), the cluster information is divided into two parts – meta (.ifi) and data(.ifd) as shown in the following figure, where each cluster with a postings list is stored in the data file (.ifd) and not kept on-heap. A major concern of this implementation is its reading performance of cluster data since reading is a very frequent behavior on kNN search. I will test and check the performance. !image-2020-02-16-15-05-02-451.png! ??"Because of this concern, it could be nice to include benchmarks for index time (in addition to QPS)..."?? Many thanks! I will check the links you mentioned and consider optimize the clustering cost. In addition, more benchmarks will be added soon. h2. *UPDATE – Feb. 24, 2020* I have add a new implementation for IVF index, which has been marked as ***V2 under the package org.apache.lucene.codecs.lucene90. In current implementation, the IVF index has been divided into two files with suffixes .ifi and .ifd, respectively. The .ifd file will be read if cluster information is needed. The experiments are conducted on dataset sift1M (Test codes: [https://github.com/irvingzhang/lucene-solr/blob/jira/LUCENE-9136/lucene/core/src/test/org/apache/lucene/util/ivfflat/KnnIvfPerformTester.java]), detailed results are as follows, # add document -- 3921 ms; # commit -- 3912286 ms (mainly spent on k-means training, 10 iterations, 4000 centroids, totally 512,000 vectors used for training); # R@100 recall time and recall ratio are listed in the following table ||nprobe||avg. search time (ms)||recall ratio (%)|| |8|28.0755|44.154| |16|27.1745|57.9945| |32|32.986|71.7003| |64|40.4082|83.50471| |128|50.9569|92.07929| |256|73.923|97.150894| Compare with on-heap implementation of IVF index, the query time increases significantly (22%~71%). Actually, IVF index is comprised of unique docIDs, and will not take up too much memory. *There is a small argument about whether to keep the cluster information on-heap or not. Hope to hear more suggestions.* ) > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. 
There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. >
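To make the query path discussed in this thread concrete, here is a generic IVFFlat search sketch: probe the nprobe centroids closest to the query and rank only the vectors in those clusters. The data layout and names are illustrative, not the classes in the linked branch:
{code:java}
import java.util.*;

public class IvfFlatSearchSketch {
  static double l2(float[] a, float[] b) {
    double d = 0;
    for (int i = 0; i < a.length; i++) { double t = a[i] - b[i]; d += t * t; }
    return d;
  }

  /** centroids[c] is a cluster center; clusterDocs[c] and clusterVecs[c] hold that cluster's postings. */
  static int[] search(float[] query, float[][] centroids, int[][] clusterDocs, float[][][] clusterVecs,
                      int nprobe, int topK) {
    // 1. rank centroids by distance to the query and keep the nprobe closest
    Integer[] order = new Integer[centroids.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    Arrays.sort(order, Comparator.comparingDouble(c -> l2(query, centroids[c])));

    // 2. scan only the selected clusters, collecting (distance, docId) pairs
    List<double[]> candidates = new ArrayList<>();
    for (int p = 0; p < Math.min(nprobe, order.length); p++) {
      int c = order[p];
      for (int i = 0; i < clusterDocs[c].length; i++) {
        candidates.add(new double[] {l2(query, clusterVecs[c][i]), clusterDocs[c][i]});
      }
    }

    // 3. keep the topK nearest candidates
    candidates.sort(Comparator.comparingDouble(e -> e[0]));
    int k = Math.min(topK, candidates.size());
    int[] result = new int[k];
    for (int i = 0; i < k; i++) result[i] = (int) candidates.get(i)[1];
    return result;
  }
}
{code}
A larger nprobe scans more clusters, which is why recall rises with nprobe in the benchmark tables in this thread, at the cost of query time.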
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: (was: 1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png) > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want to more choices. 
> The latest branch is > [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Comment: was deleted (was: The index format of IVFFlat is organized as follows, !1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png! In general, the number of centroids lies within the interval [4 * sqrt(N), 16 * sqrt(N)], where N is the data set size. We use (4 * sqrt(N)) as the actual value of centroid number to balance between accuracy and computational load, denoted by c. And the full data set is used for training if its size no larger than 200,000. Otherwise (128 * c) points are selected after shuffling for training in order to accelerate training. Experiments have been conducted on a large data set (sift1M, [http://corpus-texmex.irisa.fr/]) to verify the implementation of IVFFlat. The base data set (sift_base.fvecs) contains 1,000,000 vectors with 128 dimensions. And 10,000 queries (sift_query.fvecs) are used for recall testing. The recall ratio follows Recall=(Recall vectors in groundTruth) / (number of queries * TopK), where number of queries = 10,000 and TopK=100. The results are as follows (single thread and single segment), ||nprobe||avg. search time (ms)||recall (%)|| |8|16.3827|44.24| |16|16.5834|58.04| |32|19.2031|71.55| |64|24.7065|83.30| |128|34.9165|92.03| |256|60.5844|97.18| | | | | **The test codes could be found in [https://github.com/irvingzhang/lucene-solr/blob/jira/LUCENE-9136/lucene/core/src/test/org/apache/lucene/util/KnnIvfAndGraphPerformTester.java.|https://github.com/irvingzhang/lucene-solr/blob/jira/LUCENE-9136/lucene/core/src/test/org/apache/lucene/util/KnnIvfAndGraphPerformTester.java] ) > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. 
> IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users
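The sizing rule from the (since-deleted) comment above is easy to state in code; the constants are the ones quoted there, a heuristic rather than anything mandated by the branch:
{code:java}
public class IvfTrainingSize {
  public static void main(String[] args) {
    long n = 1_000_000;                              // number of indexed vectors (sift1M)
    long centroids = (long) (4 * Math.sqrt(n));      // lower end of the [4*sqrt(N), 16*sqrt(N)] range
    long trainingPoints = n <= 200_000 ? n : 128L * centroids;  // shuffle and sample when the set is large
    System.out.println(centroids + " centroids, " + trainingPoints + " training vectors");
    // prints: 4000 centroids, 512000 training vectors
  }
}
{code}
Those are the same figures reported later in the thread for the sift1M experiments (4000 centroids, 512,000 training vectors).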
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: (was: image-2020-02-16-15-05-02-451.png) > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want to more choices. 
> The latest branch is > [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: image-2020-03-07-01-22-06-132.png > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: image-2020-03-07-01-22-06-132.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want to more choices. 
> The latest branch is > [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] atris commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
atris commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r389038791 ## File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java ## @@ -211,6 +213,18 @@ public IndexSearcher(IndexReaderContext context, Executor executor) { assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader(); reader = context.reader(); this.executor = executor; +this.sliceExecutionControlPlane = executor == null ? null : getSliceExecutionControlPlane(executor); +this.readerContext = context; +leafContexts = context.leaves(); +this.leafSlices = executor == null ? null : slices(leafContexts); + } + + // Package private for testing + IndexSearcher(IndexReaderContext context, Executor executor, SliceExecutionControlPlane sliceExecutionControlPlane) { +assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader(); +reader = context.reader(); +this.executor = executor; +this.sliceExecutionControlPlane = executor == null ? null : sliceExecutionControlPlane; Review comment: Not sure if I understood your point. The passed in instance is the one being assigned to the member? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: image-2020-03-07-01-25-58-047.png > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: image-2020-03-07-01-22-06-132.png, > image-2020-03-07-01-25-58-047.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want to more choices. 
> The latest branch is > [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: image-2020-03-07-01-27-12-859.png > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: image-2020-03-07-01-22-06-132.png, > image-2020-03-07-01-25-58-047.png, image-2020-03-07-01-27-12-859.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want to more choices. 
> The latest branch is > [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053620#comment-17053620 ] Michael Sokolov commented on LUCENE-8962: - Based on [~simonw]'s recent comments in github, plus difficulty getting tests to pass consistently (apparently there are more failing tests in Elasticland), we should probably revert for now, at least from 8.x and 8.5 branches. I am tied up for the moment, but will be able to do the revert this weekend. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 9.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053623#comment-17053623 ] Xin-Chun Zhang commented on LUCENE-9136: 1. My personal git branch: [https://github.com/irvingzhang/lucene-solr/tree/jira/lucene-9136-ann-ivfflat]. 2. The vector format is as follows, !image-2020-03-07-01-25-58-047.png|width=535,height=297! Structure of IVF index meta is as follows, !image-2020-03-07-01-27-12-859.png|width=606,height=276! Structure of IVF data: !image-2020-03-07-01-22-06-132.png|width=529,height=309! 3. Ann-benchmark tool could be found in: [https://github.com/irvingzhang/ann-benchmarks]. Benchmark results (Single Thread, 2.5GHz * 2CPU, 16GB RAM, nprobe=8,16,32,64,128,256, centroids=4*sqrt(N), where N the size of dataset): 1) Glove-1.2M-25D-Angular: index build + training cost 706s, qps: 18.8~49.6, recall: 76.8%~99.7% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583504416262-89784074-c9dc-4489-99a1-5e4b3c76e5fc.png|width=624,height=430! 2) Glove-1.2M-100D-Angular: index build + training cost 2487s, qps: 12.2~38.3, recall 65.8%~96.3% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583510066130-b4fbcb29-8ad7-4ff2-99ce-c52f7c27826e.png|width=679,height=468! 3) Sift-1M-128D-Euclidean: index build + training cost 2397s, qps 14.8~38.2, recall 71.1%~99.2% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583515010497-20b74f41-72c3-48ce-a929-1cbfbd6a6423.png|width=691,height=476! > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: image-2020-03-07-01-22-06-132.png, > image-2020-03-07-01-25-58-047.png, image-2020-03-07-01-27-12-859.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. 
*The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenar
[GitHub] [lucene-solr] atris commented on issue #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
atris commented on issue #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#issuecomment-595884480 @jpountz Raised another iteration, please let me know your thoughts and comments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#discussion_r389028879 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -299,7 +300,76 @@ static int getActualMaxDocs() { final FieldNumbers globalFieldNumberMap; final DocumentsWriter docWriter; - private final Queue eventQueue = new ConcurrentLinkedQueue<>(); + private final CloseableQueue eventQueue = new CloseableQueue(this); + + static final class CloseableQueue implements Closeable { +private volatile boolean closed = false; +private final Semaphore permits = new Semaphore(Integer.MAX_VALUE); +private final Queue queue = new ConcurrentLinkedQueue<>(); +private final IndexWriter writer; + +CloseableQueue(IndexWriter writer) { + this.writer = writer; +} + +private void tryAcquire() { + if (permits.tryAcquire() == false) { +throw new AlreadyClosedException("queue is closed"); + } + if (closed) { +throw new AlreadyClosedException("queue is closed"); + } +} + +boolean add(Event event) { + tryAcquire(); + try { +return queue.add(event); + } finally { +permits.release(); + } +} + +void processEvents() throws IOException { + tryAcquire(); + try { +processEventsInternal(); + }finally { Review comment: nit: space after `{` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#discussion_r389028289 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -299,7 +300,76 @@ static int getActualMaxDocs() { final FieldNumbers globalFieldNumberMap; final DocumentsWriter docWriter; - private final Queue eventQueue = new ConcurrentLinkedQueue<>(); + private final CloseableQueue eventQueue = new CloseableQueue(this); + + static final class CloseableQueue implements Closeable { +private volatile boolean closed = false; +private final Semaphore permits = new Semaphore(Integer.MAX_VALUE); +private final Queue queue = new ConcurrentLinkedQueue<>(); +private final IndexWriter writer; + +CloseableQueue(IndexWriter writer) { + this.writer = writer; +} + +private void tryAcquire() { + if (permits.tryAcquire() == false) { +throw new AlreadyClosedException("queue is closed"); + } + if (closed) { +throw new AlreadyClosedException("queue is closed"); + } +} + +boolean add(Event event) { + tryAcquire(); + try { +return queue.add(event); + } finally { +permits.release(); + } +} + +void processEvents() throws IOException { + tryAcquire(); + try { +processEventsInternal(); + }finally { +permits.release(); + } +} +private void processEventsInternal() throws IOException { Review comment: nit: add a new line This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
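For readers following the review, this is the general shape of the close-with-semaphore pattern under discussion: every add/process call holds one of Integer.MAX_VALUE permits, and close() acquires all of them so it can only proceed once no caller is in flight. It is a standalone illustration of the idea, not the exact class in the PR:
{code:java}
import java.io.Closeable;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

final class GracefulEventQueue implements Closeable {
  private volatile boolean closed;
  private final Semaphore permits = new Semaphore(Integer.MAX_VALUE);
  private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();

  private void acquireOrThrow() {
    if (closed || permits.tryAcquire() == false) {
      throw new IllegalStateException("queue is closed");
    }
    if (closed) {              // re-check: close() may have started after the acquire
      permits.release();
      throw new IllegalStateException("queue is closed");
    }
  }

  void add(Runnable event) {
    acquireOrThrow();
    try {
      queue.add(event);
    } finally {
      permits.release();
    }
  }

  void processEvents() {
    acquireOrThrow();
    try {
      Runnable event;
      while ((event = queue.poll()) != null) {
        event.run();
      }
    } finally {
      permits.release();
    }
  }

  @Override
  public void close() {
    closed = true;                                        // stop new callers from entering
    permits.acquireUninterruptibly(Integer.MAX_VALUE);    // wait for in-flight callers to drain
    try {
      Runnable event;
      while ((event = queue.poll()) != null) {
        event.run();                                      // process what is already queued
      }
    } finally {
      permits.release(Integer.MAX_VALUE);
    }
  }
}
{code}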
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#discussion_r389028473 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -299,7 +300,76 @@ static int getActualMaxDocs() { final FieldNumbers globalFieldNumberMap; final DocumentsWriter docWriter; - private final Queue eventQueue = new ConcurrentLinkedQueue<>(); + private final CloseableQueue eventQueue = new CloseableQueue(this); + + static final class CloseableQueue implements Closeable { Review comment: I am not sure if `EventQueue` is a better name? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#discussion_r389029514 ## File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java ## @@ -3773,7 +3774,58 @@ public void testRefreshAndRollbackConcurrently() throws Exception { stopped.set(true); indexer.join(); refresher.join(); + if (w.getTragicException() != null) { +w.getTragicException().printStackTrace(); Review comment: I think we don't need to print the stack trace here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053640#comment-17053640 ] Michael Froh commented on LUCENE-8962: -- bq. With a slightly refactored IW we can share the merge logic and let the reader re-write itself since we are talking about very small segments the overhead is very small. This would in turn mean that we are doing the work twice ie. the IW would do its normal work and might merge later etc. Just to provide a bit more context, for the case where my team uses this change, we're replicating the index (think Solr master/slave) from "writers" to many "searchers", so we're avoiding doing the work many times. An earlier (less invasive) approach I tried to address the small flushed segments problem was roughly: call commit on writer, hard link the commit files to another filesystem directory to "clone" the index, open an IW on that directory, merge small segments on the clone, let searchers replicate from the clone. That approach does mean that the merging work happens twice (since the "real" index doesn't benefit from the merge on the clone), but it doesn't involve any changes in Lucene. Maybe that less-invasive approach is a better way to address this. It's certainly more consistent with [~simonw]'s suggestion above. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 9.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053641#comment-17053641 ] Kevin Watters commented on SOLR-13749: -- having a local param like method=xcjf could trigger the xcjf query parser if we want. There are some complications. Currently, XCJF benefits greatly by some additional configuration for that query parser to specify the field in which a collection has been routed on. The current join query parsers aren't defined by default in the solrconfig.xml . by merging together the functionality of these 2 query parsers, we might want to explicitly define the join query parser in the solr config by default. Additionally, there are many query parsers beyond xcjf that are really join query parsers. "child", and "parent" should also be considered "join" query parsers if we want to fully go to a consolidated join query parser model. We'll try to be responsive to issues on this ticket, however, I'm not sure how much bandwidth we will have for larger refactors related to xcjf. My preference would be that we leave it as is. This is what we were asked to develop and contribute back so we'd like to keep it as close to the original contribution as possible. If we collectively want to wrangle all of those join parsers into a single consolidated join query parser perhaps we could track that as a different issue/ticket. > Implement support for joining across collections with multiple shards ( XCJF ) > -- > > Key: SOLR-13749 > URL: https://issues.apache.org/jira/browse/SOLR-13749 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Fix For: 8.5 > > Time Spent: 1.5h > Remaining Estimate: 0h > > This ticket includes 2 query parsers. > The first one is the "Cross collection join filter" (XCJF) parser. This is > the "Cross-collection join filter" query parser. It can do a call out to a > remote collection to get a set of join keys to be used as a filter against > the local collection. > The second one is the Hash Range query parser that you can specify a field > name and a hash range, the result is that only the documents that would have > hashed to that range will be returned. > This query parser will do an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is setup with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. > > > Here's some vocab to help with the descriptions of the various parameters. 
> ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The lucene query that executes a search to get back a set of join > keys from a remote collection| > |HashRangeQuery|The lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values ( required )| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither of zkHost or solrUrl are specified, the local Zookeeper cluster > will be used. ( optional )| > |solrUrl|Optional|The URL of the external Solr node to be queried ( optional > )| > |from|Required|The join key field name in the external collection ( required > )| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. > Note: The original query can be passed at the end of the string or as the > "v" parameter. > It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. > This parameter improves the performance of the cross-collection join, but > it depends on
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully
dnhatn commented on a change in pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319#discussion_r389028879 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -299,7 +300,76 @@ static int getActualMaxDocs() { final FieldNumbers globalFieldNumberMap; final DocumentsWriter docWriter; - private final Queue eventQueue = new ConcurrentLinkedQueue<>(); + private final CloseableQueue eventQueue = new CloseableQueue(this); + + static final class CloseableQueue implements Closeable { +private volatile boolean closed = false; +private final Semaphore permits = new Semaphore(Integer.MAX_VALUE); +private final Queue queue = new ConcurrentLinkedQueue<>(); +private final IndexWriter writer; + +CloseableQueue(IndexWriter writer) { + this.writer = writer; +} + +private void tryAcquire() { + if (permits.tryAcquire() == false) { +throw new AlreadyClosedException("queue is closed"); + } + if (closed) { +throw new AlreadyClosedException("queue is closed"); + } +} + +boolean add(Event event) { + tryAcquire(); + try { +return queue.add(event); + } finally { +permits.release(); + } +} + +void processEvents() throws IOException { + tryAcquire(); + try { +processEventsInternal(); + }finally { Review comment: nit: space after `}` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14073) Fix segment look ahead NPE in CollapsingQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-14073: -- Attachment: SOLR-14073.patch > Fix segment look ahead NPE in CollapsingQParserPlugin > - > > Key: SOLR-14073 > URL: https://issues.apache.org/jira/browse/SOLR-14073 > Project: Solr > Issue Type: Bug >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14073.patch, SOLR-14073.patch, SOLR-14073.patch > > > The CollapsingQParserPlugin has a bug that if every segment is not visited > during the collect it throws an NPE. This causes the CollapsingQParserPlugin > to not work when used with any feature that short circuits the segments > during the collect. This includes using the CollapsingQParserPlugin twice in > the same query and the time limiting collector. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request
[ https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053653#comment-17053653 ] Lucene/Solr QA commented on SOLR-13944: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 74m 57s{color} | {color:green} core in the patch passed. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 82m 23s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-13944 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995852/SOLR-13944.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / c73d2c1 | | ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 | | Default Java | LTS | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/699/testReport/ | | modules | C: solr/core U: solr/core | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/699/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. 
> CollapsingQParserPlugin throws NPE instead of bad request > - > > Key: SOLR-13944 > URL: https://issues.apache.org/jira/browse/SOLR-13944 > Project: Solr > Issue Type: Bug >Affects Versions: 7.3.1 >Reporter: Stefan >Assignee: Munendra S N >Priority: Minor > Attachments: SOLR-13944.patch > > > I noticed the following NPE: > {code:java} > java.lang.NullPointerException at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584) > {code} > If I am correct, the problem was already addressed in SOLR-8807. The fix does > was not working in this case though, because of a syntax error in the query > (I used the local parameter syntax twice instead of combining it). The > relevant part of the query is: > {code:java} > &fq={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price > asc, id asc'} > {code} > After discussing that on the mailing list, I was asked to open a ticket, > because this situation should result in a bad request instead of a > NullpointerException (see > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E]) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: glove-100-angular.png > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: glove-100-angular.png, glove-25-angular.png, > image-2020-03-07-01-22-06-132.png, image-2020-03-07-01-25-58-047.png, > image-2020-03-07-01-27-12-859.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want to more choices. 
> The latest branch is > [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: glove-25-angular.png > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: glove-100-angular.png, glove-25-angular.png, > image-2020-03-07-01-22-06-132.png, image-2020-03-07-01-25-58-047.png, > image-2020-03-07-01-27-12-859.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want to more choices. 
> The latest branch is > [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Comment: was deleted (was: 1. My personal git branch: [https://github.com/irvingzhang/lucene-solr/tree/jira/lucene-9136-ann-ivfflat]. 2. The vector format is as follows, !image-2020-03-07-01-25-58-047.png|width=535,height=297! Structure of IVF index meta is as follows, !image-2020-03-07-01-27-12-859.png|width=606,height=276! Structure of IVF data: !image-2020-03-07-01-22-06-132.png|width=529,height=309! 3. Ann-benchmark tool could be found in: [https://github.com/irvingzhang/ann-benchmarks]. Benchmark results (Single Thread, 2.5GHz * 2CPU, 16GB RAM, nprobe=8,16,32,64,128,256, centroids=4*sqrt(N), where N the size of dataset): 1) Glove-1.2M-25D-Angular: index build + training cost 706s, qps: 18.8~49.6, recall: 76.8%~99.7% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583504416262-89784074-c9dc-4489-99a1-5e4b3c76e5fc.png|width=624,height=430! 2) Glove-1.2M-100D-Angular: index build + training cost 2487s, qps: 12.2~38.3, recall 65.8%~96.3% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583510066130-b4fbcb29-8ad7-4ff2-99ce-c52f7c27826e.png|width=679,height=468! 3) Sift-1M-128D-Euclidean: index build + training cost 2397s, qps 14.8~38.2, recall 71.1%~99.2% !https://intranetproxy.alipay.com/skylark/lark/0/2020/png/35769/1583515010497-20b74f41-72c3-48ce-a929-1cbfbd6a6423.png|width=691,height=476! ) > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: glove-100-angular.png, glove-25-angular.png, > image-2020-03-07-01-22-06-132.png, image-2020-03-07-01-25-58-047.png, > image-2020-03-07-01-27-12-859.png, sift-128-euclidean.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. 
> IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential >
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: sift-128-euclidean.png > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: glove-100-angular.png, glove-25-angular.png, > image-2020-03-07-01-22-06-132.png, image-2020-03-07-01-25-58-047.png, > image-2020-03-07-01-27-12-859.png, sift-128-euclidean.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want to more choices. 
> The latest branch is > [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053668#comment-17053668 ] Xin-Chun Zhang commented on LUCENE-9136: 1. My personal git branch: [https://github.com/irvingzhang/lucene-solr/tree/jira/lucene-9136-ann-ivfflat]. 2. The vector format is as follows, !image-2020-03-07-01-25-58-047.png|width=535,height=297! Structure of IVF index meta is as follows, !image-2020-03-07-01-27-12-859.png|width=606,height=276! Structure of IVF data: !image-2020-03-07-01-22-06-132.png|width=529,height=309! 3. Ann-benchmark tool could be found in: [https://github.com/irvingzhang/ann-benchmarks]. Benchmark results (Single Thread, 2.5GHz * 2CPU, 16GB RAM, nprobe=8,16,32,64,128,256, centroids=4*sqrt(N), where N the size of dataset): 1) Glove-1.2M-25D-Angular: index build + training cost 706s, qps: 18.8~49.6, recall: 76.8%~99.7% !glove-25-angular.png|width=653,height=450! 2) Glove-1.2M-100D-Angular: index build + training cost 2487s, qps: 12.2~38.3, recall 65.8%~96.3% !glove-100-angular.png|width=671,height=462! 3) Sift-1M-128D-Euclidean: index build + training cost 2397s, qps 14.8~38.2, recall 71.1%~99.2% !sift-128-euclidean.png|width=684,height=471! > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: glove-100-angular.png, glove-25-angular.png, > image-2020-03-07-01-22-06-132.png, image-2020-03-07-01-25-58-047.png, > image-2020-03-07-01-27-12-859.png, sift-128-euclidean.png > > Time Spent: 50m > Remaining Estimate: 0h > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface, making it hard > to be integrated in Java projects or those who are not familier with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. 
> Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene, has made great progress. The issue draws attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Another advantage is that IVFFlat can be faster and more accurate when > enables GPU parallel computing (current not support in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations (HNSW && IVFFlat) for potential > users who are faced with very different scenarios and want to more choices. > The latest branch is > [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits
[jira] [Created] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle
Mike Drob created LUCENE-9266: - Summary: ant nightly-smoke fails due to presence of build.gradle Key: LUCENE-9266 URL: https://issues.apache.org/jira/browse/LUCENE-9266 Project: Lucene - Core Issue Type: Task Reporter: Mike Drob Seen on Jenkins - [https://builds.apache.org/job/Lucene-Solr-SmokeRelease-master/1617/console] Reproduced locally. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org