[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description:

Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.

With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface [https://github.com/facebookresearch/faiss/issues/105].

The algorithms for vector retrieval can be roughly classified into four categories:
# Tree-based algorithms, such as KD-tree;
# Hashing methods, such as LSH (Locality-Sensitive Hashing);
# Product quantization algorithms, such as IVFFlat;
# Graph-based algorithms, such as HNSW, SSG, and NSG.

IVFFlat and HNSW are the most popular of these algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat requires much less memory and disk space than HNSW [indexing 1M vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors], and IVFFlat supports both online and offline training. I'm now trying to introduce IVFFlat into Lucene core, and I will try my best to reuse the excellent work from LUCENE-9004.

was:

Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.

With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface [https://github.com/facebookresearch/faiss/issues/105].

The algorithms for vector retrieval can be roughly classified into four categories:
# Tree-based algorithms, such as KD-tree;
# Hashing methods, such as LSH (Locality-Sensitive Hashing);
# Product quantization algorithms, such as IVFFlat;
# Graph-based algorithms, such as HNSW, SSG, and NSG.

IVFFlat and HNSW are the most popular of these algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat requires less memory and disk space than HNSW, and IVFFlat supports both online and offline training. I'm now trying to introduce IVFFlat into Lucene core.
I will try my best to reuse the excellent work from LUCENE-9004.

> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Xin-Chun Zhang
> Priority: Major
>
> Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.
> With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface [https://github.com/facebookresearch/faiss/issues/105].
> The algorithms for vector retrieval ca
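For readers unfamiliar with IVFFlat, the idea described above (cluster the vectors into inverted lists around coarse centroids, then exhaustively, i.e. "flat", scan only the few lists nearest to the query) can be sketched as follows. This is an illustration of the algorithm only, not the proposed Lucene implementation:

{code:java}
import java.util.*;

// Minimal IVFFlat sketch: nlist coarse centroids partition the vectors into
// inverted lists; a query scans only the nprobe nearest lists, uncompressed.
class IvfFlatSketch {
  float[][] centroids;        // coarse centroids, e.g. obtained by k-means training
  List<List<float[]>> lists;  // each vector stored in the list of its nearest centroid

  static float l2(float[] a, float[] b) {
    float d = 0;
    for (int i = 0; i < a.length; i++) { float t = a[i] - b[i]; d += t * t; }
    return d;
  }

  float[] search(float[] query, int nprobe) {
    // 1. rank centroids by distance to the query
    Integer[] order = new Integer[centroids.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    Arrays.sort(order, Comparator.comparingDouble(i -> l2(query, centroids[i])));
    // 2. exhaustively ("flat") scan only the nprobe closest lists
    float best = Float.MAX_VALUE;
    float[] bestVec = null;
    for (int p = 0; p < Math.min(nprobe, order.length); p++) {
      for (float[] v : lists.get(order[p])) {
        float d = l2(query, v);
        if (d < best) { best = d; bestVec = v; }
      }
    }
    return bestVec; // nearest neighbor found within the probed lists
  }
}
{code}

Memory stays low because only centroids plus raw vectors are kept (no graph links), which is the trade-off the description cites against HNSW.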
[jira] [Resolved] (LUCENE-7146) "Latest SVN" needs replaced on the website
[ https://issues.apache.org/jira/browse/LUCENE-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved LUCENE-7146. - Resolution: Won't Fix

Closing this. If we feel the need we can always add some git hash to the new website.

> "Latest SVN" needs replaced on the website
> --
>
> Key: LUCENE-7146
> URL: https://issues.apache.org/jira/browse/LUCENE-7146
> Project: Lucene - Core
> Issue Type: Bug
> Components: general/website
> Reporter: Chris M. Hostetter
> Priority: Major
>
> Mike asked a little while back on dev@lucene...
> {noformat}
> On the bottom right of Lucene's index.html we have "Latest SVN" but of course it only displays this last svn commit:
> r1726344 LUCENE-6937: moving trunk from SVN to GIT. (lucene) — dweiss
> Does anyone know how to convert this to the "Latest GIT"?
> {noformat}
> This isn't particularly straightforward, so filing an issue to track it
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016707#comment-17016707 ] Dawid Weiss commented on LUCENE-9134: - > I have to invoke javacc. Where do we get this from? You declare a build script dependency and then just import from jar, as usual. buildscript dependencies don't need versions.props entries as they're evaluated early. I don't debug those scripts in intellij - a println along the way does the job for me. I don't know if breakpoints will work with gradle files - if it's an interpreted script and not a precompiled one (which gets translated into a java class) then I doubt you can put a breakpoint in there. It is interpreted after all. > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch > > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
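For illustration, the buildscript-dependency approach Dawid describes might look like the sketch below. The javacc coordinates and version here are assumptions for the example, not taken from the actual build:

{code:groovy}
// Hypothetical sketch: put javacc on the build script classpath so a task can
// import and invoke it directly. Buildscript dependencies are resolved early,
// which is why they need no versions.props entry.
buildscript {
  repositories {
    mavenCentral()
  }
  dependencies {
    classpath "net.java.dev.javacc:javacc:7.0.4" // version is an assumption
  }
}
{code}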
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1176: LUCENE-9143 Add error-prone checks to build, but disabled
dweiss commented on a change in pull request #1176: LUCENE-9143 Add error-prone checks to build, but disabled URL: https://github.com/apache/lucene-solr/pull/1176#discussion_r367298139 ## File path: gradle/defaults-java.gradle ## @@ -1,11 +1,51 @@ // Configure Java project defaults. -allprojects { - plugins.withType(JavaPlugin) { +buildscript { Review comment: I'd prefer if you separated the configuration and application of this plugin into a separate file (validation/errorprone.gradle)? Then each file configures one thing. Sure - there is an overhead in multiple passes over project collection but I think it's worth knowing what each particular file does and it makes them shorter. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
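As a sketch of the suggested split (validation/errorprone.gradle is the file name proposed in the review; the body below is illustrative only):

{code:groovy}
// gradle/validation/errorprone.gradle: one file, one concern. Both the
// application and the configuration of error-prone live here; the root build
// would then include it via: apply from: file('gradle/validation/errorprone.gradle')
allprojects {
  plugins.withType(JavaPlugin) {
    // error-prone plugin application and its options would go here, nothing else
  }
}
{code}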
[GitHub] [lucene-solr] irvingzhang commented on a change in pull request #1169: LUCENE-9004: A minor feature and patch -- support deleting vector values and fix segments merging
irvingzhang commented on a change in pull request #1169: LUCENE-9004: A minor feature and patch -- support deleting vector values and fix segments merging URL: https://github.com/apache/lucene-solr/pull/1169#discussion_r367305642

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90KnnGraphWriter.java
## @@ -216,8 +216,11 @@ private void mergeKnnGraph(FieldInfo mergeFieldInfo, final MergeState mergeState

{code:java}
   int docid;
   while ((docid = sub.nextDoc()) != NO_MORE_DOCS) {
     int mappedDocId = docMap.get(docid);
+    /// deleted document (not alive)
+    if (mappedDocId < 0) {
{code}

Review comment: Thanks @mocobeta , I have corrected the condition for deleted docIds.
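For context, MergeState.DocMap maps a document's id in the source segment to its id in the merged segment and returns -1 for documents that were deleted, so the merge loop must skip negative ids. A minimal sketch of that pattern (not the PR's exact code):

{code:java}
// Sketch: iterating one segment's documents during a merge, skipping deletions.
// docMap.get(...) returns -1 for documents deleted in the source segment.
int docid;
while ((docid = sub.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
  int mappedDocId = docMap.get(docid);
  if (mappedDocId < 0) {
    continue; // deleted (not alive): do not carry it into the merged segment
  }
  // ... copy this document's graph/vector data under mappedDocId ...
}
{code}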
[jira] [Commented] (SOLR-14128) SystemCollectionCompatTest times out waiting for Overseer to do compatibility checks
[ https://issues.apache.org/jira/browse/SOLR-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016760#comment-17016760 ] ASF subversion and git services commented on SOLR-14128: Commit 543505470c26f1ebb3ecd5ca57c411c03941a6a1 in lucene-solr's branch refs/heads/master from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5435054 ] SOLR-14128: Tentative fix: put replicas on other nodes than overseer, wait for all replicas to complete the reload. > SystemCollectionCompatTest times out waiting for Overseer to do compatibility > checks > > > Key: SOLR-14128 > URL: https://issues.apache.org/jira/browse/SOLR-14128 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Andrzej Bialecki >Priority: Major > Attachments: fail.txt, nodeset.patch, pass.txt, > thetaphi_Lucene-Solr-master-Linux_25161.log.txt > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9130) Failed to match when create PhraseQuery with terms analyzed from long query text
[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016766#comment-17016766 ] Chen Zhixiang commented on LUCENE-9130: --- Try dumping the terms' posting info:

{code:java}
private void debugOutputTermsInfo2(IndexReader indexReader, int doc, String fieldName) throws IOException {
  Terms terms = MultiTerms.getTerms(indexReader, fieldName);
  TermsEnum termIter = terms.iterator();
  while (termIter.next() != null) {
    PostingsEnum postingsEnum = termIter.postings(null, PostingsEnum.ALL);
    while (postingsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
      int freq = postingsEnum.freq();
      System.out.printf("term: %s, freq: %d,", termIter.term().utf8ToString(), freq);
      while (freq > 0) {
        System.out.printf(" nextPosition: %d,", postingsEnum.nextPosition());
        System.out.printf(" startOffset: %d, endOffset: %d", postingsEnum.startOffset(), postingsEnum.endOffset());
        freq--;
      }
      System.out.println();
    }
  }
}
{code}

Output:

{noformat}
term: 1, freq: 1, nextPosition: 7, startOffset: -1, endOffset: -1
term: 2179, freq: 1, nextPosition: 0, startOffset: -1, endOffset: -1
term: 2184, freq: 1, nextPosition: 2, startOffset: -1, endOffset: -1
term: lg, freq: 1, nextPosition: 6, startOffset: -1, endOffset: -1
term: 入, freq: 1, nextPosition: 4, startOffset: -1, endOffset: -1
{noformat}

The terms' position info is right (filtered terms take a position number), but there are no offsets (invalid -1). Is offset info needed in PhraseQuery?

> Failed to match when create PhraseQuery with terms analyzed from long query text
> --
>
> Key: LUCENE-9130
> URL: https://issues.apache.org/jira/browse/LUCENE-9130
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 8.4
> Reporter: Chen Zhixiang
> Priority: Major
> Attachments: LongTextFieldSearchTest.java
>
> When I use a long text (which is equal to the doc's StringField at indexing time) to build a PhraseQuery, I cannot match the document. But a BooleanQuery with MUST/AND mode succeeds.
>
> The long query text is an address string:
> "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)"
> A test case is attached.
> logs:
>
> 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, lg, 长
> 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, 开, 到, 底下, lg, 2
> 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 +address:到 +address:底下 +address:lg +address:2)
> 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - results.totalHits.value=1
> 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, lg, 长
> 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, 开, 到, 底下, lg, 2
> 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2
> 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - results.totalHits.value=0
> 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase
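To the closing question above: offsets are not needed. PhraseQuery matches on term positions only; start/end offsets are consumed by highlighting, never by phrase matching, so the -1 offsets are not the problem. Given the posting dump, a likelier cause of the miss is that the query places all terms at consecutive positions while the indexed positions contain gaps where tokens were filtered (e.g. positions 1, 3, and 5 are skipped above), so a slop of 2 is too small. A minimal sketch, with hypothetical field and terms, of encoding such gaps explicitly via PhraseQuery.Builder:

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

class PhrasePositionsSketch {
  static PhraseQuery build() {
    // PhraseQuery compares positions only; offsets never enter the match.
    PhraseQuery.Builder builder = new PhraseQuery.Builder();
    builder.add(new Term("address", "2179"), 0); // explicit positions can encode
    builder.add(new Term("address", "2184"), 2); // the gap left by a filtered token
    builder.setSlop(0); // slop is then only needed for genuine reordering
    return builder.build();
  }
}
{code}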
[jira] [Commented] (LUCENE-9068) Build FuzzyQuery automata up-front
[ https://issues.apache.org/jira/browse/LUCENE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016768#comment-17016768 ] ASF subversion and git services commented on LUCENE-9068: - Commit 7ea7ed72aca556f957a5de55911c852124db8715 in lucene-solr's branch refs/heads/master from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7ea7ed7 ] LUCENE-9068: Solr query handling code catches FuzzyTermsException

> Build FuzzyQuery automata up-front
> --
>
> Key: LUCENE-9068
> URL: https://issues.apache.org/jira/browse/LUCENE-9068
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 8.5
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> FuzzyQuery builds a set of Levenshtein automata (one for each possible edit distance) at rewrite time, and passes them between different TermsEnum invocations using an attribute source. This seems a bit needlessly complicated, and also means that things like visiting a query end up building the automata again. We should instead build the automata at query construction time, which is how AutomatonQuery does it.
[jira] [Commented] (LUCENE-9068) Build FuzzyQuery automata up-front
[ https://issues.apache.org/jira/browse/LUCENE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016767#comment-17016767 ] ASF subversion and git services commented on LUCENE-9068: - Commit 89cfb906b6c6d08880ddf277e5792b04cf426a5c in lucene-solr's branch refs/heads/branch_8x from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=89cfb90 ] LUCENE-9068: Solr query handling code catches FuzzyTermsException
[jira] [Commented] (LUCENE-9068) Build FuzzyQuery automata up-front
[ https://issues.apache.org/jira/browse/LUCENE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016772#comment-17016772 ] Alan Woodward commented on LUCENE-9068: --- Should be fixed now - apologies, the failing test is marked as Slow so it was skipped when I ran tests locally.
[jira] [Resolved] (LUCENE-9068) Build FuzzyQuery automata up-front
[ https://issues.apache.org/jira/browse/LUCENE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward resolved LUCENE-9068. --- Resolution: Fixed
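For readers following along, the change amounts to constructing the per-edit-distance automata once, when the query is built, rather than at rewrite time. A rough sketch of that construction using Lucene's public automaton classes (illustrative, not the committed code):

{code:java}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.CompiledAutomaton;
import org.apache.lucene.util.automaton.LevenshteinAutomata;

class FuzzyAutomataSketch {
  // Build one automaton per edit distance 0..maxEdits (Lucene caps maxEdits at 2),
  // so rewrite and query visiting can reuse them instead of rebuilding each time.
  static CompiledAutomaton[] buildUpFront(String term, int maxEdits, boolean transpositions) {
    LevenshteinAutomata builder = new LevenshteinAutomata(term, transpositions);
    CompiledAutomaton[] compiled = new CompiledAutomaton[maxEdits + 1];
    for (int n = 0; n <= maxEdits; n++) {
      Automaton a = builder.toAutomaton(n);
      compiled[n] = new CompiledAutomaton(a, true, true); // finite=true, simplify=true
    }
    return compiled;
  }
}
{code}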
[jira] [Comment Edited] (SOLR-14128) SystemCollectionCompatTest times out waiting for Overseer to do compatibility checks
[ https://issues.apache.org/jira/browse/SOLR-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016795#comment-17016795 ] Andrzej Bialecki edited comment on SOLR-14128 at 1/16/20 11:15 AM: ---

Beasting with this fix on a slow machine produced a different error, which occurred when trying to update the schema; this may be a variant of SOLR-13368. It is fully reproducible when running on a Linux VM (macOS host), occurring after just a couple of beast runs.

{code:java}
[beaster] 2> 10461 INFO (qtp676755392-54) [n:127.0.0.1:34393_solr c:.system s:shard1 r:core_node2 x:.system_shard1_replica_n1 ] o.a.s.c.S.Request [.system_shard1_replica_n1] webapp=/solr path=/schema params={wt=javabin&version=2} status=0 QTime=4
[beaster] 2> 10475 ERROR (qtp676755392-49) [n:127.0.0.1:34393_solr c:.system s:shard1 r:core_node4 x:.system_shard1_replica_n3 ] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error reading input String Can't find resource 'schema.xml' in classpath or '/configs/.system', cwd=/home/parallels/lucene-solr/solr/build/solr-core/test/J0
[beaster] 2>    at org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:94)
[beaster] 2>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
[beaster] 2>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2582)
[beaster] 2>    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
[beaster] 2>    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
[beaster] 2>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
[beaster] 2>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
[beaster] 2>    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
[beaster] 2>    at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:166)
[beaster] 2>    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
[beaster] 2>    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
[beaster] 2>    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
[beaster] 2>    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)
[beaster] 2>    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
[beaster] 2>    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
[beaster] 2>    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
[beaster] 2>    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
[beaster] 2>    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)
[beaster] 2>    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
[beaster] 2>    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
[beaster] 2>    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
[beaster] 2>    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
[beaster] 2>    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
[beaster] 2>    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:717)
[beaster] 2>    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
[beaster] 2>    at org.eclipse.jetty.server.Server.handle(Server.java:500)
[beaster] 2>    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
[beaster] 2>    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
[beaster] 2>    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
[beaster] 2>    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)
[beaster] 2>    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
[beaster] 2>    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
[beaster] 2>    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
[beaster] 2>    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
[beaster] 2>    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
[beaster] 2>    at java.base/java.lang.Thread.run(Thread.java:834)
[beaster] 2> Caused by: org.apache.solr.core.SolrResourceNotFoundException: Can't find resource 'schema.xml' in classpath or '/configs/.system', cwd=/home/pa
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description:

Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.

With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface [https://github.com/facebookresearch/faiss/issues/105].

The algorithms for vector retrieval can be roughly classified into four categories:
# Tree-based algorithms, such as KD-tree;
# Hashing methods, such as LSH (Locality-Sensitive Hashing);
# Product quantization algorithms, such as IVFFlat;
# Graph-based algorithms, such as HNSW, SSG, and NSG.

IVFFlat and HNSW are the most popular of these algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat requires much less memory and disk space than HNSW [indexing 1M vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors], and IVFFlat supports both online and offline training. I'm now trying to introduce IVFFlat into Lucene core in my personal branch [https://github.com/irvingzhang/lucene-solr/tree/jira/LUCENE-9136]; it is still very early.

was:

Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.

With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface [https://github.com/facebookresearch/faiss/issues/105].

The algorithms for vector retrieval can be roughly classified into four categories:
# Tree-based algorithms, such as KD-tree;
# Hashing methods, such as LSH (Locality-Sensitive Hashing);
# Product quantization algorithms, such as IVFFlat;
# Graph-based algorithms, such as HNSW, SSG, and NSG.

IVFFlat and HNSW are the most popular of these algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat requires much less memory and disk space than HNSW [indexing 1M vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors].
IVFFlat also supports both online and offline training. I'm now trying to introduce IVFFlat into Lucene core in my personal branch [https://github.com/irvingzhang/lucene-solr/tree/jira/LUCENE-9136].

> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Xin-Chun Zhang
> Priority: Major
>
> Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.
> With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all
[GitHub] [lucene-solr] Sachpat opened a new pull request #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7
Sachpat opened a new pull request #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7 URL: https://github.com/apache/lucene-solr/pull/1177 # Description This is the fix backported from 8.3 to 7.7 as implemented at https://github.com/apache/lucene-solr/commit/2a1d5eea42d2bb372245480dd2961baf6fa06469 as per the discussion at https://issues.apache.org/jira/browse/SOLR-13779?focusedCommentId=17016124&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17016124 # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [x] I have run `ant precommit` and the appropriate test suite. - [x] I have added tests for my changes. - [x] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] Sachpat commented on issue #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7
Sachpat commented on issue #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7 URL: https://github.com/apache/lucene-solr/pull/1177#issuecomment-575149368 @dweiss Please take a look at this PR.
[jira] [Commented] (SOLR-13779) Use the safe fork of simple-xml for clustering contrib
[ https://issues.apache.org/jira/browse/SOLR-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016918#comment-17016918 ] Sachin Pattan commented on SOLR-13779: -- [~dweiss] I created the PR which is available at [https://github.com/apache/lucene-solr/pull/1177] . Also, I had created a PR for SOLR-13971 in 7.7x https://issues.apache.org/jira/browse/SOLR-13971?focusedCommentId=17014143&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17014143 . So maybe it makes sense to create a release for 7.7x which includes both the fixes. > Use the safe fork of simple-xml for clustering contrib > -- > > Key: SOLR-13779 > URL: https://issues.apache.org/jira/browse/SOLR-13779 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 8.3 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gerlowskija merged pull request #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes
gerlowskija merged pull request #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes URL: https://github.com/apache/lucene-solr/pull/1163
[jira] [Commented] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016921#comment-17016921 ] ASF subversion and git services commented on SOLR-14186: Commit 424ace6f5d729a01ed0a150fb126c9ca204e5b66 in lucene-solr's branch refs/heads/master from Jason Gerlowski [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=424ace6 ] SOLR-14186: Enforce CRLF in Windows files with .gitattributes (#1163)

> Ensure Windows files retain CRLF endings
>
> Key: SOLR-14186
> URL: https://issues.apache.org/jira/browse/SOLR-14186
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: scripts and tools
> Affects Versions: master (9.0), 8.4
> Reporter: Jason Gerlowski
> Priority: Minor
> Time Spent: 2h
> Remaining Estimate: 0h
>
> We've had several recent instances where our Windows files (solr.cmd, solr.in.cmd) end up getting their Windows-specific line endings stripped out. This causes chunks of those scripts to fail when run on Windows.
> e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and the problem recurred within a week of being fixed.
> Generally, contributors/committers can prevent this by setting their {{core.autocrlf}} git setting to {{input}}. But we should also put repository-wide settings in place exempting certain files from line-ending conversion entirely.
> This issue proposes adding a .gitattributes setting to special-case OS-specific files (bash scripts, Windows batch files, etc.). This will prevent solr.cmd's line endings from being changed by committers who forget to configure the setting on a new machine, etc.
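For illustration, the kind of .gitattributes entries the issue describes could look like this (the exact patterns are an assumption for the example, not the committed file):

{code}
# Windows scripts keep CRLF endings regardless of a committer's core.autocrlf
*.cmd text eol=crlf
*.bat text eol=crlf
# shell scripts keep LF
*.sh text eol=lf
{code}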
[jira] [Commented] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016941#comment-17016941 ] Jason Gerlowski commented on LUCENE-9077: - Congrats and thanks for all your hard work in getting the gradle build to master [~dweiss]! One question: When I run gradle on master and then later switch to a branch that doesn't have gradle (e.g. branch_8x), git sees gradle's build artifacts and files as "Untracked files": {code} ➜ lucene-solr git:(branch_8x) ✗ git status On branch branch_8x Your branch is up to date with 'origin/branch_8x'. Untracked files: (use "git add ..." to include in what will be committed) .gradle/ buildSrc/ gradle.properties {code} Is it reasonable to add those files to .gitignore on branch_8x? I'm willing to file the sub-task and do it myself, just wanted to make sure there's not a reason you've avoided it so far. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept with sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) 
(LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equivalent of 'documentation-lint" to precommit. > * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. > * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. > * if you diff solr packaged distribution against ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * [EOE] identify and port various "regenerate" tasks from ant builds > (javacc, precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integration layers that should be added (I use IntelliJ and it >
[jira] [Created] (LUCENE-9144) Error message on 1D BKDWriter is wrong when adding too many points
Ignacio Vera created LUCENE-9144: Summary: Error message on 1D BKDWriter is wrong when adding too many points Key: LUCENE-9144 URL: https://issues.apache.org/jira/browse/LUCENE-9144 Project: Lucene - Core Issue Type: Bug Reporter: Ignacio Vera The error message for the 1D BKD writer when adding too many points is wrong because: 1) It uses pointCount (which is always 0 at that point) instead of valueCount 2) It concatenates the numbers as strings instead of adding them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
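The second point is the classic Java string-concatenation pitfall. A minimal, self-contained sketch of the bug pattern; the variable names and values below are illustrative, not the exact OneDimensionBKDWriter source:
{code}
// Hypothetical counters standing in for the BKD writer's state:
long pointCount = 0;   // always 0 at the point where the message is built
long valueCount = 7;
long leafCount = 5;

// Bug: after a String, '+' concatenates, so this yields "we just hit 75 values",
// and it also reports the wrong counter (pointCount is still 0 here):
String wrong = "we just hit " + valueCount + leafCount + " values";

// Fix: report the right counters and parenthesize so '+' means addition:
String right = "we just hit " + (valueCount + leafCount) + " values";
{code}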
[jira] [Comment Edited] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016941#comment-17016941 ] Jason Gerlowski edited comment on LUCENE-9077 at 1/16/20 1:50 PM: -- Congrats and thanks for all your hard work in getting the gradle build to master [~dweiss]! One question: When I run gradle on master and then later switch to a branch that doesn't have gradle (e.g. branch_8x), git sees gradle's build artifacts and files as "Untracked files": {code} ➜ lucene-solr git:(branch_8x) ✗ git status On branch branch_8x Your branch is up to date with 'origin/branch_8x'. Untracked files: (use "git add ..." to include in what will be committed) .gradle/ buildSrc/ gradle.properties {code} Is it reasonable to add those files to .gitignore on branch_8x? I'm willing to file the sub-task and do it myself, just wanted to make sure there's not a reason you've avoided it so far. *EDIT* Hmm, it looks like {{ant precommit}} on branch_8x fails because of these files. Maybe it's best not to hide them since they can cause other issues. was (Author: gerlowskija): Congrats and thanks for all your hard work in getting the gradle build to master [~dweiss]! One question: When I run gradle on master and then later switch to a branch that doesn't have gradle (e.g. branch_8x), git sees gradle's build artifacts and files as "Untracked files": {code} ➜ lucene-solr git:(branch_8x) ✗ git status On branch branch_8x Your branch is up to date with 'origin/branch_8x'. Untracked files: (use "git add ..." to include in what will be committed) .gradle/ buildSrc/ gradle.properties {code} Is it reasonable to add those files to .gitignore on branch_8x? I'm willing to file the sub-task and do it myself, just wanted to make sure there's not a reason you've avoided it so far. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept with sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). 
> * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) (LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equival
[GitHub] [lucene-solr] iverase opened a new pull request #1178: LUCENE-9144: Fix error message on OneDimensionBKDWriter
iverase opened a new pull request #1178: LUCENE-9144: Fix error message on OneDimensionBKDWriter URL: https://github.com/apache/lucene-solr/pull/1178 See https://issues.apache.org/jira/browse/LUCENE-9144 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9144) Error message on OneDimensionBKDWriter is wrong when adding too many points
[ https://issues.apache.org/jira/browse/LUCENE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-9144: - Summary: Error message on OneDimensionBKDWriter is wrong when adding too many points (was: Error message on 1D BKDWriter is wrong when adding too many points) > Error message on OneDimensionBKDWriter is wrong when adding too many points > --- > > Key: LUCENE-9144 > URL: https://issues.apache.org/jira/browse/LUCENE-9144 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > The error message for the 1D BKD writer when adding too many points is wrong > because: > 1) It uses pointCount (which is always 0 at that point) instead of valueCount > 2) It concatenates the numbers as strings instead of adding them. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016966#comment-17016966 ] ASF subversion and git services commented on SOLR-14186: Commit 8c2e800cae6f8f53b8189b26cc443aa070025a2c in lucene-solr's branch refs/heads/branch_8x from Jason Gerlowski [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8c2e800 ] SOLR-14186: Introduce gitattributes to manage EOL > Ensure Windows files retain CRLF endings > > > Key: SOLR-14186 > URL: https://issues.apache.org/jira/browse/SOLR-14186 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0), 8.4 >Reporter: Jason Gerlowski >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > We've had several recent instances where our Windows files (solr.cmd, > solr.in.cmd) end up getting their Windows-specific line-endings stripped out. > This causes chunks of those scripts to fail when run on Windows. > e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and > the problem was fixed and recurred again within a week. > Generally, contributors/committers can prevent this by setting their > {{core.autocrlf}} git setting to {{input}}. But we should also put > repository-wide settings in place exempting certain files from line-ending > conversion entirely. > This issue proposes adding a .gitattributes setting to special-case > OS-specific files (bash scripts, Windows batch files, etc.) This will > prevent solr.cmd's line endings from being changed by committers who forget > to configure the setting on a new machine, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016973#comment-17016973 ] ASF subversion and git services commented on SOLR-14130: Commit 99ec7dcd261ccf4d3f95d5af4ef0a18b91bee3f4 in lucene-solr's branch refs/heads/branch_8x from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=99ec7dc ] SOLR-14130: Add parsing instructions for different types of query records > Add postlogs command line tool for indexing Solr logs > - > > Key: SOLR-14130 > URL: https://issues.apache.org/jira/browse/SOLR-14130 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 > PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at > 8.46.51 AM.png > > > This ticket adds a simple command line tool for posting Solr logs to a solr > index. The tool works with the out of the box Solr log format. Still a work > in progress but currently indexes: > * queries > * updates > * commits > * new searchers > * errors - including stack traces > Attached are some sample visualizations using Solr Streaming Expressions and > Math Expressions after the data has been loaded. The visualizations show: > time series, scatter plots, histograms and quantile plots, but really this is > just scratching the surface of the visualizations that can be done with the > Solr logs. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
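To give a feel for how such a tool would be driven: the invocation shape is a base URL for the collection that receives the log records plus a directory of logs to scan. This is a sketch only; the URL and path below are hypothetical placeholders, and the exact arguments of this work-in-progress tool may differ:
{code}
# Hypothetical usage sketch (WIP tool; exact syntax may differ):
# first argument:  base URL of the collection that receives the log records
# second argument: root directory that is scanned for Solr log files
bin/postlogs http://localhost:8983/solr/logs /var/solr/logs
{code}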
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and no plan for supporting Java interface [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories, # Tree-base algorithms, such as KD-tree; # Hashing methods, such as LSH (Local Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-base algorithms, such as HNSW, SSG, NSG; IVFFlat and HNSW are the most popular ones among all the algorithms. Recently, implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat has smaller index size but requires k-means clustering, while HNSW is faster in query but require extra storage for graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. Each of them has its merits and demerits. Since HNSW is now under development, it may be better to provide IVFFlat for an alternative choice. I will soon commit my personal implementations. was: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and no plan for supporting Java interface [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories, # Tree-base algorithms, such as KD-tree; # Hashing methods, such as LSH (Local Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-base algorithms, such as HNSW, SSG, NSG; IVFFlat and HNSW are the most popular ones among all the algorithms. Recently, implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. 
IVFFlat requires much less memory and disks when compared with HNSW [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. And IVFFlat supports both online and offline training. I'm now trying to introduce the IVFFlat to Lucene core in my person branch [[https://github.com/irvingzhang/lucene-solr/tree/jira/LUCENE-9136]], in very early stage. > Introduce IVFFlat for ANN similarity search > --- > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Summary: Introduce IVFFlat to Lucene for ANN similarity search (was: Introduce IVFFlat for ANN similarity search) > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > IVFFlat and HNSW are the most popular ones among all the algorithms. > Recently, implementation of ANN algorithms for Lucene, such as HNSW > (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. > IVFFlat has smaller index size but requires k-means clustering, while HNSW is > faster in query but require extra storage for graphs [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Each of them has its merits and demerits. Since HNSW is now under > development, it may be better to provide IVFFlat for an alternative choice. > I will soon commit my personal implementations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
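To make the size/speed trade-off above concrete: IVFFlat k-means-clusters the vectors at build time, stores each cluster's vectors uncompressed ("flat"), and at query time scans only the nProbe clusters whose centroids sit closest to the query. A minimal, self-contained sketch of that query flow in plain Java, assuming Euclidean distance; all names here are illustrative, not the API of the linked branch:
{code}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class IvfFlatSketch {
  // Squared Euclidean distance; monotonic with true distance, so fine for ranking.
  static float squaredDistance(float[] a, float[] b) {
    float d = 0;
    for (int i = 0; i < a.length; i++) {
      float t = a[i] - b[i];
      d += t * t;
    }
    return d;
  }

  /**
   * centroids[c] is a k-means centroid; clusters.get(c) holds the raw ("flat")
   * vectors assigned to it. Only the nProbe closest clusters are scanned.
   */
  static float[] nearest(float[] query, float[][] centroids,
                         List<List<float[]>> clusters, int nProbe) {
    Integer[] order = new Integer[centroids.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    // Rank clusters by centroid distance to the query.
    Arrays.sort(order, Comparator.comparingDouble(c -> squaredDistance(query, centroids[c])));

    float best = Float.MAX_VALUE;
    float[] bestVec = null;
    for (int p = 0; p < Math.min(nProbe, order.length); p++) {
      for (float[] v : clusters.get(order[p])) {  // exhaustive scan inside the cluster
        float d = squaredDistance(query, v);
        if (d < best) {
          best = d;
          bestVec = v;
        }
      }
    }
    return bestVec;
  }
}
{code}
With nProbe equal to the number of clusters this degenerates to exact brute-force search; smaller values trade recall for speed, which is where IVFFlat saves memory relative to graph-based structures like HNSW.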
[jira] [Commented] (SOLR-14070) Deprecate CloudSolrClient's ZKHost constructor
[ https://issues.apache.org/jira/browse/SOLR-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016995#comment-17016995 ] Zsolt Gyulavari commented on SOLR-14070: ZK could have been firewalled if clients didn't talk to it directly. > Deprecate CloudSolrClient's ZKHost constructor > -- > > Key: SOLR-14070 > URL: https://issues.apache.org/jira/browse/SOLR-14070 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya >Priority: Major > > CloudSolrClient can be used by pointing it to a ZK cluster. This is > inherently insecure. > CSC can already be used in all the same ways by pointing it to a Solr cluster. > Proposing to add a deprecation notice to the following constructor to the > CloudSolrClient#Builder: > {code} > public Builder(List<String> zkHosts, Optional<String> zkChroot) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14070) Deprecate CloudSolrClient's ZKHost constructor
[ https://issues.apache.org/jira/browse/SOLR-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017006#comment-17017006 ] Jan Høydahl commented on SOLR-14070: bq. ZK could have been firewalled if clients didn't talk to it directly. CloudSolrClient already supports bootstrapping from Solr URLs, which would be the recommended way. It is also easier to implement in Solr clients for other languages such as PHP, C# etc without worrying about ZK. So you are free to firewall your ZK today if you wish :) I'm +0 on the idea. We could deprecate and then not remove it until v10.0, just to force users into reconsidering their choice. We could also add more JavaDocs to the deprecated constructor, reminding people to limit access to their ZK as much as possible. > Deprecate CloudSolrClient's ZKHost constructor > -- > > Key: SOLR-14070 > URL: https://issues.apache.org/jira/browse/SOLR-14070 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya >Priority: Major > > CloudSolrClient can be used by pointing it to a ZK cluster. This is > inherently insecure. > CSC can already be used in all the same ways by pointing it to a Solr cluster. > Proposing to add a deprecation notice to the following constructor to the > CloudSolrClient#Builder: > {code} > public Builder(List<String> zkHosts, Optional<String> zkChroot) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
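For contrast, the two bootstrap routes look like this side by side; a minimal sketch assuming SolrJ 8.x, with placeholder hosts:
{code}
import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

class CloudClientBootstrap {
  static CloudSolrClient viaSolrUrls() {
    // Recommended: bootstrap from Solr URLs (placeholder hosts);
    // ZK can stay behind a firewall.
    return new CloudSolrClient.Builder(
            Arrays.asList("http://solr1:8983/solr", "http://solr2:8983/solr"))
        .build();
  }

  static CloudSolrClient viaZkHosts() {
    // The constructor proposed for deprecation: the client talks to ZK directly.
    return new CloudSolrClient.Builder(
            Arrays.asList("zk1:2181", "zk2:2181"), Optional.empty())
        .build();
  }
}
{code}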
[GitHub] [lucene-solr] iverase merged pull request #1178: LUCENE-9144: Fix error message on OneDimensionBKDWriter
iverase merged pull request #1178: LUCENE-9144: Fix error message on OneDimensionBKDWriter URL: https://github.com/apache/lucene-solr/pull/1178 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9144) Error message on OneDimensionBKDWriter is wrong when adding too many points
[ https://issues.apache.org/jira/browse/LUCENE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017018#comment-17017018 ] ASF subversion and git services commented on LUCENE-9144: - Commit eb13d5bc8b3b0497ce2aca3d99e37884dc54599a in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eb13d5b ] LUCENE-9144: Fix error message on OneDimensionBKDWriter when too many points are added to the writer. (#1178) > Error message on OneDimensionBKDWriter is wrong when adding too many points > --- > > Key: LUCENE-9144 > URL: https://issues.apache.org/jira/browse/LUCENE-9144 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > The error message for the 1D BKD writer when adding too many points is wrong > because: > 1) It uses pointCount (which is always 0 at that point) instead of valueCount > 2) It concatenates the numbers as strings instead of adding them. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9144) Error message on OneDimensionBKDWriter is wrong when adding too many points
[ https://issues.apache.org/jira/browse/LUCENE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017032#comment-17017032 ] ASF subversion and git services commented on LUCENE-9144: - Commit ced06d7086a870a1dbab6af841132daf1f4c4c68 in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ced06d7 ] LUCENE-9144: Fix error message on OneDimensionBKDWriter when too many points are added to the writer. (#1178) > Error message on OneDimensionBKDWriter is wrong when adding too many points > --- > > Key: LUCENE-9144 > URL: https://issues.apache.org/jira/browse/LUCENE-9144 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > The error message for the 1D BKD writer when adding too many points is wrong > because: > 1) It uses pointCount (which is always 0 at that point) instead of valueCount > 2) It concatenates the numbers as strings instead of adding them. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9144) Error message on OneDimensionBKDWriter is wrong when adding too many points
[ https://issues.apache.org/jira/browse/LUCENE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9144. -- Fix Version/s: 8.5 Assignee: Ignacio Vera Resolution: Fixed > Error message on OneDimensionBKDWriter is wrong when adding too many points > --- > > Key: LUCENE-9144 > URL: https://issues.apache.org/jira/browse/LUCENE-9144 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: 8.5 > > Time Spent: 20m > Remaining Estimate: 0h > > The error message for the 1D BKD writer when adding too many points is wrong > because: > 1) It uses pointCount (which is always 0 at that point) instead of valueCount > 2) It concatenates the numbers as strings instead of adding them. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017089#comment-17017089 ] Bruno Roustant commented on LUCENE-9125: Here it is:
{code}
Task                          QPS trunk  StdDev    QPS patch  StdDev    Pct diff
HighIntervalsOrdered             463.57  (13.2%)      443.74  (19.6%)     -4.3% ( -32% - 32%)
Respell                          382.45  (14.7%)      374.88  (21.3%)     -2.0% ( -33% - 39%)
OrHighLow                       1746.37  (6.8%)      1737.44  (7.0%)      -0.5% ( -13% - 14%)
AndHighLow                      4208.34  (6.1%)      4186.85  (5.8%)      -0.5% ( -11% - 12%)
HighTerm                        5697.99  (7.5%)      5673.66  (5.1%)      -0.4% ( -12% - 13%)
BrowseMonthTaxoFacets           4679.40  (3.7%)      4664.60  (2.6%)      -0.3% ( -6% - 6%)
Prefix3                          442.09  (17.3%)      441.77  (16.6%)     -0.1% ( -28% - 40%)
BrowseDateTaxoFacets            4104.50  (3.4%)      4102.05  (2.8%)      -0.1% ( -6% - 6%)
OrHighMed                        681.54  (11.8%)      681.70  (10.6%)      0.0% ( -20% - 25%)
AndHighHigh                      978.85  (8.3%)       979.47  (9.9%)       0.1% ( -16% - 19%)
BrowseDayOfYearTaxoFacets       3615.56  (2.8%)      3620.94  (2.4%)       0.1% ( -4% - 5%)
MedTerm                         5964.33  (5.7%)      5980.59  (5.8%)       0.3% ( -10% - 12%)
LowTerm                         6555.56  (4.8%)      6576.49  (5.3%)       0.3% ( -9% - 10%)
Fuzzy2                            73.24  (16.4%)       73.55  (16.1%)      0.4% ( -27% - 39%)
Fuzzy1                           887.86  (5.3%)       892.14  (2.7%)       0.5% ( -7% - 8%)
HighPhrase                       901.57  (5.7%)       905.94  (6.6%)       0.5% ( -11% - 13%)
OrHighHigh                       741.70  (11.5%)      745.44  (8.4%)       0.5% ( -17% - 23%)
BrowseMonthSSDVFacets           3462.54  (4.2%)      3480.43  (3.0%)       0.5% ( -6% - 8%)
HighSloppyPhrase                 617.51  (6.9%)       620.74  (7.8%)       0.5% ( -13% - 16%)
PKLookup                         275.55  (5.2%)       277.01  (5.0%)       0.5% ( -9% - 11%)
MedSloppyPhrase                 1843.18  (4.7%)      1853.23  (3.8%)       0.5% ( -7% - 9%)
LowSloppyPhrase                 2085.07  (4.3%)      2098.25  (3.9%)       0.6% ( -7% - 9%)
BrowseDayOfYearSSDVFacets       2985.60  (2.5%)      3009.10  (2.6%)       0.8% ( -4% - 6%)
AndHighMed                      1712.96  (5.8%)      1729.47  (4.5%)       1.0% ( -8% - 12%)
LowSpanNear                     2006.25  (6.2%)      2029.83  (6.0%)       1.2% ( -10% - 14%)
MedSpanNear                      814.10  (12.3%)      823.97  (10.1%)      1.2% ( -18% - 26%)
HighSpanNear                     593.47  (10.3%)      600.77  (10.6%)      1.2% ( -17% - 24%)
HighTermDayOfYearSort           1035.41  (7.8%)      1050.76  (6.5%)       1.5% ( -11% - 17%)
Wildcard                         772.44  (10.7%)      791.42  (12.7%)      2.5% ( -18% - 28%)
MedPhrase                        806.70  (8.7%)       827.27  (8.1%)       2.5% ( -13% - 21%)
LowPhrase                        805.91  (7.9%)       831.26  (5.3%)       3.1% ( -9% - 17%)
IntNRQ                          1898.15  (8.1%)      1967.24  (9.8%)       3.6% ( -13% - 23%)
HighTermMonthSort               3150.77  (12.1%)     3300.42  (13.5%)      4.7% ( -18% - 34%)
{code}
> Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
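The optimization itself is easy to picture: a state's outgoing transitions are disjoint [min, max] label intervals sorted by min, so the label lookup can binary-search instead of scanning linearly. A minimal sketch of that idea, assuming parallel arrays rather than Lucene's actual packed transition encoding; the names are illustrative, not the patch's code:
{code}
// Hedged sketch, not the actual Lucene source. For one state, transitions are
// disjoint label ranges sorted by min: (min[i], max[i]) leads to state dest[i].
static int step(int[] min, int[] max, int[] dest, int label) {
  int lo = 0, hi = min.length - 1;
  while (lo <= hi) {
    int mid = (lo + hi) >>> 1;
    if (max[mid] < label) {
      lo = mid + 1;            // range lies entirely below the label
    } else if (min[mid] > label) {
      hi = mid - 1;            // range lies entirely above the label
    } else {
      return dest[mid];        // min[mid] <= label <= max[mid]: follow it
    }
  }
  return -1;                   // no transition accepts this label
}
{code}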
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017284#comment-17017284 ] ASF subversion and git services commented on SOLR-14130: Commit 35d8e3de6d5931bfd6cba3221cfd0dca7f97c1a1 in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=35d8e3d ] SOLR-14130: Continue to improve log parsing logic > Add postlogs command line tool for indexing Solr logs > - > > Key: SOLR-14130 > URL: https://issues.apache.org/jira/browse/SOLR-14130 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 > PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at > 8.46.51 AM.png > > > This ticket adds a simple command line tool for posting Solr logs to a solr > index. The tool works with the out of the box Solr log format. Still a work > in progress but currently indexes: > * queries > * updates > * commits > * new searchers > * errors - including stack traces > Attached are some sample visualizations using Solr Streaming Expressions and > Math Expressions after the data has been loaded. The visualizations show: > time series, scatter plots, histograms and quantile plots, but really this is > just scratching the surface of the visualizations that can be done with the > Solr logs. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9145) Address warnings found by static analysis
Mike Drob created LUCENE-9145: - Summary: Address warnings found by static analysis Key: LUCENE-9145 URL: https://issues.apache.org/jira/browse/LUCENE-9145 Project: Lucene - Core Issue Type: Sub-task Reporter: Mike Drob Assignee: Mike Drob -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017288#comment-17017288 ] ASF subversion and git services commented on SOLR-14130: Commit f48b5f9324532169ddf41e4cb52b5f628b5bc31b in lucene-solr's branch refs/heads/branch_8x from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f48b5f9 ] SOLR-14130: Continue to improve log parsing logic > Add postlogs command line tool for indexing Solr logs > - > > Key: SOLR-14130 > URL: https://issues.apache.org/jira/browse/SOLR-14130 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 > PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at > 8.46.51 AM.png > > > This ticket adds a simple command line tool for posting Solr logs to a solr > index. The tool works with the out of the box Solr log format. Still a work > in progress but currently indexes: > * queries > * updates > * commits > * new searchers > * errors - including stack traces > Attached are some sample visualizations using Solr Streaming Expressions and > Math Expressions after the data has been loaded. The visualizations show: > time series, scatter plots, histograms and quantile plots, but really this is > just scratching the surface of the visualizations that can be done with the > Solr logs. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9145) Address warnings found by static analysis
[ https://issues.apache.org/jira/browse/LUCENE-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017292#comment-17017292 ] Kevin Risden commented on LUCENE-9145: -- [~mdrob] I'm all for this and probably easier with just Gradle. I think there are a few older jiras related to this as well. Might help to link them? Pmd, javac warnings, etc. I can dig them up too if it helps. > Address warnings found by static analysis > - > > Key: LUCENE-9145 > URL: https://issues.apache.org/jira/browse/LUCENE-9145 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017300#comment-17017300 ] Mike Drob commented on LUCENE-9077: --- Yes, we should add all of those to .gitignore and then figure out how to make ant precommit stop complaining as well. They're just like any other build files. Specifically, you should be able to add just {{.gradle}} and {{gradle.properties}}, no need to add buildSrc since the only thing inside of it is another .gradle dir. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept with sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) (LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equivalent of 'documentation-lint" to precommit. > * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. 
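Concretely, the branch_8x addition being suggested would amount to a two-line .gitignore sketch along these lines (illustrative, not a committed patch):
{code}
# Gradle leftovers from switching between master and branch_8x (sketch only):
.gradle
gradle.properties
{code}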
> * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. > * if you diff solr packaged distribution against ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * [EOE] identify and port various "regenerate" tasks from ant builds > (javacc, precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integration layers that should be added (I use IntelliJ and it > imports the project out of the box, without the need for any special tuning). > * Add Solr packaging for docs/* (see TODO in packaging/build.gradle; > currently XSLT...) > * I didn't bother adding Solr dist/test-framework to packaging (who'd use it > from a binary distribution? > > *{color:#ff}Note:{color}* this builds on the work done by Mark Miller and > Cao Mạnh Đạt but also applies lessons learned from t
[jira] [Commented] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017302#comment-17017302 ] Mike Drob commented on LUCENE-9077: --- Another clean up question is when we would feel comfortable removing the maven shadow build from master. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept with sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) (LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equivalent of 'documentation-lint" to precommit. > * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. > * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. 
> * if you diff solr packaged distribution against ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * [EOE] identify and port various "regenerate" tasks from ant builds > (javacc, precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integration layers that should be added (I use IntelliJ and it > imports the project out of the box, without the need for any special tuning). > * Add Solr packaging for docs/* (see TODO in packaging/build.gradle; > currently XSLT...) > * I didn't bother adding Solr dist/test-framework to packaging (who'd use it > from a binary distribution? > > *{color:#ff}Note:{color}* this builds on the work done by Mark Miller and > Cao Mạnh Đạt but also applies lessons learned from those two efforts: > * *Do not try to do too many things at once*. If we deviate too far from > master, the branch will be hard to merge. > * *Do everything in baby-steps* and add small, independent build fragments > rep
[jira] [Commented] (LUCENE-9143) Add more static analysis and clean up resulting warnings/errors
[ https://issues.apache.org/jira/browse/LUCENE-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017303#comment-17017303 ] Kevin Risden commented on LUCENE-9143: -- [~mdrob] I'm all for this and probably easier with just Gradle. I think there are a few older jiras related to this as well. Might help to link them? Pmd, javac warnings, etc. I can dig them up too if it helps. (previously commented on the subtask by mistake from my phone and now also see the PR :) ) > Add more static analysis and clean up resulting warnings/errors > --- > > Key: LUCENE-9143 > URL: https://issues.apache.org/jira/browse/LUCENE-9143 > Project: Lucene - Core > Issue Type: Bug >Reporter: Mike Drob >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Part of the discussion with Mark Miller was the need for better bug finding - > especially in tricky areas like concurrency. One of the ways we can do this > is with added static analysis and increased tooling. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (LUCENE-9145) Address warnings found by static analysis
[ https://issues.apache.org/jira/browse/LUCENE-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Risden updated LUCENE-9145: - Comment: was deleted (was: [~mdrob] I'm all for this and probably easier with just Gradle. I think there are a few older jiras related to this as well. Might help to link them? Pmd, javac warnings, etc. I can dig them up too if it helps.) > Address warnings found by static analysis > - > > Key: LUCENE-9145 > URL: https://issues.apache.org/jira/browse/LUCENE-9145 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9143) Add more static analysis and clean up resulting warnings/errors
[ https://issues.apache.org/jira/browse/LUCENE-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017313#comment-17017313 ] Erick Erickson commented on LUCENE-9143: I've been meaning to get around to these forever. I haven't done any work on them; I'm mostly here to close them if this JIRA takes care of them. +1 to only working with the Gradle build, and by implication master, especially as sometime in the not-too-distant future we won't be supporting Java 8 any more. > Add more static analysis and clean up resulting warnings/errors > --- > > Key: LUCENE-9143 > URL: https://issues.apache.org/jira/browse/LUCENE-9143 > Project: Lucene - Core > Issue Type: Bug >Reporter: Mike Drob >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Part of the discussion with Mark Miller was the need for better bug finding - > especially in tricky areas like concurrency. One of the ways we can do this > is with added static analysis and increased tooling. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017314#comment-17017314 ] ASF subversion and git services commented on SOLR-14186: Commit 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d in lucene-solr's branch refs/heads/branch_8_4 from Jason Gerlowski [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7d3ac7c ] SOLR-14186: Introduce gitattributes to manage EOL > Ensure Windows files retain CRLF endings > > > Key: SOLR-14186 > URL: https://issues.apache.org/jira/browse/SOLR-14186 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0), 8.4 >Reporter: Jason Gerlowski >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > We've had several recent instances where our Windows files (solr.cmd, > solr.in.cmd) end up getting their Windows-specific line-endings stripped out. > This causes chunks of those scripts to fail when run on Windows. > e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and > the problem was fixed and recurred again within a week. > Generally, contributors/committers can prevent this by setting their > {{core.autocrlf}} git setting to {{input}}. But we should also put > repository-wide settings in place exempting certain files from line-ending > conversion entirely. > This issue proposes adding a .gitattributes setting to special-case > OS-specific files (bash scripts, Windows batch files, etc.) This will > prevent solr.cmd's line endings from being changed by committers who forget > to configure the setting on a new machine, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-14186: -- Assignee: Jason Gerlowski > Ensure Windows files retain CRLF endings > > > Key: SOLR-14186 > URL: https://issues.apache.org/jira/browse/SOLR-14186 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0), 8.4 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > We've had several recent instances where our Windows files (solr.cmd, > solr.in.cmd) end up getting their Windows-specific line-endings stripped out. > This causes chunks of those scripts to fail when run on Windows. > e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and > the problem was fixed and recurred again within a week. > Generally, contributors/committers can prevent this by setting their > {{core.autocrlf}} git setting to {{input}}. But we should also put > repository-wide settings in place exempting certain files from line-ending > conversion entirely. > This issue proposes adding a .gitattributes setting to special-case > OS-specific files (bash scripts, Windows batch files, etc.) This will > prevent solr.cmd's line endings from being changed by committers who forget > to configure the setting on a new machine, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017317#comment-17017317 ] Jason Gerlowski commented on SOLR-14186: I've committed a gitattributes file to all of the branches that might see subsequent releases: master, branch_8x and branch_8_4. That way anything released from these branches should avoid accidentally releasing a broken Windows script, etc. > Ensure Windows files retain CRLF endings > > > Key: SOLR-14186 > URL: https://issues.apache.org/jira/browse/SOLR-14186 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0), 8.4 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > We've had several recent instances where our Windows files (solr.cmd, > solr.in.cmd) end up getting their Windows-specific line-endings stripped out. > This causes chunks of those scripts to fail when run on Windows. > e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and > the problem was fixed and recurred again within a week. > Generally, contributors/committers can prevent this by setting their > {{core.autocrlf}} git setting to {{input}}. But we should also put > repository-wide settings in place exempting certain files from line-ending > conversion entirely. > This issue proposes adding a .gitattributes setting to special-case > OS-specific files (bash scripts, Windows batch files, etc.) This will > prevent solr.cmd's line endings from being changed by committers who forget > to configure the setting on a new machine, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-14186. Fix Version/s: 8.5 master (9.0) 8.4.2 Resolution: Fixed > Ensure Windows files retain CRLF endings > > > Key: SOLR-14186 > URL: https://issues.apache.org/jira/browse/SOLR-14186 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0), 8.4 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Fix For: 8.4.2, master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > We've had several recent instances where our Windows files (solr.cmd, > solr.in.cmd) end up getting their Windows-specific line-endings stripped out. > This causes chunks of those scripts to fail when run on Windows. > e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and > the problem was fixed and recurred again within a week. > Generally, contributors/committers can prevent this by setting their > {{core.autocrlf}} git setting to {{input}}. But we should also put > repository-wide settings in place exempting certain files from line-ending > conversion entirely. > This issue proposes adding a .gitattributes setting to special-case > OS-specific files (bash scripts, Windows batch files, etc.) This will > prevent solr.cmd's line endings from being changed by committers who forget > to configure the setting on a new machine, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] asfgit merged pull request #1173: LUCENE-8369: Remove obsolete spatial module
asfgit merged pull request #1173: LUCENE-8369: Remove obsolete spatial module URL: https://github.com/apache/lucene-solr/pull/1173 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017319#comment-17017319 ] ASF subversion and git services commented on LUCENE-8369: - Commit 78655239c58a1ed72d6e015dd05a0b355c936999 in lucene-solr's branch refs/heads/master from Nicholas Knize [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7865523 ] LUCENE-8369: Remove obsolete spatial module > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > Time Spent: 10m > Remaining Estimate: 0h > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017324#comment-17017324 ] Dawid Weiss commented on LUCENE-9077: - There are a number of unanswered questions, actually. For now, I'd consider the gradle build structure *not* compatible with ant (whether on master or on branch_8x). So I wouldn't recommend doing live branch switches without a full clean. This means: when you switch branches (or build systems), run: {code} git clean -xfd . {code} Ignoring temporary files is one thing but ant happily sucks in jar files for sha checks from build/ folders etc. I myself do the above or just keep separate branches for 8x and master... If you're desperate then feel free to work on this and file a pull request, please - I just don't consider it a priority for now. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept with sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) (LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. 
Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equivalent of 'documentation-lint" to precommit. > * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. > * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. > * if you diff solr packaged distribution against ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * [EOE] identify and port various "regenerate" tasks from ant builds > (javacc, precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integration layers that should be added (I use IntelliJ and it > imports the project out of the box, without the need for any special tuning). > * Add Sol
[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017333#comment-17017333 ] ASF subversion and git services commented on LUCENE-8369: - Commit c0c775799c1c1f69d146336016bcd4c6ffdd2ce8 in lucene-solr's branch refs/heads/branch_8x from Nicholas Knize [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c0c7757 ] LUCENE-8369: Remove obsolete spatial module > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > Time Spent: 20m > Remaining Estimate: 0h > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Knize updated LUCENE-8369: --- Resolution: Resolved Status: Resolved (was: Patch Available) > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > Time Spent: 20m > Remaining Estimate: 0h > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize commented on issue #1173: LUCENE-8369: Remove obsolete spatial module
nknize commented on issue #1173: LUCENE-8369: Remove obsolete spatial module URL: https://github.com/apache/lucene-solr/pull/1173#issuecomment-575264274 Thx @dsmiley! Merged and backported to 8.x This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14192) Race condition between SchemaManager and ZkIndexSchemaReader
Andrzej Bialecki created SOLR-14192: --- Summary: Race condition between SchemaManager and ZkIndexSchemaReader Key: SOLR-14192 URL: https://issues.apache.org/jira/browse/SOLR-14192 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 8.4 Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 8.5 Spin-off from SOLR-14128 and SOLR-13368. In SolrCloud when a SolrCore is created and it uses managed schema then its {{ManagedIndexSchemaFactory}} performs an automatic upgrade of the initial {{schema.xml}} to {{managed-schema}}. This includes removing the original {{schema.xml}} file. SOLR-13368 added some locking to make sure the changed resource name (i.e. {{managed-schema}}) becomes visible only when this process is complete, and that in-flight requests to /admin/schema block until this process is complete, to avoid returning inconsistent data. This locking mechanism uses simple Object monitors. However, if there's more than 1 node in the cluster the subsequent request to retrieve schema may execute on a core that still hasn't reloaded its schema ({{ZkIndexSchemaReader}} uses a ZK watcher, which may take some time to trigger), and the resource name in that stale schema still points to {{schema.xml}}, which by this time no longer exists because it was removed by {{ManagedIndexSchemaFactory}} in the first core. As I see it there are two bugs here: # there's no distributed locking when this upgrade is performed, so it's natural that there are multiple cores racing against each other to perform this upgrade. # the upgrade process removes {{schema.xml}} too early - it triggers all other cores by creating the {{managed-schema}} file, and then other cores reload from the new managed schema - but it should wait until this reload is complete on all cores because only then it's safe to delete the non-managed resource as it's no longer in use by any core. Issue 1. can be solved by adding an ephemeral znode lock so that only one core can perform the upgrade. Issue 2. can be solved by using {{ManagedIndexSchema.waitForSchemaZkVersionAgreement}} after upgrade, and deleting {{schema.xml}} only after it's done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
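A minimal sketch of the ephemeral-znode lock proposed in point 1, using the plain ZooKeeper client API; the lock path, class and method names are illustrative, not the eventual patch:
{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SchemaUpgradeLock {
  /** Returns true if this core won the race and may perform the schema upgrade. */
  public static boolean tryLock(ZooKeeper zk, String lockPath)
      throws KeeperException, InterruptedException {
    try {
      // An ephemeral node vanishes automatically if the session dies mid-upgrade,
      // so a crashed core cannot hold the lock forever.
      zk.create(lockPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true;
    } catch (KeeperException.NodeExistsException e) {
      return false; // another core is already performing the upgrade
    }
  }
}
{code}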
[GitHub] [lucene-solr] dweiss merged pull request #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7
dweiss merged pull request #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7 URL: https://github.com/apache/lucene-solr/pull/1177 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on issue #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7
dweiss commented on issue #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7 URL: https://github.com/apache/lucene-solr/pull/1177#issuecomment-575268931 Thank you! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14070) Deprecate CloudSolrClient's ZKHost constructor
[ https://issues.apache.org/jira/browse/SOLR-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017358#comment-17017358 ] Zsolt Gyulavari commented on SOLR-14070: Totally agree, Jan. I just wanted to give a reason why it would be more secure to deprecate it (and use a FW). Happy to hear that's already possible, wasn't aware of it. > Deprecate CloudSolrClient's ZKHost constructor > -- > > Key: SOLR-14070 > URL: https://issues.apache.org/jira/browse/SOLR-14070 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya >Priority: Major > > CloudSolrClient can be used by pointing it to a ZK cluster. This is > inherently insecure. > CSC can already be used in all the same ways by pointing it to a Solr cluster. > Proposing to add a deprecation notice on the following CloudSolrClient#Builder > constructor: > {code} > public Builder(List<String> zkHosts, Optional<String> zkChroot) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
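For context, the already-possible alternative alluded to above is to build the client from Solr base URLs instead of ZooKeeper hosts. A sketch, assuming the 8.x Builder API and placeholder URLs:
{code:java}
import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

// point the client at Solr nodes rather than exposing the ZK ensemble
List<String> solrUrls = Arrays.asList("http://solr1:8983/solr", "http://solr2:8983/solr");
CloudSolrClient client = new CloudSolrClient.Builder(solrUrls).build();
{code}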
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017369#comment-17017369 ] Dawid Weiss commented on LUCENE-9134: - I'll be with my kids for the weekend, perhaps [~mdrob] would like to jump in on this one and try to give you a headstart, Erick? I'd really start with something super-simple and proceed to attach other tasks from there. Jflex/ javacc seem like good candidates to me and Mike has some experience and gut feeling where I think it should be going (we discussed it on another issue). > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch > > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with something in TestInjection
[ https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017407#comment-17017407 ] ASF subversion and git services commented on SOLR-14184: Commit 5f2d7c4855987670489d68884c787e4cfb377fa9 in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f2d7c4 ] SOLR-14184: Internal 'test' variable DirectUpdateHandler2.commitOnClose has been removed and replaced with TestInjection.skipIndexWriterCommitOnClose > replace DirectUpdateHandler2.commitOnClose with something in TestInjection > -- > > Key: SOLR-14184 > URL: https://issues.apache.org/jira/browse/SOLR-14184 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14184.patch, SOLR-14184.patch > > > {code:java} > public static volatile boolean commitOnClose = true; // TODO: make this a > real config option or move it to TestInjection > {code} > Lots of tests muck with this (to simulate unclean shutdown and force tlog > replay on restart) but there's no garuntee that it is reset properly. > It should be replaced by logic in {{TestInjection}} that is correctly cleaned > up by {{TestInjection.reset()}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
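The shape of the replacement named in the commit, reduced to a sketch; Solr's real TestInjection class carries many more knobs, this only illustrates the reset-to-default pattern:
{code:java}
public class TestInjection {
  // negated sense of the old DirectUpdateHandler2.commitOnClose flag
  public static volatile boolean skipIndexWriterCommitOnClose = false;

  /** Invoked by the test framework after every test, so no test can leak the flag. */
  public static void reset() {
    skipIndexWriterCommitOnClose = false;
  }
}
{code}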
[jira] [Commented] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with something in TestInjection
[ https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017428#comment-17017428 ] ASF subversion and git services commented on SOLR-14184: Commit bb48773cdc279403b8c6af82f4f52b247a1e61c1 in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bb48773 ] SOLR-14184: Internal 'test' variable DirectUpdateHandler2.commitOnClose has been removed and replaced with TestInjection.skipIndexWriterCommitOnClose (cherry picked from commit 5f2d7c4855987670489d68884c787e4cfb377fa9) > replace DirectUpdateHandler2.commitOnClose with something in TestInjection > -- > > Key: SOLR-14184 > URL: https://issues.apache.org/jira/browse/SOLR-14184 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14184.patch, SOLR-14184.patch > > > {code:java} > public static volatile boolean commitOnClose = true; // TODO: make this a > real config option or move it to TestInjection > {code} > Lots of tests muck with this (to simulate unclean shutdown and force tlog > replay on restart) but there's no garuntee that it is reset properly. > It should be replaced by logic in {{TestInjection}} that is correctly cleaned > up by {{TestInjection.reset()}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6336) AnalyzingInfixSuggester needs duplicate handling
[ https://issues.apache.org/jira/browse/LUCENE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017458#comment-17017458 ] Michal Hlavac commented on LUCENE-6336: --- It's not a general solution, but I tried to override the add method to create or update the existing document instead, and it works. Of course, it doesn't work with weightField and payloadField, but in my scenario, which uses only the text field, it works:
{code:java}
public class DedupAnalyzingInfixSuggester extends AnalyzingInfixSuggester {

  public DedupAnalyzingInfixSuggester(Directory dir, Analyzer analyzer) throws IOException {
    super(dir, analyzer);
  }

  // ... Other constructors ...

  @Override
  public void add(BytesRef text, Set<BytesRef> contexts, long weight, BytesRef payload) throws IOException {
    update(text, contexts, weight, payload);
  }
}
{code}
> AnalyzingInfixSuggester needs duplicate handling > > > Key: LUCENE-6336 > URL: https://issues.apache.org/jira/browse/LUCENE-6336 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.10.3, 5.0 >Reporter: Jan Høydahl >Priority: Major > Labels: lookup, suggester > Attachments: LUCENE-6336.patch > > > Spinoff from LUCENE-5833 but else unrelated. > Using {{AnalyzingInfixSuggester}} which is backed by a Lucene index and > stores payload and score together with the suggest text. > I did some testing with Solr, producing the DocumentDictionary from an index > with multiple documents containing the same text, but with random weights > between 0-100. Then I got duplicate identical suggestions sorted by weight: > {code} > { > "suggest":{"languages":{ > "engl":{ > "numFound":101, > "suggestions":[{ > "term":"English", > "weight":100, > "payload":"0"}, > { > "term":"English", > "weight":99, > "payload":"0"}, > { > "term":"English", > "weight":98, > "payload":"0"}, > ---etc all the way down to 0--- > {code} > I also reproduced the same behavior in AnalyzingInfixSuggester directly. So > there is a need for some duplicate removal here, either while building the > local suggest index or during lookup. Only the highest weight suggestion for > a given term should be returned. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
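A hypothetical usage of the subclass above: repeated add() calls for the same text now update in place rather than accumulate, and refresh() is the standard AnalyzingInfixSuggester call that makes changes visible to lookups:
{code:java}
DedupAnalyzingInfixSuggester suggester =
    new DedupAnalyzingInfixSuggester(FSDirectory.open(Paths.get("suggest-index")), analyzer);
suggester.add(new BytesRef("English"), null, 100L, new BytesRef("0"));
suggester.add(new BytesRef("English"), null, 99L, new BytesRef("0")); // replaces the previous entry
suggester.refresh(); // lookups now see a single "English" suggestion
{code}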
[jira] [Commented] (SOLR-14128) SystemCollectionCompatTest times out waiting for Overseer to do compatibility checks
[ https://issues.apache.org/jira/browse/SOLR-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017465#comment-17017465 ] Andrzej Bialecki commented on SOLR-14128: - Follow-up for the schema update bug in SOLR-14192. > SystemCollectionCompatTest times out waiting for Overseer to do compatibility > checks > > > Key: SOLR-14128 > URL: https://issues.apache.org/jira/browse/SOLR-14128 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Andrzej Bialecki >Priority: Major > Attachments: fail.txt, nodeset.patch, pass.txt, > thetaphi_Lucene-Solr-master-Linux_25161.log.txt > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14192) Race condition between SchemaManager and ZkIndexSchemaReader
[ https://issues.apache.org/jira/browse/SOLR-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-14192: Attachment: SOLR-14192.patch > Race condition between SchemaManager and ZkIndexSchemaReader > > > Key: SOLR-14192 > URL: https://issues.apache.org/jira/browse/SOLR-14192 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.4 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-14192.patch > > > Spin-off from SOLR-14128 and SOLR-13368. > In SolrCloud when a SolrCore is created and it uses managed schema then its > {{ManagedIndexSchemaFactory}} performs an automatic upgrade of the initial > {{schema.xml}} to {{managed-schema}}. This includes removing the original > {{schema.xml}} file. > SOLR-13368 added some locking to make sure the changed resource name (i.e. > {{managed-schema}}) becomes visible only when this process is complete, and > that in-flight requests to /admin/schema block until this process is > complete, to avoid returning inconsistent data. This locking mechanism uses > simple Object monitors. > However, if there's more than 1 node in the cluster the subsequent request to > retrieve schema may execute on a core that still hasn't reloaded its schema > ({{ZkIndexSchemaReader}} uses a ZK watcher, which may take some time to > trigger), and the resource name in that stale schema still points to > {{schema.xml}}, which by this time no longer exists because it was removed by > {{ManagedIndexSchemaFactory}} in the first core. > As I see it there are two bugs here: > # there's no distributed locking when this upgrade is performed, so it's > natural that there are multiple cores racing against each other to perform > this upgrade. > # the upgrade process removes {{schema.xml}} too early - it triggers all > other cores by creating the {{managed-schema}} file, and then other cores > reload from the new managed schema - but it should wait until this reload is > complete on all cores because only then it's safe to delete the non-managed > resource as it's no longer in use by any core. > Issue 1. can be solved by adding an ephemeral znode lock so that only one > core can perform the upgrade. Issue 2. can be solved by using > {{ManagedIndexSchema.waitForSchemaZkVersionAgreement}} after upgrade, and > deleting {{schema.xml}} only after it's done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14192) Race condition between SchemaManager and ZkIndexSchemaReader
[ https://issues.apache.org/jira/browse/SOLR-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017466#comment-17017466 ] Andrzej Bialecki commented on SOLR-14192: - This patch seems to fix it for me, at least I wasn't able to reproduce this anymore. Summary of changes: * use an ephemeral ZK lock when upgrading the schema to managed. * be more lenient when retrieving the schema - if local core claims to be still using {{schema.xml}} but it cannot be found in ZK then try to retrieve the backup left over after upgrade, ie. {{schema.xml.bak}}, and if that doesn't exist either then simply use the current in-memory schema. > Race condition between SchemaManager and ZkIndexSchemaReader > > > Key: SOLR-14192 > URL: https://issues.apache.org/jira/browse/SOLR-14192 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.4 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-14192.patch > > > Spin-off from SOLR-14128 and SOLR-13368. > In SolrCloud when a SolrCore is created and it uses managed schema then its > {{ManagedIndexSchemaFactory}} performs an automatic upgrade of the initial > {{schema.xml}} to {{managed-schema}}. This includes removing the original > {{schema.xml}} file. > SOLR-13368 added some locking to make sure the changed resource name (i.e. > {{managed-schema}}) becomes visible only when this process is complete, and > that in-flight requests to /admin/schema block until this process is > complete, to avoid returning inconsistent data. This locking mechanism uses > simple Object monitors. > However, if there's more than 1 node in the cluster the subsequent request to > retrieve schema may execute on a core that still hasn't reloaded its schema > ({{ZkIndexSchemaReader}} uses a ZK watcher, which may take some time to > trigger), and the resource name in that stale schema still points to > {{schema.xml}}, which by this time no longer exists because it was removed by > {{ManagedIndexSchemaFactory}} in the first core. > As I see it there are two bugs here: > # there's no distributed locking when this upgrade is performed, so it's > natural that there are multiple cores racing against each other to perform > this upgrade. > # the upgrade process removes {{schema.xml}} too early - it triggers all > other cores by creating the {{managed-schema}} file, and then other cores > reload from the new managed schema - but it should wait until this reload is > complete on all cores because only then it's safe to delete the non-managed > resource as it's no longer in use by any core. > Issue 1. can be solved by adding an ephemeral znode lock so that only one > core can perform the upgrade. Issue 2. can be solved by using > {{ManagedIndexSchema.waitForSchemaZkVersionAgreement}} after upgrade, and > deleting {{schema.xml}} only after it's done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9146) Switch GitHub PR test from ant precommit to gradle
Mike Drob created LUCENE-9146: - Summary: Switch GitHub PR test from ant precommit to gradle Key: LUCENE-9146 URL: https://issues.apache.org/jira/browse/LUCENE-9146 Project: Lucene - Core Issue Type: Sub-task Reporter: Mike Drob -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017518#comment-17017518 ] David Smiley commented on LUCENE-9125: -- There's an option for lucene-util to format the output for JIRA; I forget what it is off-hand. What data set did you use? (e.g. wikibigall or...?) Looking at the results you posted, the optimization seems fairly invisible. It surely would not have improved HighTermMonthSort as there's no fuzzy stuff there, and so that's 4.7% of "noise". > Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
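The change under discussion is a classic range binary search. A self-contained sketch of the idea, with parallel min/max/dest arrays standing in for Lucene's packed transition storage (hypothetical names, not the committed code):
{code:java}
// Transitions of one state, sorted by min label and non-overlapping
// (deterministic automaton). Returns the destination state, or -1.
static int step(int[] min, int[] max, int[] dest, int label) {
  int lo = 0, hi = min.length - 1;
  while (lo <= hi) {
    int mid = (lo + hi) >>> 1;
    if (max[mid] < label) {
      lo = mid + 1;        // label is above this transition's range
    } else if (min[mid] > label) {
      hi = mid - 1;        // label is below this transition's range
    } else {
      return dest[mid];    // min[mid] <= label <= max[mid]: follow it
    }
  }
  return -1;               // no transition accepts this label
}
{code}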
[jira] [Updated] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with (negated) TestInjection.skipIndexWriterCommitOnClose
[ https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14184: -- Resolution: Fixed Status: Resolved (was: Patch Available) > replace DirectUpdateHandler2.commitOnClose with (negated) > TestInjection.skipIndexWriterCommitOnClose > > > Key: SOLR-14184 > URL: https://issues.apache.org/jira/browse/SOLR-14184 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-14184.patch, SOLR-14184.patch > > > {code:java} > public static volatile boolean commitOnClose = true; // TODO: make this a > real config option or move it to TestInjection > {code} > Lots of tests muck with this (to simulate unclean shutdown and force tlog > replay on restart) but there's no garuntee that it is reset properly. > It should be replaced by logic in {{TestInjection}} that is correctly cleaned > up by {{TestInjection.reset()}} > > It's been replaced with the (negated) option > {{TestInjection.skipIndexWriterCommitOnClose}} which is automatically reset > to it's default value of {{false}} by {{TestInjection.reset()}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with (negated) TestInjection.skipIndexWriterCommitOnClose
[ https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14184: -- Fix Version/s: 8.5 master (9.0) Description: {code:java} public static volatile boolean commitOnClose = true; // TODO: make this a real config option or move it to TestInjection {code} Lots of tests muck with this (to simulate unclean shutdown and force tlog replay on restart) but there's no garuntee that it is reset properly. It should be replaced by logic in {{TestInjection}} that is correctly cleaned up by {{TestInjection.reset()}} It's been replaced with the (negated) option {{TestInjection.skipIndexWriterCommitOnClose}} which is automatically reset to it's default value of {{false}} by {{TestInjection.reset()}} was: {code:java} public static volatile boolean commitOnClose = true; // TODO: make this a real config option or move it to TestInjection {code} Lots of tests muck with this (to simulate unclean shutdown and force tlog replay on restart) but there's no garuntee that it is reset properly. It should be replaced by logic in {{TestInjection}} that is correctly cleaned up by {{TestInjection.reset()}} Summary: replace DirectUpdateHandler2.commitOnClose with (negated) TestInjection.skipIndexWriterCommitOnClose (was: replace DirectUpdateHandler2.commitOnClose with something in TestInjection) > replace DirectUpdateHandler2.commitOnClose with (negated) > TestInjection.skipIndexWriterCommitOnClose > > > Key: SOLR-14184 > URL: https://issues.apache.org/jira/browse/SOLR-14184 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-14184.patch, SOLR-14184.patch > > > {code:java} > public static volatile boolean commitOnClose = true; // TODO: make this a > real config option or move it to TestInjection > {code} > Lots of tests muck with this (to simulate unclean shutdown and force tlog > replay on restart) but there's no garuntee that it is reset properly. > It should be replaced by logic in {{TestInjection}} that is correctly cleaned > up by {{TestInjection.reset()}} > > It's been replaced with the (negated) option > {{TestInjection.skipIndexWriterCommitOnClose}} which is automatically reset > to it's default value of {{false}} by {{TestInjection.reset()}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017521#comment-17017521 ] Mike Drob commented on LUCENE-9134: --- For reference, this is what I told Erick over Slack. Didn't realize the question was on JIRA as well. {quote} I think what you want to do is define a configuration that includes a dependency for net.java.dev.javacc:javacc:5.0 and then you can refer to the classpath of that I never got breakpoints to work with tasks, so if you figure it out please share! declare the dep something like https://github.com/apache/lucene-solr/blob/master/gradle/validation/rat-sources.gradle#L27 (edited) and then use it something like https://github.com/apache/lucene-solr/blob/master/gradle/validation/rat-sources.gradle#L143 (edited) {quote} > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch > > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
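Put together, the advice above amounts to something like the following Gradle sketch; the task name, file paths and the javacc CLI entry point are assumptions, not the script that was eventually committed:
{code}
configurations {
  javacc
}

dependencies {
  javacc "net.java.dev.javacc:javacc:5.0"
}

task javaccQueryParser(type: JavaExec) {
  classpath = configurations.javacc
  main = "javacc"  // assumed CLI entry point bundled in javacc.jar
  args "-OUTPUT_DIRECTORY=${projectDir}/src/java/generated", "QueryParser.jj"
}
{code}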
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017635#comment-17017635 ] Michael McCandless commented on LUCENE-9125: That drop is because I tried JDK 13 for one (maybe two) runs, and it's a big slowdown for many queries!! Then I switched to JDK 12 and most queries are as fast as on JDK 11. > Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017636#comment-17017636 ] Michael McCandless commented on LUCENE-9125: [~broustant] those QPS numbers are crazy high – which {{-source}} did you use? > Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, with no plan to support a Java interface, making them hard to integrate into Java projects and inaccessible to those who are not familiar with C/C++ [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; where IVFFlat and HNSW are the most popular among all the VR algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress, and it draws the attention of those who are interested in Lucene and hope to use HNSW with Solr/Lucene. As another alternative for ANN similarity search problems, IVFFlat is also very popular with many users and supporters. Compared with HNSW, IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster to query (no training required) but requires extra storage for graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. Both of them have their merits and demerits. Another advantage is that IVFFlat can be faster and more accurate when GPU parallel computing is enabled (currently not supported in Java). Since HNSW is now under development, it may be better to provide both algorithm implementations for potential users who have very different applications and scenarios. I will soon commit my personal implementations. was: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, with no plan to support a Java interface [[https://github.com/facebookresearch/faiss/issues/105]].
The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; IVFFlat and HNSW are the most popular ones among all the algorithms. Recently, implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster to query but requires extra storage for graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. Each of them has its merits and demerits. Since HNSW is now under development, it may be better to provide IVFFlat as an alternative choice. I will soon commit my personal implementations. > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous a
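To make the IVFFlat trade-off described above concrete, here is a toy sketch of the query side (not the implementation promised on this issue): vectors are pre-assigned to k-means centroids, and a query exhaustively scans only the nprobe closest clusters, which is why the index stays small while a training/clustering step is required:
{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IvfFlatSketch {
  /** Return the stored vector nearest to q, probing only the nprobe closest clusters. */
  static float[] search(float[] q, float[][] centroids, List<float[]>[] lists, int nprobe) {
    Integer[] order = new Integer[centroids.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    // rank clusters by the distance of their centroid to the query
    Arrays.sort(order, Comparator.comparingDouble((Integer c) -> l2(q, centroids[c])));

    float best = Float.MAX_VALUE;
    float[] bestVec = null;
    for (int p = 0; p < Math.min(nprobe, order.length); p++) {
      for (float[] v : lists[order[p]]) { // "flat": exhaustive scan, no compression
        float d = l2(q, v);
        if (d < best) { best = d; bestVec = v; }
      }
    }
    return bestVec;
  }

  static float l2(float[] a, float[] b) {
    float s = 0;
    for (int i = 0; i < a.length; i++) { float t = a[i] - b[i]; s += t * t; }
    return s;
  }
}
{code}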
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, with no plan to support a Java interface, making them hard to integrate into Java projects and inaccessible to those who are not familiar with C/C++ [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; where IVFFlat and HNSW are the most popular among all the VR algorithms. Recently, the implementation of HNSW (Hierarchical Navigable Small World, LUCENE-9004) for Lucene has made great progress. The issue draws the attention of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. As an alternative for solving ANN similarity search problems, IVFFlat is also very popular with many users and supporters. Compared with HNSW, IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster to query (no training required) but requires extra storage for saving graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. Another advantage is that IVFFlat can be faster and more accurate when GPU parallel computing is enabled (currently not supported in Java). Both algorithms have their merits and demerits. Since HNSW is now under development, it may be better to provide both implementations (HNSW && IVFFlat) for potential users who are faced with very different scenarios and want more choices. I will soon commit my personal implementations. was: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users.
However, the aforementioned implementations are all written in C++, with no plan to support a Java interface, making them hard to integrate into Java projects and inaccessible to those who are not familiar with C/C++ [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; where IVFFlat and HNSW are the most popular among all the VR algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress, and it draws the attention of those who are interested in Lucene and hope to use HNSW with Solr/Lucene. As another alternative for ANN similarity search problems, IVFFlat is also very popular with many users and supporters. Compared with HNSW, IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster to query (no training required) but requires extra storage for graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. Both of them have their merits and demerits. Another advantage is that IVFFlat can be faster and more accurate when GPU parallel computing is enabled (currently not supported in Java). Since HNSW is now under development, it may be better to provide both algorithm implementations for potential users who
[jira] [Commented] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017689#comment-17017689 ] Cao Manh Dat commented on SOLR-12859: - To be honest, I have never fully understood the current authentication framework of Solr. When I did the HTTP/2 work, I basically converted the current Apache HttpClient interceptor to an equivalent version. After spending some time looking at the current code and the documentation, I'm guessing that {{isSolrThread()}} is a naive/workaround way to check whether a request that is about to be sent to another node was actually initiated by a Solr node. Let's look at this comment
{quote}
//if this is not running inside a Solr threadpool (as in testcases)
// then no need to add any header
{quote}
The above comment makes sense once we notice how the interceptors were added for Apache HttpClient: {{HttpClientUtil.addRequestInterceptor(interceptor)}} -> the interceptor is added to a static variable. This is OK if a JVM hosts only one node, but in tests a JVM hosts several nodes, so several PKI interceptors get added to that static variable. Moreover, every Apache HttpClient created by HttpClientUtil shares the same list of interceptors, even clients created in tests. So how can we distinguish a request sent from a client inside a node from a request sent from a client inside a test method, if all clients use the same list of interceptors? The naive solution was setting a flag called {{isSolrThread}} to distinguish these two cases. In most cases, a request sent by a node comes from a thread of a thread pool created by {{ExecutorUtil}}. So to make auth tests pass, {{isServerPool.set(Boolean.TRUE);}} is called before running any {{Runnable}}. With all of this context, let's review the mystery code again
{code}
SolrRequestInfo reqInfo = getRequestInfo();
String usr;
if (reqInfo != null) {
  // 1. Author's idea: OK, the thread is holding a request; if authentication is enabled, the req must hold a Principal
  Principal principal = reqInfo.getUserPrincipal();
  if (principal == null) {
    // 2. Author's idea: the req did not pass authentication since the Principal is not set, no need to do anything here!
    // my comment: this is not true; SolrRequestInfo is also used as a grab-bag to put data into, and many places rely on data inside SolrRequestInfo, so the presence of SolrRequestInfo does not mean that the request comes from outside.
    return Optional.empty();
  } else {
    usr = principal.getName();
  }
} else {
  if (!isSolrThread()) {
    // 3. Author's idea: the req was not sent inside a thread created by ExecutorUtil, so it must come from test code or the outside world
    // my comment: this is not true, since in {{DocExpirationUpdateProcessorFactory}} a {{ScheduledThreadPoolExecutor}} is used instead of a thread pool created by ExecutorUtil
    return Optional.empty();
  }
  // 4. Author's idea: if the req was sent by ExecutorUtil, it must come out of this node.
  usr = "$"; // special name to denote the user is the node itself
}
{code}
> DocExpirationUpdateProcessorFactory does not work with BasicAuth > > > Key: SOLR-12859 > URL: https://issues.apache.org/jira/browse/SOLR-12859 > Project: Solr > Issue Type: Bug >Affects Versions: 7.5 >Reporter: Varun Thacker >Priority: Major > Attachments: SOLR-12859.patch > > > I set up a cluster with basic auth and then wanted to use Solr's TTL feature ( > DocExpirationUpdateProcessorFactory ) to auto-delete documents. > > Turns out it doesn't work when Basic Auth is enabled.
I get the following > stacktrace from the logs > {code:java} > 2018-10-12 22:06:38.967 ERROR (autoExpireDocs-42-thread-1) [ ] > o.a.s.u.p.DocExpirationUpdateProcessorFactory Runtime error in periodic > deletion of expired docs: Async exception during distributed update: Error > from server at http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: > require authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: > Async exception during distributed update: Error from server at > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: require > authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > at > org.apache.so
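To make the {{isSolrThread}} mechanism concrete, here is a minimal, self-contained sketch of the pattern described above -- a thread-local flag set by an executor wrapper before each task runs. The names ({{IS_SERVER_POOL}}, {{markAsServer}}) are illustrative stand-ins, not Solr's actual identifiers:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadFlagSketch {
  // Hypothetical stand-in for Solr's isServerPool thread-local flag.
  private static final ThreadLocal<Boolean> IS_SERVER_POOL =
      ThreadLocal.withInitial(() -> Boolean.FALSE);

  // Analogous to the isSolrThread() check discussed above.
  static boolean isSolrThread() {
    return IS_SERVER_POOL.get();
  }

  // Wrap each task so the flag is set on whichever worker thread runs it,
  // mirroring how the pool wrapper sets the flag before calling any Runnable.
  static Runnable markAsServer(Runnable task) {
    return () -> {
      IS_SERVER_POOL.set(Boolean.TRUE);
      try {
        task.run();
      } finally {
        IS_SERVER_POOL.remove();
      }
    };
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.submit(markAsServer(
        () -> System.out.println("worker isSolrThread: " + isSolrThread()))); // true
    System.out.println("main isSolrThread: " + isSolrThread()); // false
    pool.shutdown();
  }
}
{code}
This also makes the bug visible: a task submitted to a pool that does not go through the wrapper (for example, a plain {{ScheduledThreadPoolExecutor}}) never gets the flag, so {{isSolrThread()}} returns false even though the caller really is a node.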
[jira] [Comment Edited] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017689#comment-17017689 ] Cao Manh Dat edited comment on SOLR-12859 at 1/17/20 4:15 AM: -- To be honest, I have never fully understood Solr's current authentication framework. When I did the HTTP/2 work, I basically converted the existing Apache HttpClient interceptor to an equivalent version. After spending some time looking at the current code and the documentation, I am guessing that {{isSolrThread()}} is a naive workaround to check whether a request that is about to be sent to another node was actually sent by a Solr node. Let's look at this comment {quote} //if this is not running inside a Solr threadpool (as in testcases) // then no need to add any header {quote} The comment above makes sense once we notice how the interceptors are added for Apache HttpClient: {{HttpClientUtil.addRequestInterceptor(interceptor)}} adds the interceptor to a static variable. This is fine if a JVM hosts only one node, but in tests a JVM hosts several nodes, so several PKI interceptors get added to that static variable. Moreover, every Apache HttpClient created by HttpClientUtil shares the same list of interceptors, even clients created in tests. So how can we distinguish a request sent from a client inside a node from a request sent from a client inside a test method, when all clients use the same list of interceptors? The naive solution was to set a flag called {{isSolrThread}} to distinguish these two cases. In most cases, a request sent by a node is sent from a thread of a threadPool created by {{ExecutorUtil}}. So, to make the auth tests pass, {{isServerPool.set(Boolean.TRUE);}} is called before running any {{Runnable}}. With all of this context, let's review the mystery code again:
{code}
SolrRequestInfo reqInfo = getRequestInfo();
String usr;
if (reqInfo != null) {
  // 1. Author's idea: the thread is holding a request; if authentication is enabled, the request must hold a Principal.
  Principal principal = reqInfo.getUserPrincipal();
  if (principal == null) {
    // 2. Author's idea: the request did not pass authentication since no Principal is set, so nothing needs to be done here!
    // My comment: this is not true. SolrRequestInfo is also used as a general-purpose holder -- many places rely on data
    // inside SolrRequestInfo, so its presence does not mean the request came from outside.
    return Optional.empty();
  } else {
    usr = principal.getName();
  }
} else {
  if (!isSolrThread()) {
    // 3. Author's idea: the request was not sent from a thread created by ExecutorUtil, so it must come from test code
    // or the outside world.
    // My comment: this is not true either, since DocExpirationUpdateProcessorFactory uses a ScheduledThreadPoolExecutor
    // instead of a threadPool created by ExecutorUtil.
    return Optional.empty();
  }
  // 4. Author's idea: the request was sent via ExecutorUtil, so it must originate from this node.
  usr = "$"; // special name to denote that the user is the node itself
}
{code}
But with the new HTTP/2 client, the interceptor is added to each client object, so there is no single static variable here -> no sharing of interceptors between the clients of nodes and the clients of tests -> if the interceptor's code is called, the request must have been sent from a node.
So, for the interceptor of the HTTP/2 client, the mystery block can be changed to:
{code}
SolrRequestInfo reqInfo = getRequestInfo();
String usr = NODE_IS_USER;
if (reqInfo != null && reqInfo.getUserPrincipal() != null) {
  usr = reqInfo.getUserPrincipal().getName();
}
{code}
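As a self-contained illustration of that simplified check: the {{RequestInfo}} interface below is a hypothetical stand-in for the relevant part of {{SolrRequestInfo}}, and {{NODE_IS_USER}} stands in for the node's special principal name ({{"$"}} in the original code):
{code:java}
import java.security.Principal;
import java.util.Optional;

public class PkiUserResolutionSketch {
  // Hypothetical stand-in for the node's special principal name ("$" in the original code).
  private static final String NODE_IS_USER = "$";

  // Minimal stand-in for the part of SolrRequestInfo used here.
  interface RequestInfo {
    Principal getUserPrincipal();
  }

  // With per-client interceptors there is no ambiguity about the sender:
  // if this code runs at all, the request comes from a node, so we always
  // resolve a user -- either the request's principal or the node itself.
  static Optional<String> resolveUser(RequestInfo reqInfo) {
    String usr = NODE_IS_USER;
    if (reqInfo != null && reqInfo.getUserPrincipal() != null) {
      usr = reqInfo.getUserPrincipal().getName();
    }
    return Optional.of(usr);
  }

  public static void main(String[] args) {
    Principal alice = () -> "alice";
    System.out.println(resolveUser(null));        // Optional[$]     (node-to-node request)
    System.out.println(resolveUser(() -> alice)); // Optional[alice] (authenticated user)
  }
}
{code}
Note the contrast with the original code: there is no branch that returns {{Optional.empty()}}, because the per-client registration already guarantees the caller is a node.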
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017696#comment-17017696 ] Julie Tibshirani commented on LUCENE-9004: -- Hello and thank you for this very exciting work! We have been doing research into nearest neighbor search on high-dimensional vectors and I wanted to share some thoughts here in the hope that they're helpful. Related to Adrien's comment about search filters, I am wondering how deleted documents would be handled. If I'm understanding correctly, a segment's deletes are applied 'on top of' the query. So if the k nearest neighbors to the query vector all happen to be deleted, then the query won't bring back any documents. From a user's perspective, I could see this behavior being surprising or hard to work with. One approach would be to keep expanding the search while skipping over deleted documents, but I'm not sure about the performance + accuracy it would give (there's a [short discussion|https://github.com/nmslib/hnswlib/issues/4#issuecomment-378739892] in the hnswlib repo on this point). The recent paper [Graph based Nearest Neighbor Search: Promises and Failures|https://arxiv.org/abs/1904.02077] compares HNSW to other graph-based approaches and claims that the hierarchy of layers only really helps in low dimensions (Figure 4). In these experiments, they see that a 'flat' version of HNSW performs very similarly to the original above around 16 dimensions. The original HNSW paper also cites the hierarchy as most helpful in low dimensions. This seemed interesting in that it may be possible to avoid some complexity if the focus is not on low-dimensional vectors. (It also suggests that graph-based kNN is an active research area and that there are likely to be improvements + new approaches that come out. One such new approach is [DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node|https://suhasjs.github.io/files/diskann_neurips19.pdf]). On the subject of testing recall, we are working on adding [sentence embedding|https://github.com/erikbern/ann-benchmarks/issues/144] and [deep image descriptor|https://github.com/erikbern/ann-benchmarks/issues/143] datasets to the ann-benchmarks repo. Hopefully that will help provide some realistic shared data to test against. > Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layered_graph.png > > Time Spent: 40m > Remaining Estimate: 0h > > "Semantic" search based on machine-learned vector "embeddings" representing > terms, queries and documents is becoming a must-have feature for a modern > search engine. SOLR-12890 is exploring various approaches to this, including > providing vector-based scoring functions. This is a spinoff issue from that. > The idea here is to explore approximate nearest-neighbor search. Researchers > have found an approach based on navigating a graph that partially encodes the > nearest neighbor relation at multiple scales can provide accuracy > 95% (as > compared to exact nearest neighbor calculations) at a reasonable cost. This > issue will explore implementing HNSW (hierarchical navigable small-world) > graphs for the purpose of approximate nearest vector search (often referred > to as KNN or k-nearest-neighbor search). > At a high level the way this algorithm works is this.
First assume you have a > graph that has a partial encoding of the nearest neighbor relation, with some > short and some long-distance links. If this graph is built in the right way > (has the hierarchical navigable small world property), then you can > efficiently traverse it to find nearest neighbors (approximately) in log N > time where N is the number of nodes in the graph. I believe this idea was > pioneered in [1]. The great insight in that paper is that if you use the > graph search algorithm to find the K nearest neighbors of a new document > while indexing, and then link those neighbors (undirectedly, ie both ways) to > the new document, then the graph that emerges will have the desired > properties. > The implementation I propose for Lucene is as follows. We need two new data > structures to encode the vectors and the graph. We can encode vectors using a > light wrapper around {{BinaryDocValues}} (we also want to encode the vector > dimension and have efficient conversion from bytes to floats). For the graph > we can use {{SortedNumericDocValues}} where the values we encode are the > docids of the related documents. Encoding the interdocument relations using > docids directly will make it relatively fast to traverse the g
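A minimal sketch of the "keep expanding while skipping deleted documents" idea raised in the comment above: deleted nodes still participate in graph traversal (so the graph stays navigable), but only live documents enter the result set. This is an illustrative greedy search over a plain adjacency list, not Lucene's or hnswlib's actual implementation:
{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.IntPredicate;

public class KnnSkipDeletedSketch {

  // Squared Euclidean distance; sufficient for nearest-neighbor ordering.
  static float dist(float[] a, float[] b) {
    float s = 0;
    for (int i = 0; i < a.length; i++) { float d = a[i] - b[i]; s += d * d; }
    return s;
  }

  static int[] search(float[][] vectors, int[][] neighbors, float[] query,
                      int entry, int k, IntPredicate isLive) {
    Comparator<Integer> byDist = Comparator.comparingDouble(n -> dist(vectors[n], query));
    PriorityQueue<Integer> frontier = new PriorityQueue<>(byDist);            // closest-first expansion
    PriorityQueue<Integer> results = new PriorityQueue<>(byDist.reversed()); // worst result on top, capped at k
    Set<Integer> visited = new HashSet<>();
    frontier.add(entry);
    visited.add(entry);
    while (!frontier.isEmpty()) {
      int node = frontier.poll();
      // Stop once the nearest unexpanded candidate cannot improve a full result set.
      if (results.size() == k
          && dist(vectors[node], query) > dist(vectors[results.peek()], query)) {
        break;
      }
      if (isLive.test(node)) {            // only live docs become results...
        results.add(node);
        if (results.size() > k) results.poll();
      }
      for (int nb : neighbors[node]) {    // ...but we expand through deleted ones too
        if (visited.add(nb)) frontier.add(nb);
      }
    }
    return results.stream().mapToInt(Integer::intValue).sorted().toArray();
  }

  public static void main(String[] args) {
    float[][] vecs = {{0f}, {1f}, {2f}, {3f}, {4f}};
    int[][] nbrs = {{1}, {0, 2}, {1, 3}, {2, 4}, {3}};
    IntPredicate live = n -> n != 0 && n != 1; // pretend docs 0 and 1 are deleted
    // Query near 0: its two nearest docs are deleted, so the search expands past them to 2 and 3.
    System.out.println(Arrays.toString(search(vecs, nbrs, new float[]{0f}, 0, 2, live))); // [2, 3]
  }
}
{code}
The open question from the comment still applies to this sketch: when many neighbors are deleted, the traversal must expand further, so both latency and recall depend on the deletion rate.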
[jira] [Comment Edited] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017696#comment-17017696 ] Julie Tibshirani edited comment on LUCENE-9004 at 1/17/20 4:36 AM: --- Hello and thank you for this very exciting work! We have been doing research into nearest neighbor search on high-dimensional vectors and I wanted to share some thoughts here in the hope that they're helpful. Related to Adrien's comment about search filters, I am wondering how deleted documents would be handled. If I'm understanding correctly, a segment's deletes are applied 'on top of' the query. So if the k nearest neighbors to the query vector all happen to be deleted, then the query won't bring back any documents. From a user's perspective, I could see this behavior being surprising or hard to work with. One approach would be to keep expanding the search while skipping over deleted documents, but I'm not sure about the performance + accuracy it would give (there's a [short discussion|https://github.com/nmslib/hnswlib/issues/4#issuecomment-378739892] in the hnswlib repo on this point). The recent paper [Graph based Nearest Neighbor Search: Promises and Failures|https://arxiv.org/abs/1904.02077] compares HNSW to other graph-based approaches and claims that the hierarchy of layers only really helps in low dimensions (Figure 4). In these experiments, they see that a 'flat' version of HNSW performs very similarly to the original above around 16 dimensions. The original HNSW paper also cites the hierarchy as most helpful in low dimensions. This seemed interesting in that it may be possible to avoid some complexity if the focus is not on low-dimensional vectors. (It also suggests that graph-based kNN is an active research area and that there are likely to be improvements + new approaches that come out. One such new approach is [DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node|https://suhasjs.github.io/files/diskann_neurips19.pdf]). On the subject of testing recall, we are working on adding [sentence embedding|https://github.com/erikbern/ann-benchmarks/issues/144] and [deep image descriptor|https://github.com/erikbern/ann-benchmarks/issues/143] datasets to the ann-benchmarks repo. Hopefully that will help provide some realistic shared data to test against.
> Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layer
[jira] [Commented] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017713#comment-17017713 ] Cao Manh Dat commented on SOLR-12859: - I attached a draft patch for fixing the problem. The ideas are: * Set isSolrThread inside {{DefaultSolrThreadFactory}}. That class belongs to solr-core, so its threads are always created by a node. * If isSolrThread == true, set usr = "$" even in case principal == null. > DocExpirationUpdateProcessorFactory does not work with BasicAuth > > > Key: SOLR-12859 > URL: https://issues.apache.org/jira/browse/SOLR-12859 > Project: Solr > Issue Type: Bug >Affects Versions: 7.5 >Reporter: Varun Thacker >Priority: Major > Attachments: SOLR-12859.patch > > > I setup a cluster with basic auth and then wanted to use Solr's TTL feature ( > DocExpirationUpdateProcessorFactory ) to auto-delete documents. > > Turns out it doesn't work when Basic Auth is enabled.
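A rough sketch of the patch's first idea: a thread factory that marks every thread it creates, so auth code can recognize it as a node thread. Class and field names here are illustrative stand-ins; the actual patch modifies Solr's {{DefaultSolrThreadFactory}}:
{code:java}
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class MarkingThreadFactorySketch implements ThreadFactory {
  // Hypothetical stand-in for the flag that isSolrThread() consults.
  public static final ThreadLocal<Boolean> IS_SERVER = ThreadLocal.withInitial(() -> Boolean.FALSE);

  private final AtomicInteger counter = new AtomicInteger();
  private final String prefix;

  public MarkingThreadFactorySketch(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    // Mark before running the task, so any auth check on this thread sees a "node" thread,
    // even when the pool is a plain ScheduledThreadPoolExecutor as in DocExpirationUpdateProcessorFactory.
    Runnable wrapped = () -> {
      IS_SERVER.set(Boolean.TRUE);
      r.run();
    };
    return new Thread(wrapped, prefix + "-" + counter.incrementAndGet());
  }

  public static void main(String[] args) throws InterruptedException {
    Thread t = new MarkingThreadFactorySketch("autoExpireDocs")
        .newThread(() -> System.out.println("isServer=" + IS_SERVER.get())); // true
    t.start();
    t.join();
    System.out.println("main isServer=" + IS_SERVER.get()); // false
  }
}
{code}
The point of moving the marking into the factory is that it covers any executor built on it, instead of relying on each pool implementation to set the flag per task.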
[jira] [Commented] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017714#comment-17017714 ] Cao Manh Dat commented on SOLR-12859: - Hmm, it seems that a better approach would be to let the test explicitly mark its own thread as {{isSolrTestThread}}. > DocExpirationUpdateProcessorFactory does not work with BasicAuth > > > Key: SOLR-12859 > URL: https://issues.apache.org/jira/browse/SOLR-12859 > Project: Solr > Issue Type: Bug >Affects Versions: 7.5 >Reporter: Varun Thacker >Priority: Major > Attachments: SOLR-12859.patch > > > I setup a cluster with basic auth and then wanted to use Solr's TTL feature ( > DocExpirationUpdateProcessorFactory ) to auto-delete documents. > > Turns out it doesn't work when Basic Auth is enabled.
[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception
[ https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017751#comment-17017751 ] Radar Da Lei commented on SOLR-13240: - [~cpoerschke] Thanks for fixing this issue. We hit a similar issue on Solr 7.4.0; is there a plan to apply this fix to Solr 7.x? Thanks. > UTILIZENODE action results in an exception > -- > > Key: SOLR-13240 > URL: https://issues.apache.org/jira/browse/SOLR-13240 > Project: Solr > Issue Type: Bug > Components: AutoScaling >Affects Versions: 7.6 >Reporter: Hendrik Haddorp >Assignee: Christine Poerschke >Priority: Major > Fix For: master (9.0), 8.3 > > Attachments: SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, > SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, > SOLR-13240.patch, solr-solrj-7.5.0.jar > > > When I invoke the UTILIZENODE action the REST call fails like this after it > moved a few replicas: > { > "responseHeader":{ > "status":500, > "QTime":40220}, > "Operation utilizenode caused > exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException: > Comparison method violates its general contract!", > "exception":{ > "msg":"Comparison method violates its general contract!", > "rspCode":-1}, > "error":{ > "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","org.apache.solr.common.SolrException"], > "msg":"Comparison method violates its general contract!", > "trace":"org.apache.solr.common.SolrException: Comparison method violates > its general contract!\n\tat > org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat > > org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat > > org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat > > org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat > org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat > >
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat > org.eclipse.jetty.io.ChannelEndPoint$2.run(Cha
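For context on the quoted failure: "Comparison method violates its general contract!" is thrown by Java's TimSort when a {{Comparator}} does not define a consistent total order (for example, it is not transitive, or gives contradictory answers for the same pair). A minimal illustration, unrelated to Solr's actual comparator:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ComparatorContractSketch {
  public static void main(String[] args) {
    Random rnd = new Random(42);
    List<Integer> values = new ArrayList<>();
    for (int i = 0; i < 10_000; i++) values.add(rnd.nextInt(100)); // many duplicates
    try {
      // Broken: a comparator whose answer for equal elements changes between calls
      // (simulated here with a random tie-break) is not a consistent total order.
      // With this many duplicates, TimSort almost always detects the inconsistency.
      values.sort((a, b) -> a.equals(b) ? (rnd.nextBoolean() ? -1 : 1) : Integer.compare(a, b));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // "Comparison method violates its general contract!"
    }
    // Fixed: a deterministic, transitive comparison never trips TimSort's check.
    values.sort(Integer::compare);
    System.out.println("sorted ok, first=" + values.get(0));
  }
}
{code}
In the UTILIZENODE case the same class of bug sits in the autoscaling comparison logic, which is what the attached SOLR-13240 patches address.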