[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description:

Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.

With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface [https://github.com/facebookresearch/faiss/issues/105].

The algorithms for vector retrieval can be roughly classified into four categories:
# Tree-based algorithms, such as KD-tree;
# Hashing methods, such as LSH (Locality-Sensitive Hashing);
# Product quantization algorithms, such as IVFFlat;
# Graph-based algorithms, such as HNSW, SSG, and NSG.

IVFFlat and HNSW are the most popular of these algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat requires much less memory and disk space than HNSW [indexing 1M vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors], and IVFFlat supports both online and offline training. I'm now trying to introduce IVFFlat into Lucene core, and I will try my best to reuse the excellent work from LUCENE-9004.

was:

Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.

With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface [https://github.com/facebookresearch/faiss/issues/105].

The algorithms for vector retrieval can be roughly classified into four categories:
# Tree-based algorithms, such as KD-tree;
# Hashing methods, such as LSH (Locality-Sensitive Hashing);
# Product quantization algorithms, such as IVFFlat;
# Graph-based algorithms, such as HNSW, SSG, and NSG.

IVFFlat and HNSW are the most popular of these algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat requires less memory and disk space than HNSW, and IVFFlat supports both online and offline training. I'm now trying to introduce IVFFlat into Lucene core.
I will try my best to reuse the excellent work from LUCENE-9004.

> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Xin-Chun Zhang
> Priority: Major
>
> Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.
> With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface [https://github.com/facebookresearch/faiss/issues/105].
> The algorithms for vector retrieval ca
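For readers unfamiliar with IVFFlat, the idea described above (cluster the vectors into inverted lists around coarse centroids, then exhaustively, i.e. "flat", scan only the few lists nearest to the query) can be sketched as follows. This is an illustration of the algorithm only, not the proposed Lucene implementation:

{code:java}
import java.util.*;

// Minimal IVFFlat sketch: nlist coarse centroids partition the vectors into
// inverted lists; a query scans only the nprobe nearest lists, uncompressed.
class IvfFlatSketch {
  float[][] centroids;        // coarse centroids, e.g. obtained by k-means training
  List<List<float[]>> lists;  // each vector stored in the list of its nearest centroid

  static float l2(float[] a, float[] b) {
    float d = 0;
    for (int i = 0; i < a.length; i++) { float t = a[i] - b[i]; d += t * t; }
    return d;
  }

  float[] search(float[] query, int nprobe) {
    // 1. rank centroids by distance to the query
    Integer[] order = new Integer[centroids.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    Arrays.sort(order, Comparator.comparingDouble(i -> l2(query, centroids[i])));
    // 2. exhaustively ("flat") scan only the nprobe closest lists
    float best = Float.MAX_VALUE;
    float[] bestVec = null;
    for (int p = 0; p < Math.min(nprobe, order.length); p++) {
      for (float[] v : lists.get(order[p])) {
        float d = l2(query, v);
        if (d < best) { best = d; bestVec = v; }
      }
    }
    return bestVec; // nearest neighbor found within the probed lists
  }
}
{code}

Memory stays low because only centroids plus raw vectors are kept (no graph links), which is the trade-off the description cites against HNSW.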
[jira] [Resolved] (LUCENE-7146) "Latest SVN" needs replaced on the website
[ https://issues.apache.org/jira/browse/LUCENE-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved LUCENE-7146. - Resolution: Won't Fix

Closing this. If we feel the need we can always add some git hash to the new website.

> "Latest SVN" needs replaced on the website
> --
>
> Key: LUCENE-7146
> URL: https://issues.apache.org/jira/browse/LUCENE-7146
> Project: Lucene - Core
> Issue Type: Bug
> Components: general/website
> Reporter: Chris M. Hostetter
> Priority: Major
>
> Mike asked a little while back on dev@lucene...
> {noformat}
> On the bottom right of Lucene's index.html we have "Latest SVN" but of course it only displays this last svn commit:
> r1726344 LUCENE-6937: moving trunk from SVN to GIT. (lucene) — dweiss
> Does anyone know how to convert this to the "Latest GIT"?
> {noformat}
> This isn't particularly straightforward, so filing an issue to track it
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016707#comment-17016707 ] Dawid Weiss commented on LUCENE-9134: - > I have to invoke javacc. Where do we get this from? You declare a build script dependency and then just import from jar, as usual. buildscript dependencies don't need versions.props entries as they're evaluated early. I don't debug those scripts in intellij - a println along the way does the job for me. I don't know if breakpoints will work with gradle files - if it's an interpreted script and not a precompiled one (which gets translated into a java class) then I doubt you can put a breakpoint in there. It is interpreted after all. > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch > > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
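For illustration, the buildscript-dependency approach Dawid describes might look like the sketch below. The javacc coordinates and version here are assumptions for the example, not taken from the actual build:

{code:groovy}
// Hypothetical sketch: put javacc on the build script classpath so a task can
// import and invoke it directly. Buildscript dependencies are resolved early,
// which is why they need no versions.props entry.
buildscript {
  repositories {
    mavenCentral()
  }
  dependencies {
    classpath "net.java.dev.javacc:javacc:7.0.4" // version is an assumption
  }
}
{code}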
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1176: LUCENE-9143 Add error-prone checks to build, but disabled
dweiss commented on a change in pull request #1176: LUCENE-9143 Add error-prone checks to build, but disabled URL: https://github.com/apache/lucene-solr/pull/1176#discussion_r367298139 ## File path: gradle/defaults-java.gradle ## @@ -1,11 +1,51 @@ // Configure Java project defaults. -allprojects { - plugins.withType(JavaPlugin) { +buildscript { Review comment: I'd prefer if you separated the configuration and application of this plugin into a separate file (validation/errorprone.gradle)? Then each file configures one thing. Sure - there is an overhead in multiple passes over project collection but I think it's worth knowing what each particular file does and it makes them shorter. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
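As a sketch of the suggested split (validation/errorprone.gradle is the file name proposed in the review; the body below is illustrative only):

{code:groovy}
// gradle/validation/errorprone.gradle: one file, one concern. Both the
// application and the configuration of error-prone live here; the root build
// would then include it via: apply from: file('gradle/validation/errorprone.gradle')
allprojects {
  plugins.withType(JavaPlugin) {
    // error-prone plugin application and its options would go here, nothing else
  }
}
{code}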
[GitHub] [lucene-solr] irvingzhang commented on a change in pull request #1169: LUCENE-9004: A minor feature and patch -- support deleting vector values and fix segments merging
irvingzhang commented on a change in pull request #1169: LUCENE-9004: A minor feature and patch -- support deleting vector values and fix segments merging URL: https://github.com/apache/lucene-solr/pull/1169#discussion_r367305642

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90KnnGraphWriter.java
## @@ -216,8 +216,11 @@ private void mergeKnnGraph(FieldInfo mergeFieldInfo, final MergeState mergeState

{code:java}
   int docid;
   while ((docid = sub.nextDoc()) != NO_MORE_DOCS) {
     int mappedDocId = docMap.get(docid);
+    /// deleted document (not alive)
+    if (mappedDocId < 0) {
{code}

Review comment: Thanks @mocobeta , I have corrected the condition for deleted docIds.
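For context, MergeState.DocMap maps a document's id in the source segment to its id in the merged segment and returns -1 for documents that were deleted, so the merge loop must skip negative ids. A minimal sketch of that pattern (not the PR's exact code):

{code:java}
// Sketch: iterating one segment's documents during a merge, skipping deletions.
// docMap.get(...) returns -1 for documents deleted in the source segment.
int docid;
while ((docid = sub.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
  int mappedDocId = docMap.get(docid);
  if (mappedDocId < 0) {
    continue; // deleted (not alive): do not carry it into the merged segment
  }
  // ... copy this document's graph/vector data under mappedDocId ...
}
{code}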
[jira] [Commented] (SOLR-14128) SystemCollectionCompatTest times out waiting for Overseer to do compatibility checks
[ https://issues.apache.org/jira/browse/SOLR-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016760#comment-17016760 ] ASF subversion and git services commented on SOLR-14128: Commit 543505470c26f1ebb3ecd5ca57c411c03941a6a1 in lucene-solr's branch refs/heads/master from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5435054 ] SOLR-14128: Tentative fix: put replicas on other nodes than overseer, wait for all replicas to complete the reload. > SystemCollectionCompatTest times out waiting for Overseer to do compatibility > checks > > > Key: SOLR-14128 > URL: https://issues.apache.org/jira/browse/SOLR-14128 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Andrzej Bialecki >Priority: Major > Attachments: fail.txt, nodeset.patch, pass.txt, > thetaphi_Lucene-Solr-master-Linux_25161.log.txt > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9130) Failed to match when create PhraseQuery with terms analyzed from long query text
[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016766#comment-17016766 ] Chen Zhixiang commented on LUCENE-9130: --- Try dumping the terms' posting info:

{code:java}
private void debugOutputTermsInfo2(IndexReader indexReader, int doc, String fieldName) throws IOException {
  Terms terms = MultiTerms.getTerms(indexReader, fieldName);
  TermsEnum termIter = terms.iterator();
  while (termIter.next() != null) {
    PostingsEnum postingsEnum = termIter.postings(null, PostingsEnum.ALL);
    while (postingsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
      int freq = postingsEnum.freq();
      System.out.printf("term: %s, freq: %d,", termIter.term().utf8ToString(), freq);
      while (freq > 0) {
        System.out.printf(" nextPosition: %d,", postingsEnum.nextPosition());
        System.out.printf(" startOffset: %d, endOffset: %d", postingsEnum.startOffset(), postingsEnum.endOffset());
        freq--;
      }
      System.out.println();
    }
  }
}
{code}

Output:

{noformat}
term: 1, freq: 1, nextPosition: 7, startOffset: -1, endOffset: -1
term: 2179, freq: 1, nextPosition: 0, startOffset: -1, endOffset: -1
term: 2184, freq: 1, nextPosition: 2, startOffset: -1, endOffset: -1
term: lg, freq: 1, nextPosition: 6, startOffset: -1, endOffset: -1
term: 入, freq: 1, nextPosition: 4, startOffset: -1, endOffset: -1
{noformat}

The terms' position info is right (filtered terms take a position number), but there are no offsets (invalid -1). Is offset info needed in PhraseQuery?

> Failed to match when create PhraseQuery with terms analyzed from long query text
> --
>
> Key: LUCENE-9130
> URL: https://issues.apache.org/jira/browse/LUCENE-9130
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 8.4
> Reporter: Chen Zhixiang
> Priority: Major
> Attachments: LongTextFieldSearchTest.java
>
> When I use a long text (which is equal to the doc's StringField at indexing time) to build a PhraseQuery, I cannot match the document. But a BooleanQuery with MUST/AND mode succeeds.
>
> The long query text is an address string:
> "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)"
> A test case is attached.
> logs:
>
> 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, lg, 长
> 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, 开, 到, 底下, lg, 2
> 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 +address:到 +address:底下 +address:lg +address:2)
> 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - results.totalHits.value=1
> 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, lg, 长
> 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, 开, 到, 底下, lg, 2
> 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2
> 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - results.totalHits.value=0
> 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase
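To the closing question above: offsets are not needed. PhraseQuery matches on term positions only; start/end offsets are consumed by highlighting, never by phrase matching, so the -1 offsets are not the problem. Given the posting dump, a likelier cause of the miss is that the query places all terms at consecutive positions while the indexed positions contain gaps where tokens were filtered (e.g. positions 1, 3, and 5 are skipped above), so a slop of 2 is too small. A minimal sketch, with hypothetical field and terms, of encoding such gaps explicitly via PhraseQuery.Builder:

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

class PhrasePositionsSketch {
  static PhraseQuery build() {
    // PhraseQuery compares positions only; offsets never enter the match.
    PhraseQuery.Builder builder = new PhraseQuery.Builder();
    builder.add(new Term("address", "2179"), 0); // explicit positions can encode
    builder.add(new Term("address", "2184"), 2); // the gap left by a filtered token
    builder.setSlop(0); // slop is then only needed for genuine reordering
    return builder.build();
  }
}
{code}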
[jira] [Commented] (LUCENE-9068) Build FuzzyQuery automata up-front
[ https://issues.apache.org/jira/browse/LUCENE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016768#comment-17016768 ] ASF subversion and git services commented on LUCENE-9068: - Commit 7ea7ed72aca556f957a5de55911c852124db8715 in lucene-solr's branch refs/heads/master from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7ea7ed7 ] LUCENE-9068: Solr query handling code catches FuzzyTermsException

> Build FuzzyQuery automata up-front
> --
>
> Key: LUCENE-9068
> URL: https://issues.apache.org/jira/browse/LUCENE-9068
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 8.5
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> FuzzyQuery builds a set of Levenshtein automata (one for each possible edit distance) at rewrite time, and passes them between different TermsEnum invocations using an attribute source. This seems a bit needlessly complicated, and also means that things like visiting a query end up building the automata again. We should instead build the automata at query construction time, which is how AutomatonQuery does it.
[jira] [Commented] (LUCENE-9068) Build FuzzyQuery automata up-front
[ https://issues.apache.org/jira/browse/LUCENE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016767#comment-17016767 ] ASF subversion and git services commented on LUCENE-9068: - Commit 89cfb906b6c6d08880ddf277e5792b04cf426a5c in lucene-solr's branch refs/heads/branch_8x from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=89cfb90 ] LUCENE-9068: Solr query handling code catches FuzzyTermsException
[jira] [Commented] (LUCENE-9068) Build FuzzyQuery automata up-front
[ https://issues.apache.org/jira/browse/LUCENE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016772#comment-17016772 ] Alan Woodward commented on LUCENE-9068: --- Should be fixed now - apologies, the failing test is marked as Slow so it was skipped when I ran tests locally.
[jira] [Resolved] (LUCENE-9068) Build FuzzyQuery automata up-front
[ https://issues.apache.org/jira/browse/LUCENE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward resolved LUCENE-9068. --- Resolution: Fixed
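For readers following along, the change amounts to constructing the per-edit-distance automata once, when the query is built, rather than at rewrite time. A rough sketch of that construction using Lucene's public automaton classes (illustrative, not the committed code):

{code:java}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.CompiledAutomaton;
import org.apache.lucene.util.automaton.LevenshteinAutomata;

class FuzzyAutomataSketch {
  // Build one automaton per edit distance 0..maxEdits (Lucene caps maxEdits at 2),
  // so rewrite and query visiting can reuse them instead of rebuilding each time.
  static CompiledAutomaton[] buildUpFront(String term, int maxEdits, boolean transpositions) {
    LevenshteinAutomata builder = new LevenshteinAutomata(term, transpositions);
    CompiledAutomaton[] compiled = new CompiledAutomaton[maxEdits + 1];
    for (int n = 0; n <= maxEdits; n++) {
      Automaton a = builder.toAutomaton(n);
      compiled[n] = new CompiledAutomaton(a, true, true); // finite=true, simplify=true
    }
    return compiled;
  }
}
{code}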
[jira] [Comment Edited] (SOLR-14128) SystemCollectionCompatTest times out waiting for Overseer to do compatibility checks
[ https://issues.apache.org/jira/browse/SOLR-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016795#comment-17016795 ] Andrzej Bialecki edited comment on SOLR-14128 at 1/16/20 11:15 AM: ---

Beasting with this fix on a slow machine produced a different error, which occurred when trying to update the schema; this may be a variant of SOLR-13368. It is fully reproducible when running on a Linux VM (macOS host), occurring after just a couple of beast runs.

{code:java}
[beaster] 2> 10461 INFO (qtp676755392-54) [n:127.0.0.1:34393_solr c:.system s:shard1 r:core_node2 x:.system_shard1_replica_n1 ] o.a.s.c.S.Request [.system_shard1_replica_n1] webapp=/solr path=/schema params={wt=javabin&version=2} status=0 QTime=4
[beaster] 2> 10475 ERROR (qtp676755392-49) [n:127.0.0.1:34393_solr c:.system s:shard1 r:core_node4 x:.system_shard1_replica_n3 ] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error reading input String Can't find resource 'schema.xml' in classpath or '/configs/.system', cwd=/home/parallels/lucene-solr/solr/build/solr-core/test/J0
[beaster] 2>    at org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:94)
[beaster] 2>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
[beaster] 2>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2582)
[beaster] 2>    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
[beaster] 2>    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
[beaster] 2>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
[beaster] 2>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
[beaster] 2>    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
[beaster] 2>    at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:166)
[beaster] 2>    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
[beaster] 2>    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
[beaster] 2>    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
[beaster] 2>    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)
[beaster] 2>    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
[beaster] 2>    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
[beaster] 2>    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
[beaster] 2>    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
[beaster] 2>    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)
[beaster] 2>    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
[beaster] 2>    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
[beaster] 2>    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
[beaster] 2>    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
[beaster] 2>    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
[beaster] 2>    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:717)
[beaster] 2>    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
[beaster] 2>    at org.eclipse.jetty.server.Server.handle(Server.java:500)
[beaster] 2>    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
[beaster] 2>    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
[beaster] 2>    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
[beaster] 2>    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)
[beaster] 2>    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
[beaster] 2>    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
[beaster] 2>    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
[beaster] 2>    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
[beaster] 2>    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
[beaster] 2>    at java.base/java.lang.Thread.run(Thread.java:834)
[beaster] 2> Caused by: org.apache.solr.core.SolrResourceNotFoundException: Can't find resource 'schema.xml' in classpath or '/configs/.system', cwd=/home/pa
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description:

Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.

With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface [https://github.com/facebookresearch/faiss/issues/105].

The algorithms for vector retrieval can be roughly classified into four categories:
# Tree-based algorithms, such as KD-tree;
# Hashing methods, such as LSH (Locality-Sensitive Hashing);
# Product quantization algorithms, such as IVFFlat;
# Graph-based algorithms, such as HNSW, SSG, and NSG.

IVFFlat and HNSW are the most popular of these algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat requires much less memory and disk space than HNSW [indexing 1M vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors], and IVFFlat supports both online and offline training. I'm now trying to introduce IVFFlat into Lucene core in my personal branch [https://github.com/irvingzhang/lucene-solr/tree/jira/LUCENE-9136]; it is still very early.

was:

Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.

With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface [https://github.com/facebookresearch/faiss/issues/105].

The algorithms for vector retrieval can be roughly classified into four categories:
# Tree-based algorithms, such as KD-tree;
# Hashing methods, such as LSH (Locality-Sensitive Hashing);
# Product quantization algorithms, such as IVFFlat;
# Graph-based algorithms, such as HNSW, SSG, and NSG.

IVFFlat and HNSW are the most popular of these algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat requires much less memory and disk space than HNSW [indexing 1M vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors].
IVFFlat also supports both online and offline training. I'm now trying to introduce IVFFlat into Lucene core in my personal branch [https://github.com/irvingzhang/lucene-solr/tree/jira/LUCENE-9136].

> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Xin-Chun Zhang
> Priority: Major
>
> Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data are embedded into high-dimensional vectors, vector retrieval (VR) methods can be applied to search for relevant items.
> With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open-source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all
[GitHub] [lucene-solr] Sachpat opened a new pull request #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7
Sachpat opened a new pull request #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7 URL: https://github.com/apache/lucene-solr/pull/1177 # Description This is the fix backported from 8.3 to 7.7 as implemented at https://github.com/apache/lucene-solr/commit/2a1d5eea42d2bb372245480dd2961baf6fa06469 as per the discussion at https://issues.apache.org/jira/browse/SOLR-13779?focusedCommentId=17016124&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17016124 # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [x] I have run `ant precommit` and the appropriate test suite. - [x] I have added tests for my changes. - [x] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] Sachpat commented on issue #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7
Sachpat commented on issue #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7 URL: https://github.com/apache/lucene-solr/pull/1177#issuecomment-575149368 @dweiss Please take a look at this PR.
[jira] [Commented] (SOLR-13779) Use the safe fork of simple-xml for clustering contrib
[ https://issues.apache.org/jira/browse/SOLR-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016918#comment-17016918 ] Sachin Pattan commented on SOLR-13779: -- [~dweiss] I created the PR which is available at [https://github.com/apache/lucene-solr/pull/1177] . Also, I had created a PR for SOLR-13971 in 7.7x https://issues.apache.org/jira/browse/SOLR-13971?focusedCommentId=17014143&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17014143 . So maybe it makes sense to create a release for 7.7x which includes both the fixes. > Use the safe fork of simple-xml for clustering contrib > -- > > Key: SOLR-13779 > URL: https://issues.apache.org/jira/browse/SOLR-13779 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 8.3 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gerlowskija merged pull request #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes
gerlowskija merged pull request #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes URL: https://github.com/apache/lucene-solr/pull/1163
[jira] [Commented] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016921#comment-17016921 ] ASF subversion and git services commented on SOLR-14186: Commit 424ace6f5d729a01ed0a150fb126c9ca204e5b66 in lucene-solr's branch refs/heads/master from Jason Gerlowski [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=424ace6 ] SOLR-14186: Enforce CRLF in Windows files with .gitattributes (#1163)

> Ensure Windows files retain CRLF endings
>
> Key: SOLR-14186
> URL: https://issues.apache.org/jira/browse/SOLR-14186
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: scripts and tools
> Affects Versions: master (9.0), 8.4
> Reporter: Jason Gerlowski
> Priority: Minor
> Time Spent: 2h
> Remaining Estimate: 0h
>
> We've had several recent instances where our Windows files (solr.cmd, solr.in.cmd) end up getting their Windows-specific line endings stripped out. This causes chunks of those scripts to fail when run on Windows.
> e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and the problem recurred within a week of being fixed.
> Generally, contributors/committers can prevent this by setting their {{core.autocrlf}} git setting to {{input}}. But we should also put repository-wide settings in place exempting certain files from line-ending conversion entirely.
> This issue proposes adding a .gitattributes setting to special-case OS-specific files (bash scripts, Windows batch files, etc.). This will prevent solr.cmd's line endings from being changed by committers who forget to configure the setting on a new machine, etc.
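For illustration, the kind of .gitattributes entries the issue describes could look like this (the exact patterns are an assumption for the example, not the committed file):

{code}
# Windows scripts keep CRLF endings regardless of a committer's core.autocrlf
*.cmd text eol=crlf
*.bat text eol=crlf
# shell scripts keep LF
*.sh text eol=lf
{code}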
[jira] [Commented] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016941#comment-17016941 ] Jason Gerlowski commented on LUCENE-9077: - Congrats and thanks for all your hard work in getting the gradle build to master [~dweiss]! One question: When I run gradle on master and then later switch to a branch that doesn't have gradle (e.g. branch_8x), git sees gradle's build artifacts and files as "Untracked files": {code} ➜ lucene-solr git:(branch_8x) ✗ git status On branch branch_8x Your branch is up to date with 'origin/branch_8x'. Untracked files: (use "git add ..." to include in what will be committed) .gradle/ buildSrc/ gradle.properties {code} Is it reasonable to add those files to .gitignore on branch_8x? I'm willing to file the sub-task and do it myself, just wanted to make sure there's not a reason you've avoided it so far. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept with sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) 
(LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equivalent of 'documentation-lint" to precommit. > * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. > * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. > * if you diff solr packaged distribution against ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * [EOE] identify and port various "regenerate" tasks from ant builds > (javacc, precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integration layers that should be added (I use IntelliJ and it >
[jira] [Created] (LUCENE-9144) Error message on 1D BKDWriter is wrong when adding too many points
Ignacio Vera created LUCENE-9144: Summary: Error message on 1D BKDWriter is wrong when adding too many points Key: LUCENE-9144 URL: https://issues.apache.org/jira/browse/LUCENE-9144 Project: Lucene - Core Issue Type: Bug Reporter: Ignacio Vera The error message for the 1D BKD writer when adding too many points is wrong because: 1) It uses pointCount (which is always 0 at that point) instead of valueCount 2) It concatenates the numbers as strings instead of adding them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
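The second point is the classic Java string-concatenation pitfall. A minimal, self-contained sketch of the bug pattern; the variable names and values below are illustrative, not the exact OneDimensionBKDWriter source:
{code}
// Hypothetical counters standing in for the BKD writer's state:
long pointCount = 0;   // always 0 at the point where the message is built
long valueCount = 7;
long leafCount = 5;

// Bug: after a String, '+' concatenates, so this yields "we just hit 75 values",
// and it also reports the wrong counter (pointCount is still 0 here):
String wrong = "we just hit " + valueCount + leafCount + " values";

// Fix: report the right counters and parenthesize so '+' means addition:
String right = "we just hit " + (valueCount + leafCount) + " values";
{code}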
[jira] [Comment Edited] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016941#comment-17016941 ] Jason Gerlowski edited comment on LUCENE-9077 at 1/16/20 1:50 PM: -- Congrats and thanks for all your hard work in getting the gradle build to master [~dweiss]! One question: When I run gradle on master and then later switch to a branch that doesn't have gradle (e.g. branch_8x), git sees gradle's build artifacts and files as "Untracked files": {code} ➜ lucene-solr git:(branch_8x) ✗ git status On branch branch_8x Your branch is up to date with 'origin/branch_8x'. Untracked files: (use "git add ..." to include in what will be committed) .gradle/ buildSrc/ gradle.properties {code} Is it reasonable to add those files to .gitignore on branch_8x? I'm willing to file the sub-task and do it myself, just wanted to make sure there's not a reason you've avoided it so far. *EDIT* Hmm, it looks like {{ant precommit}} on branch_8x fails because of these files. Maybe it's best not to hide them since they can cause other issues. was (Author: gerlowskija): Congrats and thanks for all your hard work in getting the gradle build to master [~dweiss]! One question: When I run gradle on master and then later switch to a branch that doesn't have gradle (e.g. branch_8x), git sees gradle's build artifacts and files as "Untracked files": {code} ➜ lucene-solr git:(branch_8x) ✗ git status On branch branch_8x Your branch is up to date with 'origin/branch_8x'. Untracked files: (use "git add ..." to include in what will be committed) .gradle/ buildSrc/ gradle.properties {code} Is it reasonable to add those files to .gitignore on branch_8x? I'm willing to file the sub-task and do it myself, just wanted to make sure there's not a reason you've avoided it so far. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept with sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). 
> * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) (LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equival
[GitHub] [lucene-solr] iverase opened a new pull request #1178: LUCENE-9144: Fix error message on OneDimensionBKDWriter
iverase opened a new pull request #1178: LUCENE-9144: Fix error message on OneDimensionBKDWriter URL: https://github.com/apache/lucene-solr/pull/1178 See https://issues.apache.org/jira/browse/LUCENE-9144 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9144) Error message on OneDimensionBKDWriter is wrong when adding too many points
[ https://issues.apache.org/jira/browse/LUCENE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-9144: - Summary: Error message on OneDimensionBKDWriter is wrong when adding too many points (was: Error message on 1D BKDWriter is wrong when adding too many points) > Error message on OneDimensionBKDWriter is wrong when adding too many points > --- > > Key: LUCENE-9144 > URL: https://issues.apache.org/jira/browse/LUCENE-9144 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > The error message for the 1D BKD writer when adding too many points is wrong > because: > 1) It uses pointCount (which is always 0 at that point) instead of valueCount > 2) It concatenates the numbers as strings instead of adding them. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016966#comment-17016966 ] ASF subversion and git services commented on SOLR-14186: Commit 8c2e800cae6f8f53b8189b26cc443aa070025a2c in lucene-solr's branch refs/heads/branch_8x from Jason Gerlowski [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8c2e800 ] SOLR-14186: Introduce gitattributes to manage EOL > Ensure Windows files retain CRLF endings > > > Key: SOLR-14186 > URL: https://issues.apache.org/jira/browse/SOLR-14186 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0), 8.4 >Reporter: Jason Gerlowski >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > We've had several recent instances where our Windows files (solr.cmd, > solr.in.cmd) end up getting their Windows-specific line-endings stripped out. > This causes chunks of those scripts to fail when run on Windows. > e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and > the problem was fixed and recurred again within a week. > Generally, contributors/committers can prevent this by setting their > {{core.autocrlf}} git setting to {{input}}. But we should also put > repository-wide settings in place exempting certain files from line-ending > conversion entirely. > This issue proposes adding a .gitattributes setting to special-case > OS-specific files (bash scripts, Windows batch files, etc.) This will > prevent solr.cmd's line endings from being changed by committers who forget > to configure the setting on a new machine, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016973#comment-17016973 ] ASF subversion and git services commented on SOLR-14130: Commit 99ec7dcd261ccf4d3f95d5af4ef0a18b91bee3f4 in lucene-solr's branch refs/heads/branch_8x from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=99ec7dc ] SOLR-14130: Add parsing instructions for different types of query records > Add postlogs command line tool for indexing Solr logs > - > > Key: SOLR-14130 > URL: https://issues.apache.org/jira/browse/SOLR-14130 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 > PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at > 8.46.51 AM.png > > > This ticket adds a simple command line tool for posting Solr logs to a solr > index. The tool works with the out of the box Solr log format. Still a work > in progress but currently indexes: > * queries > * updates > * commits > * new searchers > * errors - including stack traces > Attached are some sample visualizations using Solr Streaming Expressions and > Math Expressions after the data has been loaded. The visualizations show: > time series, scatter plots, histograms and quantile plots, but really this is > just scratching the surface of the visualizations that can be done with the > Solr logs. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
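To give a feel for how such a tool would be driven: the invocation shape is a base URL for the collection that receives the log records plus a directory of logs to scan. This is a sketch only; the URL and path below are hypothetical placeholders, and the exact arguments of this work-in-progress tool may differ:
{code}
# Hypothetical usage sketch (WIP tool; exact syntax may differ):
# first argument:  base URL of the collection that receives the log records
# second argument: root directory that is scanned for Solr log files
bin/postlogs http://localhost:8983/solr/logs /var/solr/logs
{code}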
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and no plan for supporting Java interface [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories, # Tree-base algorithms, such as KD-tree; # Hashing methods, such as LSH (Local Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-base algorithms, such as HNSW, SSG, NSG; IVFFlat and HNSW are the most popular ones among all the algorithms. Recently, implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat has smaller index size but requires k-means clustering, while HNSW is faster in query but require extra storage for graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. Each of them has its merits and demerits. Since HNSW is now under development, it may be better to provide IVFFlat for an alternative choice. I will soon commit my personal implementations. was: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and no plan for supporting Java interface [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories, # Tree-base algorithms, such as KD-tree; # Hashing methods, such as LSH (Local Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-base algorithms, such as HNSW, SSG, NSG; IVFFlat and HNSW are the most popular ones among all the algorithms. Recently, implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. 
IVFFlat requires much less memory and disks when compared with HNSW [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. And IVFFlat supports both online and offline training. I'm now trying to introduce the IVFFlat to Lucene core in my person branch [[https://github.com/irvingzhang/lucene-solr/tree/jira/LUCENE-9136]], in very early stage. > Introduce IVFFlat for ANN similarity search > --- > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Summary: Introduce IVFFlat to Lucene for ANN similarity search (was: Introduce IVFFlat for ANN similarity search) > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and no plan for supporting Java interface > [[https://github.com/facebookresearch/faiss/issues/105]]. > The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-base algorithms, such as KD-tree; > # Hashing methods, such as LSH (Local Sensitive Hashing); > # Product quantization algorithms, such as IVFFlat; > # Graph-base algorithms, such as HNSW, SSG, NSG; > IVFFlat and HNSW are the most popular ones among all the algorithms. > Recently, implementation of ANN algorithms for Lucene, such as HNSW > (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. > IVFFlat has smaller index size but requires k-means clustering, while HNSW is > faster in query but require extra storage for graphs [indexing 1M > vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. > Each of them has its merits and demerits. Since HNSW is now under > development, it may be better to provide IVFFlat for an alternative choice. > I will soon commit my personal implementations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
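To make the size/speed trade-off above concrete: IVFFlat k-means-clusters the vectors at build time, stores each cluster's vectors uncompressed ("flat"), and at query time scans only the nProbe clusters whose centroids sit closest to the query. A minimal, self-contained sketch of that query flow in plain Java, assuming Euclidean distance; all names here are illustrative, not the API of the linked branch:
{code}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class IvfFlatSketch {
  // Squared Euclidean distance; monotonic with true distance, so fine for ranking.
  static float squaredDistance(float[] a, float[] b) {
    float d = 0;
    for (int i = 0; i < a.length; i++) {
      float t = a[i] - b[i];
      d += t * t;
    }
    return d;
  }

  /**
   * centroids[c] is a k-means centroid; clusters.get(c) holds the raw ("flat")
   * vectors assigned to it. Only the nProbe closest clusters are scanned.
   */
  static float[] nearest(float[] query, float[][] centroids,
                         List<List<float[]>> clusters, int nProbe) {
    Integer[] order = new Integer[centroids.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    // Rank clusters by centroid distance to the query.
    Arrays.sort(order, Comparator.comparingDouble(c -> squaredDistance(query, centroids[c])));

    float best = Float.MAX_VALUE;
    float[] bestVec = null;
    for (int p = 0; p < Math.min(nProbe, order.length); p++) {
      for (float[] v : clusters.get(order[p])) {  // exhaustive scan inside the cluster
        float d = squaredDistance(query, v);
        if (d < best) {
          best = d;
          bestVec = v;
        }
      }
    }
    return bestVec;
  }
}
{code}
With nProbe equal to the number of clusters this degenerates to exact brute-force search; smaller values trade recall for speed, which is where IVFFlat saves memory relative to graph-based structures like HNSW.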
[jira] [Commented] (SOLR-14070) Deprecate CloudSolrClient's ZKHost constructor
[ https://issues.apache.org/jira/browse/SOLR-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016995#comment-17016995 ] Zsolt Gyulavari commented on SOLR-14070: ZK could have been firewalled if clients didn't talk to it directly. > Deprecate CloudSolrClient's ZKHost constructor > -- > > Key: SOLR-14070 > URL: https://issues.apache.org/jira/browse/SOLR-14070 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya >Priority: Major > > CloudSolrClient can be used by pointing it to a ZK cluster. This is > inherently insecure. > CSC can already be used in all the same ways by pointing it to a Solr cluster. > Proposing to add a deprecation notice to the following constructor to the > CloudSolrClient#Builder: > {code} > public Builder(List<String> zkHosts, Optional<String> zkChroot) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14070) Deprecate CloudSolrClient's ZKHost constructor
[ https://issues.apache.org/jira/browse/SOLR-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017006#comment-17017006 ] Jan Høydahl commented on SOLR-14070: bq. ZK could have been firewalled if clients didn't talk to it directly. CloudSolrClient already supports bootstrapping from Solr URLs, which would be the recommended way. It is also easier to implement in Solr clients for other languages such as PHP, C# etc without worrying about ZK. So you are free to firewall your ZK today if you wish :) I'm +0 on the idea. We could deprecate and then not remove it until v10.0, just to force users into reconsidering their choice. We could also add more JavaDocs to the deprecated constructor, reminding people to limit access to their ZK as much as possible. > Deprecate CloudSolrClient's ZKHost constructor > -- > > Key: SOLR-14070 > URL: https://issues.apache.org/jira/browse/SOLR-14070 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya >Priority: Major > > CloudSolrClient can be used by pointing it to a ZK cluster. This is > inherently insecure. > CSC can already be used in all the same ways by pointing it to a Solr cluster. > Proposing to add a deprecation notice to the following constructor to the > CloudSolrClient#Builder: > {code} > public Builder(List<String> zkHosts, Optional<String> zkChroot) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
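For contrast, the two bootstrap routes look like this side by side; a minimal sketch assuming SolrJ 8.x, with placeholder hosts:
{code}
import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

class CloudClientBootstrap {
  static CloudSolrClient viaSolrUrls() {
    // Recommended: bootstrap from Solr URLs (placeholder hosts);
    // ZK can stay behind a firewall.
    return new CloudSolrClient.Builder(
            Arrays.asList("http://solr1:8983/solr", "http://solr2:8983/solr"))
        .build();
  }

  static CloudSolrClient viaZkHosts() {
    // The constructor proposed for deprecation: the client talks to ZK directly.
    return new CloudSolrClient.Builder(
            Arrays.asList("zk1:2181", "zk2:2181"), Optional.empty())
        .build();
  }
}
{code}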
[GitHub] [lucene-solr] iverase merged pull request #1178: LUCENE-9144: Fix error message on OneDimensionBKDWriter
iverase merged pull request #1178: LUCENE-9144: Fix error message on OneDimensionBKDWriter URL: https://github.com/apache/lucene-solr/pull/1178 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9144) Error message on OneDimensionBKDWriter is wrong when adding too many points
[ https://issues.apache.org/jira/browse/LUCENE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017018#comment-17017018 ] ASF subversion and git services commented on LUCENE-9144: - Commit eb13d5bc8b3b0497ce2aca3d99e37884dc54599a in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eb13d5b ] LUCENE-9144: Fix error message on OneDimensionBKDWriter when too many points are added to the writer. (#1178) > Error message on OneDimensionBKDWriter is wrong when adding too many points > --- > > Key: LUCENE-9144 > URL: https://issues.apache.org/jira/browse/LUCENE-9144 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > The error message for the 1D BKD writer when adding too many points is wrong > because: > 1) It uses pointCount (which is always 0 at that point) instead of valueCount > 2) It concatenates the numbers as strings instead of adding them. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9144) Error message on OneDimensionBKDWriter is wrong when adding too many points
[ https://issues.apache.org/jira/browse/LUCENE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017032#comment-17017032 ] ASF subversion and git services commented on LUCENE-9144: - Commit ced06d7086a870a1dbab6af841132daf1f4c4c68 in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ced06d7 ] LUCENE-9144: Fix error message on OneDimensionBKDWriter when too many points are added to the writer. (#1178) > Error message on OneDimensionBKDWriter is wrong when adding too many points > --- > > Key: LUCENE-9144 > URL: https://issues.apache.org/jira/browse/LUCENE-9144 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > The error message for the 1D BKD writer when adding too many points is wrong > because: > 1) It uses pointCount (which is always 0 at that point) instead of valueCount > 2) It concatenates the numbers as strings instead of adding them. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9144) Error message on OneDimensionBKDWriter is wrong when adding too many points
[ https://issues.apache.org/jira/browse/LUCENE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9144. -- Fix Version/s: 8.5 Assignee: Ignacio Vera Resolution: Fixed > Error message on OneDimensionBKDWriter is wrong when adding too many points > --- > > Key: LUCENE-9144 > URL: https://issues.apache.org/jira/browse/LUCENE-9144 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: 8.5 > > Time Spent: 20m > Remaining Estimate: 0h > > The error message for the 1D BKD writer when adding too many points is wrong > because: > 1) It uses pointCount (which is always 0 at that point) instead of valueCount > 2) It concatenates the numbers as strings instead of adding them. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017089#comment-17017089 ] Bruno Roustant commented on LUCENE-9125: Here it is:
{code}
Task                          QPS trunk  StdDev    QPS patch  StdDev    Pct diff
HighIntervalsOrdered             463.57  (13.2%)      443.74  (19.6%)     -4.3% ( -32% - 32%)
Respell                          382.45  (14.7%)      374.88  (21.3%)     -2.0% ( -33% - 39%)
OrHighLow                       1746.37  (6.8%)      1737.44  (7.0%)      -0.5% ( -13% - 14%)
AndHighLow                      4208.34  (6.1%)      4186.85  (5.8%)      -0.5% ( -11% - 12%)
HighTerm                        5697.99  (7.5%)      5673.66  (5.1%)      -0.4% ( -12% - 13%)
BrowseMonthTaxoFacets           4679.40  (3.7%)      4664.60  (2.6%)      -0.3% ( -6% - 6%)
Prefix3                          442.09  (17.3%)      441.77  (16.6%)     -0.1% ( -28% - 40%)
BrowseDateTaxoFacets            4104.50  (3.4%)      4102.05  (2.8%)      -0.1% ( -6% - 6%)
OrHighMed                        681.54  (11.8%)      681.70  (10.6%)      0.0% ( -20% - 25%)
AndHighHigh                      978.85  (8.3%)       979.47  (9.9%)       0.1% ( -16% - 19%)
BrowseDayOfYearTaxoFacets       3615.56  (2.8%)      3620.94  (2.4%)       0.1% ( -4% - 5%)
MedTerm                         5964.33  (5.7%)      5980.59  (5.8%)       0.3% ( -10% - 12%)
LowTerm                         6555.56  (4.8%)      6576.49  (5.3%)       0.3% ( -9% - 10%)
Fuzzy2                            73.24  (16.4%)       73.55  (16.1%)      0.4% ( -27% - 39%)
Fuzzy1                           887.86  (5.3%)       892.14  (2.7%)       0.5% ( -7% - 8%)
HighPhrase                       901.57  (5.7%)       905.94  (6.6%)       0.5% ( -11% - 13%)
OrHighHigh                       741.70  (11.5%)      745.44  (8.4%)       0.5% ( -17% - 23%)
BrowseMonthSSDVFacets           3462.54  (4.2%)      3480.43  (3.0%)       0.5% ( -6% - 8%)
HighSloppyPhrase                 617.51  (6.9%)       620.74  (7.8%)       0.5% ( -13% - 16%)
PKLookup                         275.55  (5.2%)       277.01  (5.0%)       0.5% ( -9% - 11%)
MedSloppyPhrase                 1843.18  (4.7%)      1853.23  (3.8%)       0.5% ( -7% - 9%)
LowSloppyPhrase                 2085.07  (4.3%)      2098.25  (3.9%)       0.6% ( -7% - 9%)
BrowseDayOfYearSSDVFacets       2985.60  (2.5%)      3009.10  (2.6%)       0.8% ( -4% - 6%)
AndHighMed                      1712.96  (5.8%)      1729.47  (4.5%)       1.0% ( -8% - 12%)
LowSpanNear                     2006.25  (6.2%)      2029.83  (6.0%)       1.2% ( -10% - 14%)
MedSpanNear                      814.10  (12.3%)      823.97  (10.1%)      1.2% ( -18% - 26%)
HighSpanNear                     593.47  (10.3%)      600.77  (10.6%)      1.2% ( -17% - 24%)
HighTermDayOfYearSort           1035.41  (7.8%)      1050.76  (6.5%)       1.5% ( -11% - 17%)
Wildcard                         772.44  (10.7%)      791.42  (12.7%)      2.5% ( -18% - 28%)
MedPhrase                        806.70  (8.7%)       827.27  (8.1%)       2.5% ( -13% - 21%)
LowPhrase                        805.91  (7.9%)       831.26  (5.3%)       3.1% ( -9% - 17%)
IntNRQ                          1898.15  (8.1%)      1967.24  (9.8%)       3.6% ( -13% - 23%)
HighTermMonthSort               3150.77  (12.1%)     3300.42  (13.5%)      4.7% ( -18% - 34%)
{code}
> Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
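The optimization itself is easy to picture: a state's outgoing transitions are disjoint [min, max] label intervals sorted by min, so the label lookup can binary-search instead of scanning linearly. A minimal sketch of that idea, assuming parallel arrays rather than Lucene's actual packed transition encoding; the names are illustrative, not the patch's code:
{code}
// Hedged sketch, not the actual Lucene source. For one state, transitions are
// disjoint label ranges sorted by min: (min[i], max[i]) leads to state dest[i].
static int step(int[] min, int[] max, int[] dest, int label) {
  int lo = 0, hi = min.length - 1;
  while (lo <= hi) {
    int mid = (lo + hi) >>> 1;
    if (max[mid] < label) {
      lo = mid + 1;            // range lies entirely below the label
    } else if (min[mid] > label) {
      hi = mid - 1;            // range lies entirely above the label
    } else {
      return dest[mid];        // min[mid] <= label <= max[mid]: follow it
    }
  }
  return -1;                   // no transition accepts this label
}
{code}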
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017284#comment-17017284 ] ASF subversion and git services commented on SOLR-14130: Commit 35d8e3de6d5931bfd6cba3221cfd0dca7f97c1a1 in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=35d8e3d ] SOLR-14130: Continue to improve log parsing logic > Add postlogs command line tool for indexing Solr logs > - > > Key: SOLR-14130 > URL: https://issues.apache.org/jira/browse/SOLR-14130 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 > PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at > 8.46.51 AM.png > > > This ticket adds a simple command line tool for posting Solr logs to a solr > index. The tool works with the out of the box Solr log format. Still a work > in progress but currently indexes: > * queries > * updates > * commits > * new searchers > * errors - including stack traces > Attached are some sample visualizations using Solr Streaming Expressions and > Math Expressions after the data has been loaded. The visualizations show: > time series, scatter plots, histograms and quantile plots, but really this is > just scratching the surface of the visualizations that can be done with the > Solr logs. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9145) Address warnings found by static analysis
Mike Drob created LUCENE-9145: - Summary: Address warnings found by static analysis Key: LUCENE-9145 URL: https://issues.apache.org/jira/browse/LUCENE-9145 Project: Lucene - Core Issue Type: Sub-task Reporter: Mike Drob Assignee: Mike Drob -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017288#comment-17017288 ] ASF subversion and git services commented on SOLR-14130: Commit f48b5f9324532169ddf41e4cb52b5f628b5bc31b in lucene-solr's branch refs/heads/branch_8x from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f48b5f9 ] SOLR-14130: Continue to improve log parsing logic > Add postlogs command line tool for indexing Solr logs > - > > Key: SOLR-14130 > URL: https://issues.apache.org/jira/browse/SOLR-14130 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 > PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at > 8.46.51 AM.png > > > This ticket adds a simple command line tool for posting Solr logs to a solr > index. The tool works with the out of the box Solr log format. Still a work > in progress but currently indexes: > * queries > * updates > * commits > * new searchers > * errors - including stack traces > Attached are some sample visualizations using Solr Streaming Expressions and > Math Expressions after the data has been loaded. The visualizations show: > time series, scatter plots, histograms and quantile plots, but really this is > just scratching the surface of the visualizations that can be done with the > Solr logs. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9145) Address warnings found by static analysis
[ https://issues.apache.org/jira/browse/LUCENE-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017292#comment-17017292 ] Kevin Risden commented on LUCENE-9145: -- [~mdrob] I'm all for this and probably easier with just Gradle. I think there are a few older jiras related to this as well. Might help to link them? Pmd, javac warnings, etc. I can dig them up too if it helps. > Address warnings found by static analysis > - > > Key: LUCENE-9145 > URL: https://issues.apache.org/jira/browse/LUCENE-9145 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017300#comment-17017300 ] Mike Drob commented on LUCENE-9077: --- Yes, we should add all of those to .gitignore and then figure out how to make ant precommit stop complaining as well. They're just like any other build files. Specifically, you should be able to add just {{.gradle}} and {{gradle.properties}}, no need to add buildSrc since the only thing inside of it is another .gradle dir. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept with sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) (LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equivalent of 'documentation-lint" to precommit. > * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. 
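Concretely, the branch_8x addition being suggested would amount to a two-line .gitignore sketch along these lines (illustrative, not a committed patch):
{code}
# Gradle leftovers from switching between master and branch_8x (sketch only):
.gradle
gradle.properties
{code}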
> * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. > * if you diff solr packaged distribution against ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * [EOE] identify and port various "regenerate" tasks from ant builds > (javacc, precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integration layers that should be added (I use IntelliJ and it > imports the project out of the box, without the need for any special tuning). > * Add Solr packaging for docs/* (see TODO in packaging/build.gradle; > currently XSLT...) > * I didn't bother adding Solr dist/test-framework to packaging (who'd use it > from a binary distribution? > > *{color:#ff}Note:{color}* this builds on the work done by Mark Miller and > Cao Mạnh Đạt but also applies lessons learned from t
[jira] [Commented] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017302#comment-17017302 ] Mike Drob commented on LUCENE-9077: --- Another clean up question is when we would feel comfortable removing the maven shadow build from master. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept with sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) (LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equivalent of 'documentation-lint" to precommit. > * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. > * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. 
> * if you diff solr packaged distribution against ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * [EOE] identify and port various "regenerate" tasks from ant builds > (javacc, precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integration layers that should be added (I use IntelliJ and it > imports the project out of the box, without the need for any special tuning). > * Add Solr packaging for docs/* (see TODO in packaging/build.gradle; > currently XSLT...) > * I didn't bother adding Solr dist/test-framework to packaging (who'd use it > from a binary distribution? > > *{color:#ff}Note:{color}* this builds on the work done by Mark Miller and > Cao Mạnh Đạt but also applies lessons learned from those two efforts: > * *Do not try to do too many things at once*. If we deviate too far from > master, the branch will be hard to merge. > * *Do everything in baby-steps* and add small, independent build fragments > rep
[jira] [Commented] (LUCENE-9143) Add more static analysis and clean up resulting warnings/errors
[ https://issues.apache.org/jira/browse/LUCENE-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017303#comment-17017303 ] Kevin Risden commented on LUCENE-9143: -- [~mdrob] I'm all for this and probably easier with just Gradle. I think there are a few older jiras related to this as well. Might help to link them? Pmd, javac warnings, etc. I can dig them up too if it helps. (previously commented on the subtask by mistake from my phone and now also see the PR :) ) > Add more static analysis and clean up resulting warnings/errors > --- > > Key: LUCENE-9143 > URL: https://issues.apache.org/jira/browse/LUCENE-9143 > Project: Lucene - Core > Issue Type: Bug >Reporter: Mike Drob >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Part of the discussion with Mark Miller was the need for better bug finding - > especially in tricky areas like concurrency. One of the ways we can do this > is with added static analysis and increased tooling. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (LUCENE-9145) Address warnings found by static analysis
[ https://issues.apache.org/jira/browse/LUCENE-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Risden updated LUCENE-9145: - Comment: was deleted (was: [~mdrob] I'm all for this and probably easier with just Gradle. I think there are a few older jiras related to this as well. Might help to link them? Pmd, javac warnings, etc. I can dig them up too if it helps.) > Address warnings found by static analysis > - > > Key: LUCENE-9145 > URL: https://issues.apache.org/jira/browse/LUCENE-9145 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9143) Add more static analysis and clean up resulting warnings/errors
[ https://issues.apache.org/jira/browse/LUCENE-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017313#comment-17017313 ] Erick Erickson commented on LUCENE-9143: I've been meaning to get around to these forever. I haven't done any work on them; I'm mostly here to close them if this JIRA takes care of them. +1 to only working with the Gradle build, and by implication master, especially as sometime in the not-too-distant future we won't be supporting Java 8 any more. > Add more static analysis and clean up resulting warnings/errors > --- > > Key: LUCENE-9143 > URL: https://issues.apache.org/jira/browse/LUCENE-9143 > Project: Lucene - Core > Issue Type: Bug >Reporter: Mike Drob >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Part of the discussion with Mark Miller was the need for better bug finding - > especially in tricky areas like concurrency. One of the ways we can do this > is with added static analysis and increased tooling. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017314#comment-17017314 ] ASF subversion and git services commented on SOLR-14186: Commit 7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d in lucene-solr's branch refs/heads/branch_8_4 from Jason Gerlowski [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7d3ac7c ] SOLR-14186: Introduce gitattributes to manage EOL > Ensure Windows files retain CRLF endings > > > Key: SOLR-14186 > URL: https://issues.apache.org/jira/browse/SOLR-14186 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0), 8.4 >Reporter: Jason Gerlowski >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > We've had several recent instances where our Windows files (solr.cmd, > solr.in.cmd) end up getting their Windows-specific line-endings stripped out. > This causes chunks of those scripts to fail when run on Windows. > e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and > the problem was fixed and recurred again within a week. > Generally, contributors/committers can prevent this by setting their > {{core.autocrlf}} git setting to {{input}}. But we should also put > repository-wide settings in place exempting certain files from line-ending > conversion entirely. > This issue proposes adding a .gitattributes setting to special-case > OS-specific files (bash scripts, Windows batch files, etc.) This will > prevent solr.cmd's line endings from being changed by committers who forget > to configure the setting on a new machine, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-14186: -- Assignee: Jason Gerlowski > Ensure Windows files retain CRLF endings > > > Key: SOLR-14186 > URL: https://issues.apache.org/jira/browse/SOLR-14186 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0), 8.4 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > We've had several recent instances where our Windows files (solr.cmd, > solr.in.cmd) end up getting their Windows-specific line-endings stripped out. > This causes chunks of those scripts to fail when run on Windows. > e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and > the problem was fixed and recurred again within a week. > Generally, contributors/committers can prevent this by setting their > {{core.autocrlf}} git setting to {{input}}. But we should also put > repository-wide settings in place exempting certain files from line-ending > conversion entirely. > This issue proposes adding a .gitattributes setting to special-case > OS-specific files (bash scripts, Windows batch files, etc.) This will > prevent solr.cmd's line endings from being changed by committers who forget > to configure the setting on a new machine, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017317#comment-17017317 ] Jason Gerlowski commented on SOLR-14186: I've committed a gitattributes file to all of the branches that might see subsequent releases: master, branch_8x and branch_8_4. That way anything released from these branches should avoid accidentally releasing a broken Windows script, etc. > Ensure Windows files retain CRLF endings > > > Key: SOLR-14186 > URL: https://issues.apache.org/jira/browse/SOLR-14186 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0), 8.4 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > We've had several recent instances where our Windows files (solr.cmd, > solr.in.cmd) end up getting their Windows-specific line-endings stripped out. > This causes chunks of those scripts to fail when run on Windows. > e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and > the problem was fixed and recurred again within a week. > Generally, contributors/committers can prevent this by setting their > {{core.autocrlf}} git setting to {{input}}. But we should also put > repository-wide settings in place exempting certain files from line-ending > conversion entirely. > This issue proposes adding a .gitattributes setting to special-case > OS-specific files (bash scripts, Windows batch files, etc.) This will > prevent solr.cmd's line endings from being changed by committers who forget > to configure the setting on a new machine, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14186) Ensure Windows files retain CRLF endings
[ https://issues.apache.org/jira/browse/SOLR-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-14186. Fix Version/s: 8.5 master (9.0) 8.4.2 Resolution: Fixed > Ensure Windows files retain CRLF endings > > > Key: SOLR-14186 > URL: https://issues.apache.org/jira/browse/SOLR-14186 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0), 8.4 >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Fix For: 8.4.2, master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > We've had several recent instances where our Windows files (solr.cmd, > solr.in.cmd) end up getting their Windows-specific line-endings stripped out. > This causes chunks of those scripts to fail when run on Windows. > e.g. SOLR-13977 fixed an issue where {{bin\solr.cmd create -c}} failed, and > the problem was fixed and recurred again within a week. > Generally, contributors/committers can prevent this by setting their > {{core.autocrlf}} git setting to {{input}}. But we should also put > repository-wide settings in place exempting certain files from line-ending > conversion entirely. > This issue proposes adding a .gitattributes setting to special-case > OS-specific files (bash scripts, Windows batch files, etc.) This will > prevent solr.cmd's line endings from being changed by committers who forget > to configure the setting on a new machine, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] asfgit merged pull request #1173: LUCENE-8369: Remove obsolete spatial module
asfgit merged pull request #1173: LUCENE-8369: Remove obsolete spatial module URL: https://github.com/apache/lucene-solr/pull/1173 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017319#comment-17017319 ] ASF subversion and git services commented on LUCENE-8369: - Commit 78655239c58a1ed72d6e015dd05a0b355c936999 in lucene-solr's branch refs/heads/master from Nicholas Knize [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7865523 ] LUCENE-8369: Remove obsolete spatial module > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > Time Spent: 10m > Remaining Estimate: 0h > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017324#comment-17017324 ] Dawid Weiss commented on LUCENE-9077: - There are a number of unanswered questions, actually. For now, I'd consider the gradle build structure *not* compatible with ant (whether on master or on branch_8x). So I wouldn't recommend doing live branch switches without a full clean. This means: when you switch branches (or build systems), run: {code} git clean -xfd . {code} Ignoring temporary files is one thing but ant happily sucks in jar files for sha checks from build/ folders etc. I myself do the above or just keep separate branches for 8x and master... If you're desperate then feel free to work on this and file a pull request, please - I just don't consider it a priority for now. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept with sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) (LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. 
Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equivalent of 'documentation-lint" to precommit. > * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. > * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. > * if you diff solr packaged distribution against ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * [EOE] identify and port various "regenerate" tasks from ant builds > (javacc, precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integration layers that should be added (I use IntelliJ and it > imports the project out of the box, without the need for any special tuning). > * Add Sol
[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017333#comment-17017333 ] ASF subversion and git services commented on LUCENE-8369: - Commit c0c775799c1c1f69d146336016bcd4c6ffdd2ce8 in lucene-solr's branch refs/heads/branch_8x from Nicholas Knize [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c0c7757 ] LUCENE-8369: Remove obsolete spatial module > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > Time Spent: 20m > Remaining Estimate: 0h > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Knize updated LUCENE-8369: --- Resolution: Resolved Status: Resolved (was: Patch Available) > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > Time Spent: 20m > Remaining Estimate: 0h > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize commented on issue #1173: LUCENE-8369: Remove obsolete spatial module
nknize commented on issue #1173: LUCENE-8369: Remove obsolete spatial module URL: https://github.com/apache/lucene-solr/pull/1173#issuecomment-575264274 Thx @dsmiley! Merged and backported to 8.x This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14192) Race condition between SchemaManager and ZkIndexSchemaReader
Andrzej Bialecki created SOLR-14192: --- Summary: Race condition between SchemaManager and ZkIndexSchemaReader Key: SOLR-14192 URL: https://issues.apache.org/jira/browse/SOLR-14192 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 8.4 Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 8.5 Spin-off from SOLR-14128 and SOLR-13368. In SolrCloud when a SolrCore is created and it uses managed schema then its {{ManagedIndexSchemaFactory}} performs an automatic upgrade of the initial {{schema.xml}} to {{managed-schema}}. This includes removing the original {{schema.xml}} file. SOLR-13368 added some locking to make sure the changed resource name (i.e. {{managed-schema}}) becomes visible only when this process is complete, and that in-flight requests to /admin/schema block until this process is complete, to avoid returning inconsistent data. This locking mechanism uses simple Object monitors. However, if there's more than 1 node in the cluster the subsequent request to retrieve schema may execute on a core that still hasn't reloaded its schema ({{ZkIndexSchemaReader}} uses a ZK watcher, which may take some time to trigger), and the resource name in that stale schema still points to {{schema.xml}}, which by this time no longer exists because it was removed by {{ManagedIndexSchemaFactory}} in the first core. As I see it there are two bugs here: # there's no distributed locking when this upgrade is performed, so it's natural that there are multiple cores racing against each other to perform this upgrade. # the upgrade process removes {{schema.xml}} too early - it triggers all other cores by creating the {{managed-schema}} file, and then other cores reload from the new managed schema - but it should wait until this reload is complete on all cores because only then it's safe to delete the non-managed resource as it's no longer in use by any core. Issue 1. can be solved by adding an ephemeral znode lock so that only one core can perform the upgrade. Issue 2. can be solved by using {{ManagedIndexSchema.waitForSchemaZkVersionAgreement}} after upgrade, and deleting {{schema.xml}} only after it's done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
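A minimal sketch of the ephemeral-znode lock proposed in point 1, using the plain ZooKeeper client API; the lock path, class and method names are illustrative, not the eventual patch:
{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SchemaUpgradeLock {
  /** Returns true if this core won the race and may perform the schema upgrade. */
  public static boolean tryLock(ZooKeeper zk, String lockPath)
      throws KeeperException, InterruptedException {
    try {
      // An ephemeral node vanishes automatically if the session dies mid-upgrade,
      // so a crashed core cannot hold the lock forever.
      zk.create(lockPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true;
    } catch (KeeperException.NodeExistsException e) {
      return false; // another core is already performing the upgrade
    }
  }
}
{code}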
[GitHub] [lucene-solr] dweiss merged pull request #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7
dweiss merged pull request #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7 URL: https://github.com/apache/lucene-solr/pull/1177 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on issue #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7
dweiss commented on issue #1177: SOLR-13779: Use the safe fork of simple-xml for clustering contrib - for 7_7 URL: https://github.com/apache/lucene-solr/pull/1177#issuecomment-575268931 Thank you! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14070) Deprecate CloudSolrClient's ZKHost constructor
[ https://issues.apache.org/jira/browse/SOLR-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017358#comment-17017358 ] Zsolt Gyulavari commented on SOLR-14070: Totally agree, Jan. I just wanted to give a reason why it would be more secure to deprecate it (and use a FW). Happy to hear that's already possible, wasn't aware of it. > Deprecate CloudSolrClient's ZKHost constructor > -- > > Key: SOLR-14070 > URL: https://issues.apache.org/jira/browse/SOLR-14070 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya >Priority: Major > > CloudSolrClient can be used by pointing it to a ZK cluster. This is > inherently insecure. > CSC can already be used in all the same ways by pointing it to a Solr cluster. > Proposing to add a deprecation notice on the following CloudSolrClient#Builder > constructor: > {code} > public Builder(List<String> zkHosts, Optional<String> zkChroot) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
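For context, the already-possible alternative alluded to above is to build the client from Solr base URLs instead of ZooKeeper hosts. A sketch, assuming the 8.x Builder API and placeholder URLs:
{code:java}
import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

// point the client at Solr nodes rather than exposing the ZK ensemble
List<String> solrUrls = Arrays.asList("http://solr1:8983/solr", "http://solr2:8983/solr");
CloudSolrClient client = new CloudSolrClient.Builder(solrUrls).build();
{code}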
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017369#comment-17017369 ] Dawid Weiss commented on LUCENE-9134: - I'll be with my kids for the weekend, perhaps [~mdrob] would like to jump in on this one and try to give you a headstart, Erick? I'd really start with something super-simple and proceed to attach other tasks from there. Jflex/ javacc seem like good candidates to me and Mike has some experience and gut feeling where I think it should be going (we discussed it on another issue). > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch > > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with something in TestInjection
[ https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017407#comment-17017407 ] ASF subversion and git services commented on SOLR-14184: Commit 5f2d7c4855987670489d68884c787e4cfb377fa9 in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f2d7c4 ] SOLR-14184: Internal 'test' variable DirectUpdateHandler2.commitOnClose has been removed and replaced with TestInjection.skipIndexWriterCommitOnClose > replace DirectUpdateHandler2.commitOnClose with something in TestInjection > -- > > Key: SOLR-14184 > URL: https://issues.apache.org/jira/browse/SOLR-14184 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14184.patch, SOLR-14184.patch > > > {code:java} > public static volatile boolean commitOnClose = true; // TODO: make this a > real config option or move it to TestInjection > {code} > Lots of tests muck with this (to simulate unclean shutdown and force tlog > replay on restart) but there's no garuntee that it is reset properly. > It should be replaced by logic in {{TestInjection}} that is correctly cleaned > up by {{TestInjection.reset()}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
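The shape of the replacement named in the commit, reduced to a sketch; Solr's real TestInjection class carries many more knobs, this only illustrates the reset-to-default pattern:
{code:java}
public class TestInjection {
  // negated sense of the old DirectUpdateHandler2.commitOnClose flag
  public static volatile boolean skipIndexWriterCommitOnClose = false;

  /** Invoked by the test framework after every test, so no test can leak the flag. */
  public static void reset() {
    skipIndexWriterCommitOnClose = false;
  }
}
{code}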
[jira] [Commented] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with something in TestInjection
[ https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017428#comment-17017428 ] ASF subversion and git services commented on SOLR-14184: Commit bb48773cdc279403b8c6af82f4f52b247a1e61c1 in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bb48773 ] SOLR-14184: Internal 'test' variable DirectUpdateHandler2.commitOnClose has been removed and replaced with TestInjection.skipIndexWriterCommitOnClose (cherry picked from commit 5f2d7c4855987670489d68884c787e4cfb377fa9) > replace DirectUpdateHandler2.commitOnClose with something in TestInjection > -- > > Key: SOLR-14184 > URL: https://issues.apache.org/jira/browse/SOLR-14184 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14184.patch, SOLR-14184.patch > > > {code:java} > public static volatile boolean commitOnClose = true; // TODO: make this a > real config option or move it to TestInjection > {code} > Lots of tests muck with this (to simulate unclean shutdown and force tlog > replay on restart) but there's no garuntee that it is reset properly. > It should be replaced by logic in {{TestInjection}} that is correctly cleaned > up by {{TestInjection.reset()}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6336) AnalyzingInfixSuggester needs duplicate handling
[ https://issues.apache.org/jira/browse/LUCENE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017458#comment-17017458 ] Michal Hlavac commented on LUCENE-6336: --- It's not a general solution, but I tried to override the add method to create or update the existing document instead, and it works. Of course, it doesn't work with weightField and payloadField, but in my scenario, which uses only the text field, it works:
{code:java}
public class DedupAnalyzingInfixSuggester extends AnalyzingInfixSuggester {

  public DedupAnalyzingInfixSuggester(Directory dir, Analyzer analyzer) throws IOException {
    super(dir, analyzer);
  }

  // ... Other constructors ...

  @Override
  public void add(BytesRef text, Set<BytesRef> contexts, long weight, BytesRef payload) throws IOException {
    update(text, contexts, weight, payload);
  }
}
{code}
> AnalyzingInfixSuggester needs duplicate handling > > > Key: LUCENE-6336 > URL: https://issues.apache.org/jira/browse/LUCENE-6336 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.10.3, 5.0 >Reporter: Jan Høydahl >Priority: Major > Labels: lookup, suggester > Attachments: LUCENE-6336.patch > > > Spinoff from LUCENE-5833 but else unrelated. > Using {{AnalyzingInfixSuggester}} which is backed by a Lucene index and > stores payload and score together with the suggest text. > I did some testing with Solr, producing the DocumentDictionary from an index > with multiple documents containing the same text, but with random weights > between 0-100. Then I got duplicate identical suggestions sorted by weight: > {code} > { > "suggest":{"languages":{ > "engl":{ > "numFound":101, > "suggestions":[{ > "term":"English", > "weight":100, > "payload":"0"}, > { > "term":"English", > "weight":99, > "payload":"0"}, > { > "term":"English", > "weight":98, > "payload":"0"}, > ---etc all the way down to 0--- > {code} > I also reproduced the same behavior in AnalyzingInfixSuggester directly. So > there is a need for some duplicate removal here, either while building the > local suggest index or during lookup. Only the highest weight suggestion for > a given term should be returned. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
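A hypothetical usage of the subclass above: repeated add() calls for the same text now update in place rather than accumulate, and refresh() is the standard AnalyzingInfixSuggester call that makes changes visible to lookups:
{code:java}
DedupAnalyzingInfixSuggester suggester =
    new DedupAnalyzingInfixSuggester(FSDirectory.open(Paths.get("suggest-index")), analyzer);
suggester.add(new BytesRef("English"), null, 100L, new BytesRef("0"));
suggester.add(new BytesRef("English"), null, 99L, new BytesRef("0")); // replaces the previous entry
suggester.refresh(); // lookups now see a single "English" suggestion
{code}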
[jira] [Commented] (SOLR-14128) SystemCollectionCompatTest times out waiting for Overseer to do compatibility checks
[ https://issues.apache.org/jira/browse/SOLR-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017465#comment-17017465 ] Andrzej Bialecki commented on SOLR-14128: - Follow-up for the schema update bug in SOLR-14192. > SystemCollectionCompatTest times out waiting for Overseer to do compatibility > checks > > > Key: SOLR-14128 > URL: https://issues.apache.org/jira/browse/SOLR-14128 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Andrzej Bialecki >Priority: Major > Attachments: fail.txt, nodeset.patch, pass.txt, > thetaphi_Lucene-Solr-master-Linux_25161.log.txt > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14192) Race condition between SchemaManager and ZkIndexSchemaReader
[ https://issues.apache.org/jira/browse/SOLR-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-14192: Attachment: SOLR-14192.patch > Race condition between SchemaManager and ZkIndexSchemaReader > > > Key: SOLR-14192 > URL: https://issues.apache.org/jira/browse/SOLR-14192 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.4 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-14192.patch > > > Spin-off from SOLR-14128 and SOLR-13368. > In SolrCloud when a SolrCore is created and it uses managed schema then its > {{ManagedIndexSchemaFactory}} performs an automatic upgrade of the initial > {{schema.xml}} to {{managed-schema}}. This includes removing the original > {{schema.xml}} file. > SOLR-13368 added some locking to make sure the changed resource name (i.e. > {{managed-schema}}) becomes visible only when this process is complete, and > that in-flight requests to /admin/schema block until this process is > complete, to avoid returning inconsistent data. This locking mechanism uses > simple Object monitors. > However, if there's more than 1 node in the cluster the subsequent request to > retrieve schema may execute on a core that still hasn't reloaded its schema > ({{ZkIndexSchemaReader}} uses a ZK watcher, which may take some time to > trigger), and the resource name in that stale schema still points to > {{schema.xml}}, which by this time no longer exists because it was removed by > {{ManagedIndexSchemaFactory}} in the first core. > As I see it there are two bugs here: > # there's no distributed locking when this upgrade is performed, so it's > natural that there are multiple cores racing against each other to perform > this upgrade. > # the upgrade process removes {{schema.xml}} too early - it triggers all > other cores by creating the {{managed-schema}} file, and then other cores > reload from the new managed schema - but it should wait until this reload is > complete on all cores because only then it's safe to delete the non-managed > resource as it's no longer in use by any core. > Issue 1. can be solved by adding an ephemeral znode lock so that only one > core can perform the upgrade. Issue 2. can be solved by using > {{ManagedIndexSchema.waitForSchemaZkVersionAgreement}} after upgrade, and > deleting {{schema.xml}} only after it's done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14192) Race condition between SchemaManager and ZkIndexSchemaReader
[ https://issues.apache.org/jira/browse/SOLR-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017466#comment-17017466 ] Andrzej Bialecki commented on SOLR-14192: - This patch seems to fix it for me, at least I wasn't able to reproduce this anymore. Summary of changes: * use an ephemeral ZK lock when upgrading the schema to managed. * be more lenient when retrieving the schema - if local core claims to be still using {{schema.xml}} but it cannot be found in ZK then try to retrieve the backup left over after upgrade, ie. {{schema.xml.bak}}, and if that doesn't exist either then simply use the current in-memory schema. > Race condition between SchemaManager and ZkIndexSchemaReader > > > Key: SOLR-14192 > URL: https://issues.apache.org/jira/browse/SOLR-14192 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.4 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-14192.patch > > > Spin-off from SOLR-14128 and SOLR-13368. > In SolrCloud when a SolrCore is created and it uses managed schema then its > {{ManagedIndexSchemaFactory}} performs an automatic upgrade of the initial > {{schema.xml}} to {{managed-schema}}. This includes removing the original > {{schema.xml}} file. > SOLR-13368 added some locking to make sure the changed resource name (i.e. > {{managed-schema}}) becomes visible only when this process is complete, and > that in-flight requests to /admin/schema block until this process is > complete, to avoid returning inconsistent data. This locking mechanism uses > simple Object monitors. > However, if there's more than 1 node in the cluster the subsequent request to > retrieve schema may execute on a core that still hasn't reloaded its schema > ({{ZkIndexSchemaReader}} uses a ZK watcher, which may take some time to > trigger), and the resource name in that stale schema still points to > {{schema.xml}}, which by this time no longer exists because it was removed by > {{ManagedIndexSchemaFactory}} in the first core. > As I see it there are two bugs here: > # there's no distributed locking when this upgrade is performed, so it's > natural that there are multiple cores racing against each other to perform > this upgrade. > # the upgrade process removes {{schema.xml}} too early - it triggers all > other cores by creating the {{managed-schema}} file, and then other cores > reload from the new managed schema - but it should wait until this reload is > complete on all cores because only then it's safe to delete the non-managed > resource as it's no longer in use by any core. > Issue 1. can be solved by adding an ephemeral znode lock so that only one > core can perform the upgrade. Issue 2. can be solved by using > {{ManagedIndexSchema.waitForSchemaZkVersionAgreement}} after upgrade, and > deleting {{schema.xml}} only after it's done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9146) Switch GitHub PR test from ant precommit to gradle
Mike Drob created LUCENE-9146: - Summary: Switch GitHub PR test from ant precommit to gradle Key: LUCENE-9146 URL: https://issues.apache.org/jira/browse/LUCENE-9146 Project: Lucene - Core Issue Type: Sub-task Reporter: Mike Drob -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017518#comment-17017518 ] David Smiley commented on LUCENE-9125: -- There's an option for lucene-util to format the output for JIRA; I forget what it is off-hand. What data set did you use? (e.g. wikibigall or...?) Looking at the results you posted, the optimization seems fairly invisible. It surely would not have improved HighTermMonthSort as there's no fuzzy stuff there, and so that's 4.7% of "noise". > Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
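The change under discussion is a classic range binary search. A self-contained sketch of the idea, with parallel min/max/dest arrays standing in for Lucene's packed transition storage (hypothetical names, not the committed code):
{code:java}
// Transitions of one state, sorted by min label and non-overlapping
// (deterministic automaton). Returns the destination state, or -1.
static int step(int[] min, int[] max, int[] dest, int label) {
  int lo = 0, hi = min.length - 1;
  while (lo <= hi) {
    int mid = (lo + hi) >>> 1;
    if (max[mid] < label) {
      lo = mid + 1;        // label is above this transition's range
    } else if (min[mid] > label) {
      hi = mid - 1;        // label is below this transition's range
    } else {
      return dest[mid];    // min[mid] <= label <= max[mid]: follow it
    }
  }
  return -1;               // no transition accepts this label
}
{code}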
[jira] [Updated] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with (negated) TestInjection.skipIndexWriterCommitOnClose
[ https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14184: -- Resolution: Fixed Status: Resolved (was: Patch Available) > replace DirectUpdateHandler2.commitOnClose with (negated) > TestInjection.skipIndexWriterCommitOnClose > > > Key: SOLR-14184 > URL: https://issues.apache.org/jira/browse/SOLR-14184 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-14184.patch, SOLR-14184.patch > > > {code:java} > public static volatile boolean commitOnClose = true; // TODO: make this a > real config option or move it to TestInjection > {code} > Lots of tests muck with this (to simulate unclean shutdown and force tlog > replay on restart) but there's no garuntee that it is reset properly. > It should be replaced by logic in {{TestInjection}} that is correctly cleaned > up by {{TestInjection.reset()}} > > It's been replaced with the (negated) option > {{TestInjection.skipIndexWriterCommitOnClose}} which is automatically reset > to it's default value of {{false}} by {{TestInjection.reset()}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with (negated) TestInjection.skipIndexWriterCommitOnClose
[ https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14184: -- Fix Version/s: 8.5 master (9.0) Description: {code:java} public static volatile boolean commitOnClose = true; // TODO: make this a real config option or move it to TestInjection {code} Lots of tests muck with this (to simulate unclean shutdown and force tlog replay on restart) but there's no garuntee that it is reset properly. It should be replaced by logic in {{TestInjection}} that is correctly cleaned up by {{TestInjection.reset()}} It's been replaced with the (negated) option {{TestInjection.skipIndexWriterCommitOnClose}} which is automatically reset to it's default value of {{false}} by {{TestInjection.reset()}} was: {code:java} public static volatile boolean commitOnClose = true; // TODO: make this a real config option or move it to TestInjection {code} Lots of tests muck with this (to simulate unclean shutdown and force tlog replay on restart) but there's no garuntee that it is reset properly. It should be replaced by logic in {{TestInjection}} that is correctly cleaned up by {{TestInjection.reset()}} Summary: replace DirectUpdateHandler2.commitOnClose with (negated) TestInjection.skipIndexWriterCommitOnClose (was: replace DirectUpdateHandler2.commitOnClose with something in TestInjection) > replace DirectUpdateHandler2.commitOnClose with (negated) > TestInjection.skipIndexWriterCommitOnClose > > > Key: SOLR-14184 > URL: https://issues.apache.org/jira/browse/SOLR-14184 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-14184.patch, SOLR-14184.patch > > > {code:java} > public static volatile boolean commitOnClose = true; // TODO: make this a > real config option or move it to TestInjection > {code} > Lots of tests muck with this (to simulate unclean shutdown and force tlog > replay on restart) but there's no garuntee that it is reset properly. > It should be replaced by logic in {{TestInjection}} that is correctly cleaned > up by {{TestInjection.reset()}} > > It's been replaced with the (negated) option > {{TestInjection.skipIndexWriterCommitOnClose}} which is automatically reset > to it's default value of {{false}} by {{TestInjection.reset()}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017521#comment-17017521 ] Mike Drob commented on LUCENE-9134: --- For reference, this is what I told Erick over Slack. Didn't realize the question was on JIRA as well. {quote} I think what you want to do is define a configuration that includes a dependency for net.java.dev.javacc:javacc:5.0 and then you can refer to the classpath of that I never got breakpoints to work with tasks, so if you figure it out please share! declare the dep something like https://github.com/apache/lucene-solr/blob/master/gradle/validation/rat-sources.gradle#L27 (edited) and then use it something like https://github.com/apache/lucene-solr/blob/master/gradle/validation/rat-sources.gradle#L143 (edited) {quote} > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch > > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
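Put together, the advice above amounts to something like the following Gradle sketch; the task name, file paths and the javacc CLI entry point are assumptions, not the script that was eventually committed:
{code}
configurations {
  javacc
}

dependencies {
  javacc "net.java.dev.javacc:javacc:5.0"
}

task javaccQueryParser(type: JavaExec) {
  classpath = configurations.javacc
  main = "javacc"  // assumed CLI entry point bundled in javacc.jar
  args "-OUTPUT_DIRECTORY=${projectDir}/src/java/generated", "QueryParser.jj"
}
{code}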
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017635#comment-17017635 ] Michael McCandless commented on LUCENE-9125: That drop is because I tried JDK 13 for one (maybe two) runs, and it's a big slowdown for many queries!! Then I switched to JDK 12 and most queries are as fast as on JDK 11. > Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017636#comment-17017636 ] Michael McCandless commented on LUCENE-9125: [~broustant] those QPS numbers are crazy high – which {{-source}} did you use? > Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, with no plan to support a Java interface, making them hard to integrate into Java projects and inaccessible to those who are not familiar with C/C++ [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; where IVFFlat and HNSW are the most popular among all the VR algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress, and it draws the attention of those who are interested in Lucene and hope to use HNSW with Solr/Lucene. As another alternative for ANN similarity search problems, IVFFlat is also very popular with many users and supporters. Compared with HNSW, IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster to query (no training required) but requires extra storage for graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. Both of them have their merits and demerits. Another advantage is that IVFFlat can be faster and more accurate when GPU parallel computing is enabled (currently not supported in Java). Since HNSW is now under development, it may be better to provide both algorithm implementations for potential users who have very different applications and scenarios. I will soon commit my personal implementations. was: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, with no plan to support a Java interface [[https://github.com/facebookresearch/faiss/issues/105]].
The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; IVFFlat and HNSW are the most popular ones among all the algorithms. Recently, implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster to query but requires extra storage for graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. Each of them has its merits and demerits. Since HNSW is now under development, it may be better to provide IVFFlat as an alternative choice. I will soon commit my personal implementations. > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous a
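To make the IVFFlat trade-off described above concrete, here is a toy sketch of the query side (not the implementation promised on this issue): vectors are pre-assigned to k-means centroids, and a query exhaustively scans only the nprobe closest clusters, which is why the index stays small while a training/clustering step is required:
{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IvfFlatSketch {
  /** Return the stored vector nearest to q, probing only the nprobe closest clusters. */
  static float[] search(float[] q, float[][] centroids, List<float[]>[] lists, int nprobe) {
    Integer[] order = new Integer[centroids.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    // rank clusters by the distance of their centroid to the query
    Arrays.sort(order, Comparator.comparingDouble((Integer c) -> l2(q, centroids[c])));

    float best = Float.MAX_VALUE;
    float[] bestVec = null;
    for (int p = 0; p < Math.min(nprobe, order.length); p++) {
      for (float[] v : lists[order[p]]) { // "flat": exhaustive scan, no compression
        float d = l2(q, v);
        if (d < best) { best = d; bestVec = v; }
      }
    }
    return bestVec;
  }

  static float l2(float[] a, float[] b) {
    float s = 0;
    for (int i = 0; i < a.length; i++) { float t = a[i] - b[i]; s += t * t; }
    return s;
  }
}
{code}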
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, with no plan to support a Java interface, making them hard to integrate into Java projects and inaccessible to those who are not familiar with C/C++ [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; where IVFFlat and HNSW are the most popular among all the VR algorithms. Recently, the implementation of HNSW (Hierarchical Navigable Small World, LUCENE-9004) for Lucene has made great progress. The issue draws the attention of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. As an alternative for solving ANN similarity search problems, IVFFlat is also very popular with many users and supporters. Compared with HNSW, IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster to query (no training required) but requires extra storage for saving graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. Another advantage is that IVFFlat can be faster and more accurate when GPU parallel computing is enabled (currently not supported in Java). Both algorithms have their merits and demerits. Since HNSW is now under development, it may be better to provide both implementations (HNSW && IVFFlat) for potential users who are faced with very different scenarios and want more choices. I will soon commit my personal implementations. was: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users.
However, the aforementioned implementations are all written in C++, with no plan to support a Java interface, making them hard to integrate into Java projects and inaccessible to those who are not familiar with C/C++ [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; where IVFFlat and HNSW are the most popular among all the VR algorithms. Recently, the implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress, and it draws the attention of those who are interested in Lucene and hope to use HNSW with Solr/Lucene. As another alternative for ANN similarity search problems, IVFFlat is also very popular with many users and supporters. Compared with HNSW, IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster to query (no training required) but requires extra storage for graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]]. Both of them have their merits and demerits. Another advantage is that IVFFlat can be faster and more accurate when GPU parallel computing is enabled (currently not supported in Java). Since HNSW is now under development, it may be better to provide both algorithm implementations for potential users who
[jira] [Commented] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017689#comment-17017689 ] Cao Manh Dat commented on SOLR-12859: - To be honest, I have never fully understood the current authentication framework of Solr. When I did the HTTP/2 work, I basically converted the current Apache HttpClient interceptor to an equivalent version. After spending some time looking at the current code and the documentation, I'm guessing that {{isSolrThread()}} is a naive/workaround way to check whether a request that is about to be sent to another node was actually initiated by a Solr node. Let's look at this comment
{quote}
//if this is not running inside a Solr threadpool (as in testcases)
// then no need to add any header
{quote}
The above comment makes sense once we notice how the interceptors were added for Apache HttpClient: {{HttpClientUtil.addRequestInterceptor(interceptor)}} -> the interceptor is added to a static variable. This is OK if a JVM hosts only one node, but in tests a JVM hosts several nodes, so several PKI interceptors get added to that static variable. Moreover, every Apache HttpClient created by HttpClientUtil shares the same list of interceptors, even clients created in tests. So how can we distinguish a request sent from a client inside a node from a request sent from a client inside a test method, if all clients use the same list of interceptors? The naive solution was setting a flag called {{isSolrThread}} to distinguish these two cases. In most cases, a request sent by a node comes from a thread of a thread pool created by {{ExecutorUtil}}. So to make auth tests pass, {{isServerPool.set(Boolean.TRUE);}} is called before running any {{Runnable}}. With all of this context, let's review the mystery code again
{code}
SolrRequestInfo reqInfo = getRequestInfo();
String usr;
if (reqInfo != null) {
  // 1. Author's idea: OK, the thread is holding a request; if authentication is enabled, the req must hold a Principal
  Principal principal = reqInfo.getUserPrincipal();
  if (principal == null) {
    // 2. Author's idea: the req did not pass authentication since the Principal is not set, no need to do anything here!
    // my comment: this is not true; SolrRequestInfo is also used as a grab-bag to put data into, and many places rely on data inside SolrRequestInfo, so the presence of SolrRequestInfo does not mean that the request comes from outside.
    return Optional.empty();
  } else {
    usr = principal.getName();
  }
} else {
  if (!isSolrThread()) {
    // 3. Author's idea: the req was not sent inside a thread created by ExecutorUtil, so it must come from test code or the outside world
    // my comment: this is not true, since in {{DocExpirationUpdateProcessorFactory}} a {{ScheduledThreadPoolExecutor}} is used instead of a thread pool created by ExecutorUtil
    return Optional.empty();
  }
  // 4. Author's idea: if the req was sent by ExecutorUtil, it must come out of this node.
  usr = "$"; // special name to denote the user is the node itself
}
{code}
> DocExpirationUpdateProcessorFactory does not work with BasicAuth > > > Key: SOLR-12859 > URL: https://issues.apache.org/jira/browse/SOLR-12859 > Project: Solr > Issue Type: Bug >Affects Versions: 7.5 >Reporter: Varun Thacker >Priority: Major > Attachments: SOLR-12859.patch > > > I set up a cluster with basic auth and then wanted to use Solr's TTL feature ( > DocExpirationUpdateProcessorFactory ) to auto-delete documents. > > Turns out it doesn't work when Basic Auth is enabled.
I get the following > stacktrace from the logs > {code:java} > 2018-10-12 22:06:38.967 ERROR (autoExpireDocs-42-thread-1) [ ] > o.a.s.u.p.DocExpirationUpdateProcessorFactory Runtime error in periodic > deletion of expired docs: Async exception during distributed update: Error > from server at http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: > require authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: > Async exception during distributed update: Error from server at > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: require > authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > at > org.apache.so
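To make the {{isSolrThread}} mechanism concrete, here is a minimal, self-contained sketch of the pattern described above -- a thread-local flag set by an executor wrapper before each task runs. The names ({{IS_SERVER_POOL}}, {{markAsServer}}) are illustrative stand-ins, not Solr's actual identifiers:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadFlagSketch {
  // Hypothetical stand-in for Solr's isServerPool thread-local flag.
  private static final ThreadLocal<Boolean> IS_SERVER_POOL =
      ThreadLocal.withInitial(() -> Boolean.FALSE);

  // Analogous to the isSolrThread() check discussed above.
  static boolean isSolrThread() {
    return IS_SERVER_POOL.get();
  }

  // Wrap each task so the flag is set on whichever worker thread runs it,
  // mirroring how the pool wrapper sets the flag before calling any Runnable.
  static Runnable markAsServer(Runnable task) {
    return () -> {
      IS_SERVER_POOL.set(Boolean.TRUE);
      try {
        task.run();
      } finally {
        IS_SERVER_POOL.remove();
      }
    };
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.submit(markAsServer(
        () -> System.out.println("worker isSolrThread: " + isSolrThread()))); // true
    System.out.println("main isSolrThread: " + isSolrThread()); // false
    pool.shutdown();
  }
}
{code}
This also makes the bug visible: a task submitted to a pool that does not go through the wrapper (for example, a plain {{ScheduledThreadPoolExecutor}}) never gets the flag, so {{isSolrThread()}} returns false even though the caller really is a node.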
[jira] [Comment Edited] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017689#comment-17017689 ] Cao Manh Dat edited comment on SOLR-12859 at 1/17/20 4:15 AM: -- To be honest, I have never fully understood Solr's current authentication framework. When I did the HTTP/2 work, I basically converted the existing Apache HttpClient interceptor to an equivalent version. After spending some time looking at the current code and the documentation, I am guessing that {{isSolrThread()}} is a naive workaround to check whether a request that is about to be sent to another node was actually sent by a Solr node. Let's look at this comment {quote} //if this is not running inside a Solr threadpool (as in testcases) // then no need to add any header {quote} The comment above makes sense once we notice how the interceptors are added for Apache HttpClient: {{HttpClientUtil.addRequestInterceptor(interceptor)}} adds the interceptor to a static variable. This is fine if a JVM hosts only one node, but in tests a JVM hosts several nodes, so several PKI interceptors get added to that static variable. Moreover, every Apache HttpClient created by HttpClientUtil shares the same list of interceptors, even clients created in tests. So how can we distinguish a request sent from a client inside a node from a request sent from a client inside a test method, when all clients use the same list of interceptors? The naive solution was to set a flag called {{isSolrThread}} to distinguish these two cases. In most cases, a request sent by a node is sent from a thread of a threadPool created by {{ExecutorUtil}}. So, to make the auth tests pass, {{isServerPool.set(Boolean.TRUE);}} is called before running any {{Runnable}}. With all of this context, let's review the mystery code again:
{code}
SolrRequestInfo reqInfo = getRequestInfo();
String usr;
if (reqInfo != null) {
  // 1. Author's idea: the thread is holding a request; if authentication is enabled, the request must hold a Principal.
  Principal principal = reqInfo.getUserPrincipal();
  if (principal == null) {
    // 2. Author's idea: the request did not pass authentication since no Principal is set, so nothing needs to be done here!
    // My comment: this is not true. SolrRequestInfo is also used as a general-purpose holder -- many places rely on data
    // inside SolrRequestInfo, so its presence does not mean the request came from outside.
    return Optional.empty();
  } else {
    usr = principal.getName();
  }
} else {
  if (!isSolrThread()) {
    // 3. Author's idea: the request was not sent from a thread created by ExecutorUtil, so it must come from test code
    // or the outside world.
    // My comment: this is not true either, since DocExpirationUpdateProcessorFactory uses a ScheduledThreadPoolExecutor
    // instead of a threadPool created by ExecutorUtil.
    return Optional.empty();
  }
  // 4. Author's idea: the request was sent via ExecutorUtil, so it must originate from this node.
  usr = "$"; // special name to denote that the user is the node itself
}
{code}
But with the new HTTP/2 client, the interceptor is added to each client object, so there is no single static variable here -> no sharing of interceptors between the clients of nodes and the clients of tests -> if the interceptor's code is called, the request must have been sent from a node.
So, for the interceptor of the HTTP/2 client, the mystery block can be changed to:
{code}
SolrRequestInfo reqInfo = getRequestInfo();
String usr = NODE_IS_USER;
if (reqInfo != null && reqInfo.getUserPrincipal() != null) {
  usr = reqInfo.getUserPrincipal().getName();
}
{code}
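As a self-contained illustration of that simplified check: the {{RequestInfo}} interface below is a hypothetical stand-in for the relevant part of {{SolrRequestInfo}}, and {{NODE_IS_USER}} stands in for the node's special principal name ({{"$"}} in the original code):
{code:java}
import java.security.Principal;
import java.util.Optional;

public class PkiUserResolutionSketch {
  // Hypothetical stand-in for the node's special principal name ("$" in the original code).
  private static final String NODE_IS_USER = "$";

  // Minimal stand-in for the part of SolrRequestInfo used here.
  interface RequestInfo {
    Principal getUserPrincipal();
  }

  // With per-client interceptors there is no ambiguity about the sender:
  // if this code runs at all, the request comes from a node, so we always
  // resolve a user -- either the request's principal or the node itself.
  static Optional<String> resolveUser(RequestInfo reqInfo) {
    String usr = NODE_IS_USER;
    if (reqInfo != null && reqInfo.getUserPrincipal() != null) {
      usr = reqInfo.getUserPrincipal().getName();
    }
    return Optional.of(usr);
  }

  public static void main(String[] args) {
    Principal alice = () -> "alice";
    System.out.println(resolveUser(null));        // Optional[$]     (node-to-node request)
    System.out.println(resolveUser(() -> alice)); // Optional[alice] (authenticated user)
  }
}
{code}
Note the contrast with the original code: there is no branch that returns {{Optional.empty()}}, because the per-client registration already guarantees the caller is a node.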
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017696#comment-17017696 ] Julie Tibshirani commented on LUCENE-9004: -- Hello and thank you for this very exciting work! We have been doing research into nearest neighbor search on high-dimensional vectors and I wanted to share some thoughts here in the hope that they're helpful. Related to Adrien's comment about search filters, I am wondering how deleted documents would be handled. If I'm understanding correctly, a segment's deletes are applied 'on top of' the query. So if the k nearest neighbors to the query vector all happen to be deleted, then the query won't bring back any documents. From a user's perspective, I could see this behavior being surprising or hard to work with. One approach would be to keep expanding the search while skipping over deleted documents, but I'm not sure about the performance + accuracy it would give (there's a [short discussion|https://github.com/nmslib/hnswlib/issues/4#issuecomment-378739892] in the hnswlib repo on this point). The recent paper [Graph based Nearest Neighbor Search: Promises and Failures|https://arxiv.org/abs/1904.02077] compares HNSW to other graph-based approaches and claims that the hierarchy of layers only really helps in low dimensions (Figure 4). In these experiments, they see that a 'flat' version of HNSW performs very similarly to the original above around 16 dimensions. The original HNSW paper also cites the hierarchy as most helpful in low dimensions. This seemed interesting in that it may be possible to avoid some complexity if the focus is not on low-dimensional vectors. (It also suggests that graph-based kNN is an active research area and that there are likely to be improvements + new approaches that come out. One such new approach is [DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node|https://suhasjs.github.io/files/diskann_neurips19.pdf]). On the subject of testing recall, we are working on adding [sentence embedding|https://github.com/erikbern/ann-benchmarks/issues/144] and [deep image descriptor|https://github.com/erikbern/ann-benchmarks/issues/143] datasets to the ann-benchmarks repo. Hopefully that will help provide some realistic shared data to test against. > Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layered_graph.png > > Time Spent: 40m > Remaining Estimate: 0h > > "Semantic" search based on machine-learned vector "embeddings" representing > terms, queries and documents is becoming a must-have feature for a modern > search engine. SOLR-12890 is exploring various approaches to this, including > providing vector-based scoring functions. This is a spinoff issue from that. > The idea here is to explore approximate nearest-neighbor search. Researchers > have found an approach based on navigating a graph that partially encodes the > nearest neighbor relation at multiple scales can provide accuracy > 95% (as > compared to exact nearest neighbor calculations) at a reasonable cost. This > issue will explore implementing HNSW (hierarchical navigable small-world) > graphs for the purpose of approximate nearest vector search (often referred > to as KNN or k-nearest-neighbor search). > At a high level the way this algorithm works is this.
First assume you have a > graph that has a partial encoding of the nearest neighbor relation, with some > short and some long-distance links. If this graph is built in the right way > (has the hierarchical navigable small world property), then you can > efficiently traverse it to find nearest neighbors (approximately) in log N > time where N is the number of nodes in the graph. I believe this idea was > pioneered in [1]. The great insight in that paper is that if you use the > graph search algorithm to find the K nearest neighbors of a new document > while indexing, and then link those neighbors (undirectedly, ie both ways) to > the new document, then the graph that emerges will have the desired > properties. > The implementation I propose for Lucene is as follows. We need two new data > structures to encode the vectors and the graph. We can encode vectors using a > light wrapper around {{BinaryDocValues}} (we also want to encode the vector > dimension and have efficient conversion from bytes to floats). For the graph > we can use {{SortedNumericDocValues}} where the values we encode are the > docids of the related documents. Encoding the interdocument relations using > docids directly will make it relatively fast to traverse the g
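A minimal sketch of the "keep expanding while skipping deleted documents" idea raised in the comment above: deleted nodes still participate in graph traversal (so the graph stays navigable), but only live documents enter the result set. This is an illustrative greedy search over a plain adjacency list, not Lucene's or hnswlib's actual implementation:
{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.IntPredicate;

public class KnnSkipDeletedSketch {

  // Squared Euclidean distance; sufficient for nearest-neighbor ordering.
  static float dist(float[] a, float[] b) {
    float s = 0;
    for (int i = 0; i < a.length; i++) { float d = a[i] - b[i]; s += d * d; }
    return s;
  }

  static int[] search(float[][] vectors, int[][] neighbors, float[] query,
                      int entry, int k, IntPredicate isLive) {
    Comparator<Integer> byDist = Comparator.comparingDouble(n -> dist(vectors[n], query));
    PriorityQueue<Integer> frontier = new PriorityQueue<>(byDist);            // closest-first expansion
    PriorityQueue<Integer> results = new PriorityQueue<>(byDist.reversed()); // worst result on top, capped at k
    Set<Integer> visited = new HashSet<>();
    frontier.add(entry);
    visited.add(entry);
    while (!frontier.isEmpty()) {
      int node = frontier.poll();
      // Stop once the nearest unexpanded candidate cannot improve a full result set.
      if (results.size() == k
          && dist(vectors[node], query) > dist(vectors[results.peek()], query)) {
        break;
      }
      if (isLive.test(node)) {            // only live docs become results...
        results.add(node);
        if (results.size() > k) results.poll();
      }
      for (int nb : neighbors[node]) {    // ...but we expand through deleted ones too
        if (visited.add(nb)) frontier.add(nb);
      }
    }
    return results.stream().mapToInt(Integer::intValue).sorted().toArray();
  }

  public static void main(String[] args) {
    float[][] vecs = {{0f}, {1f}, {2f}, {3f}, {4f}};
    int[][] nbrs = {{1}, {0, 2}, {1, 3}, {2, 4}, {3}};
    IntPredicate live = n -> n != 0 && n != 1; // pretend docs 0 and 1 are deleted
    // Query near 0: its two nearest docs are deleted, so the search expands past them to 2 and 3.
    System.out.println(Arrays.toString(search(vecs, nbrs, new float[]{0f}, 0, 2, live))); // [2, 3]
  }
}
{code}
The open question from the comment still applies to this sketch: when many neighbors are deleted, the traversal must expand further, so both latency and recall depend on the deletion rate.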
[jira] [Comment Edited] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017696#comment-17017696 ] Julie Tibshirani edited comment on LUCENE-9004 at 1/17/20 4:36 AM: --- Hello and thank you for this very exciting work! We have been doing research into nearest neighbor search on high-dimensional vectors and I wanted to share some thoughts here in the hope that they're helpful. Related to Adrien's comment about search filters, I am wondering how deleted documents would be handled. If I'm understanding correctly, a segment's deletes are applied 'on top of' the query. So if the k nearest neighbors to the query vector all happen to be deleted, then the query won't bring back any documents. From a user's perspective, I could see this behavior being surprising or hard to work with. One approach would be to keep expanding the search while skipping over deleted documents, but I'm not sure about the performance + accuracy it would give (there's a [short discussion|https://github.com/nmslib/hnswlib/issues/4#issuecomment-378739892] in the hnswlib repo on this point). The recent paper [Graph based Nearest Neighbor Search: Promises and Failures|https://arxiv.org/abs/1904.02077] compares HNSW to other graph-based approaches and claims that the hierarchy of layers only really helps in low dimensions (Figure 4). In these experiments, they see that a 'flat' version of HNSW performs very similarly to the original above around 16 dimensions. The original HNSW paper also cites the hierarchy as most helpful in low dimensions. This seemed interesting in that it may be possible to avoid some complexity if the focus is not on low-dimensional vectors. (It also suggests that graph-based kNN is an active research area and that there are likely to be improvements + new approaches that come out. One such new approach is [DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node|https://suhasjs.github.io/files/diskann_neurips19.pdf]). On the subject of testing recall, we are working on adding [sentence embedding|https://github.com/erikbern/ann-benchmarks/issues/144] and [deep image descriptor|https://github.com/erikbern/ann-benchmarks/issues/143] datasets to the ann-benchmarks repo. Hopefully that will help provide some realistic shared data to test against.
> Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layer
[jira] [Commented] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017713#comment-17017713 ] Cao Manh Dat commented on SOLR-12859: - I attached a draft patch for fixing the problem. The ideas are: * Set isSolrThread inside {{DefaultSolrThreadFactory}}. That class belongs to solr-core, so its threads are always created by a node. * If isSolrThread == true, set usr = "$" even in case principal == null. > DocExpirationUpdateProcessorFactory does not work with BasicAuth > > > Key: SOLR-12859 > URL: https://issues.apache.org/jira/browse/SOLR-12859 > Project: Solr > Issue Type: Bug >Affects Versions: 7.5 >Reporter: Varun Thacker >Priority: Major > Attachments: SOLR-12859.patch > > > I setup a cluster with basic auth and then wanted to use Solr's TTL feature ( > DocExpirationUpdateProcessorFactory ) to auto-delete documents. > > Turns out it doesn't work when Basic Auth is enabled.
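A rough sketch of the patch's first idea: a thread factory that marks every thread it creates, so auth code can recognize it as a node thread. Class and field names here are illustrative stand-ins; the actual patch modifies Solr's {{DefaultSolrThreadFactory}}:
{code:java}
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class MarkingThreadFactorySketch implements ThreadFactory {
  // Hypothetical stand-in for the flag that isSolrThread() consults.
  public static final ThreadLocal<Boolean> IS_SERVER = ThreadLocal.withInitial(() -> Boolean.FALSE);

  private final AtomicInteger counter = new AtomicInteger();
  private final String prefix;

  public MarkingThreadFactorySketch(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    // Mark before running the task, so any auth check on this thread sees a "node" thread,
    // even when the pool is a plain ScheduledThreadPoolExecutor as in DocExpirationUpdateProcessorFactory.
    Runnable wrapped = () -> {
      IS_SERVER.set(Boolean.TRUE);
      r.run();
    };
    return new Thread(wrapped, prefix + "-" + counter.incrementAndGet());
  }

  public static void main(String[] args) throws InterruptedException {
    Thread t = new MarkingThreadFactorySketch("autoExpireDocs")
        .newThread(() -> System.out.println("isServer=" + IS_SERVER.get())); // true
    t.start();
    t.join();
    System.out.println("main isServer=" + IS_SERVER.get()); // false
  }
}
{code}
The point of moving the marking into the factory is that it covers any executor built on it, instead of relying on each pool implementation to set the flag per task.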
[jira] [Commented] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017714#comment-17017714 ] Cao Manh Dat commented on SOLR-12859: - Hmm, it seems that a better approach would be to let the test explicitly mark its own thread as {{isSolrTestThread}}. > DocExpirationUpdateProcessorFactory does not work with BasicAuth > > > Key: SOLR-12859 > URL: https://issues.apache.org/jira/browse/SOLR-12859 > Project: Solr > Issue Type: Bug >Affects Versions: 7.5 >Reporter: Varun Thacker >Priority: Major > Attachments: SOLR-12859.patch > > > I setup a cluster with basic auth and then wanted to use Solr's TTL feature ( > DocExpirationUpdateProcessorFactory ) to auto-delete documents. > > Turns out it doesn't work when Basic Auth is enabled.
[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception
[ https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017751#comment-17017751 ] Radar Da Lei commented on SOLR-13240: - [~cpoerschke] Thanks for fixing this issue. We hit a similar issue on Solr 7.4.0; is there a plan to apply this fix to Solr 7.x? Thanks. > UTILIZENODE action results in an exception > -- > > Key: SOLR-13240 > URL: https://issues.apache.org/jira/browse/SOLR-13240 > Project: Solr > Issue Type: Bug > Components: AutoScaling >Affects Versions: 7.6 >Reporter: Hendrik Haddorp >Assignee: Christine Poerschke >Priority: Major > Fix For: master (9.0), 8.3 > > Attachments: SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, > SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, > SOLR-13240.patch, solr-solrj-7.5.0.jar > > > When I invoke the UTILIZENODE action the REST call fails like this after it > moved a few replicas: > { > "responseHeader":{ > "status":500, > "QTime":40220}, > "Operation utilizenode caused > exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException: > Comparison method violates its general contract!", > "exception":{ > "msg":"Comparison method violates its general contract!", > "rspCode":-1}, > "error":{ > "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","org.apache.solr.common.SolrException"], > "msg":"Comparison method violates its general contract!", > "trace":"org.apache.solr.common.SolrException: Comparison method violates > its general contract!\n\tat > org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat > > org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat > > org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat > > org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat > org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat > >
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat > org.eclipse.jetty.io.ChannelEndPoint$2.run(Cha
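For context on the quoted failure: "Comparison method violates its general contract!" is thrown by Java's TimSort when a {{Comparator}} does not define a consistent total order (for example, it is not transitive, or gives contradictory answers for the same pair). A minimal illustration, unrelated to Solr's actual comparator:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ComparatorContractSketch {
  public static void main(String[] args) {
    Random rnd = new Random(42);
    List<Integer> values = new ArrayList<>();
    for (int i = 0; i < 10_000; i++) values.add(rnd.nextInt(100)); // many duplicates
    try {
      // Broken: a comparator whose answer for equal elements changes between calls
      // (simulated here with a random tie-break) is not a consistent total order.
      // With this many duplicates, TimSort almost always detects the inconsistency.
      values.sort((a, b) -> a.equals(b) ? (rnd.nextBoolean() ? -1 : 1) : Integer.compare(a, b));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // "Comparison method violates its general contract!"
    }
    // Fixed: a deterministic, transitive comparison never trips TimSort's check.
    values.sort(Integer::compare);
    System.out.println("sorted ok, first=" + values.get(0));
  }
}
{code}
In the UTILIZENODE case the same class of bug sits in the autoscaling comparison logic, which is what the attached SOLR-13240 patches address.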